Preprocessing

The gtime.preprocessing module deals with the preprocessing of time series data.

class gtime.preprocessing.TimeSeriesPreparation(start: Optional[datetime] = None, end: Optional[datetime] = None, freq: Optional[Timedelta] = None, resample_if_not_equispaced: bool = False, output_name: str = 'time_series')

Transforms an array-like sequence into a period-index DataFrame with a single column.

Here is what happens:

  • if a list or np.array is passed, the PeriodIndex is built using the parameters start, end and freq;

  • if a pd.Series is passed, it checks whether the index is a time index (DatetimeIndex, TimedeltaIndex, PeriodIndex). If not, the index is built as if the input were a list or np.array. If yes, the index is converted to a PeriodIndex.
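
As a rough illustration of the branching described above, the following pandas-only sketch converts a pd.Series to a period-index DataFrame. It is not gtime's implementation: `sketch_to_period_frame` is a hypothetical helper, and the epoch start date (1970-01-01) and daily frequency are assumptions inferred from the example output in this section.

```python
import pandas as pd


def sketch_to_period_frame(series, output_name="time_series"):
    """Hypothetical helper mimicking the index handling described above."""
    index = series.index
    if isinstance(index, pd.PeriodIndex):
        # Already a PeriodIndex: keep it unchanged.
        period_index = index
    elif isinstance(index, pd.DatetimeIndex):
        # Assumption: daily periods are precise enough for the data.
        period_index = index.to_period("D")
    elif isinstance(index, pd.TimedeltaIndex):
        # Assumption: offsets are measured from the Unix epoch.
        period_index = (pd.Timestamp("1970-01-01") + index).to_period("D")
    else:
        # Not a time index: build one as for a plain list (assumed defaults).
        period_index = pd.period_range(
            start="1970-01-01", periods=len(series), freq="D"
        )
    return pd.DataFrame({output_name: series.to_numpy()}, index=period_index)


timedelta_series = pd.Series(
    [1, 2, 3],
    index=pd.timedelta_range(start=pd.Timedelta(days=1), freq="10D", periods=3),
)
plain_series = pd.Series([1, 2, 3])

timedelta_frame = sketch_to_period_frame(timedelta_series)
plain_frame = sketch_to_period_frame(plain_series)
```

Both resulting frames carry a PeriodIndex and a single column named by `output_name`, which is the shape the class is documented to return.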

Parameters

start : datetime, optional, default: None

The date to use as start date.

end : datetime, optional, default: None

The date to use as end date.

freq : pd.Timedelta, optional, default: None

The frequency of the output time series. Not required for every kind of time series conversion.

resample_if_not_equispaced : bool, optional, default: False

Not supported yet; leave it as False.

output_name : str, optional, default: 'time_series'

The name of the output column

Raises

ValueError

Of the three parameters: start, end, and periods, exactly two must be specified.
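
The wording of this message matches the one pandas itself raises from `pd.period_range` when fewer than two of `start`, `end` and `periods` are given. The short sketch below reproduces it with pandas alone; that the class surfaces this error through `pd.period_range` is an assumption, not something this page states.

```python
import pandas as pd

# Only `start` is given, so pandas cannot determine the extent of the range.
try:
    pd.period_range(start="2010-01-01", freq="D")
except ValueError as err:
    message = str(err)
else:
    message = ""

print(message)
```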

Examples

>>> import pandas as pd
>>> from gtime.preprocessing import TimeSeriesPreparation
>>> time_series = [1, 2, 3, 5, 5, 7]
>>> period_index_time_series = pd.Series(
...     index = pd.period_range(start='01-01-2010', freq='10D', periods=6),
...     data=[1,2,3,5,5,7]
... )
>>> datetime_index_time_series = pd.Series(
...     index = pd.date_range(start='01-01-2010', freq='10D', periods=6),
...     data=[1,2,3,5,5,7]
... )
>>> timedelta_index_time_series = pd.Series(
...     index = pd.timedelta_range(start=pd.Timedelta(days=1), freq='10D', periods=6),
...     data=[1,2,3,5,5,7]
... )
>>> time_series_preparation = TimeSeriesPreparation()
>>> time_series_preparation.transform(time_series)
            time_series
1970-01-01            1
1970-01-02            2
1970-01-03            3
1970-01-04            5
1970-01-05            5
1970-01-06            7
>>> time_series_preparation.transform(period_index_time_series)
            time_series
2010-01-01            1
2010-01-11            2
2010-01-21            3
2010-01-31            5
2010-02-10            5
2010-02-20            7
>>> time_series_preparation.transform(datetime_index_time_series)
            time_series
2010-01-01            1
2010-01-11            2
2010-01-21            3
2010-01-31            5
2010-02-10            5
2010-02-20            7
>>> time_series_preparation.transform(timedelta_index_time_series)
            time_series
1970-01-02            1
1970-01-12            2
1970-01-22            3
1970-02-01            5
1970-02-11            5
1970-02-21            7
transform(time_series: Union[List, array, Series, DataFrame]) → DataFrame

Transforms an array-like sequence into a period-index DataFrame with a single column.

Parameters

time_series : Union[List, np.array, pd.Series, pd.DataFrame], required

The input time series.

Returns

period_index_dataframe : pd.DataFrame

The output dataframe with a period index.