Feature Extraction

The gtime.feature_extraction module deals with the creation of features starting from a time series.

class gtime.feature_extraction.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)

Create a feature based on the national holidays of a specific country.

Parameters

countrystr, optional, default: 'Brazil'

The name of the country from which to retrieve the holidays.

start_datestr, optional, default: '01/01/2018'

The date starting from which to retrieve the holidays.

end_datestr, optional, default: '01/01/2020'

The date until which to retrieve the holidays.

kernelarray-like, optional, default: None

The kernel to use when creating the feature. The holiday feature is created by taking the dot product between the kernel and the column which contains a 1 if the corresponding day is a holiday and a 0 if the day is not a holiday. The rolling window has the same size as the kernel and the calculated value of the dot product is divided by the number of holidays in the window to get the value of the holiday feature.

reindex_methodstr, optional, default: 'pad'

Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events with the index of time_series, and it must be one of the methods accepted by pandas.DataFrame.reindex. Please refer to the pandas documentation for further details.
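The kernel description above can be sketched with a pandas rolling window. This is only a minimal sketch that follows the wording of the description literally: the holiday indicator, the helper kernel_feature, and the choice to return 0 for windows without holidays are assumptions made for illustration, and the sketch does not claim to reproduce the library's internal computation or its exact output.

```python
import numpy as np
import pandas as pd

# Hypothetical holiday indicator: 1 on a holiday, 0 otherwise.
index = pd.date_range("2019-04-18", "2019-04-27", freq="D")
is_holiday = pd.Series([0, 0, 0, 1, 1, 0, 0, 1, 0, 0], index=index, dtype=float)

kernel = np.array([2.0, 1.0])


def kernel_feature(window: np.ndarray) -> float:
    """Dot product with the kernel, divided by the holiday count in the window."""
    n_holidays = window.sum()
    if n_holidays == 0:
        # Assumption: a window without holidays contributes 0.
        return 0.0
    return float(np.dot(kernel, window) / n_holidays)


# The rolling window has the same size as the kernel.
feature = is_holiday.rolling(len(kernel)).apply(kernel_feature, raw=True)
print(feature)
```

The first row is NaN because the window is not yet full, which mirrors the behaviour of the moving features documented further down.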

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
fit(X: DataFrame, y=None)

Fit the estimator. This method exists only for compatibility with the sklearn API.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: Optional[DataFrame] = None) DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), optional, default: None

If provided, start_date and end_date are overwritten with the first and last dates of the index of time_series, and the output DataFrame is re-indexed with the index of time_series using the chosen reindex_method.

Returns

eventspd.DataFrame, shape (length, 1)

A DataFrame containing the events.
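The re-indexing step can be illustrated with plain pandas. The sketch below is not taken from the library; the events frame, its status column, and the target index are hypothetical values chosen to show what the 'pad' method does when the index of time_series is finer than the daily holiday events.

```python
import pandas as pd

# Hypothetical daily holiday events (1.0 on a holiday, 0.0 otherwise).
events = pd.DataFrame(
    {"status": [1.0, 0.0]},
    index=pd.to_datetime(["2019-04-22", "2019-04-23"]),
)

# Hypothetical index of time_series, sampled twice a day.
target_index = pd.to_datetime(
    ["2019-04-22 00:00", "2019-04-22 12:00", "2019-04-23 00:00", "2019-04-23 12:00"]
)

# 'pad' propagates the last known value forward to fill the new rows.
reindexed = events.reindex(target_index, method="pad")
print(reindexed)
```

Other methods accepted by pandas.DataFrame.reindex, such as 'backfill' or 'nearest', fill the new rows differently.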

class gtime.feature_extraction.CrestFactorDetrending(window_size: int = 1, is_causal: bool = True)

Crest factor detrending model. This class removes the trend from the data by using the crest factor definition. Each sample is normalized by its weighted surrounding. Generalized detrending is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, “A Sparsity Measure for Echo Density Growth in General Environments,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.

Parameters

window_size : int, optional, default: 1

The number of previous points on which to compute the crest factor detrending.

is_causalbool, optional, default: True

Whether the current sample is computed based only on the past or also on the future.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import CrestFactorDetrending
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> gnrl_dtr = CrestFactorDetrending(window_size=2)
>>> gnrl_dtr.fit_transform(ts)
   0__CrestFactorDetrending
0                       NaN
1                  1.000000
2                  0.800000
3                  0.692308
4                  0.640000
5                  0.609756

transform(time_series: DataFrame) DataFrame

For every row of time_series, compute the moving crest factor detrending function of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the crest factor detrending.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the crest factor detrending for each element.
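For the causal case, the normalization can be sketched in a few lines of pandas. The formula used here (each squared sample divided by the sum of squares over the trailing window) is an assumption inferred from the values listed in the CrestFactorDetrending example (1.0, 0.8, 0.692308, ...), not a statement about how the class is implemented.

```python
import pandas as pd

ts = pd.DataFrame([0, 1, 2, 3, 4, 5], dtype=float)
window_size = 2

# Assumed formula: x_t^2 / sum of x^2 over the trailing window.
squared = ts ** 2
detrended = squared / squared.rolling(window_size).sum()
print(detrended)
```

The first row is NaN because the trailing window does not yet contain window_size samples.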

class gtime.feature_extraction.CustomFeature(func: Callable, **kwargs: object)

Constructs a transformer from an arbitrary callable. This transformer wraps sklearn.preprocessing.FunctionTransformer but returns a pd.DataFrame.

Parameters

funcCallable, required.

The function to use to generate a pd.DataFrame containing the feature.

kwargsobject, optional.

Optional keyword arguments forwarded to func when transform is called.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import CustomFeature
>>> def custom_function(X, power):
...     return X**power
>>> X = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> custom_feature = CustomFeature(custom_function, power=3)
>>> custom_feature.fit_transform(X)
   0__CustomFeature
0                 0
1                 1
2                 8
3                27
4                64
5               125
fit(time_series: DataFrame, y=None) CustomFeature

Fit the estimator.

Parameters

time_seriespd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: Optional[DataFrame] = None) DataFrame

Generate a pd.DataFrame by calling func on time_series together with the optional keyword arguments.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), optional, default: None

The DataFrame on which to apply the custom function.

Returns

X_t_dfpd.DataFrame, shape (length, 1)

A DataFrame containing the generated feature.

class gtime.feature_extraction.Detrender(trend: str, trend_x0: numpy.array, loss: Callable = <function mean_squared_error>, method: str = 'BFGS')

Apply a de-trend transformation to a time series.

The purpose of the class is to fit a model, defined through the trend parameter, in order to find a trend in the time series. The trend is then removed by subtracting the predictions of the fitted model from the time series.

Parameters

trend'polynomial' | 'exponential', required

The kind of trend removal to apply.

trend_x0np.array, required

Initialisation parameters passed to the trend function. This is used to select a starting point in order to minimize the loss function.

lossCallable, optional, default: mean_squared_error

The loss function to minimize.

methodstring, optional, default: "BFGS"

Loss function optimisation method.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import Detrender
>>> detrender = Detrender(trend='polynomial', trend_x0=np.zeros(2))
>>> time_index = pd.date_range("2020-01-01", "2020-01-10")
>>> X = pd.DataFrame(range(0, 10), index=time_index)
>>> detrender.fit_transform(X)
            0__Detrender
2020-01-01  9.180937e-07
2020-01-02  8.020709e-07
2020-01-03  6.860481e-07
2020-01-04  5.700253e-07
2020-01-05  4.540024e-07
2020-01-06  3.379796e-07
2020-01-07  2.219568e-07
2020-01-08  1.059340e-07
2020-01-09 -1.008878e-08
2020-01-10 -1.261116e-07
fit(X: DataFrame, y=None) Detrender

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Transform the time_series by removing the trend.

Parameters

time_series: pd.DataFrame, shape (n_samples, 1), required

The time series to transform.

Returns

time_series_tpd.DataFrame, shape (n_samples, n_features)

The transformed time series, without the trend.
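The fit-then-subtract idea behind Detrender can be sketched with scipy.optimize.minimize. This is an illustration under stated assumptions rather than the class's actual code: the helper names polynomial_trend and loss are hypothetical, the time steps are taken to be 0, 1, 2, ..., and the coefficient order follows np.polyval (highest degree first).

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

time_index = pd.date_range("2020-01-01", "2020-01-10")
X = pd.DataFrame(np.arange(10, dtype=float), index=time_index)

t = np.arange(len(X), dtype=float)  # assumed time steps 0, 1, 2, ...
y = X.iloc[:, 0].to_numpy()


def polynomial_trend(params: np.ndarray) -> np.ndarray:
    """Evaluate the polynomial with coefficients params (highest degree first)."""
    return np.polyval(params, t)


def loss(params: np.ndarray) -> float:
    """Mean squared error between the series and the candidate trend."""
    return float(np.mean((y - polynomial_trend(params)) ** 2))


trend_x0 = np.zeros(2)  # two coefficients, i.e. a straight line
result = minimize(loss, trend_x0, method="BFGS")

# Remove the trend by subtracting the fitted model's predictions.
detrended = pd.DataFrame(
    y - polynomial_trend(result.x), index=X.index, columns=X.columns
)
print(detrended)
```

Because the input is an exact straight line, the residuals are close to zero, up to the tolerance of the optimiser.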

class gtime.feature_extraction.Exogenous

Reindex exogenous_time_series with the index of time_series. For the documentation of pandas.DataFrame.reindex and the available reindex methods, please refer to the pandas documentation.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Exogenous
>>> ts = pd.DataFrame({'exogenous': [10, 8, 1, 3, 2, 7]},  index=[3, 4, 5, 6, 7, 8])
>>> exog = Exogenous()
>>> exog.fit_transform(ts)
    exogenous__Exogenous
3                    10
4                     8
5                     1
6                     3
7                     2
8                     7
fit(time_series, y=None)

Fit the estimator.

Parameters

time_seriespd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Return the input time series with the class name appended to its column names.

Parameters

time_seriespd.DataFrame, shape (n_samples, n_features), required

The input DataFrame.

Returns

time_series_tpd.DataFrame, shape (n_samples, n_features)

The original exogenous_time_series, with the class name appended to its column names.
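The renaming visible in the Exogenous example (exogenous becomes exogenous__Exogenous) can be mimicked with pandas alone. The sketch below uses DataFrame.add_suffix as a stand-in and assumes the suffix is simply two underscores followed by the class name, as the example output suggests.

```python
import pandas as pd

ts = pd.DataFrame({"exogenous": [10, 8, 1, 3, 2, 7]}, index=[3, 4, 5, 6, 7, 8])

# Assumed naming scheme: "<column>__<ClassName>".
renamed = ts.add_suffix("__Exogenous")
print(renamed)
```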

class gtime.feature_extraction.Max(window_size: int = 1)

For each row in time_series, compute the moving max of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_sizeint, optional, default: 1

The number of previous points on which to compute the moving max.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Max
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_max = Max(window_size=2)
>>> mv_max.fit_transform(ts)
   0__Max
0     NaN
1     1.0
2     2.0
3     3.0
4     4.0
5     5.0
fit(X, y=None)

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Compute the moving max, for every row of time_series, of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the rolling moving max.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the rolling moving max for each element.
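As a point of comparison, the same kind of window can be computed with pandas directly. This sketch assumes the window covers the current row and the window_size - 1 rows before it, which is the pandas default for rolling; the moving min is shown alongside because it differs only in the reduction.

```python
import pandas as pd

ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
window_size = 2

# Rows without a full window are NaN.
moving_max = ts.rolling(window_size).max()
moving_min = ts.rolling(window_size).min()
print(moving_max.join(moving_min, lsuffix="_max", rsuffix="_min"))
```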

class gtime.feature_extraction.Min(window_size: int = 1)

For each row in time_series, compute the moving min of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_sizeint, optional, default: 1

The number of previous points on which to compute the moving min.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Min
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_min = Min(window_size=2)
>>> mv_min.fit_transform(ts)
   0__Min
0     NaN
1     0.0
2     1.0
3     2.0
4     3.0
5     4.0
fit(X, y=None)

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Compute the moving min, for every row of time_series, of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the rolling moving min.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the rolling moving min for each element.

class gtime.feature_extraction.MovingAverage(window_size: int = 1)

For each row in time_series, compute the moving average of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_sizeint, optional, default: 1

The number of previous points on which to compute the moving average.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import MovingAverage
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_avg = MovingAverage(window_size=2)
>>> mv_avg.fit_transform(ts)
   0__MovingAverage
0               NaN
1               0.5
2               1.5
3               2.5
4               3.5
5               4.5
fit(X, y=None)

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Compute the moving average, for every row of time_series, of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the rolling moving average.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the rolling moving average for each element.
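The effect of window_size on the number of leading NaN rows can be seen with a plain pandas rolling mean. This is a sketch using pandas only, not the class itself, and it assumes the same trailing-window convention as pandas.

```python
import pandas as pd

ts = pd.DataFrame([0, 1, 2, 3, 4, 5])

# A window of 3 leaves the first two rows without enough data.
moving_average = ts.rolling(3).mean()
print(moving_average)
```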

class gtime.feature_extraction.MovingCustomFunction(custom_feature_function: Callable, window_size: int = 1, raw: bool = True)

For each row in time_series, compute the moving custom function of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

custom_feature_functionCallable, required.

The function to use to generate a pd.DataFrame containing the feature.

window_sizeint, optional, default: 1

The number of previous points on which to compute the custom function.

rawbool, optional, default: True
  • False : passes each row or column as a Series to the function.

  • True or None : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

Credits: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.apply.html

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import MovingCustomFunction
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_custom = MovingCustomFunction(np.max, window_size=2)
>>> mv_custom.fit_transform(ts)
   0__MovingCustomFunction
0                      NaN
1                      1.0
2                      2.0
3                      3.0
4                      4.0
5                      5.0
fit(time_series: DataFrame, y=None)

Fit the estimator.

Parameters

time_seriespd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

For every row of time_series, compute the moving custom function of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the rolling moving custom function.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the rolling moving custom function for each element.
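The difference between raw=True and raw=False is easiest to see with pandas Rolling.apply itself, which the raw parameter description credits. In this sketch the two helper functions are hypothetical; one receives a NumPy array and the other a pandas Series.

```python
import numpy as np
import pandas as pd

ts = pd.DataFrame([0, 1, 4, 9, 16, 25], dtype=float)


def value_range(window: np.ndarray) -> float:
    """With raw=True the window arrives as an ndarray."""
    return float(window.max() - window.min())


def last_minus_first(window: pd.Series) -> float:
    """With raw=False the window arrives as a Series, so .iloc is available."""
    return float(window.iloc[-1] - window.iloc[0])


range_feature = ts.rolling(2).apply(value_range, raw=True)
diff_feature = ts.rolling(2).apply(last_minus_first, raw=False)
print(range_feature.join(diff_feature, lsuffix="_range", rsuffix="_diff"))
```

For an increasing series both functions give the same numbers; the ndarray version avoids building a Series for every window.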

class gtime.feature_extraction.MovingMedian(window_size: int = 1)

For each row in time_series, compute the moving median of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_sizeint, optional, default: 1

The number of previous points on which to compute the moving median.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import MovingMedian
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_median = MovingMedian(window_size=2)
>>> mv_median.fit_transform(ts)
   0__MovingMedian
0               NaN
1               0.5
2               1.5
3               2.5
4               3.5
5               4.5
fit(X, y=None)

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Compute the moving median, for every row of time_series, of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the rolling moving median.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the rolling moving median for each element.

class gtime.feature_extraction.Polynomial(degree: int = 2)

Compute the polynomial features, of a degree equal to the input degree. Wrapper of sklearn.preprocessing.PolynomialFeatures but returns a pd.DataFrame.

Parameters

degreeint, optional, default: 2

The degree of the polynomial features.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Polynomial
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> pol = Polynomial(degree=3)
>>> pol.fit_transform(ts)
   0__Polynomial  1__Polynomial  2__Polynomial  3__Polynomial
0            1.0            0.0            0.0            0.0
1            1.0            1.0            1.0            1.0
2            1.0            2.0            4.0            8.0
3            1.0            3.0            9.0           27.0
4            1.0            4.0           16.0           64.0
5            1.0            5.0           25.0          125.0
fit(time_series: DataFrame, y=None)

Fit the estimator.

Parameters

time_seriespd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Compute the polynomial features of time_series, up to a degree equal to degree.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The input DataFrame. Used only for its index.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

The computed polynomial features.
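For a single input column, the columns in the Polynomial example are the powers 0 to degree of the values. The sketch below reproduces them with np.vander, which is a swapped-in technique chosen to keep the example short and is not the sklearn PolynomialFeatures call that the class wraps; the column names follow the pattern seen in the example output.

```python
import numpy as np
import pandas as pd

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
degree = 3

# Columns are x**0, x**1, ..., x**degree.
powers = np.vander(x, N=degree + 1, increasing=True)
features = pd.DataFrame(
    powers, columns=[f"{i}__Polynomial" for i in range(degree + 1)]
)
print(features)
```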

class gtime.feature_extraction.Shift(shift: int = 1)

Shift a DataFrame by a number of rows equal to shift.

Parameters

shiftint, optional, default: 1

How much to shift.

Notes

The shift parameter also accepts negative values. Use them with care: a negative shift moves future values into the current row, so a feature built this way can leak future information when it is used for training or testing.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Shift
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> shift = Shift(shift=3)
>>> shift.fit_transform(ts)
   0__Shift
0       NaN
1       NaN
2       NaN
3       0.0
4       1.0
5       2.0
fit(X, y=None)

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features)

Input data.

yNone

There is no need of a target in a transformer, yet the pipeline API requires this parameter.

Returns

selfobject

Returns self.

transform(time_series: DataFrame) DataFrame

Create a shifted version of time_series.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame to shift.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

The shifted version of the original time_series.
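The caution about negative shifts in the Notes can be made concrete with pandas DataFrame.shift. This sketch uses pandas directly and assumes the class follows the same sign convention, as its example output for shift=3 suggests.

```python
import pandas as pd

ts = pd.DataFrame([0, 1, 2, 3, 4, 5])

# Positive shift: each row sees a value from 3 rows earlier (a lag).
lagged = ts.shift(3)

# Negative shift: each row sees the value of the NEXT row, i.e. the future.
lead = ts.shift(-1)
print(lagged.join(lead, lsuffix="_lag3", rsuffix="_lead1"))
```

In the lead column, row 0 already contains the value observed at row 1, which is exactly the kind of information a model would not have at prediction time.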

class gtime.feature_extraction.SortedDensity(window_size: int = 1, is_causal: bool = True)

For each row in time_series, compute the sorted density function of the previous window_size rows. If there are not enough rows, the value is NaN. The sorted density measure is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, “A Sparsity Measure for Echo Density Growth in General Environments,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.

Parameters

window_size : int, optional, default: 1

The number of previous points on which to compute the sorted density.

is_causalbool, optional, default: True

Whether the current sample is computed based only on the past or also on the future.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import SortedDensity
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_avg = SortedDensity(window_size=2)
>>> mv_avg.fit_transform(ts)
   0__SortedDensity
0               NaN
1          0.500000
2          0.666667
3          0.700000
4          0.714286
5          0.722222
transform(time_series: DataFrame) DataFrame

For every row of time_series, compute the moving sorted density function of the previous window_size elements.

Parameters

time_seriespd.DataFrame, shape (n_samples, 1), required

The DataFrame on which to compute the moving sorted density.

Returns

time_series_tpd.DataFrame, shape (n_samples, 1)

A DataFrame, with the same length as time_series, containing the moving sorted density for each element.
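A causal sorted density can be sketched with a rolling apply. The formula in the helper below (values sorted in descending order, weighted by their rank, then divided by the window length times the window sum) is an assumption inferred from the values in the SortedDensity example (0.5, 0.666667, 0.7, ...); it is not copied from the library or from the cited paper, and returning NaN for an all-zero window is likewise an assumption.

```python
import numpy as np
import pandas as pd

ts = pd.DataFrame([0, 1, 2, 3, 4, 5], dtype=float)


def sorted_density(window: np.ndarray) -> float:
    """Rank-weighted sum of the descending-sorted window, normalized."""
    total = window.sum()
    if total == 0:
        # Assumption: the measure is undefined for an all-zero window.
        return float("nan")
    ordered = np.sort(window)[::-1]           # largest value first
    ranks = np.arange(1, len(window) + 1)     # 1, 2, ..., N
    return float((ranks * ordered).sum() / (len(window) * total))


density = ts.rolling(2).apply(sorted_density, raw=True)
print(density)
```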