Feature Extraction

The gtime.feature_extraction module deals with the creation of features starting from a time series.

class gtime.feature_extraction.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)

Create a feature based on the national holidays of a specific country.

Parameters

country : str, optional, default: 'Brazil': The name of the country from which to retrieve the holidays.
start_date : str, optional, default: '01/01/2018': The date from which to start retrieving the holidays.
end_date : str, optional, default: '01/01/2020': The date until which to retrieve the holidays.
kernel : array-like, optional, default: None: The kernel to use when creating the feature. The holiday feature is the dot product, over a rolling window, between the kernel and a column that contains 1 if the corresponding day is a holiday and 0 otherwise. The rolling window has the same size as the kernel, and the dot product is divided by the number of holidays in the window to obtain the value of the holiday feature.
reindex_method : str, optional, default: 'pad': Used only if time_series is passed to the transform method. It is the method with which the holiday events are reindexed with the index of time_series. It should be compatible with the reindex methods provided by pandas. Please refer to the pandas documentation for further details.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0

fit(X: DataFrame, y=None)

Fit the estimator. Just used to be compatible with the sklearn API.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: Optional[DataFrame] = None) → DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None: If provided, start_date and end_date are overwritten with the first and last dates of the index of time_series, and the output DataFrame is re-indexed with the index of time_series using the chosen reindex_method.

Returns

events : pd.DataFrame, shape (length, 1): A DataFrame containing the events.

class gtime.feature_extraction.CrestFactorDetrending(window_size: int = 1, is_causal: bool = True)

Crest factor detrending model. This class removes the trend from the data by using the crest factor definition. Each sample is normalized by its weighted surroundings. Generalized detrending is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, “A Sparsity Measure for Echo Density Growth in General Environments,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the crest factor detrending.
is_causal : bool, optional, default: True: Whether the current sample is computed based only on the past or also on the future.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import CrestFactorDetrending
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> gnrl_dtr = CrestFactorDetrending(window_size=2)
>>> gnrl_dtr.fit_transform(ts)
   0__CrestFactorDetrending
0                       NaN
1                  1.000000
2                  0.800000
3                  0.692308
4                  0.640000
5                  0.609756
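
The values in the example above are consistent with dividing each squared sample by the sum of squares over the trailing window. The plain-Python sketch below illustrates that reading for the causal case only; the helper name crest_factor_detrend is hypothetical and the formula is inferred from the printed output, not taken from the library source.

```python
import math


def crest_factor_detrend(values, window_size=1):
    """Divide each squared sample by the sum of squares of its trailing window.

    Hypothetical helper: an assumption inferred from the documented output.
    """
    out = []
    for t in range(len(values)):
        if t + 1 < window_size:
            out.append(math.nan)  # not enough previous rows
            continue
        window = values[t + 1 - window_size : t + 1]
        energy = sum(v * v for v in window)
        # Guard against an all-zero window, where the ratio is undefined.
        out.append(values[t] ** 2 / energy if energy else math.nan)
    return out


detrended = crest_factor_detrend([0, 1, 2, 3, 4, 5], window_size=2)
print([round(v, 6) for v in detrended])
```

With window_size=2 this yields NaN followed by 1.0, 0.8, 0.692308, 0.64 and 0.609756, matching the column shown in the example.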

transform(time_series: DataFrame) → DataFrame

For every row of time_series, compute the moving crest factor detrending function of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving crest factor detrending.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving crest factor detrending for each element.

class gtime.feature_extraction.CustomFeature(func: Callable, **kwargs: object)

Constructs a transformer from an arbitrary callable. This transformer is a wrapper of sklearn.preprocessing.FunctionTransformer but returns a pd.DataFrame.

Parameters

func : Callable, required: The function to use to generate a pd.DataFrame containing the feature.
kwargs : object, optional: Optional keyword arguments forwarded to func.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import CustomFeature
>>> def custom_function(X, power):
...     return X**power
>>> X = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> custom_feature = CustomFeature(custom_function, power=3)
>>> custom_feature.fit_transform(X)
   0__CustomFeature
0                 0
1                 1
2                 8
3                27
4                64
5               125

fit(time_series: DataFrame, y=None) → CustomFeature

Fit the estimator.

Parameters

time_series : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: Optional[DataFrame] = None) → DataFrame

Generate a pd.DataFrame by applying func to time_series, together with any optional arguments.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None: The DataFrame on which to apply the custom function.

Returns

X_t_df : pd.DataFrame, shape (length, 1): A DataFrame containing the generated feature.

class gtime.feature_extraction.Detrender(trend: str, trend_x0: numpy.array, loss: Callable = mean_squared_error, method: str = 'BFGS')

Apply a de-trend transformation to a time series.

The purpose of the class is to fit a model, defined through the trend parameter, in order to find a trend in the time series. The trend is then removed by subtracting the predictions of the fitted model.

Parameters

trend : 'polynomial' | 'exponential', required: The kind of trend removal to apply.
trend_x0 : np.array, required: Initialisation parameters passed to the trend function. They are used as the starting point when minimizing the loss function.
loss : Callable, optional, default: mean_squared_error: The loss function to minimize.
method : str, optional, default: 'BFGS': Loss function optimisation method.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import Detrender
>>> detrender = Detrender(trend='polynomial', trend_x0=np.zeros(2))
>>> time_index = pd.date_range("2020-01-01", "2020-01-10")
>>> X = pd.DataFrame(range(0, 10), index=time_index)
>>> detrender.fit_transform(X)
            0__Detrender
2020-01-01  9.180937e-07
2020-01-02  8.020709e-07
2020-01-03  6.860481e-07
2020-01-04  5.700253e-07
2020-01-05  4.540024e-07
2020-01-06  3.379796e-07
2020-01-07  2.219568e-07
2020-01-08  1.059340e-07
2020-01-09 -1.008878e-08
2020-01-10 -1.261116e-07
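
The idea of fitting a trend and subtracting its predictions can be sketched in plain Python for a degree-1 polynomial. This sketch swaps in a closed-form least-squares fit instead of the loss minimization with BFGS described above, and the helper names fit_line and detrend are hypothetical.

```python
def fit_line(y):
    """Least-squares intercept and slope of y against 0, 1, ..., len(y) - 1.

    Hypothetical helper: closed-form fit, not the optimizer-based fit of Detrender.
    """
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def detrend(y):
    """Subtract the fitted line from every sample."""
    intercept, slope = fit_line(y)
    return [v - (intercept + slope * x) for x, v in enumerate(y)]


linear_residuals = detrend(list(range(10)))      # a purely linear series
noisy_residuals = detrend([3.0, 5.5, 7.0, 9.5, 11.0])
print(max(abs(r) for r in linear_residuals))
print(noisy_residuals)
```

For a purely linear input the residuals are zero up to floating-point error, which is the same behaviour the example above shows with values of order 1e-07 left over by the numerical optimizer.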

fit(X: DataFrame, y=None) → Detrender

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Transform the time_series by removing the trend.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The time series to transform.

Returns

time_series_t : pd.DataFrame, shape (n_samples, n_features): The transformed time series, without the trend.

class gtime.feature_extraction.Exogenous

Reindex exogenous_time_series with the index of time_series. For the documentation of pandas.DataFrame.reindex and the reindexing methods available, please refer to the pandas documentation.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Exogenous
>>> ts = pd.DataFrame({'exogenous': [10, 8, 1, 3, 2, 7]},  index=[3, 4, 5, 6, 7, 8])
>>> exog = Exogenous()
>>> exog.fit_transform(ts)
   exogenous__Exogenous
3                    10
4                     8
5                     1
6                     3
7                     2
8                     7

fit(time_series, y=None)

Fit the estimator.

Parameters

time_series : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Return the input time series with the class name appended to its column names.

Parameters

time_series : pd.DataFrame, shape (n_samples, n_features), required: The input DataFrame.

Returns

time_series_t : pd.DataFrame, shape (n_samples, n_features): The original exogenous_time_series, with the class name appended to its column names.

class gtime.feature_extraction.Max(window_size: int = 1)

For each row in time_series, compute the moving max of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the moving max.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Max
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_max = Max(window_size=2)
>>> mv_max.fit_transform(ts)
   0__Max
0     NaN
1     1.0
2     2.0
3     3.0
4     4.0
5     5.0

fit(X, y=None)

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Compute the moving max, for every row of time_series, of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving max.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving max for each element.

class gtime.feature_extraction.Min(window_size: int = 1)

For each row in time_series, compute the moving min of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the moving min.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Min
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_min = Min(window_size=2)
>>> mv_min.fit_transform(ts)
   0__Min
0     NaN
1     0.0
2     1.0
3     2.0
4     3.0
5     4.0

fit(X, y=None)

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Compute the moving min, for every row of time_series, of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving min.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving min for each element.

class gtime.feature_extraction.MovingAverage(window_size: int = 1)

For each row in time_series, compute the moving average of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the moving average.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import MovingAverage
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_avg = MovingAverage(window_size=2)
>>> mv_avg.fit_transform(ts)
   0__MovingAverage
0               NaN
1               0.5
2               1.5
3               2.5
4               3.5
5               4.5

fit(X, y=None)

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Compute the moving average, for every row of time_series, of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving average.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving average for each element.

class gtime.feature_extraction.MovingCustomFunction(custom_feature_function: Callable, window_size: int = 1, raw: bool = True)

For each row in time_series, compute the moving custom function of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

custom_feature_function : Callable, required: The function to use to generate a pd.DataFrame containing the feature.
window_size : int, optional, default: 1: The number of previous points on which to compute the custom function.
raw : bool, optional, default: True: If False, each row or column is passed to the function as a Series. If True or None, the passed function receives ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance. Credits: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.apply.html

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import MovingCustomFunction
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_custom = MovingCustomFunction(np.max, window_size=2)
>>> mv_custom.fit_transform(ts)
   0__MovingCustomFunction
0                      NaN
1                      1.0
2                      2.0
3                      3.0
4                      4.0
5                      5.0
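
The moving-window transformers in this module (Max, Min, MovingAverage, MovingMedian and MovingCustomFunction) share one pattern: apply a reduction to the last window_size rows and emit NaN while the window is incomplete. The plain-Python sketch below illustrates that pattern; the helper name moving_apply is hypothetical and is not part of gtime.

```python
import math
import statistics


def moving_apply(values, func, window_size=1):
    """Apply func to each trailing window of window_size values.

    Hypothetical helper illustrating the pattern, not the library implementation.
    """
    out = []
    for t in range(len(values)):
        if t + 1 < window_size:
            out.append(math.nan)  # not enough previous rows
        else:
            out.append(float(func(values[t + 1 - window_size : t + 1])))
    return out


ts = [0, 1, 2, 3, 4, 5]
moving_max = moving_apply(ts, max, window_size=2)
moving_min = moving_apply(ts, min, window_size=2)
moving_mean = moving_apply(ts, statistics.mean, window_size=2)
moving_median = moving_apply(ts, statistics.median, window_size=2)
print(moving_max)
print(moving_min)
print(moving_mean)
print(moving_median)
```

On this input the max column matches the MovingCustomFunction example with np.max, while the mean and median coincide because each window holds only two values.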

fit(time_series: DataFrame, y=None)

Fit the estimator.

Parameters

time_series : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

For every row of time_series, compute the moving custom function of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving custom function.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving custom function for each element.

class gtime.feature_extraction.MovingMedian(window_size: int = 1)

For each row in time_series, compute the moving median of the previous window_size rows. If there are not enough rows, the value is NaN.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the moving median.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import MovingMedian
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_median = MovingMedian(window_size=2)
>>> mv_median.fit_transform(ts)
   0__MovingMedian
0               NaN
1               0.5
2               1.5
3               2.5
4               3.5
5               4.5

fit(X, y=None)

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Compute the moving median, for every row of time_series, of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving median.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving median for each element.

class gtime.feature_extraction.Polynomial(degree: int = 2)

Compute the polynomial features, of a degree equal to the input degree. Wrapper of sklearn.preprocessing.PolynomialFeatures but returns a pd.DataFrame.

Parameters

degree : int, optional, default: 2: The degree of the polynomial features.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Polynomial
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> pol = Polynomial(degree=3)
>>> pol.fit_transform(ts)
   0__Polynomial  1__Polynomial  2__Polynomial  3__Polynomial
0            1.0            0.0            0.0            0.0
1            1.0            1.0            1.0            1.0
2            1.0            2.0            4.0            8.0
3            1.0            3.0            9.0           27.0
4            1.0            4.0           16.0           64.0
5            1.0            5.0           25.0          125.0
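
For a single input column, the table above contains the powers 0 through degree of each value. A minimal plain-Python sketch of that expansion follows; the helper name polynomial_features is hypothetical, and the sketch does not cover the cross terms that sklearn.preprocessing.PolynomialFeatures generates for multi-column inputs.

```python
def polynomial_features(values, degree=2):
    """Return one row [v**0, v**1, ..., v**degree] per input value.

    Hypothetical helper for a single input column only.
    """
    return [[float(v) ** p for p in range(degree + 1)] for v in values]


rows = polynomial_features([0, 1, 2, 3, 4, 5], degree=3)
for row in rows:
    print(row)
```

The first row is [1.0, 0.0, 0.0, 0.0] and the last is [1.0, 5.0, 25.0, 125.0], as in the example output.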

fit(time_series: DataFrame, y=None)

Fit the estimator.

Parameters

time_series : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Compute the polynomial features of time_series, up to a degree equal to degree.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The input DataFrame. Used only for its index.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): The computed polynomial features.

class gtime.feature_extraction.Shift(shift: int = 1)

Shift a DataFrame by a number of rows equal to shift.

Parameters

shift : int, optional, default: 1: The number of rows by which to shift.

Notes

The shift parameter also accepts negative values. Use them carefully: a negative shift moves future values into the current row, so a feature built this way can leak future information when it is used for training or testing.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import Shift
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> shift = Shift(shift=3)
>>> shift.fit_transform(ts)
   0__Shift
0       NaN
1       NaN
2       NaN
3       0.0
4       1.0
5       2.0
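
The effect of positive and negative shifts can be sketched in plain Python. The helper name shift_values is hypothetical; it mirrors the behaviour shown in the example above, where a positive shift fills the first rows with NaN.

```python
import math


def shift_values(values, shift=1):
    """Move each value shift positions forward (or backward if shift < 0).

    Hypothetical helper; positions with no source value become NaN.
    """
    n = len(values)
    out = [math.nan] * n
    for i in range(n):
        src = i - shift
        if 0 <= src < n:
            out[i] = float(values[src])
    return out


lagged = shift_values([0, 1, 2, 3, 4, 5], shift=3)
lead = shift_values([0, 1, 2, 3, 4, 5], shift=-1)
print(lagged)
print(lead)
```

With shift=-1 every row holds the value of the following row, which is why a negative shift exposes future observations to a model.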

fit(X, y=None)

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): Input data.
y : None: A transformer needs no target, but the pipeline API requires this parameter.

Returns

self : object: Returns self.

transform(time_series: DataFrame) → DataFrame

Create a shifted version of time_series.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame to shift.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): The shifted version of the original time_series.

class gtime.feature_extraction.SortedDensity(window_size: int = 1, is_causal: bool = True)

For each row in time_series, compute the sorted density function of the previous window_size rows. If there are not enough rows, the value is NaN. The sorted density measure is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, “A Sparsity Measure for Echo Density Growth in General Environments,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.

Parameters

window_size : int, optional, default: 1: The number of previous points on which to compute the sorted density.
is_causal : bool, optional, default: True: Whether the current sample is computed based only on the past or also on the future.

Examples

>>> import pandas as pd
>>> from gtime.feature_extraction import SortedDensity
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> sorted_density = SortedDensity(window_size=2)
>>> sorted_density.fit_transform(ts)
   0__SortedDensity
0               NaN
1          0.500000
2          0.666667
3          0.700000
4          0.714286
5          0.722222
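
One computation that reproduces the values in the example above is: sort each trailing window in descending order, weight each value by its 1-based rank, then divide by the window sum and by window_size. The plain-Python sketch below implements that reading for the causal case; the helper name sorted_density_sketch is hypothetical, and the formula is inferred from the printed output, not copied from the library source or from the cited paper.

```python
import math


def sorted_density_sketch(values, window_size=1):
    """Rank-weighted sum of the descending-sorted window, normalized.

    Hypothetical helper: an assumption inferred from the documented output.
    """
    out = []
    for t in range(len(values)):
        if t + 1 < window_size:
            out.append(math.nan)  # not enough previous rows
            continue
        window = sorted(values[t + 1 - window_size : t + 1], reverse=True)
        total = sum(window)
        if total == 0:
            out.append(math.nan)  # ratio undefined for an all-zero window
            continue
        weighted = sum(rank * v for rank, v in enumerate(window, start=1))
        out.append(weighted / total / window_size)
    return out


density = sorted_density_sketch([0, 1, 2, 3, 4, 5], window_size=2)
print([round(v, 6) for v in density])
```

For window_size=2 the result is NaN followed by 0.5, 0.666667, 0.7, 0.714286 and 0.722222, in agreement with the example.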

transform(time_series: DataFrame) → DataFrame

For every row of time_series, compute the moving sorted density function of the previous window_size elements.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), required: The DataFrame on which to compute the moving sorted density.

Returns

time_series_t : pd.DataFrame, shape (n_samples, 1): A DataFrame, with the same length as time_series, containing the moving sorted density for each element.