Feature Extraction
The gtime.feature_extraction module deals with the creation of features starting from a time series.
- class gtime.feature_extraction.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)
Create a feature based on the national holidays of a specific country.
Parameters
- country : str, optional, default: 'Brazil'
The name of the country from which to retrieve the holidays.
- start_date : str, optional, default: '01/01/2018'
The date starting from which to retrieve the holidays.
- end_date : str, optional, default: '01/01/2020'
The date until which to retrieve the holidays.
- kernel : array-like, optional, default: None
The kernel to use when creating the feature. The holiday feature is the dot product between the kernel and a column that contains 1 if the corresponding day is a holiday and 0 otherwise, computed over a rolling window of the same size as the kernel. The dot product is then divided by the number of holidays in the window to get the value of the holiday feature.
- reindex_method : str, optional, default: 'pad'
Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events with the index of time_series. It must be compatible with the reindex methods provided by pandas. Please refer to the pandas documentation for further details.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
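The kernel description above can be illustrated with plain pandas. This is a minimal sketch and not gtime's implementation: the helper name `holiday_feature`, the hand-written indicator column, the trailing alignment of the rolling window, and the choice to return 0 for windows without holidays are all assumptions made for illustration, so the numbers need not match the library's output.

```python
import numpy as np
import pandas as pd


def holiday_feature(is_holiday: pd.Series, kernel) -> pd.Series:
    """Hypothetical helper: kernel-weighted holiday score over a rolling window."""
    weights = np.asarray(kernel, dtype=float)

    def score(window: np.ndarray) -> float:
        n_holidays = window.sum()
        if n_holidays == 0:
            # Assumption: a window without holidays scores 0.
            return 0.0
        # Dot product with the kernel, divided by the number of holidays in the window.
        return float(np.dot(weights, window) / n_holidays)

    # The rolling window has the same size as the kernel (trailing alignment assumed).
    return is_holiday.rolling(len(weights)).apply(score, raw=True)


# Made-up indicator column: 1 on a holiday, 0 otherwise.
is_holiday = pd.Series([0, 0, 1, 1, 0], index=pd.date_range("2019-01-01", periods=5))
feature = holiday_feature(is_holiday, kernel=[2, 1])
print(feature)
```

The first row is NaN because the window is not yet full; a window containing two holidays is averaged over both, which is what the division by the holiday count achieves.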
- fit(X: DataFrame, y=None)
Fit the estimator. Just used to be compatible with the sklearn API.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a DataFrame containing the events associated with the holidays of the selected country.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
If provided, both start_date and end_date are overwritten with the start and end dates of the index of time_series. The output DataFrame is also reindexed with the index of time_series, using the chosen reindex_method.
Returns
- events : pd.DataFrame, shape (length, 1)
A DataFrame containing the events.
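The reindexing step described for transform relies on pandas.DataFrame.reindex. A small sketch of what `method='pad'` does is shown here; the `events` frame and the half-day `target_index` are made-up data, not values produced by gtime.

```python
import pandas as pd

# Made-up daily holiday events (1.0 marks a holiday).
events = pd.DataFrame(
    {"status": [0.0, 1.0, 0.0]},
    index=pd.to_datetime(["2019-04-21", "2019-04-22", "2019-04-23"]),
)

# Hypothetical index of the time series the events should be aligned to.
target_index = pd.to_datetime(
    [
        "2019-04-21 00:00",
        "2019-04-21 12:00",
        "2019-04-22 00:00",
        "2019-04-22 12:00",
        "2019-04-23 00:00",
    ]
)

# 'pad' propagates the last known value forward to the new timestamps.
reindexed = events.reindex(target_index, method="pad")
print(reindexed)
```

With `'pad'`, each timestamp that is missing from the events index takes the most recent earlier value, so the noon rows inherit the status of their day.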
- class gtime.feature_extraction.CrestFactorDetrending(window_size: int = 1, is_causal: bool = True)
Crest factor detrending model. This class removes the trend from the data by using the crest factor definition: each sample is normalized by its weighted surroundings. Generalized detrending is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, "A Sparsity Measure for Echo Density Growth in General Environments," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the crest factor detrending.
- is_causal : bool, optional, default: True
Whether the current sample is computed based only on the past or also on the future.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import CrestFactorDetrending
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> gnrl_dtr = CrestFactorDetrending(window_size=2)
>>> gnrl_dtr.fit_transform(ts)
   0__CrestFactorDetrending
0                       NaN
1                  1.000000
2                  0.800000
3                  0.692308
4                  0.640000
5                  0.609756
- transform(time_series: DataFrame) -> DataFrame
For every row of time_series, compute the moving crest factor detrending function of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving crest factor detrending function.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving crest factor detrending function for each element.
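One formula that reproduces the numbers in the CrestFactorDetrending example is the squared sample divided by the sum of squares over the trailing window. The sketch below assumes that formula and covers only the causal case; the function name `crest_factor_detrend` is hypothetical and this is not necessarily how gtime computes the feature.

```python
import pandas as pd


def crest_factor_detrend(time_series: pd.DataFrame, window_size: int) -> pd.DataFrame:
    """Hypothetical causal sketch: x_t^2 / sum of x^2 over the trailing window."""
    squared = time_series ** 2
    # Energy of the trailing window; NaN until the window is full.
    window_energy = squared.rolling(window_size).sum()
    return squared / window_energy


ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
detrended = crest_factor_detrend(ts, window_size=2)
print(detrended)
```

For the row with value 3 and `window_size=2`, this gives 9 / (4 + 9), about 0.692308, in line with the example output.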
- class gtime.feature_extraction.CustomFeature(func: Callable, **kwargs: object)
Constructs a transformer from an arbitrary callable. This transformer is a wrapper of sklearn.preprocessing.FunctionTransformer but returns a pd.DataFrame.
Parameters
- func : Callable, required
The function to use to generate a pd.DataFrame containing the feature.
- kwargs : object, optional
Optional arguments to pass to the transform method.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import CustomFeature
>>> def custom_function(X, power):
...     return X**power
>>> X = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> custom_feature = CustomFeature(custom_function, power=3)
>>> custom_feature.fit_transform(X)
   0__CustomFeature
0                 0
1                 1
2                 8
3                27
4                64
5               125
- fit(time_series: DataFrame, y=None) -> CustomFeature
Fit the estimator.
Parameters
- time_series : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a pd.DataFrame, given time_series as input to func, as well as other optional arguments.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
The DataFrame on which to apply the custom function.
Returns
- X_t_df : pd.DataFrame, shape (length, 1)
A DataFrame containing the generated feature.
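The wrapping behaviour can be pictured without scikit-learn: call the function with the extra keyword arguments, make sure the result is a DataFrame on the original index, and tag the columns. The helper `apply_custom_feature` below is a hypothetical stand-in written for illustration; it is not the CustomFeature class and skips everything FunctionTransformer does beyond the call itself.

```python
import pandas as pd


def apply_custom_feature(time_series, func, suffix="__CustomFeature", **kwargs):
    """Hypothetical stand-in: apply func and return a suffixed DataFrame."""
    feature = func(time_series, **kwargs)
    # Guarantee a DataFrame on the original index, even if func returned an array.
    feature = pd.DataFrame(feature, index=time_series.index)
    return feature.add_suffix(suffix)


def custom_function(X, power):
    return X ** power


X = pd.DataFrame([0, 1, 2, 3, 4, 5])
cubed = apply_custom_feature(X, custom_function, power=3)
print(cubed)
```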
- class gtime.feature_extraction.Detrender(trend: str, trend_x0: numpy.array, loss: Callable = mean_squared_error, method: str = 'BFGS')
Apply a de-trend transformation to a time series.
The purpose of the class is to fit a model, defined through the trend parameter, in order to find a trend in the time series. The trend is then removed by subtracting the predictions of the fitted model.
Parameters
- trend : 'polynomial' | 'exponential', required
The kind of trend removal to apply.
- trend_x0 : np.array, required
Initialisation parameters passed to the trend function. They are used as the starting point when minimizing the loss function.
- loss : Callable, optional, default: mean_squared_error
The loss function to minimize.
- method : str, optional, default: 'BFGS'
Loss function optimisation method.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import Detrender
>>> detrender = Detrender(trend='polynomial', trend_x0=np.zeros(2))
>>> time_index = pd.date_range("2020-01-01", "2020-01-10")
>>> X = pd.DataFrame(range(0, 10), index=time_index)
>>> detrender.fit_transform(X)
             0__Detrender
2020-01-01   9.180937e-07
2020-01-02   8.020709e-07
2020-01-03   6.860481e-07
2020-01-04   5.700253e-07
2020-01-05   4.540024e-07
2020-01-06   3.379796e-07
2020-01-07   2.219568e-07
2020-01-08   1.059340e-07
2020-01-09  -1.008878e-08
2020-01-10  -1.261116e-07
- fit(X: DataFrame, y=None) -> Detrender
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Transform the time_series by removing the trend.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The time series to transform.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, n_features)
The transformed time series, without the trend.
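The idea of fitting a trend model and subtracting its predictions can be sketched with NumPy alone. This sketch deliberately swaps in `np.polyfit` (closed-form least squares) for the loss minimisation with BFGS that the class describes, and the helper name `remove_polynomial_trend` and the use of the row position as the time variable are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def remove_polynomial_trend(time_series: pd.DataFrame, degree: int = 1) -> pd.DataFrame:
    """Hypothetical sketch: fit a polynomial in the row position and subtract it."""
    t = np.arange(len(time_series), dtype=float)
    detrended = {}
    for column in time_series.columns:
        y = time_series[column].to_numpy(dtype=float)
        # Least-squares polynomial fit (stand-in for the BFGS-based fit).
        coefficients = np.polyfit(t, y, degree)
        # Removing the trend means subtracting the fitted model's predictions.
        detrended[f"{column}__Detrender"] = y - np.polyval(coefficients, t)
    return pd.DataFrame(detrended, index=time_series.index)


time_index = pd.date_range("2020-01-01", "2020-01-10")
X = pd.DataFrame(range(0, 10), index=time_index)
residual = remove_polynomial_trend(X, degree=1)
print(residual)
```

Because the input is an exact straight line, the residuals are zero up to floating-point error, which is the same qualitative result as the near-zero values in the Detrender example.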
- class gtime.feature_extraction.Exogenous
Reindex exogenous_time_series with the index of time_series. For the documentation of pandas.DataFrame.reindex and the available values of method, please refer to the pandas documentation.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Exogenous
>>> ts = pd.DataFrame({'exogenous': [10, 8, 1, 3, 2, 7]}, index=[3, 4, 5, 6, 7, 8])
>>> exog = Exogenous()
>>> exog.fit_transform(ts)
   exogenous__Exogenous
3                    10
4                     8
5                     1
6                     3
7                     2
8                     7
- fit(time_series, y=None)
Fit the estimator.
Parameters
- time_series : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Return the input time series with the class name appended to its column names.
Parameters
- time_series : pd.DataFrame, shape (n_samples, n_features), required
The input DataFrame.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, n_features)
The original exogenous_time_series, with the class name appended to its column names.
- class gtime.feature_extraction.Max(window_size: int = 1)
For each row in time_series, compute the moving max of the previous window_size rows. If there are not enough rows, the value is NaN.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the moving max.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Max
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_max = Max(window_size=2)
>>> mv_max.fit_transform(ts)
   0__Max
0     NaN
1     1.0
2     2.0
3     3.0
4     4.0
5     5.0
- fit(X, y=None)
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Compute the moving max, for every row of time_series, of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving max.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving max for each element.
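A moving max of this kind can be reproduced with the pandas rolling API. The snippet is a sketch on made-up, non-monotonic data, chosen so that the effect of the window is visible; it is not a claim about how Max is implemented internally.

```python
import pandas as pd

# Made-up series that goes up and down.
ts = pd.DataFrame([3, 1, 4, 1, 5, 9])

# Max over the current row and the one before it; NaN until the window is full.
max_2 = ts.rolling(2).max()

# A wider window needs more rows before it produces a value.
max_3 = ts.rolling(3).max()

print(pd.concat([ts, max_2, max_3], axis=1, keys=["value", "max_2", "max_3"]))
```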
- class gtime.feature_extraction.Min(window_size: int = 1)
For each row in time_series, compute the moving min of the previous window_size rows. If there are not enough rows, the value is NaN.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the moving min.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Min
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_min = Min(window_size=2)
>>> mv_min.fit_transform(ts)
   0__Min
0     NaN
1     0.0
2     1.0
3     2.0
4     3.0
5     4.0
- fit(X, y=None)
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Compute the moving min, for every row of time_series, of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving min.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving min for each element.
- class gtime.feature_extraction.MovingAverage(window_size: int = 1)
For each row in time_series, compute the moving average of the previous window_size rows. If there are not enough rows, the value is NaN.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the moving average.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import MovingAverage
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_avg = MovingAverage(window_size=2)
>>> mv_avg.fit_transform(ts)
   0__MovingAverage
0               NaN
1               0.5
2               1.5
3               2.5
4               3.5
5               4.5
- fit(X, y=None)
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Compute the moving average, for every row of time_series, of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving average.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving average for each element.
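Judging from the MovingAverage example output (0.5 at row 1 for `window_size=2`), the window appears to include the current row together with the rows before it. The pure-Python sketch below spells that window out explicitly; the function `moving_average` is a hypothetical illustration of the semantics, not the library's code.

```python
import math


def moving_average(values, window_size):
    """Hypothetical sketch: mean of the current value and the ones before it."""
    result = []
    for i in range(len(values)):
        if i + 1 < window_size:
            # Not enough rows yet to fill the window.
            result.append(math.nan)
        else:
            window = values[i + 1 - window_size : i + 1]
            result.append(sum(window) / window_size)
    return result


averages = moving_average([0, 1, 2, 3, 4, 5], window_size=2)
print(averages)
```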
- class gtime.feature_extraction.MovingCustomFunction(custom_feature_function: Callable, window_size: int = 1, raw: bool = True)
For each row in time_series, compute the moving custom function of the previous window_size rows. If there are not enough rows, the value is NaN.
Parameters
- custom_feature_function : Callable, required
The function to use to generate a pd.DataFrame containing the feature.
- window_size : int, optional, default: 1
The number of previous points on which to compute the custom function.
- raw : bool, optional, default: True
False: passes each row or column as a Series to the function. True or None: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance. Credits: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.apply.html
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.feature_extraction import MovingCustomFunction
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_custom = MovingCustomFunction(np.max, window_size=2)
>>> mv_custom.fit_transform(ts)
   0__MovingCustomFunction
0                      NaN
1                      1.0
2                      2.0
3                      3.0
4                      4.0
5                      5.0
- fit(time_series: DataFrame, y=None)
Fit the estimator.
Parameters
- time_series : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
For every row of time_series, compute the moving custom function of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving custom function.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving custom function for each element.
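The difference between the two settings of raw is easiest to see directly in pandas.core.window.Rolling.apply, which the parameter description credits. In this sketch the data and the helper `position_of_max` are made up; with `raw=True` the function receives a NumPy array, while with `raw=False` it receives a Series and can therefore use the index.

```python
import numpy as np
import pandas as pd

ts = pd.DataFrame([3, 1, 4, 1, 5, 9])

# raw=True: each window arrives as an ndarray, which suits NumPy reductions.
window_max = ts.rolling(2).apply(np.max, raw=True)


def position_of_max(window: pd.Series) -> float:
    """Hypothetical function that needs the index, so it requires raw=False."""
    return float(window.idxmax())


# raw=False: each window arrives as a Series that keeps its index labels.
max_label = ts.rolling(2).apply(position_of_max, raw=False)

print(pd.concat([window_max, max_label], axis=1, keys=["max", "label_of_max"]))
```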
- class gtime.feature_extraction.MovingMedian(window_size: int = 1)
For each row in time_series, compute the moving median of the previous window_size rows. If there are not enough rows, the value is NaN.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the moving median.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import MovingMedian
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_median = MovingMedian(window_size=2)
>>> mv_median.fit_transform(ts)
   0__MovingMedian
0              NaN
1              0.5
2              1.5
3              2.5
4              3.5
5              4.5
- fit(X, y=None)
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Compute the moving median, for every row of time_series, of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving median.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving median for each element.
- class gtime.feature_extraction.Polynomial(degree: int = 2)
Compute the polynomial features, of a degree equal to the input degree. Wrapper of sklearn.preprocessing.PolynomialFeatures but returns a pd.DataFrame.
Parameters
- degree : int, optional, default: 2
The degree of the polynomial features.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Polynomial
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> pol = Polynomial(degree=3)
>>> pol.fit_transform(ts)
   0__Polynomial  1__Polynomial  2__Polynomial  3__Polynomial
0            1.0            0.0            0.0            0.0
1            1.0            1.0            1.0            1.0
2            1.0            2.0            4.0            8.0
3            1.0            3.0            9.0           27.0
4            1.0            4.0           16.0           64.0
5            1.0            5.0           25.0          125.0
- fit(time_series: DataFrame, y=None)
Fit the estimator.
Parameters
- time_series : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: DataFrame) -> DataFrame
Compute the polynomial features of time_series, up to a degree equal to degree.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The input DataFrame. Used only for its index.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
The computed polynomial features.
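For a single input column, the table in the Polynomial example is simply the column raised to the powers 0 through degree. The sketch below reproduces that with pandas only. The helper `polynomial_features` and its column naming are assumptions for illustration; unlike sklearn.preprocessing.PolynomialFeatures, it handles one column and produces no interaction terms.

```python
import pandas as pd


def polynomial_features(time_series: pd.DataFrame, degree: int = 2) -> pd.DataFrame:
    """Hypothetical single-column sketch: powers 0..degree of the first column."""
    column = time_series.iloc[:, 0].astype(float)
    powers = {f"{k}__Polynomial": column ** k for k in range(degree + 1)}
    return pd.DataFrame(powers, index=time_series.index)


ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
features = polynomial_features(ts, degree=3)
print(features)
```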
- class gtime.feature_extraction.Shift(shift: int = 1)
Perform a shift of a DataFrame of size equal to shift.
Parameters
- shift : int, optional, default: 1
How much to shift.
Notes
The shift parameter can also accept negative values. However, this should be used carefully: if the resulting feature is used for training or testing, it might leak information from the future.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import Shift
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> shift = Shift(shift=3)
>>> shift.fit_transform(ts)
   0__Shift
0       NaN
1       NaN
2       NaN
3       0.0
4       1.0
5       2.0
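The warning about negative shifts is easy to see with pandas.DataFrame.shift, used here as a stand-in for the Shift class on a made-up column named `value`. A positive shift only looks backwards, whereas a negative shift copies later observations into earlier rows.

```python
import pandas as pd

ts = pd.DataFrame({"value": [0, 1, 2, 3, 4, 5]})

# Positive shift: row t holds the value observed at t - 3 (past information only).
lagged = ts.shift(3)

# Negative shift: row t holds the value observed at t + 1 (future information).
leading = ts.shift(-1)

print(pd.concat([ts, lagged, leading], axis=1, keys=["value", "shift_3", "shift_-1"]))
```

A model trained on the negatively shifted column would see, at each time step, a value that is not yet known at prediction time.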
- class gtime.feature_extraction.SortedDensity(window_size: int = 1, is_causal: bool = True)
For each row in time_series, compute the sorted density function of the previous window_size rows. If there are not enough rows, the value is NaN. The sorted density measure is defined in (eq. 1) of: H. P. Tukuljac, V. Pulkki, H. Gamper, K. Godin, I. J. Tashev and N. Raghuvanshi, "A Sparsity Measure for Echo Density Growth in General Environments," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 1-5.
Parameters
- window_size : int, optional, default: 1
The number of previous points on which to compute the sorted density.
- is_causal : bool, optional, default: True
Whether the current sample is computed based only on the past or also on the future.
Examples
>>> import pandas as pd
>>> from gtime.feature_extraction import SortedDensity
>>> ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
>>> mv_avg = SortedDensity(window_size=2)
>>> mv_avg.fit_transform(ts)
   0__SortedDensity
0               NaN
1          0.500000
2          0.666667
3          0.700000
4          0.714286
5          0.722222
- transform(time_series: DataFrame) -> DataFrame
For every row of time_series, compute the moving sorted density function of the previous window_size elements.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), required
The DataFrame on which to compute the moving sorted density function.
Returns
- time_series_t : pd.DataFrame, shape (n_samples, 1)
A DataFrame, with the same length as time_series, containing the moving sorted density function for each element.
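The numbers in the SortedDensity example are consistent with the following reading of the measure: sort the window in descending order, weight each value by its rank (starting at 1), and divide by the window length times the window sum. The sketch below assumes that formula, takes absolute values (which makes no difference for the non-negative example data), returns NaN for an all-zero window, and covers the causal case only. The function name `sorted_density` is hypothetical.

```python
import numpy as np
import pandas as pd


def sorted_density(window: np.ndarray) -> float:
    """Hypothetical sketch of a rank-weighted density over one window."""
    ordered = np.sort(np.abs(window))[::-1]  # largest value first
    total = ordered.sum()
    if total == 0:
        # Assumption: the measure is undefined for an all-zero window.
        return float("nan")
    ranks = np.arange(1, len(ordered) + 1)
    return float(np.sum(ranks * ordered) / (len(ordered) * total))


ts = pd.DataFrame([0, 1, 2, 3, 4, 5])
density = ts.rolling(2).apply(sorted_density, raw=True)
print(density)
```

For the window [2, 3] this gives (1 * 3 + 2 * 2) / (2 * 5) = 0.7, matching row 3 of the example.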