Feature Generation
The gtime.feature_generation module deals with the creation of features that do
not depend on the input data, but just on its index.
- class gtime.feature_generation.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)
Create a feature based on the national holidays of a specific country.
Parameters
- country : str, optional, default: 'Brazil'
The name of the country from which to retrieve the holidays.
- start_date : str, optional, default: '01/01/2018'
The date from which to start retrieving the holidays.
- end_date : str, optional, default: '01/01/2020'
The date until which to retrieve the holidays.
- kernel : array-like, optional, default: None
The kernel to use when creating the feature. The feature is computed over a rolling window of the same size as the kernel: the dot product is taken between the kernel and an indicator column that contains 1 if the corresponding day is a holiday and 0 otherwise, and the result is divided by the number of holidays in the window.
- reindex_method : str, optional, default: 'pad'
Used only if X is passed to the transform method. It is the method used to reindex the holiday events with the index of X, and it must be compatible with the reindex methods provided by pandas. Refer to the pandas documentation for further details.
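The 'pad' reindex method follows the pandas convention of carrying the last known value forward onto the new index. The following is a minimal plain-Python sketch of that behaviour, not the library's code; the function name reindex_pad and the sample dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def reindex_pad(source, target_index):
    """Emulate 'pad' reindexing: each target label takes the value of the
    last source label that is less than or equal to it (None if there is none)."""
    labels = sorted(source)
    result = {}
    for label in target_index:
        pos = bisect_right(labels, label)
        result[label] = source[labels[pos - 1]] if pos else None
    return result


# Hypothetical event values known on three dates only.
events = {
    date(2019, 4, 21): 0.0,
    date(2019, 4, 22): 1.0,
    date(2019, 4, 24): 0.0,
}
# Target index: every day from 2019-04-20 to 2019-04-25.
target = [date(2019, 4, day) for day in range(20, 26)]
padded = reindex_pad(events, target)
```

Here 2019-04-23 inherits the value from 2019-04-22, while 2019-04-20 has no earlier value to inherit and stays empty.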
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                                                      end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
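To make the kernel computation concrete, the following plain-Python sketch reproduces the values of the example above. It is an illustration under stated assumptions, not the library's implementation: the holiday dates (2019-04-22 and 2019-04-25) and the alignment of the window (starting at the current day and extending forward) are inferred from the example output, days beyond the end of the range are treated as non-holidays, and the function name holiday_feature is hypothetical.

```python
from datetime import date, timedelta


def holiday_feature(days, holidays, kernel):
    """Rolling dot product of the kernel with a 0/1 holiday indicator,
    divided by the number of holidays in each window."""
    indicator = [1 if day in holidays else 0 for day in days]
    # Assumption: days past the end of the range count as non-holidays.
    padded = indicator + [0] * (len(kernel) - 1)
    feature = []
    for i in range(len(days)):
        window = padded[i:i + len(kernel)]
        n_holidays = sum(window)
        dot = sum(k * x for k, x in zip(kernel, window))
        feature.append(dot / n_holidays if n_holidays else 0.0)
    return feature


days = [date(2019, 4, 18) + timedelta(days=i) for i in range(10)]
# Assumption: holiday dates inferred from the example output.
holidays = {date(2019, 4, 22), date(2019, 4, 25)}
values = holiday_feature(days, holidays, kernel=[2, 1])
```

With kernel=[2, 1], the day before a holiday gets 1.0 and the holiday itself gets 2.0, which matches the documented output.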
- fit(X: DataFrame, y=None)
Fit the estimator. This method exists only for compatibility with the scikit-learn API.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a DataFrame containing the events associated with the holidays of the selected country.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
If provided, both start_date and end_date are overwritten with the start and end dates of the index of time_series, and the output DataFrame is re-indexed with the index of time_series, using the chosen reindex_method.
Returns
- events : pd.DataFrame, shape (length, 1)
A DataFrame containing the events.
- class gtime.feature_generation.Constant(constant: int = 0, length: Optional[int] = None)
Generate a pd.DataFrame with one column, of the same length as the input X and containing the value constant across the whole column.
Parameters
- constant : int, optional, default: 0
The value used to fill the constant column of the pd.DataFrame.
- length : int, optional, default: None
The length of the DataFrame to generate. This is used only if X is not passed to the transform method; otherwise the length is inferred from X.
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import Constant
>>> X = pd.DataFrame(range(0, 5), index=pd.date_range(start='2019-04-18',
...                                                   end='2019-04-22', freq='d'))
>>> constant = Constant(constant=3)
>>> constant.fit_transform(X)
            0__Constant
2019-04-18          3.0
2019-04-19          3.0
2019-04-20          3.0
2019-04-21          3.0
2019-04-22          3.0
- fit(X: DataFrame, y=None) -> Constant
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : Constant
Returns self.
- get_feature_names()
Return feature names for output features.
Returns
- output_feature_names : ndarray, shape (n_output_features,)
Array of feature names.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a pd.DataFrame with one column, with the same length and index as time_series, containing a value equal to constant.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
The input DataFrame. If passed, the output DataFrame has the same index as time_series.
Returns
- constant_series_renamed : pd.DataFrame, shape (length, 1)
A constant series with the same length and index as time_series.
- class gtime.feature_generation.PeriodicSeasonal(period: Union[Timedelta, str] = '365 days', amplitude: float = 0.5, start_date: Optional[Union[Timestamp, str]] = None, length: Optional[int] = 50, index_period: Optional[Union[DatetimeIndex, int]] = None)
Create a sinusoid from a given date and with a given period and amplitude.
Parameters
- period : Union[pd.Timedelta, str], optional, default: '365 days'
The period of the generated time series.
- amplitude : float, optional, default: 0.5
The amplitude of the time series.
- start_date : Union[pd.Timestamp, str], optional, default: None
The date from which to start generating the feature. This is used only if X is not passed to the transform method; otherwise the start date is inferred from X.
- length : int, optional, default: 50
The length of the sinusoid. This is used only if X is not passed to the transform method; otherwise the length is inferred from X.
- index_period : Union[DatetimeIndex, int], optional, default: None
The period of the index of the output DataFrame. This is used only if X is not passed to the transform method; otherwise the index period is taken from X.
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import PeriodicSeasonal
>>> X = pd.DataFrame(range(0, 10), index=pd.date_range(start='2019-04-18',
...                                                    end='2019-04-27', freq='d'))
>>> periodic = PeriodicSeasonal()
>>> periodic.fit_transform(X)
            0__PeriodicSeasonal
2019-04-18             0.000000
2019-04-19             0.008607
2019-04-20             0.017211
2019-04-21             0.025810
2019-04-22             0.034401
2019-04-23             0.042982
2019-04-24             0.051551
2019-04-25             0.060104
2019-04-26             0.068639
2019-04-27             0.077154
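The values in the example above are consistent with amplitude * sin(2 * pi * elapsed / period), with zero phase at the first timestamp. The sketch below reproduces them using only the standard library. It is an assumption-based illustration rather than the library's implementation: the formula is inferred from the example output, and the function name periodic_seasonal and its step parameter are hypothetical.

```python
import math
from datetime import datetime, timedelta


def periodic_seasonal(start, length, step=timedelta(days=1),
                      period=timedelta(days=365), amplitude=0.5):
    """Return (timestamp, value) pairs of a sinusoid that starts at `start`."""
    series = []
    for i in range(length):
        elapsed = i * step
        # timedelta / timedelta gives the elapsed fraction of a period as a float.
        phase = 2 * math.pi * (elapsed / period)
        series.append((start + elapsed, amplitude * math.sin(phase)))
    return series


sinusoid = periodic_seasonal(datetime(2019, 4, 18), length=10)
sinusoid_values = [value for _, value in sinusoid]
```

With the default period of 365 days and daily steps, ten samples cover only a small fraction of one cycle, which is why the values grow almost linearly.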
- fit(X: DataFrame, y=None) -> PeriodicSeasonal
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : PeriodicSeasonal
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a sinusoid with the given period, amplitude and length, starting from the selected start_date. If time_series is not None, start_date is replaced by the start date of the time series and the output sinusoid has the same index as time_series.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
The input DataFrame. If passed, the output DataFrame has the same index as time_series. If not passed, start_date and index_period must have been passed to the constructor when the object was instantiated.
Returns
- periodic_feature : pd.DataFrame, shape (n_samples, 1)
The DataFrame containing the generated periodic feature.
Raises
- ValueError
Raised if time_series is not passed and start_date or index_period is not set.