Feature Generation
The gtime.feature_generation module deals with the creation of features that do not depend on the input data, but only on its index.
- class gtime.feature_generation.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)
Create a feature based on the national holidays of a specific country.
Parameters
- country : str, optional, default: 'Brazil'
The name of the country from which to retrieve the holidays.
- start_date : str, optional, default: '01/01/2018'
The date from which to retrieve the holidays.
- end_date : str, optional, default: '01/01/2020'
The date until which to retrieve the holidays.
- kernel : array-like, optional, default: None
The kernel to use when creating the feature. The holiday feature is computed over a rolling window with the same size as the kernel. In each window, the feature value is the dot product between the kernel and a column that contains 1 if the corresponding day is a holiday and 0 otherwise, divided by the number of holidays in the window.
- reindex_method : str, optional, default: 'pad'
Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events with the index of time_series, and it must be one of the reindex methods supported by pandas. Please refer to the pandas documentation for further details.
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
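The kernel computation described for the kernel parameter can be sketched in plain NumPy and pandas. This is an illustration of the description, not gtime's implementation: the helper name kernel_holiday_feature is hypothetical, and it assumes that each window starts at the current day and looks forward len(kernel) days, that the kernel is applied without flipping, and that windows containing no holidays yield 0.

```python
import numpy as np
import pandas as pd


def kernel_holiday_feature(is_holiday: pd.Series, kernel) -> pd.Series:
    """Hypothetical sketch of the kernel-based holiday feature.

    Assumptions (not taken from gtime): forward-looking windows of
    len(kernel) rows, no kernel flipping, and 0 for holiday-free windows.
    """
    k = np.asarray(kernel, dtype=float)
    values = is_holiday.to_numpy(dtype=float)
    n, w = len(values), len(k)
    out = np.zeros(n)
    for i in range(n):
        # Window starting at row i; pad the last windows with zeros
        # so that they always match the kernel length.
        window = values[i:i + w]
        window = np.pad(window, (0, w - len(window)))
        n_holidays = window.sum()
        if n_holidays > 0:
            # Dot product with the kernel, normalized by the holiday count.
            out[i] = np.dot(k, window) / n_holidays
    return pd.Series(out, index=is_holiday.index, name="holiday_feature")


# Usage: mark 22 and 25 April 2019 as holidays and apply the kernel [2, 1].
idx = pd.date_range("2019-04-18", "2019-04-27", freq="D")
is_holiday = pd.Series(
    idx.isin(pd.to_datetime(["2019-04-22", "2019-04-25"])).astype(int),
    index=idx,
)
feature = kernel_holiday_feature(is_holiday, kernel=[2, 1])
```

With these two holidays the sketch yields 1.0 and 2.0 on 21/22 April and on 24/25 April, the same pattern as the output of the example above.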
- fit(X: DataFrame, y=None)
Fit the estimator. This method only exists for compatibility with the scikit-learn API.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a DataFrame containing the events associated with the holidays of the selected country.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
If provided, both start_date and end_date are overwritten with the start and end dates of the index of time_series. In addition, the output DataFrame is reindexed with the index of time_series, using the chosen reindex_method.
Returns
- events : pd.DataFrame, shape (length, 1)
A DataFrame containing the events.
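To illustrate what the default reindex_method='pad' does when the events are aligned with the index of time_series, the following sketch calls pandas DataFrame.reindex directly on a small, made-up events frame. It assumes that Calendar delegates to DataFrame.reindex with the chosen method, which the description above implies but does not spell out; the weekly events and their values are illustrative only.

```python
import pandas as pd

# Hypothetical weekly events: 1.0 marks the week that contains a holiday.
events = pd.DataFrame(
    {"status": [0.0, 1.0, 0.0]},
    index=pd.date_range("2019-04-15", periods=3, freq="7D"),
)

# Daily index standing in for the index of time_series.
daily_index = pd.date_range("2019-04-15", "2019-04-29", freq="D")

# method="pad" forward-fills: each day takes the last event value
# at or before that day.
reindexed = events.reindex(daily_index, method="pad")
```

Every day between two event timestamps inherits the value of the earlier one, so 22 to 28 April all carry 1.0 here.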
- class gtime.feature_generation.Constant(constant: int = 0, length: Optional[int] = None)
Generate a pd.DataFrame with one column, of the same length as the input X, containing the value constant across the whole column.
Parameters
- constant : int, optional, default: 0
The value used to fill the constant column of the pd.DataFrame.
- length : int, optional, default: None
The length of the DataFrame to generate. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import Constant
>>> X = pd.DataFrame(range(0, 5), index=pd.date_range(start='2019-04-18',
...                  end='2019-04-22', freq='d'))
>>> constant = Constant(constant=3)
>>> constant.fit_transform(X)
            0__Constant
2019-04-18          3.0
2019-04-19          3.0
2019-04-20          3.0
2019-04-21          3.0
2019-04-22          3.0
- fit(X: DataFrame, y=None) -> Constant
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : Constant
Returns self.
- get_feature_names()
Return feature names for output features.
Returns
- output_feature_names : ndarray, shape (n_output_features,)
Array of feature names.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a pd.DataFrame with one column, with the same length and index as time_series, containing the value constant.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
The input DataFrame. If passed, the output DataFrame has the same index as time_series.
Returns
- constant_series_renamed : pd.DataFrame, shape (length, 1)
A constant series with the same length and index as time_series.
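The behaviour of transform can be sketched with the pandas DataFrame constructor, which broadcasts a scalar over a given index. The function constant_feature is hypothetical and is not gtime's code; the "<first column>__Constant" naming mirrors the example above, while the RangeIndex, the "Constant" column name and the ValueError used when time_series is omitted are assumptions.

```python
from typing import Optional

import pandas as pd


def constant_feature(
    time_series: Optional[pd.DataFrame] = None,
    constant: int = 0,
    length: Optional[int] = None,
) -> pd.DataFrame:
    """Hypothetical sketch of a constant feature column."""
    if time_series is not None:
        # Reuse the input index; name the column after the first input column,
        # as in the "0__Constant" column of the example.
        index = time_series.index
        name = f"{time_series.columns[0]}__Constant"
    else:
        # Assumption: without time_series, fall back to a RangeIndex of `length`.
        if length is None:
            raise ValueError("length is required when time_series is not passed")
        index = pd.RangeIndex(length)
        name = "Constant"  # hypothetical column name
    # A scalar is broadcast to every row of the single column.
    return pd.DataFrame(float(constant), index=index, columns=[name])


X = pd.DataFrame(range(0, 5), index=pd.date_range("2019-04-18", "2019-04-22", freq="D"))
out = constant_feature(X, constant=3)
```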
- class gtime.feature_generation.PeriodicSeasonal(period: Union[Timedelta, str] = '365 days', amplitude: float = 0.5, start_date: Optional[Union[Timestamp, str]] = None, length: Optional[int] = 50, index_period: Optional[Union[DatetimeIndex, int]] = None)
Create a sinusoid from a given date and with a given period and amplitude.
Parameters
- period : Union[pd.Timedelta, str], optional, default: '365 days'
The period of the generated time series.
- amplitude : float, optional, default: 0.5
The amplitude of the time series.
- start_date : Union[pd.Timestamp, str], optional, default: None
The date from which to start generating the feature. This is used only if time_series is not passed to the transform method; otherwise the start date is inferred from it.
- length : int, optional, default: 50
The length of the sinusoid. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.
- index_period : Union[DatetimeIndex, int], optional, default: None
The period of the index of the output DataFrame. This is used only if time_series is not passed to the transform method; otherwise the index period is taken from it.
Examples
>>> import pandas as pd
>>> from gtime.feature_generation import PeriodicSeasonal
>>> X = pd.DataFrame(range(0, 10), index=pd.date_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> periodic = PeriodicSeasonal()
>>> periodic.fit_transform(X)
            0__PeriodicSeasonal
2019-04-18             0.000000
2019-04-19             0.008607
2019-04-20             0.017211
2019-04-21             0.025810
2019-04-22             0.034401
2019-04-23             0.042982
2019-04-24             0.051551
2019-04-25             0.060104
2019-04-26             0.068639
2019-04-27             0.077154
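The sinusoid described for this class can be written as amplitude * sin(2 * pi * (t - start_date) / period). The sketch below evaluates that formula with pandas and NumPy; the function name periodic_seasonal is hypothetical, and using the first index value as the start date when start_date is None is an assumption. With the default period and amplitude it reproduces the values in the example above.

```python
from typing import Optional, Union

import numpy as np
import pandas as pd


def periodic_seasonal(
    index: pd.DatetimeIndex,
    period: Union[pd.Timedelta, str] = "365 days",
    amplitude: float = 0.5,
    start_date: Optional[Union[pd.Timestamp, str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: amplitude * sin(2*pi*(t - start) / period)."""
    period = pd.Timedelta(period)
    # Assumption: default to the first timestamp of the index.
    start = index[0] if start_date is None else pd.Timestamp(start_date)
    # Fraction of a full period elapsed since the start date.
    phase = np.asarray((index - start) / period, dtype=float)
    values = amplitude * np.sin(2 * np.pi * phase)
    return pd.DataFrame({"0__PeriodicSeasonal": values}, index=index)


idx = pd.date_range("2019-04-18", "2019-04-27", freq="D")
feature = periodic_seasonal(idx)
print(feature.round(6))
```

Because the default period is 365 days, ten daily samples cover only the first rising stretch of the sine wave, which is why the example values grow almost linearly.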
- fit(X: DataFrame, y=None) -> PeriodicSeasonal
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required
Input data.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
Returns
- self : PeriodicSeasonal
Returns self.
- transform(time_series: Optional[DataFrame] = None) -> DataFrame
Generate a sinusoid with the given period, amplitude and length, starting from the selected start_date. If time_series is not None, start_date is replaced by the start date of time_series and the output sinusoid has the same index as time_series.
Parameters
- time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None
The input DataFrame. If passed, the output DataFrame has the same index as time_series. If it is not passed, start_date and index_period must have been passed to the constructor when the object was instantiated.
Returns
- periodic_feature : pd.DataFrame, shape (n_samples, 1)
The DataFrame containing the generated periodic feature.
Raises
- ValueError
Raised if time_series is not passed and start_date or index_period is missing.
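The rule stated under Raises (start_date and index_period are required when time_series is omitted) can be sketched as a small index-resolution helper. The name resolve_output_index is hypothetical, and treating index_period as a pandas frequency string such as "D" is an assumption: the documented type is Union[DatetimeIndex, int], and how gtime interprets it is not specified here.

```python
from typing import Optional

import pandas as pd


def resolve_output_index(
    time_series: Optional[pd.DataFrame] = None,
    start_date=None,
    index_period=None,
    length: int = 50,
) -> pd.Index:
    """Hypothetical sketch of choosing the output index for the sinusoid."""
    if time_series is not None:
        # The input DataFrame dictates the index.
        return time_series.index
    if start_date is None or index_period is None:
        raise ValueError(
            "start_date and index_period must be set when time_series is not passed"
        )
    # Assumption: index_period is treated as a pandas frequency such as "D".
    return pd.date_range(start=start_date, periods=length, freq=index_period)


idx = resolve_output_index(start_date="2019-04-18", index_period="D", length=3)
```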