Feature Generation

The gtime.feature_generation module creates features that depend only on the index of the input data, not on its values.

class gtime.feature_generation.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)

Create a feature based on the national holidays of a specific country.

Parameters

country : str, optional, default: 'Brazil'

The name of the country from which to retrieve the holidays.

start_date : str, optional, default: '01/01/2018'

The date from which to start retrieving the holidays.

end_date : str, optional, default: '01/01/2020'

The date until which to retrieve the holidays.

kernel : array-like, optional, default: None

The kernel to use when creating the feature. A rolling window of the same size as the kernel slides over a column that contains 1 on holidays and 0 on all other days. For each window, the value of the holiday feature is the dot product of the kernel and the window contents, divided by the number of holidays in the window.
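
The computation described for kernel can be sketched in plain pandas. This illustrates the description above and is not the library's implementation: the function name holiday_feature, the trailing window alignment, and the choice to return 0 for windows without holidays are all assumptions, so the values may be shifted relative to what Calendar returns.

```python
import numpy as np
import pandas as pd


def holiday_feature(is_holiday: pd.Series, kernel) -> pd.Series:
    """Rolling dot product of `kernel` with a 0/1 holiday column.

    Hypothetical helper: each window's dot product is divided by the
    number of holidays in that window, as the parameter text describes.
    """
    kernel = np.asarray(kernel, dtype=float)

    def weighted(window: np.ndarray) -> float:
        n_holidays = window.sum()
        if n_holidays == 0:
            # Assumption: windows without holidays contribute 0.
            return 0.0
        return float(np.dot(kernel, window) / n_holidays)

    # Assumption: trailing windows; incomplete leading windows become 0.
    rolled = is_holiday.rolling(len(kernel)).apply(weighted, raw=True)
    return rolled.fillna(0.0)


# Made-up holiday indicator, not taken from any real calendar.
index = pd.date_range("2019-01-01", periods=6, freq="D")
is_holiday = pd.Series([0, 1, 0, 0, 1, 1], index=index, dtype=float)
feature = holiday_feature(is_holiday, kernel=[2, 1])
print(feature)
```

With two consecutive holidays in a window, the dot product 2 + 1 = 3 is divided by 2, which is why the last value is 1.5 rather than 3.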

reindex_method : str, optional, default: 'pad'

Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events onto the index of time_series, and it must be one of the reindex methods supported by pandas. Refer to the pandas documentation for details.
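
As a reminder of what the 'pad' method does in pandas itself, the sketch below reindexes a small frame onto a finer index. The events frame and the target index are made up for illustration and do not come from Calendar.

```python
import pandas as pd

# Hypothetical holiday events, known only at two timestamps.
events = pd.DataFrame(
    {"status": [1.0, 0.0]},
    index=pd.to_datetime(["2019-04-22", "2019-04-23"]),
)

# A finer target index, standing in for the index of the input time series.
target_index = pd.to_datetime(
    [
        "2019-04-22 00:00",
        "2019-04-22 12:00",
        "2019-04-23 00:00",
        "2019-04-23 12:00",
    ]
)

# "pad" (forward fill) carries the last known value to each new timestamp.
padded = events.reindex(target_index, method="pad")
print(padded)
```

Other pandas methods such as 'backfill' or 'nearest' fill the new timestamps from the following or closest known value instead.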

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0

fit(X: DataFrame, y=None)

Fit the estimator. This method exists only for compatibility with the scikit-learn API.

Parameters

X : pd.DataFrame, shape (n_samples, n_features)

Input data.

y : None

A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self : object

Returns self.
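
A fit method that learns nothing usually just returns self, so that calls can be chained as in scikit-learn. The class below is a hypothetical stand-in written for illustration and is not part of gtime.

```python
import pandas as pd


class IndexPosition:
    """Hypothetical transformer whose output depends only on the index."""

    def fit(self, X: pd.DataFrame, y=None):
        # Nothing is learned from X or y; returning self allows chaining.
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # The feature is the position of each row, aligned with X's index.
        return pd.DataFrame({"position": range(len(X))}, index=X.index)


X = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.date_range("2019-04-18", periods=3, freq="D"),
)
transformer = IndexPosition()
result = transformer.fit(X).transform(X)
print(result)
```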

transform(time_series: Optional[DataFrame] = None) -> DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None

If provided, start_date and end_date are both overwritten with the start and end dates of the index of time_series. The output DataFrame is also reindexed to the index of time_series, using the chosen reindex_method.

Returns

events : pd.DataFrame, shape (length, 1)

A DataFrame containing the events.

class gtime.feature_generation.Constant(constant: int = 0, length: Optional[int] = None)

Generate a single-column pd.DataFrame with the same length as the input, in which every entry equals constant.

Parameters

constant : int, optional, default: 0

The value to use to generate the constant column of the pd.DataFrame.

length : int, optional, default: None

The length of the DataFrame to generate. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Constant
>>> X = pd.DataFrame(range(0, 5), index=pd.date_range(start='2019-04-18', end='2019-04-22', freq='d'))
>>> constant = Constant(constant=3)
>>> constant.fit_transform(X)
            0__Constant
2019-04-18          3.0
2019-04-19          3.0
2019-04-20          3.0
2019-04-21          3.0
2019-04-22          3.0
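
The choice between the input index and the length parameter could look roughly like the sketch below. The function constant_frame, the RangeIndex used when no input is given, the column name, and the ValueError are assumptions made for illustration, not the behaviour of Constant.

```python
from typing import Optional

import pandas as pd


def constant_frame(
    constant: float,
    length: Optional[int] = None,
    time_series: Optional[pd.DataFrame] = None,
) -> pd.DataFrame:
    """Hypothetical helper: build a one-column constant DataFrame."""
    if time_series is not None:
        # The input wins: its index fixes both length and labels.
        index = time_series.index
    elif length is not None:
        # Assumption: fall back to a plain integer index of that length.
        index = pd.RangeIndex(length)
    else:
        # Assumption: with neither argument there is nothing to build on.
        raise ValueError("Pass either time_series or length.")
    return pd.DataFrame({"constant": float(constant)}, index=index)


X = pd.DataFrame(range(5), index=pd.date_range("2019-04-18", periods=5, freq="D"))
from_input = constant_frame(3, time_series=X)
from_length = constant_frame(3, length=4)
print(from_input)
print(from_length)
```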

fit(X: DataFrame, y=None) -> Constant

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features)

Input data.

y : None

A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self : Constant

Returns self.

get_feature_names()

Return feature names for output features.

Returns

output_feature_names : ndarray, shape (n_output_features,)

Array of feature names.

transform(time_series: Optional[DataFrame] = None) -> DataFrame

Generate a single-column pd.DataFrame with the same length and index as time_series, in which every value equals constant.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None

The input DataFrame. If passed, the output DataFrame has the same index as time_series.

Returns

constant_series_renamed : pd.DataFrame, shape (length, 1)

A constant series with the same length and index as time_series.

class gtime.feature_generation.PeriodicSeasonal(period: Union[Timedelta, str] = '365 days', amplitude: float = 0.5, start_date: Optional[Union[Timestamp, str]] = None, length: Optional[int] = 50, index_period: Optional[Union[DatetimeIndex, int]] = None)

Create a sinusoid from a given date and with a given period and amplitude.

Parameters

period : Union[pd.Timedelta, str], optional, default: '365 days'

The period of the generated time series.

amplitude : float, optional, default: 0.5

The amplitude of the time series.

start_date : Union[pd.Timestamp, str], optional, default: None

The date from which to start generating the feature. This is used only if time_series is not passed to the transform method; otherwise the start date is inferred from it.

length : int, optional, default: 50

The length of the sinusoid. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.

index_period : Union[DatetimeIndex, int], optional, default: None

The period of the index of the output DataFrame. This is used only if time_series is not passed to the transform method; otherwise the index period is taken from it.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import PeriodicSeasonal
>>> X = pd.DataFrame(range(0, 10), index=pd.date_range(start='2019-04-18', end='2019-04-27', freq='d'))
>>> periodic = PeriodicSeasonal()
>>> periodic.fit_transform(X)
            0__PeriodicSeasonal
2019-04-18             0.000000
2019-04-19             0.008607
2019-04-20             0.017211
2019-04-21             0.025810
2019-04-22             0.034401
2019-04-23             0.042982
2019-04-24             0.051551
2019-04-25             0.060104
2019-04-26             0.068639
2019-04-27             0.077154
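
The values in this example are consistent with a sine wave of the form amplitude * sin(2 * pi * elapsed / period), where elapsed is the time since the first index entry. The sketch below reproduces them with NumPy. The function name sinusoid is hypothetical, and the formula is inferred from the output above rather than taken from the library source.

```python
import numpy as np
import pandas as pd


def sinusoid(index: pd.DatetimeIndex, period="365 days", amplitude=0.5) -> pd.Series:
    """Hypothetical helper: sine wave that starts at 0 on index[0]."""
    # Elapsed time since the first timestamp, as a fraction of one period.
    elapsed = np.asarray((index - index[0]) / pd.Timedelta(period), dtype=float)
    return pd.Series(amplitude * np.sin(2 * np.pi * elapsed), index=index)


index = pd.date_range(start="2019-04-18", end="2019-04-27", freq="D")
values = sinusoid(index)
print(values.round(6))
```

After one day the argument is 2 * pi / 365, roughly 0.017214, and half of its sine is roughly 0.008607, matching the second row of the example.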

fit(X: DataFrame, y=None) -> PeriodicSeasonal

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required

Input data.

y : None

A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self : PeriodicSeasonal

Returns self.

transform(time_series: Optional[DataFrame] = None) -> DataFrame

Generate a sinusoid with the given period, amplitude and length, starting from the selected start_date. If time_series is not None, start_date is replaced by the start date of the time series and the output sinusoid has the same index as time_series.

Parameters

time_series : pd.DataFrame, shape (n_samples, 1), optional, default: None

The input DataFrame. If passed, the output DataFrame has the same index as time_series. If it is not passed, start_date and index_period must have been passed to the constructor when the object was instantiated.

Returns

periodic_feature : pd.DataFrame, shape (n_samples, 1)

The DataFrame containing the generated periodic feature.

Raises

ValueError

Raised if time_series is not passed and either start_date or index_period has not been set.