Feature Generation

The gtime.feature_generation module creates features that depend only on the index of the input data, not on its values.

class gtime.feature_generation.Calendar(country: str = 'Brazil', start_date: str = '01/01/2018', end_date: str = '01/01/2020', kernel: Optional[Union[List, ndarray]] = None, reindex_method: str = 'pad', freq: Optional[str] = None)

Create a feature based on the national holidays of a specific country.

Parameters

country (str, optional, default: 'Brazil'): The name of the country whose holidays are retrieved.
start_date (str, optional, default: '01/01/2018'): The date from which to start retrieving the holidays.
end_date (str, optional, default: '01/01/2020'): The date until which to retrieve the holidays.
kernel (array-like, optional, default: None): The kernel used to create the feature. A rolling window of the same size as the kernel slides over a column that contains 1 on holidays and 0 on all other days. For each window, the holiday feature is the dot product between the kernel and the window, divided by the number of holidays in the window.
reindex_method (str, optional, default: 'pad'): Used only if time_series is passed to the transform method. It is the method used to reindex the holiday events onto the index of time_series, and it must be compatible with the reindex methods provided by pandas. Refer to the pandas documentation for further details.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Calendar
>>> X = pd.DataFrame(range(0, 10), index=pd.period_range(start='2019-04-18',
...                  end='2019-04-27', freq='d'))
>>> cal_feature = Calendar(country="Italy", kernel=[2, 1])
>>> cal_feature.fit_transform(X)
            status__Calendar
2019-04-18               0.0
2019-04-19               0.0
2019-04-20               0.0
2019-04-21               1.0
2019-04-22               2.0
2019-04-23               0.0
2019-04-24               1.0
2019-04-25               2.0
2019-04-26               0.0
2019-04-27               0.0
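
The kernel description can be made concrete with a small standard-library sketch. It does not call Calendar. It assumes that the only holidays in the range are 2019-04-22 (Easter Monday) and 2019-04-25 (Liberation Day), and that each window starts at the current day; neither the holiday list nor the window alignment is stated in this reference, and both were chosen here only because they reproduce the pattern printed above. The helper name holiday_feature is hypothetical.

```python
def holiday_feature(is_holiday, kernel):
    """Rolling dot product of a 0/1 holiday flag with a kernel,
    divided by the number of holidays in each window (sketch only)."""
    size = len(kernel)
    feature = []
    for start in range(len(is_holiday)):
        window = is_holiday[start:start + size]
        # Assumption: pad with zeros past the end of the series so that
        # every window has exactly len(kernel) entries.
        window = window + [0] * (size - len(window))
        n_holidays = sum(window)
        dot = sum(w * k for w, k in zip(window, kernel))
        # A window without holidays contributes 0.0 (avoids dividing by zero).
        feature.append(dot / n_holidays if n_holidays else 0.0)
    return feature


# Days 2019-04-18 .. 2019-04-27; assumed holidays on 04-22 and 04-25.
flags = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
feature = holiday_feature(flags, [2, 1])
print(feature)
```

With kernel [2, 1], a holiday one day ahead contributes 1 and a holiday on the current day contributes 2, which is the 1.0, 2.0 pattern in the output above under the stated assumptions.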

fit(X: DataFrame, y=None)

Fit the estimator. This method exists only for compatibility with the sklearn API.

Parameters

X (pd.DataFrame, shape (n_samples, n_features)): Input data.
y (None): A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self (object): Returns self.

transform(time_series: Optional[DataFrame] = None) → DataFrame

Generate a DataFrame containing the events associated with the holidays of the selected country.

Parameters

time_series (pd.DataFrame, shape (n_samples, 1), optional, default: None): If provided, start_date and end_date are overwritten with the start and end dates of the index of time_series, and the output DataFrame is re-indexed with the index of time_series using the chosen reindex_method.

Returns

events (pd.DataFrame, shape (length, 1)): A DataFrame containing the events.
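
As a rough illustration of what re-indexing with the 'pad' method does, the sketch below uses plain pandas on a hypothetical events series. It does not call Calendar, and the dates and values are made up for the example.

```python
import pandas as pd

# Hypothetical event values known only on two dates (illustrative data).
events = pd.Series(
    [2.0, 1.0],
    index=pd.to_datetime(["2019-04-22", "2019-04-25"]),
)

# Index of a hypothetical time_series covering a wider daily range.
target_index = pd.date_range("2019-04-21", "2019-04-26", freq="D")

# method="pad" propagates the last valid observation forward;
# dates before the first event have nothing to carry forward and stay NaN.
reindexed = events.reindex(target_index, method="pad")
print(reindexed)
```

Other values accepted by the method argument of pandas reindex, such as 'backfill' or 'nearest', fill the gaps differently, which is why the choice of reindex_method matters when the index of time_series is denser than the event dates.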

class gtime.feature_generation.Constant(constant: int = 0, length: Optional[int] = None)

Generate a pd.DataFrame with one column, of the same length as the input X and containing the value constant across the whole column.

Parameters

constant (int, optional, default: 0): The value to use to generate the constant column of the pd.DataFrame.
length (int, optional, default: None): The length of the DataFrame to generate. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import Constant
>>> X = pd.DataFrame(range(0, 5), index=pd.date_range(start='2019-04-18',  end='2019-04-22', freq='d'))
>>> constant = Constant(constant=3)
>>> constant.fit_transform(X)
            0__Constant
2019-04-18          3.0
2019-04-19          3.0
2019-04-20          3.0
2019-04-21          3.0
2019-04-22          3.0

fit(X: DataFrame, y=None) → Constant

Fit the estimator.

Parameters

X (pd.DataFrame, shape (n_samples, n_features)): Input data.
y (None): A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self (Constant): Returns self.

get_feature_names()

Return feature names for output features.

Returns

output_feature_names (ndarray, shape (n_output_features,)): Array of feature names.

transform(time_series: Optional[DataFrame] = None) → DataFrame

Generate a pd.DataFrame with one column with the same length as time_series and with the same index, containing a value equal to constant.

Parameters

time_series (pd.DataFrame, shape (n_samples, 1), optional, default: None): The input DataFrame. If passed, the output DataFrame has the same index as time_series.

Returns

constant_series_renamed (pd.DataFrame, shape (length, 1)): A constant series with the same length and the same index as time_series.

class gtime.feature_generation.PeriodicSeasonal(period: Union[Timedelta, str] = '365 days', amplitude: float = 0.5, start_date: Optional[Union[Timestamp, str]] = None, length: Optional[int] = 50, index_period: Optional[Union[DatetimeIndex, int]] = None)

Create a sinusoid from a given date and with a given period and amplitude.

Parameters

period (Union[pd.Timedelta, str], optional, default: '365 days'): The period of the generated time series.
amplitude (float, optional, default: 0.5): The amplitude of the time series.
start_date (Union[pd.Timestamp, str], optional, default: None): The date from which to start generating the feature. This is used only if time_series is not passed to the transform method; otherwise the start date is inferred from it.
length (int, optional, default: 50): The length of the sinusoid. This is used only if time_series is not passed to the transform method; otherwise the length is inferred from it.
index_period (Union[DatetimeIndex, int], optional, default: None): The period of the index of the output DataFrame. This is used only if time_series is not passed to the transform method; otherwise the index period is taken from it.

Examples

>>> import pandas as pd
>>> from gtime.feature_generation import PeriodicSeasonal
>>> X = pd.DataFrame(range(0, 10), index=pd.date_range(start='2019-04-18',  end='2019-04-27', freq='d'))
>>> periodic = PeriodicSeasonal()
>>> periodic.fit_transform(X)
            0__PeriodicSeasonal
2019-04-18             0.000000
2019-04-19             0.008607
2019-04-20             0.017211
2019-04-21             0.025810
2019-04-22             0.034401
2019-04-23             0.042982
2019-04-24             0.051551
2019-04-25             0.060104
2019-04-26             0.068639
2019-04-27             0.077154
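
The values printed above are consistent with the formula amplitude * sin(2π * t / period), where t is the time elapsed since the first index entry. The standard-library sketch below checks that arithmetic for the default period and amplitude. It does not call PeriodicSeasonal, the formula is inferred from the example output rather than taken from the library source, and the helper name sinusoid is hypothetical.

```python
import math
from datetime import timedelta


def sinusoid(n_points, step, period, amplitude):
    """Return amplitude * sin(2*pi * t / period) for t = 0, step, 2*step, ..."""
    values = []
    for k in range(n_points):
        elapsed = k * step  # time since the first index entry
        # Dividing two timedeltas yields a plain float ratio.
        values.append(amplitude * math.sin(2 * math.pi * (elapsed / period)))
    return values


# Same settings as the example: daily index, period of 365 days, amplitude 0.5.
values = sinusoid(10, timedelta(days=1), timedelta(days=365), 0.5)
for value in values:
    print(f"{value:.6f}")
```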

fit(X: DataFrame, y=None) → PeriodicSeasonal

Fit the estimator.

Parameters

X (pd.DataFrame, shape (n_samples, n_features), required): Input data.
y (None): A transformer does not need a target, but the pipeline API requires this parameter.

Returns

self (PeriodicSeasonal): Returns self.

transform(time_series: Optional[DataFrame] = None) → DataFrame

Generate a sinusoid, with the given period, amplitude and length, starting from the selected start_date. If time_series is not None, the start_date is replaced by the start date of the time series and the output sinusoid will have the same index as time_series.

Parameters

time_series (pd.DataFrame, shape (n_samples, 1), optional, default: None): The input DataFrame. If passed, the output DataFrame has the same index as time_series. If it is not passed, start_date and index_period must have been passed to the constructor when the object was instantiated.

Returns

periodic_feature (pd.DataFrame, shape (n_samples, 1)): The DataFrame containing the generated periodic feature.

Raises

ValueError: Raised if time_series is not passed and either start_date or index_period is not set.