Time Series Models

The gtime.time_series_models module contains time series models.

class gtime.time_series_models.AR(p: int, horizon: Union[int, List[int]], explainer_type: Optional[str] = None)

Standard AR model for time series


p: int, required

p parameter in AR

horizon: Union[int, List[int]], required

how many steps to predict in the future


>>> import pandas._testing as testing
>>> from gtime.time_series_models import AR
>>> testing.N, testing.K = 20, 1
>>> data = testing.makeTimeDataFrame(freq="s")
>>> ar = AR(p=2, horizon=3)
>>> ar.fit(data)
>>> ar.predict()
                          y_1       y_2       y_3
2000-01-01 00:00:17  0.037228  0.163446 -0.237299
2000-01-01 00:00:18 -0.139627 -0.018082  0.063273
2000-01-01 00:00:19 -0.107707  0.052031 -0.105526
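
The roles of p and horizon can be illustrated without gtime. The sketch below assumes a direct multi-step layout in which each row holds the p most recent values as features and the next horizon values as targets. The helper name make_lag_matrix is hypothetical, and this is not gtime's implementation.

```python
from typing import List, Tuple


def make_lag_matrix(
    series: List[float], p: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (features, targets) rows for an AR(p) model with a multi-step horizon.

    Hypothetical helper for illustration only; not part of gtime.
    """
    features, targets = [], []
    # t is the index of the most recent observation available for a row.
    for t in range(p - 1, len(series) - horizon):
        # p most recent values, newest first: x_t, x_{t-1}, ..., x_{t-p+1}
        features.append([series[t - lag] for lag in range(p)])
        # next `horizon` values: x_{t+1}, ..., x_{t+horizon}
        targets.append([series[t + step] for step in range(1, horizon + 1)])
    return features, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X, y = make_lag_matrix(series, p=2, horizon=3)
for row_x, row_y in zip(X, y):
    print(row_x, "->", row_y)
```

A linear regression fitted on these rows gives one set of coefficients per horizon step, which is one common way to obtain the y_1 ... y_3 columns of a multi-step forecast.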

class gtime.time_series_models.Average(horizon: int)

Average model pipeline: no feature creation, with AverageModel() as the model


horizon: int - prediction horizon, in time series periods


>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Average
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Average(horizon=5)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.558475  0.558475  0.558475  0.558475  0.558475
2011-12-29  0.556379  0.556379  0.556379  0.556379  0.556379
2011-12-30  0.543946  0.543946  0.543946  0.543946  0.543946
2011-12-31  0.581512  0.581512  0.581512  0.581512  0.581512
2012-01-01  0.569221  0.569221  0.569221  0.569221  0.569221
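
As a rough illustration of the idea behind an average forecast, the sketch below predicts every step of the horizon with the mean of the observations seen so far. It is a pure-Python sketch with a hypothetical function name, not gtime's implementation, so its numbers are not meant to match the output above.

```python
from statistics import fmean
from typing import List


def average_forecast(history: List[float], horizon: int) -> List[float]:
    """Forecast each of the next `horizon` steps with the mean of `history`."""
    if not history:
        raise ValueError("history must contain at least one observation")
    mean_value = fmean(history)
    # The same value is repeated for every step of the horizon.
    return [mean_value] * horizon


print(average_forecast([2.0, 4.0, 6.0], horizon=3))  # [4.0, 4.0, 4.0]
```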

class gtime.time_series_models.CVPipeline(models_sets: Dict, n_splits: int = 4, blocking: bool = True, metrics: Optional[Dict] = None, selection: Optional[Callable] = None)

Cross-validation for models of time_series_models classes


models_sets: Dict, a dictionary with models as keys and model parameter grid dictionaries as values
n_splits: int, number of intervals for cross-validation
blocking: bool, whether to perform a basic time series split or a blocking one
metrics: Dict, a dictionary with metric names as keys and metric functions as values
selection: Callable, a function to select the best model given the score table
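
The difference between a basic and a blocking split can be sketched with plain index ranges. The exact split rule used by CVPipeline is not stated here, so the code below shows one common interpretation: a basic split trains on an expanding window that always starts at index 0, while a blocking split cuts the series into disjoint blocks. The function name and the test_fraction parameter are assumptions made for this sketch.

```python
from typing import List, Tuple

Split = Tuple[range, range]  # (train indices, test indices)


def time_series_splits(
    n_samples: int, n_splits: int, blocking: bool, test_fraction: float = 0.25
) -> List[Split]:
    """Return index ranges for time-ordered cross-validation.

    blocking=False: expanding window, every fold trains from index 0.
    blocking=True: the series is cut into n_splits disjoint blocks and each
    block is split into its own train and test part.
    `test_fraction` is an assumption made for this sketch.
    """
    block = n_samples // n_splits
    test_size = max(1, int(block * test_fraction))
    splits = []
    for k in range(n_splits):
        start = k * block if blocking else 0
        stop = (k + 1) * block
        # The test part is always the tail of the fold, so it follows the train part in time.
        splits.append((range(start, stop - test_size), range(stop - test_size, stop)))
    return splits


for train, test in time_series_splits(20, n_splits=4, blocking=True):
    print(list(train), list(test))
```

In both variants the test indices come after the train indices of the same fold, which is what keeps future values out of training.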


>>> from gtime.preprocessing import TimeSeriesPreparation
>>> from gtime.time_series_models import CVPipeline
>>> from gtime.metrics import rmse, mape
>>> import numpy as np
>>> import pandas as pd
>>> from gtime.time_series_models import Naive, AR, TimeSeriesForecastingModel
>>> from gtime.forecasting import NaiveForecaster, DriftForecaster
>>> from gtime.feature_extraction import MovingAverage, Polynomial, Shift
>>> from sklearn.model_selection import ParameterGrid
>>> idx = pd.period_range(start="2011-01-01", end="2012-01-01")
>>> np.random.seed(5)
>>> df = pd.DataFrame(np.random.standard_normal((len(idx), 1)), index=idx, columns=["time_series"])
>>> shift_feature = [('s3', Shift(1), ['time_series'])]
>>> ma_feature = [('ma10', MovingAverage(10), ['time_series'])]
>>> scoring = {'RMSE': rmse, 'MAPE': mape}
>>> models = {
...     TimeSeriesForecastingModel: {'features': [shift_feature, ma_feature],
...                                  'horizon': [3, 5],
...                                  'model': [NaiveForecaster(), DriftForecaster()]},
...     Naive: {'horizon': [3, 5, 9]},
...     AR: {'horizon': [3, 5],
...          'p': [2, 3]}
... }
>>> c = CVPipeline(models_sets=models, metrics=scoring)
>>> c.fit(df).predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.025198  0.005753  0.041398  0.008531 -0.053772
2011-12-29  0.024587  0.004619 -0.021253 -0.086931 -0.012732
2011-12-30  0.000045  0.011903 -0.055153  0.007690  0.151219
2011-12-31  0.025556  0.006280  0.071624  0.054207 -0.073940
2012-01-01 -0.017229  0.018712 -0.000043  0.199268  0.219392
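
To show how a models_sets dictionary turns into a list of candidate models, the sketch below expands each parameter grid into every combination of its values. It uses only the standard library and string keys in place of model classes so that it runs without gtime; the helper name expand_grid is hypothetical, and CVPipeline may build its candidates differently.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_grid(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one parameter dictionary per combination in `grid`."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Same grid shapes as the Naive and AR entries in the example above,
# keyed by plain strings so the sketch runs without gtime installed.
models_sets = {
    "Naive": {"horizon": [3, 5, 9]},
    "AR": {"horizon": [3, 5], "p": [2, 3]},
}

candidates = [
    (name, params)
    for name, grid in models_sets.items()
    for params in expand_grid(grid)
]
print(len(candidates))  # 7: three Naive candidates plus four AR candidates
```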

fit(X: DataFrame, y: Optional[DataFrame] = None, refit: Union[str, List] = 'best')

Performs cross-validation, selects the best model from self.model_list according to self.selection, and refits the models chosen by refit on all available data.


X: pd.DataFrame, input time series
y: pd.DataFrame, left for compatibility, not used
refit: Union[str, List], models to refit on the whole train data: 'all', 'best' or a list of model keys


self: CVPipeline
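
A selection function receives the cross-validation scores and returns the model to keep. The layout of the score table is not described here, so the sketch below assumes a plain dictionary mapping a model label to its metric values; both that layout and the function name select_lowest are assumptions, and the scores are made up for illustration.

```python
from typing import Dict


def select_lowest(scores: Dict[str, Dict[str, float]], metric: str = "RMSE") -> str:
    """Return the model label with the smallest value of `metric`."""
    return min(scores, key=lambda label: scores[label][metric])


# Made-up scores for illustration only.
scores = {
    "Naive(horizon=3)": {"RMSE": 1.20, "MAPE": 0.40},
    "AR(p=2, horizon=3)": {"RMSE": 0.95, "MAPE": 0.45},
}
print(select_lowest(scores))                 # AR(p=2, horizon=3)
print(select_lowest(scores, metric="MAPE"))  # Naive(horizon=3)
```

Different metrics can rank the same models differently, which is why the selection rule is a separate argument rather than being fixed.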

predict(X: Optional[DataFrame] = None) -> DataFrame

Predicts with the selected model, self.best_model_


X: pd.DataFrame, optional, default: None

time series to predict, optional. If not present, it predicts on the time series given as input in self.fit()


predictions: pd.DataFrame

class gtime.time_series_models.Drift(horizon: int)

Simple drift model pipeline: no feature creation, with DriftModel() as the model


horizon: int - prediction horizon, in time series periods


>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Drift
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Drift(horizon=5)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.903984  0.902982  0.901980  0.900978  0.899976
2011-12-29  0.543806  0.542804  0.541802  0.540800  0.539798
2011-12-30  0.456911  0.455910  0.454908  0.453906  0.452904
2011-12-31  0.882041  0.881040  0.880038  0.879036  0.878034
2012-01-01  0.458604  0.457602  0.456600  0.455598  0.454596
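
For readers unfamiliar with the drift method, the sketch below shows its textbook form: the forecast continues the straight line drawn from the first to the last observation. The function name is hypothetical and this is not gtime's implementation, so the values and the step indexing need not match the output above.

```python
from typing import List


def drift_forecast(history: List[float], horizon: int) -> List[float]:
    """Extrapolate the line through the first and last observation."""
    if len(history) < 2:
        raise ValueError("history must contain at least two observations")
    # Average change per period over the whole history.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * slope for step in range(1, horizon + 1)]


print(drift_forecast([1.0, 2.0, 4.0], horizon=2))  # [5.5, 7.0]
```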

class gtime.time_series_models.Naive(horizon: int)

Naive model pipeline: no feature creation, with NaiveModel() as the model


horizon: int - prediction horizon, in time series periods


>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Naive
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Naive(horizon=4)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4
2011-12-29  0.543806  0.543806  0.543806  0.543806
2011-12-30  0.456911  0.456911  0.456911  0.456911
2011-12-31  0.882041  0.882041  0.882041  0.882041
2012-01-01  0.458604  0.458604  0.458604  0.458604

class gtime.time_series_models.SeasonalNaive(horizon: int, seasonal_length: int)

Seasonal naive model pipeline: no feature creation, with SeasonalNaiveModel() as the model


horizon: int - prediction horizon, in time series periods
seasonal_length: int - full season cycle length, in time series periods


>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import SeasonalNaive
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = SeasonalNaive(horizon=5, seasonal_length=4)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.392676  0.956406  0.187131  0.128861  0.392676
2011-12-29  0.956406  0.187131  0.128861  0.392676  0.956406
2011-12-30  0.187131  0.128861  0.392676  0.956406  0.187131
2011-12-31  0.128861  0.392676  0.956406  0.187131  0.128861
2012-01-01  0.392676  0.956406  0.187131  0.128861  0.392676
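
The cyclic pattern in this output (with seasonal_length=4, y_5 repeats y_1) follows from how a seasonal naive forecast works: the last full season is repeated into the future. The sketch below shows the textbook method with a hypothetical function name; it is not gtime's implementation. With seasonal_length=1 it reduces to the plain naive forecast, which repeats the last observation.

```python
from typing import List


def seasonal_naive_forecast(
    history: List[float], horizon: int, seasonal_length: int
) -> List[float]:
    """Repeat the last observed season for `horizon` steps."""
    if len(history) < seasonal_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-seasonal_length:]
    # Step 0 is one period ahead and matches the same position one season back.
    return [last_season[step % seasonal_length] for step in range(horizon)]


history = [10.0, 20.0, 30.0, 40.0, 11.0, 21.0, 31.0, 41.0]
print(seasonal_naive_forecast(history, horizon=5, seasonal_length=4))
# [11.0, 21.0, 31.0, 41.0, 11.0]
```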

class gtime.time_series_models.TimeSeriesForecastingModel(features: List[Tuple], horizon: Union[int, List[int]], model: RegressorMixin, cache_features: bool = False)

Base class for a generic time series forecasting model.

Internally this class follows our approach for time series forecasting:

- feature creation
- train and test split
- forecasting model training
- prediction
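
The four steps can be walked through on a toy series in pure Python. This sketch is a simplification under stated assumptions: a single feature (the current value), a horizon of one step, a split in which the trailing rows without a known target form the test set, and a hypothetical fit_line helper in place of a real regressor. It is not how gtime implements the class.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


series = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
horizon = 1

# 1. Feature creation: the current value is the single feature and the
#    value `horizon` steps ahead is the target (None where it is unknown).
features = series
targets = [
    series[i + horizon] if i + horizon < len(series) else None
    for i in range(len(series))
]

# 2. Train and test split: rows with a known target are used for training,
#    the trailing rows without one are the rows to forecast.
train = [(x, y) for x, y in zip(features, targets) if y is not None]
test = [x for x, y in zip(features, targets) if y is None]

# 3. Forecasting model training.
intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

# 4. Prediction for the held-out rows.
predictions = [intercept + slope * x for x in test]
print(predictions)  # [13.0]
```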


features: List[Tuple], required

input of class FeatureCreation, which inherits from sklearn.compose.ColumnTransformer. It is used internally to instantiate FeatureCreation

horizon: Union[int, List[int]], required

how many steps to predict in the future

model: RegressorMixin, required

forecasting model used for predictions


>>> import pandas._testing as testing
>>> from sklearn.linear_model import LinearRegression
>>> from gtime.feature_extraction import Shift, MovingAverage
>>> from gtime.forecasting import GAR
>>> from gtime.time_series_models import TimeSeriesForecastingModel
>>> testing.N, testing.K = 20, 1
>>> data = testing.makeTimeDataFrame(freq="s")
>>> features = [('s1', Shift(1), ['A']), ('ma3', MovingAverage(window_size=3), ['A'])]
>>> lr = LinearRegression()
>>> time_series_pipeline = TimeSeriesForecastingModel(features=features, horizon=3, model=GAR(lr))
>>> time_series_pipeline.fit(data)
>>> time_series_pipeline.predict()
                          y_1       y_2       y_3
2000-01-01 00:00:17  0.574204 -0.147355  0.449696
2000-01-01 00:00:18  0.034620  0.308283 -0.113223
2000-01-01 00:00:19  0.801922  0.178843  0.518739

fit(X: DataFrame, y: Optional[DataFrame] = None, only_model: bool = False, **kwargs)

Fit function for a time series forecasting model.

It does the following:

- creates the X and y feature matrices
- splits them into train and test sets
- trains the forecasting model on the train set


X: pd.DataFrame, required

input time series

y: pd.DataFrame, optional, default: None

added for compatibility reasons with sklearn.compose.ColumnTransformer

only_model: bool, optional, default: False

if True, only the model part is run, not the feature part. This is useful when the feature computation is expensive.



predict(X=None, **kwargs)



X: pd.DataFrame, optional, default: None

time series to predict, optional. If not present, it predicts on the time series given as input in self.fit()


predictions: pd.DataFrame

score(X: Optional[DataFrame] = None, y: Optional[DataFrame] = None, metrics: Optional[Dict] = None) -> DataFrame

Returns a pd.DataFrame of train and test scores of all metrics provided


X: pd.DataFrame, test data, if None self.X_test_ is used
y: pd.DataFrame, true y values, if None self.y_test_ is used
metrics: Dict, a dictionary of metric names and callables, default is rmse


score: pd.DataFrame
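
The shape of the metrics argument, a dictionary of names and callables, can be illustrated without gtime. In the sketch below, rmse and mae are local stand-ins written for this example rather than the functions in gtime.metrics, and score_all is a hypothetical helper that applies every metric to the same pair of true and predicted values.

```python
import math
from typing import Callable, Dict, List

Metric = Callable[[List[float], List[float]], float]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def score_all(
    y_true: List[float], y_pred: List[float], metrics: Dict[str, Metric]
) -> Dict[str, float]:
    """Apply every metric in `metrics` and key the results by metric name."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


result = score_all([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], {"RMSE": rmse, "MAE": mae})
print(result)
```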

set_model(model: BaseEstimator)

Changes the model in the pipeline


model: BaseEstimator, should be compatible with the features


Sets model parameters


params: Dict or name=value pair, parameters