Time Series Models
The gtime.time_series_models module contains time series models.
- class gtime.time_series_models.AR(p: int, horizon: Union[int, List[int]], explainer_type: Optional[str] = None)
Standard AR model for time series
Parameters
- p: int, required
order of the AR model, i.e. the number of lagged values used
- horizon: Union[int, List[int]], required
how many steps to predict in the future
Examples
>>> import pandas._testing as testing
>>> from gtime.time_series_models import AR
>>>
>>> testing.N, testing.K = 20, 1
>>> data = testing.makeTimeDataFrame(freq="s")
>>> ar = AR(p=2, horizon=3)
>>>
>>> ar.fit(data)
>>> ar.predict()
                          y_1       y_2       y_3
2000-01-01 00:00:17  0.037228  0.163446 -0.237299
2000-01-01 00:00:18 -0.139627 -0.018082  0.063273
2000-01-01 00:00:19 -0.107707  0.052031 -0.105526
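For reference, the textbook autoregressive model of order p, which the p parameter presumably corresponds to, writes each value as a linear combination of the previous p values plus a noise term. The symbols c, \varphi_i and \varepsilon_t are standard notation and not names from the gtime API; how the library estimates the coefficients is not stated here.

```latex
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t
```

Under this reading, p=2 means that the two most recent observations enter the model, and horizon=3 yields the three output columns y_1 to y_3.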
- class gtime.time_series_models.Average(horizon: int)
Average model pipeline: no feature creation, with AverageModel() as the model.
Parameters
horizon: int - prediction horizon, in time series periods
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Average
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Average(horizon=5)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.558475  0.558475  0.558475  0.558475  0.558475
2011-12-29  0.556379  0.556379  0.556379  0.556379  0.556379
2011-12-30  0.543946  0.543946  0.543946  0.543946  0.543946
2011-12-31  0.581512  0.581512  0.581512  0.581512  0.581512
2012-01-01  0.569221  0.569221  0.569221  0.569221  0.569221
- class gtime.time_series_models.CVPipeline(models_sets: Dict, n_splits: int = 4, blocking: bool = True, metrics: Optional[Dict] = None, selection: Optional[Callable] = None)
Cross-validation for models of the time_series_models classes.
Parameters
models_sets: Dict, a dictionary with models as keys and model parameter grid dictionaries as values
n_splits: int, number of intervals for cross-validation
blocking: bool, whether to perform a basic time series split or a blocking one
metrics: Dict, a dictionary with metric names as keys and metric functions as values
selection: Callable, a function to select the best model given the score table
Examples
>>> from gtime.preprocessing import TimeSeriesPreparation
>>> from gtime.time_series_models import CVPipeline
>>> from gtime.metrics import rmse, mape
>>> import numpy as np
>>> import pandas as pd
>>> from gtime.time_series_models import Naive, AR, TimeSeriesForecastingModel
>>> from gtime.forecasting import NaiveForecaster, DriftForecaster
>>> from gtime.feature_extraction import MovingAverage, Polynomial, Shift
>>> from sklearn.model_selection import ParameterGrid
>>> idx = pd.period_range(start="2011-01-01", end="2012-01-01")
>>> np.random.seed(5)
>>> df = pd.DataFrame(np.random.standard_normal((len(idx), 1)), index=idx, columns=["time_series"])
>>> shift_feature = [('s3', Shift(1), ['time_series'])]
>>> ma_feature = [('ma10', MovingAverage(10), ['time_series'])]
>>> scoring = {'RMSE': rmse, 'MAPE': mape}
>>> models = {
...     TimeSeriesForecastingModel: {'features': [shift_feature, ma_feature],
...                                  'horizon': [3, 5],
...                                  'model': [NaiveForecaster(), DriftForecaster()]},
...     Naive: {'horizon': [3, 5, 9]},
...     AR: {'horizon': [3, 5],
...          'p': [2, 3]}
... }
>>> c = CVPipeline(models_sets=models, metrics=scoring)
>>> c.fit(df).predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.025198  0.005753  0.041398  0.008531 -0.053772
2011-12-29  0.024587  0.004619 -0.021253 -0.086931 -0.012732
2011-12-30  0.000045  0.011903 -0.055153  0.007690  0.151219
2011-12-31  0.025556  0.006280  0.071624  0.054207 -0.073940
2012-01-01 -0.017229  0.018712 -0.000043  0.199268  0.219392
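Each value in models_sets is a parameter grid: a dictionary that maps a parameter name to the list of values to try. The page does not say how CVPipeline expands these grids internally. The sketch below only illustrates the usual grid semantics (every combination of the listed values) with the standard library; expand_grid is a hypothetical helper, not part of gtime.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of values in a parameter grid dictionary."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Grid copied from the AR entry of the example above.
ar_grid = {"horizon": [3, 5], "p": [2, 3]}
candidates = expand_grid(ar_grid)

# Two values for each of two parameters give four candidate configurations.
for params in candidates:
    print(params)
```

Under this reading, the AR entry alone would contribute four candidate models to the cross-validation.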
- fit(X: DataFrame, y: Optional[DataFrame] = None, refit: Union[str, List] = 'best')
Performs cross-validation, selects the best model from self.model_list according to self.selection, and refits the models specified by refit on all available data.
Parameters
X: pd.DataFrame, input time series
y: pd.DataFrame, left for compatibility, not used
refit: Union[str, List], models to refit on the whole training data: 'all', 'best', or a list of model keys
Returns
self: CVPipeline
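The blocking parameter chooses between two ways of cutting the series into folds. The sketch below illustrates the general idea of each with plain index lists. It is not gtime's implementation: the function names, the fold-size rule and train_fraction are assumptions made for illustration.

```python
def expanding_splits(n_samples, n_splits):
    """Basic time series split: each fold trains on everything before its test block."""
    fold = n_samples // (n_splits + 1)
    splits = []
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, (k + 1) * fold))
        splits.append((train, test))
    return splits


def blocking_splits(n_samples, n_splits, train_fraction=0.8):
    """Blocking split: disjoint blocks, each with its own train and test part."""
    block = n_samples // n_splits
    splits = []
    for k in range(n_splits):
        start = k * block
        cut = start + int(block * train_fraction)
        train = list(range(start, cut))
        test = list(range(cut, start + block))
        splits.append((train, test))
    return splits


for train, test in expanding_splits(20, 4):
    print("expanding", train[0], train[-1], test)
for train, test in blocking_splits(20, 4):
    print("blocking ", train[0], train[-1], test)
```

In the expanding variant the training window grows with every fold, while in the blocking variant no observation is shared between folds.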
- class gtime.time_series_models.Drift(horizon: int)
Simple drift model pipeline: no feature creation, with DriftModel() as the model.
Parameters
horizon: int - prediction horizon, in time series periods
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Drift
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Drift(horizon=5)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.903984  0.902982  0.901980  0.900978  0.899976
2011-12-29  0.543806  0.542804  0.541802  0.540800  0.539798
2011-12-30  0.456911  0.455910  0.454908  0.453906  0.452904
2011-12-31  0.882041  0.881040  0.880038  0.879036  0.878034
2012-01-01  0.458604  0.457602  0.456600  0.455598  0.454596
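In the output above every row changes by the same step from one column to the next, which is what a linear drift produces. How DriftModel estimates that step and where it starts is not documented on this page. The sketch below shows the textbook drift method instead (a line through the first and last observation), so its numbers need not match gtime's; drift_forecast is a hypothetical helper.

```python
def drift_forecast(values, horizon):
    """Textbook drift forecast: extend the line through the first and last observation."""
    if len(values) < 2:
        raise ValueError("need at least two observations to estimate a drift")
    # Average change per period over the whole history.
    slope = (values[-1] - values[0]) / (len(values) - 1)
    return [values[-1] + h * slope for h in range(1, horizon + 1)]


# The slope is (7 - 1) / 3 = 2, so this prints [9.0, 11.0, 13.0].
print(drift_forecast([1.0, 2.0, 4.0, 7.0], horizon=3))
```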
- class gtime.time_series_models.Naive(horizon: int)
Naive model pipeline: no feature creation, with NaiveModel() as the model.
Parameters
horizon: int - prediction horizon, in time series periods
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import Naive
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = Naive(horizon=4)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4
2011-12-29  0.543806  0.543806  0.543806  0.543806
2011-12-30  0.456911  0.456911  0.456911  0.456911
2011-12-31  0.882041  0.882041  0.882041  0.882041
2012-01-01  0.458604  0.458604  0.458604  0.458604
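Each row of the output above repeats a single value across all horizon columns, and there is one row for each of the last horizon timestamps. A minimal sketch of that behaviour on a plain list follows. It assumes the repeated value is the observation at that row's timestamp, which this page does not state explicitly; naive_forecast is a hypothetical helper.

```python
def naive_forecast(values, horizon):
    """For each of the last `horizon` observations, repeat it for every step ahead."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    rows = []
    for value in values[-horizon:]:
        # Columns y_1 ... y_horizon all carry the same value.
        rows.append([value] * horizon)
    return rows


for row in naive_forecast([0.2, 0.5, 0.9, 0.4], horizon=2):
    print(row)
```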
- class gtime.time_series_models.SeasonalNaive(horizon: int, seasonal_length: int)
Seasonal naive model pipeline: no feature creation, with SeasonalNaiveModel() as the model.
Parameters
horizon: int - prediction horizon, in time series periods
seasonal_length: int - full season cycle length, in time series periods
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.time_series_models import SeasonalNaive
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> model = SeasonalNaive(horizon=5, seasonal_length=4)
>>> model.fit(df)
>>> model.predict()
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.392676  0.956406  0.187131  0.128861  0.392676
2011-12-29  0.956406  0.187131  0.128861  0.392676  0.956406
2011-12-30  0.187131  0.128861  0.392676  0.956406  0.187131
2011-12-31  0.128861  0.392676  0.956406  0.187131  0.128861
2012-01-01  0.392676  0.956406  0.187131  0.128861  0.392676
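With seasonal_length=4, the values in each row above repeat after four columns (y_5 equals y_1). The textbook seasonal naive method produces exactly this pattern by reusing the last observed season. The sketch below shows that method for a single forecast origin on a plain list; seasonal_naive_forecast is a hypothetical helper, and which season gtime reuses for each row is an assumption not confirmed by this page.

```python
def seasonal_naive_forecast(values, horizon, seasonal_length):
    """Forecast step h with the value at the same position in the last full season."""
    if len(values) < seasonal_length:
        raise ValueError("need at least one full season of observations")
    last_season = values[-seasonal_length:]
    # Step h maps to position (h - 1) modulo the season length.
    return [last_season[(h - 1) % seasonal_length] for h in range(1, horizon + 1)]


# The last season is [5, 6, 7, 8], so this prints [5, 6, 7, 8, 5].
print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6, 7, 8], horizon=5, seasonal_length=4))
```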
- class gtime.time_series_models.TimeSeriesForecastingModel(features: List[Tuple], horizon: Union[int, List[int]], model: RegressorMixin, cache_features: bool = False)
Base class for a generic time series forecasting model.
Internally this class follows our approach for time series forecasting:
- feature creation
- train and test split
- forecasting model training
- prediction
Parameters
- features: List[Tuple], required
input of class FeatureCreation, which inherits from sklearn.compose.ColumnTransformer. It is used internally to instantiate FeatureCreation
- horizon: Union[int, List[int]], required
how many steps to predict in the future
- model: RegressorMixin, required
forecasting model used for predictions
Examples
>>> import pandas._testing as testing
>>> from sklearn.linear_model import LinearRegression
>>> from gtime.feature_extraction import Shift, MovingAverage
>>> from gtime.forecasting import GAR
>>> from gtime.time_series_models import TimeSeriesForecastingModel
>>>
>>> testing.N, testing.K = 20, 1
>>> data = testing.makeTimeDataFrame(freq="s")
>>> features = [('s1', Shift(1), ['A']), ('ma3', MovingAverage(window_size=3), ['A'])]
>>> lr = LinearRegression()
>>> time_series_pipeline = TimeSeriesForecastingModel(features=features, horizon=3, model=GAR(lr))
>>>
>>> time_series_pipeline.fit(data)
>>> time_series_pipeline.predict()
                          y_1       y_2       y_3
2000-01-01 00:00:17  0.574204 -0.147355  0.449696
2000-01-01 00:00:18  0.034620  0.308283 -0.113223
2000-01-01 00:00:19  0.801922  0.178843  0.518739
- fit(X: DataFrame, y: Optional[DataFrame] = None, only_model: bool = False, **kwargs)
Fit function for a time series forecasting model.
It does the following:
- creates the X and y feature matrices
- splits them into train and test
- trains the forecasting model on the train set
Parameters
- X: pd.DataFrame, required
input time series
- y: pd.DataFrame, optional, default: None
added for compatibility reasons with sklearn.compose.ColumnTransformer
- only_model: bool, optional, default: False
if True, only the model part is run, not the feature part. This is useful if the feature computation is expensive.
Returns
self
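The first two steps of fit (building the X and y matrices, then splitting them) can be pictured on a plain list. The sketch below is an illustration under stated assumptions, not gtime's code: it uses a single lag-1 feature, takes the next horizon values as targets, and treats rows without a complete target as the held-out part. That assumption is consistent with the examples on this page, where predict() returns exactly horizon rows. make_supervised is a hypothetical helper.

```python
def make_supervised(values, horizon):
    """Pair each observation with a lag-1 feature and its next `horizon` values.

    Rows whose future values are not all known are returned separately:
    they have features but no complete target.
    """
    train_rows, future_rows = [], []
    for t in range(1, len(values)):
        features = [values[t - 1]]                 # value one step before t
        targets = values[t + 1 : t + 1 + horizon]  # y_1 ... y_horizon
        if len(targets) == horizon:
            train_rows.append((features, targets))
        else:
            future_rows.append(features)
    return train_rows, future_rows


train, future = make_supervised([10, 11, 12, 13, 14, 15], horizon=2)
print(train)   # rows with complete targets, usable for training
print(future)  # the last `horizon` rows, which only have features
```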
- predict(X=None, **kwargs)
Predicts future values of a time series.
Parameters
- X: pd.DataFrame, optional, default: None
time series to predict. If not present, it predicts on the time series given as input in self.fit()
Returns
predictions: pd.DataFrame
- score(X: Optional[DataFrame] = None, y: Optional[DataFrame] = None, metrics: Optional[Dict] = None) -> DataFrame
Returns a pd.DataFrame of train and test scores of all metrics provided
Parameters
X: pd.DataFrame, test data, if None self.X_test_ is used
y: pd.DataFrame, true y values, if None self.y_test_ is used
metrics: Dict, a dictionary of metric names and callables, default is rmse
Returns
score: pd.DataFrame
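The metrics argument maps a display name to a callable. The sketch below shows how such a dictionary can be applied to a pair of sequences, falling back to RMSE when none is given. The functions here are stand-ins written for illustration: the exact call signature of the callables in gtime.metrics is not shown on this page, and mae and score_table are hypothetical names.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error, used here as a second example metric."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def score_table(y_true, y_pred, metrics=None):
    """Apply every metric in a {name: callable} dictionary; default to RMSE only."""
    metrics = metrics or {"RMSE": rmse}
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


print(score_table([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(score_table([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], {"RMSE": rmse, "MAE": mae}))
```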