Forecasting

The gtime.forecasting module contains a collection of machine learning models for time series data.

class gtime.forecasting.AverageForecaster

Predicts each future data point as the average of all train items and all test items prior to it.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import AverageForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = AverageForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.510285  0.510285  0.510285  0.510285  0.510285
2011-12-29  0.511362  0.511362  0.511362  0.511362  0.511362
2011-12-30  0.511445  0.511445  0.511445  0.511445  0.511445
2011-12-31  0.512714  0.512714  0.512714  0.512714  0.512714
2012-01-01  0.513053  0.513053  0.513053  0.513053  0.513053
fit(X: DataFrame, y: DataFrame)

Stores the average of all train data points.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample, required for compatibility, not used for a naive model.

y : pd.DataFrame

Used to store the predict feature names and prediction horizon.

Returns

self : AverageForecaster

Returns self.
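As a rough illustration of the averaging rule described for AverageForecaster, the sketch below computes a running mean over the train data plus the test observations seen so far, and repeats it across the horizon. It uses plain pandas and is not gtime's implementation; the helper name `expanding_average_forecast` is hypothetical, and it assumes the observation on the current test row counts as already seen.

```python
import numpy as np
import pandas as pd


def expanding_average_forecast(train: pd.Series, test: pd.Series, horizon: int) -> pd.DataFrame:
    """Hypothetical helper: forecast every step as the mean of all data seen so far."""
    full = pd.concat([train, test])
    # Running mean over the train data plus the test observations up to each row.
    running_mean = full.expanding().mean().loc[test.index]
    # The same average is repeated for every step of the horizon.
    columns = [f"y_{h}" for h in range(1, horizon + 1)]
    values = np.repeat(running_mean.to_numpy()[:, None], horizon, axis=1)
    return pd.DataFrame(values, index=test.index, columns=columns)


train = pd.Series([1.0, 2.0, 3.0], index=pd.RangeIndex(0, 3))
test = pd.Series([6.0, 8.0], index=pd.RangeIndex(3, 5))
forecast = expanding_average_forecast(train, test, horizon=2)
```

Each row of `forecast` holds one value repeated across `y_1` and `y_2`, which mirrors the constant rows in the example output above.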

class gtime.forecasting.DriftForecaster

Simple drift model. The drift is calculated as the difference between the first and the last elements of the train series, divided by the number of periods.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import DriftForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = DriftForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.142682  0.142359  0.142035  0.141712
2011-12-29  0.901308  0.900985  0.900661  0.900338  0.900015
2011-12-30  0.541559  0.541236  0.540912  0.540589  0.540265
2011-12-31  0.974740  0.974417  0.974093  0.973770  0.973446
2012-01-01  0.636604  0.636281  0.635957  0.635634  0.635311
fit(X: DataFrame, y: DataFrame)

Calculates and stores the drift as the difference between the first and the last observations of the train set, divided by the number of observations.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample.

y : pd.DataFrame

Used to store the predict feature names and prediction horizon.

Returns

self : DriftForecaster

Returns self.
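The drift calculation described for DriftForecaster can be sketched with NumPy as follows. This is the textbook form of the drift method, not gtime's code: the helper `drift_forecast` is hypothetical, the denominator is assumed to be `len(train) - 1`, and the library's exact denominator and step offsets may differ.

```python
import numpy as np


def drift_forecast(train: np.ndarray, last_value: float, horizon: int) -> np.ndarray:
    """Hypothetical helper: extrapolate last_value along the average slope of train."""
    # Assumption: the "number of periods" is taken as len(train) - 1.
    drift = (train[-1] - train[0]) / (len(train) - 1)
    # Step h adds h times the drift to the last known value.
    steps = np.arange(1, horizon + 1)
    return last_value + drift * steps


train = np.array([1.0, 2.0, 4.0, 7.0])
forecast = drift_forecast(train, last_value=train[-1], horizon=3)
```

Here the average slope is (7 - 1) / 3 = 2, so the forecast continues the line through the first and last train points.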

class gtime.forecasting.GAR(estimator, explainer_type: Optional[str] = None, n_jobs: Optional[int] = None)

Generalized Auto Regression model.

This model is a wrapper of sklearn.multioutput.MultiOutputRegressor but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix.

Parameters

estimator : estimator object, required

The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.

n_jobs : int, optional, default: None

The number of jobs to use for the parallelization.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GAR
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> gar = GAR(estimator=random_forest)
>>> gar.fit(X_train, y_train)
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)
fit(X: DataFrame, y: DataFrame, sample_weight=None, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required.

The data.

y : pd.DataFrame, shape (n_samples, horizon), required.

The matrix containing the target variables.

Returns

self : object

predict(X: DataFrame, **kwargs) -> DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required

The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon)

The predictions, one for each timestep in horizon.
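To make the wrapping described for GAR concrete, the sketch below uses sklearn.multioutput.MultiOutputRegressor directly and restores the index and column names by hand. It is an assumption about what such a wrapper does in spirit, not gtime's source; LinearRegression is swapped in for speed and the column names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=30)
X = pd.DataFrame(rng.random((30, 5)), index=index, columns=[f"x_{i}" for i in range(5)])
y = pd.DataFrame(rng.random((30, 3)), index=index, columns=["y_1", "y_2", "y_3"])

# One independent copy of the estimator is fitted per column of y.
model = MultiOutputRegressor(LinearRegression()).fit(X[:20], y[:20])

# scikit-learn returns a plain array; restore the time index and target names.
predictions = pd.DataFrame(model.predict(X[20:]), index=X.index[20:], columns=y.columns)
```

Because the per-target models are independent, the forecast for `y_3` does not depend on the forecasts for `y_1` or `y_2`.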

class gtime.forecasting.GARFF(estimator, explainer_type: Optional[str] = None)

Generalized Auto Regression model with feedforward training. This model is a wrapper of sklearn.multioutput.RegressorChain but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix, also using the predictions of the previous model.

Parameters

estimator : estimator object, required

The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GARFF
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> garff = GARFF(estimator=random_forest)
>>> garff.fit(X_train, y_train)
>>> predictions = garff.predict(X_test)
>>> predictions.shape
(10, 3)

Notes

The order, cv and random_state parameters of sklearn.multioutput.RegressorChain are set to None because the order of the targets matters in a time-series forecasting context.

fit(X: DataFrame, y: DataFrame, **kwargs)

Fit the models, one for each time step. Each model is trained on the initial set of features and on the true values of the previous steps.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required

The data.

y : pd.DataFrame, shape (n_samples, horizon), required

The matrix containing the target variables.

Returns

self : object

The fitted object.

predict(X: DataFrame, **kwargs) -> DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features)

The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon)

The predictions, one for each timestep in horizon.
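The feedforward training described for GARFF can be illustrated with sklearn.multioutput.RegressorChain itself. This is a sketch under assumptions rather than gtime's code: LinearRegression stands in for the estimator, the data is random, and the DataFrame wrapping is done by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=30)
X = pd.DataFrame(rng.random((30, 5)), index=index, columns=[f"x_{i}" for i in range(5)])
y = pd.DataFrame(rng.random((30, 3)), index=index, columns=["y_1", "y_2", "y_3"])

# order=None keeps the targets in column order; cv=None trains each step on the
# true values of the previous targets.
chain = RegressorChain(LinearRegression(), order=None, cv=None)
chain.fit(X[:20].to_numpy(), y[:20].to_numpy())

# At prediction time each step receives the predictions of the previous steps.
predictions = pd.DataFrame(chain.predict(X[20:].to_numpy()), index=index[20:], columns=y.columns)

# Number of inputs seen by each step; it grows by one per step of the chain.
n_inputs_per_step = [est.coef_.shape[0] for est in chain.estimators_]
```

Inspecting `n_inputs_per_step` shows how each later model receives the original features plus one extra column per earlier target.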

class gtime.forecasting.HedgeForecaster(learning_rate: float = 0.001, loss: callable = <function l1>, random_state=None)

Regressor model using Hedge algorithm.

This algorithm is based on a multiplicative weight update method to create a dynamic combination of regressive models. In theory, there is no common training phase on data; only the loss is necessary to update the model.

Parameters

learning_rate : float (default=0.001)

The factor to use for the weight update.

loss : callable, optional (default=`gtime.forecasting.online.l1`)

Loss function used to compute the loss matrix.

random_state : int, RandomState instance or None, optional (default=None)

Controls the randomness of the estimator, such as the random choice of an expert according to the current weights. Pass an int for reproducible output across multiple calls.

Attributes

loss_matrix_ : array, shape (n_samples, n_experts)

Loss matrix between X and y.

total_loss_ : int or float

Sum of losses based on Hedge algorithm decisions.

weights_ : array, shape (n_experts,)

Last weight of each expert.

decisions_ : array, shape (n_samples,)

Indices of chosen expert depending on weights.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.forecasting.online import HedgeForecaster
>>> time_index = pd.date_range("2020-01-01", "2020-01-20")
>>> X = pd.DataFrame(np.random.randint(4, size=(20, 3)), index=time_index)
>>> y = pd.DataFrame(np.random.randint(4, size=(20, 1)), index=time_index, columns=["y_1"])
>>> hr = HedgeForecaster(random_state=42)
>>> hr.fit_predict(X, y).head()
            0
2020-01-01  2
2020-01-02  0
2020-01-03  3
2020-01-04  3
2020-01-05  2
>>> print(f"Estimator weights: {hr.weights_}")
Estimator weights: [0.97713925 0.97723619 0.97980439]
>>> print(f"Decisions: {hr.decisions_}")
Decisions: [1 2 2 1 0 0 0 2 1 2 0 2 2 0 0 0 0 1 1 0]
>>> print(f"Total loss: {hr.total_loss_}")
Total loss: 30
fit(X, y)

Fit the model to data, compute weights and decisions iteratively.

Parameters

X : array-like, shape (n_samples, n_features)

Data.

Returns

self : object

fit_predict(X, y)

Fit the model and predict the target variable using the Hedge algorithm.

Parameters

X : (sparse) array-like, shape (n_samples, n_features)

Data.

y : (sparse) array-like, shape (n_samples, n_outputs)

Predictions.

Returns

predictions : pd.DataFrame

Predictions.
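The multiplicative weight update behind HedgeForecaster can be sketched in a few lines of NumPy. This is the textbook exponential form of the Hedge update, written under assumptions and not taken from gtime: each column of `expert_predictions` is treated as one expert, the loss is the absolute error, and the function name `hedge` and its `seed` argument are hypothetical.

```python
import numpy as np


def hedge(expert_predictions, targets, learning_rate=0.001, seed=0):
    """Hypothetical sketch of Hedge: returns final weights and the chosen experts."""
    rng = np.random.default_rng(seed)
    n_samples, n_experts = expert_predictions.shape
    weights = np.ones(n_experts)
    decisions = np.empty(n_samples, dtype=int)
    for t in range(n_samples):
        # Follow one expert, drawn with probability proportional to its weight.
        decisions[t] = rng.choice(n_experts, p=weights / weights.sum())
        # Absolute (L1) loss of every expert at step t.
        losses = np.abs(expert_predictions[t] - targets[t])
        # Multiplicative update: experts with a larger loss lose more weight.
        weights = weights * np.exp(-learning_rate * losses)
    return weights, decisions


targets = np.arange(10.0)
# Expert 0 is always right, expert 1 is always off by one.
experts = np.column_stack([targets, targets + 1.0])
weights, decisions = hedge(experts, targets, learning_rate=0.1, seed=0)
```

No separate training phase is involved: the weights change only through the losses observed at each step, so the perfect expert keeps its weight while the other one decays.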

class gtime.forecasting.MultiFeatureGAR(estimator: RegressorMixin, explainer_type: Optional[str] = None, target_to_features_dict: Optional[Dict[str, List[str]]] = None)

Generalized Auto Regression model.

This model is a wrapper of MultiFeatureMultiOutputRegressor but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix. You can select the feature columns to use for each model.

Parameters

estimator : estimator object, required

The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import MultiFeatureGAR
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X_columns = ['c1', 'c2', 'c3', 'c4', 'c5']
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index, columns=X_columns)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> target_to_features_dict = {'y_1': ['c1','c2','c3'], 'y_2': ['c1','c2','c4'], 'y_3': ['c1','c2','c5']}
>>> # target_to_features_dict is a constructor argument (see the class signature).
>>> gar = MultiFeatureGAR(estimator=random_forest, target_to_features_dict=target_to_features_dict)
>>>
>>> gar.fit(X_train, y_train)
>>>
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)
fit(X: DataFrame, y: DataFrame, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required.

The data.

y : pd.DataFrame, shape (n_samples, horizon), required.

The matrix containing the target variables.

Returns

self : object

predict(X: DataFrame) -> DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required

The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon)

The predictions, one for each timestep in horizon.

class gtime.forecasting.MultiFeatureMultiOutputRegressor(estimator: RegressorMixin, target_to_features_dict: Optional[Dict[int, List[int]]] = None)

Multi target regression with option to choose the features for each target.

This strategy consists of fitting one regressor per target. It is built over sklearn.multioutput.MultiOutputRegressor. Unlike MultiOutputRegressor, it lets you choose different features for each regressor.

Parameters

estimator: RegressorMixin, required

An estimator object implementing fit and predict.

Examples

>>> import numpy as np
>>> from gtime.regressors import MultiFeatureMultiOutputRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> X = np.random.random((30, 5))
>>> y = np.random.random((30, 3))
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> regressor = MultiFeatureMultiOutputRegressor(estimator=random_forest)
>>>
>>> target_to_features_dict = {0: [0,1,2], 1: [0,1,3], 2: [0,1,4]}
>>> regressor.fit(X_train, y_train, target_to_features_dict=target_to_features_dict)
>>>
>>> predictions = regressor.predict(X_test)
>>> predictions.shape
(10, 3)
fit(X: ndarray, y: ndarray, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : np.ndarray, shape (n_samples, n_features), required.

The data.

y : np.ndarray, shape (n_samples, horizon), required.

The matrix containing the target variables.

Returns

self : object

predict(X: ndarray) -> ndarray

For each row in X, make a prediction for each fitted model.

Parameters

X : np.ndarray, shape (n_samples, n_features), required

The data.

Returns

predictions : np.ndarray, shape (n_samples, horizon)

The predictions.
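The per-target feature selection described for MultiFeatureMultiOutputRegressor amounts to fitting each regressor on its own subset of columns. The sketch below shows that idea with plain scikit-learn and NumPy; it is not the class's implementation, and the names `target_to_features` and `models` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((30, 5))
y = rng.random((30, 3))
X_train, y_train, X_test = X[:20], y[:20], X[20:]

# Target index -> indices of the feature columns that target may use.
target_to_features = {0: [0, 1, 2], 1: [0, 1, 3], 2: [0, 1, 4]}
base_estimator = LinearRegression()

# Fit one clone of the base estimator per target, on that target's columns only.
models = {
    target: clone(base_estimator).fit(X_train[:, features], y_train[:, target])
    for target, features in target_to_features.items()
}

# Predict each target from its own feature subset and stack the columns.
predictions = np.column_stack(
    [models[t].predict(X_test[:, target_to_features[t]]) for t in sorted(models)]
)
```

Each fitted model here sees only three of the five features, which is the difference from a plain MultiOutputRegressor.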

class gtime.forecasting.NaiveForecaster

Naïve model: all predicted values are equal to the most recent available observation.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import NaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = NaiveForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.143006  0.143006  0.143006  0.143006
2011-12-29  0.901308  0.901308  0.901308  0.901308  0.901308
2011-12-30  0.541559  0.541559  0.541559  0.541559  0.541559
2011-12-31  0.974740  0.974740  0.974740  0.974740  0.974740
2012-01-01  0.636604  0.636604  0.636604  0.636604  0.636604
class gtime.forecasting.SeasonalNaiveForecaster(seasonal_length: int)

Seasonal naïve model. The forecast is expected to follow a seasonal pattern of seasonal_length data points, determined by the last seasonal_length observations available in the training dataset.

Parameters

seasonal_length: int, required

Length of a full seasonal cycle in number of periods. Period length is inferred from the training data.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import SeasonalNaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = SeasonalNaiveForecaster(seasonal_length=3)
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.990472  0.300248  0.782749  0.990472  0.300248
2011-12-29  0.300248  0.782749  0.990472  0.300248  0.782749
2011-12-30  0.782749  0.990472  0.300248  0.782749  0.990472
2011-12-31  0.990472  0.300248  0.782749  0.990472  0.300248
2012-01-01  0.300248  0.782749  0.990472  0.300248  0.782749
fit(X: DataFrame, y: DataFrame)

Stores the seasonal pattern from the last self.seasonal_length observations.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample.

y : pd.DataFrame

Used to store the predict feature names and prediction horizon.

Returns

self : SeasonalNaiveForecaster

Returns self.
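The seasonal pattern described for SeasonalNaiveForecaster can be reproduced by cycling through the last seasonal_length observations. This is a minimal NumPy sketch of that rule, not gtime's code; the helper `seasonal_naive_forecast` is hypothetical.

```python
import numpy as np


def seasonal_naive_forecast(train, seasonal_length: int, horizon: int) -> np.ndarray:
    """Hypothetical helper: repeat the last seasonal cycle of train over the horizon."""
    season = np.asarray(train)[-seasonal_length:]
    # Step h reuses the observation from the same position in the last cycle.
    return season[np.arange(horizon) % seasonal_length]


forecast = seasonal_naive_forecast([5, 1, 2, 3], seasonal_length=3, horizon=5)
last_value_forecast = seasonal_naive_forecast([5, 1, 2, 3], seasonal_length=1, horizon=3)
```

With `seasonal_length=1` the rule collapses to repeating the last observation, which is the behaviour described for NaiveForecaster.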

class gtime.forecasting.TrendForecaster(trend: str, trend_x0: numpy.array, loss: typing.Callable = <function mean_squared_error>, method: str = 'BFGS')

Trend forecasting model.

This estimator optimizes a trend function on train data and will forecast using this trend function with optimized parameters.

Parameters

trend : "polynomial" | "exponential", required

The kind of trend function to fit.

trend_x0 : np.array, required

Initialisation parameters passed to the trend function.

loss : Callable, optional, default: mean_squared_error

Loss function to minimize.

method : str, optional, default: "BFGS"

Loss function optimisation method.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import TrendForecaster
>>>
>>> X = pd.DataFrame(np.random.random((10, 1)), index=pd.date_range("2020-01-01", "2020-01-10"))
>>> y = horizon_shift(X, horizon=2)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(X, y)
>>>
>>> tf = TrendForecaster(trend='polynomial', trend_x0=np.zeros(2))
>>> tf.fit(X_train).predict(X_test)
array([[0.39703029],
       [0.41734957]])
fit(X: DataFrame, y=None) -> TrendForecaster

Fit the estimator.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required

Input data.

y : None

No target is needed for fitting; the parameter is present only because the pipeline API requires it.

Returns

self : object

Returns self.

predict(X: DataFrame) -> DataFrame

Using the fitted trend function, predict the values starting from X.

Parameters

X: pd.DataFrame, shape (n_samples, 1), required

The time series on which to predict.

Returns

predictions : pd.DataFrame, shape (n_samples, 1)

The output predictions.

Raises

NotFittedError

Raised if the model is not fitted yet.
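The optimisation that TrendForecaster describes (minimising a loss over the parameters of a trend function) can be sketched with scipy.optimize.minimize. This is an illustration under assumptions, not gtime's implementation: the helper `polynomial_trend`, the integer time axis, and the noiseless series are all chosen here so the optimum is known.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import mean_squared_error

t = np.arange(10, dtype=float)
series = 2.0 + 0.5 * t  # noiseless linear trend with a known optimum


def polynomial_trend(time, coefficients):
    # np.polyval expects the highest-degree coefficient first.
    return np.polyval(coefficients, time)


def objective(coefficients):
    # Loss between the observed series and the trend evaluated on the same time axis.
    return mean_squared_error(series, polynomial_trend(t, coefficients))


# Two starting parameters give a degree-1 polynomial, as with trend_x0=np.zeros(2).
result = minimize(objective, x0=np.zeros(2), method="BFGS")

# Forecast by evaluating the fitted trend on future time steps.
forecast = polynomial_trend(np.array([10.0, 11.0]), result.x)
```

The length of the starting vector fixes the number of trend parameters, and the fitted coefficients in `result.x` are then reused unchanged for prediction.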