Forecasting

The gtime.forecasting module contains a collection of machine learning models for working with time series data.

class gtime.forecasting.AverageForecaster

Predicts each future data point as the average of all train items and all test items prior to it.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import AverageForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = AverageForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.510285  0.510285  0.510285  0.510285  0.510285
2011-12-29  0.511362  0.511362  0.511362  0.511362  0.511362
2011-12-30  0.511445  0.511445  0.511445  0.511445  0.511445
2011-12-31  0.512714  0.512714  0.512714  0.512714  0.512714
2012-01-01  0.513053  0.513053  0.513053  0.513053  0.513053
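
The averaging rule can be illustrated with plain pandas. This is only a sketch of the idea, not the gtime implementation: the helper name average_forecast is hypothetical, and it assumes that the observation at each test row is already known when that row is forecast.

```python
import pandas as pd


def average_forecast(train: pd.Series, test: pd.Series, horizon: int) -> pd.DataFrame:
    """Forecast every horizon step as the running mean of all data seen so far."""
    full = pd.concat([train, test])
    # Expanding mean: at each timestamp, the mean of everything up to and including it.
    running_mean = full.expanding().mean().loc[test.index]
    # The same value is repeated for every step of the horizon.
    return pd.DataFrame({f"y_{h}": running_mean for h in range(1, horizon + 1)})


idx = pd.date_range("2020-01-01", periods=5, freq="D")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)
forecast = average_forecast(series.iloc[:3], series.iloc[3:], horizon=2)
```

Here the first test row is forecast as the mean of four values and the second as the mean of five, which mirrors how the values in the example output change from row to row but stay constant across the horizon columns.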

fit(X: DataFrame, y: DataFrame)

Stores the average of all train data points.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample, required for compatibility, not used for a naive model.

y : pd.DataFrame: Used to store the predicted feature names and the prediction horizon.

Returns

self : AverageForecaster: Returns self.

class gtime.forecasting.DriftForecaster

Simple drift model. The drift is calculated as the difference between the first and the last elements of the train series, divided by the number of periods.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import DriftForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = DriftForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.142682  0.142359  0.142035  0.141712
2011-12-29  0.901308  0.900985  0.900661  0.900338  0.900015
2011-12-30  0.541559  0.541236  0.540912  0.540589  0.540265
2011-12-31  0.974740  0.974417  0.974093  0.973770  0.973446
2012-01-01  0.636604  0.636281  0.635957  0.635634  0.635311
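
For reference, the textbook drift method can be sketched in a few lines of NumPy. The helper name drift_forecast is hypothetical, and the sketch assumes the divisor is the number of intervals between the first and last observation and that step h adds h times the drift; the library's exact divisor and step indexing may differ.

```python
import numpy as np


def drift_forecast(train, horizon: int) -> np.ndarray:
    """Textbook drift method: extend the line through the first and last points."""
    train = np.asarray(train, dtype=float)
    # Average change per period between the first and the last observation.
    drift = (train[-1] - train[0]) / (len(train) - 1)
    # Step h of the forecast adds h times the drift to the last observation.
    steps = np.arange(1, horizon + 1)
    return train[-1] + steps * drift


forecast = drift_forecast([10.0, 12.0, 11.0, 16.0], horizon=3)
```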

fit(X: DataFrame, y: DataFrame)

Calculates and stores the drift as the difference between the first and the last observations of the train set, divided by the number of observations.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample.

y : pd.DataFrame: Used to store the predicted feature names and the prediction horizon.

Returns

self : DriftForecaster: Returns self.

class gtime.forecasting.GAR(estimator, explainer_type: Optional[str] = None, n_jobs: Optional[int] = None)

Generalized Auto Regression model.

This model is a wrapper of sklearn.multioutput.MultiOutputRegressor but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix.

Parameters

estimator : estimator object, required: The model used to make the predictions step by step. Regressor object such as derived from RegressorMixin.
n_jobs : int, optional, default: None: The number of jobs to use for the parallelization.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GAR
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> gar = GAR(estimator=random_forest)
>>> gar.fit(X_train, y_train)
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)
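
The wrapping behaviour described for GAR can be approximated directly with scikit-learn. This is a sketch under stated assumptions rather than the GAR source: it uses LinearRegression as a stand-in estimator and rebuilds the DataFrame by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
time_index = pd.date_range("2020-01-01", periods=30, freq="D")
X = pd.DataFrame(rng.random((30, 5)), index=time_index)
y = pd.DataFrame(rng.random((30, 3)), index=time_index, columns=["y_1", "y_2", "y_3"])
X_train, y_train, X_test = X.iloc[:20], y.iloc[:20], X.iloc[20:]

# One independent regressor is fitted per target column.
model = MultiOutputRegressor(LinearRegression())
model.fit(X_train.to_numpy(), y_train.to_numpy())

# Wrap the raw array so the predictions keep the time index and target names.
predictions = pd.DataFrame(
    model.predict(X_test.to_numpy()),
    index=X_test.index,
    columns=y_train.columns,
)
```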

fit(X: DataFrame, y: DataFrame, sample_weight=None, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required: The data.
y : pd.DataFrame, shape (n_samples, horizon), required: The matrix containing the target variables.

Returns

self : object

predict(X: DataFrame, **kwargs) → DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required: The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon): The predictions, one for each timestep in horizon.

class gtime.forecasting.GARFF(estimator, explainer_type: Optional[str] = None)

Generalized Auto Regression model with feedforward training. This model is a wrapper of sklearn.multioutput.RegressorChain but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix, also using the predictions of the previous model.

Parameters

estimator : estimator object, required: The model used to make the predictions step by step. Regressor object such as derived from RegressorMixin.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GARFF
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> garff = GARFF(estimator=random_forest)
>>> garff.fit(X_train, y_train)
>>> predictions = garff.predict(X_test)
>>> predictions.shape
(10, 3)

Notes

The order, cv and random_state parameters of sklearn.multioutput.RegressorChain are set to None, because the order of the targets matters in a time-series forecasting context.
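
The feedforward idea can be seen in scikit-learn's RegressorChain itself. The following is an illustrative sketch, not the GARFF source; it uses LinearRegression as a stand-in estimator and inspects how many inputs each step of the chain receives.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

rng = np.random.default_rng(0)
time_index = pd.date_range("2020-01-01", periods=30, freq="D")
X = pd.DataFrame(rng.random((30, 5)), index=time_index)
y = pd.DataFrame(rng.random((30, 3)), index=time_index, columns=["y_1", "y_2", "y_3"])
X_train, y_train, X_test = X.iloc[:20], y.iloc[:20], X.iloc[20:]

# With the default order, target k is fitted on the features plus targets 0..k-1.
chain = RegressorChain(LinearRegression())
chain.fit(X_train.to_numpy(), y_train.to_numpy())

# Each step of the chain sees one more input column than the previous one.
inputs_per_step = [est.n_features_in_ for est in chain.estimators_]

# Rebuild a DataFrame so the predictions keep the time index and target names.
predictions = pd.DataFrame(
    chain.predict(X_test.to_numpy()),
    index=X_test.index,
    columns=y_train.columns,
)
```

With five features and three targets, inputs_per_step is [5, 6, 7]: the second model also receives the first target, and the third receives the first two.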

fit(X: DataFrame, y: DataFrame, **kwargs)

Fit the models, one for each time step. Each model is trained on the initial set of features and on the true values of the previous steps.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required: The data.
y : pd.DataFrame, shape (n_samples, horizon), required: The matrix containing the target variables.

Returns

self : object: The fitted object.

predict(X: DataFrame, **kwargs) → DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features): The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon): The predictions, one for each timestep in horizon.

class gtime.forecasting.HedgeForecaster(learning_rate: float = 0.001, loss: callable = <function l1>, random_state=None)

Regressor model using Hedge algorithm.

This algorithm is based on a multiplicative weight update method to create a dynamic combination of regressive models. In theory, there is no conventional training phase on the data; only the loss is needed to update the model.

Parameters

learning_rate : float, (default=0.001): The factor to use for the weight update.
loss : callable, optional (default=`gtime.forecasting.online.l1`): Loss function used to compute the loss matrix.
random_state : int, RandomState instance or None, optional (default=None): Seed or random number generator that controls the randomness of the estimator.

Attributes

loss_matrix_ : array, (n_samples, n_experts): Loss matrix between X and y.
total_loss_ : int or float: Sum of losses based on Hedge algorithm decisions.
weights_ : array, (n_experts): Last weight of each expert.
decisions_ : array, (n_samples): Indices of chosen expert depending on weights.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.forecasting.online import HedgeForecaster
>>> time_index = pd.date_range("2020-01-01", "2020-01-20")
>>> X = pd.DataFrame(np.random.randint(4, size=(20, 3)), index=time_index)
>>> y = pd.DataFrame(np.random.randint(4, size=(20, 1)), index=time_index, columns=["y_1"])
>>> hr = HedgeForecaster(random_state=42)
>>> hr.fit_predict(X, y).head()
            0
2020-01-01  2
2020-01-02  0
2020-01-03  3
2020-01-04  3
2020-01-05  2
>>> print(f"Estimator weights: {hr.weights_}")
Estimator weights: [0.97713925 0.97723619 0.97980439]
>>> print(f"Decisions: {hr.decisions_}")
Decisions: [1 2 2 1 0 0 0 2 1 2 0 2 2 0 0 0 0 1 1 0]
>>> print(f"Total loss: {hr.total_loss_}")
Total loss: 30
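
To make the multiplicative weight update concrete, here is a minimal NumPy sketch of the standard Hedge algorithm. It is not the gtime implementation: the function name hedge is hypothetical, each column of the input is treated as one expert's prediction, and the exponential update rule and the sampling of decisions are assumptions.

```python
import numpy as np


def hedge(expert_preds, targets, learning_rate=0.001, seed=None):
    """Standard Hedge: sample an expert from the weights, then shrink weights by loss."""
    rng = np.random.default_rng(seed)
    expert_preds = np.asarray(expert_preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_samples, n_experts = expert_preds.shape
    # L1 loss of every expert on every sample, shape (n_samples, n_experts).
    loss_matrix = np.abs(expert_preds - targets[:, None])
    weights = np.ones(n_experts)
    decisions = np.empty(n_samples, dtype=int)
    for t in range(n_samples):
        # Pick an expert with probability proportional to its current weight.
        decisions[t] = rng.choice(n_experts, p=weights / weights.sum())
        # Multiplicative update: experts with a larger loss lose more weight.
        weights = weights * np.exp(-learning_rate * loss_matrix[t])
    total_loss = loss_matrix[np.arange(n_samples), decisions].sum()
    return weights, decisions, total_loss


expert_preds = [[1.0, 5.0], [2.0, 0.0], [3.0, 9.0]]
targets = [1.0, 2.0, 3.0]
weights, decisions, total_loss = hedge(expert_preds, targets, learning_rate=0.1, seed=0)
```

In this toy run the first expert is always right, so its weight stays at 1, while the second expert accumulates a loss of 12 and its weight decays to exp(-1.2).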

fit(X, y)

Fit the model to data, compute weights and decisions iteratively.

Parameters

X : array-like, shape (n_samples, n_features): Data.

Returns

self : object

fit_predict(X, y)

Fit and predict variable using Hedge algorithm.

Parameters

X : (sparse) array-like, shape (n_samples, n_features): Data.
y : (sparse) array-like, shape (n_samples, n_outputs): Predictions.

Returns

predictions : pd.DataFrame: Predictions.

class gtime.forecasting.MultiFeatureGAR(estimator: RegressorMixin, explainer_type: Optional[str] = None, target_to_features_dict: Optional[Dict[str, List[str]]] = None)

Generalized Auto Regression model.

This model is a wrapper of MultiFeatureMultiOutputRegressor but returns a pd.DataFrame.

Fit one model for each target variable contained in the y matrix. You can select the feature columns to use for each model.

Parameters

estimator : estimator object, required: The model used to make the predictions step by step. Regressor object such as derived from RegressorMixin.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import MultiFeatureGAR
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X_columns = ['c1', 'c2', 'c3', 'c4', 'c5']
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index, columns=X_columns)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> target_to_features_dict = {'y_1': ['c1','c2','c3'], 'y_2': ['c1','c2','c4'], 'y_3': ['c1','c2','c5']}
>>> gar = MultiFeatureGAR(estimator=random_forest, target_to_features_dict=target_to_features_dict)
>>>
>>> gar.fit(X_train, y_train)
>>>
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)

fit(X: DataFrame, y: DataFrame, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required: The data.
y : pd.DataFrame, shape (n_samples, horizon), required: The matrix containing the target variables.

Returns

self : object

predict(X: DataFrame) → DataFrame

For each row in X, make a prediction for each fitted model, from 1 to horizon.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), required: The data.

Returns

y_p_df : pd.DataFrame, shape (n_samples, horizon): The predictions, one for each timestep in horizon.

class gtime.forecasting.MultiFeatureMultiOutputRegressor(estimator: RegressorMixin, target_to_features_dict: Optional[Dict[int, List[int]]] = None)

Multi target regression with option to choose the features for each target.

This strategy consists of fitting one regressor per target. It is built over sklearn.multioutput.MultiOutputRegressor. Unlike MultiOutputRegressor, it lets you choose different features for each regressor.

Parameters

estimator: RegressorMixin, required: An estimator object implementing fit and predict.

Examples

>>> import numpy as np
>>> from gtime.regressors import MultiFeatureMultiOutputRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> X = np.random.random((30, 5))
>>> y = np.random.random((30, 3))
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> target_to_features_dict = {0: [0,1,2], 1: [0,1,3], 2: [0,1,4]}
>>> regressor = MultiFeatureMultiOutputRegressor(estimator=random_forest, target_to_features_dict=target_to_features_dict)
>>>
>>> regressor.fit(X_train, y_train)
>>>
>>> predictions = regressor.predict(X_test)
>>> predictions.shape
(10, 3)
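
The per-target feature selection can be sketched with scikit-learn primitives alone. This is an illustration of the strategy, not the library's code: the mapping name target_to_features and the use of LinearRegression are assumptions made for the example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X, y = rng.random((30, 5)), rng.random((30, 3))
X_train, y_train, X_test = X[:20], y[:20], X[20:]

# Hypothetical mapping: target column index -> feature column indices.
target_to_features = {0: [0, 1, 2], 1: [0, 1, 3], 2: [0, 1, 4]}

# Fit an independent copy of the base estimator per target, on its own feature subset.
base = LinearRegression()
models = {
    target: clone(base).fit(X_train[:, features], y_train[:, target])
    for target, features in target_to_features.items()
}

# Predict each target from its own features and stack the columns in target order.
predictions = np.column_stack(
    [models[t].predict(X_test[:, target_to_features[t]]) for t in sorted(models)]
)
```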

fit(X: ndarray, y: ndarray, **kwargs)

Fit the model.

Train the models, one for each target variable in y.

Parameters

X : np.ndarray, shape (n_samples, n_features), required: The data.
y : np.ndarray, shape (n_samples, horizon), required: The matrix containing the target variables.

Returns

self : object

predict(X: ndarray) → ndarray

For each row in X, make a prediction for each fitted model.

Parameters

X : np.ndarray, shape (n_samples, n_features), required: The data.

Returns

predictions : np.ndarray, shape (n_samples, horizon): The predictions.

class gtime.forecasting.NaiveForecaster

Naïve model, all predicted values are equal to the most recent available observation.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import NaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = NaiveForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.143006  0.143006  0.143006  0.143006
2011-12-29  0.901308  0.901308  0.901308  0.901308  0.901308
2011-12-30  0.541559  0.541559  0.541559  0.541559  0.541559
2011-12-31  0.974740  0.974740  0.974740  0.974740  0.974740
2012-01-01  0.636604  0.636604  0.636604  0.636604  0.636604
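
The naïve rule amounts to repeating the latest observation across the horizon. A minimal pandas sketch, with a hypothetical helper name naive_forecast and no claim to match the library's internals:

```python
import pandas as pd


def naive_forecast(observed: pd.Series, horizon: int) -> pd.DataFrame:
    """Repeat the most recent observation of each row for every horizon step."""
    return pd.DataFrame({f"y_{h}": observed for h in range(1, horizon + 1)})


idx = pd.date_range("2020-01-01", periods=2, freq="D")
forecast = naive_forecast(pd.Series([0.2, 0.7], index=idx), horizon=3)
```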

class gtime.forecasting.SeasonalNaiveForecaster(seasonal_length: int)

Seasonal naïve model. The forecast is expected to follow a seasonal pattern of seasonal_length data points, which is determined by the last seasonal_length observations of the available training dataset.

Parameters

seasonal_length: int, required: Length of a full seasonal cycle in number of periods. Period length is inferred from the training data.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import SeasonalNaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = SeasonalNaiveForecaster(seasonal_length=3)
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.990472  0.300248  0.782749  0.990472  0.300248
2011-12-29  0.300248  0.782749  0.990472  0.300248  0.782749
2011-12-30  0.782749  0.990472  0.300248  0.782749  0.990472
2011-12-31  0.990472  0.300248  0.782749  0.990472  0.300248
2012-01-01  0.300248  0.782749  0.990472  0.300248  0.782749
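
The cycling pattern visible in the output can be reproduced for a single forecast origin by tiling the last seasonal cycle. This is a sketch with a hypothetical helper name seasonal_naive, not the library's implementation:

```python
import numpy as np


def seasonal_naive(train, seasonal_length: int, horizon: int) -> np.ndarray:
    """Repeat the last full seasonal cycle of the training data over the horizon."""
    season = np.asarray(train, dtype=float)[-seasonal_length:]
    # Number of whole cycles needed to cover the horizon (ceiling division).
    repetitions = -(-horizon // seasonal_length)
    return np.tile(season, repetitions)[:horizon]


forecast = seasonal_naive([1.0, 2.0, 3.0, 4.0, 5.0], seasonal_length=3, horizon=5)
```

With a cycle of length 3 and a horizon of 5, the last three observations are repeated and then truncated, giving 3, 4, 5, 3, 4.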

fit(X: DataFrame, y: DataFrame)

Stores the seasonal pattern from the last self.seasonal_length observations.

Parameters

X : pd.DataFrame, shape (n_samples, n_features), train sample.

y : pd.DataFrame: Used to store the predicted feature names and the prediction horizon.

Returns

self : SeasonalNaiveForecaster: Returns self.

class gtime.forecasting.TrendForecaster(trend: str, trend_x0: ~numpy.array, loss: ~typing.Callable = <function mean_squared_error>, method: str = 'BFGS')

Trend forecasting model.

This estimator optimizes a trend function on train data and will forecast using this trend function with optimized parameters.

Parameters

trend : "polynomial" | "exponential", required: The kind of trend function to fit.
trend_x0 : np.array, required: Initialisation parameters passed to the trend function.
loss : Callable, optional, default: mean_squared_error: Loss function to minimize.
method : str, optional, default: "BFGS": Loss function optimisation method.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import TrendForecaster
>>>
>>> X = pd.DataFrame(np.random.random((10, 1)), index=pd.date_range("2020-01-01", "2020-01-10"))
>>> y = horizon_shift(X, horizon=2)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(X, y)
>>>
>>> tf = TrendForecaster(trend='polynomial', trend_x0=np.zeros(2))
>>> tf.fit(X_train).predict(X_test)
array([[0.39703029],
       [0.41734957]])
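
The fitting step can be pictured as a direct call to scipy.optimize.minimize. The sketch below assumes a polynomial trend evaluated with np.polyval over integer time steps and a mean squared error loss; the helper name fit_polynomial_trend is hypothetical and the library's internals may differ.

```python
import numpy as np
from scipy.optimize import minimize


def fit_polynomial_trend(values, trend_x0, method="BFGS"):
    """Find polynomial coefficients that minimize the mean squared error."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))

    def loss(coeffs):
        # Mean squared error between the polynomial trend and the observations.
        return np.mean((np.polyval(coeffs, t) - values) ** 2)

    return minimize(loss, trend_x0, method=method).x


t = np.arange(10)
values = 2.0 * t + 1.0  # a noiseless linear trend
coeffs = fit_polynomial_trend(values, trend_x0=np.zeros(2))

# Extrapolate the fitted trend to the next two time steps.
next_two = np.polyval(coeffs, np.arange(10, 12))
```

The length of trend_x0 sets the number of coefficients, so two zeros request a straight line, as in the example above.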

fit(X: DataFrame, y=None) → TrendForecaster

Fit the estimator.

Parameters

Xpd.DataFrame, shape (n_samples, n_features), required: Input data.
y : None: There is no need for a target here, yet the pipeline API requires this parameter.

Returns

self : object: Returns self.

predict(X: DataFrame) → DataFrame

Using the fitted polynomial, predict the values starting from X.

Parameters

X: pd.DataFrame, shape (n_samples, 1), required: The time series on which to predict.

Returns

predictions : pd.DataFrame, shape (n_samples, 1): The output predictions.

Raises

NotFittedError: Raised if the model is not fitted yet.