Forecasting
The gtime.forecasting module contains a collection of machine learning models for dealing with time series data.
- class gtime.forecasting.AverageForecaster
Predicts all future data points as the average of all train items and all test items prior to it.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import AverageForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = AverageForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.510285  0.510285  0.510285  0.510285  0.510285
2011-12-29  0.511362  0.511362  0.511362  0.511362  0.511362
2011-12-30  0.511445  0.511445  0.511445  0.511445  0.511445
2011-12-31  0.512714  0.512714  0.512714  0.512714  0.512714
2012-01-01  0.513053  0.513053  0.513053  0.513053  0.513053
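The stored average is an expanding one: each test-row forecast is the mean of every observation seen up to that row, which is why the predictions in the example change slightly from row to row. A minimal sketch of that logic in plain pandas (an illustration, not the library's internals):

>>> import pandas as pd
>>> s = pd.Series([1.0, 2.0, 3.0, 4.0])
>>> s.expanding().mean()  # forecast at each step: mean of everything seen so far
0    1.0
1    1.5
2    2.0
3    2.5
dtype: float64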
- fit(X: DataFrame, y: DataFrame)
Stores the average of all train data points.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Train sample; required for API compatibility, not used by this naive model.
- y : pd.DataFrame
Used to store the predict feature names and the prediction horizon.
Returns
- self : AverageForecaster
Returns self.
- class gtime.forecasting.DriftForecaster
Simple drift model: the drift is calculated as the difference between the first and the last elements of the train series, divided by the number of periods.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import DriftForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = DriftForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.142682  0.142359  0.142035  0.141712
2011-12-29  0.901308  0.900985  0.900661  0.900338  0.900015
2011-12-30  0.541559  0.541236  0.540912  0.540589  0.540265
2011-12-31  0.974740  0.974417  0.974093  0.973770  0.973446
2012-01-01  0.636604  0.636281  0.635957  0.635634  0.635311
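The fitted model is therefore just a single slope extrapolated forward from each point. A rough sketch of the arithmetic (illustrative only; the divisor and the anchoring at the last value are assumptions, not the module's exact code):

>>> import numpy as np
>>> train = np.array([10.0, 11.0, 12.5, 14.0])
>>> drift = (train[-1] - train[0]) / len(train)  # change per period over the train window
>>> train[-1] + drift * np.arange(1, 4)          # extrapolate from the last value
array([15., 16., 17.])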
- fit(X: DataFrame, y: DataFrame)
Calculates and stores the drift as the difference between the first and the last observations of the train set, divided by the number of observations.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Train sample.
- y : pd.DataFrame
Used to store the predict feature names and the prediction horizon.
Returns
- self : DriftForecaster
Returns self.
- class gtime.forecasting.GAR(estimator, explainer_type: Optional[str] = None, n_jobs: Optional[int] = None)
Generalized Auto Regression model.
This model is a wrapper of sklearn.multioutput.MultiOutputRegressor but returns a pd.DataFrame. It fits one model for each target variable contained in the y matrix.
Parameters
- estimator : estimator object, required
The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.
- n_jobs : int, optional, default: None
The number of jobs to use for the parallelization.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GAR
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> gar = GAR(estimator=random_forest)
>>> gar.fit(X_train, y_train)
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)
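Conceptually, fitting amounts to the same loop as sklearn.multioutput.MultiOutputRegressor: clone the estimator once per column of y and fit each clone independently. A hedged sketch of that idea in plain scikit-learn (not the wrapper's actual code):

>>> import numpy as np
>>> from sklearn.base import clone
>>> from sklearn.linear_model import LinearRegression
>>> X, y = np.random.random((30, 5)), np.random.random((30, 3))
>>> models = [clone(LinearRegression()).fit(X, y[:, k]) for k in range(3)]  # one per target
>>> np.column_stack([m.predict(X) for m in models]).shape
(30, 3)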
- fit(X: DataFrame, y: DataFrame, sample_weight=None, **kwargs)
Fit the model.
Train the models, one for each target variable in y.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required.
The data.
- y : pd.DataFrame, shape (n_samples, horizon), required.
The matrix containing the target variables.
Returns
self : object
- predict(X: DataFrame, **kwargs) → pd.DataFrame
For each row in X, make a prediction for each fitted model, from 1 to horizon.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required
The data.
Returns
- y_p_df : pd.DataFrame, shape (n_samples, horizon)
The predictions, one for each timestep in horizon.
- class gtime.forecasting.GARFF(estimator, explainer_type: Optional[str] = None)
Generalized Auto Regression model with feedforward training. This model is a wrapper of sklearn.multioutput.RegressorChain but returns a pd.DataFrame. It fits one model for each target variable contained in the y matrix, also using the predictions of the previous models.
Parameters
- estimator : estimator object, required
The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import GARFF
>>> from sklearn.ensemble import RandomForestRegressor
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>> random_forest = RandomForestRegressor()
>>> garff = GARFF(estimator=random_forest)
>>> garff.fit(X_train, y_train)
>>> predictions = garff.predict(X_test)
>>> predictions.shape
(10, 3)
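The feedforward part means each horizon step is trained on the original features plus the targets of the earlier steps, as in sklearn.multioutput.RegressorChain. A simplified sketch of the chained training (illustrative; at predict time the chain substitutes each model's own prediction for the true target):

>>> import numpy as np
>>> from sklearn.base import clone
>>> from sklearn.linear_model import LinearRegression
>>> X, y = np.random.random((30, 5)), np.random.random((30, 3))
>>> models, X_aug = [], X
>>> for k in range(3):
...     models.append(clone(LinearRegression()).fit(X_aug, y[:, k]))
...     X_aug = np.hstack([X_aug, y[:, k:k + 1]])  # target k becomes a feature for step k+1
>>> X_aug.shape
(30, 8)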
Notes
The sklearn.multioutput.RegressorChain parameters order, cv and random_state were set to None due to the importance of target order in a time-series forecasting context.
- fit(X: DataFrame, y: DataFrame, **kwargs)
Fit the models, one for each time step. Each model is trained on the initial set of features and on the true values of the previous steps.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required
The data.
- y : pd.DataFrame, shape (n_samples, horizon), required
The matrix containing the target variables.
Returns
- self : object
The fitted object.
- class gtime.forecasting.HedgeForecaster(learning_rate: float = 0.001, loss: callable = <function l1>, random_state=None)
Regressor model using Hedge algorithm.
This algorithm is based on a multiplicative weight update method to create a dynamic combination of regressive models. In theory, there is no common training phase on data, only the loss is necessary to update the model.
Parameters
- learning_rate : float, optional (default=0.001)
The factor to use for the weight update.
- loss : callable, optional (default=gtime.forecasting.online.l1)
Loss function used to compute the loss matrix.
- random_state : int, RandomState instance or None, optional (default=None)
Controls the randomness of the estimator; pass an int for reproducible results across multiple calls.
Attributes
- loss_matrix_ : array, shape (n_samples, n_experts)
Loss matrix between X and y.
- total_loss_ : int or float
Sum of losses based on Hedge algorithm decisions.
- weights_ : array, shape (n_experts,)
Last weight of each expert.
- decisions_ : array, shape (n_samples,)
Indices of the chosen expert depending on weights.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.forecasting.online import HedgeForecaster
>>> time_index = pd.date_range("2020-01-01", "2020-01-20")
>>> X = pd.DataFrame(np.random.randint(4, size=(20, 3)), index=time_index)
>>> y = pd.DataFrame(np.random.randint(4, size=(20, 1)), index=time_index, columns=["y_1"])
>>> hr = HedgeForecaster(random_state=42)
>>> hr.fit_predict(X, y).head()
            0
2020-01-01  2
2020-01-02  0
2020-01-03  3
2020-01-04  3
2020-01-05  2
>>> print(f"Estimator weights: {hr.weights_}")
Estimator weights: [0.97713925 0.97723619 0.97980439]
>>> print(f"Decisions: {hr.decisions_}")
Decisions: [1 2 2 1 0 0 0 2 1 2 0 2 2 0 0 0 0 1 1 0]
>>> print(f"Total loss: {hr.total_loss_}")
Total loss: 30
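The multiplicative weight update described above is a one-liner: each expert's weight decays exponentially with its loss. A minimal sketch of a single Hedge step, following the textbook rule (with the default l1 loss and learning rate) rather than the module's exact code:

>>> import numpy as np
>>> learning_rate = 0.001
>>> weights = np.ones(3)                                 # uniform weights, one per expert
>>> losses = np.abs(np.array([2.0, 0.5, 1.0]) - 1.0)     # l1 loss of each expert vs. the truth
>>> weights = weights * np.exp(-learning_rate * losses)  # multiplicative update
>>> weights = weights / weights.sum()                    # renormalise
>>> int(np.argmax(weights))                              # the lowest-loss expert now weighs most
2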
- class gtime.forecasting.MultiFeatureGAR(estimator: RegressorMixin, explainer_type: Optional[str] = None, target_to_features_dict: Optional[Dict[str, List[str]]] = None)
Generalized Auto Regression model.
This model is a wrapper of MultiFeatureMultiOutputRegressor but returns a pd.DataFrame. It fits one model for each target variable contained in the y matrix. You can select the feature columns to use for each model.
Parameters
- estimator : estimator object, required
The model used to make the predictions step by step. A regressor object, such as one derived from RegressorMixin.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from gtime.forecasting import MultiFeatureGAR
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> time_index = pd.date_range("2020-01-01", "2020-01-30")
>>> X_columns = ['c1', 'c2', 'c3', 'c4', 'c5']
>>> X = pd.DataFrame(np.random.random((30, 5)), index=time_index, columns=X_columns)
>>> y_columns = ["y_1", "y_2", "y_3"]
>>> y = pd.DataFrame(np.random.random((30, 3)), index=time_index, columns=y_columns)
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> gar = MultiFeatureGAR(estimator=random_forest)
>>>
>>> target_to_features_dict = {'y_1': ['c1','c2','c3'], 'y_2': ['c1','c2','c4'], 'y_3': ['c1','c2','c5']}
>>> gar.fit(X_train, y_train, target_to_features_dict)
>>>
>>> predictions = gar.predict(X_test)
>>> predictions.shape
(10, 3)
- fit(X: DataFrame, y: DataFrame, **kwargs)
Fit the model.
Train the models, one for each target variable in y.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required.
The data.
- y : pd.DataFrame, shape (n_samples, horizon), required.
The matrix containing the target variables.
Returns
self : object
- class gtime.forecasting.MultiFeatureMultiOutputRegressor(estimator: RegressorMixin, target_to_features_dict: Optional[Dict[int, List[int]]] = None)
Multi target regression with option to choose the features for each target.
This strategy consists of fitting one regressor per target. It is built on sklearn.multioutput.MultiOutputRegressor but, compared to it, allows choosing different features for each regressor.
Parameters
- estimator: RegressorMixin, required
An estimator object implementing fit and predict.
Examples
>>> import numpy as np
>>> from gtime.regressors import MultiFeatureMultiOutputRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> X = np.random.random((30, 5))
>>> y = np.random.random((30, 3))
>>> X_train, y_train = X[:20], y[:20]
>>> X_test, y_test = X[20:], y[20:]
>>>
>>> random_forest = RandomForestRegressor()
>>> regressor = MultiFeatureMultiOutputRegressor(estimator=random_forest)
>>>
>>> target_to_features_dict = {0: [0,1,2], 1: [0,1,3], 2: [0,1,4]}
>>> regressor.fit(X_train, y_train, target_to_features_dict=target_to_features_dict)
>>>
>>> predictions = regressor.predict(X_test)
>>> predictions.shape
(10, 3)
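The per-target feature selection boils down to slicing X with a different column list before fitting each clone of the estimator. A rough sketch of that mechanism (illustrative, not the class internals):

>>> import numpy as np
>>> from sklearn.base import clone
>>> from sklearn.linear_model import LinearRegression
>>> X, y = np.random.random((30, 5)), np.random.random((30, 2))
>>> target_to_features = {0: [0, 1, 2], 1: [0, 3, 4]}
>>> models = {k: clone(LinearRegression()).fit(X[:, cols], y[:, k])
...           for k, cols in target_to_features.items()}
>>> np.column_stack([models[k].predict(X[:, cols])
...                  for k, cols in target_to_features.items()]).shape
(30, 2)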
- class gtime.forecasting.NaiveForecaster
Naïve model: all predicted values are equal to the most recent available observation.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import NaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = NaiveForecaster()
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.143006  0.143006  0.143006  0.143006  0.143006
2011-12-29  0.901308  0.901308  0.901308  0.901308  0.901308
2011-12-30  0.541559  0.541559  0.541559  0.541559  0.541559
2011-12-31  0.974740  0.974740  0.974740  0.974740  0.974740
2012-01-01  0.636604  0.636604  0.636604  0.636604  0.636604
- class gtime.forecasting.SeasonalNaiveForecaster(seasonal_length: int)
Seasonal naïve model. The forecast is expected to follow a seasonal pattern of seasonal_length data points, which is determined by the last seasonal_length observations available in the training dataset.
Parameters
- seasonal_length: int, required
Length of a full seasonal cycle in number of periods. Period length is inferred from the training data.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import SeasonalNaiveForecaster
>>> idx = pd.period_range(start='2011-01-01', end='2012-01-01')
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.random((len(idx), 1)), index=idx, columns=['1'])
>>> y = horizon_shift(df, horizon=5)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(df, y)
>>> m = SeasonalNaiveForecaster(seasonal_length=3)
>>> m.fit(X_train, y_train).predict(X_test)
                 y_1       y_2       y_3       y_4       y_5
2011-12-28  0.990472  0.300248  0.782749  0.990472  0.300248
2011-12-29  0.300248  0.782749  0.990472  0.300248  0.782749
2011-12-30  0.782749  0.990472  0.300248  0.782749  0.990472
2011-12-31  0.990472  0.300248  0.782749  0.990472  0.300248
2012-01-01  0.300248  0.782749  0.990472  0.300248  0.782749
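The fitted "model" is simply the last seasonal_length training values, tiled forward over the horizon. A small sketch of that repetition (illustrative):

>>> import numpy as np
>>> train = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
>>> seasonal_length, horizon = 3, 5
>>> season = train[-seasonal_length:]              # the last full cycle of the train set
>>> np.tile(season, horizon // seasonal_length + 1)[:horizon]
array([10., 20., 30., 10., 20.])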
- fit(X: DataFrame, y: DataFrame)
Stores the seasonal pattern from the last self.seasonal_length observations.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features)
Train sample.
- y : pd.DataFrame
Used to store the predict feature names and the prediction horizon.
Returns
- self : SeasonalNaiveForecaster
Returns self.
- class gtime.forecasting.TrendForecaster(trend: str, trend_x0: np.array, loss: Callable = <function mean_squared_error>, method: str = 'BFGS')
Trend forecasting model.
This estimator optimizes a trend function on train data and will forecast using this trend function with optimized parameters.
Parameters
- trend : "polynomial" | "exponential", required
The kind of trend function to fit.
- trend_x0 : np.array, required
Initialisation parameters passed to the trend function.
- loss : Callable, optional, default: mean_squared_error
Loss function to minimize.
- method : str, optional, default: "BFGS"
Loss function optimisation method.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from gtime.model_selection import horizon_shift, FeatureSplitter
>>> from gtime.forecasting import TrendForecaster
>>>
>>> X = pd.DataFrame(np.random.random((10, 1)), index=pd.date_range("2020-01-01", "2020-01-10"))
>>> y = horizon_shift(X, horizon=2)
>>> X_train, y_train, X_test, y_test = FeatureSplitter().transform(X, y)
>>>
>>> tf = TrendForecaster(trend='polynomial', trend_x0=np.zeros(2))
>>> tf.fit(X_train).predict(X_test)
array([[0.39703029],
       [0.41734957]])
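The fit step is a generic loss minimisation over the trend's parameters. A hedged sketch of fitting a degree-1 polynomial trend directly with scipy, assuming the default mean squared error loss and BFGS method (an illustration, not the estimator's code):

>>> import numpy as np
>>> from scipy.optimize import minimize
>>> t = np.arange(10.0)
>>> series = 2.0 + 0.5 * t                              # noiseless linear trend for illustration
>>> def mse(p):
...     return np.mean((series - (p[0] + p[1] * t)) ** 2)
>>> res = minimize(mse, x0=np.zeros(2), method="BFGS")  # same default method as the estimator
>>> np.round(res.x, 2)                                  # recovered intercept and slope
array([2. , 0.5])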
- fit(X: DataFrame, y=None) → TrendForecaster
Fit the estimator.
Parameters
- X : pd.DataFrame, shape (n_samples, n_features), required
Input data.
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
Returns
- self : object
Returns self.
- predict(X: DataFrame) → pd.DataFrame
Using the fitted polynomial, predict the values starting from X.
Parameters
- X: pd.DataFrame, shape (n_samples, 1), required
The time series on which to predict.
Returns
- predictions : pd.DataFrame, shape (n_samples, 1)
The output predictions.
Raises
- NotFittedError
Raised if the model is not fitted yet.