Topology in time series forecasting

This notebook shows how giotto-tda can be used to create topological features for time series forecasting tasks, and how to integrate them into scikit-learn–compatible pipelines.

In particular, we will concentrate on topological features which are created from consecutive sliding windows over the data. In sliding window models, a single time series array X of shape (n_timestamps, n_features) is turned into a time series of windows over the data, with a new shape (n_windows, n_samples_per_window, n_features). There are two main issues that arise when building forecasting models with sliding windows:

- n_windows is smaller than n_timestamps whenever a window spans more than one timestamp. Without padding X, which giotto-tda does not do, there are fewer complete windows than timestamps. The gap n_timestamps - n_windows grows further if we pick a large stride between consecutive windows.
- The target variable y needs to be properly “aligned” with each window so that the forecasting problem is meaningful and, e.g., we don’t “leak” information from the future. In particular, y needs to be “resampled” so that it too has length n_windows.
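The NumPy sketch below makes the shape bookkeeping concrete. The helper `sliding_windows` is hypothetical and is not giotto-tda's implementation. It assumes that windows are anchored at the end of X, so the last window always ends at the last timestamp. That is the behaviour the `SlidingWindow` example in this notebook exhibits.

```python
import numpy as np

def sliding_windows(X, size, stride):
    """Illustrative re-implementation (an assumption, not giotto-tda code):
    the last window ends at the last timestamp, earlier ones step back by `stride`."""
    X = np.asarray(X)
    n_timestamps = X.shape[0]
    n_windows = (n_timestamps - size) // stride + 1
    # Exclusive end index of each window, counted back from the end of X
    ends = [n_timestamps - k * stride for k in range(n_windows)][::-1]
    # Result has shape (n_windows, size, n_features) for 2D X
    return np.stack([X[end - size:end] for end in ends])

n_timestamps, n_features = 10, 2
X = np.arange(n_timestamps * n_features).reshape(n_timestamps, n_features)

for stride in (1, 2, 3):
    X_sw = sliding_windows(X, size=3, stride=stride)
    print(stride, X_sw.shape, n_timestamps - X_sw.shape[0])
# prints:
# 1 (8, 3, 2) 2
# 2 (4, 3, 2) 6
# 3 (3, 3, 2) 7
```

The last printed column is n_timestamps - n_windows. It grows with the stride, as described in the first issue above.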

To deal with these issues, giotto-tda provides a selection of transformers with resample, transform_resample and fit_transform_resample methods. These are inherited from a TransformerResamplerMixin base class. Furthermore, giotto-tda provides a drop-in replacement for scikit-learn’s Pipeline which extends it to allow chaining TransformerResamplerMixins with regular scikit-learn estimators.
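A `resample` method is needed because y has to change length along with X. A regular scikit-learn transformer only maps X, so after a windowing step X and y would no longer have matching lengths. The class below is a hypothetical toy and not giotto-tda's `TransformerResamplerMixin`. It uses non-overlapping windows only to keep the code short, and it mimics the transform / resample / fit_transform_resample trio described above.

```python
import numpy as np

class ToyWindower:
    """Hypothetical transformer-resampler, for illustration only (not part of
    giotto-tda): non-overlapping windows of length `size` on a 1D series,
    anchored at the end of the series."""

    def __init__(self, size):
        self.size = size

    def fit(self, X, y=None):
        return self

    def _start(self, n):
        # Leading timestamps that do not fill a complete window are dropped
        return n % self.size

    def transform(self, X):
        X = np.asarray(X)
        return X[self._start(len(X)):].reshape(-1, self.size)

    def resample(self, y):
        y = np.asarray(y)
        # Keep the target at the last timestamp of each window
        return y[self._start(len(y)) + self.size - 1::self.size]

    def fit_transform_resample(self, X, y):
        return self.fit(X, y).transform(X), self.resample(y)


X, y = np.arange(10), np.arange(10) - 10
Xt, yr = ToyWindower(size=3).fit_transform_resample(X, y)
print(Xt.shape, y.shape, yr.shape)  # (3, 3) (10,) (3,)
print(yr)                           # [-7 -4 -1]
```

Passing `Xt` together with the original `y` to a regressor would fail because there are 3 samples but 10 targets. Pairing `Xt` with `yr` works, which is why the pipeline needs to thread `resample` through each step.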

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.

`SlidingWindow`

Let us start with a simple example of a “time series” X with a corresponding target y of the same length.

import numpy as np

n_timestamps = 10
X, y = np.arange(n_timestamps), np.arange(n_timestamps) - n_timestamps
X, y

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1]))

We can instantiate our sliding window transformer-resampler and run it on the pair (X, y):

from gtda.time_series import SlidingWindow

window_size = 3
stride = 2

SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr

(array([[1, 2, 3],
        [3, 4, 5],
        [5, 6, 7],
        [7, 8, 9]]),
 array([-7, -5, -3, -1]))

We note a few things:
- fit_transform_resample returns a pair: the window-transformed X and the resampled and aligned y.
- SlidingWindow has made a choice for us on how to resample y and line it up with the windows from X. Each window on X corresponds to the last value of the matching window over y. This is common in time series forecasting where, for example, y could be a shift of X by one timestamp.
- Some of the initial values of X may not appear in X_sw (here, 0 is missing). This is because SlidingWindow only guarantees that the last value of X is contained in the last window, whatever the stride.
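The alignment rule can be checked with plain NumPy. The sketch below assumes the window layout implied by the output above, with the last window ending at the last timestamp. It reproduces yr and confirms that X[0] never enters a window. This is a re-derivation for illustration, not how SlidingWindow is implemented internally.

```python
import numpy as np

n_timestamps, size, stride = 10, 3, 2
X, y = np.arange(n_timestamps), np.arange(n_timestamps) - n_timestamps

# Assumed window layout: the last window ends at the last timestamp,
# and earlier windows are obtained by stepping back `stride` timestamps
n_windows = (n_timestamps - size) // stride + 1
ends = n_timestamps - stride * np.arange(n_windows)[::-1]  # exclusive end indices

X_sw = np.stack([X[end - size:end] for end in ends])
yr = y[ends - 1]  # target aligned with the last timestamp of each window

dropped = np.setdiff1d(X, np.unique(X_sw))  # values of X in no window
print(yr)       # [-7 -5 -3 -1]
print(dropped)  # [0]
```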

Endogenous target preparation with `Labeller`

Let us say that we simply wish to predict the future of a time series from itself. This is very common in the study of financial markets, for example. giotto-tda provides convenience classes for target preparation from a time series. This notebook only shows a very simple example; many more options are described in Labeller’s documentation.

If we wished to create a target y from X such that y[i] is equal to X[i + 1], while also modifying X and y so that they still have the same length, we could proceed as follows:

from gtda.time_series import Labeller

X = np.arange(10)

Lab = Labeller(size=1, func=np.max)
Xl, yl = Lab.fit_transform_resample(X, X)
Xl, yl

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([1, 2, 3, 4, 5, 6, 7, 8, 9]))

Notice that we are feeding two copies of X to fit_transform_resample in this case!
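With size=1, each labelling window holds a single value, so np.max returns that value unchanged. The result above is therefore a plain one-step shift of X. The NumPy sketch below reproduces Xl and yl under that reading. The function `shift_target` and its `steps` argument are hypothetical, and the sketch covers only this special case, not Labeller's other options.

```python
import numpy as np

def shift_target(X, steps=1):
    """Hypothetical helper (not giotto-tda's Labeller): pair each X[i] with
    X[i + steps], trimming both arrays to the same length."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    X = np.asarray(X)
    return X[:-steps], X[steps:]

X = np.arange(10)
Xl, yl = shift_target(X)
print(Xl)  # [0 1 2 3 4 5 6 7 8]
print(yl)  # [1 2 3 4 5 6 7 8 9]
```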

This is what fitting an end-to-end pipeline for future prediction using topology could look like. Again, you are encouraged to include your own non-topological features in the mix!

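from sklearn.ensemble import RandomForestRegressor
from gtda.diagrams import Amplitude
from gtda.homology import VietorisRipsPersistence
from gtda.pipeline import make_pipeline  # giotto-tda's version, so y is resampled too
from gtda.time_series import SlidingWindow, TakensEmbedding

SW = SlidingWindow(size=5)
TE = TakensEmbedding(time_delay=1, dimension=2)
VR = VietorisRipsPersistence()
Ampl = Amplitude()
RFR = RandomForestRegressor()

# Full pipeline including the regressor (Lab is the Labeller defined above)
pipe = make_pipeline(Lab, SW, TE, VR, Ampl, RFR)
pipe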

Pipeline(steps=[('labeller',
                 Labeller(func=, size=1)),
                ('slidingwindow', SlidingWindow(size=5)),
                ('takensembedding', TakensEmbedding()),
                ('vietorisripspersistence', VietorisRipsPersistence()),
                ('amplitude', Amplitude()),
                ('randomforestregressor', RandomForestRegressor())])

pipe.fit(X, X)
y_pred = pipe.predict(X)
y_pred

array([6.944, 6.944, 6.944, 6.944, 6.944])
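All five predictions coincide, and this is expected rather than a bug. Every window over np.arange(10) is a translated copy of every other. A Vietoris-Rips persistence diagram depends only on the pairwise distances between points, so all windows yield the same diagram, the same amplitude, and hence the same regressor input.

The NumPy sketch below checks the distance claim directly for delay embeddings with dimension=2 and time_delay=1. The helpers `takens_point_cloud` and `distance_matrix` are hypothetical. They follow the usual definition of a delay embedding and do not reproduce TakensEmbedding's code.

```python
import numpy as np

def takens_point_cloud(window, dimension=2, time_delay=1):
    """Delay embedding of a 1D window: row i is
    (w[i], w[i + time_delay], ..., w[i + (dimension - 1) * time_delay])."""
    n_points = len(window) - (dimension - 1) * time_delay
    return np.stack([window[i:i + n_points]
                     for i in range(0, dimension * time_delay, time_delay)], axis=1)

def distance_matrix(points):
    # Euclidean distances between all pairs of points
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.arange(10)
windows = np.stack([X[i:i + 5] for i in range(len(X) - 5 + 1)])  # all windows of size 5
dists = [distance_matrix(takens_point_cloud(w.astype(float))) for w in windows]
print(all(np.allclose(d, dists[0]) for d in dists))  # True
```

On a series whose windows genuinely differ in shape, the distance matrices, and hence the topological features, differ as well.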

Where to next?

- There are two additional simple TransformerResamplerMixins in gtda.time_series: Resampler and Stationarizer.
- The Takens-embedding pipelines for topological feature extraction shown here are a bit crude. More sophisticated methods exist for extracting robust topological summaries from (windows on) time series. A good source of inspiration is the following paper:

  Persistent Homology of Complex Networks for Dynamic State Detection, by A. Myers, E. Munch, and F. A. Khasawneh.

  The module gtda.graphs contains several transformers implementing the main algorithms proposed there.
- Advanced users may be interested in ConsecutiveRescaling, which can be found in gtda.point_clouds.
- The notebook Lorenz attractor is an advanced use case for TakensEmbedding and other time series forecasting techniques inspired by topology.
