Topology in time series forecasting¶
This notebook shows how giotto-tda can be used to create topological features for time series forecasting tasks, and how to integrate them into scikit-learn–compatible pipelines.
In particular, we will concentrate on topological features which are created from consecutive sliding windows over the data. In sliding window models, a single time series array X of shape (n_timestamps, n_features) is turned into a time series of windows over the data, with a new shape (n_windows, n_samples_per_window, n_features). There are two main issues that arise when building forecasting models with sliding windows:

- n_windows is smaller than n_timestamps. This is because we cannot have more windows than there are timestamps without padding X, and this is not done by giotto-tda. n_timestamps - n_windows is even larger if we decide to pick a large stride between consecutive windows.
- The target variable y needs to be properly “aligned” with each window so that the forecasting problem is meaningful and e.g. we don’t “leak” information from the future. In particular, y needs to be “resampled” so that it too has length n_windows.
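To make the first point concrete, the number of windows can be computed directly from the parameters (a minimal sketch; the formula assumes windows are anchored at the final timestamp, which matches the behaviour demonstrated below):

n_timestamps, size, stride = 10, 3, 2
n_windows = (n_timestamps - size) // stride + 1  # 4, i.e. fewer than n_timestamps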
To deal with these issues, giotto-tda provides a selection of transformers with resample, transform_resample and fit_transform_resample methods. These are inherited from a TransformerResamplerMixin base class. Furthermore, giotto-tda provides a drop-in replacement for scikit-learn’s Pipeline which extends it to allow chaining TransformerResamplerMixins with regular scikit-learn estimators.
If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.
See also¶
- Topology of time series, in which the Takens embedding technique used here is explained in detail and illustrated via simple examples.
- Gravitational waves detection, where, following arXiv:1910.08245, the Takens embedding technique is shown to be effective for the detection of gravitational wave signals buried in background noise.
- Topological feature extraction using VietorisRipsPersistence and PersistenceEntropy, for a quick introduction to general topological feature extraction in giotto-tda.
License: AGPLv3
SlidingWindow¶
Let us start with a simple example of a “time series” X with a corresponding target y of the same length.
import numpy as np
n_timestamps = 10
X, y = np.arange(n_timestamps), np.arange(n_timestamps) - n_timestamps
X, y
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]))
We can instantiate our sliding window transformer-resampler and run it on the pair (X, y):
from gtda.time_series import SlidingWindow
window_size = 3
stride = 2
SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr
(array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7],
[7, 8, 9]]),
array([-7, -5, -3, -1]))
We note a couple of things:

- fit_transform_resample returns a pair: the window-transformed X and the resampled and aligned y.
- SlidingWindow has made a choice for us on how to resample y and line it up with the windows from X: a window on X corresponds to the last value in a corresponding window over y. This is common in time series forecasting where, for example, y could be a shift of X by one timestamp.
- Some of the initial values of X may not be found in X_sw. This is because SlidingWindow only ensures the last value is represented in the last window, regardless of the stride.
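As a sanity check, we can reproduce this alignment by hand with plain NumPy (a minimal sketch, assuming windows are anchored at the final timestamp and spaced backwards by stride, as the output above shows):

# Indices of the last timestamp of each window, working backwards from the end
last_indices = np.arange(n_timestamps - 1, window_size - 2, -stride)[::-1]
X_sw_manual = np.stack([X[i - window_size + 1: i + 1] for i in last_indices])
yr_manual = y[last_indices]  # same resampling choice as SlidingWindow above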
Multivariate time series example: Sliding window + topology Pipeline¶
giotto-tda’s topology transformers expect 3D input. But our X_sw above is 2D. How do we capture interesting properties of the topology of input time series then? For univariate time series, it turns out that a good way is to use the “time delay embedding” or “Takens embedding” technique explained in Topology of time series. But as this involves an extra layer of complexity, we leave it for later and concentrate for now on an example with a simpler API which also demonstrates the use of a giotto-tda Pipeline. Surprisingly, this involves multivariate time series input!
rng = np.random.default_rng(42)
n_features = 2
X = rng.integers(0, high=20, size=(n_timestamps, n_features), dtype=int)
X
array([[ 1, 15],
[13, 8],
[ 8, 17],
[ 1, 13],
[ 4, 1],
[10, 19],
[14, 15],
[14, 15],
[10, 2],
[16, 9]])
We are interpreting this input as a time series in two variables, of length n_timestamps. The target variable is the same y as before.
SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr
(array([[[13, 8],
[ 8, 17],
[ 1, 13]],
[[ 1, 13],
[ 4, 1],
[10, 19]],
[[10, 19],
[14, 15],
[14, 15]],
[[14, 15],
[10, 2],
[16, 9]]]),
array([-7, -5, -3, -1]))
X_sw is now a complicated-looking array, but it has a simple interpretation. Again, X_sw[i] is the i-th window on X, and it contains window_size samples from the original time series. This time, the samples are not scalars but 1D arrays.
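A quick shape check confirms the (n_windows, n_samples_per_window, n_features) layout described at the start of this notebook:

X_sw.shape  # (4, 3, 2): 4 windows of 3 samples, each sample with 2 features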
What if we suspect that the way in which the correlations between the variables evolve over time can help forecast the target y? This is a common situation in neuroscience, where each variable could be data from a single EEG sensor, for instance.
giotto-tda exposes a PearsonDissimilarity transformer which creates a 2D dissimilarity matrix from each window in X_sw, and stacks them together into a single 3D object. This is the correct format (and information content!) for a typical topological transformer in gtda.homology. See also Topological feature extraction from graphs for an in-depth look. Finally, we can extract simple scalar features using a selection of transformers in gtda.diagrams.
from gtda.time_series import PearsonDissimilarity
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import Amplitude
PD = PearsonDissimilarity()
X_pd = PD.fit_transform(X_sw)
VR = VietorisRipsPersistence(metric="precomputed")
X_vr = VR.fit_transform(X_pd) # "precomputed" required on dissimilarity data
Ampl = Amplitude()
X_a = Ampl.fit_transform(X_vr)
X_a
array([[0.18228669, 0. ],
[0.03606068, 0. ],
[0.28866041, 0. ],
[0.01781238, 0. ]])
Notice that we are not acting on y above. We are simply creating features from each window using topology! Note: it’s two features per window because we used the default value for homology_dimensions in VietorisRipsPersistence, not because we had two variables in the time series initially!
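As a consistency check on the first step of this chain, we can compute a Pearson-based dissimilarity for one window by hand (a minimal sketch, assuming the convention dissimilarity = 1 - r; consult PearsonDissimilarity’s documentation for the exact definition and its options):

r = np.corrcoef(X_sw[0], rowvar=False)  # 2 x 2 Pearson correlation matrix
candidate = 1 - r                       # candidate dissimilarity matrix for window 0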
We can now put this all together into a giotto-tda Pipeline which combines both the sliding window transformation on X and resampling of y with the feature extraction from the windows on X.
Note: while we could import the Pipeline class and use its constructor, we use the convenience function make_pipeline instead, which is a drop-in replacement for scikit-learn’s.
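For reference, the explicit constructor route would look like this (a sketch with hand-picked step names; pipe_explicit is equivalent to the pipe built below):

from gtda.pipeline import Pipeline
pipe_explicit = Pipeline([("window", SW),
                          ("pearson", PD),
                          ("homology", VR),
                          ("amplitude", Ampl)])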
from sklearn import set_config
set_config(display='diagram') # For HTML representations of pipelines
from gtda.pipeline import make_pipeline
pipe = make_pipeline(SW, PD, VR, Ampl)
pipe
Pipeline(steps=[('slidingwindow', SlidingWindow(size=3, stride=2)), ('pearsondissimilarity', PearsonDissimilarity()), ('vietorisripspersistence', VietorisRipsPersistence(metric='precomputed')), ('amplitude', Amplitude())])
Finally, if we have a regression task on y, we can add an estimator such as scikit-learn’s RandomForestRegressor as a final step in the previous pipeline, and fit it!
from sklearn.ensemble import RandomForestRegressor
RFR = RandomForestRegressor()
pipe = make_pipeline(SW, PD, VR, Ampl, RFR)
pipe
Pipeline(steps=[('slidingwindow', SlidingWindow(size=3, stride=2)), ('pearsondissimilarity', PearsonDissimilarity()), ('vietorisripspersistence', VietorisRipsPersistence(metric='precomputed')), ('amplitude', Amplitude()), ('randomforestregressor', RandomForestRegressor())])
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score
(array([-5.86, -3.98, -4.02, -2.34]), 0.7412000000000001)
Univariate time series – TakensEmbedding and SingleTakensEmbedding¶
The notebook Topology of time series explains a commonly used technique for converting a univariate time series into a single point cloud. Since topological features can be extracted from any point cloud, this is a gateway to time series analysis using topology. The second part of that notebook shows how to transform a batch of time series into a batch of point clouds, and how to extract topological descriptors from each of them independently. While in that notebook this is applied to a time series classification task, in this notebook we are concerned with topology-powered forecasting from a single time series.
Reasoning by analogy with the multivariate case above, we can look at sliding windows over X as small time series in their own right and track the evolution of their topology against the variable of interest (or against itself, if we are interested in unsupervised tasks such as anomaly detection).
There are two ways in which we can implement this idea in giotto-tda:

1. We can first apply a SlidingWindow, and then an instance of TakensEmbedding.
2. We can first compute a global Takens embedding of the time series via SingleTakensEmbedding, which takes us from 1D/column data to 2D data, and then partition the 2D data of vectors into sliding windows via SlidingWindow.
The first route ensures that we can run our “topological feature extraction track” in parallel with other feature-generation pipelines from sliding windows, without experiencing shape mismatches. The second route seems a little upside-down and is not generally recommended, but it has the advantage that globally “optimal” values for the “time delay” and “embedding dimension” parameters can be computed automatically by SingleTakensEmbedding.
Below is what each route would look like.
Remark: In the presence of noise, a small sliding window size is likely to reduce the reliability of the estimate of the time series’ local topology.
Option 1: SlidingWindow + TakensEmbedding¶
TakensEmbedding is not a TransformerResamplerMixin, but this is not a problem in the context of a Pipeline when we order things in this way.
from gtda.time_series import TakensEmbedding
X = np.arange(n_timestamps)
window_size = 5
stride = 2
SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X, y)
X_sw, yr
(array([[1, 2, 3, 4, 5],
[3, 4, 5, 6, 7],
[5, 6, 7, 8, 9]]),
array([-5, -3, -1]))
time_delay = 1
dimension = 2
TE = TakensEmbedding(time_delay=time_delay, dimension=dimension)
X_te = TE.fit_transform(X_sw)
X_te
array([[[1, 2],
[2, 3],
[3, 4],
[4, 5]],
[[3, 4],
[4, 5],
[5, 6],
[6, 7]],
[[5, 6],
[6, 7],
[7, 8],
[8, 9]]])
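We can re-derive the delay embedding of one window by hand to see what TakensEmbedding is doing (a minimal sketch: each embedded point stacks dimension values spaced time_delay apart):

w = X_sw[0]  # first window, [1, 2, 3, 4, 5]
n_points = len(w) - time_delay * (dimension - 1)
emb = np.stack([w[i: i + time_delay * dimension: time_delay] for i in range(n_points)])
# emb matches X_te[0] above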
VR = VietorisRipsPersistence() # No "precomputed" for point clouds
Ampl = Amplitude()
RFR = RandomForestRegressor()
pipe = make_pipeline(SW, TE, VR, Ampl, RFR)
pipe
Pipeline(steps=[('slidingwindow', SlidingWindow(size=5, stride=2)), ('takensembedding', TakensEmbedding()), ('vietorisripspersistence', VietorisRipsPersistence()), ('amplitude', Amplitude()), ('randomforestregressor', RandomForestRegressor())])
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score
(array([-3.04666667, -3.04666667, -3.04666667]), -0.0008166666666666877)
Option 2: SingleTakensEmbedding + SlidingWindow¶
Note that SingleTakensEmbedding is also a TransformerResamplerMixin, and that the logic for resampling/aligning y is the same as in SlidingWindow.
from gtda.time_series import SingleTakensEmbedding
X = np.arange(n_timestamps)
STE = SingleTakensEmbedding(parameters_type="search", time_delay=2, dimension=3)
X_ste, yr = STE.fit_transform_resample(X, y)
X_ste, yr
(array([[0, 2],
[1, 3],
[2, 4],
[3, 5],
[4, 6],
[5, 7],
[6, 8],
[7, 9]]),
array([-8, -7, -6, -5, -4, -3, -2, -1]))
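Notice that the search has settled on a smaller embedding dimension than the upper bound we supplied: each embedded vector above has only two components. The values actually used can be inspected after fitting (assuming the time_delay_ and dimension_ fitted attributes documented for SingleTakensEmbedding):

STE.time_delay_, STE.dimension_  # parameters found by the search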
window_size = 5
stride = 2
SW = SlidingWindow(size=window_size, stride=stride)
X_sw, yr = SW.fit_transform_resample(X_ste, yr)
X_sw, yr
(array([[[1, 3],
[2, 4],
[3, 5],
[4, 6],
[5, 7]],
[[3, 5],
[4, 6],
[5, 7],
[6, 8],
[7, 9]]]),
array([-3, -1]))
From here on, it is easy to push a very similar pipeline through as in the multivariate case:
VR = VietorisRipsPersistence() # No "precomputed" for point clouds
Ampl = Amplitude()
RFR = RandomForestRegressor()
pipe = make_pipeline(STE, SW, VR, Ampl, RFR)
pipe
Pipeline(steps=[('singletakensembedding', SingleTakensEmbedding(dimension=3, time_delay=2)), ('slidingwindow', SlidingWindow(size=5, stride=2)), ('vietorisripspersistence', VietorisRipsPersistence()), ('amplitude', Amplitude()), ('randomforestregressor', RandomForestRegressor())])
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score
(array([-2.03, -2.03]), -0.0008999999999999009)
Integrating non-topological features¶
The best results are obtained when topological methods are used not in isolation but in combination with other methods. Here’s an example where, in parallel with the topological feature extraction from local sliding windows using Option 1 above, we also compute the mean and variance in each sliding window. A scikit-learn FeatureUnion is used to combine these very different sets of features into a single pipeline object.
from functools import partial
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion
mean = FunctionTransformer(partial(np.mean, axis=1, keepdims=True))  # per-window mean
var = FunctionTransformer(partial(np.var, axis=1, keepdims=True))  # per-window variance
pipe_topology = make_pipeline(TE, VR, Ampl)
feature_union = FeatureUnion([("window_mean", mean),
("window_variance", var),
("window_topology", pipe_topology)])
pipe = make_pipeline(SW, feature_union, RFR)
pipe
Pipeline(steps=[('slidingwindow', SlidingWindow(size=5, stride=2)), ('featureunion', FeatureUnion(transformer_list=[('window_mean', FunctionTransformer(func=functools.partial(<function mean at 0x...>, axis=1, keepdims=True))), ('window_variance', FunctionTransformer(func=functools.partial(<function var at 0x...>, axis=1, keepdims=True))), ('window_topology', Pipeline(steps=[('takensembedding', TakensEmbedding()), ('vietorisripspersistence', VietorisRipsPersistence()), ('amplitude', Amplitude())]))])), ('randomforestregressor', RandomForestRegressor())])
pipe.fit(X, y)
y_pred = pipe.predict(X)
score = pipe.score(X, y)
y_pred, score
(array([-4.12, -3.28, -1.56]), 0.8542000000000001)
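To see what the regressor consumes, we can run the union on the windows alone (a sketch; each window contributes its mean, its variance and two topological amplitudes, i.e. four columns in total):

X_union = feature_union.fit_transform(SW.fit_transform(X))
X_union.shape  # expected: (n_windows, 4) == (3, 4)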
Endogenous target preparation with Labeller¶
Let us say that we simply wish to predict the future of a time series from itself. This is very common in the study of financial markets, for example. giotto-tda provides convenience classes for target preparation from a time series. This notebook only shows a very simple example: many more options are described in Labeller’s documentation.
If we wished to create a target y from X such that y[i] is equal to X[i + 1], while also modifying X and y so that they still have the same length, we could proceed as follows:
from gtda.time_series import Labeller
X = np.arange(10)
Lab = Labeller(size=1, func=np.max)
Xl, yl = Lab.fit_transform_resample(X, X)
Xl, yl
(array([0, 1, 2, 3, 4, 5, 6, 7, 8]), array([1, 2, 3, 4, 5, 6, 7, 8, 9]))
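With size=1 and func=np.max, each label is the maximum over a length-1 future window, so the net effect is simply the one-step shift yl[i] == X[i + 1]. A minimal NumPy re-derivation (sketch):

Xl_manual, yl_manual = X[:-1], X[1:]  # same arrays as Xl and yl above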
Notice that we are feeding two copies of X to fit_transform_resample in this case!
This is what fitting an end-to-end pipeline for future prediction using topology could look like. Again, you are encouraged to include your own non-topological features in the mix!
SW = SlidingWindow(size=5)
TE = TakensEmbedding(time_delay=1, dimension=2)
VR = VietorisRipsPersistence()
Ampl = Amplitude()
RFR = RandomForestRegressor()
# Full pipeline including the regressor
pipe = make_pipeline(Lab, SW, TE, VR, Ampl, RFR)
pipe
Pipeline(steps=[('labeller', Labeller(func=<function max at 0x...>, size=1)), ('slidingwindow', SlidingWindow(size=5)), ('takensembedding', TakensEmbedding()), ('vietorisripspersistence', VietorisRipsPersistence()), ('amplitude', Amplitude()), ('randomforestregressor', RandomForestRegressor())])
pipe.fit(X, X)
y_pred = pipe.predict(X)
y_pred
array([7.006, 7.006, 7.006, 7.006, 7.006])
Where to next?¶
- There are two additional simple TransformerResamplerMixins in gtda.time_series: Resampler and Stationarizer.
- The sort of pipeline presented here for topological feature extraction using Takens embedding is a bit crude. More sophisticated methods exist for extracting robust topological summaries from (windows on) time series. A good source of inspiration is the following paper: Persistent Homology of Complex Networks for Dynamic State Detection, by A. Myers, E. Munch, and F. A. Khasawneh. The module gtda.graphs contains several transformers implementing the main algorithms proposed there.
- Advanced users may be interested in ConsecutiveRescaling, which can be found in gtda.point_clouds.
- The notebook Lorenz attractor is an advanced use-case for TakensEmbedding and other time series forecasting techniques inspired by topology.