TakensEmbedding

class gtda.time_series.TakensEmbedding(parameters_type='search', time_delay=1, dimension=5, stride=1, n_jobs=None)

Representation of a univariate time series as a time series of point clouds.

Based on a time-delay embedding technique named after F. Takens [1]. Given a discrete time series \((X_0, X_1, \ldots)\) and a sequence of evenly sampled times \(t_0, t_1, \ldots\), one extracts a set of \(d\)-dimensional vectors of the form \((X_{t_i}, X_{t_i + \tau}, \ldots , X_{t_i + (d-1)\tau})\) for \(i = 0, 1, \ldots\). This set is called the Takens embedding of the time series and can be interpreted as a point cloud.

The difference between \(t_{i+1}\) and \(t_i\) is called the stride, \(\tau\) is called the time delay, and \(d\) is called the (embedding) dimension.
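The definition above can be illustrated with a few lines of NumPy. This is a minimal sketch of the construction only, not the library's implementation: the helper name `takens_embed` is hypothetical, and the choice to drop incomplete points at the end of the series is an assumption.

```python
import numpy as np


def takens_embed(x, time_delay=1, dimension=2, stride=1):
    """Hypothetical helper: build the point cloud described above."""
    x = np.asarray(x)
    # Number of points whose last coordinate still falls inside the series
    # (assumption: incomplete points at the end are dropped).
    n_points = (len(x) - time_delay * (dimension - 1) - 1) // stride + 1
    if n_points <= 0:
        raise ValueError("time series too short for these parameters")
    # Row i holds the indices t_i, t_i + tau, ..., t_i + (d - 1) * tau,
    # where t_i = i * stride.
    starts = stride * np.arange(n_points)[:, None]
    offsets = time_delay * np.arange(dimension)[None, :]
    return x[starts + offsets]


series = np.arange(10)
cloud = takens_embed(series, time_delay=2, dimension=3, stride=1)
print(cloud.shape)          # (6, 3)
print(cloud[0], cloud[-1])  # [0 2 4] [5 7 9]
```

Each row is one embedded point: consecutive coordinates within a row are `time_delay` samples apart, and consecutive rows start `stride` samples apart.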

If parameters_type is 'search', suitable values of \(d\) and \(\tau\) are searched for during fit, with dimension and time_delay acting as upper bounds. [2] [3]

Parameters

parameters_type ('search' | 'fixed', optional, default: 'search') – If set to 'fixed', the values of time_delay and dimension are used directly in transform. If set to 'search', those values serve only as upper bounds in a two-step search. First, an optimal time delay is found by minimising the time-delayed mutual information. Then, a heuristic based on an algorithm in [2] selects an embedding dimension which, when increased, does not reveal a large proportion of “false nearest neighbors”.
time_delay (int, optional, default: 1) – Time delay between two consecutive values for constructing one embedded point. If parameters_type is 'search', it corresponds to the maximum embedding time delay that will be considered.
dimension (int, optional, default: 5) – Dimension of the embedding space. If parameters_type is 'search', it corresponds to the maximum embedding dimension that will be considered.
stride (int, optional, default: 1) – Stride duration between two consecutive embedded points. It defaults to 1 as this is the usual value in the statement of Takens’s embedding theorem.
n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
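The first step of the 'search' mode, choosing the time delay that minimises the time-delayed mutual information, can be sketched as follows. This is an illustration under stated assumptions and not the library's estimator: the helper `delayed_mutual_information`, the histogram-based estimate, and the bin count are all hypothetical choices. The false-nearest-neighbors step is not shown.

```python
import numpy as np


def delayed_mutual_information(x, delay, n_bins=16):
    """Hypothetical helper: histogram estimate of I(X_t; X_{t+delay}) in nats."""
    x = np.asarray(x, dtype=float)
    # Joint histogram of the series against its delayed copy (delay >= 1).
    joint, _, _ = np.histogram2d(x[:-delay], x[delay:], bins=n_bins)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of X_t
    py = joint.sum(axis=0, keepdims=True)  # marginal of X_{t+delay}
    mask = joint > 0                       # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


rng = np.random.default_rng(0)
t = np.arange(2000)
signal = np.sin(t / 50) + 0.5 * rng.random(t.size)

max_time_delay = 5  # plays the role of the time_delay upper bound
scores = [delayed_mutual_information(signal, tau)
          for tau in range(1, max_time_delay + 1)]
best_delay = 1 + int(np.argmin(scores))
print(best_delay)
```

Because the search is capped at `max_time_delay`, the selected delay can never exceed the bound, which mirrors how time_delay limits the search.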

time_delay_

Actual embedding time delay used to embed. If parameters_type is 'search', it is the calculated optimal embedding time delay and is less than or equal to time_delay. Otherwise it is equal to time_delay.

Type: int

dimension_

Actual embedding dimension used to embed. If parameters_type is 'search', it is the calculated optimal embedding dimension and is less than or equal to dimension. Otherwise it is equal to dimension.

Type: int

Examples

>>> import numpy as np
>>> from gtda.time_series import TakensEmbedding
>>> # Create a noisy signal
>>> n_samples = 10000
>>> signal_noise = np.asarray([np.sin(x / 50) + 0.5 * np.random.random()
...     for x in range(n_samples)])
>>> # Set up the transformer
>>> embedder = TakensEmbedding(parameters_type='search', dimension=5,
...                            time_delay=5, n_jobs=-1)
>>> # Fit and transform
>>> embedded_noise = embedder.fit_transform(signal_noise)
>>> print('Optimal embedding time delay based on mutual information:',
...       embedder.time_delay_)
Optimal embedding time delay based on mutual information: 5
>>> print('Optimal embedding dimension based on false nearest neighbors:',
...       embedder.dimension_)
Optimal embedding dimension based on false nearest neighbors: 2
>>> print(embedded_noise.shape)
(9995, 2)
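The printed shape is consistent with the fitted values: with stride 1, a series of length \(n\) yields \(n - \tau(d - 1)\) points, here \(10000 - 5 \cdot (2 - 1) = 9995\). A minimal NumPy check of this arithmetic, assuming stride 1 and using a stand-in series in place of signal_noise (it reproduces the shape only and makes no claim about the library's internals):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_samples, time_delay, dimension = 10000, 5, 2
x = np.arange(n_samples, dtype=float)  # stand-in series, not signal_noise

# One window spans the first to the last coordinate of an embedded point.
window = time_delay * (dimension - 1) + 1
# Keep every time_delay-th sample inside each window.
points = sliding_window_view(x, window)[:, ::time_delay]
print(points.shape)  # (9995, 2)
```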