TakensEmbedding

class gtda.time_series.TakensEmbedding(parameters_type='search', time_delay=1, dimension=5, stride=1, n_jobs=None)

Representation of a univariate time series as a time series of point clouds.

Based on a time-delay embedding technique named after F. Takens [1]. Given a discrete time series $$(X_0, X_1, \ldots)$$ and a sequence of evenly sampled times $$t_0, t_1, \ldots$$, one extracts a set of $$d$$-dimensional vectors of the form $$(X_{t_i}, X_{t_i + \tau}, \ldots , X_{t_i + (d-1)\tau})$$ for $$i = 0, 1, \ldots$$. This set is called the Takens embedding of the time series and can be interpreted as a point cloud.

The difference between $$t_{i+1}$$ and $$t_i$$ is called the stride, $$\tau$$ is called the time delay, and $$d$$ is called the (embedding) dimension.
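The definition above can be sketched directly in NumPy. This is a minimal illustration, not the library's implementation: the helper name takens_embedding_sketch is hypothetical, and it takes $$t_i = i \cdot \mathrm{stride}$$ starting from the first sample, whereas the Notes section states that the actual implementation anchors the last vector at the last sample (the two coincide when the stride is 1).

```python
import numpy as np


def takens_embedding_sketch(x, time_delay=1, dimension=2, stride=1):
    """Hypothetical helper (not part of gtda): build the delay vectors
    (x[t], x[t + tau], ..., x[t + (d - 1) * tau]) for t = 0, stride, 2 * stride, ...
    """
    x = np.asarray(x)
    window = time_delay * (dimension - 1)            # span covered by one vector
    starts = np.arange(0, len(x) - window, stride)   # valid start times t_i
    offsets = time_delay * np.arange(dimension)      # 0, tau, ..., (d - 1) * tau
    # Broadcasting yields one row of indices per embedded point.
    return x[starts[:, None] + offsets[None, :]]


x = np.arange(10)
cloud = takens_embedding_sketch(x, time_delay=2, dimension=3, stride=1)
print(cloud[0], cloud[-1])   # [0 2 4] [5 7 9]
print(cloud.shape)           # (6, 3)
```

Each row of cloud is one point of the point cloud; with 10 samples, $$\tau = 2$$ and $$d = 3$$ there are 6 such points.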

If $$d$$ and $$\tau$$ are not fixed explicitly (that is, if parameters_type is 'search'), suitable values are searched for during fit [2] [3].

Parameters
• parameters_type ('search' | 'fixed', optional, default: 'search') – If set to 'fixed', the values of time_delay and dimension are used directly in transform. If set to 'search', those values are only used as upper bounds in a search as follows: first, an optimal time delay is found by minimising the time delayed mutual information; then, a heuristic based on an algorithm in [2] is used to select an embedding dimension which, when increased, does not reveal a large proportion of “false nearest neighbors”.

• time_delay (int, optional, default: 1) – Time delay between two consecutive values for constructing one embedded point. If parameters_type is 'search', it corresponds to the maximal embedding time delay that will be considered.

• dimension (int, optional, default: 5) – Dimension of the embedding space. If parameters_type is 'search', it corresponds to the maximum embedding dimension that will be considered.

• stride (int, optional, default: 1) – Stride duration between two consecutive embedded points. It defaults to 1 as this is the usual value in the statement of Takens’s embedding theorem.

• n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
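The first step of the 'search' mode, choosing a time delay by minimising the time-delayed mutual information, can be illustrated as follows. This is only a sketch under stated assumptions: the helper delayed_mutual_information, the histogram estimator and the bin count are illustrative choices and may differ from what the library does internally. The false nearest neighbors step is not shown.

```python
import numpy as np


def delayed_mutual_information(x, delay, n_bins=16):
    """Hypothetical helper: histogram estimate of the mutual information
    between x[t] and x[t + delay]. The estimator and n_bins are assumptions."""
    joint, _, _ = np.histogram2d(x[:-delay], x[delay:], bins=n_bins)
    joint = joint / joint.sum()                  # joint probabilities
    px = joint.sum(axis=1, keepdims=True)        # marginal of x[t]
    py = joint.sum(axis=0, keepdims=True)        # marginal of x[t + delay]
    nonzero = joint > 0                          # skip empty bins (0 * log 0 = 0)
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))


rng = np.random.default_rng(0)
t = np.arange(2000)
x = np.sin(t / 50) + 0.1 * rng.standard_normal(t.size)

max_time_delay = 5   # plays the role of the time_delay upper bound
scores = [delayed_mutual_information(x, tau)
          for tau in range(1, max_time_delay + 1)]
best_delay = 1 + int(np.argmin(scores))
print(best_delay)
```

The selected delay never exceeds max_time_delay, which mirrors how time_delay acts as an upper bound when parameters_type is 'search'.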

time_delay_

Actual embedding time delay used to embed. If parameters_type is 'search', it is the calculated optimal embedding time delay and is less than or equal to time_delay. Otherwise it is equal to time_delay.

Type

int

dimension_

Actual embedding dimension used to embed. If parameters_type is 'search', it is the calculated optimal embedding dimension and is less than or equal to dimension. Otherwise it is equal to dimension.

Type

int

Examples

>>> import numpy as np
>>> from gtda.time_series import TakensEmbedding
>>> # Create a noisy signal
>>> n_samples = 10000
>>> signal_noise = np.asarray([np.sin(x / 50) + 0.5 * np.random.random()
...     for x in range(n_samples)])
>>> # Set up the transformer
>>> embedder = TakensEmbedding(parameters_type='search', dimension=5,
...                            time_delay=5, n_jobs=-1)
>>> # Fit and transform
>>> embedded_noise = embedder.fit_transform(signal_noise)
>>> print('Optimal embedding time delay based on mutual information:',
...       embedder.time_delay_)
Optimal embedding time delay based on mutual information: 5
>>> print('Optimal embedding dimension based on false nearest neighbors:',
...       embedder.dimension_)
Optimal embedding dimension based on false nearest neighbors: 2
>>> print(embedded_noise.shape)
(9995, 2)

Notes

The current implementation favours the last value over the first one, in the sense that the last coordinate of the last vector in a Takens embedded time series always equals the last value in the original time series. Hence, a number of initial values (given by the remainder of the division of $$n_\mathrm{samples} - \tau(d - 1) - 1$$ by the stride) may be lost.
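To see which initial values are lost, the sketch below computes the start indices of the embedded vectors when the last vector is required to end at the last sample. The helper end_anchored_starts is hypothetical and is written to agree with the n_points formula documented for transform; it is not the library's code.

```python
import numpy as np


def end_anchored_starts(n_samples, time_delay, dimension, stride):
    """Hypothetical helper: start indices of the embedded vectors when the
    last vector must end at the last sample, plus the number of dropped
    initial samples."""
    window = time_delay * (dimension - 1)     # span covered by one vector
    last_start = n_samples - 1 - window       # start index of the last vector
    n_dropped = last_start % stride           # initial samples never used as a start
    return np.arange(n_dropped, last_start + 1, stride), n_dropped


starts, n_dropped = end_anchored_starts(n_samples=10, time_delay=2,
                                        dimension=3, stride=2)
print(starts)      # [1 3 5]
print(n_dropped)   # 1
```

Here the vectors start at samples 1, 3 and 5, the last one ends at sample 5 + 4 = 9 (the final sample), and sample 0 is not used.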

References

[1] F. Takens, “Detecting strange attractors in turbulence”. In: Rand D., Young LS. (eds) Dynamical Systems and Turbulence, Warwick 1980. Lecture Notes in Mathematics, vol. 898. Springer, 1981; doi: 10.1007/BFb0091924.

[2] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, “Determining embedding dimension for phase-space reconstruction using a geometrical construction”; Phys. Rev. A 45, pp. 3403–3411, 1992; doi: 10.1103/PhysRevA.45.3403.

[3] N. Sanderson, “Topological Data Analysis of Time Series using Witness Complexes”; PhD thesis, University of Colorado at Boulder, 2018; https://scholar.colorado.edu/math_gradetds/67.

J. A. Perea and J. Harer, “Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis”; Foundations of Computational Mathematics, 15, pp. 799–838; doi: 10.1007/s10208-014-9206-z.

__init__(parameters_type='search', time_delay=1, dimension=5, stride=1, n_jobs=None)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

If necessary, compute the optimal time delay and embedding dimension. Then, return the estimator.

This method is here to implement the usual scikit-learn API and hence work in pipelines.

Parameters
• X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

• y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
• X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

• y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Output point cloud in Euclidean space of dimension given by dimension_. n_points = (n_samples - time_delay_ * (dimension_ - 1) - 1) // stride + 1, where time_delay_ and dimension_ are the values actually used to embed.

Return type

ndarray of shape (n_points, n_dimensions)
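As a quick check of the n_points formula, the arithmetic can be reproduced in plain Python. The function name n_embedded_points is illustrative only. The first call uses the values reported in the Examples section (10000 samples, a fitted time delay of 5, a fitted dimension of 2, stride 1), for which the printed output shape has 9995 rows.

```python
def n_embedded_points(n_samples, time_delay, dimension, stride=1):
    """Illustrative helper: number of embedded points for the delay and
    dimension actually used to embed."""
    return (n_samples - time_delay * (dimension - 1) - 1) // stride + 1


print(n_embedded_points(10000, time_delay=5, dimension=2))            # 9995
print(n_embedded_points(10000, time_delay=5, dimension=2, stride=3))  # 3332
```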

fit_transform_resample(X, y, **fit_params)

Fit to data, then transform the input and resample the target. Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X and a resampled version of y.

Parameters
• X (ndarray of shape (n_samples, …)) – Input data.

• y (ndarray of shape (n_samples,)) – Target data.

Returns

• Xt (ndarray of shape (n_samples, …)) – Transformed input.

• yr (ndarray of shape (n_samples, …)) – Resampled target.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

resample(y, X=None)

Resample y so that, for any i > 0, the i-th entry from the end of the resampled vector corresponds in time to the last coordinate of the i-th embedding vector from the end, as produced by transform.

Parameters
• y (ndarray of shape (n_samples,)) – Target.

• X (None) – There is no need for input data, yet the pipeline API requires this parameter.

Returns

yr – The resampled target. n_samples_new = (n_samples - time_delay_ * (dimension_ - 1) - 1) // stride + 1, where time_delay_ and dimension_ are the values actually used to embed.

Return type

ndarray of shape (n_samples_new,)
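The alignment described for resample can be sketched in NumPy: each kept target value sits at the time of the last coordinate of the corresponding embedding vector, counting backwards from the last sample in steps of stride. The helper resample_sketch is hypothetical and only mirrors the documented behaviour; it is not the library's code.

```python
import numpy as np


def resample_sketch(y, time_delay, dimension, stride):
    """Hypothetical helper: keep the targets aligned with the last
    coordinate of each embedding vector."""
    y = np.asarray(y)
    window = time_delay * (dimension - 1)
    n_points = (len(y) - window - 1) // stride + 1
    # Walk backwards from the last sample in steps of `stride`,
    # then restore chronological order.
    last_indices = len(y) - 1 - stride * np.arange(n_points)[::-1]
    return y[last_indices]


y = np.arange(10)
yr = resample_sketch(y, time_delay=2, dimension=3, stride=2)
print(yr)   # [5 7 9]
```

With 10 samples, a time delay of 2, dimension 3 and stride 2, three targets remain, and the last one is the last entry of y.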

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object

transform(X, y=None)

Compute the Takens embedding of X.

Parameters
• X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

• y (None) – Ignored.

Returns

Xt – Output point cloud in Euclidean space of dimension given by dimension_. n_points = (n_samples - time_delay_ * (dimension_ - 1) - 1) // stride + 1, where time_delay_ and dimension_ are the values actually used to embed.

Return type

ndarray of shape (n_points, n_dimensions)

transform_resample(X, y)

Transform the input and resample the target.

Returns a transformed version of X and a resampled version of y.

Parameters
• X (ndarray of shape (n_samples, …)) – Input data.

• y (ndarray of shape (n_samples,)) – Target data.

Returns

• Xt (ndarray of shape (n_samples, …)) – Transformed input.

• yr (ndarray of shape (n_samples, …)) – Resampled target.