OneDimensionalCover

class gtda.mapper.OneDimensionalCover(kind='uniform', n_intervals=10, overlap_frac=0.1)

Cover of one-dimensional data coming from open overlapping intervals.

In fit, given a training array X representing a collection of real numbers, a cover of the real line by open intervals \(I_k = (a_k, b_k)\) (\(k = 1, \ldots, n\), \(a_k < a_{k+1}\), \(b_k < b_{k+1}\)) is constructed based on the distribution of values in X. In transform, the cover is applied to a new array X’ to yield a cover of X’.

All covers constructed in fit have \(a_1 = -\infty\) and \(b_n = + \infty\). Two kinds of cover are currently available: “uniform” and “balanced”. A uniform cover is such that \(b_1 - m = b_2 - a_2 = \cdots = M - a_n\) where \(m\) and \(M\) are the minimum and maximum values in X respectively. A balanced cover is such that approximately the same number of unique values from X is contained in each cover interval.

Parameters

kind ('uniform' | 'balanced', optional, default: 'uniform') – The kind of cover to use.
n_intervals (int, optional, default: 10) – The number of intervals in the cover calculated in fit.
overlap_frac (float, optional, default: 0.1) – If the cover is uniform, this is the ratio between the length of the intersection of two consecutive intervals and the length of each interval. If the cover is balanced, this is the analogous fractional overlap for a uniform cover of the closed interval \([0.5, N + 0.5]\), where \(N\) is the number of unique values in the training array (see the Notes).
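The definition of a uniform cover fixes the interval limits once the minimum \(m\), the maximum \(M\), n_intervals and overlap_frac are known: if every interval has length \(\ell\) and consecutive intervals share a stretch of length overlap_frac \(\cdot \ell\), then \(n \ell - (n - 1) \cdot\) overlap_frac \(\cdot \ell = M - m\). The following is a minimal sketch of that arithmetic in plain Python. The helper name uniform_cover_limits is hypothetical and the code is derived from the definitions on this page, not taken from the library's implementation.

```python
import math


def uniform_cover_limits(m, M, n_intervals=10, overlap_frac=0.1):
    """Hypothetical helper: limits of a uniform cover built from [m, M]."""
    # Solve n * length - (n - 1) * overlap_frac * length == M - m for length.
    length = (M - m) / (n_intervals * (1 - overlap_frac) + overlap_frac)
    # Consecutive left limits are one non-overlapping stretch apart.
    step = (1 - overlap_frac) * length
    left = [m + k * step for k in range(n_intervals)]
    right = [a + length for a in left]
    # Covers constructed in fit extend to infinity on both sides.
    left[0] = -math.inf
    right[-1] = math.inf
    return left, right


left, right = uniform_cover_limits(0.0, 10.0, n_intervals=3, overlap_frac=0.5)
print(left)   # [-inf, 2.5, 5.0]
print(right)  # [5.0, 7.5, inf]
```

In this example each interval has length 5 and consecutive intervals overlap on a stretch of length 2.5, which is the requested fraction 0.5 of the interval length.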

left_limits_

Left limits of the cover intervals computed in fit. See the Notes.

Type: ndarray of shape (n_intervals,)

right_limits_

Right limits of the cover intervals computed in fit. See the Notes.

Type: ndarray of shape (n_intervals,)

Notes

In the case of a balanced cover, left_limits_ and right_limits_ are computed as follows given a training array X: first, entries in X are ranked in ascending order, starting at 1 and with the same rank repeated in the case of equal values; then, the closed interval \([0.5, N + 0.5]\), where \(N\) is the maximum rank observed, is covered uniformly with parameters n_intervals and overlap_frac, yielding intervals \((\alpha_k, \beta_k)\); the final cover is made of intervals \((a_k, b_k)\) where, for \(k > 1\) (resp. \(k < n\)), \(a_k\) (resp. \(b_k\)) is the value of any entry in X ranked as the floor (resp. ceiling) of \(\alpha_k\) (resp. \(\beta_k\)). As for every cover constructed in fit, \(a_1 = -\infty\) and \(b_n = +\infty\).
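The ranking procedure described in these Notes can be sketched in plain Python as follows. The helper name balanced_cover_limits is hypothetical, and the code follows the description on this page, not the library's source. One detail is an assumption of this sketch: ranks obtained from the floor or ceiling are clamped to the range 1 to \(N\), since the description does not say what happens when they fall outside it.

```python
import math


def balanced_cover_limits(X, n_intervals=10, overlap_frac=0.1):
    """Hypothetical helper following the ranking procedure in the Notes."""
    # Equal values share a rank, so values[r - 1] is the value ranked r.
    values = sorted(set(X))
    N = len(values)
    # Uniform cover of [0.5, N + 0.5] in rank space.
    length = N / (n_intervals * (1 - overlap_frac) + overlap_frac)
    step = (1 - overlap_frac) * length
    alphas = [0.5 + k * step for k in range(n_intervals)]
    betas = [alpha + length for alpha in alphas]

    def value_at(rank):
        # Assumption of this sketch: clamp out-of-range ranks to 1..N.
        return values[min(max(rank, 1), N) - 1]

    # a_1 = -inf and b_n = +inf; the other limits map ranks back to values.
    left = [-math.inf] + [value_at(math.floor(a)) for a in alphas[1:]]
    right = [value_at(math.ceil(b)) for b in betas[:-1]] + [math.inf]
    return left, right


X = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
left, right = balanced_cover_limits(X, n_intervals=2, overlap_frac=0.5)
print(left)   # [-inf, 2.0]
print(right)  # [6.0, inf]
```

Here X has 7 unique values, so the rank interval is \([0.5, 7.5]\). The uniform cover in rank space gives \(\alpha_2 \approx 2.83\) and \(\beta_1 \approx 5.17\), so \(a_2\) is the value ranked 2 (namely 2.0) and \(b_1\) is the value ranked 6 (namely 6.0).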