OneDimensionalCover
-
class gtda.mapper.OneDimensionalCover(kind='uniform', n_intervals=10, overlap_frac=0.1)
Cover of one-dimensional data coming from open overlapping intervals.
In fit, given a training array X representing a collection of real numbers, a cover of the real line by open intervals \(I_k = (a_k, b_k)\) (\(k = 1, \ldots, n\), \(a_k < a_{k+1}\), \(b_k < b_{k+1}\)) is constructed based on the distribution of values in X. In transform, the cover is applied to a new array X’ to yield a cover of X’.
All covers constructed in fit have \(a_1 = -\infty\) and \(b_n = +\infty\). Two kinds of cover are currently available: “uniform” and “balanced”. A uniform cover is such that \(b_1 - m = b_2 - a_2 = \cdots = M - a_n\), where \(m\) and \(M\) are the minimum and maximum values in X respectively. A balanced cover is such that approximately the same number of unique values from X is contained in each cover interval.
- Parameters
kind ('uniform' | 'balanced', optional, default: 'uniform') – The kind of cover to use.
n_intervals (int, optional, default: 10) – The number of intervals in the cover calculated in fit.
overlap_frac (float, optional, default: 0.1) – If the cover is uniform, this is the ratio between the length of the intersection between consecutive intervals and the length of each interval. If the cover is balanced, this is the analogous fractional overlap for a uniform cover of the closed interval \([0.5, N + 0.5]\), where \(N\) is the number of unique values in the training array (see the Notes).
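The uniform case can be worked out directly from the definition above. The following is a minimal sketch in plain Python, not the library's code: the helper name uniform_cover_limits is hypothetical, and the formula for the common interval length is derived here from the condition \(b_1 - m = b_2 - a_2 = \cdots = M - a_n\) together with the meaning of overlap_frac.

```python
import math


def uniform_cover_limits(m, M, n_intervals=10, overlap_frac=0.1):
    """Left and right limits of a uniform cover of [m, M] (illustrative only)."""
    # Every interval has the same length, and consecutive intervals share
    # overlap_frac * length, so left limits advance by length * (1 - overlap_frac).
    # Solving (n_intervals - 1) * step + length = M - m for length gives:
    length = (M - m) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = length * (1 - overlap_frac)
    left = [m + k * step for k in range(n_intervals)]
    right = [a + length for a in left]
    # The outermost limits are pushed to infinity so the whole real line is covered.
    left[0] = -math.inf
    right[-1] = math.inf
    return left, right


left, right = uniform_cover_limits(0.0, 10.0, n_intervals=3, overlap_frac=0.5)
print(list(zip(left, right)))  # [(-inf, 5.0), (2.5, 7.5), (5.0, inf)]
```

Here each interval has length 5 before the outer limits are replaced, and consecutive intervals intersect over a length of 2.5, which is half of 5 as requested by overlap_frac=0.5.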
-
left_limits_
Left limits of the cover intervals computed in fit. See the Notes.
- Type
ndarray of shape (n_intervals,)
-
right_limits_
Right limits of the cover intervals computed in fit. See the Notes.
- Type
ndarray of shape (n_intervals,)
Notes
In the case of a balanced cover, left_limits_ and right_limits_ are computed as follows given a training array X: first, entries in X are ranked in ascending order, starting at 1 and with the same rank repeated in the case of equal values; then, the closed interval \([0.5, N + 0.5]\), where \(N\) is the maximum rank observed, is covered uniformly with parameters n_intervals and overlap_frac, yielding intervals \((\alpha_k, \beta_k)\); the final cover is made of intervals \((a_k, b_k)\) where, for \(k > 1\) (resp. \(k < n\)), \(a_k\) (resp. \(b_k\)) is the value of any entry in X ranked as the floor (resp. ceiling) of \(\alpha_k\) (resp. \(\beta_k\)).
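The ranking procedure in the Notes can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the library's implementation: the helper names uniform_limits and balanced_cover_limits are hypothetical, the uniform step reuses a length formula derived from the definition of overlap_frac, and ranks are clamped to the range 1..N so that a lookup never falls outside the observed ranks (the Notes do not say how such cases are handled).

```python
import math


def uniform_limits(lo, hi, n_intervals, overlap_frac):
    # Equal-length intervals covering [lo, hi] with the given fractional overlap.
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = length * (1 - overlap_frac)
    alphas = [lo + k * step for k in range(n_intervals)]
    betas = [alpha + length for alpha in alphas]
    return alphas, betas


def balanced_cover_limits(X, n_intervals=10, overlap_frac=0.1):
    # Dense ranks: the k-th smallest unique value has rank k (starting at 1),
    # so equal values share a rank and N is the number of unique values.
    unique_sorted = sorted(set(X))
    N = len(unique_sorted)
    alphas, betas = uniform_limits(0.5, N + 0.5, n_intervals, overlap_frac)

    def value_at_rank(rank):
        # Assumption of this sketch: clamp ranks to 1..N.
        return unique_sorted[min(max(rank, 1), N) - 1]

    # a_k uses the floor of alpha_k for k > 1; b_k uses the ceiling of beta_k for k < n.
    left = [-math.inf] + [value_at_rank(math.floor(a)) for a in alphas[1:]]
    right = [value_at_rank(math.ceil(b)) for b in betas[:-1]] + [math.inf]
    return left, right


X = [3.0, 1.0, 2.0, 2.0, 4.0]
left, right = balanced_cover_limits(X, n_intervals=2, overlap_frac=0.1)
print(list(zip(left, right)))  # [(-inf, 3.0), (2.0, inf)]
```

In this example the four unique values are split evenly: the open interval (-inf, 3.0) contains 1.0 and 2.0, and (2.0, inf) contains 3.0 and 4.0.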
-
__init__(kind='uniform', n_intervals=10, overlap_frac=0.1)
Initialize self. See help(type(self)) for accurate signature.
-
fit(X, y=None)
Compute all cover interval limits according to X and store them in left_limits_ and right_limits_. Then, return the estimator.
This method is here to implement the usual scikit-learn API and hence work in pipelines.
- Parameters
X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
self
- Return type
object
-
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
Xt – Encoding of the cover of X as a boolean array. In general, n_cover_sets is less than or equal to n_intervals as empty or duplicated cover sets are removed.
- Return type
ndarray of shape (n_samples, n_cover_sets)
-
get_fitted_intervals()
Returns the open intervals computed in fit, as a list of tuples (a, b) where a < b.
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
object
-
transform(X, y=None)
Compute a cover of X according to the cover of the real line computed in fit, and return it as a two-dimensional boolean array. Each column indicates the location of entries in X belonging to a common cover interval.
- Parameters
X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
Xt – Encoding of the cover of X as a boolean array. In general, n_cover_sets is less than or equal to n_intervals as empty or duplicated cover sets are removed.
- Return type
ndarray of shape (n_samples, n_cover_sets)
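The boolean encoding described for transform can be pictured with a small sketch in plain Python. It is not the library's code: the helper name encode_cover is hypothetical, nested lists stand in for the ndarray, and the order in which empty or duplicated cover sets are dropped is an assumption of this example.

```python
import math


def encode_cover(X, intervals):
    # One candidate column per open interval (a, b): True where a < x < b.
    columns = [[a < x < b for x in X] for a, b in intervals]
    kept = []
    for column in columns:
        # Drop cover sets that are empty or that repeat an earlier one.
        if any(column) and column not in kept:
            kept.append(column)
    # Transpose so the result has shape (n_samples, n_cover_sets).
    return [list(row) for row in zip(*kept)]


intervals = [(-math.inf, 2.0), (1.5, 3.0), (2.5, 4.0), (3.5, 8.0), (7.0, math.inf)]
X = [1.0, 2.8, 9.0]
Xt = encode_cover(X, intervals)
print(Xt)  # [[True, False, False], [False, True, False], [False, False, True]]
```

Five intervals yield only three cover sets here: (1.5, 3.0) and (2.5, 4.0) both contain just 2.8, so one of the two identical columns is dropped, and (3.5, 8.0) contains no entry of X at all.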