
class gtda.mapper.ParallelClustering(clusterer, n_jobs=None, parallel_backend_prefer=None)[source]

Employ joblib parallelism to cluster different portions of a dataset.

Any clustering object that stores a labels_ attribute when fit is called can be passed to the constructor; most classes in sklearn.cluster qualify. The input of fit has the form [X_tot, masks], where X_tot is the full dataset and masks is a two-dimensional boolean array, each column of which marks a portion of X_tot to be clustered separately. Parallelism is achieved over the columns of masks.
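The [X_tot, masks] input format can be illustrated with plain NumPy. In this minimal sketch the toy data and the two overlapping portions are invented for illustration and set by hand.

```python
import numpy as np

# Toy full dataset: 8 points on a line, shape (n_samples, n_features) = (8, 1).
X_tot = np.arange(8, dtype=float).reshape(-1, 1)

# Boolean masks of shape (n_samples, n_portions) = (8, 2).
# Column j selects the rows of X_tot that form portion j.
masks = np.zeros((X_tot.shape[0], 2), dtype=bool)
masks[:5, 0] = True  # portion 0: rows 0-4
masks[3:, 1] = True  # portion 1: rows 3-7 (portions may overlap)

X = [X_tot, masks]  # the list-like input passed to fit

for j in range(masks.shape[1]):
    print(j, np.flatnonzero(masks[:, j]))  # row indices in portion j
# 0 [0 1 2 3 4]
# 1 [3 4 5 6 7]

overlap = np.flatnonzero(masks[:, 0] & masks[:, 1])
print(overlap)  # [3 4]
```

Each portion is clustered independently, so rows 3 and 4 are clustered twice, once per portion.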

Parameters

  • clusterer (object) – Clustering object derived from sklearn.base.ClusterMixin.

  • n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • parallel_backend_prefer ("processes" | "threads" | None, optional, default: None) – Soft hint for the selection of the default joblib backend. The default process-based backend is ‘loky’ and the default thread-based backend is ‘threading’. See [1].
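The two parallelism parameters mirror joblib options. The sketch below shows their effect on joblib.Parallel directly; that ParallelClustering forwards n_jobs and parallel_backend_prefer to the n_jobs and prefer arguments of Parallel is an assumption based on the descriptions above, and the square helper is hypothetical.

```python
from joblib import Parallel, delayed, parallel_backend


def square(x):
    """Hypothetical stand-in for the per-portion work."""
    return x * x


# n_jobs=-1 uses all processors; prefer="threads" is only a soft hint that
# selects the thread-based backend unless an enclosing context overrides it.
results = Parallel(n_jobs=-1, prefer="threads")(delayed(square)(x) for x in range(4))
print(results)  # [0, 1, 4, 9]

# Without an explicit n_jobs, Parallel defers to an enclosing
# parallel_backend context (and runs with 1 job outside of one).
with parallel_backend("threading", n_jobs=2):
    in_context = Parallel()(delayed(square)(x) for x in range(4))
print(in_context)  # [0, 1, 4, 9]
```

Results come back in input order regardless of the backend, which is what allows each output entry to be matched to its column of masks.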


Attributes

  • clusterers_ – Clones of clusterer fitted to the portions of the full data array specified in fit.

  • clusters_ (list of list of tuple) – Labels and indices of each cluster found in fit. The i-th entry corresponds to the i-th portion of the data; it is a list of triples of the form (i, label, indices), where label is a cluster label and indices is the array of indices of points belonging to cluster (i, label).
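To make the layout of clusters_ concrete, the sketch below builds a hand-written value of that shape (the numbers are invented, not output of the estimator) and looks up which clusters contain a given point. It assumes that indices are row positions in X_tot, so a point lying in two overlapping portions can appear in one cluster from each; clusters_containing is a hypothetical helper.

```python
import numpy as np

# Invented value with the documented shape of clusters_: one list per
# portion, each holding (i, label, indices) triples.
# Assumption: indices are row positions in X_tot.
clusters = [
    [(0, 0, np.array([0, 1])), (0, 1, np.array([2, 3]))],
    [(1, 0, np.array([2, 3])), (1, 1, np.array([4]))],
]


def clusters_containing(clusters, point):
    """Return the (i, label) pairs of every cluster whose indices include point."""
    return [
        (i, label)
        for portion in clusters
        for i, label, indices in portion
        if point in indices
    ]


print(clusters_containing(clusters, 2))  # [(0, 1), (1, 0)]
print(clusters_containing(clusters, 4))  # [(1, 1)]
```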


References

[1] “Thread-based parallelism vs process-based parallelism”, in joblib documentation.

__init__(clusterer, n_jobs=None, parallel_backend_prefer=None)[source]

fit(X, y=None, sample_weight=None)[source]

Fit the clusterer on each portion of the data.

clusterers_ and clusters_ are computed and stored.

Parameters

  • X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

  • sample_weight (array-like or None, optional, default: None) – The weights for each observation in the full data. If None, all observations are assigned equal weight. Otherwise, it has shape (n_samples,).
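The per-portion fitting that fit performs can be sketched with joblib and scikit-learn. This is a minimal re-implementation under stated assumptions, not giotto-tda's code: the helpers _fit_portion and fit_portions are hypothetical, the (n_samples, n_samples) distance-matrix case is detected through a metric attribute equal to "precomputed" (an assumption), and sample_weight is simply restricted to each portion before being passed to the clone's fit.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.cluster import DBSCAN


def _fit_portion(clusterer, X_tot, mask, sample_weight):
    """Fit a fresh clone of `clusterer` on the rows of X_tot selected by `mask`."""
    X_sub = X_tot[mask]
    # Assumption: a square distance-matrix input is recognised by
    # metric="precomputed"; its columns must then be restricted too.
    if getattr(clusterer, "metric", None) == "precomputed":
        X_sub = X_sub[:, mask]
    fitted = clone(clusterer)  # the user's clusterer itself is never fitted
    if sample_weight is None:
        fitted.fit(X_sub)
    else:
        fitted.fit(X_sub, sample_weight=np.asarray(sample_weight)[mask])
    return fitted


def fit_portions(clusterer, X, sample_weight=None, n_jobs=None, prefer=None):
    """Return one fitted clone per column of masks (cf. clusterers_)."""
    X_tot, masks = X
    return Parallel(n_jobs=n_jobs, prefer=prefer)(
        delayed(_fit_portion)(clusterer, X_tot, masks[:, j], sample_weight)
        for j in range(masks.shape[1])
    )


X_tot = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
masks = np.array([[True, False],
                  [True, False],
                  [True, True],
                  [True, True],
                  [False, True]])
dbscan = DBSCAN(eps=0.5, min_samples=1)
fitted = fit_portions(dbscan, [X_tot, masks], sample_weight=np.ones(5),
                      n_jobs=2, prefer="threads")
print([f.labels_.tolist() for f in fitted])  # [[0, 0, 1, 1], [0, 0, 1]]
```

DBSCAN is chosen here only because its fit accepts sample_weight; when no weights are given, any clusterer that sets labels_ fits the same pattern. Cloning keeps the object passed to the constructor unfitted, so it can be reused.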



fit_predict(X, y=None, sample_weight=None)[source]

Fit to the data and return the clusters found.

Parameters

  • X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

  • sample_weight (array-like or None, optional, default: None) – The weights for each observation in the full data. If None, all observations are assigned equal weight. Otherwise, it has shape (n_samples,).


Returns

clusters – See clusters_.

Return type

list of list of tuple
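A minimal sketch of how per-portion labels_ can be turned into the (i, label, indices) triples that fit_predict returns. It assumes indices are positions in X_tot and that every distinct label is kept, including a noise label such as DBSCAN's -1 (the text above does not say how noise is handled); fit_predict_sketch and _cluster_portion are hypothetical names, not part of giotto-tda.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.cluster import DBSCAN


def _cluster_portion(i, clusterer, X_tot, mask):
    """Cluster portion i and return its list of (i, label, indices) triples."""
    indices = np.flatnonzero(mask)  # positions of the portion's rows in X_tot
    labels = clone(clusterer).fit(X_tot[indices]).labels_
    # One triple per distinct label; indices are mapped back to X_tot rows.
    return [(i, label, indices[labels == label]) for label in np.unique(labels)]


def fit_predict_sketch(clusterer, X, n_jobs=None):
    """Return a list (one entry per column of masks) of lists of triples."""
    X_tot, masks = X
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(_cluster_portion)(i, clusterer, X_tot, masks[:, i])
        for i in range(masks.shape[1])
    )


X_tot = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
masks = np.array([[True, False],
                  [True, False],
                  [True, True],
                  [True, True],
                  [False, True]])
clusters = fit_predict_sketch(DBSCAN(eps=0.5, min_samples=1), [X_tot, masks])
for portion in clusters:
    print([(i, int(label), idx.tolist()) for i, label, idx in portion])
# [(0, 0, [0, 1]), (0, 1, [2, 3])]
# [(1, 0, [2, 3]), (1, 1, [4])]
```

The sketch does not handle an empty portion (an all-False column of masks), for which most clusterers raise an error on fit.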

fit_transform(X, y=None, **fit_params)[source]

Alias for fit_predict.

Allows this class to be used as an intermediate step in a scikit-learn pipeline.

Parameters

  • X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.


Returns

Xt – See clusters_.

Return type

list of list of tuple
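During fit, a scikit-learn Pipeline calls fit_transform on its intermediate steps, so aliasing fit_transform to fit_predict lets the clusters flow to the next step. The sketch below uses two hypothetical classes: ClusterPortions, a sequential stand-in with the same fit_predict/fit_transform/transform layout (not the real ParallelClustering), and CountClusters, a toy final step.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import DBSCAN
from sklearn.pipeline import Pipeline


class ClusterPortions(BaseEstimator):
    """Hypothetical sequential stand-in mirroring the fit_* API described above."""

    def __init__(self, clusterer=None):
        self.clusterer = clusterer

    def fit(self, X, y=None):
        X_tot, masks = X
        self.clusters_ = []
        for i in range(masks.shape[1]):
            indices = np.flatnonzero(masks[:, i])
            labels = clone(self.clusterer).fit(X_tot[indices]).labels_
            self.clusters_.append(
                [(i, label, indices[labels == label]) for label in np.unique(labels)]
            )
        return self

    def fit_predict(self, X, y=None):
        return self.fit(X, y).clusters_

    def fit_transform(self, X, y=None, **fit_params):
        # Alias for fit_predict; Pipeline calls this on intermediate steps.
        return self.fit_predict(X, y)

    def transform(self, X, y=None):
        # Never called by Pipeline.fit; it only has to exist.
        raise NotImplementedError


class CountClusters(BaseEstimator):
    """Hypothetical final step that records the total number of clusters."""

    def fit(self, X, y=None):
        self.n_clusters_ = sum(len(portion) for portion in X)
        return self


X_tot = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
masks = np.array([[True, False],
                  [True, False],
                  [True, True],
                  [True, True],
                  [False, True]])
pipe = Pipeline([
    ("clustering", ClusterPortions(DBSCAN(eps=0.5, min_samples=1))),
    ("count", CountClusters()),
])
pipe.fit([X_tot, masks])
print(pipe.named_steps["count"].n_clusters_)  # 4
```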


get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
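The get_params docstring matches the one scikit-learn's BaseEstimator provides. Assuming the method is inherited from there, the sketch below uses a hypothetical PortionClusterer with the same constructor parameters to show that deep=True also exposes the wrapped clusterer's parameters under the clusterer__ prefix.

```python
from sklearn.base import BaseEstimator
from sklearn.cluster import DBSCAN


class PortionClusterer(BaseEstimator):
    """Hypothetical estimator with the same constructor parameters."""

    def __init__(self, clusterer=None, n_jobs=None, parallel_backend_prefer=None):
        self.clusterer = clusterer
        self.n_jobs = n_jobs
        self.parallel_backend_prefer = parallel_backend_prefer


est = PortionClusterer(DBSCAN(eps=0.5), n_jobs=2)
shallow = est.get_params(deep=False)  # only the constructor parameters
deep = est.get_params(deep=True)      # also the nested clusterer's parameters
print(sorted(shallow))         # ['clusterer', 'n_jobs', 'parallel_backend_prefer']
print(deep["clusterer__eps"])  # 0.5
```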


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.
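Under the same assumption that set_params comes from scikit-learn's BaseEstimator, the <component>__<parameter> syntax reaches into the wrapped clusterer. PortionClusterer is again a hypothetical stand-in, redefined here so the block runs on its own.

```python
from sklearn.base import BaseEstimator
from sklearn.cluster import DBSCAN


class PortionClusterer(BaseEstimator):
    """Hypothetical estimator with the same constructor parameters."""

    def __init__(self, clusterer=None, n_jobs=None, parallel_backend_prefer=None):
        self.clusterer = clusterer
        self.n_jobs = n_jobs
        self.parallel_backend_prefer = parallel_backend_prefer


est = PortionClusterer(DBSCAN(eps=0.5))
# A plain name updates the estimator; clusterer__eps updates the nested DBSCAN.
returned = est.set_params(n_jobs=-1, clusterer__eps=0.3)
print(est.n_jobs, est.clusterer.eps)  # -1 0.3

try:
    est.set_params(not_a_param=1)  # unknown names are rejected
except ValueError:
    print("rejected")
```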

transform(X, y=None)[source]

Not implemented.

Only present so that the class is a valid step in a scikit-learn pipeline.

Parameters

  • X (Ignored) – Ignored.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
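scikit-learn's Pipeline rejects an intermediate step that has no transform method with a TypeError, even though Pipeline.fit only calls fit_transform on such steps; a placeholder transform is enough to pass this check. The hypothetical classes below demonstrate the check with and without a stub transform.

```python
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline


class FitOnly(BaseEstimator):
    """Hypothetical intermediate step without a transform method."""

    def fit(self, X, y=None):
        return self

    def fit_transform(self, X, y=None):
        return X


class WithStubTransform(FitOnly):
    """The same step plus a placeholder transform."""

    def transform(self, X, y=None):
        raise NotImplementedError


class Final(BaseEstimator):
    """Hypothetical final step that accepts anything."""

    def fit(self, X, y=None):
        return self


def accepted(step):
    """Return True if a Pipeline with `step` in the middle can be fitted."""
    try:
        Pipeline([("step", step), ("final", Final())]).fit([[0.0], [1.0]])
    except TypeError:
        return False
    return True


print(accepted(FitOnly()))            # False
print(accepted(WithStubTransform()))  # True
```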