ParallelClustering
-
class gtda.mapper.ParallelClustering(clusterer, n_jobs=None, parallel_backend_prefer=None)
Employ joblib parallelism to cluster different portions of a dataset.
An arbitrary clustering class which stores a labels_ attribute in fit can be passed to the constructor. Examples are most classes in sklearn.cluster. The input of fit is of the form [X_tot, masks] where X_tot is the full dataset, and masks is a 2D boolean array, each column of which indicates the location of a portion of X_tot to cluster separately. Parallelism is achieved over the columns of masks.
- Parameters
clusterer (object) – Clustering object derived from sklearn.base.ClusterMixin.
n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
parallel_backend_prefer ("processes" | "threads" | None, optional, default: None) – Soft hint for the selection of the default joblib backend. The default process-based backend is ‘loky’ and the default thread-based backend is ‘threading’. See [1].
-
labels_
For each point in the dataset passed to fit, a tuple of pairs of the form (i, partial_label) where i is the index of a boolean mask which selects that point and partial_label is the cluster label assigned to the point when clustering the subset of the data selected by mask i.
- Type
ndarray of shape (n_samples,)
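The structure of labels_ can be illustrated with a small NumPy-only sketch. This is not giotto-tda's implementation: toy_labels is a hypothetical stand-in for a real clusterer, the data and masks are made up, and the loop over mask columns runs sequentially where the class would distribute the columns over joblib workers.

```python
import numpy as np


def toy_labels(values, gap=0.5):
    """Hypothetical stand-in for a clusterer on 1D data: walk the values in
    sorted order and start a new cluster whenever the jump exceeds `gap`."""
    order = np.argsort(values)
    labels = np.empty(len(values), dtype=int)
    current = 0
    for rank, idx in enumerate(order):
        if rank > 0 and values[idx] - values[order[rank - 1]] > gap:
            current += 1
        labels[idx] = current
    return labels


# Made-up dataset of 6 points and 2 overlapping portions (columns of masks).
X_tot = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
masks = np.array([
    [True, False],
    [True, False],
    [True, True],
    [True, True],
    [False, True],
    [False, True],
])

# Collect, for every point, the pairs (mask index, partial label).
pairs = [[] for _ in range(len(X_tot))]
for i in range(masks.shape[1]):
    selected = np.flatnonzero(masks[:, i])   # points in portion i
    partial = toy_labels(X_tot[selected])    # cluster that portion alone
    for point, label in zip(selected, partial):
        pairs[point].append((i, int(label)))

labels = [tuple(p) for p in pairs]
```

Points 2 and 3 are selected by both masks, so each of them carries two pairs; every other point carries one. Partial labels are only meaningful relative to their mask index: label 0 under mask 0 and label 0 under mask 1 denote different clusters.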
References
[1] “Thread-based parallelism vs process-based parallelism”, in joblib documentation.
-
__init__(clusterer, n_jobs=None, parallel_backend_prefer=None)
Initialize self. See help(type(self)) for accurate signature.
-
fit(X, y=None, sample_weight=None)
Fit the clusterer on each portion of the data.
clusterers_ and clusters_ are computed and stored.
- Parameters
X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
sample_weight (array-like or None, optional, default: None) – The weights for each observation in the full data. If None, all observations are assigned equal weight. Otherwise, it has shape (n_samples,).
- Returns
self
- Return type
object
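As a minimal sketch of how the input X might be assembled, the snippet below builds X_tot and masks with NumPy alone. The data and the interval cover are assumptions made for illustration; any procedure that yields a boolean array of shape (n_samples, n_portions) would serve equally well.

```python
import numpy as np

# Made-up data: 6 samples with a single feature each.
X_tot = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])

# Two overlapping intervals on that feature (hypothetical choice);
# each one defines a portion of X_tot to be clustered on its own.
intervals = [(-0.5, 1.05), (0.15, 1.5)]

# Column j of masks is True exactly where a sample falls in interval j.
masks = np.column_stack(
    [(X_tot[:, 0] >= lo) & (X_tot[:, 0] <= hi) for lo, hi in intervals]
)

# The list of length 2 that fit expects.
X = [X_tot, masks]
```

Here masks has shape (6, 2), and samples 2 and 3 belong to both portions. With giotto-tda installed, X is the object one would pass to fit.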
-
fit_predict(X, y=None, sample_weight=None)
Fit to the data, and return the found clusters.
- Parameters
X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
sample_weight (array-like or None, optional, default: None) – The weights for each observation in the full data. If None, all observations are assigned equal weight. Otherwise, it has shape (n_samples,).
- Returns
labels – See labels_.
- Return type
ndarray of shape (n_samples,)
-
fit_transform(X, y=None, **fit_params)
Alias for fit_predict. Allows for this class to be used as an intermediate step in a scikit-learn pipeline.
- Parameters
X (list-like of form [X_tot, masks]) – Input data as a list of length 2. X_tot is an ndarray of shape (n_samples, n_features) or (n_samples, n_samples) specifying the full data. masks is a boolean ndarray of shape (n_samples, n_portions) whose columns are boolean masks on X_tot, specifying the portions of X_tot to be independently clustered.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
Xt – See labels_.
- Return type
ndarray of shape (n_samples,)
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
object
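The <component>__<parameter> naming convention can be sketched in plain Python. The helper split_param_name below is hypothetical and only shows how such a name decomposes; it is not the estimator's own code. The parameter name eps is likewise an assumption: it is meaningful only when the wrapped clusterer exposes it, as sklearn.cluster.DBSCAN does.

```python
def split_param_name(name):
    """Hypothetical helper: split 'component__parameter' at the first
    double underscore. Names without one address the estimator itself."""
    component, delim, parameter = name.partition("__")
    if not delim:
        return None, name
    return component, parameter


# A nested name targets a parameter of the wrapped clusterer ...
nested = split_param_name("clusterer__eps")
# ... while a plain name targets ParallelClustering directly.
plain = split_param_name("n_jobs")
```

Under this reading, a call such as set_params(clusterer__eps=0.3) would update eps on the object stored as clusterer, whereas set_params(n_jobs=2) would update the outer estimator.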