CollectionTransformer
- 
class gtda.metaestimators.CollectionTransformer(transformer, n_jobs=None, parallel_backend_prefer=None, parallel_backend_require=None)
- Meta-transformer for applying a fit-transformer to each input in a collection.
- If transformer possesses a fit_transform method, CollectionTransformer(transformer) also possesses a fit_transform method which, on each entry in its input X, fit-transforms a clone of transformer. A collection (list or ndarray) of outputs is returned.
- Note: to have compatibility with scikit-learn and giotto-tda pipelines, a transform method is also present but it is simply an alias for fit_transform.
- Parameters
- transformer (object) – The fit-transformer instance from which the transformer acting on collections is built. Should implement - fit_transform.
- n_jobs (int or None, optional, default: None) – The number of jobs to use in a joblib-parallel application of transformer’s fit_transform to each input. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- parallel_backend_prefer ("processes" | "threads" | None, optional, default: None) – Soft hint for the default joblib backend to use in a joblib-parallel application of transformer’s fit_transform to each input. See [1].
- parallel_backend_require ("sharedmem" or None, optional, default: None) – Hard constraint to select the backend. If set to "sharedmem", the selected backend will be single-host and thread-based even if the user asked for a non-thread-based backend with parallel_backend.
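The parameters above can be pictured with a minimal sketch of the behaviour described: clone the transformer, fit-transform each entry, and dispatch the work through joblib. This is not the library's implementation. The helper name fit_transform_each is hypothetical, and forwarding prefer and require directly to joblib.Parallel is an assumption based on the parameter descriptions.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.decomposition import PCA


def fit_transform_each(transformer, X, n_jobs=None, prefer=None, require=None):
    """Hypothetical helper: fit-transform a fresh clone of `transformer` on each entry of X."""

    def _one(x):
        # clone() copies the hyperparameters but none of the fitted state,
        # so every entry is handled by an independent, unfitted estimator.
        return clone(transformer).fit_transform(x)

    # `prefer` is a soft hint and `require` a hard constraint, as in joblib.Parallel.
    return Parallel(n_jobs=n_jobs, prefer=prefer, require=require)(
        delayed(_one)(x) for x in X
    )


rng = np.random.default_rng(0)
X = rng.random((8, 30, 5))  # a collection of 8 inputs, each of shape (30, 5)
outputs = fit_transform_each(PCA(n_components=2), X, n_jobs=2, prefer="threads")
```

Here outputs is a plain list with one array per entry of X. Whether threads or processes are faster depends on how much of the wrapped transformer's work releases the GIL.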
 
- Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> from gtda.metaestimators import CollectionTransformer
>>> rng = np.random.default_rng()
- Create a collection of 1000 2D inputs for PCA, as a single 3D ndarray (we could also create a list of 2D inputs instead).
>>> X = rng.random((1000, 100, 50))
- In the case of PCA, joblib parallelism can be very beneficial!
>>> multi_pca = CollectionTransformer(PCA(n_components=3), n_jobs=-1)
>>> Xt = multi_pca.fit_transform(X)
- Since all PCA outputs have the same shape, Xt is an ndarray.
>>> print(Xt.shape)
(1000, 100, 3)
- See also
- gtda.mapper.utils.pipeline.transformer_from_callable_on_rows, gtda.mapper.utils.decorators.method_to_transform
- References
- [1] “Thread-based parallelism vs process-based parallelism”, in joblib documentation.
 - 
__init__(transformer, n_jobs=None, parallel_backend_prefer=None, parallel_backend_require=None)
- Initialize self. See help(type(self)) for accurate signature. 
 - 
fit(X, y=None)
- Do nothing and return the estimator unchanged.
- This method is here to implement the usual scikit-learn API and hence work in pipelines.
- Parameters
- X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer. 
- y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter. 
 
- Returns
- self 
- Return type
- object 
 
 - 
fit_transform(X, y=None)
- Fit-transform a clone of transformer to each element in the collection X.
- Parameters
- X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer. 
- y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter. 
 
- Returns
- Xt – Collection of outputs. It is a list unless all outputs have the same shape, in which case it is converted to an ndarray. 
- Return type
- list of length n_samples, or ndarray of shape (n_samples, ..) 
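The return rule stated above (an ndarray when all outputs share a shape, a list otherwise) can be sketched as follows. The helper collate_outputs is hypothetical and only illustrates the documented rule; it is not taken from the library's source.

```python
import numpy as np


def collate_outputs(outputs):
    """Hypothetical helper: stack outputs into one ndarray only if all shapes match."""
    first_shape = outputs[0].shape
    if all(out.shape == first_shape for out in outputs):
        # Equal shapes: the collection gains a leading n_samples axis.
        return np.asarray(outputs)
    # Ragged shapes cannot form a regular array, so the list is kept.
    return list(outputs)


same = collate_outputs([np.zeros((100, 3)), np.ones((100, 3))])
ragged = collate_outputs([np.zeros((100, 3)), np.ones((80, 3))])
```

Downstream code that indexes Xt with array syntax such as Xt[:, 0] will therefore only work in the equal-shape case, so it is worth checking the type of the result when inputs may differ in size.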
 
 - 
get_params(deep=True)
- Get parameters for this estimator.
- Parameters
- deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators. 
- Returns
- params – Parameter names mapped to their values. 
- Return type
- mapping of string to any 
 
 - 
set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **params (dict) – Estimator parameters. 
- Returns
- self – Estimator instance. 
- Return type
- object 
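As an illustration of the <component>__<parameter> form, here is a short sketch using a scikit-learn Pipeline, the nested object named in the description. The step names "scale" and "pca" are chosen for this example only.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=3))])

# Address the nested PCA step by its step name, a double underscore,
# and then the name of the parameter to update.
pipe.set_params(pca__n_components=2)

# get_params(deep=True) exposes nested parameters under the same naming scheme.
n_components = pipe.get_params(deep=True)["pca__n_components"]
```

By the same scikit-learn convention, the estimator wrapped by a CollectionTransformer would be expected to be reachable under the constructor argument name, for example transformer__n_components. This is stated as an expectation rather than verified behaviour.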
 
 - 
transform(X, y=None)
- Alias for fit_transform.
- Allows for this class to be used as an intermediate step in a scikit-learn pipeline.
- Parameters
- X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer. 
- y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter. 
 
- Returns
- Xt – Collection of outputs. It is a list unless all outputs have the same shape, in which case it is converted to an ndarray. 
- Return type
- list of length n_samples, or ndarray of shape (n_samples, ..)
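Because transform is an alias for fit_transform and fit does nothing, calling transform on new data refits a clone on each new entry and reuses nothing from an earlier fit. The sketch below shows this contract with a hypothetical stand-in class, PerItemFitTransformer, written only for illustration; it is not the library class and omits the joblib parallelism.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.decomposition import PCA


class PerItemFitTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in mimicking the documented fit/transform contract."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        # Nothing is learned here; the estimator is returned unchanged.
        return self

    def fit_transform(self, X, y=None):
        # Each entry gets its own freshly cloned, freshly fitted transformer.
        return [clone(self.transformer).fit_transform(x) for x in X]

    def transform(self, X, y=None):
        # Alias: refits on whatever collection it is given.
        return self.fit_transform(X, y)


rng = np.random.default_rng(0)
X_train = rng.random((3, 40, 6))
X_new = rng.random((2, 40, 6))

step = PerItemFitTransformer(PCA(n_components=2)).fit(X_train)
Xt_new = step.transform(X_new)

# The same result is obtained without ever seeing X_train.
expected = [PCA(n_components=2).fit_transform(x) for x in X_new]
```

In a pipeline, this means the step behaves identically at fit time and at prediction time: each input is processed on its own, independently of the training collection.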