CollectionTransformer
class gtda.metaestimators.CollectionTransformer(transformer, n_jobs=None, parallel_backend_prefer=None, parallel_backend_require=None)
Meta-transformer for applying a fit-transformer to each input in a collection.
If transformer possesses a fit_transform method, CollectionTransformer(transformer) also possesses a fit_transform method which, on each entry in its input X, fit-transforms a clone of transformer. A collection (list or ndarray) of outputs is returned.
Note: for compatibility with scikit-learn and giotto-tda pipelines, a transform method is also present, but it is simply an alias for fit_transform.
- Parameters
transformer (object) – The fit-transformer instance from which the transformer acting on collections is built. Should implement fit_transform.
n_jobs (int or None, optional, default: None) – The number of jobs to use in a joblib-parallel application of transformer’s fit_transform to each input. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
parallel_backend_prefer ("processes" | "threads" | None, optional, default: None) – Soft hint for the default joblib backend to use in a joblib-parallel application of transformer’s fit_transform to each input. See [1].
parallel_backend_require ("sharedmem" or None, optional, default: None) – Hard constraint to select the backend. If set to "sharedmem", the selected backend will be single-host and thread-based even if the user asked for a non-thread-based backend with parallel_backend.
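The core mechanism described above (a fresh clone of transformer fit-transformed on each input, with joblib controlling parallelism) can be sketched with plain scikit-learn and joblib. This is a minimal sketch, not giotto-tda's implementation: the helper name `collection_fit_transform` is hypothetical, and forwarding the two backend parameters to `joblib.Parallel`'s `prefer` and `require` arguments is an assumption based on their descriptions. The list-versus-ndarray conversion of the outputs is not covered here; the helper returns the raw list.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.decomposition import PCA


def collection_fit_transform(transformer, X, n_jobs=None,
                             prefer=None, require=None):
    """Hypothetical helper: fit-transform a fresh clone of `transformer`
    on each entry of the collection `X` and return the list of outputs."""
    if not hasattr(transformer, "fit_transform"):
        raise TypeError("transformer must implement fit_transform")
    # prefer/require are assumed to play the role of parallel_backend_prefer
    # and parallel_backend_require; they are forwarded to joblib.Parallel.
    return Parallel(n_jobs=n_jobs, prefer=prefer, require=require)(
        # One independent clone per input, so no fitted state is shared.
        delayed(clone(transformer).fit_transform)(x) for x in X
    )


rng = np.random.default_rng(0)
X = rng.random((8, 20, 5))  # 8 inputs, each of shape (20, 5)
outputs = collection_fit_transform(PCA(n_components=2), X,
                                   n_jobs=2, prefer="threads")
print(len(outputs), outputs[0].shape)  # 8 (20, 2)
```

Cloning inside the generator means every input gets its own unfitted copy, which is what makes the per-input fits independent and therefore safe to run in parallel.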
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> from gtda.metaestimators import CollectionTransformer
>>> rng = np.random.default_rng()
Create a collection of 1000 2D inputs for PCA, as a single 3D ndarray (we could also create a list of 2D inputs instead).
>>> X = rng.random((1000, 100, 50))
The 1000 PCA fits are independent of one another, so joblib parallelism can speed them up considerably; n_jobs=-1 uses all processors.
>>> multi_pca = CollectionTransformer(PCA(n_components=3), n_jobs=-1)
>>> Xt = multi_pca.fit_transform(X)
Since all PCA outputs have the same shape, Xt is an ndarray.
>>> print(Xt.shape)
(1000, 100, 3)
See also
gtda.mapper.utils.pipeline.transformer_from_callable_on_rows, gtda.mapper.utils.decorators.method_to_transform
References
[1] “Thread-based parallelism vs process-based parallelism”, in joblib documentation.
- __init__(transformer, n_jobs=None, parallel_backend_prefer=None, parallel_backend_require=None)
Initialize self. See help(type(self)) for accurate signature.
- fit(X, y=None)
Do nothing and return the estimator unchanged.
This method exists only to implement the usual scikit-learn API, so that the class works in pipelines.
- Parameters
X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
self
- Return type
object
- fit_transform(X, y=None)
Fit-transform a clone of transformer to each element in the collection X.
- Parameters
X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
Xt – Collection of outputs. It is a list unless all outputs have the same shape, in which case it is converted to an ndarray.
- Return type
list of length n_samples, or ndarray of shape (n_samples, ..)
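The return rule for Xt can be illustrated without giotto-tda. The helper below is hypothetical and only mimics the documented behaviour: outputs are stacked into one ndarray when they all share a shape, and otherwise returned as a list. For instance, inputs with different numbers of rows fed through PCA(n_components=3) give outputs of different shapes, so a list would come back.

```python
import numpy as np


def stack_if_same_shape(outputs):
    """Hypothetical helper mirroring the documented return rule:
    an ndarray if all outputs share one shape, otherwise the list itself."""
    outputs = list(outputs)
    if outputs and all(o.shape == outputs[0].shape for o in outputs):
        # Uniform shapes: stack along a new leading axis of length n_samples.
        return np.stack(outputs)
    # Ragged shapes (or an empty collection): keep a plain list.
    return outputs


same = stack_if_same_shape([np.zeros((100, 3)), np.ones((100, 3))])
ragged = stack_if_same_shape([np.zeros((100, 3)), np.ones((80, 3))])
print(type(same).__name__, same.shape)     # ndarray (2, 100, 3)
print(type(ragged).__name__, len(ragged))  # list 2
```

Returning an empty list for an empty collection is a choice made for this sketch; the documentation above does not specify that case.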
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
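The wording of get_params suggests it is inherited from scikit-learn's BaseEstimator. Assuming that, the effect of deep can be shown with a hypothetical stand-in class that only copies CollectionTransformer's constructor signature: with deep=True, the wrapped estimator's parameters appear under the prefix transformer__.

```python
from sklearn.base import BaseEstimator
from sklearn.decomposition import PCA


class CollectionTransformerStandIn(BaseEstimator):
    """Hypothetical stand-in with CollectionTransformer's constructor
    signature; get_params is inherited unchanged from BaseEstimator."""

    def __init__(self, transformer, n_jobs=None,
                 parallel_backend_prefer=None, parallel_backend_require=None):
        self.transformer = transformer
        self.n_jobs = n_jobs
        self.parallel_backend_prefer = parallel_backend_prefer
        self.parallel_backend_require = parallel_backend_require


est = CollectionTransformerStandIn(PCA(n_components=3), n_jobs=-1)

# deep=False: only the four constructor parameters.
print(sorted(est.get_params(deep=False)))
# ['n_jobs', 'parallel_backend_prefer', 'parallel_backend_require', 'transformer']

# deep=True: also the wrapped PCA's parameters, prefixed with "transformer__".
print(est.get_params(deep=True)["transformer__n_components"])  # 3
```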
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
object
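The <component>__<parameter> convention can be seen on an ordinary scikit-learn Pipeline. By the same convention, and assuming set_params is inherited from BaseEstimator as its wording suggests, a CollectionTransformer wrapping PCA would expose the PCA's number of components as transformer__n_components.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=3))])

# Nested parameters are addressed as <component>__<parameter>.
pipe.set_params(pca__n_components=5, scale__with_mean=False)

print(pipe.named_steps["pca"].n_components)  # 5
print(pipe.named_steps["scale"].with_mean)   # False
```

Because set_params returns the estimator itself, calls can be chained, e.g. pipe.set_params(...).fit(...).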
- transform(X, y=None)
Alias for fit_transform. Allows this class to be used as an intermediate step in a scikit-learn pipeline.
- Parameters
X (list of length n_samples, or ndarray of shape (n_samples, ..)) – Collection of inputs to be fit-transformed by transformer.
y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
Xt – Collection of outputs. It is a list unless all outputs have the same shape, in which case it is converted to an ndarray.
- Return type
list of length n_samples, or ndarray of shape (n_samples, ..)
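Because fit learns nothing and transform simply repeats fit_transform, a fitted pipeline does not reuse anything from the training collection: calling transform on new data fits fresh clones on that new data. The class below is a hypothetical, sequential stand-in written to follow the documented fit/transform semantics (it is not giotto-tda's code); it assumes all outputs share one shape, and reports itself as always fitted via scikit-learn's `__sklearn_is_fitted__` hook since it has no fitted state.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline


class StatelessCollectionTransformer(BaseEstimator):
    """Hypothetical sequential stand-in following the documented semantics:
    fit is a no-op and transform is an alias for fit_transform."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        return self  # nothing is learned at fit time

    def __sklearn_is_fitted__(self):
        return True  # stateless: always counts as fitted for check_is_fitted

    def fit_transform(self, X, y=None):
        # A fresh clone is fitted on each input on every call.
        # Assumes all outputs share one shape, so they can be stacked.
        return np.asarray([clone(self.transformer).fit_transform(x) for x in X])

    def transform(self, X, y=None):
        return self.fit_transform(X, y)


rng = np.random.default_rng(0)
X_train = rng.random((4, 30, 6))
X_new = rng.random((3, 30, 6))

pca = PCA(n_components=2, svd_solver="full")
pipe = Pipeline([("multi_pca", StatelessCollectionTransformer(pca))])
pipe.fit(X_train)               # the no-op fit: nothing from X_train is kept
Xt_new = pipe.transform(X_new)  # PCA is refitted on each entry of X_new

# Same result as fitting a separate PCA directly on each new input.
expected = np.asarray([clone(pca).fit_transform(x) for x in X_new])
print(Xt_new.shape, np.allclose(Xt_new, expected))  # (3, 30, 2) True
```

The practical consequence is that such a step behaves like a per-input feature extractor rather than a model trained once and reused.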