PersistenceImage

class gtda.diagrams.PersistenceImage(sigma=0.1, n_bins=100, weight_function=None, n_jobs=None)[source]

Persistence images of persistence diagrams.

Based on ideas in [1]. Given a persistence diagram consisting of birth-death-dimension triples [b, d, q], the equivalent diagram of birth-persistence-dimension triples [b, d-b, q] is computed. Subdiagrams corresponding to distinct homology dimensions are considered separately and regarded as sums of Dirac deltas. Each subdiagram is then convolved with a Gaussian kernel, and the result is evaluated over a rectangular grid of locations evenly sampled from appropriate ranges of the filtration parameter. The output can be thought of as a (multi-channel) raster image.
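
The construction described above can be sketched in plain NumPy for a single homology dimension. This is an illustration only, not the library's implementation: the function name persistence_image_sketch, the choice of grid ranges and the normalisation constant are all assumptions made for the example.

```python
import numpy as np

def persistence_image_sketch(diagram, sigma=0.1, n_bins=100):
    """Gaussian-smoothed birth-persistence image for one homology dimension.

    `diagram` is a sequence of [birth, death] pairs. Hypothetical helper,
    written for illustration only.
    """
    diagram = np.asarray(diagram, dtype=float)
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]  # [b, d] -> [b, d - b]

    # Rectangular grid of evenly spaced locations (the ranges are an assumption).
    birth_grid = np.linspace(births.min(), diagram[:, 1].max(), n_bins)
    pers_grid = np.linspace(0.0, persistences.max(), n_bins)
    P, B = np.meshgrid(pers_grid, birth_grid, indexing="ij")

    # Convolving a sum of Dirac deltas with a Gaussian gives a sum of
    # Gaussians, one centred on each (birth, persistence) point.
    image = np.zeros((n_bins, n_bins))
    for b, p in zip(births, persistences):
        image += np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma ** 2))
    return image / (2 * np.pi * sigma ** 2)

example_diagram = [[0.0, 1.0], [0.2, 0.5]]
example_image = persistence_image_sketch(example_diagram, sigma=0.1, n_bins=20)
```

In this sketch, rows index persistence and columns index birth; that axis order is a choice made for the example.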

Important note:

  • Input collections of persistence diagrams for this transformer must satisfy certain requirements; see e.g. fit.

Parameters
  • sigma (float, optional, default: 0.1) – Standard deviation of the Gaussian kernel.

  • n_bins (int, optional, default: 100) – The number of filtration parameter values, per available homology dimension, to sample during fit.

  • weight_function (callable or None, default: None) – Function mapping the 1D array of sampled persistence values (see samplings_) to a 1D array of weights. None is equivalent to passing numpy.ones_like. More weight can be given to regions of high persistence by passing a monotonically increasing function, e.g. the identity.

  • n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
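
As a rough illustration of the weight_function contract described above, a callable receives a 1D array of persistence values and returns a 1D array of weights of the same length. The name linear_weight and the sample values are hypothetical.

```python
import numpy as np

def linear_weight(persistence_values):
    """Identity weighting: each weight equals the persistence value itself."""
    return np.asarray(persistence_values, dtype=float)

# Stand-in for the sampled persistence values of one homology dimension.
sampled_persistence = np.linspace(0.0, 1.0, 5)

# None is documented as equivalent to numpy.ones_like (uniform weights).
uniform_weights = np.ones_like(sampled_persistence)

# The identity gives more weight to regions of high persistence.
linear_weights = linear_weight(sampled_persistence)
```

A callable of this kind would then be passed as the weight_function argument.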

effective_weight_function_

Effective function corresponding to weight_function. Set in fit.

Type

callable

homology_dimensions_

Homology dimensions seen in fit.

Type

tuple

samplings_

For each dimension in homology_dimensions_, a discrete sampling of birth parameters and one of persistence values, calculated during fit according to the minimum birth and maximum death values observed across all samples.

Type

dict
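
One plausible reading of this description is sketched below. The exact ranges used by the library are not stated here, so the endpoints chosen in sketch_samplings (a hypothetical helper) are assumptions.

```python
import numpy as np

def sketch_samplings(X, n_bins=100):
    """Per-dimension (birth values, persistence values) samplings.

    X has shape (n_samples, n_features, 3) with triples [b, d, q].
    The ranges below are assumed, not taken from the library.
    """
    X = np.asarray(X, dtype=float)
    samplings = {}
    for dim in np.unique(X[:, :, 2]):
        points = X[X[:, :, 2] == dim]      # all triples with q == dim
        min_birth = points[:, 0].min()     # minimum birth across all samples
        max_death = points[:, 1].max()     # maximum death across all samples
        samplings[int(dim)] = (
            np.linspace(min_birth, max_death, n_bins),        # birth axis
            np.linspace(0.0, max_death - min_birth, n_bins),  # persistence axis
        )
    return samplings

X_demo = [
    [[0.0, 1.0, 0], [0.5, 2.0, 1]],
    [[0.1, 0.8, 0], [0.4, 3.0, 1]],
]
demo_samplings = sketch_samplings(X_demo, n_bins=10)
```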

weights_

For each dimension in homology_dimensions_, an array of weights computed during fit by applying weight_function to the persistence values in samplings_.

Type

dict

Notes

The samplings in samplings_ are in general different between different homology dimensions. This means that the (i, j)-th pixel of a persistence image in homology dimension q typically arises from a different pair of parameter values to the (i, j)-th pixel of a persistence image in dimension q’.
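
To make this concrete, the sketch below looks up the parameter values behind the same pixel index in two dimensions. The samplings dictionary is made up for the example, and the axis order (rows for persistence, columns for birth) is an assumption.

```python
import numpy as np

# Hypothetical per-dimension samplings: (birth values, persistence values).
samplings = {
    0: (np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5)),
    1: (np.linspace(0.5, 2.5, 5), np.linspace(0.0, 2.0, 5)),
}

def pixel_coordinates(samplings, dim, i, j):
    """Return the (birth, persistence) pair behind pixel (i, j) in `dim`."""
    births, persistences = samplings[dim]
    return float(births[j]), float(persistences[i])

coords_dim0 = pixel_coordinates(samplings, 0, 2, 2)
coords_dim1 = pixel_coordinates(samplings, 1, 2, 2)
```

Here the same index (2, 2) refers to different parameter values in dimensions 0 and 1, so pixel-wise comparisons across channels should be made with care.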

References

[1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier, “Persistence Images: A Stable Vector Representation of Persistent Homology”; Journal of Machine Learning Research 18, 1, pp. 218-252, 2017; DOI: 10.5555/3122009.3122017.

__init__(sigma=0.1, n_bins=100, weight_function=None, n_jobs=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)[source]

Store all observed homology dimensions in homology_dimensions_ and, for each dimension separately, store evenly sampled filtration parameter values in samplings_. Then, return the estimator.

This method is here to implement the usual scikit-learn API and hence work in pipelines.

Parameters
  • X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.
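
The requirement on X can be checked before fitting. The helper below is a hypothetical sketch, not part of the library: it verifies that every homology dimension appears the same number of times in each diagram.

```python
import numpy as np

def has_constant_counts(X):
    """True if each homology dimension occurs equally often in every diagram."""
    X = np.asarray(X)
    for dim in np.unique(X[:, :, 2]):
        counts = (X[:, :, 2] == dim).sum(axis=1)  # triples with q == dim, per sample
        if not np.all(counts == counts[0]):
            return False
    return True

# Two diagrams, each with two triples in dimension 0 and one in dimension 1.
X_ok = np.array([
    [[0.0, 1.0, 0], [0.0, 0.5, 0], [0.3, 0.9, 1]],
    [[0.0, 0.8, 0], [0.1, 0.4, 0], [0.2, 0.7, 1]],
])

# The second diagram has only one triple in dimension 0: requirement violated.
X_bad = np.array([
    [[0.0, 1.0, 0], [0.0, 0.5, 0], [0.3, 0.9, 1]],
    [[0.0, 0.8, 0], [0.1, 0.4, 1], [0.2, 0.7, 1]],
])
```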

Returns

self

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Multi-channel raster images: one image per sample and one channel per homology dimension seen in fit. Index i along axis 1 corresponds to the i-th homology dimension in homology_dimensions_.

Return type

ndarray of shape (n_samples, n_homology_dimensions, n_bins, n_bins)

fit_transform_plot(X, y=None, sample=0, **plot_params)

Fit to data, then apply transform_plot.

Parameters
  • X (ndarray of shape (n_samples, ...)) – Input data.

  • y (ndarray of shape (n_samples,) or None) – Target values for supervised problems.

  • sample (int) – Sample to be plotted.

  • **plot_params – Optional plotting parameters.

Returns

Xt – Transformed one-sample slice from the input.

Return type

ndarray of shape (1, ...)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

plot(Xt, sample=0, homology_dimension_idx=0, colorscale='blues', plotly_params=None)[source]

Plot a single channel, corresponding to a given homology dimension, in a sample from a collection of persistence images.

Parameters
  • Xt (ndarray of shape (n_samples, n_homology_dimensions, n_bins, n_bins)) – Collection of multi-channel raster images, such as returned by transform.

  • sample (int, optional, default: 0) – Index of the sample in Xt to be selected.

  • homology_dimension_idx (int, optional, default: 0) – Index of the channel in the selected sample to be plotted. If Xt is the result of a call to transform and this index is i, the plot corresponds to the homology dimension given by the i-th entry in homology_dimensions_.

  • colorscale (str, optional, default: "blues") – Color scale to be used in the heat map. Can be anything allowed by plotly.graph_objects.Heatmap.

  • plotly_params (dict or None, optional, default: None) – Custom parameters to configure the plotly figure. Allowed keys are "trace" and "layout", and the corresponding values should be dictionaries containing keyword arguments as would be fed to the update_traces and update_layout methods of plotly.graph_objects.Figure.

Returns

fig – Plotly figure.

Return type

plotly.graph_objects.Figure object
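
A plotly_params dictionary following the description above might look like the sketch below. The specific values are examples only. The "trace" entries are keyword arguments for update_traces and the "layout" entries are keyword arguments for update_layout.

```python
# Example configuration; the chosen values are illustrative.
plotly_params = {
    "trace": {"showscale": False},  # hide the colour bar of the heat map
    "layout": {"title": "Persistence image", "width": 500, "height": 500},
}

# Only "trace" and "layout" are documented as allowed keys.
allowed_keys = {"trace", "layout"}
unexpected = set(plotly_params) - allowed_keys
```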

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
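
The <component>__<parameter> naming can be illustrated with a small standalone sketch. This is not the scikit-learn implementation; split_nested_params is a hypothetical helper and "image" is a made-up pipeline step name.

```python
def split_nested_params(params):
    """Group '<component>__<parameter>' keys by component.

    Keys without a '__' separator are collected under the empty string.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped

# "sigma" targets the estimator itself; "image__n_bins" targets the
# n_bins parameter of a component named "image".
grouped = split_nested_params({"sigma": 0.5, "image__n_bins": 50})
```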

transform(X, y=None)[source]

Compute multi-channel raster images from diagrams in X by convolution with a Gaussian kernel.

Parameters
  • X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Multi-channel raster images: one image per sample and one channel per homology dimension seen in fit. Index i along axis 1 corresponds to the i-th homology dimension in homology_dimensions_.

Return type

ndarray of shape (n_samples, n_homology_dimensions, n_bins, n_bins)
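
Because axis 1 follows the order of homology_dimensions_, the channel for a given dimension can be selected by position. The sketch below uses a dummy array in place of real output; channel_for_dimension is a hypothetical helper.

```python
import numpy as np

# Stand-in for transform output: 3 samples, 2 homology dimensions, 4x4 bins.
homology_dimensions = (0, 1)  # plays the role of homology_dimensions_
Xt = np.zeros((3, len(homology_dimensions), 4, 4))
Xt[:, 1] = 1.0  # mark the channel belonging to dimension 1

def channel_for_dimension(Xt, homology_dimensions, dim):
    """Return the images of shape (n_samples, n_bins, n_bins) for `dim`."""
    idx = homology_dimensions.index(dim)  # position of dim along axis 1
    return Xt[:, idx]

h1_images = channel_for_dimension(Xt, homology_dimensions, 1)
```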

transform_plot(X, sample=0, **plot_params)

Take a one-sample slice from the input collection and transform it. Before returning the transformed object, plot the transformed sample.

Parameters
  • X (ndarray of shape (n_samples, ...)) – Input data.

  • sample (int) – Sample to be plotted.

  • **plot_params – Optional plotting parameters.

Returns

Xt – Transformed one-sample slice from the input.

Return type

ndarray of shape (1, ...)
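
A one-sample slice keeps the leading sample axis, which is why the return shape starts with 1. The NumPy sketch below shows the difference between slicing and plain indexing on a dummy input; it does not call the library.

```python
import numpy as np

# Dummy collection of 2 diagrams with 3 triples each.
X = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
sample = 1

# Slicing keeps the sample axis: shape (1, 3, 3).
one_sample = X[sample:sample + 1]

# Plain indexing drops it: shape (3, 3).
single_diagram = X[sample]
```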