ripser_parallel

gph.ripser_parallel function

Compute persistence diagrams from an input dense array or sparse matrix.

If X represents a point cloud, a distance matrix will be internally created using the chosen metric and its Vietoris–Rips persistent homology will be computed. Computations in homology dimensions 1 and above can be parallelized, see n_threads.

Parameters

Xndarray or sparse matrix

If metric is not set to "precomputed", input data of shape (n_samples, n_features) representing a point cloud. Otherwise, dense or sparse input data of shape (n_samples, n_samples) representing a distance matrix or adjacency matrix of a weighted undirected graph, with the following conventions:

Diagonal entries indicate vertex weights, i.e. the filtration parameters at which vertices appear.

If X is dense, only its upper diagonal portion (including the diagonal) is considered.

If X is sparse, it does not need to be upper diagonal or symmetric. If only one of entry (i, j) and (j, i) is stored, its value is taken as the weight of the undirected edge {i, j}. If both are stored, the value in the upper diagonal is taken. Off-diagonal entries which are not explicitly stored are treated as infinite, indicating absent edges.

Entries of X should be compatible with a filtration, i.e. the value at index (i, j) should be no smaller than the values at diagonal indices (i, i) and (j, j).

maxdimint, optional, default: 1

Maximum homology dimension computed. Will compute all dimensions lower than or equal to this value.

threshfloat, optional, default: numpy.inf

Maximum value of the Vietoris–Rips filtration parameter. Points whose distance is greater than this value will never be connected by an edge. If numpy.inf, compute the entire filtration. Otherwise, and if metric is not "precomputed", see nearest_neighbors_params.

coeffint prime, optional, default: 2

Compute homology with coefficients in the prime field Z/pZ for p=coeff.

metricstring or callable, optional, default: 'euclidean'

The metric to use when calculating distance between instances in a feature array. If set to 'precomputed', input data is interpreted as a distance matrix or of adjacency matrices of a weighted undirected graph. If a string, it must be one of the options allowed by scipy.spatial.distance.pdist() for its metric parameter, or a metric listed in sklearn.pairwise.PAIRWISE_DISTANCE_FUNCTIONS, including 'euclidean', 'manhattan' or 'cosine'. If a callable, it should take pairs of vectors (1D arrays) as input and, for each two vectors in a pair, it should return a scalar indicating the distance/dissimilarity between them.

metric_paramsdict, optional, default: {}

Additional parameters to be passed to the distance function.

nearest_neighbors_paramsdict, optional, default: {}

Additional parameters that can be passed when thresh is finite and metric is not "precomputed". Allowed keys and values are as follows:

"algorithm": "auto" | "ball_tree" | "kd_tree" | "brute" (default when not passed: "auto")

"leaf_size": int (default when not passed: 30)

These are passed as keyword arguments to an instance of sklearn.neighbors.NearestNeighbors to compute the thresholded distance matrix in a sparse format. See the relevant scikit-learn User Guide.

weights"DTM", ndarray or None, optional, default: None

If not None, the persistence of a weighted Vietoris-Rips filtration is computed as described in 6, and this parameter determines the vertex weights in the modified adjacency matrix. "DTM" denotes the empirical distance-to-measure function defined, following 6, by

\[w(x) = 2\left(\frac{1}{n+1} \sum_{k=1}^n \mathrm{dist}(x, x_k)^r \right)^{1/r}.\]

Here, \(\mathrm{dist}\) is the distance metric used, \(x_k\) is the \(k\)-th \(\mathrm{dist}\)-nearest neighbour of \(x\) (\(x\) is not considered a neighbour of itself), \(n\) is the number of nearest neighbors to include, and \(r\) is a parameter (see weight_params). If an ndarray is passed, it is interpreted as a user-defined list of vertex weights for the modified adjacency matrix. In either case, the edge weights \(\{w_{ij}\}_{i, j}\) for the modified adjacency matrix are computed from the original distances and the new vertex weights \(\{w_i\}_i\) as follows:

\[\begin{split}w_{ij} = \begin{cases} \max\{ w_i, w_j \} &\text{if } 2\mathrm{dist}_{ij} \leq |w_i^p - w_j^p|^{\frac{1}{p}} \\ t &\text{otherwise} \end{cases}\end{split}\]

where \(t\) is the only positive root of

\[2 \mathrm{dist}_{ij} = (t^p - w_i^p)^\frac{1}{p} + (t^p - w_j^p)^\frac{1}{p}\]

and \(p\) is a parameter specified in metric_params.

weight_paramsdict or None, optional, default: None

Parameters to be used in the case of weighted filtrations, see weights. In this case, the key "p" determines the power to be used in computing edge weights from vertex weights. It can be one of 1, 2 or np.inf and defaults to 1. If weights is "DTM", the additional keys "r" (default: 2) and "n_neighbors" (default: 3) are available (see weights, where the latter corresponds to \(n\)).

collapse_edgesbool, optional, default: False

Whether to use the edge collapse algorithm as described in 5 prior to computing persistence. Cannot be True if return_generators is also True.

n_threadsint, optional, default: 1

Maximum number of threads to be used during the computation in homology dimensions 1 and above. -1 means that the maximum number of threads available on the host machine will be used if possible.

return_generatorsbool, optional, default: False

Whether to return information on the simplex pairs and essential simplices corresponding to the finite and infinite bars (respectively) in the persistence barcode. If True, this information is stored in the return dictionary under the key gens. Cannot be True if collapse_edges is also True.

Returns

A dictionary holding the results of the computation. Keys and values are as follows:

‘dgms’: list (length maxdim + 1) of ndarray (n_pairs, 2)
A list of persistence diagrams, one for each dimension less than or equal to maxdim. Each diagram is an ndarray of size (n_pairs, 2) with the first column representing the birth time and the second column representing the death time of each pair.

‘gens’: tuple (length 4) of ndarray or list of ndarray
Information on the simplex pairs and essential simplices generating the points in ‘dgms’. Each simplex of dimension 1 or above is replaced with the vertices of the edges that gave it its filtration value. The 4 entries of this tuple are as follows:

index 0: int ndarray with 3 columns
Simplex pairs corresponding to finite bars in dimension 0, with one vertex for birth and two vertices for death.

index 1: list (length maxdim) of int ndarray with 4 columns
Simplex pairs corresponding to finite bars in dimensions 1 to maxdim, with two vertices (one edge) for birth and two for death.

index 2: 1D int ndarray
Essential simplices corresponding to infinite bars in dimension 0, with one vertex for each birth.

index 3: list (length maxdim) of int ndarray with 2 columns
Essential simplices corresponding to infinite bars in dimensions 1 to maxdim, with 2 vertices (edge) for each birth.

Notes

The C++ backend and Python API for the computation of Vietoris–Rips persistent homology are developments of the ones in the ripser.py project 1, with added optimizations from Ripser 2, lock-free reduction from 3, and additional performance improvements. See 4 for details.

Ripser supports two memory representations, dense and sparse. The sparse representation is used in the following cases:

input is sparse of type scipy.sparse;

collapser is enabled;

a threshold is provided.

The dense representation will be used in the following cases:

input is a point cloud or a distance matrix.

The implementation of the edge collapse algorithm 5 is a modification of GUDHI’s C++ implementation.

References

1: C. Tralie et al, “Ripser.py: A Lean Persistent Homology Library for Python”, Journal of Open Source Software, **3**(29), 2021; DOI: 10.21105/joss.00925.
2: U. Bauer, “Ripser: efficient computation of Vietoris–Rips persistence barcodes”, J Appl. and Comput. Topology, 5, pp. 391–423, 2021; DOI: 10.1007/s41468-021-00071-5.
3: D. Morozov and A. Nigmetov, “Towards Lockfree Persistent Homology”; in SPAA ‘20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pp. 555–557, 2020; DOI: 10.1145/3350755.3400244.
4: J. Burella Pérez et al, “giotto-ph: A Python Library for High-Performance Computation of Persistent Homology of Vietoris–Rips Filtrations”, 2021; arXiv:2107.05412.
5(1,2): J.-D. Boissonnat and S. Pritam, “Edge Collapse and Persistence of Flag Complexes”; in 36th International Symposium on Computational Geometry (SoCG 2020), pp. 19:1–19:15, Schloss Dagstuhl-Leibniz–Zentrum für Informatik, 2020; DOI: 10.4230/LIPIcs.SoCG.2020.19.
6(1,2): H. Anai et al, “DTM-Based Filtrations”; in Topological Data Analysis (Abel Symposia, vol 15), Springer, 2020; DOI: 10.1007/978-3-030-43408-3_2.