KNeighborsGraph

class gtda.graphs.KNeighborsGraph(n_neighbors=4, metric='euclidean', p=2, metric_params=None, n_jobs=None)

Adjacency matrices of k-nearest neighbor graphs.

Given a two-dimensional array of row vectors seen as points in high-dimensional space, the corresponding kNN graph is a simple, undirected and unweighted graph with a vertex for every vector in the array, and an edge between two vertices whenever either the first corresponding vector is among the k nearest neighbors of the second, or vice-versa.
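The following NumPy function is a sketch of this definition. It builds the dense adjacency matrix directly from pairwise Euclidean distances. The name knn_adjacency is hypothetical and is not part of gtda. The function also assumes that a point is never counted as its own neighbor, which matches a simple graph having no self-loops.

```python
import numpy as np


def knn_adjacency(points, k):
    """Hypothetical helper: dense 0/1 adjacency of the undirected kNN graph."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: a point is never its own neighbor (no self-loops).
    np.fill_diagonal(dist, np.inf)
    # Column indices of the k closest points to each row.
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    directed = np.zeros((n, n), dtype=bool)
    directed[np.arange(n)[:, None], nearest] = True
    # Edge {i, j} if j is a kNN of i OR i is a kNN of j.
    return (directed | directed.T).astype(float)


points = [[0, 1, 3, 0, 0],
          [1, 0, 5, 0, 0],
          [3, 5, 0, 4, 0],
          [0, 0, 4, 0, 0]]
print(knn_adjacency(points, k=2))
```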

sklearn.neighbors.kneighbors_graph is used to compute the adjacency matrices of kNN graphs.
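The sketch below shows how that function could handle a single point cloud. kneighbors_graph returns a directed relation, in which row i marks the neighbors of point i. Here the undirected graph described above is obtained by taking the elementwise maximum with the transpose. Two things are assumptions rather than facts taken from the gtda source: that KNeighborsGraph symmetrizes in exactly this way, and the mode and include_self values shown.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

points = np.array([[0, 1, 3, 0, 0],
                   [1, 0, 5, 0, 0],
                   [3, 5, 0, 4, 0],
                   [0, 0, 4, 0, 0]])

# Directed kNN relation as a sparse CSR matrix: entry (i, j) is 1 when j is
# among the 2 nearest neighbors of i; the point itself is not counted.
# Assumption: these mode/include_self values mirror what gtda uses.
directed = kneighbors_graph(points, n_neighbors=2, mode="connectivity",
                            include_self=False)

# Assumption: undirected graph = keep edge (i, j) if present in either direction.
adjacency = directed.maximum(directed.T).tocsr()
print(adjacency.toarray())
```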

Parameters
  • n_neighbors (int, optional, default: 4) – Number of neighbors to use.

  • metric (string or callable, optional, default: 'euclidean') –

    Metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used. If metric is a callable, it is called on each pair of instances (rows) and the resulting value is recorded. The callable should take two arrays as input and return a single value indicating the distance between them. Passing one of SciPy's metric functions as a callable works, but is less efficient than passing the metric's name as a string. Precomputed distance matrices are not supported. Valid values for metric are:

    • from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']

    • from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']

    See the documentation for scipy.spatial.distance for details on these metrics.

  • p (int, optional, default: 2) – Parameter for the Minkowski (i.e. ℓ^p) metric from sklearn.metrics.pairwise.pairwise_distances. Only relevant when metric is 'minkowski'. p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.

  • metric_params (dict, optional, default: None) – Additional keyword arguments for the metric function.

  • n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
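The sketch below illustrates the metric parameter by calling sklearn.neighbors.kneighbors_graph directly. This assumes, as the description above says, that KNeighborsGraph forwards metric to that function. The same distance is given once by name and once as a Python callable. The function manhattan is a hypothetical example. The two graphs should agree, but the callable is evaluated in Python one pair at a time, which is why it is slower.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
points = rng.random((20, 3))


def manhattan(u, v):
    # Hypothetical callable metric: called on one pair of rows at a time,
    # it must return a single number.
    return float(np.abs(u - v).sum())


# Same metric passed by name (compiled code) and as a Python callable (slower).
by_name = kneighbors_graph(points, n_neighbors=4, metric="manhattan")
by_callable = kneighbors_graph(points, n_neighbors=4, metric=manhattan)
print(np.array_equal(by_name.toarray(), by_callable.toarray()))
```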
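The following worked check uses scipy.spatial.distance.cdist. It confirms the Minkowski special cases described for p and shows how an extra metric argument is supplied as a keyword. The 'seuclidean' metric and its V argument only illustrate what metric_params might contain, e.g. metric_params={'V': V}. That KNeighborsGraph passes this dictionary through unchanged is an assumption based on the parameter description.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 5.0]])  # coordinate differences: (1, -1, 2)

# Minkowski distance (sum_i |u_i - v_i|^p)^(1/p) for the two special cases.
d_p1 = cdist(x, x, metric="minkowski", p=1)[0, 1]  # 1 + 1 + 2 = 4
d_p2 = cdist(x, x, metric="minkowski", p=2)[0, 1]  # sqrt(1 + 1 + 4) = sqrt(6)

# Extra metric arguments are keywords (the role assumed for metric_params):
# 'seuclidean' divides each squared difference by a per-coordinate variance V.
V = np.array([1.0, 4.0, 4.0])
d_se = cdist(x, x, metric="seuclidean", V=V)[0, 1]  # sqrt(1/1 + 1/4 + 4/4) = 1.5
print(d_p1, d_p2, d_se)
```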
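One plausible way to use n_jobs is to compute the graph of each sample in a separate joblib job. This arrangement is assumed here and is not taken from the gtda source. The helper names symmetric_knn and knn_graphs are hypothetical. With n_jobs=None, joblib runs a single job unless an enclosing joblib.parallel_backend context sets the number of workers.

```python
import numpy as np
from joblib import Parallel, delayed, parallel_backend
from sklearn.neighbors import kneighbors_graph


def symmetric_knn(points, k):
    """Hypothetical helper: undirected kNN adjacency of one point cloud (CSR)."""
    directed = kneighbors_graph(points, n_neighbors=k, include_self=False)
    return directed.maximum(directed.T).tocsr()


def knn_graphs(X, k, n_jobs=None):
    """Hypothetical helper: one kNN graph per sample, one joblib task each."""
    graphs = Parallel(n_jobs=n_jobs)(delayed(symmetric_knn)(x, k) for x in X)
    out = np.empty(len(graphs), dtype=object)
    for i, graph in enumerate(graphs):
        out[i] = graph  # store each sparse matrix as a single object
    return out


X = np.random.default_rng(0).random((3, 10, 2))
serial = knn_graphs(X, k=3)  # n_jobs=None outside any backend context: 1 job
with parallel_backend("threading", n_jobs=2):
    threaded = knn_graphs(X, k=3)  # n_jobs=None now defers to the context
```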

Examples

>>> import numpy as np
>>> from gtda.graphs import KNeighborsGraph
>>> X = np.array([[[0, 1, 3, 0, 0],
...                [1, 0, 5, 0, 0],
...                [3, 5, 0, 4, 0],
...                [0, 0, 4, 0, 0]]])
>>> kng = KNeighborsGraph(n_neighbors=2)
>>> Xg = kng.fit_transform(X)
>>> print(Xg[0].toarray())
[[0. 1. 1. 1.]
 [1. 0. 0. 1.]
 [1. 0. 0. 1.]
 [1. 1. 1. 0.]]
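The output can be checked by hand. Each row of X[0] is read as a point in 5-dimensional space, so the array is not interpreted as a distance matrix. The squared Euclidean distances between the four points are:

```python
import numpy as np

# Each row of X[0] is one point in 5-dimensional space.
points = np.array([[0, 1, 3, 0, 0],
                   [1, 0, 5, 0, 0],
                   [3, 5, 0, 4, 0],
                   [0, 0, 4, 0, 0]])
# Squared Euclidean distances between all pairs of points (integers here).
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
print(sq_dist)
# [[ 0  6 50  2]
#  [ 6  0 70  2]
#  [50 70  0 66]
#  [ 2  2 66  0]]
```

In each row, the two smallest off-diagonal entries give that point's 2 nearest neighbors. Point 0 picks {3, 1}, point 1 picks {3, 0}, point 2 picks {0, 3} and point 3 picks {0, 1}. Taking the union of these relations gives the edges 0-1, 0-2, 0-3, 1-3 and 2-3, which is the matrix printed above. Point 2 is far from every other point, yet it still has degree 2 because it chooses its own 2 nearest neighbors.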
__init__(n_neighbors=4, metric='euclidean', p=2, metric_params=None, n_jobs=None)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

Do nothing and return the estimator unchanged.

This method is here to implement the usual scikit-learn API and hence to work in pipelines.

Parameters
  • X (ndarray of shape (n_samples, n_points, n_dimensions)) – Input data. Each entry in X along axis 0 is an array of n_points row vectors in n_dimensions-dimensional space.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object
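Because fit learns nothing, the transformer can sit in a scikit-learn Pipeline like any other transformer. The class below is a hypothetical, stripped-down stand-in: KNNGraphSketch is not part of gtda, and its symmetrization step is an assumption. It shows the pattern, in which fit returns self unchanged and all the work happens in transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import kneighbors_graph
from sklearn.pipeline import Pipeline


class KNNGraphSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stateless transformer; not the gtda source code."""

    def __init__(self, n_neighbors=4):
        self.n_neighbors = n_neighbors

    def fit(self, X, y=None):
        # Nothing is learned: each graph depends only on its own point cloud.
        return self

    def transform(self, X, y=None):
        out = np.empty(len(X), dtype=object)
        for i, points in enumerate(X):
            directed = kneighbors_graph(points, n_neighbors=self.n_neighbors)
            # Assumed symmetrization: keep an edge found in either direction.
            out[i] = directed.maximum(directed.T).tocsr()
        return out


X = np.random.default_rng(0).random((2, 8, 3))
pipe = Pipeline([("knn", KNNGraphSketch(n_neighbors=3))])
Xt = pipe.fit_transform(X)
```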

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (ndarray of shape (n_samples, n_points, n_dimensions)) – Input data. Each entry in X along axis 0 is an array of n_points row vectors in n_dimensions-dimensional space.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Adjacency matrices of kNN graphs.

Return type

ndarray of sparse matrices in CSR format, shape (n_samples,)
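The description above matches scikit-learn's TransformerMixin.fit_transform, which calls fit and then transform. Assuming KNeighborsGraph inherits the method from there, fit_transform(X) gives the same result as fit(X).transform(X). The hypothetical CallRecorder class below only records the order of the calls.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class CallRecorder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that records which methods run, in order."""

    def fit(self, X, y=None):
        self.calls_ = ["fit"]
        return self

    def transform(self, X, y=None):
        self.calls_.append("transform")
        return X


recorder = CallRecorder()
Xt = recorder.fit_transform([[0.0, 1.0], [2.0, 3.0]])  # inherited fit_transform
print(recorder.calls_)  # ['fit', 'transform']
```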

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
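For an estimator with the constructor signature documented above, get_params returns the five constructor arguments. The sketch uses a hypothetical class, KNNParamsSketch, that copies that signature instead of importing gtda. It also shows the effect of deep when the estimator is a step inside a Pipeline.

```python
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline


class KNNParamsSketch(BaseEstimator):
    """Hypothetical estimator mirroring the KNeighborsGraph constructor."""

    def __init__(self, n_neighbors=4, metric="euclidean", p=2,
                 metric_params=None, n_jobs=None):
        self.n_neighbors = n_neighbors
        self.metric = metric
        self.p = p
        self.metric_params = metric_params
        self.n_jobs = n_jobs

    def fit(self, X, y=None):
        return self


print(KNNParamsSketch().get_params())
# {'metric': 'euclidean', 'metric_params': None, 'n_jobs': None, 'n_neighbors': 4, 'p': 2}

pipe = Pipeline([("knn", KNNParamsSketch())])
shallow = pipe.get_params(deep=False)  # pipeline-level parameters only
deep = pipe.get_params(deep=True)      # also 'knn' and 'knn__<param>' keys
print("knn__n_neighbors" in shallow, "knn__n_neighbors" in deep)  # False True
```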

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
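The sketch below shows both forms. In place of the real class it uses a hypothetical estimator, KNNParamsSketch, which copies only two of the KNeighborsGraph parameters.

```python
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline


class KNNParamsSketch(BaseEstimator):
    """Hypothetical estimator with two of the KNeighborsGraph parameters."""

    def __init__(self, n_neighbors=4, metric="euclidean"):
        self.n_neighbors = n_neighbors
        self.metric = metric

    def fit(self, X, y=None):
        return self


# Simple estimator: parameters are set by name, and set_params returns self.
est = KNNParamsSketch().set_params(n_neighbors=2)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([("knn", KNNParamsSketch())])
pipe.set_params(knn__n_neighbors=6, knn__metric="manhattan")
print(est.n_neighbors, pipe.named_steps["knn"].n_neighbors)  # 2 6
```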

transform(X, y=None)

Compute kNN graphs and return their adjacency matrices as sparse matrices.

Parameters
  • X (ndarray of shape (n_samples, n_points, n_dimensions)) – Input data. Each entry in X along axis 0 is an array of n_points row vectors in n_dimensions-dimensional space.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Adjacency matrices of kNN graphs.

Return type

ndarray of sparse matrices in CSR format, shape (n_samples,)
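The return value is a one-dimensional object array rather than a 3-D numeric array, so per-graph operations are written as loops or comprehensions. The sketch below builds such an array by hand from two made-up graphs, which are hypothetical stand-ins and not real transform output. It then shows a few common follow-up steps.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two hypothetical adjacency matrices standing in for transform's output.
graphs = [csr_matrix(np.array([[0, 1, 1],
                               [1, 0, 0],
                               [1, 0, 0]], dtype=float)),
          csr_matrix(np.array([[0, 1, 0],
                               [1, 0, 1],
                               [0, 1, 0]], dtype=float))]

# A 1-D object array of shape (n_samples,), one sparse matrix per entry.
Xt = np.empty(len(graphs), dtype=object)
for i, graph in enumerate(graphs):
    Xt[i] = graph

print(Xt.shape)  # (2,)
# Vertex degrees of each graph, read off the sparse row sums.
degrees = [np.asarray(g.sum(axis=1)).ravel() for g in Xt]
# Each undirected edge without self-loops is stored twice, at (i, j) and (j, i).
n_edges = [g.nnz // 2 for g in Xt]
# When every sample has the same number of points, stack into a dense array.
dense = np.stack([g.toarray() for g in Xt])  # shape (2, 3, 3)
```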