A high-performance topological machine learning toolbox in Python

giotto-tda is a high-performance topological machine learning toolbox in Python, built on top of scikit-learn and distributed under the GNU AGPLv3 license. It is part of the Giotto family of open-source projects.

Guiding principles

  • Seamless integration with scikit-learn
    Strictly adhere to the scikit-learn API and development guidelines, and inherit the strengths of that framework.
  • Code modularity
    Implement topological feature creation steps as transformers, allowing for the creation of a large number of topologically-powered machine learning pipelines.
  • Standardisation
    Implement the most successful techniques from the literature into a generic framework with a consistent API.
  • Innovation
    Improve on existing algorithms, and make new ones available in open source.
  • Performance
    For the most demanding computations, fall back to state-of-the-art C++ implementations, bound efficiently to Python. Write vectorized code and implement multi-core parallelism (with joblib).
  • Data structures
    Support for tabular data, time series, graphs, and images.

30s guide to giotto-tda

To install giotto-tda, see the installation instructions.

The functionalities of giotto-tda are provided in scikit-learn–style transformers. This allows you to generate topological features from your data in a familiar way. Here is an example with the VietorisRipsPersistence transformer:

from gtda.homology import VietorisRipsPersistence
VR = VietorisRipsPersistence()

which computes topological summaries, called persistence diagrams, from collections of point clouds or weighted graphs, as follows:

diagrams = VR.fit_transform(point_clouds)
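
To give some intuition for what a persistence diagram records, here is a minimal pure-Python sketch of the dimension-0 part of a Vietoris-Rips computation on a single point cloud. It is not how giotto-tda computes diagrams (the library relies on compiled backends), and the function name h0_persistence is hypothetical. The sketch uses the fact that every connected component is born at scale 0 and dies at the length of the edge that first merges it into another component:

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the finite death values of 0-dimensional Vietoris-Rips persistence."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:        # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)   # one component dies at this scale
    return deaths


# Two tight pairs of points, far apart from each other.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(cloud)
print([(0.0, d) for d in deaths])  # [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0)]
```

The two short-lived pairs reflect the small gaps inside each cluster, while the long-lived pair reflects the large gap between the clusters. The one component that never dies is left out of the returned list.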

A plotting API allows for quick visual inspection of the outputs of many of giotto-tda’s transformers. To visualize the i-th output sample, run

VR.plot(diagrams, sample=i)

You can create scalar or vector features from persistence diagrams using giotto-tda’s dedicated transformers. Here is an example with the PersistenceEntropy transformer:

from gtda.diagrams import PersistenceEntropy
PE = PersistenceEntropy()
features = PE.fit_transform(diagrams)
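
As a rough illustration of the quantity behind this transformer, persistence entropy is the Shannon entropy of the lifetimes (death minus birth) in a diagram, normalised to sum to one. The sketch below is plain Python for a single list of (birth, death) pairs. The function name persistence_entropy, the base-2 logarithm and the value returned for an empty diagram are assumptions made for this example, not a description of the library's internals:

```python
import math


def persistence_entropy(diagram, base=2.0):
    """Shannon entropy of the normalised lifetimes of (birth, death) pairs."""
    # Points on the diagonal have zero lifetime and carry no weight.
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return 0.0  # convention chosen here for an empty diagram
    probs = [life / total for life in lifetimes]
    return -sum(p * math.log(p, base) for p in probs)


equal_bars = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
one_dominant = [(0.0, 9.7), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

print(persistence_entropy(equal_bars))    # four equal lifetimes: about 2 bits
print(persistence_entropy(one_dominant))  # one dominant lifetime: much lower
```

A diagram whose features all persist equally long has high entropy, whereas a diagram dominated by a single long-lived feature has entropy close to zero.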

The features array returned by PersistenceEntropy is a two-dimensional numpy array. This is what lets this type of topological feature generation fit into a typical scikit-learn machine learning workflow. In particular, topological feature creation steps can be fed to or used alongside models from scikit-learn, creating end-to-end pipelines which can be evaluated in cross-validation, optimised via grid searches, etc.:

from sklearn.ensemble import RandomForestClassifier
from gtda.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(point_clouds, labels)
RFC = RandomForestClassifier()
model = make_pipeline(VR, PE, RFC)
model.fit(X_train, y_train)
model.score(X_valid, y_valid)

giotto-tda also implements the Mapper algorithm as a highly customisable scikit-learn Pipeline, and provides simple plotting functions for visualizing output Mapper graphs and interacting with the pipeline parameters in real time:

from gtda.mapper import make_mapper_pipeline, plot_interactive_mapper_graph
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

pipe = make_mapper_pipeline(filter_func=PCA(), clusterer=DBSCAN())
plot_interactive_mapper_graph(pipe, data)
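
For readers new to Mapper, the steps the pipeline performs can be sketched in plain Python: apply a filter function, cover its range with overlapping intervals, cluster the points falling in each interval, and connect clusters that share points. This is only an illustration under simplifying assumptions, not giotto-tda's implementation. The names interval_cover, connected_clusters and mapper_graph are hypothetical, the filter is the x-coordinate rather than PCA, and clustering is by a plain distance threshold rather than DBSCAN:

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals widened to overlap neighbours."""
    width = (hi - lo) / n_intervals
    pad = overlap * width / 2
    return [(lo + k * width - pad, lo + (k + 1) * width + pad)
            for k in range(n_intervals)]


def connected_clusters(indices, points, eps):
    """Split indices into components of the graph linking points within eps."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        start = min(unvisited)  # deterministic choice of the next seed
        unvisited.remove(start)
        component, stack = {start}, [start]
        while stack:
            i = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_func, n_intervals, overlap, eps):
    """Return Mapper nodes (sets of point indices) and edges (pairs of nodes)."""
    values = [filter_func(p) for p in points]
    nodes = []
    for lo, hi in interval_cover(min(values), max(values), n_intervals, overlap):
        preimage = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(connected_clusters(preimage, points, eps))
    # Two nodes are joined whenever their clusters share at least one point.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


# Sixteen points on the boundary of a square: a loop-shaped data set.
square = [(x, y) for x in range(5) for y in range(5)
          if x in (0, 4) or y in (0, 4)]
nodes, edges = mapper_graph(square, filter_func=lambda p: p[0],
                            n_intervals=3, overlap=0.6, eps=1.5)
print(len(nodes), edges)  # 4 [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The resulting graph is a cycle on four nodes, which mirrors the loop in the data. The middle interval contributes two nodes because the top and bottom sides of the square are too far apart to be clustered together.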


Tutorials and examples

We provide a number of tutorials and examples, which offer:

  • quick start guides to the API;

  • in-depth examples showcasing more of the library’s features;

  • intuitive explanations of topological techniques.

Use cases

A selection of use cases for giotto-tda is collected on this page. The related repositories can be found on GitHub.

What’s new

Major Features and Improvements

  • The documentation for gtda.mapper.utils.decorators.method_to_transform has been improved.

  • A table of contents has been added to the theory glossary.

  • The theory glossary has been restructured by including a section titled “Analysis”. Entries for l^p norms, L^p norms and heat vectorization have been added.

  • The project’s Azure CI for Windows versions has been sped up by ensuring that the locally installed boost version is detected.

  • Several Python bindings to external code from GUDHI, ripser.py and Hera have been made public: specifically, from gtda.externals import * now gives power users access to:

    • bottleneck_distance,

    • wasserstein_distance,

    • ripser,

    • SparseRipsComplex,

    • CechComplex,

    • CubicalComplex,

    • PeriodicCubicalComplex,

    • SimplexTree,

    • WitnessComplex,

    • StrongWitnessComplex.

    However, these functionalities are still undocumented.

  • The gtda.mapper.visualisation and gtda.mapper.utils._visualisation modules have been thoroughly refactored to improve code clarity, add functionality, change behaviour and fix bugs. Specifically, in figures generated by both plot_static_mapper_graph and plot_interactive_mapper_graph:

    • The colorbar no longer shows values rescaled to the interval [0, 1]. Instead, it always shows the true range of node summary statistics.

    • The values of the node summary statistics are now displayed in the hovertext boxes. A new keyword argument n_sig_figs controls their rounding (3 is the default).

    • plotly_kwargs has been renamed to plotly_params (see “Backwards-Incompatible Changes” below).

    • The dependency on matplotlib’s rgb2hex and get_cmap functions has been removed. As no other component in giotto-tda required matplotlib, the dependency on this library has been removed completely.

    • A node_scale keyword argument has been added which can be used to control the size of nodes (see “Backwards-Incompatible Changes” below).

    • The overall look of Mapper graphs has been improved by increasing the opacity of node colors so that edges do not hide them, and by reducing the thickness of marker lines.

    Furthermore, a clone_pipeline keyword argument has been added to plot_interactive_mapper_graph, which when set to False allows the user to mutate the input pipeline via the interactive widget.

  • The docstrings of plot_static_mapper_graph, plot_interactive_mapper_graph and make_mapper_pipeline have been improved.

Bug Fixes

  • A CI bug introduced by an update to the Xcode compiler installed on the Azure Mac machines has been fixed.

  • A bug afflicting Mapper colors, which was due to an incorrect rescaling to [0, 1], has been fixed.

Backwards-Incompatible Changes

  • The keyword parameter plotly_kwargs in plot_static_mapper_graph and plot_interactive_mapper_graph has been renamed to plotly_params and now has slightly different specifications. A new logic controls how the information contained in plotly_params is used to update plotly figures.

  • The function get_node_sizeref in gtda.mapper.utils.visualization has been hidden by renaming it to _get_node_sizeref. Its main intended use is subsumed by the new node_scale parameter of plot_static_mapper_graph and plot_interactive_mapper_graph.

Thanks to our Contributors

This release contains contributions from many people:

Umberto Lupo, Julian Burella Pérez, Anibal Medina-Mardones, Wojciech Reise and Guillaume Tauzin.

We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.