A high-performance topological machine learning toolbox in Python
giotto-tda is a high-performance topological machine learning toolbox in Python built on top of
scikit-learn and is distributed under the GNU AGPLv3 license. It is part of the Giotto family of open-source projects.
- Seamless integration with scikit-learn: Strictly adhere to the scikit-learn API and development guidelines, inherit the strengths of that framework.
- Code modularity: Topological feature creation steps as transformers. Allow for the creation of a large number of topologically-powered machine learning pipelines.
- Standardisation: Implement the most successful techniques from the literature into a generic framework with a consistent API.
- Innovation: Improve on existing algorithms, and make new ones available in open source.
- Performance: For the most demanding computations, fall back to state-of-the-art C++ implementations, bound efficiently to Python. Vectorized code and implements multi-core parallelism (with
- Data structures: Support for tabular data, time series, graphs, and images.
30s guide to
To install giotto-tda, see the installation instructions.
The functionalities of
giotto-tda are provided in
This allows you to generate topological features from your data in a familiar way. Here is an example with the VietorisRipsPersistence transformer:
from gtda.homology import VietorisRipsPersistence
VR = VietorisRipsPersistence()
diagrams = VR.fit_transform(point_clouds)
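To give a feel for what a persistence diagram records, the sketch below computes only the 0-dimensional part of Vietoris-Rips persistence for a single small point cloud, in pure Python. It relies on the standard fact that, in dimension 0, every connected component is born at scale 0 and dies at the length of the minimum-spanning-tree edge that merges it into another component. The function name `h0_persistence` is hypothetical, the component that never dies is simply omitted (an assumption of this sketch), and nothing here reflects how giotto-tda computes diagrams internally.

```python
import math


def h0_persistence(points):
    """Finite (birth, death) pairs of 0-dimensional Vietoris-Rips persistence.

    Hypothetical helper for illustration only; not part of giotto-tda.
    """
    n = len(points)
    # All pairwise edges sorted by Euclidean length (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    return pairs


# A unit square plus one far-away point.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 0.0)]
pairs = h0_persistence(cloud)
print(pairs)
```

The three short bars come from the square's corners merging at scale 1, while the long bar records the isolated point joining the rest at scale 4.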
A plotting API allows for quick visual inspection of the outputs of many of
giotto-tda’s transformers. To visualize the i-th output sample, run
VR.plot(diagrams, sample=i)
You can create scalar or vector features from persistence diagrams using
giotto-tda’s dedicated transformers. Here is an example with the PersistenceEntropy transformer:
from gtda.diagrams import PersistenceEntropy
PE = PersistenceEntropy()
features = PE.fit_transform(diagrams)
features is a two-dimensional
numpy array. This is important to making this type of topological feature generation fit into a typical machine learning workflow from
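As a rough illustration of how a diagram can be reduced to one number per homology dimension, the sketch below computes a persistence entropy in pure Python: the Shannon entropy of the bar lifetimes after normalising them to sum to one. The helper name `persistence_entropy`, the choice of base 2, and the nested-list layout of `samples` are assumptions made for this example; giotto-tda's own conventions may differ.

```python
import math


def persistence_entropy(pairs):
    """Shannon entropy (base 2) of the normalised lifetimes of one diagram.

    Hypothetical helper for illustration only; not part of giotto-tda.
    """
    # Ignore points on the diagonal, which have zero lifetime.
    lifetimes = [death - birth for birth, death in pairs if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return 0.0
    return -sum((life / total) * math.log2(life / total) for life in lifetimes)


# Two samples, each holding one list of (birth, death) pairs per homology dimension.
samples = [
    [[(0.0, 1.0), (0.0, 1.0)],
     [(0.5, 1.5), (0.5, 1.5), (1.0, 2.0), (1.0, 2.0)]],
    [[(0.0, 2.0), (0.0, 1.0), (0.0, 1.0)],
     [(1.0, 1.0)]],
]

# One row per sample, one column per homology dimension.
toy_features = [[persistence_entropy(dim) for dim in sample] for sample in samples]
print(toy_features)
```

The result has one row per sample and one column per homology dimension, which is the two-dimensional layout that downstream estimators expect.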
In particular, topological feature creation steps can be fed to or used alongside models from
scikit-learn, creating end-to-end pipelines which can be evaluated in cross-validation,
optimised via grid-searches, etc.:
from sklearn.ensemble import RandomForestClassifier
from gtda.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(point_clouds, labels)
RFC = RandomForestClassifier()
model = make_pipeline(VR, PE, RFC)
model.fit(X_train, y_train)
model.score(X_valid, y_valid)
giotto-tda also implements the Mapper algorithm as a highly customisable
Pipeline, and provides simple plotting functions
for visualizing output Mapper graphs and interacting in real time with the pipeline parameters:
from gtda.mapper import make_mapper_pipeline, plot_interactive_mapper_graph
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

pipe = make_mapper_pipeline(filter_func=PCA(), clusterer=DBSCAN())
plot_interactive_mapper_graph(pipe, data)
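For readers new to Mapper, the following pure-Python sketch walks through the usual steps on a toy example: apply a filter function, cover its range with overlapping intervals, cluster the points falling in each interval, and connect clusters that share points. Everything in it is an assumption made for illustration: the helper names `clusters_within` and `mapper_graph`, the use of the x-coordinate as filter, and a simple distance-threshold clustering standing in for DBSCAN. It is not giotto-tda's implementation.

```python
import math


def clusters_within(points, members, eps):
    """Group `members` into connected components, linking points at distance <= eps."""
    remaining = set(members)
    clusters = []
    while remaining:
        seed = min(remaining)  # deterministic choice of starting point
        remaining.remove(seed)
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_values, n_intervals=3, overlap=0.3, eps=0.6):
    """Return (nodes, edges) of a toy Mapper graph for a 1D filter."""
    lo, hi = min(filter_values), max(filter_values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened on both sides so that neighbours overlap.
        start = lo + k * width - overlap * width
        stop = lo + (k + 1) * width + overlap * width
        members = [i for i, f in enumerate(filter_values) if start <= f <= stop]
        nodes.extend(clusters_within(points, members, eps))
    # Two nodes are joined whenever their clusters share at least one point.
    edges = [(u, v)
             for u in range(len(nodes))
             for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]]
    return nodes, edges


# Twelve points on the unit circle, filtered by their x-coordinate.
circle = [(math.cos(math.radians(30 * k)), math.sin(math.radians(30 * k)))
          for k in range(12)]
nodes, edges = mapper_graph(circle, [x for x, _ in circle])
print(len(nodes), edges)
```

On this input the graph has four nodes joined in a single cycle, which mirrors the loop in the underlying circle.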
Tutorials and examples
We provide a number of tutorials and examples, which offer:
- quick start guides to the API;
- in-depth examples showcasing more of the library’s features;
- intuitive explanations of topological techniques.
Major Features and Improvements
This is a major release which substantially broadens the scope of
giotto-tda and introduces several improvements.
The library’s documentation has been greatly improved and is now hosted via GitHub Pages.
It includes rendered Jupyter notebooks from the repository’s
examples folder, as well as an improved theory glossary,
more detailed installation instructions, improved guidelines for contributing, and an FAQ.
Plotting functions and plotting API
This version introduces built-in plotting capabilities to
giotto-tda. These come in the form of:
- a plotting subpackage populated with plotting functions for common data structures;
- PlotterMixin and a class-level plotting API based on newly introduced
fit_transform_plot methods which are now available in several of
Changes and additions to
The internal structure of this subpackage has been changed.
ConsistentRescaling has been moved to a new
subpackage (see below), and
gtda.homology no longer contains a
point_clouds submodule. Instead, it contains two
simplicial contains the
VietorisRipsPersistence class as well as the
following new classes:
cubical submodule contains
CubicalPersistence, a new class for computing persistent homology of filtered cubical
complexes such as those coming from 2D or 3D greyscale images.
gtda.images subpackage contains classes which, together with
the capabilities of
giotto-tda to computer vision, by handling input representing binary or greyscale 2D/3D images
represented as arrays.
The classes in
gtda.images.filtrations are responsible for converting binary image input into greyscale images in a
variety of ways. The greyscale output can then be fed to
gtda.homology.CubicalPersistence to extract topological
signatures in the form of persistence diagrams. These classes are:
The classes in
gtda.images.preprocessing perform a variety of preprocessing steps on either binary or greyscale image
input, as well as conversion to point cloud format. They are:
ConsistentRescaling is no longer placed in
gtda.homology. Instead, it is now in a
containing classes which process or modify the geometry of point cloud data.
gtda.point_clouds also contains the new
ConsecutiveRescaling, written with time series applications in mind.
List of point cloud input
All classes in the
homology subpackage (
can now take as inputs to the
transform methods lists of 2D arrays instead of simply 3D arrays. In this
way, collections of point clouds with varying numbers of points can be processed.
Changes and additions to
The diagrams subpackage contains the following new classes:
Additionally, the subpackage has been reorganised as follows:
The features submodule now only contains the scalar feature generation classes
Amplitude (moved there from
Classes which produce vector representations from persistence diagrams have been moved to the new
Changes and additions to
- validate_params has been thoroughly refactored, documented and exposed for the benefit of developers.
- check_diagrams has been modified, documented and exposed for the benefit of developers.
- check_point_clouds performs validation of inputs consisting of collections of point clouds or distance matrices. It accepts both lists of 2D ndarrays and 3D ndarrays, and is used in the
transform methods of classes in
gtda.homology.simplicial to allow for list input (see above).
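To illustrate why accepting both input layouts is useful, here is a minimal sketch of a validator that normalises either a 3D array or a list of 2D arrays into a list of 2D arrays. The function name `as_point_cloud_list` and its error messages are hypothetical, and the checks are deliberately minimal; this is not the implementation of check_point_clouds.

```python
import numpy as np


def as_point_cloud_list(X):
    """Return X as a list of 2D float arrays, or raise ValueError.

    Hypothetical helper for illustration only; not part of giotto-tda.
    """
    if isinstance(X, np.ndarray):
        # A single array must stack equally sized clouds along the first axis.
        if X.ndim != 3:
            raise ValueError(f"Array input must be 3D, got {X.ndim}D.")
        clouds = [np.asarray(x, dtype=float) for x in X]
    else:
        # A list may hold clouds with different numbers of points.
        clouds = [np.asarray(x, dtype=float) for x in X]
    for i, cloud in enumerate(clouds):
        if cloud.ndim != 2:
            raise ValueError(f"Entry {i} must be 2D, got {cloud.ndim}D.")
    return clouds


ragged = [np.zeros((20, 3)), np.zeros((35, 3))]  # varying numbers of points
stacked = np.zeros((4, 10, 3))                   # 4 clouds of 10 points each

print([cloud.shape for cloud in as_point_cloud_list(ragged)])
print(len(as_point_cloud_list(stacked)))
```

The `ragged` collection could not be stored as a single 3D array, since its two clouds have different numbers of points, which is the case that list input is meant to cover.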
External modules and HPC improvements
A substantial effort has been put into improving the quality of the high-performance components contained in
The end result is a cleaner packaging as well as faster execution of C++ functions due to improved bindings. In particular:
- Two binaries are now shipped for ripser, one of them being optimised for calculations with mod 2 coefficients.
- Recent improvements by the authors of the hera C++ library have been integrated in
- Compiler optimisations for Windows-based systems have been added.
- The integration of pybind11 has been improved and several issues arising with boost during developer installations have been addressed.
Fixed a bug with
TakensEmbedding’s algorithm for search of optimal parameters.
Inconsistencies between the meaning of “bottleneck amplitude” in the theory and in the code have been ironed out. The code has been modified to agree with the theory glossary. The outputs of the
Fixed bugs affecting color normalization in Mapper graph plots.
Python 3.5 is no longer supported.
Mac OS X versions below 10.14 are no longer supported by the wheels shipped via PyPI.
ConsistentRescaling is no longer found in gtda.homology and is now part of
The outputs of the
Filtering have changed due to sqrt(2) factors (see Bug Fixes).
The meta_transformers module has been removed.
The plotting module has been removed from the examples folder of the repository.
Thanks to our Contributors
This release contains contributions from many people:
Umberto Lupo, Guillaume Tauzin, Wojciech Reise, Julian Burella Pérez, Roman Yurchak, Lewis Tunstall, Anibal Medina-Mardones, and Adélie Garin.
We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.