Topological feature extraction using VietorisRipsPersistence and PersistenceEntropy
In this notebook, we showcase the ease of use of one of the core components of giotto-tda: VietorisRipsPersistence, along with vectorisation methods. We first list the steps in a typical topological feature extraction routine and then show how to encapsulate them with a standard scikit-learn–like pipeline.
If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub.
License: AGPLv3
Import libraries
from gtda.diagrams import PersistenceEntropy
from gtda.homology import VietorisRipsPersistence
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
Generate data
Let’s begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, which displaces the points sampling the surfaces by a random amount in a random direction. Note: you will need the auxiliary module datasets.py to run this cell.
from datasets import generate_point_clouds
point_clouds, labels = generate_point_clouds(100, 10, 0.1)
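If you do not have datasets.py to hand, the following is a minimal, hypothetical sketch of how labelled point clouds of this kind could be generated with NumPy. The function name, signature and sampling details here are assumptions for illustration only and do not reproduce the actual auxiliary module.
import numpy as np

def generate_point_clouds_sketch(n_clouds_per_class, n_points, noise, seed=0):
    # Hypothetical stand-in for datasets.generate_point_clouds (assumed API)
    rng = np.random.default_rng(seed)
    point_clouds, labels = [], []
    for label in (0, 1):
        for _ in range(n_clouds_per_class):
            if label == 0:
                # Sphere (label 0): normalise Gaussian samples onto the unit sphere
                pts = rng.normal(size=(n_points, 3))
                pts /= np.linalg.norm(pts, axis=1, keepdims=True)
            else:
                # Torus (label 1): major radius 2, minor radius 1
                theta = rng.uniform(0, 2 * np.pi, n_points)
                phi = rng.uniform(0, 2 * np.pi, n_points)
                pts = np.stack([(2 + np.cos(phi)) * np.cos(theta),
                                (2 + np.cos(phi)) * np.sin(theta),
                                np.sin(phi)], axis=1)
            # Noise: displace each point by a random amount in a random direction
            pts = pts + noise * rng.normal(size=pts.shape)
            point_clouds.append(pts)
            labels.append(label)
    return np.asarray(point_clouds), np.asarray(labels)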
Calculate persistent homology
Instantiate a VietorisRipsPersistence transformer and calculate persistence diagrams for this collection of point clouds.
vietorisrips_tr = VietorisRipsPersistence()
diagrams = vietorisrips_tr.fit_transform(point_clouds)
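A quick sanity check on the output can be helpful here: fit_transform returns the diagrams for the whole collection stacked in a single three-dimensional ndarray, where each row of a diagram is a (birth, death, homology dimension) triple and diagrams are padded to a common length.
# Shape is (n_point_clouds, n_topological_features, 3); the last axis holds
# (birth, death, homology dimension) triples
print(diagrams.shape)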
Extract features
Instantiate a PersistenceEntropy transformer and extract features from the persistence diagrams.
entropy_tr = PersistenceEntropy()
features = entropy_tr.fit_transform(diagrams)
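With the default settings, PersistenceEntropy computes one entropy value per homology dimension for each diagram, so the result is an ordinary two-dimensional feature matrix that any scikit-learn estimator can consume.
# One persistence-entropy value per homology dimension, for each point cloud
print(features.shape)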
Use the new features in a standard classifier
Leverage the compatibility with scikit-learn to perform a train-test split and score the features.
X_train, X_valid, y_train, y_valid = train_test_split(features, labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_valid, y_valid)
0.96
Encapsulate the steps above in a pipeline
Subdivide the data into training and validation sets first, and then use the pipeline.
from gtda.pipeline import make_pipeline
Define the pipeline
Chain transformers from giotto-tda with scikit-learn ones.
steps = [VietorisRipsPersistence(),
         PersistenceEntropy(),
         RandomForestClassifier()]
pipeline = make_pipeline(*steps)
Prepare the data
Train-test split on the point-cloud data.
pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(
    point_clouds, labels)
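From here, the pipeline can be fit on the raw training point clouds and scored on the validation split just like any other scikit-learn estimator. The sketch below shows the typical usage; the exact score will vary with the random split and the noise level.
# Fit the full pipeline (diagrams -> entropy features -> classifier) on raw
# point clouds, then evaluate on the held-out split
pipeline.fit(pcs_train, labels_train)
pipeline.score(pcs_valid, labels_valid)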