Topological feature extraction using VietorisRipsPersistence and PersistenceEntropy
In this notebook, we showcase the ease of use of one of the core components of giotto-tda, VietorisRipsPersistence, along with vectorisation methods. We first list the steps in a typical topological feature extraction routine and then show how to encapsulate them in a standard scikit-learn-style pipeline.
If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub.
from gtda.diagrams import PersistenceEntropy
from gtda.homology import VietorisRipsPersistence
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
Let’s begin by generating 3D point clouds of spheres and tori, along with a label of 0 (1) for each sphere (torus). We also add noise to each point cloud, whose effect is to displace the points sampling the surfaces by a random amount in a random direction. Note: You will need the auxiliary module datasets.py to run this cell.
from datasets import generate_point_clouds
point_clouds, labels = generate_point_clouds(100, 10, 0.1)
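For readers without the auxiliary module at hand, the kind of data described above can be sketched with the standard library alone. This is not the implementation in datasets.py: the helper names (`sample_sphere`, `sample_torus`, `make_point_clouds`), the torus radii and the uniform per-coordinate noise are all assumptions made for illustration.

```python
import math
import random


def sample_sphere(n_points, noise, rng):
    """Sample noisy points from the unit sphere (hypothetical helper)."""
    points = []
    for _ in range(n_points):
        # Normalising a 3D Gaussian vector gives a uniformly random direction.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        # Displace each coordinate by a random amount in [-noise, noise].
        points.append([c / norm + rng.uniform(-noise, noise) for c in v])
    return points


def sample_torus(n_points, noise, rng, big_r=2.0, small_r=1.0):
    """Sample noisy points from a torus (hypothetical helper; radii are assumed)."""
    points = []
    for _ in range(n_points):
        # Uniform angles are simple but not uniform with respect to surface area.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x = (big_r + small_r * math.cos(phi)) * math.cos(theta)
        y = (big_r + small_r * math.cos(phi)) * math.sin(theta)
        z = small_r * math.sin(phi)
        points.append([c + rng.uniform(-noise, noise) for c in (x, y, z)])
    return points


def make_point_clouds(n_per_class, n_points, noise, seed=0):
    """Return (clouds, labels) with label 0 for spheres and 1 for tori."""
    rng = random.Random(seed)
    clouds, labels = [], []
    for _ in range(n_per_class):
        clouds.append(sample_sphere(n_points, noise, rng))
        labels.append(0)
    for _ in range(n_per_class):
        clouds.append(sample_torus(n_points, noise, rng))
        labels.append(1)
    return clouds, labels


clouds, labels = make_point_clouds(n_per_class=5, n_points=50, noise=0.1)
```

Each cloud here is a list of `[x, y, z]` points; converting the result with `numpy.asarray` would give an array of shape (number of clouds, number of points, 3).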
Calculate persistent homology
Instantiate a VietorisRipsPersistence transformer and calculate persistence diagrams for this collection of point clouds.
vietorisrips_tr = VietorisRipsPersistence()
diagrams = vietorisrips_tr.fit_transform(point_clouds)
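To give some intuition for what a persistence diagram records, the sketch below computes only the simplest part of it by hand: the scales at which connected components (homology dimension 0) merge in a Vietoris-Rips filtration. It assumes the filtration value of an edge is its Euclidean length, and relies on the fact that components merge exactly along minimum spanning tree edges. The function name `h0_death_times` is hypothetical, and this is not how the library computes diagrams; higher-dimensional features such as loops and voids need a full persistent homology algorithm.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Scales at which connected components merge (hypothetical helper).

    Every component is born at scale 0. One component never dies and is
    therefore not listed.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Three collinear points at 0, 1 and 3: merges happen at scales 1 and 2.
deaths = h0_death_times([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
```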
Instantiate a PersistenceEntropy transformer and extract features from the persistence diagrams.
entropy_tr = PersistenceEntropy()
features = entropy_tr.fit_transform(diagrams)
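As a rough illustration of the quantity being extracted, persistence entropy can be sketched as the Shannon entropy of the normalised lifetimes (death minus birth) of the points in a diagram, computed separately for each homology dimension. The function below is a hypothetical stand-alone helper, not the library's code. The base-2 logarithm and the value returned for an empty diagram are assumptions, so check the PersistenceEntropy documentation for the exact conventions.

```python
import math


def persistence_entropy(pairs):
    """Shannon entropy (base 2, assumed) of normalised lifetimes.

    `pairs` is a list of (birth, death) tuples for a single homology dimension.
    """
    lifetimes = [death - birth for birth, death in pairs if death > birth]
    total = sum(lifetimes)
    if total == 0:
        # Arbitrary choice for this sketch: an empty diagram has entropy 0.
        return 0.0
    probs = [lifetime / total for lifetime in lifetimes]
    return -sum(p * math.log2(p) for p in probs)


# Four equally long bars behave like a uniform distribution over 4 outcomes.
uniform_entropy = persistence_entropy([(0.0, 1.0)] * 4)

# One dominant bar plus short noisy bars gives a lower entropy.
skewed_entropy = persistence_entropy([(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)])
```

Under these assumptions, a diagram whose bars all have similar length scores high, while a diagram dominated by a few long-lived features scores low, which is what makes the value usable as a scalar feature.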
Use the new features in a standard classifier
Leverage the compatibility with scikit-learn to perform a train-test split and score the features.
X_train, X_valid, y_train, y_valid = train_test_split(features, labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)
model.score(X_valid, y_valid)
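For a scikit-learn classifier, `score` reports the mean accuracy on the data it is given, that is, the fraction of validation labels predicted correctly. A minimal sketch of that computation, with a hypothetical helper name and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels only: three of the four predictions are correct.
example_score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```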
Encapsulate the steps above in a pipeline
Split the data into training and validation sets first, then fit and score the pipeline.
from gtda.pipeline import make_pipeline
Define the pipeline
Chain transformers from giotto-tda with scikit-learn ones.
steps = [VietorisRipsPersistence(), PersistenceEntropy(), RandomForestClassifier()]
pipeline = make_pipeline(*steps)
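Conceptually, a pipeline of this kind fits each intermediate transformer, passes its output to the next step, and fits the final estimator on the last transformed output. At prediction time it applies the already fitted transformers in the same order. The toy classes below (`Scale`, `SumRows`, `ThresholdClassifier`, `ToyPipeline`) are hypothetical and only sketch this chaining. They are not the scikit-learn or giotto-tda implementation.

```python
class Scale:
    """Toy transformer: multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[self.factor * v for v in row] for row in X]


class SumRows:
    """Toy transformer: reduces each row to a single summed feature."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[sum(row)] for row in X]


class ThresholdClassifier:
    """Toy estimator: predicts 1 when the feature exceeds the training mean."""

    def fit(self, X, y):
        values = [row[0] for row in X]
        self.threshold_ = sum(values) / len(values)
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold_) for row in X]


class ToyPipeline:
    """Chains transformers and a final estimator (conceptual sketch only)."""

    def __init__(self, *steps):
        self.transformers = steps[:-1]
        self.estimator = steps[-1]

    def fit(self, X, y):
        # Fit each transformer on the output of the previous one.
        for step in self.transformers:
            X = step.fit(X, y).transform(X)
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # Reuse the fitted transformers; nothing is refitted here.
        for step in self.transformers:
            X = step.transform(X)
        return self.estimator.predict(X)


toy = ToyPipeline(Scale(2), SumRows(), ThresholdClassifier())
toy.fit([[1, 1], [5, 5]], [0, 1])
predictions = toy.predict([[0, 1], [4, 4]])
```

Because the transformers are fitted inside `fit`, only the training data influences them, which is why the raw inputs are split before the pipeline is used.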
Prepare the data
Perform the train-test split on the point-cloud data.
pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(
    point_clouds, labels)
Train and score
pipeline.fit(pcs_train, labels_train)
pipeline.score(pcs_valid, labels_valid)