Classifying handwritten digits

This notebook shows how giotto-tda can be used to generate topological features for image classification. We’ll be using the famous MNIST dataset, which contains images of handwritten digits and is a standard benchmark for testing new classification algorithms.

Figure 1: A few digits from the MNIST dataset. Figure reference: en.wikipedia.org/wiki/MNIST_database.

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub.

Useful references

License: AGPLv3

Load the MNIST dataset

To get started, let’s fetch the MNIST dataset using one of scikit-learn’s helper functions:

from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True)

By looking at the shapes of these arrays

print(f"X shape: {X.shape}, y shape: {y.shape}")
X shape: (70000, 784), y shape: (70000,)

we see that there are 70,000 images, where each image has 784 features that represent pixel intensity. Let’s reshape the feature vector to a 28x28 array and visualise one of the “8” digits using giotto-tda’s plotting API:

import numpy as np
from gtda.plotting import plot_heatmap

im8_idx = np.flatnonzero(y == "8")[0]
img8 = X[im8_idx].reshape(28, 28)
plot_heatmap(img8)

Create train and test sets

For this example, we will work with a small subset of images – to run a full-blown analysis simply change the values of train_size and test_size below:

from sklearn.model_selection import train_test_split

train_size, test_size = 60, 10

# Reshape to (n_samples, n_pixels_x, n_pixels_y)
X = X.reshape((-1, 28, 28))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_size, test_size=test_size, stratify=y, random_state=666
)

print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")
X_train shape: (60, 28, 28), y_train shape: (60,)
X_test shape: (10, 28, 28), y_test shape: (10,)

From pixels to topological features

As shown in the figure below, several steps are required to extract topological features from an image. Since our images are made of pixels, it is convenient to use filtrations of cubical complexes instead of simplicial ones. Let’s go through each of these steps for a single “8” digit using giotto-tda!

Figure 2: An example of a topological feature extraction pipeline. Figure reference: arXiv:1910.08345.

Binarize the image

In giotto-tda, filtrations of cubical complexes are built from binary images consisting of only black and white pixels. We can convert our greyscale image to binary by applying a threshold on each pixel value via the Binarizer transformer:

from gtda.images import Binarizer

# Pick out index of first 8 image
im8_idx = np.flatnonzero(y_train == "8")[0]
# Reshape to (n_samples, n_pixels_x, n_pixels_y) format
im8 = X_train[im8_idx][None, :, :]

binarizer = Binarizer(threshold=0.4)
im8_binarized = binarizer.fit_transform(im8)

binarizer.plot(im8_binarized)

From binary image to filtration

Now that we have a binary image \(\mathcal{B}\) of our “8” digit, we can build a wide variety of different filtrations – see the giotto-tda docs for a full list. For our example, we’ll use the radial filtration \(\mathcal{R}\), which assigns to each pixel \(p\) a value corresponding to its distance from a predefined center \(c\) of the image

\[\begin{split} \mathcal{R}(p) = \left\{ \begin{array}{cl} \lVert c - p \rVert_2 &\mbox{if } \mathcal{B}(p)=1 \\ \mathcal{R}_\infty &\mbox{if } \mathcal{B}(p)=0 \end{array} \right.\end{split}\]

where \(\mathcal{R}_\infty\) is the distance of the pixel that is furthest from \(c\). To reproduce the filtered image from the MNIST article, we’ll pick \(c = (20,6)\):

from gtda.images import RadialFiltration

radial_filtration = RadialFiltration(center=np.array([20, 6]))
im8_filtration = radial_filtration.fit_transform(im8_binarized)

radial_filtration.plot(im8_filtration, colorscale="jet")

We can see from the resulting plot that we’ve effectively transformed our binary image into a greyscale one, where the pixel values increase as we move from the upper-right to bottom-left of the image! These pixel values can be used to define a filtration of cubical complexes \(\{K_i\}_{i\in \mathrm{Im}(I)}\), where \(K_i\) contains all pixels with value less than the \(i\)th smallest pixel value in the greyscale image. In other words, \(K_i\) is the \(i\)th sublevel set of the image’s cubical complex \(K\).

From filtration to persistence diagram

Given a greyscale filtration it is straightforward to calculate the corresponding persistence diagram. In giotto-tda we make use of the CubicalPersistence transformer which is the cubical analogue to simplicial transformers like VietorisRipsPersistence:

from gtda.homology import CubicalPersistence

cubical_persistence = CubicalPersistence(n_jobs=-1)
im8_cubical = cubical_persistence.fit_transform(im8_filtration)

cubical_persistence.plot(im8_cubical)

It works! We can clearly see two persistent \(H_1\) generators corresponding to the loops in the digit “8”, along with a single \(H_0\) generator corresponding to the connected components.

As a postprocessing step, it is often convenient to rescale the persistence diagrams which can be achieved in giotto-tda as follows:

from gtda.diagrams import Scaler

scaler = Scaler()
im8_scaled = scaler.fit_transform(im8_cubical)

scaler.plot(im8_scaled)

From persistence diagram to representation

The final step is to define a vectorial representation of the persistence diagram that can be used to obtain machine learning features. Following our example from Figure 2, we convolve our persistence diagram with a Gaussian kernel and symmetrize along the main diagonal, a procedure achieved via the `HeatKernel <https://giotto-ai.github.io/gtda-docs/latest/modules/generated/diagrams/representations/gtda.diagrams.HeatKernel.html#gtda.diagrams.HeatKernel>`__ transformer:

from gtda.diagrams import HeatKernel

heat = HeatKernel(sigma=.15, n_bins=60, n_jobs=-1)
im8_heat = heat.fit_transform(im8_scaled)

# Visualise the heat kernel for H1
heat.plot(im8_heat, homology_dimension_idx=1, colorscale='jet')

Combining all steps as a single pipeline

We’ve now seen how each step in Figure 2 is implemented in giotto-tda – let’s combine them as a single scikit-learn pipeline:

from sklearn.pipeline import Pipeline
from gtda.diagrams import Amplitude

steps = [
    ("binarizer", Binarizer(threshold=0.4)),
    ("filtration", RadialFiltration(center=np.array([20, 6]))),
    ("diagram", CubicalPersistence()),
    ("rescaling", Scaler()),
    ("amplitude", Amplitude(metric="heat", metric_params={'sigma':0.15, 'n_bins':60}))
]

heat_pipeline = Pipeline(steps)
im8_pipeline = heat_pipeline.fit_transform(im8)
im8_pipeline
array([[0.57234335, 2.49781365]])

In the final step we’ve used the `Amplitude <https://giotto-ai.github.io/gtda-docs/latest/modules/generated/diagrams/features/gtda.diagrams.Amplitude.html>`__ transformer to “vectorize” the persistence diagram via the heat kernel method above. In our example, this produces a vector of amplitudes \(\mathbf{a} = (a_0, a_1)\) where each amplitude \(a_i\) corresponds to a given homology dimension in the persistence diagram. By extracting these feature vectors from each image, we can feed them into a machine learning classifier – let’s tackle this in the next section!

Building a full-blown feature extraction pipeline

Now that we’ve seen how to extract topological features for a single image, let’s make it more realistic and extract a wide variety of features over the whole training set. The resulting pipeline resembles the figure below, where different filtrations and vectorizations of persistence diagrams can be concatenated to produce informative feature vectors.

Figure 3: A full-blown topological feature extraction pipeline

To keep things simple, we’ll augment our radial filtration with a height filtration \(\mathcal{H}\), defined by choosing a unit vector \(v \in \mathbb{R}^2\) in some direction and assigning values \(\mathcal{H}(p) = \langle p, v \rangle\) based on the distance of \(p\) to the hyperplane defined by \(v\). Following the article by Garin and Tauzin, we’ll pick a uniform set of directions and centers for our filtrations as shown in the figure below.

We’ll also generate features from persistence diagrams by using persistence entropy and a broad set of amplitudes. Putting it all together yields the following pipeline

from sklearn.pipeline import make_pipeline, make_union
from gtda.diagrams import PersistenceEntropy
from gtda.images import HeightFiltration

direction_list = [[1, 0], [1, 1], [0, 1], [-1, 1], [-1, 0], [-1, -1], [0, -1], [1, -1]]

center_list = [
    [13, 6],
    [6, 13],
    [13, 13],
    [20, 13],
    [13, 20],
    [6, 6],
    [6, 20],
    [20, 6],
    [20, 20],
]

# Creating a list of all filtration transformer, we will be applying
filtration_list = (
    [
        HeightFiltration(direction=np.array(direction), n_jobs=-1)
        for direction in direction_list
    ]
    + [RadialFiltration(center=np.array(center), n_jobs=-1) for center in center_list]
)

# Creating the diagram generation pipeline
diagram_steps = [
    [
        Binarizer(threshold=0.4, n_jobs=-1),
        filtration,
        CubicalPersistence(n_jobs=-1),
        Scaler(n_jobs=-1),
    ]
    for filtration in filtration_list
]

# Listing all metrics we want to use to extract diagram amplitudes
metric_list = [
    {"metric": "bottleneck", "metric_params": {}},
    {"metric": "wasserstein", "metric_params": {"p": 1}},
    {"metric": "wasserstein", "metric_params": {"p": 2}},
    {"metric": "landscape", "metric_params": {"p": 1, "n_layers": 1, "n_bins": 100}},
    {"metric": "landscape", "metric_params": {"p": 1, "n_layers": 2, "n_bins": 100}},
    {"metric": "landscape", "metric_params": {"p": 2, "n_layers": 1, "n_bins": 100}},
    {"metric": "landscape", "metric_params": {"p": 2, "n_layers": 2, "n_bins": 100}},
    {"metric": "betti", "metric_params": {"p": 1, "n_bins": 100}},
    {"metric": "betti", "metric_params": {"p": 2, "n_bins": 100}},
    {"metric": "heat", "metric_params": {"p": 1, "sigma": 1.6, "n_bins": 100}},
    {"metric": "heat", "metric_params": {"p": 1, "sigma": 3.2, "n_bins": 100}},
    {"metric": "heat", "metric_params": {"p": 2, "sigma": 1.6, "n_bins": 100}},
    {"metric": "heat", "metric_params": {"p": 2, "sigma": 3.2, "n_bins": 100}},
]

#
feature_union = make_union(
    *[PersistenceEntropy(nan_fill_value=-1)]
    + [Amplitude(**metric, n_jobs=-1) for metric in metric_list]
)

tda_union = make_union(
    *[make_pipeline(*diagram_step, feature_union) for diagram_step in diagram_steps],
    n_jobs=-1
)

which can be visualised using scikit-learn’s nifty HTML feature:

from sklearn import set_config
set_config(display='diagram')

tda_union
FeatureUnion(n_jobs=-1,
             transformer_list=[('pipeline-1',
                                Pipeline(steps=[('binarizer',
                                                 Binarizer(n_jobs=-1,
                                                           threshold=0.4)),
                                                ('heightfiltration',
                                                 HeightFiltration(direction=array([1, 0]),
                                                                  n_jobs=-1)),
                                                ('cubicalpersistence',
                                                 CubicalPersistence(n_jobs=-1)),
                                                ('scaler', Scaler(n_jobs=-1)),
                                                ('featureunion',
                                                 FeatureUnion(transformer_list=[('persistenceentropy',
                                                                                 Persist...
                                                                                           metric_params={'n_bins': 100,
                                                                                                          'p': 1,
                                                                                                          'sigma': 1.6},
                                                                                           n_jobs=-1)),
                                                                                ('amplitude-11',
                                                                                 Amplitude(metric='heat',
                                                                                           metric_params={'n_bins': 100,
                                                                                                          'p': 1,
                                                                                                          'sigma': 3.2},
                                                                                           n_jobs=-1)),
                                                                                ('amplitude-12',
                                                                                 Amplitude(metric='heat',
                                                                                           metric_params={'n_bins': 100,
                                                                                                          'p': 2,
                                                                                                          'sigma': 1.6},
                                                                                           n_jobs=-1)),
                                                                                ('amplitude-13',
                                                                                 Amplitude(metric='heat',
                                                                                           metric_params={'n_bins': 100,
                                                                                                          'p': 2,
                                                                                                          'sigma': 3.2},
                                                                                           n_jobs=-1))]))]))])
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([1, 0]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([1, 1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([0, 1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([-1,  1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([-1,  0]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([-1, -1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([ 0, -1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
HeightFiltration(direction=array([ 1, -1]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([13,  6]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([ 6, 13]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([13, 13]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([20, 13]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([13, 20]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([6, 6]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([ 6, 20]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([20,  6]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)
Binarizer(n_jobs=-1, threshold=0.4)
RadialFiltration(center=array([20, 20]), n_jobs=-1)
CubicalPersistence(n_jobs=-1)
Scaler(n_jobs=-1)
FeatureUnion(transformer_list=[('persistenceentropy',
                                PersistenceEntropy(nan_fill_value=-1)),
                               ('amplitude-1',
                                Amplitude(metric='bottleneck', metric_params={},
                                          n_jobs=-1)),
                               ('amplitude-2',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 1}, n_jobs=-1)),
                               ('amplitude-3',
                                Amplitude(metric='wasserstein',
                                          metric_params={'p': 2}, n_jobs=-1)),
                               ('amplitude-4',
                                Amplitude...
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-11',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 1,
                                                         'sigma': 3.2},
                                          n_jobs=-1)),
                               ('amplitude-12',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 1.6},
                                          n_jobs=-1)),
                               ('amplitude-13',
                                Amplitude(metric='heat',
                                          metric_params={'n_bins': 100, 'p': 2,
                                                         'sigma': 3.2},
                                          n_jobs=-1))])
PersistenceEntropy(nan_fill_value=-1)
Amplitude(metric='bottleneck', metric_params={}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 1}, n_jobs=-1)
Amplitude(metric='wasserstein', metric_params={'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 1}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 1, 'p': 2}, n_jobs=-1)
Amplitude(metric_params={'n_bins': 100, 'n_layers': 2, 'p': 2}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 1}, n_jobs=-1)
Amplitude(metric='betti', metric_params={'n_bins': 100, 'p': 2}, n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 1, 'sigma': 3.2},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 1.6},
          n_jobs=-1)
Amplitude(metric='heat', metric_params={'n_bins': 100, 'p': 2, 'sigma': 3.2},
          n_jobs=-1)

It’s now a simple matter to run the whole pipeline:

X_train_tda = tda_union.fit_transform(X_train)
X_train_tda.shape
(60, 476)

Training a classifier

We see we have generated \((8 + 9) \times 2 \times 14 = 476\) topological features per image! In general, some of these features will be highly correlated and a feature selection procedure could be used to select the most informative ones. Nevertheless, let’s train a Random Forest classifier on our training set to see what kind of performance we can get:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(X_train_tda, y_train)

X_test_tda = tda_union.transform(X_test)
rf.score(X_test_tda, y_test)
0.6

For such a small dataset, this accuracy is not too bad but accuracies above 96% can be achieved by training on the full MNIST dataset together with feature selection strategies.

Using hyperparameter search with topological pipelines

In the above pipeline, we can think of our choices for the directions and centers of the filtrations as hyperparameter. To wrap up our analysis, let’s see how we can run a hyperparameter search over the directions of the height filtration. We’ll use a simplified pipeline to show the main steps, but note that a realistic application would involve running the search over a pipeline like the one in the previous section.

As usual, we define our pipeline in terms of topological transformers and an estimator as the final step:

height_pipeline = Pipeline([
    ('binarizer', Binarizer(threshold=0.4)),
    ('filtration', HeightFiltration()),
    ('diagram', CubicalPersistence()),
    ('feature', PersistenceEntropy(nan_fill_value=-1)),
    ('classifier', RandomForestClassifier(random_state=42))
])

Next we can search for the best combination of directions, homology dimensions, and number of trees in our Random Forest as follows:

from sklearn.model_selection import GridSearchCV

direction_list = [[1, 0], [1, 1], [0, 1], [-1, 1], [-1, 0], [-1, -1], [0, -1], [1, -1]]
homology_dimensions_list = [[0], [1]]
n_estimators_list = [500, 1000, 2000]

param_grid = {
    "filtration__direction": [np.array(direction) for direction in direction_list],
    "diagram__homology_dimensions": [
        homology_dimensions for homology_dimensions in homology_dimensions_list
    ],
    "classifier__n_estimators": [n_estimators for n_estimators in n_estimators_list],
}

grid_search = GridSearchCV(
    estimator=height_pipeline, param_grid=param_grid, cv=3, n_jobs=-1
)

grid_search.fit(X_train, y_train)
GridSearchCV(cv=3,
             estimator=Pipeline(steps=[('binarizer', Binarizer(threshold=0.4)),
                                       ('filtration', HeightFiltration()),
                                       ('diagram', CubicalPersistence()),
                                       ('feature',
                                        PersistenceEntropy(nan_fill_value=-1)),
                                       ('classifier',
                                        RandomForestClassifier(random_state=42))]),
             n_jobs=-1,
             param_grid={'classifier__n_estimators': [500, 1000, 2000],
                         'diagram__homology_dimensions': [[0], [1]],
                         'filtration__direction': [array([1, 0]), array([1, 1]),
                                                   array([0, 1]),
                                                   array([-1,  1]),
                                                   array([-1,  0]),
                                                   array([-1, -1]),
                                                   array([ 0, -1]),
                                                   array([ 1, -1])]})
Binarizer(threshold=0.4)
HeightFiltration()
CubicalPersistence()
PersistenceEntropy(nan_fill_value=-1)
RandomForestClassifier(random_state=42)

By looking at the best hyperparameters

grid_search.best_params_
{'classifier__n_estimators': 500,
 'diagram__homology_dimensions': [0],
 'filtration__direction': array([1, 0])}

we see that the direction [1, 0] with homology dimension 0 produces the best features. By comparing say a “6” and “9” digit, can you think of a reason why this might be the case?