Plotting in giotto-tda

giotto-tda includes a set of plotting functions and class methods, powered by plotly. The library’s plotting API is designed to facilitate the exploration of intermediate results in pipelines by harnessing the highly visual nature of topological signatures.

This notebook is a quick tutorial on how to use giotto-tda’s plotting functionalities and unified plotting API. The plotting functions in gtda.mapper are not covered here because they are tailored to the Mapper algorithm; see the dedicated tutorial for those.

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.

License: AGPLv3

1. Basic philosophy and plot methods

The computational building blocks of giotto-tda are scikit-learn–style estimators. Typically, they are also transformers, i.e. they possess a transform and/or a fit-transform method which:

  • acts on an array-like object X which collects a certain number of “samples” of a given kind;

  • returns a transformed array-like object Xt which collects a (potentially different) number of “samples” of a potentially different kind.

The basic philosophy of giotto-tda’s class-level plotting API is to equip relevant transformers with plot methods taking two main arguments:

  • an object such as Xt above (i.e. consistent with the outputs of transform or fit-transform);

  • an integer index passed via the sample keyword and indicating which sample in Xt should be plotted.

In other words, <transformer>.plot(Xt, sample=i) will produce a plot of Xt[i] which is tailored to the nature of the samples in Xt.
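To make this contract concrete, here is a minimal sketch in plain Python. ToyLengthTransformer and its text-based plot are hypothetical stand-ins and not part of giotto-tda; a real plot method would produce a plotly figure rather than a string.

```python
class ToyLengthTransformer:
    """Hypothetical transformer: maps each sample (a list) to its length."""

    def fit(self, X, y=None):
        # Nothing to learn in this toy example.
        return self

    def transform(self, X):
        # One transformed "sample" per input sample.
        return [len(x) for x in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def plot(self, Xt, sample=0):
        # Stand-in for a real figure: draw Xt[sample] as a text bar.
        return "#" * Xt[sample]


X = [[1, 2, 3], [4, 5], [6]]
toy = ToyLengthTransformer()
Xt = toy.fit_transform(X)
bar = toy.plot(Xt, sample=1)  # visualises Xt[1] only
```

Note that plot receives the whole transformed collection Xt and selects the sample itself, so the same call works for any valid index.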

1.1 Plotting functions

Several plot methods in giotto-tda actually fall back to specialised functions which can be found in the plotting subpackage and which can be used directly instead. However, unless the additional degree of control is necessary, plot methods should be preferred as they often exploit class parameters and/or attributes (e.g. those computed during fit) to automatically fill some parameters in the corresponding functions.
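The relationship between a plot method and its underlying function can be sketched as follows. Everything here is hypothetical (the function plot_values, the class ToyRangeTransformer and its range_ attribute); the point is only to illustrate how a method can fill in a function parameter from an attribute computed during fit.

```python
def plot_values(values, y_range):
    """Stand-alone plotting function: the caller supplies every parameter."""
    # A dictionary stands in for a real figure object.
    return {"values": list(values), "y_range": y_range}


class ToyRangeTransformer:
    """Hypothetical transformer that records the data range during fit."""

    def fit(self, X, y=None):
        flat = [v for x in X for v in x]
        self.range_ = (min(flat), max(flat))  # attribute computed during fit
        return self

    def transform(self, X):
        return X  # identity, to keep the sketch short

    def plot(self, Xt, sample=0):
        # The method fills y_range automatically from the fitted attribute.
        return plot_values(Xt[sample], y_range=self.range_)


X = [[1, 5], [9, 3]]
toy = ToyRangeTransformer().fit(X)
Xt = toy.transform(X)

via_method = toy.plot(Xt, sample=0)
via_function = plot_values(Xt[0], y_range=(1, 9))  # range supplied by hand
```

Both calls describe the same figure, but the function call requires the user to know and pass the range explicitly.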

1.2 Example: Plotting persistence diagrams with VietorisRipsPersistence

Let’s take the example of VietorisRipsPersistence – a transformer also covered in another notebook. Let’s create the input collection X for this transformer as a collection of randomly generated point clouds, each containing 100 points positioned along two circles.

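import numpy as np
from gtda.homology import VietorisRipsPersistence
from sklearn.datasets import make_circles

np.random.seed(seed=42)  # make the random point clouds reproducible

# 10 point clouds of 100 points each, lying on two concentric circles
# whose radius ratio (factor) is drawn at random
X = np.asarray([
    make_circles(100, factor=np.random.random())[0]
    for _ in range(10)
])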

Incidentally, samples in X can be plotted using gtda.plotting.plot_point_cloud.

from gtda.plotting import plot_point_cloud
i = 0
plot_point_cloud(X[i])

Let us instantiate a VietorisRipsPersistence object, and call the fit-transform method on X to obtain the transformed object Xt.

VR = VietorisRipsPersistence()
Xt = VR.fit_transform(X)

For any sample index i, Xt[i] is a two-dimensional array encoding the multi-scale topological information which can be extracted from the i-th point cloud X[i].

It is typically too difficult to get a quick idea of the interesting information contained in Xt[i] by looking at the array directly. This information is best displayed as a so-called “persistence diagram” in 2D. The plot method of our VietorisRipsPersistence instance achieves precisely this:

VR.plot(Xt, sample=i)

In the case of VietorisRipsPersistence, plot is a thin wrapper around the function gtda.plotting.plot_diagram, so the same result could have been achieved by importing that function and calling plot_diagram(Xt[i]).

In the diagram, each point indicates a topological feature in the data which appears at a certain “birth” scale and remains present all the way up to a later “death” scale. A point’s distance from the diagonal is directly proportional to the difference between the point’s “death” and its “birth”. Hence, this distance visually communicates how “persistent” the associated topological feature is.

Topological features are partitioned by dimension using colors: above, features in dimension 0 are red while those in dimension 1 are green. In dimension 0, the diagram describes connectivity structure in the data in a very similar way to linkage clustering: we see three points along the vertical axis, which are in one-to-one correspondence with “merge” events in the sense of hierarchical clustering. In dimension 1, the diagram describes the presence of “independent” one-dimensional holes in the data: as expected, there are only two significant points, corresponding to the two “persistent” circles.
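The link between persistence and distance from the diagonal can be checked with a small worked example. The diagram below is hand-written toy data, not the output of VietorisRipsPersistence; it assumes rows of (birth, death, homology dimension) triples, and the helper names are illustrative.

```python
import math

# Hand-written toy diagram; each row is (birth, death, homology dimension).
diagram = [
    (0.0, 0.1, 0),
    (0.0, 0.5, 0),
    (0.2, 1.0, 1),
    (0.3, 0.35, 1),
]


def persistence(birth, death):
    """Lifetime of a feature: how long it survives across scales."""
    return death - birth


def distance_to_diagonal(birth, death):
    """Euclidean distance from (birth, death) to the line death = birth."""
    return (death - birth) / math.sqrt(2)


# For each homology dimension, keep the feature with the longest lifetime.
most_persistent = {}
for birth, death, dim in diagram:
    best = most_persistent.get(dim)
    if best is None or persistence(birth, death) > persistence(*best):
        most_persistent[dim] = (birth, death)
```

Since the distance is the persistence divided by the constant sqrt(2), ranking points by distance from the diagonal is the same as ranking them by persistence.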

2. Derived convenience methods: transform_plot and fit_transform_plot

Where appropriate, giotto-tda transformers which have a plot method can also implement the two derived methods transform_plot and fit_transform_plot.

2.1 transform_plot

This method takes two main arguments:

  • an object such as X above (i.e. consistent with the inputs of transform or fit-transform);

  • an integer index i passed via the sample keyword.

The logic of transform_plot can be roughly described as follows: first, the sample X[i] is transformed; then, the result is plotted using plot and returned. [More technically: we first create a trivial collection X_sing = [X[i]], which contains a single sample from X. Then, we compute Xt_sing = <transformer>.transform(X_sing). Assuming Xt_sing contains a single transformed sample, we call <transformer>.plot(Xt_sing, sample=0), and also return Xt_sing.]
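The steps just described can be sketched in plain Python. This is an illustration under stated assumptions rather than giotto-tda’s actual implementation: ToyPlotterMixin, ToySquarer and the last_figure_ attribute are hypothetical, and the figure is stored as a string instead of being displayed.

```python
class ToyPlotterMixin:
    """Hypothetical mixin sketching the transform_plot logic."""

    def transform_plot(self, X, sample=0):
        X_sing = [X[sample]]              # trivial one-sample collection
        Xt_sing = self.transform(X_sing)  # transform that sample only
        # Plot the single transformed sample; stored here rather than shown.
        self.last_figure_ = self.plot(Xt_sing, sample=0)
        return Xt_sing                    # also return the transformed sample


class ToySquarer(ToyPlotterMixin):
    """Hypothetical transformer that squares every entry of every sample."""

    def transform(self, X):
        return [[v ** 2 for v in x] for x in X]

    def plot(self, Xt, sample=0):
        return "plot of " + str(Xt[sample])


X = [[1, 2], [3, 4], [5, 6]]
toy = ToySquarer()
Xt_sing = toy.transform_plot(X, sample=1)
```

The returned object is still a collection, here containing exactly one transformed sample, which is why plot is called with sample=0.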

In the example of Section 1.2, we would do:

VR = VietorisRipsPersistence()
VR.fit(X)
Xt = VR.transform_plot(X, sample=i);

2.2 fit_transform_plot

This method is equivalent to first fitting the transformer using X (and, optionally, a target variable y), and then calling transform_plot on X and a given sample index.
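A minimal sketch of this equivalence, using a hypothetical ToyCenterer class written for illustration only (it is not giotto-tda’s implementation):

```python
class ToyCenterer:
    """Hypothetical transformer that subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        flat = [v for x in X for v in x]
        self.mean_ = sum(flat) / len(flat)  # learned from the whole of X
        return self

    def transform(self, X):
        return [[v - self.mean_ for v in x] for x in X]

    def plot(self, Xt, sample=0):
        return "plot of " + str(Xt[sample])  # stand-in for a real figure

    def transform_plot(self, X, sample=0):
        Xt = self.transform([X[sample]])
        self.plot(Xt, sample=0)
        return Xt

    def fit_transform_plot(self, X, y=None, sample=0):
        self.fit(X, y)                                # step 1: fit on all of X
        return self.transform_plot(X, sample=sample)  # step 2: one sample


X = [[1.0, 2.0], [3.0, 6.0]]

# Single call ...
one_call = ToyCenterer().fit_transform_plot(X, sample=1)

# ... versus the explicit two-step version.
two_step = ToyCenterer()
two_step.fit(X)
explicit = two_step.transform_plot(X, sample=1)
```

In this sketch, fitting uses every sample in X, while only the selected sample is transformed and plotted.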

The workflow in the example of Section 1.2 can be simplified even further, turning the fit, transform and plot steps into a single method call:

VR = VietorisRipsPersistence()
Xt = VR.fit_transform_plot(X, sample=i);