Plotting in giotto-tda
giotto-tda includes a set of plotting functions and class methods,
powered by plotly. The library’s plotting API is designed to
facilitate the exploration of intermediate results in pipelines by
harnessing the highly visual nature of topological signatures.
This notebook is a quick tutorial on how to use giotto-tda’s
plotting functionalities and unified plotting API. The plotting
functions in gtda.mapper are not covered here as they are somewhat
tailored to the Mapper algorithm; see the dedicated
tutorial.
If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub.
License: AGPLv3
1. Basic philosophy and plot methods
The computational building blocks of giotto-tda are
scikit-learn–style estimators. Typically, they are also
transformers, i.e. they possess a transform and/or a
fit-transform method which:
- act on an array-like object X which collects a certain number of “samples” of a given kind;
- return a transformed array-like object Xt which collects a (potentially different) number of “samples” of a potentially different kind.
The basic philosophy of giotto-tda’s class-level plotting API is
to equip relevant transformers with plot methods taking two main
arguments:
- an object such as Xt above (i.e. consistent with the outputs of transform or fit-transform);
- an integer index passed via the sample keyword and indicating which sample in Xt should be plotted.
In other words, <transformer>.plot(Xt, sample=i) will produce a plot
of Xt[i] which is tailored to the nature of the samples in Xt.
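The contract described above can be illustrated with a toy transformer. This is only a minimal sketch and not part of giotto-tda: the class WordLengths and its text-based plot method are hypothetical, chosen so that the example runs without any plotting backend.

```python
class WordLengths:
    """Hypothetical transformer: maps each sentence to its list of word lengths."""

    def fit(self, X, y=None):
        # Nothing to learn in this toy example
        return self

    def transform(self, X):
        # One transformed sample per input sample
        return [[len(word) for word in sentence.split()] for sentence in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def plot(self, Xt, sample=0):
        # Render Xt[sample] in a way suited to its nature: here, a text bar chart
        return "\n".join("#" * length for length in Xt[sample])


X_toy = ["persistent homology", "a point cloud"]
transformer = WordLengths()
Xt_toy = transformer.fit_transform(X_toy)

# Plot the sample with index 1, i.e. the word lengths of "a point cloud"
figure = transformer.plot(Xt_toy, sample=1)
print(figure)
```

The plot method receives the whole transformed collection together with an index, rather than a single sample, which mirrors the <transformer>.plot(Xt, sample=i) convention.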
1.1 Plotting functions
Several plot methods in giotto-tda actually fall back to
specialised functions which can be found in the plotting
subpackage
and which can be used directly instead. However, unless the additional
degree of control is necessary, plot methods should be preferred as
they often exploit class parameters and/or attributes (e.g. those
computed during fit) to automatically fill some parameters in the
corresponding functions.
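The relationship between a plot method and its underlying function can be sketched as follows. Everything here is hypothetical (the function plot_values, the class MaxTracker and the attribute max_ are not giotto-tda names): the sketch only illustrates how a method can forward a quantity computed during fit to a standalone function.

```python
def plot_values(values, upper=None):
    """Hypothetical standalone plotting function; returns a list of text bars."""
    # Without extra information, the function must guess the axis range
    upper = max(values) if upper is None else upper
    return ["#" * round(10 * v / upper) for v in values]


class MaxTracker:
    """Hypothetical transformer that remembers the global maximum seen in fit."""

    def fit(self, X, y=None):
        self.max_ = max(max(sample) for sample in X)
        return self

    def transform(self, X):
        return X

    def plot(self, Xt, sample=0):
        # Fill `upper` automatically from the fitted attribute, so that
        # all samples are drawn on a common scale
        return plot_values(Xt[sample], upper=self.max_)


X_toy = [[1, 2], [5, 10]]
tracker = MaxTracker().fit(X_toy)

via_method = tracker.plot(X_toy, sample=0)   # scale taken from the fitted max_
via_function = plot_values(X_toy[0])         # scale guessed from the sample alone
```

Calling the function directly gives more control (any value of upper can be passed), at the cost of having to supply by hand what the method fills in automatically.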
1.2 Example: Plotting persistence diagrams with VietorisRipsPersistence
Let’s take the example of VietorisRipsPersistence – a transformer
also covered in another
notebook.
Let’s create the input collection X for this transformer as a
collection of randomly generated point clouds, each containing 100
points positioned along two circles.
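import numpy as np
from gtda.homology import VietorisRipsPersistence
from sklearn.datasets import make_circles

np.random.seed(seed=42)  # fix the seed for reproducibility
# Ten point clouds of 100 points each, positioned along two circles
# whose scale factor is drawn at random
X = np.asarray([
    make_circles(100, factor=np.random.random())[0]
    for _ in range(10)
])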
Incidentally, samples in X can be plotted using
gtda.plotting.plot_point_cloud.
from gtda.plotting import plot_point_cloud
i = 0
plot_point_cloud(X[i])
Let us instantiate a VietorisRipsPersistence object, and call the
fit-transform method on X to obtain the transformed object
Xt.
VR = VietorisRipsPersistence()
Xt = VR.fit_transform(X)
For any sample index i, Xt[i] is a two-dimensional array encoding
the multi-scale topological information which can be extracted from the
i-th point cloud X[i].
It is typically too difficult to get a quick idea of the interesting
information contained in Xt[i] by looking at the array directly.
This information is best displayed as a so-called “persistence diagram”
in 2D. The plot method of our VietorisRipsPersistence instance
achieves precisely this:
VR.plot(Xt, sample=i)
In the case of VietorisRipsPersistence, plot is a thin wrapper
around the function gtda.plotting.plot_diagram, so the same result
could have been achieved by importing that function and calling
plot_diagram(Xt[i]).
In the diagram, each point indicates a topological feature in the data which appears at a certain “birth” scale and remains present all the way up to a later “death” scale. A point’s distance from the diagonal is directly proportional to the difference between the point’s “death” and its “birth”. Hence, this distance visually communicates how “persistent” the associated topological feature is.
Topological features are partitioned by dimension using colors: above, features in dimension 0 are red while those in dimension 1 are green. In dimension 0, the diagram describes connectivity structure in the data in a very similar way to linkage clustering: we see three points along the vertical axis, which are in one-to-one correspondence with “merge” events in the sense of hierarchical clustering. In dimension 1, the diagram describes the presence of “independent” one-dimensional holes in the data: as expected, there are only two significant points, corresponding to the two “persistent” circles.
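The link between persistence and distance from the diagonal can be checked numerically. The sketch below assumes that each row of a diagram stores a (birth, death, dimension) triple; the values are made up for illustration and do not come from the diagram plotted above.

```python
import math

# Hypothetical diagram rows: (birth, death, homology dimension)
diagram = [
    (0.0, 0.3, 0),
    (0.0, 0.5, 0),
    (0.2, 1.0, 1),
]

for birth, death, dim in diagram:
    persistence = death - birth
    # Euclidean distance from the point (birth, death) to the line death = birth
    distance = persistence / math.sqrt(2)
    print(f"dim {dim}: persistence={persistence:.2f}, distance={distance:.3f}")

# The same distances collected in a list, one per diagram point
distances = [(death - birth) / math.sqrt(2) for birth, death, _ in diagram]
```

The constant of proportionality is 1/√2, so ranking points by distance from the diagonal is the same as ranking them by persistence.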
2. Derived convenience methods: transform_plot and fit_transform_plot
Where appropriate, giotto-tda transformers which have a plot
method can also implement the two derived methods transform_plot and
fit_transform_plot.
2.1 transform_plot
This method takes two main arguments:
- an object such as X above (i.e. consistent with the inputs of transform or fit-transform);
- an integer index i passed via the sample keyword.
The logic of transform_plot can be roughly described as follows:
first, the sample X[i] is transformed; then, the result is plotted
using plot and returned. [More technically: we first create a
trivial collection X_sing = [X[i]], which contains a single sample
from X. Then, we compute
Xt_sing = <transformer>.transform(X_sing). Assuming Xt_sing
contains a single transformed sample, we call
<transformer>.plot(Xt_sing, sample=0), and also return Xt_sing.]
In the example of Section 1.2, we would do:
VR = VietorisRipsPersistence()
VR.fit(X)
VR.transform_plot(X, sample=i);
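The logic of transform_plot described above can be written out in a few lines. This is a rough sketch under stated assumptions, not giotto-tda's actual implementation: the names TransformPlotSketch and Squarer are hypothetical, and plot simply prints instead of drawing a figure.

```python
class TransformPlotSketch:
    """Hypothetical mixin spelling out the transform_plot logic."""

    def transform_plot(self, X, sample=0):
        X_sing = [X[sample]]              # trivial collection with one sample
        Xt_sing = self.transform(X_sing)  # transform only that sample
        self.plot(Xt_sing, sample=0)      # plot the single transformed sample
        return Xt_sing


class Squarer(TransformPlotSketch):
    """Hypothetical transformer: squares every entry of every sample."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[v ** 2 for v in sample] for sample in X]

    def plot(self, Xt, sample=0):
        # Stand-in for a real plot
        print("plotting", Xt[sample])


X_toy = [[1, 2], [3, 4], [5, 6]]
Xt_sing = Squarer().fit(X_toy).transform_plot(X_toy, sample=1)
```

Only X_toy[1] is transformed, so the returned collection contains a single transformed sample rather than one per input sample.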
2.2 fit_transform_plot
This method is equivalent to first fitting the transformer using X
(and, optionally, a target variable y), and then calling
transform_plot on X and a given sample index.
The workflow in the example of Section 1.2 can be simplified even further, turning the entire process into a simple one-liner:
VR = VietorisRipsPersistence()
VR.fit_transform_plot(X, sample=i);
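The equivalence between fit_transform_plot and the two-step workflow can be illustrated with a toy transformer. This is a minimal sketch, not giotto-tda code: the class Centerer and its attribute mean_ are hypothetical, and plot prints instead of drawing.

```python
class Centerer:
    """Hypothetical transformer: subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        values = [v for sample in X for v in sample]
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, X):
        return [[v - self.mean_ for v in sample] for sample in X]

    def plot(self, Xt, sample=0):
        # Stand-in for a real plot
        print("plotting", Xt[sample])

    def transform_plot(self, X, sample=0):
        Xt_sing = self.transform([X[sample]])
        self.plot(Xt_sing, sample=0)
        return Xt_sing

    def fit_transform_plot(self, X, y=None, sample=0):
        # Fit on the whole collection, then transform and plot one sample
        return self.fit(X, y).transform_plot(X, sample=sample)


X_toy = [[1.0, 2.0], [3.0, 6.0]]
one_liner = Centerer().fit_transform_plot(X_toy, sample=1)
two_steps = Centerer().fit(X_toy).transform_plot(X_toy, sample=1)
```

In this sketch, fitting uses every sample in the collection even though only one sample is transformed and plotted, which is why the mean subtracted from X_toy[1] is that of the whole of X_toy.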