Plotting in giotto-tda
¶
giotto-tda
includes a set of plotting functions and class methods,
powered by plotly
. The library’s plotting API is designed to
facilitate the exploration of intermediate results in pipelines by
harnessing the highly visual nature of topological signatures.
This notebook is a quick tutorial on how to use giotto-tda
’s
plotting functionalities and unified plotting API. The plotting
functions in gtda.mapper
are not covered here as they are somewhat
tailored to the Mapper algorithm, see the dedicated
tutorial.
If you are looking at a static version of this notebook and would like to run its contents, head over to github.
License: AGPLv3
1. Basic philosophy and plot
methods¶
The computational building blocks of giotto-tda
are
scikit-learn
–style estimators. Typically, they are also
transformers, i.e. they possess a transform
and/or a
fit-transform
method which:
act on an array-like object
X
which collects a certain number of “samples” of a given kind;return a transformed array-like object
Xt
which collects a (potentially different) number of “samples” of a potentially different kind.
The basic philosophy of giotto-tda
’s class-level plotting API is
to equip relevant transformers with plot
methods taking two main
arguments:
an object such as
Xt
above (i.e. consistent with the outputs oftransform
orfit-transform
);an integer index passed via the
sample
keyword and indicating which sample inXt
should be plotted.
In other words, <transformer>.plot(Xt, sample=i)
will produce a plot
of Xt[i]
which is tailored to the nature of the samples in Xt
.
1.1 Plotting functions¶
Several plot
methods in giotto-tda
actually fall back to
specialised functions which can be found in the plotting
subpackage
and which can be used directly instead. However, unless the additional
degree of control is necessary, plot
methods should be preferred as
they often exploit class parameters and/or attributes (e.g. those
computed during fit
) to automatically fill some parameters in the
corresponding functions.
1.2 Example: Plotting persistence diagrams with VietorisRipsPersistence
¶
Let’s take the example of VietorisRipsPersistence
– a transformer
also covered in another
notebook.
Let’s create the input collection X
for this transformer as a
collection of randomly generated point clouds, each containing 100
points positioned along two circles.
import numpy as np
np.random.seed(seed=42)
from gtda.homology import VietorisRipsPersistence
from sklearn.datasets import make_circles
X = np.asarray([
make_circles(100, factor=np.random.random())[0]
for i in range(10)
])
Incidentally, samples in X
can be plotted using
gtda.plotting.plot_point_cloud
.
from gtda.plotting import plot_point_cloud
i = 0
plot_point_cloud(X[i])
Let us instantiate a VietorisRipsTransformer
object, and call the
fit-transform
method on X
to obtain the transformed object
Xt
.
VR = VietorisRipsPersistence()
Xt = VR.fit_transform(X)
For any sample index i, Xt[i]
is a two-dimensional array encoding
the multi-scale topological information which can be extracted from the
i-th point cloud X[i]
.
It is typically too difficult to get a quick idea of the interesting
information contained in Xt[i]
by looking at the array directly.
This information is best displayed as a so-called “persistence diagram”
in 2D. The plot
method of our VietorisRipsPersistence
instance
achieves precisely this:
VR.plot(Xt, sample=i)
In the case of VietorisRipsPersistence
, plot
is a thin wrapper
around the function gtda.plotting.plot_diagram
, so the same result
could have been achieved by importing that function and calling
plot_diagram(Xt[i])
.
In the diagram, each point indicates a topological feature in the data which appears at a certain “birth” scale and remains present all the way up to a later “death” scale. A point’s distance from the diagonal is directly proportional to the difference between the point’s “death” and its “birth”. Hence, this distance visually communicates how “persistent” the associated topological feature is. Topological features are partitioned by dimension using colors: above, features in dimension 0 are red while those in dimension 1 are green. In dimension 0, the diagram describes connectivity structure in the data in a very similar way to linkage clustering: we see three points along the vertical axis, which are in one-to-one correspondence with “merge” events in the sense of hierarchical clustering. In dimension 1, the diagram describes the presence of “independent” one-dimensional holes in the data: as expected, there are only two significant points, corresponding to the two “persistent” circles.
2 Derived convenience methods: transform_plot
and fit_transform_plot
¶
Where appropriate, giotto-tda
transformers which have a plot
method can also implement the two derived methods transform_plot
and
fit_transform_plot
.
2.1 transform_plot
¶
This method takes two main arguments:
an object such as
X
above (i.e. consistent with the inputs oftransform
orfit-transform
);an integer index i passed via the
sample
keyword.
The logic of transform_plot
can be roughly described as follows:
first, the sample X[i]
is transformed; then, the result is plotted
using plot
and returned. [More technically: we first create a
trivial collection X_sing = [X[i]]
, which contains a single sample
from X
. Then, we compute
Xt_sing = <transformer>.transform(X_sing)
. Assuming Xt_sing
contains a single transformed sample, we call
<transformer>.plot(Xt_sing, sample=0)
, and also return Xt_sing
.]
In the example of Section 1.2, we would do:
VR = VietorisRipsPersistence()
VR.fit(X)
VR.transform_plot(X, sample=i);
2.2 fit_transform_plot
¶
This method is equivalent to first fitting the transformer using X
(and, optionally, a target variable y
), and then calling
transform_plot
on X
and a given sample index.
The workflow in the example of Section 1.2 can be simplified even further, turning the entire process into a simple one-liner:
VR = VietorisRipsPersistence()
VR.fit_transform_plot(X, sample=i);