Theory Glossary

Symbols

k

An arbitrary field.

R

The field of real numbers.

¯R

The two point compactification [,+] of the real numbers.

N

The counting numbers 0,1,2, as a subset of R.

Rd

The vector space of d-tuples of real numbers.

Δ

The multiset {(s,s)sR} with multiplicity (s,s)+.

Analysis

Metric space

A set X with a function

d:X×XR

is said to be a metric space if the values of d are all non-negative and for all x,y,zX

d(x,y)=0  x=y
d(x,y)=d(y,x)
d(x,z)d(x,y)+d(y,z).

In this case the d is referred to as the metric or the distance function.

Normed space

A vector space V together with a function

||||:VR

is said to be an normed space if the values of |||| are all non-negative and for all u,vV and aR

||u||=0  u=0
||au||=|a|||u||
||u+v||||u||+||v||.

The function |||| is referred to as the norm.

A normed space is naturally a metric space with distance function

d(u,v)=||uv||.

Inner product space

A vector space V together with a function

,:V×VR

is said to be an inner product space if for all u,v,wV and aR

u0  u,u>0
u,v=v,u
au+v,w=au,w+v,w.

The function , is referred to as the inner product.

An inner product space is naturally a normed space with

||u||=u,u.

Vectorization, amplitude and kernel

Let X be a set, for example, the set of all persistence diagrams. A vectorization for X is a function

ϕ:XV

where V is a vector space.

An amplitude on X is a function

A:XR

for which there exists a vectorization ϕ:XV with V a normed space such that

A(x)=||ϕ(x)||

for all xX.

A kernel on the set X is a function

k:X×XR

for which there exists a vectorization ϕ:XV with V an inner product space such that

k(x,y)=ϕ(x),ϕ(y)

for each x,yX.

Euclidean distance and lp-norms

The vector space Rn is an inner product space with inner product

x,y=(x1y1)2++(xnyn)2.

This inner product is referred to as dot product and the associated norm and distance function are respectively named Euclidean norm and Euclidean distance.

For any p(0,] the pair Rn,||||p with

||x||p=(xp1++xpn)1/p

if p is finite and

||x||=max{xi | i=1,,n}

is a normed spaced and its norm is referred to as the lp-norm.

Distance matrices and point clouds

Let (X,d) be a finite metric space. A distance matrix associated to it is obtained by choosing a total order on X=x1<<xm and setting the (i,j)-entry to be equal to d(xi,xj).

A point cloud is a finite subset of Rn (for some n) together with the metric induced from the Euclidean distance.

Lp-norms

Let URn and C(U,R) be the set of continuous real-valued functions on U. A function fC(U,R) is said to be p-integrable if

U|f(x)|pdx

is finite. The subset of p-integrable functions together with the assignment ||||p

f(U|f(x)|pdx)1/p

is a normed space and ||||p is referred to as the Lp-norm.

The only Lp-norm that is induced from an inner product is L2, and the inner product is given by

f,g=(U|f(x)g(x)|2dx)1/2.

Homology

Cubical complex

An elementary interval Ia is a subset of R of the form [a,a+1] or [a,a]={a} for some aR. These two types are called respectively non-degenerate and degenerate. To a non-degenerate elementary interval we assign two degenerate elementary intervals

d+Ia=[a+1,a+1]anddIa=[a,a].

An elementary cube is a subset of the form

Ia1××IaNRN

where each Iai is an elementary interval. We refer to the total number of its non-degenerate factors Iak1,,Iakn as its dimension and, assuming

ak1<<akn,

we define for i=1,,n the following two elementary cubes

d±iIN=Ia1××d±Iaki××IaN.

A cubical complex is a finite set of elementary cubes of RN, and a subcomplex of X is a cubical complex whose elementary cubes are also in X.

Reference:

(Kaczynski, Mischaikow, and Mrozek 2004)

Simplicial complex

A set {v0,,vn}RN is said to be geometrically independent if the vectors {v0v1,,v0vn} are linearly independent. In this case, we refer to their convex closure as a simplex, explicitly

[v0,,vn]={ci(v0vi) | c1++cn=1, ci0}

and to n as its dimension. The i-th face of [v0,,vn] is defined by

di[v0,,vn]=[v0,,ˆvi,,vn]

where ˆvi denotes the absence of vi from the set.

A simplicial complex X is a finite union of simplices in RN satisfying that every face of a simplex in X is in X and that the non-empty intersection of two simplices in X is a face of each. Every simplicial complex defines an abstract simplicial complex.

Abstract simplicial complex

An abstract simplicial complex is a pair of sets (V,X) with the elements of X being subsets of V such that:

  1. for every v in V, the singleton {v} is in X and

  2. if x is in X and y is a subset of x, then y is in X.

We abuse notation and denote the pair (V,X) simply by X.

The elements of X are called simplices and the dimension of a simplex x is defined by |x|=#x1 where #x denotes the cardinality of x. Simplices of dimension d are called d-simplices. We abuse terminology and refer to the elements of V and to their associated 0-simplices both as vertices.

A simplicial map between simplicial complexes is a function between their vertices such that the image of any simplex via the induced map is a simplex.

A simplicial complex X is a subcomplex of a simplicial complex Y if every simplex of X is a simplex of Y.

Given a finite abstract simplicial complex X=(V,X) we can choose a bijection from V to a geometrically independent subset of RN and associate a simplicial complex to X called its geometric realization.

Ordered simplicial complex

An ordered simplicial complex is an abstract simplicial complex where the set of vertices is equipped with a partial order such that the restriction of this partial order to any simplex is a total order. We denote an n-simplex using its ordered vertices by [v0,,vn].

A simplicial map between ordered simplicial complexes is a simplicial map f between their underlying simplicial complexes preserving the order, i.e., vw implies f(v)f(w).

Directed simplicial complex

A directed simplicial complex is a pair of sets (V,X) with the elements of X being tuples of elements of V, i.e., elements in n1V×n such that:

  1. for every v in V, the tuple v is in X

  2. if x is in X and y is a subtuple of x, then y is in X.

With appropriate modifications the same terminology and notation introduced for ordered simplicial complex applies to directed simplicial complex.

Chain complex

A chain complex of is a pair (C,) where

C=nZCnand=nZn

with Cn a k-vector space and n:Cn+1Cn is a k-linear map such that n+1n=0. We refer to as the boundary map of the chain complex.

The elements of C are called chains and if cCn we say its degree is n or simply that it is an n-chain. Elements in the kernel of are called cycles, and elements in the image of are called boundaries. Notice that every boundary is a cycle. This fact is central to the definition of homology.

A chain map is a k-linear map f:CC between chain complexes such that f(Cn)Cn and f=f.

Given a chain complex (C,), its linear dual C is also a chain complex with Cn=Homk(Cn,k) and boundary map δ defined by δ(α)(c)=α(c) for any αC and cC.

Homology and cohomology

Let (C,) be a chain complex. Its n-th homology group is the quotient of the subspace of n-cycles by the subspace of n-boundaries, that is, Hn(C)=ker(n)/im(n+1). The homology of (C,) is defined by H(C)=nZHn(C).

When the chain complex under consideration is the linear dual of a chain complex we sometimes refer to its homology as the cohomology of the predual complex and write Hn for Hn.

A chain map f:CC induces a map between the associated homologies.

Simplicial chains and simplicial homology

Let X be an ordered or directed simplicial complex and denote the subset of n-simplices by Xn. Define its simplicial chain complex with k-coefficients C(X;k) by

Cn(X;k)=k{Xn},n(x)=ni=0(1)idix

and its homology and cohomology with k-coefficients as the homology and cohomology of this chain complex. We use the notation H(X;k) and H(X;k) for these.

A simplicial map induces a chain map between the associated simplicial chain complexes and, therefore, between the associated simplicial (co)homologies.

Cubical chains and cubical homology

Let X be a cubical complex and denote the subset of n-cubes by Xn. Define the cubical chain complex with k-coefficients C(X;k) by

Cn(X;k)=k{Xn},nx=ni=1(1)i1(d+ixdix)

where x=I1××IN and s(i) is the dimension of I1××Ii. Its homology and cohomology with k-coefficients is the homology and cohomology of this chain complex. We use the notation H(X;k) and H(X;k) for these.

Persistence

Filtered complex

A filtered complex is a collection of simplicial or cubical complexes {Xs}sR such that Xs is a subcomplex of Xt for each st.

Cellwise filtration

A cellwise filtration is a simplicial or cubical complex X together with a total order on its simplices or elementary cubes such that for each yX the set {xX : xy} is a subcomplex of X. A cellwise filtration can be naturally thought of as a filtered complex.

Clique and flag complexes

Let G be a 1-dimensional abstract (resp. directed) simplicial complex. The abstract (resp. directed) simplicial complex G has the same set of vertices as G and {v0,,vn} (resp. (v0,,vn)) is a simplex in G if an only if {vi,vj} (resp. (vi,vj)) is in G for each pair of vertices vi,vj.

An abstract (resp. directed) simplicial complex X is a clique (resp. flag) complex if X=G for some G.

Given a function

w:GR{}

consider the extension

w:GR{}

defined respectively by

w{v0,,vn}=max{w{vi,vj} | ij}w(v0,,vn)=max{w(vi,vj) | i<j}

and define the filtered complex {Gs}sR by

Gs={σG | w(σ)s}.

A filtered complex {Xs}sR is a filtered clique (resp. flag) complex if Xs=Gs for some (G,w).

Persistence module

A persistence module is a collection containing a k-vector spaces V(s) for each real number s together with k-linear maps fst:V(s)V(t), referred to as structure maps, for each pair st, satisfying naturality, i.e., if rst, then frt=fstfrs and tameness, i.e., all but finitely many structure maps are isomorphisms.

A morphism of persistence modules F:VW is a collection of linear maps F(s):V(s)W(s) such that F(t)fst=fstF(s) for each par of reals st. We say that F is an isomorphisms if each F(s) is.

Persistent simplicial (co)homology

Let {X(s)}sR be a set of ordered or directed simplicial complexes together with simplicial maps fst:X(s)X(t) for each pair st, such that

rst impliesfrt=fstfrs

for example, a filtered complex. Its persistent simplicial homology with k-coefficients is the persistence module

H(X(s);k)

with structure maps H(fst):H(X(s);k)H(X(t);k) induced form the maps fst. In general, the collection constructed this way needs not satisfy the tameness condition of a persistence module, but we restrict attention to the cases where it does. Its persistence simplicial cohomology with k-coefficients is defined analogously.

Vietoris-Rips complex and Vietoris-Rips persistence

Let (X,d) be a finite metric space. Define the Vietoris-Rips complex of X as the filtered complex VRs(X) that contains a subset of X as a simplex if all pairwise distances in the subset are less than or equal to s, explicitly

VRs(X)={[v0,,vn] | i,j d(vi,vj)s}.

The Vietoris-Rips persistence of (X,d) is the persistent simplicial (co)homology of VRs(X).

A more general version is obtained by replacing the distance function with an arbitrary function

w:X×XR{}

and defining VRs(X) as the filtered clique complex associated to (X×X,w).

Čech complex and Čech persistence

Let (X,d) be a point cloud. Define the Čech complex of X as the filtered complex ˇCs(X) that is empty if s<0 and, if s0, contains a subset of X as a simplex if the balls of radius s with centers in the subset have a non-empty intersection, explicitly

ˇCs(X)={[v0,,vn] | ni=0Bs(xi)}.

The Čech persistence (co)homology of (X,d) is the persistent simplicial (co)homology of ˇCs(X).

Multiset

A multiset is a pair (S,ϕ) where S is a set and ϕ:SN{+} is a function attaining positive values. For sS we refer to ϕ(s) as its multiplicity. The union of two multisets (S1,ϕ1),(S2,ϕ2) is the multiset (S1S2,ϕ1ϕ2) with

(ϕ1ϕ2)(s)={ϕ1(s)sS1,sS2ϕ2(s)sS2,sS1ϕ1(s)+ϕ2(s)sS1,sS2.

Persistence diagram

A persistence diagram is a multiset of points in

R×(R{+}).

Given a persistence module, its associated persistence diagram is determined by the following condition: for each pair s,t the number counted with multiplicity of points (b,d) in the multiset, satisfying bst<d is equal to the rank of fst.

A well known result establishes that there exists an isomorphism between two persistence module if and only if their persistence diagrams are equal.

Wasserstein and bottleneck distance

The p-Wasserstein distance between two persistence diagrams D1 and D2 is the infimum over all bijections γ:D1ΔD2Δ of

(xD1Δ||xγ(x)||p)1/p

where |||| is defined for (x,y)R2 by max{|x|,|y|}.

The limit p defines the bottleneck distance. More explicitly, it is the infimum over the same set of bijections of the value

supxD1Δ||xγ(x)||.

The set of persistence diagrams together with any of the distances above is a metric space.

Reference:

(Kerber, Morozov, and Nigmetov 2017)

Persistence landscape

Let {(bi,di)}iI be a persistence diagram. Its persistence landscape is the set {λk}kN of functions

λk:R¯R

defined by letting λk(t) be the k-th largest value of the set {Λi(t)}iI where

Λi(t)=[min{tbi,dit}]+

and c+:=max(c,0). The function λk is referred to as the k-layer of the persistence landscape.

We describe the graph of each λk intuitively. For each iI, draw an isosceles triangle with base the interval (bi,di) on the horizontal t-axis, and sides with slope 1 and 1. This subdivides the plane into a number of polygonal regions. Label each of these regions by the number of triangles containing it. If Pk is the union of the polygonal regions with values at least k, then the graph of λk is the upper contour of Pk, with λk(a)=0 if the vertical line t=a does not intersect Pk.

The persistence landscape construction defines a vectorization of the set of persistence diagrams with target the vector space of real-valued function on N×R. For any p=1,, we can restrict attention to persistence diagrams D whose associated persistence landscape λ is p-integrable, that is to say,

||λ||p=(iN||λi||pp)1/p

where

||λi||p=(Rλpi(x)dx)1/p

is finite. In this case we refer to [equation:persistence_landscape_norm] as the p-landscape norm of D and, for p=2, define the value of the landscape kernel on two persistence diagrams D and E as

λ,μ=(iNR|λi(x)μi(x)|2dx)1/2

where λ and μ are their associated persistence landscapes.

References:

(Bubenik 2015)

Weighted silhouette

Let D={(bi,di)}iI be a persistence diagram and w={wi}iI a set of positive real numbers. The silhouette of D weighted by w is the function ϕ:RR defined by

ϕ(t)=iIwiΛi(t)iIwi,

where

Λi(t)=[min{tbi,dit}]+

and c+:=max(c,0). When wi=|dibi|p for 0<p we refer to ϕ as the p-power-weighted silhouette of D. The silhouette construction defines a vectorization of the set of persistence diagrams with target the vector space of continuous real-valued functions on R.

References:

(Chazal et al. 2014)

Heat vectorizations

Considering the points in a persistence diagram as the support of Dirac deltas one can construct, for any t>0, two vectorizations of the set of persistence diagrams to the set of continuous real-valued function on the first quadrant R2>0. The symmetry heat vectorization is constructed for every persistence diagram D by solving the heat equation

Δx(u)=tuon Ω×R>0u=0on {x1=x2}×R0u=pDδpon Ω×0

where Ω={(x1,x2)R2 | x1x2}, then solving the same equation after precomposing the data of [equation: heat equation] with the change of coordinates (x1,x2)(x2,x1), and defining the image of D to be the difference between these two solutions at the chosen time t.

Similarly, the rotation heat vectorization is defined by sending D to the solution, evaluated at time t, of the equation obtained by precomposing the data of [equation: heat equation] with the change of coordinates (x1,x2)(x1,x2x1).

We recall that the solution to the heat equation with initial condition given by a Dirac delta supported at pR2 is

14πtexp(||px||24t)

and, to highlight the connection with normally distributed random variables, it is customary to use the the change of variable σ=2t.

References:

(Reininghaus et al. 2015; Adams et al. 2017)

Persistence entropy

Intuitively, this is a measure of the entropy of the points in a persistence diagram. Precisely, let D={(bi,di)}iI be a persistence diagram with each di<+. The persistence entropy of D is defined by

E(D)=iIpilog(pi)

where

pi=(dibi)LDandLD=iI(dibi).

References:

(Rucco et al. 2016)

Betti curve

Let D be a persistence diagram. Its Betti curve is the function βD:RN whose value on sR is the number, counted with multiplicity, of points (bi,di) in D such that bis<di.

The name is inspired from the case when the persistence diagram comes from persistent homology.

Time series

Time series

A time series is a sequence {yi}ni=0 of real numbers.

A common construction of a times series {xi}ni=0 is given by choosing x0 arbitrarily as well as a step parameter h and setting

xi=x0+hi.

Another usual construction is as follows: given a time series {xi}ni=0U and a function

f:URR

we obtain a new time series {f(xi)}ni=0.

Generalizing the previous construction we can define a time series from a function

φ:U×MM,UR,MRd

using a function f:MR as follows: let {ti}ni=0 be a time series taking values in U, then

{f(φ(ti,m))}ni=0

for an arbitrarily chosen mM.

Takens embedding

Let MRd be a compact manifold of dimension n. Let

φ:R×MM

and

f:MR

be generic smooth functions. Then, for any τ>0 the map

MR2n+1

defined by

x(f(x),f(x1),f(x2),,f(x2n))

where

xi=φ(iτ,x)

is an injective map with full rank.

Reference:

(Takens 1981)

Manifold

Intuitively, a manifold of dimension n is a space locally equivalent to Rn. Formally, a subset M of Rd is an n-dimensional manifold if for each xM there exists an open ball B(x)={yM; d(x,y)<ϵ} and a smooth function with smooth inverse

ϕx:B(x){vRn; ||v||<1}.

References:

(Milnor and Weaver 1997; Guillemin and Pollack 2010)

Compact subset

A subset K of a metric space (X,d) is said to be bounded if there exist a real number D such that for each pair of elements in K the distance between them is less than D. It is said to be complete if for any xX it is the case that xK if for any ϵ>0 the intersection between K and {y; d(x,y)<ϵ} is not empty. It is said to be compact if it is both bounded and complete.

Bibliography

Adams, Henry, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. 2017. “Persistence Images: A Stable Vector Representation of Persistent Homology.” The Journal of Machine Learning Research 18 (1): 218–52.

Bubenik, Peter. 2015. “Statistical Topological Data Analysis Using Persistence Landscapes.” The Journal of Machine Learning Research 16 (1): 77–102.

Chazal, Frédéric, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, and Larry Wasserman. 2014. “Stochastic Convergence of Persistence Landscapes and Silhouettes.” In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, 474–83. SOCG’14. Kyoto, Japan: Association for Computing Machinery. https://doi.org/10.1145/2582112.2582128.

Guillemin, Victor, and Alan Pollack. 2010. Differential Topology. Vol. 370. American Mathematical Soc.

Kaczynski, Tomasz, Konstantin Mischaikow, and Marian Mrozek. 2004. Computational Homology. Vol. 157. Applied Mathematical Sciences. Springer-Verlag, New York. https://doi.org/10.1007/b97315.

Kerber, Michael, Dmitriy Morozov, and Arnur Nigmetov. 2017. “Geometry Helps to Compare Persistence Diagrams.” Journal of Experimental Algorithmics (JEA) 22: 1–4.

Milnor, John Willard, and David W Weaver. 1997. Topology from the Differentiable Viewpoint. Princeton university press.

Reininghaus, Jan, Stefan Huber, Ulrich Bauer, and Roland Kwitt. 2015. “A Stable Multi-Scale Kernel for Topological Machine Learning.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4741–48.

Rucco, Matteo, Filippo Castiglione, Emanuela Merelli, and Marco Pettini. 2016. “Characterisation of the Idiotypic Immune Network Through Persistent Entropy.” In Proceedings of ECCS 2014, 117–28. Springer.

Takens, Floris. 1981. “Detecting Strange Attractors in Turbulence.” In Dynamical Systems and Turbulence, Warwick 1980, 366–81. Springer.