{"title": "Statistical Topological Data Analysis - A Kernel Perspective", "book": "Advances in Neural Information Processing Systems", "page_first": 3070, "page_last": 3078, "abstract": "We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel which would enable a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.", "full_text": "Statistical Topological Data Analysis \u2013\n\nA Kernel Perspective\n\nRoland Kwitt\n\nDepartment of Computer Science\n\nUniversity of Salzburg\n\nrkwitt@gmx.at\n\nStefan Huber\nIST Austria\n\nstefan.huber@ist.ac.at\n\nDepartment of Computer Science and BRIC\n\nDepartment of Radiology and BRIC\n\nMarc Niethammer\n\nUNC Chapel Hill\nmn@cs.unc.edu\n\nWeili Lin\n\nUNC Chapel Hill\n\nweili_lin@med.unc.edu\n\nUlrich Bauer\n\nDepartment of Mathematics\n\nTechnische Universit\u00e4t M\u00fcnchen (TUM)\n\nulrich@bauer.org\n\nAbstract\n\nWe consider the problem of statistical computations with persistence diagrams, a\nsummary representation of topological features in data. 
These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.

1 Introduction

Over the past years, advances in adopting methods from algebraic topology to study the “shape” of data (e.g., point clouds, images, shapes) have given birth to the field of topological data analysis (TDA) [5]. In particular, persistent homology has been widely established as a tool for capturing “relevant” topological features at multiple scales. The output is a summary representation in the form of so-called barcodes or persistence diagrams, which, roughly speaking, encode the life spans of the features. These “topological summaries” have been successfully used in a variety of different fields, including, but not limited to, computer vision and medical imaging.
Applications range from the analysis of cortical surface thickness [8] to the structure of brain networks [15], brain artery trees [2], or histology images for breast cancer analysis [22].

Despite the success of TDA in these areas, a statistical treatment of persistence diagrams (e.g., computing means or variances) turns out to be difficult, not least because of the unusual structure of the barcodes as intervals, rather than numerical quantities [1]. While substantial advancements in the direction of statistical TDA have been made by studying the structure of the space of persistence diagrams endowed with p-Wasserstein metrics (or variants thereof) [18, 19, 28, 11], it is technically and computationally challenging to work in this space. In a machine learning context, we would rather work with Hilbert spaces, primarily due to their highly regular structure and the abundance of readily available and well-studied methods for statistics and learning.

One way to circumvent issues such as the non-uniqueness of the Fréchet mean [18] or computationally intensive algorithmic strategies [28] is to consider mappings of persistence barcodes into linear function spaces. Statistical computations can then be performed based on probability theory on Banach spaces [14]. However, the methods proposed in [4] cannot guarantee that different probability distributions can always be distinguished by a statistical test.

Contribution. In this work, we consider the task of statistical computations with persistence diagrams. Our contribution is to approach this problem by leveraging the theory of embedding probability measures into reproducing kernel Hilbert spaces [23], in our case, probability measures on the space of persistence diagrams. In particular, we start with a recently introduced kernel on persistence diagrams by Reininghaus et al.
[20] and identify missing properties that are essential for a well-founded use in the aforementioned framework. By enforcing mild restrictions on the underlying space, we can in fact close the remaining gaps and prove that a minor modification of the kernel is universal in the sense of Steinwart [25] (see Section 3). Our experiments demonstrate, on a couple of synthetic and real-world data samples, how this universal kernel enables a principled solution to the selected problem of (kernel-based) two-sample hypothesis testing.

Related work. In the following, we focus our attention on work related to a statistical treatment of persistent homology. Since this is a rather new field, several avenues are pursued in parallel. Mileyko et al. [18] study properties of the set of persistence diagrams when endowed with the p-Wasserstein metric. They show, for instance, that under this metric, the space is Polish and the Fréchet mean exists. However, it is not unique and no algorithmic solution is provided. Turner et al. [28] later show that the L2-Wasserstein metric on the set of persistence diagrams yields a geodesic space, and that this additional structure can be leveraged to construct an algorithm for computing the Fréchet mean and to prove a law of large numbers. In [19], Munch et al. take a different approach and introduce a probabilistic variant of the Fréchet mean as a probability measure on persistence diagrams. While this yields a unique mean, the solution itself is not a persistence diagram anymore. Techniques for computing confidence sets for persistence diagrams are investigated by Fasy et al. [11].
The authors focus on the bottleneck metric (i.e., the special case of the p-Wasserstein metric with p = ∞), remarking that similar results could potentially be obtained for the p-Wasserstein metric under stronger assumptions on the underlying topological space.

While the aforementioned results concern properties of the set of persistence diagrams equipped with p-Wasserstein metrics, a different strategy is advocated by Bubenik in [4]. The key idea is to circumvent the peculiarities of the metric by mapping persistence diagrams into function spaces. One such representation is the persistence landscape, i.e., a sequence of 1-Lipschitz functions in a Banach space. While it is in general not possible to go back and forth between landscapes and persistence diagrams, the Banach space structure enables a well-founded theoretical treatment of statistical concepts, such as averages or confidence intervals [14]. Chazal et al. [6] establish additional convergence results and propose a bootstrap procedure for obtaining confidence sets.

Another, less statistically oriented, approach towards a convenient summary of persistence barcodes is followed by Adcock et al. [1]. The idea is to attach numerical quantities to persistence barcodes, which can then be used as input to any machine learning algorithm in the form of feature vectors. This strategy is rooted in a study of algebraic functions on barcodes. However, it does not necessarily guarantee stability of the persistence summary representation, which is typically a desired property of a feature map [20].

Our proposed approach to statistical TDA is also closely related to work in the field of kernel-based learning techniques [21] or, to be more specific, to the embedding of probability measures into a RKHS [23] and the study of suitable kernel functions in that context [7, 24].
In fact, the idea of mapping probability measures into a RKHS has led to many developments generalizing statistical concepts, such as two-sample testing [13], testing for conditional independence, or statistical inference [12], from Euclidean spaces to other domains equipped with a kernel. In the context of supervised learning with TDA, Reininghaus et al. [20] recently established a first connection to kernel-based learning techniques via the definition of a positive definite kernel on persistence diagrams. While positive definiteness is sufficient for many techniques, such as support vector machines or kernel PCA, additional properties are required in the context of embedding probability measures.

Organization. Section 2 briefly reviews some background material and introduces some notation. In Section 3, we show how a slight modification of the kernel in [20] fits into the framework of embedding probability measures into a RKHS. Section 4 presents a set of experiments on synthetic and real data, highlighting the advantages of the kernel. Finally, Section 5 summarizes the main contributions and discusses future directions.

2 Background

Since our discussion of statistical TDA from a kernel perspective is largely decoupled from how the topological summaries are obtained, we only review two notions important for the theory of persistent homology: filtrations and persistence diagrams. For a thorough treatment of the topic, we refer the reader to [10]. We also briefly review the concept of embedding probability measures into a RKHS, following [23].

Filtrations. A standard approach to TDA assigns to a metric space (M, dM) a growing sequence of simplicial complexes (indexed by a parameter t ∈ R), typically referred to as a filtration. Recall that an abstract simplicial complex is a collection of nonempty sets that is closed under taking nonempty subsets.
Persistent homology then studies the evolution of the homology of these complexes for a growing parameter t. Some widely used constructions, particularly for point cloud data, are the Vietoris–Rips and the Čech complex. The Vietoris–Rips complex is a simplicial complex with vertex set M such that [x_0, . . . , x_m] is an m-simplex iff max_{i,j ≤ m} dM(x_i, x_j) ≤ t. For a point set M ⊂ R^d in Euclidean space, the Čech complex is a simplicial complex with vertex set M such that [x_0, . . . , x_m] is an m-simplex iff the closed balls of radius t centered at the x_i have a non-empty common intersection.

A more general way of obtaining a filtration is to consider the sublevel sets f^{-1}((−∞, t]), for t ∈ R, of a function f : X → R on a topological space X. For instance, in the case of surface meshes, a commonly used function is the heat kernel signature (HKS) [27]. The Čech and Vietoris–Rips filtrations appear as special cases, both being sublevel set filtrations of an appropriate function on the subsets (abstract simplices) of the vertex set M: for the Čech filtration, the function assigns to each subset the radius of its smallest enclosing sphere, while for the Vietoris–Rips filtration, the function assigns to each subset its diameter (equivalently, the length of its longest edge).

Persistence diagrams. Studying the evolution of the topology of a filtration allows us to capture interesting properties of the metric or function used to generate the filtration. Persistence diagrams provide a concise description of the changes in homology that occur during this process. Existing connected components may merge, cycles may appear, etc. This leads to the appearance and disappearance of homological features of different dimensions. Persistent homology tracks the birth b and death d of such topological features.
The multiset of points p, where each point p = (b, d) corresponds to a birth/death time pair, is called the persistence diagram of the filtration. An example of a persistence diagram for 0-dimensional features (i.e., connected components) of a function f : X → R with X = R is shown in Fig. 1. We use the identifiers F, G to denote persistence diagrams in the remainder of the paper. Since d > b, all points lie in the half-plane above the diagonal.

Fig. 1: A function f : R → R and its 0-th persistence diagram (birth/death axes).

RKHS embedding of probability measures. An important concept for our work is the embedding of probability measures into reproducing kernel Hilbert spaces [23]. Consider a Borel probability measure P defined on a compact metric space (X, d), which we observe through the i.i.d. sample X = {x_1, . . . , x_m} with x_i ∼ P. Furthermore, let k : X × X → R be a positive definite kernel, i.e., a function which realizes an inner product k(x, y) = ⟨φ(x), φ(y)⟩_G with x, y ∈ X in some Hilbert space G for some (possibly unknown) map φ : X → G (see [26, Definition 4.1]). Also, let H be the associated RKHS, generated by the functions k_x = k(x, ·) : X → R induced by the kernel, i.e., H = span{k_x : x ∈ X} = span{⟨φ(x), φ(·)⟩_G : x ∈ X}, with the scalar product ⟨k_x, k_y⟩_H = k(x, y).

The linear structure of the RKHS H admits the construction of means. The embedding of a probability measure P on X is now accomplished via the mean map µ : P ↦ µ_P = E_{x∼P}[k_x]. If this map is injective, the kernel k is called characteristic. This is true, in particular, if H is dense in the space of continuous functions X → R (with the supremum norm), in which case we refer to the kernel as universal [25].
While a universal kernel is always characteristic, the converse is not true.

Since it has been shown [13] that the empirical estimate of the mean, µ_X = (1/m) Σ_i k_{x_i}, is a good proxy for µ_P, the injectivity of µ can be used to define distances between distributions P and Q, observed via samples X = {x_1, . . . , x_m} and Y = {y_1, . . . , y_n}. Specifically, this can be done via the maximum mean discrepancy

MMD[F, P, Q] = sup_{f ∈ F} ( E_{x∼P}[f(x)] − E_{y∼Q}[f(y)] ),   (1)

where F denotes a suitable class of functions X → R, and E_{x∼P}[f(x)] denotes the expectation of f(x) w.r.t. P (which can be written as ⟨µ_P, f⟩ by virtue of the reproducing property of k). Gretton et al. [13] restrict F to the unit ball in H, i.e., F = {f ∈ H : ‖f‖_H ≤ 1}, and show that Eq. (1) can be expressed as the RKHS distance between the means µ_P and µ_Q of the measures P and Q as MMD²[F, P, Q] = ‖µ_P − µ_Q‖²_H. Empirical estimates of this quantity are given in [13]. This connection is of particular importance to us, since it allows for two-sample hypothesis testing in a principled manner, given a suitable (characteristic/universal) kernel. Prominent examples of universal kernels for X = R^d are the Gaussian RBF kernel k(x, y) = exp(−γ‖x − y‖²) and the kernel exp(⟨x, y⟩). However, without a characteristic/universal kernel, MMD[F, P, Q] = 0 does not imply P = Q. A well-known example of a non-characteristic kernel is the scalar product kernel k(x, y) = ⟨x, y⟩ with x, y ∈ R^d: even if P ≠ Q, e.g., if the variances of the distributions differ, the MMD will still be zero if the means are equal.

In the context of a statistical treatment of persistent homology, the ability to embed probability measures on the space of persistence diagrams into a RKHS is appealing.
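The effect of the kernel choice on the MMD can be made concrete with a small numerical sketch (our own illustration, not part of the paper): for two samples with equal means but different variances, the empirical MMD² under the non-characteristic linear kernel is near zero, while a universal Gaussian RBF kernel separates the distributions. The function names below are ours.

```python
import numpy as np

def mmd2(X, Y, kernel):
    """Biased empirical estimate of MMD^2 = ||mu_P - mu_Q||_H^2."""
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

def linear_kernel(X, Y):
    return X @ Y.T  # non-characteristic scalar product kernel

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)  # universal Gaussian RBF kernel

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 1))  # mean 0, variance 1
Y = rng.normal(0.0, 3.0, size=(2000, 1))  # mean 0, variance 9

print(mmd2(X, Y, linear_kernel))  # close to zero: only the (equal) means are compared
print(mmd2(X, Y, rbf_kernel))     # clearly positive: the distributions differ
```

For the linear kernel, the estimate reduces to the squared difference of the sample means, which is why it cannot detect the difference in variance.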
Specifically, the problem of testing whether two different samples exhibit significantly different homological features – as captured in the persistence diagrams – boils down to a two-sample test with null hypothesis H0 : µ_P = µ_Q vs. a general alternative HA : µ_P ≠ µ_Q, where P and Q are probability measures on the set of persistence diagrams. The computation of this test only involves evaluations of the kernel. Enabling this procedure via a suitable universal kernel is discussed next.

3 The universal persistence scale space kernel

In the following, for 1 ≤ q ≤ ∞ we let Dq = {F | dW,q(F, ∅) < ∞} denote the metric space of persistence diagrams with the q-Wasserstein metric dW,q (see footnote 1), where ∅ is the empty diagram. In [18, Theorem 1], Mileyko et al. show that (Dq, dW,q) is a complete metric space. When the subscript q is omitted, we do not refer to any specific instance of the q-Wasserstein metric.

Let us fix numbers N ∈ N and R ∈ R. We denote by S the subset of D consisting of those persistence diagrams that are birth-death bounded by R (i.e., for every D ∈ S the birth/death times of its points are less than or equal to R; see [18, Definition 5]) and whose total multiplicities (i.e., the sum of multiplicities of all points in a diagram) are bounded by N. While this might appear restrictive at first sight, it does not really pose a limitation in practice. In fact, for data generated by some finite process (e.g., meshes have a finite number of vertices/faces, images have limited resolution, etc.), establishing N and R is typically not a problem. We remark that the aforementioned restriction is similar to enforcing boundedness of the support of persistence landscapes in [4, Section 3.6].

In [20], Reininghaus et al.
introduce the persistence scale space (PSS) kernel as a stable, multi-scale kernel on the set D of persistence diagrams of finite total multiplicity, i.e., each diagram contains only finitely many points. Let p = (b, d) denote a point in a diagram F ∈ D, and let p̄ = (d, b) denote its mirror image across the diagonal. Further, let Ω = {x = (x_1, x_2) ∈ R², x_2 ≥ x_1}. The feature map Φσ : D → L2(Ω) is given, as the solution of a heat diffusion problem with a Dirichlet boundary condition on the diagonal, by

Φσ(F) : Ω → R,  x ↦ (1/(4πσ)) Σ_{p ∈ F} ( exp(−‖x − p‖²/(4σ)) − exp(−‖x − p̄‖²/(4σ)) ).   (2)

The kernel kσ : D × D → R is then given in closed form as

kσ(F, G) = ⟨Φσ(F), Φσ(G)⟩_{L2(Ω)} = (1/(8πσ)) Σ_{p ∈ F, q ∈ G} ( exp(−‖p − q‖²/(8σ)) − exp(−‖p − q̄‖²/(8σ)) ),   (3)

for σ > 0 and F, G ∈ D. By construction, positive definiteness of kσ is guaranteed. The kernel is stable in the sense that the distance dσ(F, G) = √( kσ(F, F) + kσ(G, G) − 2kσ(F, G) ) is bounded up to a constant by dW,1(F, G) [20, Theorem 2].

Footnote 1: The q-Wasserstein metric is defined as dW,q(F, G) = inf_γ ( Σ_{x ∈ F ∪ D} ‖x − γ(x)‖_∞^q )^{1/q}, where γ ranges over all bijections from F ∪ D to G ∪ D, with D denoting the multiset of diagonal points (t, t), each with countably infinite multiplicity.

We have the following property:

Proposition 1. Restricting the kernel in Eq. (3) to S × S, the mean map µ sends a probability measure P on S to an element µ_P ∈ H.

Proof.
The claim immediately follows from [13, Lemma 3] and [24, Proposition 2], since kσ is measurable and bounded on S, and hence µ_P ∈ H. □

While positive definiteness enables the use of kσ in many kernel-based learning techniques [21], we are interested in assessing whether it is universal, or whether we can construct a universal kernel from kσ (see Section 2). The following theorem of Christmann and Steinwart [7] is particularly relevant to this question.

Theorem 1 (cf. Theorem 2.2 of [7]). Let X be a compact metric space and G a separable Hilbert space such that there exists a continuous and injective map Φ : X → G. Furthermore, let K : R → R be a function that is analytic on some neighborhood of 0, i.e., it can locally be expressed by its Taylor series

K(t) = Σ_{n=0}^{∞} a_n t^n,  t ∈ [−r, r].

If a_n > 0 for all n ∈ N0, then k : X × X → R,

k(x, y) = K(⟨Φ(x), Φ(y)⟩_G) = Σ_{n=0}^{∞} a_n ⟨Φ(x), Φ(y)⟩_G^n,   (4)

is a universal kernel.

Kernels of the form of Eq. (4) are typically referred to as Taylor kernels. Note that universality of a kernel on X refers to a specific choice of metric on X. By the same argument as for the linear dot-product kernel in R^d (see above), the PSS kernel kσ cannot be universal with respect to the metric d_{kσ} induced by the scalar product defining kσ. On the other hand, it is unclear whether kσ is universal with respect to the metric dW,q. However, we do have the following result:

Proposition 2. The kernel kU_σ : S × S → R,

kU_σ(F, G) = exp(kσ(F, G)),   (5)

is universal with respect to the metric dW,1.

Proof. We prove this proposition by means of Theorem 1. We set G = L2(Ω), which is a separable Hilbert space. As shown in Reininghaus et al.
[20], the feature map Φσ : D → L2(Ω) is injective. Furthermore, it is continuous by construction, as the metric on D is induced by the norm on L2(Ω), and so is Φσ restricted to S. The function K : R → R is defined as x ↦ exp(x), and hence is analytic on R. Its Taylor coefficients a_n are 1/n!, and thus are positive for every n.

It remains to show that (S, dW,1) is a compact metric space. First, define the set R = Ω^N ∩ ([−R, R]²)^N, which is a bounded, closed, and therefore compact subspace of (R²)^N. Now consider the function f : R → S that maps (p_1, . . . , p_N) ∈ R to the persistence diagram {p_i : 1 ≤ i ≤ N, p_i ∉ ∂Ω} ∈ S. We note that for all D = {p_1, . . . , p_n} ∈ S, with n ≤ N, there exists an X ∈ R, e.g., X = (p_1, . . . , p_n, 0, . . . , 0), such that f(X) = D; this implies S = f(R). Next, we show that f is 1-Lipschitz continuous w.r.t. the 1-Wasserstein distance on persistence diagrams, i.e.,

∀ X = (p_1, . . . , p_N), Y = (q_1, . . . , q_N) ∈ R : dW,1(f(X), f(Y)) ≤ d(X, Y),

where we defined d as inf_γ Σ_{1 ≤ i ≤ N} ‖p_i − γ(p_i)‖_∞, with γ ranging over all bijections between {p_1, . . . , p_N} and {q_1, . . . , q_N}. In other words, d corresponds to the 1-Wasserstein distance without allowing matches to the diagonal. Now, by definition, dW,1(f(X), f(Y)) ≤ d(X, Y), because all bijections considered by d are also admissible for dW,1. Since R is thus compact and f is continuous, S = f(R) is compact as well. □

Fig. 2: Visualization of the mean PSS function (right) taken over 30 samples from a double-annulus (cf. [19]).

We refer to the kernel of Eq. (5) as the universal persistence scale-space (u-PSS) kernel.

Remark. While we prove Prop. 1 for the PSS kernel in Eq.
(3), it obviously also holds for kU_σ, since exponentiation invalidates neither measurability nor boundedness.

Relation to persistence landscapes. As the feature map Φσ of Eq. (2) defines a function-valued summary of persistent homology in the Hilbert space L2(Ω), the results on probability in Banach spaces [14], used in [4] for persistence landscapes, naturally apply to Φσ as well. This includes, for instance, the law of large numbers and the central limit theorem [4, Theorems 9, 10]. Conversely, considering a persistence landscape λ(D) as a function in L2(N × R) or L2(R²) yields a positive definite kernel ⟨λ(·), λ(·)⟩_{L2} on persistence diagrams. However, it is unclear whether a universal kernel can be constructed from persistence landscapes in a way similar to the definition of kU_σ. In particular, we are not aware of a proof that the construction of persistence landscapes, considered as functions in L2, is continuous with respect to dW,q for some 1 ≤ q ≤ ∞. For a more detailed treatment of the differences between Φσ and persistence landscapes, we refer the reader to [20].

4 Experiments

We first describe a set of experiments on synthetic data appearing in previous work to illustrate the use of the PSS feature map Φσ and the universal persistence scale-space kernel on two different tasks. We then present two applications on real-world data, where we assess differences in the persistent homology of functions on 3D surfaces of lateral ventricles and corpora callosa with respect to different group assignments (i.e., age, demented/non-demented). In all experiments, filtrations and the persistence diagrams are obtained using Dipha (see footnote 2), which can directly handle our types of input data.
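Since all computations in the experiments reduce to kernel evaluations between persistence diagrams, the closed forms of Eqs. (3) and (5) are straightforward to implement. The following is a minimal NumPy sketch (ours, with hypothetical function names; not the authors' released code):

```python
import numpy as np

def pss_kernel(F, G, sigma):
    """Persistence scale-space kernel k_sigma(F, G) of Eq. (3).

    F, G: (n, 2) / (m, 2) arrays of (birth, death) points of two
    persistence diagrams; sigma > 0 is the scale parameter.
    """
    F = np.atleast_2d(np.asarray(F, dtype=float))
    G = np.atleast_2d(np.asarray(G, dtype=float))
    if F.size == 0 or G.size == 0:
        return 0.0
    G_bar = G[:, ::-1]  # mirror images (d, b) of the points (b, d) in G
    d2 = ((F[:, None, :] - G[None, :, :]) ** 2).sum(-1)          # ||p - q||^2
    d2_bar = ((F[:, None, :] - G_bar[None, :, :]) ** 2).sum(-1)  # ||p - q_bar||^2
    s = (np.exp(-d2 / (8 * sigma)) - np.exp(-d2_bar / (8 * sigma))).sum()
    return float(s) / (8 * np.pi * sigma)

def u_pss_kernel(F, G, sigma):
    """Universal persistence scale-space (u-PSS) kernel of Eq. (5)."""
    return np.exp(pss_kernel(F, G, sigma))
```

Note that the kernel is symmetric and that points close to the diagonal contribute almost nothing, reflecting the Dirichlet boundary condition of the underlying heat diffusion problem.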
Source code to reproduce the experiments is available at https://goo.gl/KouBPT.

Footnote 2: available online: https://code.google.com/p/dipha/

4.1 Synthetic data

Computation of the mean PSS function. We repeat the experiment from [19, 4] of sampling from the union of two overlapping annuli. In particular, we repeatedly (N = 30 times) draw samples of size 100 (out of 10000), and then compute persistence diagrams F_1, . . . , F_N for 1-dim. features by considering sublevel sets of the distance function from the points. Finally, we compute the mean of the PSS functions Φσ(F_i) defined by the feature map from Eq. (2). This simply amounts to computing (1/N) · Φσ(F_1 ∪ · · · ∪ F_N). A visualization of the pointwise average, for a fixed choice of σ, is shown in Fig. 2. We remind the reader that the convergence results used in [4] equally hold for this feature map, as explained in Section 3. In particular, the above process of taking means converges to the expected value of the PSS function. As can be seen in Fig. 2, the two 1-dim. holes manifest themselves as two “bumps” at different positions in the mean PSS function.

Fig. 3: Left: Illustration of one random sample (of size 200) on a sphere and a torus in R³ with equal surface area. To generate a noisy sample, we add Gaussian noise N(0, 0.1) to each point in a sample (indicated by the vectors). Right: Two-sample hypothesis testing results (H0 : P = Q vs. HA : P ≠ Q) for 0- and 1-dim. features. The box plots show the variation in p-values (y-axis) over a selection of values for σ as a function of increasing sample size (x-axis). Sample sizes for which the median p-value is less than the chosen significance level (here: 0.05) are marked green, and red otherwise.

Torus vs. sphere.
In this slightly more involved example, we repeat an experiment from [4, Section 4.3] on the problem of discriminating between a sphere and a torus in R³, based on random samples drawn from both objects. In particular, we repeatedly (N times) draw samples from the torus and the sphere (corresponding to measures P and Q) and then compute persistence diagrams. Eventually, we test the null hypothesis H0 : P = Q, i.e., that the samples were drawn from the same object; cf. [4] for a thorough description of the full setup. We remark that our setup uses the Delaunay triangulation of the point samples instead of the Coxeter–Freudenthal–Kuhn triangulation of a regular grid as in [4].

Conceptually, the important difference is in the two-sample testing strategy. In [4], two factors influence the test: (1) the choice of a functional to map the persistence landscape to a scalar and (2) the choice of test statistic. Bubenik chooses a z-test to test for equality between the mean persistence landscapes. In contrast, we can test for true equality in distribution. This is possible since universality of the kernel ensures that the MMD of Eq. (1) is a metric for the space of probability measures on persistence diagrams. All p-values are obtained by bootstrapping the test statistic under H0 over 10^4 random permutations. We further vary the number of samples per object used to compute the MMD statistic from N = 10 to N = 100 and add Gaussian noise N(0, 0.1) in one experiment. Results are shown in Fig. 3 over a selection of u-PSS scales σ ∈ {100, 10, 1, 0.1, 0.01, 0.001}. For 0-dimensional features and no noise, we can always reject H0 at the α = 0.05 significance level. For 1-dim. features and no noise, we need at least 60 samples to reliably reject H0 at the same level.

4.2 Real-world data

We use two real-world datasets in our experiments: (1) 3D surfaces of the corpus callosum and (2) 3D surfaces of the lateral ventricles from neonates.
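The bootstrapping of the test statistic under H0, used in both the synthetic and the real-world experiments, amounts to a label-permutation test on the pooled Gram matrix of kernel evaluations. A compact sketch (ours, with hypothetical function names; not the authors' code) is:

```python
import numpy as np

def mmd2_from_gram(K, m):
    """Biased MMD^2 statistic from the pooled Gram matrix K of two samples,
    where the first m rows/columns belong to the first sample."""
    Kxx, Kyy, Kxy = K[:m, :m], K[m:, m:], K[:m, m:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def permutation_p_value(K, m, n_perm=1000, seed=0):
    """Estimate the p-value of H0: P = Q by permuting the sample labels,
    i.e., by bootstrapping the test statistic under H0."""
    rng = np.random.default_rng(seed)
    observed = mmd2_from_gram(K, m)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(K.shape[0])
        if mmd2_from_gram(K[np.ix_(idx, idx)], m) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing for a valid p-value
```

In our setting, K would be assembled from pairwise u-PSS kernel evaluations between all persistence diagrams of the two samples; permuting rows and columns of K is equivalent to permuting the group labels of the diagrams.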
The corpus callosum surfaces were obtained from the longitudinal dataset of the OASIS brain database (available online: http://www.oasis-brains.org). We use all subject data from the first visit, and the grouping criterion is disease state: dementia vs. non-dementia. Note that the demented group is comprised of individuals with very mild to mild AD. This discrimination is based on the clinical dementia rating (CDR) score; Marcus et al. [17] explain this dataset in detail. The lateral ventricle dataset is an extended version of [3]. It contains data from 43 neonates. All subjects were repeatedly imaged approximately every 3 months (starting from 2 weeks) in the first year and every 6 months in the second year. According to Bompard et al. [3], ventricle growth is the dominant effect and occurs in a non-uniform manner, most significantly during the first 6 months. This raises the question whether age also has an impact on the shape of these brain structures that can be detected by persistent homology of the HKS function (see Setup below, or Section 2). Hence, we set our grouping criterion to be developmental age: ≤ 6 months vs. > 6 months. It is important to note that the heat kernel signature is not scale-invariant. For that reason, we normalize the (mean-subtracted) configuration matrices (containing the vertex coordinates of each mesh) by their Euclidean norm, cf. [9]. This ensures that our analysis is not biased by growth (scaling) effects.

[Fig. 3 panels: p-values vs. sample size for 0-dim./1-dim. features, with and without noise, at significance level 0.05; torus: (r − 2)² + z² = 1, sphere: r² = 2π.]

(a) (Right) lateral ventricles; grouping: subjects ≤ 6 months vs. > 6 months
(b) Corpora callosa; grouping: demented vs. non-demented subjects

Fig.
4: Left: Effect of increasing HKS time ti, illustrated on one exemplary surface mesh of both datasets. Right: Contour plots of p-values estimated via random permutations, shown as a function of the u-PSS kernel scale σ and the HKS time.

Setup. We follow an experimental setup similar to [16] and [20], and compute the heat kernel signature [27] for various times ti as a function defined on the 3D surface meshes. In all experiments, we use the proposed u-PSS kernel k^U_σ of Eq. (5) and vary the HKS time ti in 1 = t1 < t2 < ... < t20 = 10.5. Regarding the u-PSS kernel scale, we sweep from 10^-9 = σ1 < ... < σ10 = 10^1. Null (H0) and alternative (HA) hypotheses are defined as in Section 2, with two samples of persistence diagrams {F1, ..., Fm} and {G1, ..., Gn}. The test statistic under H0 is bootstrapped using B = 5 · 10^4 random permutations. This is also the setup recommended in [13] for low sample sizes.

Results. Figure 4 shows the estimated p-values for both datasets as a function of the u-PSS kernel scale and the HKS time for 1-dim. features. The false discovery rate is controlled by the Benjamini-Hochberg procedure. On the lateral ventricle data, we observe p-values < 0.01 (for the right ventricles), especially around HKS times t10 to t15, cf. Fig. 4(a). Since the results for left and right lateral ventricles are similar, only the p-value plots for the right lateral ventricle are shown. In general, the results indicate that, at specific settings of ti, the HKS function captures salient shape features of the surface, which lead to statistically significant differences in the persistent homology. We do, however, point out that there is no clear guideline on how to choose the HKS time.
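The u-PSS kernel k^U_σ of Eq. (5) is a universal variant of the persistence scale-space kernel of Reininghaus et al. [20], which admits a simple closed form. As a sketch of the original kernel only (the exact u-PSS modification from Section 3 is not reproduced here), each diagram being a (k × 2) array of (birth, death) points:

```python
import numpy as np

def pss_kernel(F, G, sigma):
    # Persistence scale-space kernel of [20]: for each pair (p, q) with p in F
    # and q in G, sum a Gaussian on ||p - q|| minus a Gaussian on the distance
    # to the mirror image of q across the diagonal. Points on the diagonal
    # thus contribute nothing, as required for stability.
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    Gbar = G[:, ::-1]  # mirror each point across the diagonal: (b, d) -> (d, b)
    d2 = ((F[:, None, :] - G[None, :, :]) ** 2).sum(-1)
    d2bar = ((F[:, None, :] - Gbar[None, :, :]) ** 2).sum(-1)
    return (np.exp(-d2 / (8.0 * sigma))
            - np.exp(-d2bar / (8.0 * sigma))).sum() / (8.0 * np.pi * sigma)
```

The scale σ plays the same role as the sweep parameter σ1, ..., σ10 in the Setup above: small σ makes the kernel sensitive to fine differences between diagrams, large σ smooths them out.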
Setting ti too low might emphasize noise, while setting ti too high tends to smooth out details, as can be seen in the illustration of the HKS time on the left-hand side of Fig. 4. On the corpus callosum data, cf. Fig. 4(b), no significant differences in the persistent homology of the two groups (again for 1-dim. features) can be identified, with p-values ranging from 0.1 to 0.9. This does not allow us to reject H0 at any reasonable level.

5 Discussion

With the introduction of a universal kernel for persistence diagrams in Section 3, we enable the use of this topological summary representation in the framework of embedding probability measures into reproducing kernel Hilbert spaces. While our experiments are mainly limited to two-sample hypothesis testing, our kernel allows the use of a wide variety of statistical techniques and learning methods situated in that framework. It is important to note that our construction, via Theorem 1, essentially depends on a restriction of the set D to a compact metric space. We remark that similar conditions are required in [4] in order to enable statistical computations, e.g., constraining the support of the persistence landscapes. However, it will be interesting to investigate which properties of the kernel remain valid when these restrictions are lifted. From an application point of view, we have shown that we can test for a statistical difference in the distribution of persistence diagrams. This is in contrast to previous work, where hypothesis testing is typically limited to testing for specific properties of the distributions, such as equality in mean.

Acknowledgements. This work has been partially supported by the Austrian Science Fund, project no. KLI 00012. We also thank the anonymous reviewers for their valuable comments/suggestions.

References

[1] A.
Adcock, E. Carlsson, and G. Carlsson. The ring of algebraic functions on persistence bar codes. arXiv, available at http://arxiv.org/abs/1304.0530, 2013.

[2] P. Bendich, J.S. Marron, E. Miller, A. Pieloch, and S. Skwerer. Persistent homology analysis of brain artery trees. arXiv, available at http://arxiv.org/abs/1411.6652, 2014.

[3] L. Bompard, S. Xu, M. Styner, B. Paniagua, M. Ahn, Y. Yuan, V. Jewells, W. Gao, D. Shen, H. Zhu, and W. Lin. Multivariate longitudinal shape analysis of human lateral ventricles during the first twenty-four months of life. PLoS One, 2014.

[4] P. Bubenik. Statistical topological data analysis using persistence landscapes. JMLR, 16:77–102, 2015.

[5] G. Carlsson. Topology and data. Bull. Amer. Math. Soc., 46:255–308, 2009.

[6] F. Chazal, B.T. Fasy, F. Lecci, A. Rinaldo, and L. Wasserman. Stochastic convergence of persistence landscapes and silhouettes. In SoCG, 2014.

[7] A. Christmann and I. Steinwart. Universal kernels on non-standard input spaces. In NIPS, 2010.

[8] M.K. Chung, P. Bubenik, and P.T. Kim. Persistence diagrams of cortical surface data. In IPMI, 2009.

[9] I.L. Dryden and K.V. Mardia. Statistical Shape Analysis. Wiley Series in Probability and Statistics. Wiley, 1998.

[10] H. Edelsbrunner and J. Harer. Computational Topology. An Introduction. AMS, 2010.

[11] B. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, and A. Singh. Confidence sets for persistence diagrams. Ann. Statist., 42(6):2301–2339, 2014.

[12] K. Fukumizu, L. Song, and A. Gretton. Kernel Bayes' rule: Bayesian inference with positive definite kernels. JMLR, 14:3753–3783, 2013.

[13] A. Gretton, K.M. Borgwardt, M.J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. JMLR, 13:723–773, 2012.

[14] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Classics in Mathematics. Springer, 1991.

[15] H. Lee, M.K. Chung, H. Kang, and D.S. Lee.
Hole detection in metabolic connectivity of Alzheimer's disease using k-Laplacian. In MICCAI, 2014.

[16] C. Li, M. Ovsjanikov, and F. Chazal. Persistence-based structural recognition. In CVPR, 2014.

[17] D.S. Marcus, A.F. Fotenos, J.G. Csernansky, J.C. Morris, and R.L. Buckner. Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults. J. Cognitive Neurosci., 22(12):2677–2684, 2010.

[18] Y. Mileyko, S. Mukherjee, and J. Harer. Probability measures on the space of persistence diagrams. Inverse Probl., 27(12), 2011.

[19] E. Munch, P. Bendich, S. Mukherjee, J. Mattingly, and J. Harer. Probabilistic Fréchet means and statistics on vineyards. arXiv, available at http://arxiv.org/abs/1307.6530, 2013.

[20] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt. A stable multi-scale kernel for topological machine learning. In CVPR, 2015.

[21] B. Schölkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.

[22] N. Singh, H.D. Couture, J.S. Marron, C. Perou, and M. Niethammer. Topological descriptors of histology images. In MLMI, 2014.

[23] A. Smola, A. Gretton, L. Song, and B. Schölkopf. Hilbert space embedding for distributions. In ALT, 2007.

[24] B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet. Hilbert space embeddings and metrics on probability measures. JMLR, 11:1517–1561, 2010.

[25] I. Steinwart. On the influence of the kernel on the consistency of support vector machines. JMLR, 2:67–93, 2001.

[26] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.

[27] J. Sun, M. Ovsjanikov, and L. Guibas. A concise and provably informative multi-scale signature based on heat diffusion. In SGP, 2009.

[28] K. Turner, Y. Mileyko, S. Mukherjee, and J. Harer.
Fr\u00e9chet means for distributions of persistence dia-\n\ngrams. Discrete Comput. Geom., 52(1):44\u201370, 2014.\n\n9\n\n\f", "award": [], "sourceid": 1726, "authors": [{"given_name": "Roland", "family_name": "Kwitt", "institution": "University of Salzburg"}, {"given_name": "Stefan", "family_name": "Huber", "institution": "IST Austria"}, {"given_name": "Marc", "family_name": "Niethammer", "institution": "UNC Chapel Hill"}, {"given_name": "Weili", "family_name": "Lin", "institution": "UNC Chapel Hill"}, {"given_name": "Ulrich", "family_name": "Bauer", "institution": "TU Munich"}]}