{"title": "Deep Learning with Topological Signatures", "book": "Advances in Neural Information Processing Systems", "page_first": 1634, "page_last": 1644, "abstract": "Inferring topological and geometrical information from data can offer an alternative perspective in machine learning problems. Methods from topological data analysis, e.g., persistent homology, enable us to obtain such information, typically in the form of summary representations of topological features. However, such topological signatures often  come with an unusual structure (e.g., multisets of intervals) that is highly impractical for most machine learning techniques. While many strategies have been proposed to map these topological signatures into machine learning compatible representations, they suffer from being agnostic to the target learning task. In contrast, we propose a technique that enables us to input topological signatures to deep neural networks and learn a task-optimal representation during training. Our approach is realized as a novel input layer with favorable theoretical properties. Classification experiments on 2D object shapes and social network graphs demonstrate the versatility of the approach and, in case of the latter, we even outperform the state-of-the-art by a large margin.", "full_text": "Deep Learning with Topological Signatures\n\nChristoph Hofer\n\nDepartment of Computer Science\nUniversity of Salzburg, Austria\n\nchofer@cosy.sbg.ac.at\n\nRoland Kwitt\n\nDepartment of Computer Science\nUniversity of Salzburg, Austria\nRoland.Kwitt@sbg.ac.at\n\nMarc Niethammer\n\nUNC Chapel Hill, NC, USA\n\nmn@cs.unc.edu\n\nAndreas Uhl\n\nDepartment of Computer Science\nUniversity of Salzburg, Austria\n\nuhl@cosy.sbg.ac.at\n\nAbstract\n\nInferring topological and geometrical information from data can offer an alternative\nperspective on machine learning problems. 
Methods from topological data analysis,\ne.g., persistent homology, enable us to obtain such information, typically in the form\nof summary representations of topological features. However, such topological\nsignatures often come with an unusual structure (e.g., multisets of intervals) that is\nhighly impractical for most machine learning techniques. While many strategies\nhave been proposed to map these topological signatures into machine learning\ncompatible representations, they suffer from being agnostic to the target learning\ntask. In contrast, we propose a technique that enables us to input topological\nsignatures to deep neural networks and learn a task-optimal representation during\ntraining. Our approach is realized as a novel input layer with favorable theoretical\nproperties. Classification experiments on 2D object shapes and social network\ngraphs demonstrate the versatility of the approach and, in case of the latter, we\neven outperform the state-of-the-art by a large margin.\n\n1 Introduction\n\nMethods from algebraic topology have only recently emerged in the machine learning community,\nmost prominently under the term topological data analysis (TDA) [7]. Since TDA enables us to\ninfer relevant topological and geometrical information from data, it can offer a novel and potentially\nbeneficial perspective on various machine learning problems. Two compelling benefits of TDA\nare (1) its versatility, i.e., we are not restricted to any particular kind of data (such as images,\nsensor measurements, time-series, graphs, etc.) and (2) its robustness to noise. 
Several works have\ndemonstrated that TDA can be beneficial in a diverse set of problems, such as studying the manifold\nof natural image patches [8], analyzing activity patterns of the visual cortex [28], classification of 3D\nsurface meshes [27, 22], clustering [11], or recognition of 2D object shapes [29].\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Illustration of the proposed network input layer for topological signatures. Each signature, in the\nform of a persistence diagram D \u2208 D (left), is projected w.r.t. a collection of structure elements. The layer\u2019s\nlearnable parameters \u03b8 are the locations \u00b5i and the scales \u03c3i of these elements; \u03bd \u2208 R+ is set a priori and\nmeant to discount the impact of points with low persistence (and, in many cases, of low discriminative power).\nThe layer output y is a concatenation of the projections. In this illustration, N = 2 and hence y = (y1, y2)\u22a4.\n\nCurrently, the most widely-used tool from TDA is persistent homology [15, 14]. Essentially,\npersistent homology allows us to track topological changes as we analyze data at multiple \u201cscales\u201d\n(we will make these concepts more concrete in Sec. 2).\nAs the scale changes, topological features (such as connected components, holes, etc.) appear and\ndisappear. Persistent homology associates a lifespan to these features in the form of a birth and\na death time. The collection of (birth, death) tuples forms a multiset that can be visualized as a\npersistence diagram or a barcode, also referred to as a topological signature of the data. However,\nleveraging these signatures for learning purposes poses considerable challenges, mostly due to their\nunusual structure as a multiset. 
While there exist suitable metrics to compare signatures (e.g., the\nWasserstein metric), they are highly impractical for learning, as they require solving optimal matching\nproblems.\nRelated work. In order to deal with these issues, several strategies have been proposed. In [2] for\ninstance, Adcock et al. use invariant theory to \u201ccoordinatize\u201d the space of barcodes. This allows\nbarcodes to be mapped to vectors of fixed size, which can then be fed to standard machine learning techniques,\nsuch as support vector machines (SVMs). Alternatively, Adams et al. [1] map barcodes to so-called\npersistence images which, upon discretization, can also be interpreted as vectors and used with\nstandard learning techniques. Along another line of research, Bubenik [6] proposes a mapping\nof barcodes into a Banach space. This has been shown to be particularly viable in a statistical\ncontext (see, e.g., [10]). The mapping outputs a representation referred to as a persistence landscape.\nInterestingly, under a specific choice of parameters, barcodes are mapped into $L^2(\\mathbb{R}^2)$ and the\ninner-product in that space can be used to construct a valid kernel function. Similar kernel-based\ntechniques have also recently been studied by Reininghaus et al. [27], Kwitt et al. [20] and Kusano\net al. [19].\nWhile all previously mentioned approaches retain certain stability properties of the original\nrepresentation with respect to common metrics in TDA (such as the Wasserstein or Bottleneck distances),\nthey also share one common drawback: the mapping of topological signatures to a representation that\nis compatible with existing learning techniques is pre-defined. Consequently, it is fixed and therefore\nagnostic to any specific learning task. 
This is clearly suboptimal, as the eminent success of deep\nneural networks (e.g., [18, 17]) has shown that learning representations is a preferable approach.\nFurthermore, techniques based on kernels [27, 20, 19], for instance, additionally suffer from scalability\nissues, as training typically scales poorly with the number of samples (e.g., roughly cubic in case of\nkernel-SVMs). In the spirit of end-to-end training, we therefore aim for an approach that allows us to\nlearn a task-optimal representation of topological signatures. We additionally remark that, e.g., Qi et\nal. [25] or Ravanbakhsh et al. [26] have proposed architectures that can handle sets, but only with\nfixed size. In our context, this is impractical as the capability of handling sets with varying cardinality\nis a requirement to handle persistent homology in a machine learning setting.\n\nContribution. To realize this idea, we advocate a novel input layer for deep neural networks that\ntakes a topological signature (in our case, a persistence diagram), and computes a parametrized\nprojection that can be learned during network training. Specifically, this layer is designed such that\nits output is stable with respect to the 1-Wasserstein distance (similar to [27] or [1]). To demonstrate\nthe versatility of this approach, we present experiments on 2D object shape classification and the\nclassification of social network graphs. On the latter, we improve the state-of-the-art by a large\nmargin, clearly demonstrating the power of combining TDA with deep learning in this context.\n\n2 Background\n\nFor space reasons, we only provide a brief overview of the concepts that are relevant to this work and\nrefer the reader to [16] or [14] for further details.\nHomology. The key concept of homology theory is to study the properties of some object X by\nmeans of (commutative) algebra. In particular, we assign to X a sequence of modules C0, C1, . . 
.\nwhich are connected by homomorphisms $\\partial_n : C_n \\to C_{n-1}$ such that $\\operatorname{im} \\partial_{n+1} \\subseteq \\ker \\partial_n$. A structure\nof this form is called a chain complex and by studying its homology groups $H_n = \\ker \\partial_n / \\operatorname{im} \\partial_{n+1}$\nwe can derive properties of X.\nA prominent example of a homology theory is simplicial homology. It is the homology theory used\nthroughout this work, and hence we now make the ideas presented above concrete. Let K\nbe a simplicial complex and $K_n$ its n-skeleton. Then we set $C_n(K)$ as the vector space generated\n(freely) by $K_n$ over Z/2Z (see footnote 2). The connecting homomorphisms $\\partial_n : C_n(K) \\to C_{n-1}(K)$ are\ncalled boundary operators. For a simplex $\\sigma = [x_0, \\ldots, x_n] \\in K_n$, we define them as\n\n$$\\partial_n(\\sigma) = \\sum_{i=0}^{n} [x_0, \\ldots, x_{i-1}, x_{i+1}, \\ldots, x_n]$$\n\nand linearly extend this to $C_n(K)$, i.e., $\\partial_n(\\sum \\sigma_i) = \\sum \\partial_n(\\sigma_i)$.\nPersistent homology. Let K be a simplicial complex and $(K^i)_{i=0}^m$ a sequence of simplicial complexes\nsuch that $\\emptyset = K^0 \\subseteq K^1 \\subseteq \\cdots \\subseteq K^m = K$. Then, $(K^i)_{i=0}^m$ is called a filtration of K. If we\nuse the extra information provided by the filtration of K, we obtain the following sequence of chain\ncomplexes (left),\n\nwhere $C_n^i = C_n(K_n^i)$ and $\\iota$ denotes the inclusion. This then leads to the concept of persistent\nhomology groups, defined by\n\n$$H_n^{i,j} = \\ker \\partial_n^i / (\\operatorname{im} \\partial_{n+1}^j \\cap \\ker \\partial_n^i) \\quad \\text{for} \\quad i \\le j .$$\n\nThe ranks, $\\beta_n^{i,j} = \\operatorname{rank} H_n^{i,j}$, of these homology groups (i.e., the n-th persistent Betti numbers),\ncapture the number of homological features of dimensionality n (e.g., connected components for\nn = 0, holes for n = 1, etc.) that persist from i to (at least) j. In fact, according to [14, Fundamental\nLemma of Persistent Homology], the quantities\n\n$$\\mu_n^{i,j} = (\\beta_n^{i,j-1} - \\beta_n^{i,j}) - (\\beta_n^{i-1,j-1} - \\beta_n^{i-1,j}) \\quad \\text{for} \\quad i < j \\qquad (1)$$\n\nencode all the information about the persistent Betti numbers of dimension n.\nTopological signatures. A typical way to obtain a filtration of K is to consider sublevel sets of a\nfunction $f : C_0(K) \\to \\mathbb{R}$. This function can be easily lifted to higher-dimensional chain groups of\nK by\n\n$$f([v_0, \\ldots, v_n]) = \\max\\{f([v_i]) : 0 \\le i \\le n\\} .$$\n\nGiven $m = |f(C_0(K))|$, we obtain $(K_i)_{i=0}^m$ by setting $K_0 = \\emptyset$ and $K_i = f^{-1}((-\\infty, a_i])$ for\n$1 \\le i \\le m$, where $a_1 < \\cdots < a_m$ is the sorted sequence of values of $f(C_0(K))$. If we construct\na multiset such that, for $i < j$, the point $(a_i, a_j)$ is inserted with multiplicity $\\mu_n^{i,j}$, we effectively\nencode the persistent homology of dimension n w.r.t. the sublevel set filtration induced by f. Upon\nadding diagonal points with infinite multiplicity, we obtain the following structure:\nDefinition 1 (Persistence diagram). Let $\\Delta = \\{x \\in \\mathbb{R}^2_\\Delta : \\operatorname{mult}(x) = \\infty\\}$ be the multiset of the\ndiagonal $\\mathbb{R}^2_\\Delta = \\{(x_0, x_1) \\in \\mathbb{R}^2 : x_0 = x_1\\}$, where mult denotes the multiplicity function and let\n$\\mathbb{R}^2_\\star = \\{(x_0, x_1) \\in \\mathbb{R}^2 : x_1 > x_0\\}$. A persistence diagram, $\\mathcal{D}$, is a multiset of the form\n\n$$\\mathcal{D} = \\{x : x \\in \\mathbb{R}^2_\\star\\} \\cup \\Delta .$$\n\nWe denote by $\\mathbb{D}$ the set of all persistence diagrams of the form $|\\mathcal{D} \\setminus \\Delta| < \\infty$.\nFor a given complex K of dimension nmax and a function f (of the discussed form), we can interpret\npersistent homology as a mapping (K, f) \u21a6 (D0, . . . 
,Dnmax\u22121), where Di is the diagram of\ndimension i and nmax the dimension of K. We can additionally add a metric structure to the space of\npersistence diagrams by introducing the notion of distances.\n\nFootnote 2: Simplicial homology is not specific to Z/2Z, but it is a typical choice, since it allows us to interpret n-chains\nas sets of n-simplices.\n\n[Figure: (left) sequence of chain complexes: for each i = 1, ..., m, the row \u00b7\u00b7\u00b7 \u2192 C^i_2 \u2192 C^i_1 \u2192 C^i_0 \u2192 0 with boundary operators \u22023, \u22022, \u22021, \u22020; consecutive rows are connected by the inclusions \u03b9. (right) Example filtration K^1 \u2286 K^2 \u2286 K^3 on the vertices v1, ..., v4, with C^1_0 = [[v1],[v2]], C^1_1 = 0, C^1_2 = 0; C^2_0 = [[v1],[v2],[v3]], C^2_1 = [[v1,v3],[v2,v3]], C^2_2 = 0; C^3_0 = [[v1],[v2],[v3],[v4]], C^3_1 = [[v1,v3],[v2,v3],[v3,v4]], C^3_2 = 0 (all over Z2).]\n\nDefinition 2 (Bottleneck, Wasserstein distance). For two persistence diagrams $\\mathcal{D}$ and $\\mathcal{E}$, we define\ntheir Bottleneck ($w_\\infty$) and Wasserstein ($w_p^q$) distances by\n\n$$w_\\infty(\\mathcal{D},\\mathcal{E}) = \\inf_{\\eta} \\sup_{x \\in \\mathcal{D}} \\|x - \\eta(x)\\|_\\infty \\quad \\text{and} \\quad w_p^q(\\mathcal{D},\\mathcal{E}) = \\inf_{\\eta} \\Big( \\sum_{x \\in \\mathcal{D}} \\|x - \\eta(x)\\|_q^p \\Big)^{\\frac{1}{p}} , \\qquad (2)$$\n\nwhere $p, q \\in \\mathbb{N}$ and the infimum is taken over all bijections $\\eta : \\mathcal{D} \\to \\mathcal{E}$.\nEssentially, this facilitates studying stability/continuity properties of topological signatures w.r.t.\nmetrics in the filtration or complex space; we refer the reader to [12], [13], [9] for a selection of\nimportant stability results.\nRemark. By setting $\\mu_n^{i,\\infty} = \\beta_n^{i,m} - \\beta_n^{i-1,m}$, we extend Eq. (1) to features which never disappear, also\nreferred to as essential. This change can be lifted to $\\mathbb{D}$ by setting $\\mathbb{R}^2_\\star = \\{(x_0, x_1) \\in \\mathbb{R} \\times (\\mathbb{R} \\cup \\{\\infty\\}) : x_1 > x_0\\}$.\nIn Sec. 5, we will see that essential features can offer discriminative information.\n\n3 A network layer for topological signatures\n\nIn this section, we introduce the proposed (parametrized) network layer for topological signatures\n(in the form of persistence diagrams). The key idea is to take any $\\mathcal{D}$ and define a projection w.r.t. a\ncollection (of fixed size N) of structure elements.\nIn the following, we set $\\mathbb{R}^+ := \\{x \\in \\mathbb{R} : x > 0\\}$ and $\\mathbb{R}^+_0 := \\{x \\in \\mathbb{R} : x \\ge 0\\}$, resp., and start by\nrotating points of $\\mathcal{D}$ such that points on $\\mathbb{R}^2_\\Delta$ lie on the x-axis, see Fig. 1. The y-axis can then be\ninterpreted as the persistence of features. Formally, we let $b_0$ and $b_1$ be the unit vectors in directions\n$(1, 1)^\\top$ and $(-1, 1)^\\top$ and define a mapping $\\rho : \\mathbb{R}^2_\\star \\cup \\mathbb{R}^2_\\Delta \\to \\mathbb{R} \\times \\mathbb{R}^+_0$ such that $x \\mapsto (\\langle x, b_0 \\rangle, \\langle x, b_1 \\rangle)$.\nThis rotates points in $\\mathbb{R}^2_\\star \\cup \\mathbb{R}^2_\\Delta$ clock-wise by $\\pi/4$. We will later see that this construction is beneficial\nfor a closer analysis of the layer\u2019s properties.\nSimilar to [27, 19], we choose exponential functions as structure elements, but other choices are\npossible (see Lemma 1). In contrast to [27, 19], however, our structure elements are not at fixed\nlocations (i.e., one element per point in $\\mathcal{D}$), but their locations and scales are learned during training.\nDefinition 3. Let $\\mu = (\\mu_0, \\mu_1)^\\top \\in \\mathbb{R} \\times \\mathbb{R}^+$, $\\sigma = (\\sigma_0, \\sigma_1) \\in \\mathbb{R}^+ \\times \\mathbb{R}^+$ and $\\nu \\in \\mathbb{R}^+$. We define\n\n$$s_{\\mu,\\sigma,\\nu} : \\mathbb{R} \\times \\mathbb{R}^+_0 \\to \\mathbb{R}$$\n\nas follows:\n\n$$s_{\\mu,\\sigma,\\nu}\\big((x_0, x_1)\\big) = \\begin{cases} e^{-\\sigma_0^2 (x_0 - \\mu_0)^2 - \\sigma_1^2 (x_1 - \\mu_1)^2}, & x_1 \\in [\\nu, \\infty) \\\\ e^{-\\sigma_0^2 (x_0 - \\mu_0)^2 - \\sigma_1^2 (\\ln(\\frac{x_1}{\\nu})\\nu + \\nu - \\mu_1)^2}, & x_1 \\in (0, \\nu) \\\\ 0, & x_1 = 0 \\end{cases} \\qquad (3)$$\n\nA persistence diagram $\\mathcal{D}$ is then projected w.r.t. $s_{\\mu,\\sigma,\\nu}$ via\n\n$$S_{\\mu,\\sigma,\\nu} : \\mathbb{D} \\to \\mathbb{R}, \\qquad \\mathcal{D} \\mapsto \\sum_{x \\in \\mathcal{D}} s_{\\mu,\\sigma,\\nu}(\\rho(x)) . \\qquad (4)$$\n\nRemark. Note that $s_{\\mu,\\sigma,\\nu}$ is continuous in $x_1$ as\n\n$$\\lim_{x \\to \\nu} x = \\lim_{x \\to \\nu} \\ln\\Big(\\frac{x}{\\nu}\\Big)\\nu + \\nu \\quad \\text{and} \\quad \\lim_{x_1 \\to 0} s_{\\mu,\\sigma,\\nu}\\big((x_0, x_1)\\big) = 0 = s_{\\mu,\\sigma,\\nu}\\big((x_0, 0)\\big)$$\n\nand $e^{(\\cdot)}$ is continuous. Further, $s_{\\mu,\\sigma,\\nu}$ is differentiable on $\\mathbb{R} \\times \\mathbb{R}^+$, since\n\n$$1 = \\lim_{x \\to \\nu^+} \\frac{\\partial x_1}{\\partial x_1}(x) \\quad \\text{and} \\quad \\lim_{x \\to \\nu^-} \\frac{\\partial\\big(\\ln\\big(\\frac{x_1}{\\nu}\\big)\\nu + \\nu\\big)}{\\partial x_1}(x) = \\lim_{x \\to \\nu^-} \\frac{\\nu}{x} = 1 .$$\n\nAlso note that we use the log-transform in Eq. (3) to guarantee that $s_{\\mu,\\sigma,\\nu}$ satisfies the conditions of\nLemma 1; this is, however, only one possible choice.\n\nRemark. The intuition behind $\\nu$ is the following. It is the threshold at which the log-transform starts\nto operate. The log-transform, on the other hand, stretches the space between the x-axis and the line\ndrawn at x + \u03bd to infinite length. As a consequence, $s_{\\mu,\\sigma,\\nu} = 0$ for $x \\in \\mathbb{R}^2_\\Delta$. This is necessary since\notherwise $S_{\\mu,\\sigma,\\nu}(\\mathcal{D}) = \\infty$ for $\\mathcal{D} \\in \\mathbb{D}$ (as each persistence diagram contains points at the diagonal\nwith infinite multiplicity).\n\nFinally, given a collection of $S_{\\mu_i,\\sigma_i,\\nu}$, we combine them to form the output of the network layer.\nDefinition 4. Let $N \\in \\mathbb{N}$, $\\theta = (\\mu_i, \\sigma_i)_{i=0}^{N-1} \\in \\big((\\mathbb{R} \\times \\mathbb{R}^+) \\times (\\mathbb{R}^+ \\times \\mathbb{R}^+)\\big)^N$ and $\\nu \\in \\mathbb{R}^+$. We define\n\n$$S_{\\theta,\\nu} : \\mathbb{D} \\to (\\mathbb{R}^+_0)^N, \\qquad \\mathcal{D} \\mapsto \\big(S_{\\mu_i,\\sigma_i,\\nu}(\\mathcal{D})\\big)_{i=0}^{N-1}$$\n\nas the concatenation of all N mappings defined in Eq. (4).\n\nImportantly, a network layer implementing Def. 4 is trainable via backpropagation, as (1) $s_{\\mu_i,\\sigma_i,\\nu}$ is\ndifferentiable in $\\mu_i, \\sigma_i$, (2) $S_{\\mu_i,\\sigma_i,\\nu}(\\mathcal{D})$ is a finite sum of $s_{\\mu_i,\\sigma_i,\\nu}$ and (3) $S_{\\theta,\\nu}$ is just a concatenation.\n\n4 Theoretical properties\n\nIn this section, we demonstrate that the proposed layer is stable w.r.t. the 1-Wasserstein distance $w_1^q$,\nsee Eq. (2). In fact, this claim will follow from a more general result, stating sufficient conditions on\nfunctions $s : \\mathbb{R}^2_\\star \\cup \\mathbb{R}^2_\\Delta \\to \\mathbb{R}^+_0$ such that a construction in the form of Eq. (4) is stable w.r.t. $w_1^q$.\nLemma 1. Let\n\n$$s : \\mathbb{R}^2_\\star \\cup \\mathbb{R}^2_\\Delta \\to \\mathbb{R}^+_0$$\n\nhave the following properties:\n\n(i) s is Lipschitz continuous w.r.t. $\\|\\cdot\\|_q$ and constant $K_s$\n\n(ii) $s(x) = 0$, for $x \\in \\mathbb{R}^2_\\Delta$\n\nThen, for two persistence diagrams $\\mathcal{D}, \\mathcal{E} \\in \\mathbb{D}$, it holds that\n\n$$\\Big| \\sum_{x \\in \\mathcal{D}} s(x) - \\sum_{y \\in \\mathcal{E}} s(y) \\Big| \\le K_s \\cdot w_1^q(\\mathcal{D}, \\mathcal{E}) . \\qquad (5)$$\n\nProof. see Appendix B\nRemark. 
At this point, we want to clarify that Lemma 1 is not specific to $s_{\\mu,\\sigma,\\nu}$ (e.g., as in Def. 3).\nRather, Lemma 1 yields sufficient conditions to construct a w1-stable input layer. Our choice of\n$s_{\\mu,\\sigma,\\nu}$ is just a natural example that fulfils those requirements and, hence, $S_{\\theta,\\nu}$ is just one possible\nrepresentative of a whole family of input layers.\nWith the result of Lemma 1 in mind, we turn to the specific case of $S_{\\theta,\\nu}$ and analyze its stability\nproperties w.r.t. $w_1^q$. The following lemma is important in this context.\nLemma 2. $s_{\\mu,\\sigma,\\nu}$ has absolutely bounded first-order partial derivatives w.r.t. $x_0$ and $x_1$ on $\\mathbb{R} \\times \\mathbb{R}^+$.\nProof. see Appendix B\nTheorem 1. $S_{\\theta,\\nu}$ is Lipschitz continuous with respect to $w_1^q$ on $\\mathbb{D}$.\nProof. Lemma 2 immediately implies that $s_{\\mu,\\sigma,\\nu}$ from Eq. (3) is Lipschitz continuous w.r.t. $\\|\\cdot\\|_q$.\nConsequently, $s = s_{\\mu,\\sigma,\\nu} \\circ \\rho$ satisfies property (i) from Lemma 1; property (ii) from Lemma 1 is\nsatisfied by construction. Hence, $S_{\\mu,\\sigma,\\nu}$ is Lipschitz continuous w.r.t. $w_1^q$. Consequently, $S_{\\theta,\\nu}$ is\nLipschitz in each coordinate and therefore Lipschitz continuous.\n\nInterestingly, the stability result of Theorem 1 is comparable to the stability results in [1] or [27]\n(which are also w.r.t. $w_1^q$ and in the setting of diagrams with finitely many points). However, contrary\nto previous works, if we were to chop off the input layer after network training, we would then have a\nmapping $S_{\\theta,\\nu}$ of persistence diagrams that is specifically tailored to the learning task on which the\nnetwork was trained.\n\nFigure 2: Height function filtration of a \u201cclean\u201d (left, green points) and a \u201cnoisy\u201d (right, blue points) shape\nalong direction d = (0,\u22121)\u22a4. 
This example demonstrates the insensitivity of homology towards noise, as the\nadded noise only (1) slightly shifts the dominant points (upper left corner) and (2) produces additional points\nclose to the diagonal, which have little impact on the Wasserstein distance and the output of our layer.\n\n5 Experiments\n\nTo demonstrate the versatility of the proposed approach, we present experiments with two totally\ndifferent types of data: (1) 2D shapes of objects, represented as binary images and (2) social network\ngraphs, given by their adjacency matrix. In both cases, the learning task is classi\ufb01cation. In each\nexperiment we ensured a balanced group size (per label) and used a 90/10 random training/test\nsplit; all reported results are averaged over \ufb01ve runs with \ufb01xed \u03bd = 0.1. In practice, points in input\ndiagrams were thresholded at 0.01 for computational reasons. Additionally, we conducted a reference\nexperiment on all datasets using simple vectorization (see Sec. 5.3) of the persistence diagrams in\ncombination with a linear SVM.\nImplementation. All experiments were implemented in PyTorch3, using DIPHA4 and Perseus [23].\nSource code is publicly-available at https://github.com/c-hofer/nips2017.\n\n5.1 Classi\ufb01cation of 2D object shapes\n\nWe apply persistent homology combined with our proposed input layer to two different datasets of\nbinary 2D object shapes: (1) the Animal dataset, introduced in [3] which consists of 20 different\nanimal classes, 100 samples each; (2) the MPEG-7 dataset which consists of 70 classes of different\nobject/animal contours, 20 samples each (see [21] for more details).\nFiltration. 
The requirements to use persistent homology on 2D shapes are twofold: First, we need\nto assign a simplicial complex to each shape; second, we need to appropriately \ufb01ltrate the complex.\nWhile, in principle, we could analyze contour features, such as curvature, and choose a sublevel set\n\ufb01ltration based on that, such a strategy requires substantial preprocessing of the discrete data (e.g.,\nsmoothing). Instead, we choose to work with the raw pixel data and leverage the persistent homology\ntransform, introduced by Turner et al. [29]. The \ufb01ltration in that case is based on sublevel sets of\nthe height function, computed from multiple directions (see Fig. 2). Practically, this means that we\ndirectly construct a simplicial complex from the binary image. We set K0 as the set of all pixels\nwhich are contained in the object. Then, a 1-simplex [p0, p1] is in the 1-skeleton K1 iff p0 and p1\nare 4\u2013neighbors on the pixel grid. To \ufb01ltrate the constructed complex, we denote by b the barycenter\nof the object and with r the radius of its bounding circle around b. Finally, we de\ufb01ne, for [p] \u2208 K0\nand d \u2208 S1, the \ufb01ltration function by f ([p]) = 1/r \u00b7 (cid:104)p \u2212 b, d(cid:105). Function values are lifted to K1 by\ntaking the maximum, cf. Sec. 2. Finally, let di be the 32 equidistantly distributed directions in S1,\nstarting from (1, 0)(cid:62). For each shape, we get a vector of persistence diagrams (Di)32\ni=1 where Di is\nthe 0-th diagram obtained by \ufb01ltration along di. As most objects do not differ in homology groups of\nhigher dimensions (> 0), we did not use the corresponding persistence diagrams.\nNetwork architecture. While the full network is listed in the supplementary material, the key\narchitectural choices are: 32 independent input branches, i.e., one for each \ufb01ltration direction. 
Further,\nthe i-th branch gets, as input, the vector of persistence diagrams from directions di\u22121, di and di+1.\nThis is a straightforward approach to capture dependencies among the filtration directions. We use\ncross-entropy loss to train the network for 400 epochs, using stochastic gradient descent (SGD) with\nmini-batches of size 128 and an initial learning rate of 0.1 (halved every 25-th epoch).\n\n3 https://github.com/pytorch/pytorch\n4 https://bitbucket.org/dipha/dipha\n\nMethod | MPEG-7 | Animal\n\u2021Skeleton paths | 86.7 | 67.9\n\u2021Class segment sets | 90.9 | 69.7\n\u2020ICS | 96.6 | 78.4\n\u2020BCF | 97.2 | 83.4\nOurs | 91.8 | 69.5\n\nFigure 3: Left: some examples from the MPEG-7 (bottom) and Animal (top) datasets. Right: Classification\nresults, compared to the two best (\u2020) and two worst (\u2021) results reported in [30].\n\nResults. Fig. 3 shows a selection of 2D object shapes from both datasets, together with the obtained\nclassification results. We list the two best (\u2020) and two worst (\u2021) results as reported in [30]. While,\non the one hand, using topological signatures is below the state-of-the-art, the proposed architecture\nis still better than other approaches that are specifically tailored to the problem. Most notably, our\napproach does not require any specific data preprocessing, whereas all other competitors listed in\nFig. 3 require, e.g., some sort of contour extraction. Furthermore, the proposed architecture readily\ngeneralizes to 3D with the only difference that in this case di \u2208 S2. Fig. 
4 (Right) shows an exemplary\nvisualization of the position of the learned structure elements for the Animal dataset.\n\n5.2 Classi\ufb01cation of social network graphs\n\nIn this experiment, we consider the problem of graph classi\ufb01cation, where vertices are unlabeled\nand edges are undirected. That is, a graph G is given by G = (V, E), where V denotes the set of\nvertices and E denotes the set of edges. We evaluate our approach on the challenging problem of\nsocial network classi\ufb01cation, using the two largest benchmark datasets from [31], i.e., reddit-5k\n(5 classes, 5k graphs) and reddit-12k (11 classes, \u224812k graphs). Each sample in these datasets\nrepresents a discussion graph and the classes indicate subreddits (e.g., worldnews, video, etc.).\nFiltration. The construction of a simplicial complex from G = (V, E) is straightforward: we set\nK0 = {[v] \u2208 V } and K1 = {[v0, v1] : {v0, v1} \u2208 E}. We choose a very simple \ufb01ltration based on\nthe vertex degree, i.e., the number of incident edges to a vertex v \u2208 V . Hence, for [v0] \u2208 K0 we get\nf ([v0]) = deg(v0)/ maxv\u2208V deg(v) and again lift f to K1 by taking the maximum. Note that chain\ngroups are trivial for dimension > 1, hence, all features in dimension 1 are essential.\nNetwork architecture. Our network has four input branches: two for each dimension (0 and 1) of\nthe homological features, split into essential and non-essential ones, see Sec. 2. We train the network\nfor 500 epochs using SGD and cross-entropy loss with an initial learning rate of 0.1 (reddit_5k), or\n0.4 (reddit_12k). The full network architecture is listed in the supplementary material.\nResults. Fig. 5 (right) compares our proposed strategy to state-of-the-art approaches from the\nliterature. 
In particular, we compare against (1) the graphlet kernel (GK) and deep graphlet kernel\n(DGK) results from [31], (2) the Patchy-SAN (PSCN) results from [24] and (3) a recently reported\ngraph-feature + random forest approach (RF) from [4]. As we can see, using topological signatures\nin our proposed setting considerably outperforms the current state-of-the-art on both datasets. This is\nan interesting observation, as PSCN [24], for instance, also relies on node degrees and an extension of\nthe convolution operation to graphs. Further, the results reveal that including essential features is key\nto these improvements.\n\n5.3 Vectorization of persistence diagrams\n\nHere, we briefly present a reference experiment we conducted following Bendich et al. [5]. The idea\nis to directly use the persistence diagrams as features via vectorization. For each point (b, d) in a\npersistence diagram D we calculate its persistence, i.e., d \u2212 b. We then sort the calculated persistences\nby magnitude from high to low and take the first N values. Hence, we get, for each persistence\ndiagram, a vector of dimension N (if |D \\ \u2206| < N, we pad with zeros). We used this technique\non all four data sets. As can be seen from the results in the table of Fig. 4 (Left; averaged over 10 cross-validation\nruns), vectorization performs poorly on MPEG-7 and Animal but can lead to competitive rates on\nreddit-5k and reddit-12k. Nevertheless, the obtained performance is considerably inferior to our\nproposed approach.\n\nDataset | N=5 | N=10 | N=20 | N=40 | N=80 | N=160 | Ours\nMPEG-7 | 81.8 | 82.3 | 79.7 | 74.5 | 68.2 | 64.4 | 91.8\nAnimal | 48.8 | 50.0 | 46.2 | 42.4 | 39.3 | 36.0 | 69.5\nreddit-5k | 37.1 | 38.2 | 39.7 | 42.1 | 43.8 | 45.2 | 54.5\nreddit-12k | 24.2 | 24.6 | 27.9 | 29.8 | 31.5 | 31.6 | 44.5\n\nFigure 4: Left: Classification accuracies for a linear SVM trained on vectorized (in RN ) persistence diagrams\n(see Sec. 5.3). 
Right: Exemplary visualization of the learned structure elements (in 0-th dimension) for the\nAnimal dataset and filtration direction d = (\u22121, 0)\u22a4. Centers of the learned elements are marked in blue.\n\nMethod | reddit-5k | reddit-12k\nGK [31] | 41.0 | 31.8\nDGK [31] | 41.3 | 32.2\nPSCN [24] | 49.1 | 41.3\nRF [4] | 50.9 | 42.7\nOurs (w/o essential) | 49.1 | 38.5\nOurs (w/ essential) | 54.5 | 44.5\n\nFigure 5: Left: Illustration of graph filtration by vertex degree, i.e., f \u2261 deg (for different choices of ai, see\nSec. 2). Right: Classification results as reported in [31] for GK and DGK, Patchy-SAN (PSCN) as reported in\n[24] and feature-based random-forest (RF) classification from [4].\n\nFinally, we remark that in both experiments, tests with the kernel of [27] turned out to be\ncomputationally impractical, (1) on shape data due to the need to evaluate the kernel for all filtration directions\nand (2) on graphs due to the large number of samples and the number of points in each diagram.\n\n6 Discussion\n\nWe have presented, to the best of our knowledge, the first approach towards learning task-optimal\nstable representations of topological signatures, in our case persistence diagrams. Our particular\nrealization of this idea, i.e., as an input layer to deep neural networks, not only enables us to learn with\ntopological signatures, but also to use them as additional (and potentially complementary) inputs to\nexisting deep architectures. From a theoretical point of view, we remark that the presented structure\nelements are not restricted to exponential functions, so long as the conditions of Lemma 1 are met.\nOne drawback of the proposed approach, however, is the artificial bending of the persistence axis (see\nFig. 1) by a logarithmic transformation; in fact, other strategies might be possible and better suited\nin certain situations. A detailed investigation of this issue is left for future work. 
From a practical\nperspective, it is also worth pointing out that, in principle, the proposed layer could be used to handle\nany kind of input that comes in the form of multisets (of R^n), whereas previous works only allow\nhandling sets of fixed size (see Sec. 1). In summary, we argue that our experiments show strong\nevidence that topological features of data can be beneficial in many learning tasks, not necessarily to\nreplace existing inputs, but rather as a complementary source of discriminative information.\n\nAcknowledgements. This work was partially funded by the Austrian Science Fund FWF (KLI\nproject 00012) and the Spinal Cord Injury and Tissue Regeneration Center Salzburg (SCI-TReCS),\nParacelsus Medical University, Salzburg.\n\nA Technical results\n\nLemma 3. Let $\\alpha \\in \\mathbb{R}^+$ and $\\beta \\in \\mathbb{R}$. We have\n\ni) $\\lim_{x \\to 0} \\frac{\\ln(x)}{x} \\cdot e^{-\\alpha(\\ln(x)+\\beta)^2} = 0$\n\nii) $\\lim_{x \\to 0} \\frac{1}{x} \\cdot e^{-\\alpha(\\ln(x)+\\beta)^2} = 0$.\n\nProof. We omit the proof for brevity (see supplementary material for details), but remark that only i)\nneeds to be shown as ii) follows immediately.\n\nB Proofs\n\nProof of Lemma 1. Let $\\varphi$ be a bijection between $\\mathcal{D}$ and $\\mathcal{E}$ which realizes $w_1^q(\\mathcal{D}, \\mathcal{E})$ and let $\\mathcal{D}_0 = \\mathcal{D} \\setminus \\Delta$, $\\mathcal{E}_0 = \\mathcal{E} \\setminus \\Delta$. To show the result of Eq. (5), we consider the following decomposition:\n\n$$\\begin{aligned} \\mathcal{D} &= \\varphi^{-1}(\\mathcal{E}_0) \\cup \\varphi^{-1}(\\Delta) \\\\ &= \\underbrace{(\\varphi^{-1}(\\mathcal{E}_0) \\setminus \\Delta)}_{A} \\cup \\underbrace{(\\varphi^{-1}(\\mathcal{E}_0) \\cap \\Delta)}_{B} \\cup \\underbrace{(\\varphi^{-1}(\\Delta) \\setminus \\Delta)}_{C} \\cup \\underbrace{(\\varphi^{-1}(\\Delta) \\cap \\Delta)}_{D} \\end{aligned} \\tag{6}$$\n\nExcept for the term $D$, all sets are finite. In fact, $\\varphi$ realizes the Wasserstein distance $w_1^q$, which implies $\\varphi|_D = \\mathrm{id}$. Therefore, $s(x) = s(\\varphi(x)) = 0$ for $x \\in D$ since $D \\subset \\Delta$. 
Consequently, we can ignore $D$\nin the summation and it suffices to consider $E = A \\cup B \\cup C$. It follows that\n\n$$\\begin{aligned} \\left| \\sum_{x \\in \\mathcal{D}} s(x) - \\sum_{y \\in \\mathcal{E}} s(y) \\right| &= \\left| \\sum_{x \\in \\mathcal{D}} s(x) - \\sum_{x \\in \\mathcal{D}} s(\\varphi(x)) \\right| = \\left| \\sum_{x \\in E} s(x) - \\sum_{x \\in E} s(\\varphi(x)) \\right| \\\\ &= \\left| \\sum_{x \\in E} \\big( s(x) - s(\\varphi(x)) \\big) \\right| \\leq \\sum_{x \\in E} |s(x) - s(\\varphi(x))| \\\\ &\\leq K_s \\cdot \\sum_{x \\in E} \\|x - \\varphi(x)\\|_q = K_s \\cdot \\sum_{x \\in \\mathcal{D}} \\|x - \\varphi(x)\\|_q = K_s \\cdot w_1^q(\\mathcal{D}, \\mathcal{E}) . \\end{aligned}$$\n\nProof of Lemma 2. Since $s_{\\mu,\\sigma,\\nu}$ is defined differently for $x_1 \\in [\\nu, \\infty)$ and $x_1 \\in (0, \\nu)$, we need to\ndistinguish these two cases. In the following, $x_0 \\in \\mathbb{R}$.\n\n(1) $x_1 \\in [\\nu, \\infty)$: The partial derivative w.r.t. $x_i$ is given as\n\n$$\\begin{aligned} \\left( \\frac{\\partial}{\\partial x_i} s_{\\mu,\\sigma,\\nu} \\right)(x_0, x_1) &= C \\cdot \\left( \\frac{\\partial}{\\partial x_i} e^{-\\sigma_i^2 (x_i - \\mu_i)^2} \\right)(x_0, x_1) \\\\ &= C \\cdot e^{-\\sigma_i^2 (x_i - \\mu_i)^2} \\cdot (-2\\sigma_i^2)(x_i - \\mu_i) , \\end{aligned} \\tag{7}$$\n\nwhere $C$ is just the part of $\\exp(\\cdot)$ which is not dependent on $x_i$. For all cases, i.e., $x_0 \\to \\infty$, $x_0 \\to -\\infty$ and $x_1 \\to \\infty$, it holds that Eq. (7) $\\to 0$.\n\n(2) $x_1 \\in (0, \\nu)$: The partial derivative w.r.t. $x_0$ is similar to Eq. 
(7) with the same asymptotic\nbehaviour for $x_0 \\to \\infty$ and $x_0 \\to -\\infty$. However, for the partial derivative w.r.t. $x_1$ we get\n\n$$\\begin{aligned} \\left( \\frac{\\partial}{\\partial x_1} s_{\\mu,\\sigma,\\nu} \\right)(x_0, x_1) &= C \\cdot \\left( \\frac{\\partial}{\\partial x_1} e^{-\\sigma_1^2 \\left( \\ln\\left(\\frac{x_1}{\\nu}\\right) + \\nu - \\mu_1 \\right)^2} \\right)(x_0, x_1) \\\\ &= C \\cdot e^{(\\ldots)} \\cdot (-2\\sigma_1^2) \\cdot \\left( \\ln\\left(\\frac{x_1}{\\nu}\\right) + \\nu - \\mu_1 \\right) \\cdot \\frac{\\nu}{x_1} \\\\ &= C' \\cdot \\Big( \\underbrace{e^{(\\ldots)} \\cdot \\ln\\left(\\frac{x_1}{\\nu}\\right) \\cdot \\frac{1}{x_1}}_{(a)} + \\underbrace{(\\nu - \\mu_1) \\cdot e^{(\\ldots)} \\cdot \\frac{1}{x_1}}_{(b)} \\Big) . \\end{aligned} \\tag{8}$$\n\nAs $x_1 \\to 0$, we can invoke Lemma 3 i) to handle (a) and Lemma 3 ii) to handle (b); consequently,\nEq. (8) $\\to 0$. As the partial derivatives w.r.t. $x_i$ are continuous and their limits are 0 on $\\mathbb{R}$, $\\mathbb{R}^+$, resp.,\nwe conclude that they are absolutely bounded.\n\nReferences\n\n[1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier. Persistence images: A stable vector representation of persistent homology. JMLR, 18(8):1\u201335, 2017. 2, 5\n\n[2] A. Adcock, E. Carlsson, and G. Carlsson. The ring of algebraic functions on persistence bar codes. CoRR, 2013. https://arxiv.org/abs/1304.0530. 2\n\n[3] X. Bai, W. Liu, and Z. Tu. Integrating contour and skeleton for shape classification. In ICCV Workshops, 2009. 6\n\n[4] I. Barnett, N. Malik, M.L. Kuijjer, P.J. Mucha, and J.-P. Onnela. Feature-based classification of networks. CoRR, 2016. https://arxiv.org/abs/1610.05868. 7, 8\n\n[5] P. Bendich, J.S. Marron, E. Miller, A. Pieloch, and S. Skwerer. Persistent homology analysis of brain artery trees. Ann. Appl. Stat., 10(2), 2016. 7\n\n[6] P. Bubenik. 
Statistical topological data analysis using persistence landscapes. JMLR, 16(1):77\u2013102, 2015. 2\n\n[7] G. Carlsson. Topology and data. Bull. Amer. Math. Soc., 46:255\u2013308, 2009. 1\n\n[8] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian. On the local behavior of spaces of natural images. IJCV, 76:1\u201312, 2008. 1\n\n[9] F. Chazal, D. Cohen-Steiner, L. J. Guibas, F. M\u00e9moli, and S. Y. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. Comput. Graph. Forum, 28(5):1393\u20131403, 2009. 4\n\n[10] F. Chazal, B.T. Fasy, F. Lecci, A. Rinaldo, and L. Wasserman. Stochastic convergence of persistence landscapes and silhouettes. JoCG, 6(2):140\u2013161, 2014. 2\n\n[11] F. Chazal, L.J. Guibas, S.Y. Oudot, and P. Skraba. Persistence-based clustering in Riemannian manifolds. J. ACM, 60(6):41\u201379, 2013. 1\n\n[12] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. Discrete Comput. Geom., 37(1):103\u2013120, 2007. 4\n\n[13] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko. Lipschitz functions have L_p-stable persistence. Found. Comput. Math., 10(2):127\u2013139, 2010. 4\n\n[14] H. Edelsbrunner and J. L. Harer. Computational Topology: An Introduction. American Mathematical Society, 2010. 1, 2, 3\n\n[15] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discrete Comput. Geom., 28(4):511\u2013533, 2002. 1\n\n[16] A. Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002. 2\n\n[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 2\n\n[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 2\n\n[19] G. Kusano, K. Fukumizu, and Y. Hiraoka. Persistence weighted Gaussian kernel for topological data analysis. In ICML, 2016. 2, 4\n\n[20] R. Kwitt, S. Huber, M. Niethammer, W. 
Lin, and U. Bauer. Statistical topological data analysis - a kernel perspective. In NIPS, 2015. 2\n\n[21] L. Latecki, R. Lak\u00e4mper, and T. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In CVPR, 2000. 6\n\n[22] C. Li, M. Ovsjanikov, and F. Chazal. Persistence-based structural recognition. In CVPR, 2014. 1\n\n[23] K. Mischaikow and V. Nanda. Morse theory for filtrations and efficient computation of persistent homology. Discrete Comput. Geom., 50(2):330\u2013353, 2013. 6\n\n[24] M. Niepert, M. Ahmed, and K. Kutzkov. Learning convolutional neural networks for graphs. In ICML, 2016. 7, 8\n\n[25] C.R. Qi, H. Su, K. Mo, and L.J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In CVPR, 2017. 2\n\n[26] S. Ravanbakhsh, S. Schneider, and B. P\u00f3czos. Deep learning with sets and point clouds. In ICLR, 2017. 2\n\n[27] R. Reininghaus, U. Bauer, S. Huber, and R. Kwitt. A stable multi-scale kernel for topological machine learning. In CVPR, 2015. 1, 2, 4, 5, 8\n\n[28] G. Singh, F. M\u00e9moli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D.L. Ringach. Topological analysis of population activity in visual cortex. J. Vis., 8(8), 2008. 1\n\n[29] K. Turner, S. Mukherjee, and D. M. Boyer. Persistent homology transform for modeling shapes and surfaces. Inf. Inference, 3(4):310\u2013344, 2014. 1, 6\n\n[30] X. Wang, B. Feng, X. Bai, W. Liu, and L.J. Latecki. Bag of contour fragments for robust shape classification. Pattern Recognit., 47(6):2116\u20132125, 2014. 7\n\n[31] P. Yanardag and S.V.N. Vishwanathan. Deep graph kernels. In KDD, 2015. 
7, 8", "award": [], "sourceid": 1027, "authors": [{"given_name": "Christoph", "family_name": "Hofer", "institution": "University of Salzburg"}, {"given_name": "Roland", "family_name": "Kwitt", "institution": "University of Salzburg"}, {"given_name": "Marc", "family_name": "Niethammer", "institution": "UNC Chapel Hill"}, {"given_name": "Andreas", "family_name": "Uhl", "institution": "University of Salzburg"}]}