{"title": "Stability of Graph Scattering Transforms", "book": "Advances in Neural Information Processing Systems", "page_first": 8038, "page_last": 8048, "abstract": "Scattering transforms are non-trainable deep convolutional architectures that exploit the multi-scale resolution of a wavelet filter bank to obtain an appropriate representation of data. More importantly, they are proven invariant to translations, and stable to perturbations that are close to translations. This stability property dons the scattering transform with a robustness to small changes in the metric domain of the data. When considering network data, regular convolutions do not hold since the data domain presents an irregular structure given by the network topology. In this work, we extend scattering transforms to network data by using multi-resolution graph wavelets, whose computation can be obtained by means of graph convolutions. Furthermore, we prove that the resulting graph scattering transforms are stable to metric perturbations of the underlying network. This renders graph scattering transforms robust to changes on the network topology, making it particularly useful for cases of transfer learning, topology estimation or time-varying graphs.", "full_text": "Stability of Graph Scattering Transforms\n\nDept. of Electrical and Systems Eng.\n\nCourant Institute of Math. Sci.\n\nJoan Bruna\n\nNew York University\nNew York, NY 10012\nbruna@cims.nyu.edu\n\nFernando Gama\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19104\nfgama@seas.upenn.edu\n\nAlejandro Ribeiro\n\nDept. of Electrical and Systems Eng.\n\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19104\n\naribeiro@seas.upenn.edu\n\nAbstract\n\nScattering transforms are non-trainable deep convolutional architectures that ex-\nploit the multi-scale resolution of a wavelet \ufb01lter bank to obtain an appropriate\nrepresentation of data. 
More importantly, they are proven invariant to translations, and stable to perturbations that are close to translations. This stability property provides the scattering transform with a robustness to small changes in the metric domain of the data. When considering network data, regular convolutions do not hold since the data domain presents an irregular structure given by the network topology. In this work, we extend scattering transforms to network data by using multiresolution graph wavelets, whose computation can be obtained by means of graph convolutions. Furthermore, we prove that the resulting graph scattering transforms are stable to metric perturbations of the underlying network. This renders graph scattering transforms robust to changes in the network topology, making them particularly useful for cases of transfer learning, topology estimation or time-varying graphs.

1 Introduction

Linear information processing architectures have been the preferred tool for extracting useful information from data due to their robustness and provable performance [1-6]. With the desire to model increasingly complex mappings between data and useful information, linear approaches started to fall short in terms of performance, giving rise to a myriad of nonlinear alternatives [2, Chap. 8], [6, Part 4]. Of these, arguably the most successful have been convolutional neural networks (CNNs) [7]. CNNs consist of a cascade of layers, each of which computes a convolution with a bank of filters followed by a pointwise nonlinearity, and act as a parameterization of the nonlinear mapping between the input data and the desired useful information [8].

The inclusion of nonlinearities coupled with the use of trained coefficients has effectively increased performance, but it has also obscured the limits and guarantees of CNNs [9].
In the theoretical realm, [10, 11] opted to control one of the sources of uncertainty by fixing the bank of filters to be a set of pre-defined, multiresolution wavelets. Then, [10] proved that under admissibility conditions on the wavelets, the resulting non-trainable CNN (called a scattering transform) satisfies energy conservation, as well as stability to domain deformations that are close to translations. In essence, the stability properties of non-trainable scattering transforms constitute one of the main theoretical results explaining the success of CNNs.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Data stemming from networks, however, does not exhibit a regular inherent structure that can be effectively exploited by convolutions. Data elements are, instead, related by arbitrary pairwise relationships described by an underlying graph support. Graph neural networks (GNNs) have emerged as successful architectures that exploit this graph structure [12-15]. GNNs, mimicking the overall architecture of CNNs, also consist of a cascade of layers, but constrain the linear transform in each layer to be a graph convolution with a bank of graph filters [16-20]. Graph convolutions are, in analogy with traditional (regular) convolutions, a weighted sum of shifted versions of the input signal. The filter taps (weights) of the bank of graph filters are also obtained by minimizing a cost function over the training set.
The mathematical challenges arising from the use of trainable filters and pointwise nonlinearities have prevented a rapid development of the theory of GNNs as well. Moreover, the particularities of the underlying irregular structure supporting network data raise challenges of their own.

Following the roadmap of the Euclidean, regular case, in this paper we pursue the investigation of the benefits of GNN architectures through the lens of their non-trainable counterparts, where filters are designed from multiresolution wavelet families. Several papers [21-23] have made initial progress in defining graph scattering representations and studying their stability properties with respect to metric deformations of the domain. However, most of these results offer bounds that depend on the graph topology and do not hold for certain graphs or when graphs are very large. Additionally, these works do not recover the Euclidean scattering stability result on Euclidean grids. The main theoretical contribution of this work is to establish stability to relative metric deformations for a wide class of graph wavelet families, yielding a bound that is independent of the graph topology (it depends only on the size of the deformation and the representation architecture).

The rest of the paper is structured as follows. In section 2 we discuss related work. In section 3 we define the scattering transform architecture, use the graph signal processing framework to describe network data (Sec. 3.1), and define graph scattering transforms (GSTs) using graph wavelets (Sec. 3.2). We then prove our main theoretical claims in section 4: namely, that GSTs are permutation invariant (Prop. 1), and that they are stable (Theorem 1) under a relative perturbation model (Sec. 4.1). Finally, we show through numerical experiments in section 5 that the GST representation is not only stable, but also captures rich enough information.
Conclusions are drawn in section 6.

2 Related Work

The particular property of stability has been investigated, in analogy to scattering transforms, for the case of non-trainable graph wavelet filter banks [21, 22]. More specifically, [21] studies the stability of graph scattering transforms to permutations, as well as to perturbations of the eigenvalues and eigenvectors of the underlying graph support. Furthermore, [21] derives results on energy conservation. The bounds obtained on approximate permutation invariance grow with the size of the graph, while the bounds on stability to graph perturbations are applicable only for changes in edge weights that become smaller with increasing graph size (i.e., larger graphs admit smaller edge weight changes). Alternatively, in [22], graph scattering transforms using diffusion wavelets [24] are considered. Perturbations are defined in terms of changes in the underlying graph support, and measured using diffusion distances [25, 26]. The bound obtained on the output for different underlying graph supports depends on the spectral gap of the filter, making this bound quite loose in some cases [22]. We note that [27] isolates the bound on the powers of the graph shift operator [22, eq. (23)] and generalizes it for arbitrary graph filters. As such, the resulting bound also depends on the spectral gap. Finally, we draw attention to the work in [28]. This work defines geometric scattering transforms, which are an extension of diffusion scattering [22], by using a lazy random walk adjacency matrix as the matrix representation and considering higher-order moments for the low-pass operator.
Furthermore, they carry out an exhaustive experimental comparison between geometric scattering transforms and a myriad of graph-based machine learning techniques.

3 Graph scattering transforms

A scattering transform network [10, 11] is a deep convolutional architecture comprised of three basic elements: (i) a bank of multiresolution wavelets {h_j}_{j=1}^J, (ii) a pointwise nonlinearity ρ (absolute value), and (iii) a low-pass averaging operator U. These elements are combined sequentially to produce a representation Φ(x) of the data x. More specifically, as illustrated in Fig. 1, each of the J wavelets is applied to each of the nodes of the previous layer, generating J new nodes to which the nonlinearity is applied. The output is harvested at each node by computing a low-pass average through the operator U. For a scattering transform with L layers, the number of coefficients of the representation Φ(x) is Σ_{ℓ=0}^{L−1} J^ℓ = (J^L − 1)/(J − 1), independent of the size of the input data.

Each coefficient of the scattering transform is determined by the sequence of wavelet indices (resolution scales) traversed to compute it. We call this sequence a path. Let J(ℓ) = {1, . . . , J}^ℓ be a shorthand for the space of all possible ℓ-tuples with J elements, defined for all ℓ > 0 and where we set J(0) = {0}. Then, we can define the path p_j(ℓ) : N → J(ℓ) as the mapping between j ∈ N and the specific sequence p_j(ℓ) = (j_1, . . . , j_ℓ) of length ℓ comprised of a combination of indices from 1 to J (tuples), with p_1(0) = 0. Sequences p_j(ℓ) and p_i(ℓ) are distinct for j ≠ i, so that {p_j(ℓ)}_{j=1,...,J^ℓ} ≡ J(ℓ) is the space of all possible tuples. We denote by J(L) = {p_j(ℓ) ∈ J(ℓ), ∀ j ∈ {1, . . . , J^ℓ}, ∀ ℓ ∈ {0, . . . , L − 1}} the set of all sequences for all values of ℓ, see Fig. 1.

With this notation in place, the scattering transform Φ(x) of the data x is the collection of scattering coefficients φ_{p_j(ℓ)}(x)

Φ(x) = {φ_{p_j(ℓ)}(x)}_{J(L)} := {φ_{p_j(ℓ)}(x)}_{p_j(ℓ) ∈ J(ℓ), ℓ = 0, . . . , L−1}.   (1)

For a given sequence p_j(ℓ) = (j_1, . . . , j_ℓ) ∈ J(ℓ), the scattering coefficient φ_{p_j(ℓ)} is computed as

φ_{p_j(ℓ)}(x) = U[(ρ h_j)_{p_j(ℓ)} ∗ x] = U x_{p_j(ℓ)}   (2)

where the notation [(ρ h_j)_{p_j(ℓ)} ∗ x] := [(ρ h_j)_{j ∈ p_j(ℓ)} ∗ x] = ρ h_{j_ℓ} ∗ · · · ∗ ρ h_{j_1} ∗ x is a shorthand for the repeated application of pointwise nonlinearities ρ and wavelets h_j following the scale indices determined by the path p_j(ℓ). The operator U outputs a scalar, computed by means of a summarizing low-pass linear operator, typically an average or a sum. Note that we set φ_{p_1(0)} = φ_0 = U x. The energy of the scattering transform is given by the energy in its coefficients

‖Φ(x)‖² = Σ_{J(L)} |φ_{p_j(ℓ)}(x)|² = Σ_{ℓ=0}^{L−1} Σ_{j=1}^{J^ℓ} |φ_{p_j(ℓ)}(x)|².   (3)

3.1 Network data

The scattering transform relies heavily on the use of the convolution to filter the data through the multiresolution wavelet bank. The convolution operation, in turn, depends on the data exhibiting a regular structure, such that contiguous data elements represent elements that are spatially or temporally related.
This is not the case for network data, whereby data elements are related by arbitrary pairwise relationships determined by the underlying network topology.

To describe network data, we denote by G = (V, E, W) the underlying graph support, with V the set of N nodes, E ⊆ V × V the set of edges, and W : E → R the edge weighing function. The data x ∈ R^N is modeled as a graph signal where each element [x]_i = x_i is the value of the data at node i ∈ V¹ [15]. To operationally relate the data x with the underlying graph support G, we define a graph shift operator (GSO) S ∈ R^{N×N}, which is a matrix representation of the graph that respects its sparsity, i.e., [S]_{ij} = s_{ij} can be nonzero only if (j, i) ∈ E or if i = j [15]. Examples of GSOs commonly used in the literature include the adjacency matrix [12, 13], the Laplacian matrix [14], and their normalized counterparts [18, 22].

The operation Sx is, due to the sparsity constraint on S, a local, linear operation, by which each node i in the network updates its value by means of a weighted linear combination of the signal values at

¹For notational simplicity, we consider that each node in the graph holds scalar data, but the extension to vector data is straightforward, see [18, 20] for details.

[Figure 1: tree diagram of the graph scattering transform, with intermediate signals x_{p_j(ℓ)} and output coefficients φ_{p_j(ℓ)}(x) across layers ℓ = 0, 1, 2.]

Figure 1. Graph scattering transform. Illustration for J = 4 scales and L = 3 layers. At layer ℓ = 0 we have a single coefficient φ_{(0)}(x) since J(0) = {0}, which is obtained by applying the low-pass operator U to the input data x directly. In the next layer ℓ = 1 we have J¹ = 4 coefficients.
We generate 4 nodes by applying each of the 4 wavelets h_j to the input data followed by a pointwise nonlinearity, yielding x_{p_j(1)} where J(1) = {1, 2, 3, 4}. Then, we obtain the output coefficients φ_{p_j(ℓ)}(x) by means of the low-pass operator U. For the following layer ℓ = 2 we have J² = 16 coefficients. For each of the J previous nodes, we apply each of the wavelets, yielding J new nodes for each one of them, followed by the nonlinearity ρ. Then, we obtain the new 16 coefficients by applying the low-pass operator U.

neighboring nodes j ∈ N_i

[Sx]_i = Σ_{j ∈ N_i} s_{ij} x_j.   (4)

Note that, while Sx computes a summary of the information in the one-hop neighborhood of each node, repeated application of S computes summaries from farther away neighborhoods, i.e., S^k x = S(S^{k−1} x) computes a summary from the k-hop neighborhood. This allows for the definition of graph convolutions, in analogy with regular convolutions. More precisely, since regular convolutions are linear combinations of data that is spatially or temporally nearby, graph convolutions are defined as a linear combination of data located at consecutive neighborhoods

h ∗_S x = Σ_{k=0}^{K−1} h_k S^k x = H(S) x   (5)

where h = {h_0, . . . , h_{K−1}} is the set of K filter coefficients, and where we use ∗_S to denote a graph convolution over the GSO S [29]. We note that the output of the graph convolution is another graph signal defined over the same graph G as the input x.

The graph convolution (5) also satisfies the convolution theorem [30, Sec. 2.9.6], which states that convolution implies multiplication in the frequency domain.
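The graph convolution (5) is just a polynomial in the GSO, so it can be computed with K − 1 repeated applications of S, i.e., K − 1 rounds of local exchanges. A minimal sketch (the 4-node path graph and the filter taps are illustrative choices, not taken from the paper):

```python
import numpy as np

def graph_convolution(h, S, x):
    """Graph convolution h *_S x = sum_k h_k S^k x  [cf. (5)]."""
    y = np.zeros(len(x))
    Skx = np.asarray(x, dtype=float)   # S^0 x = x
    for hk in h:
        y = y + hk * Skx
        Skx = S @ Skx                  # S^{k+1} x: one more hop of local exchanges
    return y

# Illustrative 4-node path graph with the adjacency matrix as GSO
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])     # impulse at node 1
y = graph_convolution([1.0, 0.5, 0.25], S, x)   # -> [1.25, 0.5, 0.25, 0.0]
```

The output spreads exactly K − 1 = 2 hops away from the impulse, mirroring the locality of regular convolutions.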
We define the graph frequency domain in terms of the eigendecomposition of the GSO, which we assume to be normal, S = VΛV^H, where V is the matrix of eigenvectors, which determines the frequency basis signals, and Λ is the diagonal matrix of eigenvalues, which determines the frequency coefficients [13]. The graph Fourier transform (GFT) of a graph signal is defined as the projection of the graph signal onto the space of frequency basis signals, x̃ = V^H x. So, if we compute the GFT of the output of the graph convolution, we get

ỹ = V^H y = V^H (h ∗_S x) = V^H Σ_{k=0}^{K−1} h_k S^k x = Σ_{k=0}^{K−1} h_k Λ^k x̃ = diag(h̃) x̃ = h̃ ∘ x̃   (6)

where ∘ denotes the elementwise (Hadamard) product, yielding a multiplication of the GFT of the filter taps with the GFT of the signal. We note that the GFT h̃ of the filter coefficients h is given by a polynomial on the eigenvalues of the graph

[h̃]_i = h̃_i = h(λ_i) with h(λ) = Σ_{k=0}^{K−1} h_k λ^k.   (7)

It is very interesting to remark that the GFT of the filter is characterized by the same function h(λ), which depends on the filter coefficients, irrespective of the graph. The specific value of the frequency coefficients of the filter (and its impact on the output), however, is obtained by instantiating h(λ) on the eigenvalues of the given graph. But h(λ) still characterizes the GFT of the filter taps for all graphs.

3.2 Graph wavelets and graph scattering transforms

Graph wavelets are typically defined in the graph frequency domain, by specifying a specific form for the function h(λ) [31, 32].
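Before constructing wavelets, the convolution theorem (6) and the polynomial characterization (7) can be sanity-checked numerically; the small symmetric GSO, taps, and signal below are arbitrary illustrative values:

```python
import numpy as np

# Arbitrary symmetric (hence normal) GSO: a 3-node star graph
S = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
lam, V = np.linalg.eigh(S)              # S = V diag(lam) V^T

h = np.array([0.5, 1.0, -0.25])         # taps h_0, h_1, h_2
x = np.array([1.0, 2.0, -1.0])

# Vertex domain: y = sum_k h_k S^k x  [cf. (5)]
y = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))

# Frequency domain: h_tilde_i = h(lam_i)  [cf. (7)] and y_tilde = h_tilde o x_tilde  [cf. (6)]
h_tilde = np.polyval(h[::-1], lam)      # evaluates h(lam) = sum_k h_k lam^k
x_tilde = V.T @ x                       # GFT (V is real orthogonal here)
assert np.allclose(V.T @ y, h_tilde * x_tilde)
```

The same array `h` characterizes the filter on any graph; only `lam`, where `h(λ)` is instantiated, changes with the GSO.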
For instance, [31] proposes to choose a mother wavelet (wavelet generating kernel) h(λ) from the regular wavelet literature and then construct all the remaining wavelet scales by rescaling the continuous parameter λ before sampling it at the eigenvalues corresponding to the specific graph; see [31, eq. (65)] for a concrete example of a graph wavelet. This same construction method is further developed in [32] to obtain graph wavelets that are adapted to the spectrum (i.e., that localize the wavelets around the actual eigenvalues of the given graph, instead of just sampling rescaled versions of the wavelets). Concrete examples of graph wavelets are given in [32, Sec. IV-A].

Once the multiresolution wavelet filter bank {h_j(λ)}_{j=1}^J is defined, we proceed to compute the output by filtering each graph signal with the corresponding wavelet on the given graph. More precisely, consider S = VΛV^H and define h̃_j = [h_j(λ_1), . . . , h_j(λ_N)]^T by evaluating h_j(λ) on each of the N eigenvalues of S. Then, we obtain [cf. (6)]

y_j = V ỹ_j = V diag(h̃_j) x̃ = V diag(h̃_j) V^H x = H_j(S) x   (8)

where the output y_j for each scale is computed as a linear operation H_j(S) on the input data x.

An important property of wavelets in general, and graph wavelets in particular, is that they conform a frame [32]. This controls the spread of energy when computing the multiresolution output. For 0 < A ≤ B < ∞, a multiresolution wavelet bank {h_j}_{j=1}^J conforms a frame if

A² ‖x‖² ≤ Σ_{j=1}^J ‖H_j(S) x‖² ≤ B² ‖x‖².   (9)

For wavelets constructed following the above method, it is proven that they always conform a frame [31, Theorem 5.6].
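Since H_j(S) = V diag(h̃_j) V^H by (8), the energy in (9) decouples across graph frequencies, and the tightest frame constants on a given graph are A² = min_i Σ_j |h_j(λ_i)|² and B² = max_i Σ_j |h_j(λ_i)|². A sketch checking (9) under this observation — the two kernels are illustrative placeholders, not an admissible wavelet design from [31, 32]:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((5, 5))
S = np.triu(W, 1) + np.triu(W, 1).T        # arbitrary symmetric GSO
lam, V = np.linalg.eigh(S)

# Illustrative spectral kernels h_j(lam) (placeholders, not a proper wavelet bank)
kernels = [lambda l: l * np.exp(-l**2),
           lambda l: (l / 2) * np.exp(-(l / 2)**2)]

g = sum(np.abs(h(lam))**2 for h in kernels)   # sum_j |h_j(lam_i)|^2 per frequency
A2, B2 = g.min(), g.max()                     # tightest frame constants A^2, B^2

x = rng.standard_normal(5)
energy = sum(np.linalg.norm(V @ np.diag(h(lam)) @ V.T @ x)**2 for h in kernels)
assert A2 * (x @ x) - 1e-9 <= energy <= B2 * (x @ x) + 1e-9   # the bounds in (9)
```

A tight design in the sense of [32] would make `g` constant across frequencies, so that A² = B².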
In particular, the work in [32] designs graph wavelets that are tight, which means that A = B in (9).

We note that every analytic function h(λ) can be computed in terms of a graph convolution (5). More precisely, an analytic function can be written in terms of a power series, but since graphs are finite, in virtue of the Cayley-Hamilton theorem [33, Theorem 2.4.2], this power series can be written as a polynomial of degree at most N − 1, i.e., by setting K = N in (5). Moreover, [31, Sec. 6] provides a method for fast computation of the output of graph wavelets, by approximation with a polynomial of order K ≪ N.

Finally, we define a graph scattering transform (GST) as an architecture of the form (1)-(2), but where we replace regular convolutions by graph convolutions (5) with a bank of analytic graph wavelets {h_j}_{j=1}^J that conform a frame (9).

4 Stability to perturbations

Regular scattering transforms have been proven invariant to translations and stable to perturbations (or deformations) that are close to translations. That is, the difference between the scattering transform of the original data and that of the perturbed data is proportional to the size of the perturbation. In the case of network data, we consider perturbations to the underlying graph support. More specifically, we consider an N-node graph G with a GSO S and a perturbed N-node graph Ĝ with a GSO Ŝ. The objective, then, is to prove that the GST is a stable operation under such perturbations, namely that

‖Φ(S, x) − Φ(Ŝ, x)‖ ≲ d(S, Ŝ)   (10)

for some distance d(S, Ŝ) measuring the size of the perturbation. Perturbations of the underlying graph support are particularly useful in cases when the graph is unknown and needs to be estimated [34], or when the graph changes with time [35].
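To make the object Φ(S, x) in (10) concrete, the following is a minimal GST forward pass with spectrally defined wavelets, ρ = |·|, and U taken as the plain average over nodes; the kernels and graph are placeholder choices, not the paper's wavelet banks:

```python
import numpy as np

def gst(S, x, kernels, L):
    """Sketch of a GST forward pass: wavelets H_j(S) = V diag(h_j(lam)) V^H
    [cf. (8)], pointwise nonlinearity rho = |.|, and U = (1/N) 1^T."""
    lam, V = np.linalg.eigh(S)                     # assumes a symmetric GSO
    H = [V @ np.diag(h(lam)) @ V.T for h in kernels]
    layer, coeffs = [np.asarray(x, dtype=float)], []
    for _ in range(L):
        coeffs += [z.mean() for z in layer]        # harvest U z at every tree node
        layer = [np.abs(Hj @ z) for z in layer for Hj in H]   # grow J branches
    return np.array(coeffs)

# Placeholder kernels and graph for shape-checking only
kernels = [lambda l: l * np.exp(-l**2),
           lambda l: (l / 2) * np.exp(-(l / 2)**2)]
S = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
phi = gst(S, np.array([1.0, -1.0, 2.0]), kernels, L=3)
assert phi.size == (2**3 - 1) // (2 - 1)           # (J^L - 1)/(J - 1) coefficients
```

Stability, formalized below, asks how `phi` changes when `S` is replaced by a perturbed `S_hat` while `x` is held fixed.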
Note that, since the wavelet functions h_j(λ) are fixed by design, the analysis centers around how changes in the underlying graph support affect the eigenvalues that instantiate the GFT of the wavelets, and how the function h_j(λ) changes its output when instantiated on different eigenvalues.

First, we consider perturbations that arise from permutations, which amount to node reorderings. Define the set of permutation matrices as

P = {P ∈ {0, 1}^{N×N} : P1 = 1, P^T 1 = 1}.   (11)

Next, we show that the GST is invariant to permutations.

Proposition 1 (Permutation invariance). Let G be a graph with a GSO S, and let Ĝ be a permuted graph with GSO Ŝ = P^T S P. Let x be the input data and x̂ = P^T x the correspondingly permuted data. Then, it holds that

Φ(S, x) = Φ(Ŝ, x̂).   (12)

Prop. 1 essentially states that the GST is independent of the chosen node ordering. Furthermore, it states that the GST exploits the topological symmetries present in the graph, i.e., that nodes with the same topological neighborhood yield the same output (if the value of the signal in the neighborhood is the same). In other words, different parts of the graph are distinct inasmuch as their neighborhood topologies are distinct.

4.1 Perturbation model

When considering arbitrary perturbations Ŝ of S, and in light of Prop. 1, we need to define a distance d(S, Ŝ) such that, when Ŝ is a permutation of S, then d(S, Ŝ) = 0. This would imply that, in the same way regular scattering transforms are invariant to translations and stable to perturbations that are close to translations, GSTs are invariant to permutations and stable to perturbations that are close to permutations.
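The mechanism behind Prop. 1 can be checked numerically on a single scattering-style coefficient U ρ(H(S)x): since H(P^T S P) P^T x = P^T H(S) x and both |·| and the average commute with relabelings, the coefficient is unchanged. The random graph and taps below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
W = rng.random((N, N))
S = np.triu(W, 1) + np.triu(W, 1).T          # arbitrary symmetric GSO
x = rng.standard_normal(N)
h = [1.0, -0.5, 0.25]                        # arbitrary filter taps

def coeff(S, x):
    """One scattering-style coefficient: U rho(H(S) x), with U = (1/N) 1^T."""
    Hx = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))
    return np.abs(Hx).mean()

P = np.eye(N)[rng.permutation(N)]            # random permutation matrix, P in (11)
assert np.isclose(coeff(S, x), coeff(P.T @ S @ P, P.T @ x))   # cf. (12)
```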
Define the set of permutations that make S and Ŝ the closest as

P₀ = argmin_{P ∈ P} ‖P^T Ŝ P − S‖.   (13)

Then, we consider the set of error matrices to be

E(S, Ŝ) = {E : P^T Ŝ P − S = E^H S + S E, P ∈ P₀}.   (14)

And, since matrices E ∈ E(S, Ŝ) measure the (relative) difference between S and Ŝ accounting for all possible permutations, we can define the distance that we use to measure perturbations as

d(S, Ŝ) = min_{E ∈ E(S,Ŝ)} ‖E‖.   (15)

Note that, indeed, if Ŝ = P^T S P is simply a permutation of S, then d(S, Ŝ) = 0.

Remark 1. The perturbation model in (14) and the consequent distance in (15) constitute a relative perturbation model. Relative perturbations successfully take into account structural characteristics of the underlying graph such as sparsity, average degree, or mean edge weights. This is not the case when considering absolute perturbations, which is the model adopted in [21, 22, 27].

4.2 Stability of graph wavelets

Changes in the underlying graph support directly affect the output of filtering the signal with a wavelet. That is, by changing the eigenvalues λ_i on which the wavelet h(λ) is instantiated, the filter taps h̃_i are changed, and so is the output ỹ_i in virtue of (6). Thus, the first necessary result is to quantify the change in the output of a wavelet filter.
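The matching step behind (13) can be sketched by brute force for toy graphs; this only finds the best relabeling (and so vanishes for pure permutations, as the distance (15) must), while recovering E from (14) would be a further Sylvester-type solve that is omitted here:

```python
import numpy as np
from itertools import permutations

def min_perm_distance(S, S_hat):
    """Brute-force min_P ||P^T S_hat P - S|| in operator norm [cf. (13)].
    Exponential in N: illustration only, not a practical algorithm."""
    N = S.shape[0]
    return min(
        np.linalg.norm(np.eye(N)[list(p)].T @ S_hat @ np.eye(N)[list(p)] - S, 2)
        for p in permutations(range(N))
    )

S = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph
P0 = np.eye(3)[[2, 0, 1]]                                      # a node relabeling
assert np.isclose(min_perm_distance(S, P0.T @ S @ P0), 0.0)    # permutations cost 0
```

Changing an edge weight, by contrast, yields a strictly positive value no matter the relabeling.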
Given a wavelet function h(λ) and corresponding instantiations H(S) and H(Ŝ), define the wavelet output difference as

‖H(S) − H(Ŝ)‖ = inf { c ≥ 0 : min_{P ∈ P} ‖H(S) x − P H(P^T Ŝ P) P^T x‖ ≤ c ‖x‖ }.   (16)

We can then bound the wavelet output difference as shown next.

Proposition 2 (Graph wavelet stability). Let G be a graph with GSO S and Ĝ be the perturbed graph with GSO Ŝ, such that d(S, Ŝ) ≤ ε/2. Let E ∈ E(S, Ŝ), consider its eigendecomposition E = U M U^H where the eigenvalues in M = diag(m_1, . . . , m_N) are ordered such that |m_1| ≤ · · · ≤ |m_N|, and assume that the structural constraint ‖E/m_N − I‖ ≤ ε holds. Let h(λ) be a graph wavelet that satisfies the integral Lipschitz constraint |λ h′(λ)| ≤ C. Then, it holds that

‖H(S) − H(Ŝ)‖ ≤ εC + O(ε²).   (17)

The bound in Prop. 2 shows that the wavelet output difference is proportional to the size ε of the perturbation. The structural constraint ‖E/m_N − I‖ ≤ ε limits the changes in the structure of the graph, such as changes in sparsity or average degree, and determines a cost for different perturbations. For instance, changing all the edge weights by the same amount does not affect the topology structure and thus ‖E/m_N − I‖ = 0. Also, while changing some edge weights by ε/2 satisfies the constraint, contracting some edges by ε/2 and dilating others by the same amount actually requires ‖E/m_N − I‖ = O(1). Finally, we note that graph perturbations such as adding and/or dropping edges altogether lead to ‖E/m_N − I‖ = O(1) as well.
In a way, d(S, Ŝ) ≤ ε/2 limits the maximum edge weight change, while ‖E/m_N − I‖ ≤ ε limits how the edge weight changes affect the overall graph topology.

Remark 2. In what follows, we consider the low-pass average operator U to be independent of the graph shift operator S. In particular, we choose U to be a straightforward average of the representation obtained at all nodes, i.e., U = N^{−1} 1^T. In the appendix, we offer a proof of stability for cases in which U depends on S as well.

4.3 Stability of graph scattering transforms

The integral Lipschitz condition |λ h′(λ)| ≤ C requires the wavelet to be constant at high-eigenvalue frequencies (i.e., for λ → ∞, the derivative h′(λ) has to go to 0). This implies that information located at high-eigenvalue frequencies cannot be adequately discriminated (i.e., the output of the wavelet is the same for a broad band of the high-eigenvalue frequencies). Therefore, integral Lipschitz wavelets are stable, but not discriminative enough.

GSTs address this issue by incorporating pointwise nonlinearities. The effect of the pointwise nonlinearities is to cause a spillage of information throughout the frequency spectrum, in particular into low-eigenvalue frequencies, which can then be discriminated in a stable fashion. Thus, GSTs are stable and discriminative information processing architectures.

To give a bound on the stability of the GST, we first derive a bound on the difference of a single GST coefficient when computed on different graphs.

Proposition 3 (GST coefficient stability). Let G be a graph with GSO S and Ĝ be the perturbed graph with GSO Ŝ, such that d(S, Ŝ) ≤ ε/2. Let E ∈ E(S, Ŝ), consider its eigendecomposition E = U M U^H where the eigenvalues in M = diag(m_1, . . . , m_N) are ordered such that |m_1| ≤ · · · ≤ |m_N|, and assume that the structural constraint ‖E/m_N − I‖ ≤ ε holds. Consider a GST with L layers and J wavelet scales h_j(λ), each of which satisfies the integral Lipschitz constraint |λ h′_j(λ)| ≤ C, and which conform a frame with bounds 0 < A ≤ B [cf. (9)]. Then, for the coefficient φ_{p_j(ℓ)} associated to the path p_j(ℓ) = (j_1, . . . , j_ℓ) it holds that

|φ_{p_j(ℓ)}(S, x) − φ_{p_j(ℓ)}(Ŝ, x)| ≤ ε C ℓ B^{ℓ−1} ‖x‖.   (18)

We note that for wavelets h_j built as rescalings of a mother wavelet h [31], it suffices for h to satisfy the integral Lipschitz constraint |λ h′(λ)| ≤ C for all wavelets h_j to satisfy the constraint as well. The bound in Prop. 3 can be used to prove stability of the entire GST representation.

Theorem 1 (GST stability). Under the conditions of Proposition 3 it holds that

‖Φ(S, x) − Φ(Ŝ, x)‖ ≤ (εC/B) ( Σ_{ℓ=0}^{L−1} ℓ² (B² J)^ℓ )^{1/2} ‖x‖.   (19)

First of all, we observe that the bound (19) is linear in the perturbation size ε, thus proving stability of the GST. Also, the proportionality constant depends on the characteristics of the GST architecture, but not on the spectral gap nor any other characteristic of the underlying graph. It is linear also in the integral Lipschitz constant C, and depends exponentially on the upper frame bound B of the filters and on the number of scales J, with the exponential factor given by the number of layers L.

Theorem 1 provides a bound that is independent of graph properties. This is in contrast to the results in [21, 22, 27], which depend on spectral signatures of the graph.
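The right-hand side of (19) is cheap to evaluate for a given architecture; a small helper (the parameter values are arbitrary illustrative constants, not the paper's):

```python
import numpy as np

def gst_stability_bound(eps, C, B, J, L):
    """Right-hand side of (19) per unit input norm ||x|| = 1:
    (eps C / B) * sqrt( sum_{l=0}^{L-1} l^2 (B^2 J)^l )."""
    ell = np.arange(L, dtype=float)
    return (eps * C / B) * np.sqrt(np.sum(ell**2 * (B**2 * J) ** ell))

# Linear in eps, exponential in the depth L through (B^2 J)^l
b1 = gst_stability_bound(eps=0.01, C=1.0, B=1.0, J=6, L=3)
b2 = gst_stability_bound(eps=0.02, C=1.0, B=1.0, J=6, L=3)
assert np.isclose(b2, 2 * b1)
```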
An interesting consequence of this fact is that it makes it possible to take limits as we grow the number of nodes in the graph. There is, in fact, no limit to be taken, as the bound holds for all graphs.

Of particular importance is the limit of a line graph, in which case we partially recover the seminal stability results for scattering transforms using regular convolutions in [10]. The difference between Theorem 1 and the results in [10] is our restriction that the perturbation matrix be close to an identity. This means we can perturb the line graph by dilating all edges or by contracting all edges. Dilations and contractions can be different for different nodes, but we cannot have a mix of dilation and contraction in different parts of the line. This is allowed in [10], where perturbations are arbitrary diffeomorphisms. Yet, we note that even in the context of diffeomorphisms, perturbations such as dropping an edge still have a large gradient, since doing so implies folding two points into one.

The reason for the relative weakness of the result is that [10] leverages extrinsic geometric information that is not available in an analysis that applies to arbitrary graphs. More precisely, [10] uses knowledge of the underlying geometry of the Euclidean space to compute the bounds (i.e., they are derived for the continuous space R^d). In the context of this work, this amounts to using this extrinsic knowledge to bound the difference between the eigenvector basis of S and that of Ŝ. When we want a general result applicable to any graph, as is the case of Theorem 1, we need some (external) means of bounding how different the eigenvector bases are, and this is achieved by means of the structural constraint.
All in all, this implies that if we have specific knowledge of the domain where the graph and its possible perturbations live, then we can improve on (19) by leveraging this information to bound the difference in the eigenvector bases.

5 Numerical results

For the numerical experiments, we consider three scenarios²: representation error over a synthetic small-world graph, and authorship attribution and source localization over a Facebook subgraph, namely the same problems considered in [22]. We note that we are concerned with studying how changes in the underlying graph support S affect the output of the graph scattering transform when applied to the same input data x. As such, we are interested in datasets where we can keep x constant while changing S, i.e., scenarios involving data modeled as graph signals. In all cases, we study the GST carried out by three different wavelets: a monic cubic polynomial as suggested in [31], a tight Hann wavelet as in [32], and the geometric scattering introduced in [28]. For comparison, we consider the GFT as a linear, graph-based representation of the data and a trainable GIN [36]. We note that an exhaustive comparison between scattering transforms and other more traditional graph-based methods can be found in [28]. Complete details of all simulations are provided in the appendix. We consider GSTs with 6 scales and 3 layers, yielding representations with 43 coefficients when using the monic cubic polynomial [31] and the tight Hann wavelet [32]. For the geometric scattering we consider the low-pass operator to compute 4 moments, as used in [28], leading to 172 coefficients.
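To make the GST pipeline and the coefficient count concrete, the following self-contained sketch builds a toy J = 6, L = 3 transform and measures the relative representation error under an edge dilation Ŝ = (1 + ε)S. All function names are ours, a generic kernel h(t) = t·e^{1−t} stands in for the monic cubic polynomial of [31], and an Erdős–Rényi graph stands in for the small-world graph of the first experiment.

```python
import numpy as np

def graph_wavelets(S, J):
    """Spectral graph wavelets H_j = V h_j(Lambda) V^T from dyadic rescalings
    of a mother kernel h (a simple stand-in, not the monic cubic of [31])."""
    lam, V = np.linalg.eigh(S)
    h = lambda t: t * np.exp(1.0 - t)           # mother kernel: h(0) = 0, peak at t = 1
    return [V @ np.diag(h(np.abs(lam) / 2.0**j)) @ V.T for j in range(J)]

def gst(S, x, J=6, L=3):
    """Graph scattering transform: low-pass averages of |H_{j_l} ... |H_{j_1} x||
    over all wavelet paths of length < L."""
    H = graph_wavelets(S, J)
    u = np.ones(len(x)) / len(x)                # low-pass averaging operator
    coeffs, layer = [u @ x], [x]
    for _ in range(1, L):
        layer = [np.abs(Hj @ z) for z in layer for Hj in H]
        coeffs += [u @ z for z in layer]
    return np.array(coeffs)                     # 1 + J + J^2 = 43 coefficients

rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.15).astype(float)
S = np.triu(A, 1); S = S + S.T                  # symmetric adjacency GSO
x = rng.standard_normal(50)                     # white-noise graph signal
for eps in (0.01, 0.1, 1.0):
    err = np.linalg.norm(gst(S, x) - gst((1 + eps) * S, x))
    print(eps, err / np.linalg.norm(gst(S, x)))
```

The dilation satisfies the structural constraint for any ε (taking E = (ε/2)I), in the spirit of the relative perturbation model referenced in (15). Replacing the single average u by 4 statistical moments, as in the geometric scattering of [28], multiplies the count to 43 × 4 = 172 coefficients.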
For scenarios two and three we consider a GFT with 43 coefficients and a GIN that produces 43 features in the hidden layer.

The first experiment is used to corroborate numerically the stability of the GST, and consists of computing the representation error obtained by transforming a white-noise signal defined over a small-world graph of 100 nodes. We compute the relative representation error ‖Φ(S, x) − Φ(Ŝ, x)‖/‖Φ(S, x)‖ and show the results in Fig. 2a. We observe that the GST incurs up to 4 orders of magnitude less relative representation error than the GFT, resulting in markedly more stable representations. Among the different choices of wavelets, the geometric scattering is the most stable. Also, we show the theoretical bound of Theorem 1, computed as in (19) for the monic cubic polynomial wavelets, and where the values of B and C were obtained numerically (see the appendix for details). We see that the bound is not tight, but it is still lower than the GFT.

²Datasets and source code: http://github.com/alelab-upenn/graph-scattering-transforms

Figure 2. (a) Difference in representation between the signal defined using the original GSO S and using the GSO Ŝ corresponding to the deformed graph, as a function of the perturbation size ε [cf. (15)]. (b)-(c) Classification accuracy as a function of the perturbation for the authorship attribution and the Facebook graph problems, respectively.

For the second and third experiments, we consider two problems involving real-world data. The objective is twofold: (i) to show that the GST representations are at least as rich as the widely used GFT representation, and (ii) to consider stability to real-world perturbations (as opposed to the controlled perturbations of the first experiment). In Fig.
2b we show the classification accuracy in a problem involving authorship attribution of texts written by Jane Austen [37, 38], in the same scenario considered in [22]. The perturbation comes from considering different numbers of training excerpts and amounts to uncertainty in estimating the underlying graph topology. It is immediate to note that the performance obtained by a linear SVM classifier operating on the GST representation is comparable to that obtained when using the GFT, but worse than the GIN (which is understandable, since the GIN has been trained for 40 epochs to fit the dataset). We also observe that the oscillation of the mean classification accuracy of the GFT (as well as the large error bars) shows that it is much less stable than the GST. In Fig. 2c we show the classification accuracy for a source localization problem over the 234-node Facebook subnetwork [39], as discussed in [22]. In this case, the perturbation comes from randomly dropping edges with the probability given on the x-axis of the figure (from 0.01 to 0.3). We observe that the GST using tight Hann wavelets and the geometric scattering transform achieve better performance than the GFT and performance similar to that of the trained GIN, while the GST using monic cubic polynomials yields performance similar to the GFT. Finally, we note that the variability of the GFT is significantly larger than that of the geometric scattering and the tight Hann GST, but comparable to that of the monic cubic GST.

6 Conclusions

We have studied the stability properties of graph scattering transforms (GSTs) built with integral Lipschitz wavelets. We have introduced a relative perturbation model that takes into account the structure of the graph as well as its edge weights. We proved stability of the GST, by which changes in the output of the GST are bounded proportionally to the size of the perturbation of the underlying graph.
The proportionality constant depends on the model characteristics (number of scales, number of layers, chosen wavelets) but does not depend on characteristics of the graph. Finally, we used numerical experiments to show that the GST representation is also rich enough to achieve performance comparable to the popular GFT, which is a linear, graph-based representation.

Acknowledgments

Fernando Gama and Alejandro Ribeiro are supported by NSF CCF 1717120, ARO W911NF1710438, ARL DCIST CRA W911NF-17-2-0181, ISTC-WAS and Intel DevCloud.
Joan Bruna is partially supported by the Alfred P. Sloan Foundation, NSF RI-1816753, NSF CAREER CIF 1845360, and Samsung Electronics.
We thank Edouard Oyallon for reviewing an earlier version of the manuscript and suggesting valuable improvements.

References

[1] C. R. Rao, Linear Statistical Inference and its Applications, 2nd ed., ser. Wiley Series in Probability and Statistics. New York, NY: John Wiley & Sons, 1973.

[2] B. D. O. Anderson and J. B. Moore, Optimal Filtering, ser. Information and System Sciences Series. Englewood Cliffs, NJ: Prentice-Hall, 1979.

[3] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, ser. Signal Processing Series. Upper Saddle River, NJ: Prentice-Hall, 1993.

[4] T. Kailath, Linear Systems. Englewood Cliffs, NJ: Prentice-Hall, 1980.

[5] K. P. Murphy, Machine Learning: A Probabilistic Perspective, ser. Adaptive Computation and Machine Learning. Cambridge, MA: The MIT Press, 2012.

[6] S.
Haykin, Adaptive Filter Theory, 3rd ed., ser. Information and System Sciences Series. Upper Saddle River, NJ: Prentice-Hall, 1996.

[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[8] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, ser. The Adaptive Computation and Machine Learning Series. Cambridge, MA: The MIT Press, 2016.

[9] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations. Cambridge, UK: Cambridge University Press, 1999.

[10] S. Mallat, "Group invariant scattering," Commun. Pure Appl. Math., vol. 65, no. 10, pp. 1331–1398, Oct. 2012.

[11] J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872–1886, Aug. 2013.

[12] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.

[13] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, June 2014.

[14] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.

[15] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges and applications," Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.

[16] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 61–80, Jan. 2009.

[17] J. Bruna, W.
Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and deep locally connected networks on graphs," in 2nd Int. Conf. Learning Representations. Banff, AB: Assoc. Comput. Linguistics, 14-16 Apr. 2014, pp. 1–14.

[18] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in 30th Conf. Neural Inform. Process. Syst. Barcelona, Spain: Neural Inform. Process. Foundation, 5-10 Dec. 2016, pp. 3844–3858.

[19] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: Going beyond Euclidean data," IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, July 2017.

[20] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "Convolutional neural network architectures for signals supported on graphs," IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1034–1049, Feb. 2019.

[21] D. Zou and G. Lerman, "Graph convolutional neural networks via scattering," Appl. Comput. Harmonic Anal., 13 June 2019, accepted for publication (in press). [Online]. Available: http://doi.org/10.1016/j.acha.2019.06.003

[22] F. Gama, A. Ribeiro, and J. Bruna, "Diffusion scattering transforms on graphs," in 7th Int. Conf. Learning Representations. New Orleans, LA: Assoc. Comput. Linguistics, 6-9 May 2019, pp. 1–12.

[23] M. Perlmutter, G. Wolf, and M. Hirn, "Geometric scattering on manifolds," arXiv:1812.06968v3 [stat.ML], 4 Feb. 2019. [Online]. Available: http://arxiv.org/abs/1812.06968

[24] R. R. Coifman and M. Maggioni, "Diffusion wavelets," Appl. Comput. Harmonic Anal., vol. 21, no. 1, pp. 53–94, July 2006.

[25] B. Nadler, S. Lafon, I. Kevrekidis, and R. R. Coifman, "Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators," in 19th Conf. Neural Inform. Process. Syst. Vancouver, BC: Neural Inform.
Process. Syst. Foundation, 5-8 Dec. 2005, pp. 1–8.

[26] R. R. Coifman and S. Lafon, "Diffusion maps," Appl. Comput. Harmonic Anal., vol. 21, no. 1, pp. 5–30, July 2006.

[27] R. Levie, E. Isufi, and G. Kutyniok, "On the transferability of spectral graph filters," in 13th Int. Conf. Sampling Theory Applications. Bordeaux, France: IEEE, 8-12 July 2019, pp. 1–5.

[28] F. Gao, G. Wolf, and M. Hirn, "Geometric scattering for graph data analysis," in 36th Int. Conf. Mach. Learning, Long Beach, CA, 9-15 June 2019, pp. 1–10.

[29] S. Segarra, A. G. Marques, and A. Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.

[30] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 3rd ed. Upper Saddle River, NJ: Pearson, 2010.

[31] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Appl. Comput. Harmonic Anal., vol. 30, no. 2, pp. 129–150, March 2011.

[32] D. I. Shuman, C. Wiesmeyr, N. Holighaus, and P. Vandergheynst, "Spectrum-adapted tight graph wavelet and vertex-frequency frames," IEEE Trans. Signal Process., vol. 63, no. 16, pp. 4223–4235, Aug. 2015.

[33] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.

[34] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, "Network topology inference from spectral templates," IEEE Trans. Signal Inform. Process. Networks, vol. 3, no. 3, pp. 467–483, Sep. 2017.

[35] E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, "Learning decentralized controllers for robot swarms with graph neural networks," in Conf. Robot Learning 2019. Osaka, Japan: Int. Found. Robotics Res., 30 Oct.-1 Nov. 2019.

[36] K. Xu, W. Hu, J.
Leskovec, and S. Jegelka, "How powerful are graph neural networks?" in 7th Int. Conf. Learning Representations. New Orleans, LA: Assoc. Comput. Linguistics, 6-9 May 2019, pp. 1–17.

[37] S. Segarra, M. Eisen, and A. Ribeiro, "Authorship attribution through function word adjacency networks," IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5464–5478, Oct. 2015.

[38] E. Isufi, F. Gama, and A. Ribeiro, "Generalizing graph convolutional neural networks with edge-variant recursions on graphs," in 27th Eur. Signal Process. Conf. A Coruña, Spain: Eur. Assoc. Signal Process., 2-6 Sep. 2019.

[39] J. McAuley and J. Leskovec, "Learning to discover social circles in Ego networks," in 26th Conf. Neural Inform. Process. Syst. Stateline, NV: Neural Inform. Process. Foundation, 3-8 Dec. 2012.