{"title": "Phase transition in the family of p-resistances", "book": "Advances in Neural Information Processing Systems", "page_first": 379, "page_last": 387, "abstract": "We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p=1, the p-resistance coincides with the shortest path distance, for p=2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p^* and p^** such that if p < p^*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p^**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p^* = 1 + 1/(d-1) and p^** = 1 + 1/(d-2), where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p^* and p^** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest using q-Laplacians as regularizers, where q satisfies 1/p^* + 1/q = 1.", "full_text": "Phase transition in the family of p-resistances

Morteza Alamgir
Max Planck Institute for Intelligent Systems, Tübingen, Germany
morteza@tuebingen.mpg.de

Ulrike von Luxburg
Max Planck Institute for Intelligent Systems, Tübingen, Germany
ulrike.luxburg@tuebingen.mpg.de

Abstract

We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. 
We prove that for any fixed graph, for p = 1 the p-resistance coincides with the shortest path distance, for p = 2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p* and p** such that if p < p*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p* = 1 + 1/(d − 1) and p** = 1 + 1/(d − 2), where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p* and p** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest using q-Laplacians as regularizers, where q satisfies 1/p* + 1/q = 1.

1 Introduction

The graph Laplacian is a popular tool for unsupervised and semi-supervised learning problems on graphs. It is used in the context of spectral clustering, as a regularizer for semi-supervised learning, or to compute the resistance distance on graphs. However, it has been observed that under certain circumstances, standard Laplacian-based methods show undesired artifacts. In the semi-supervised learning setting, Nadler et al. (2009) showed that as the number of unlabeled points increases, the solution obtained by Laplacian regularization degenerates to a non-informative function. von Luxburg et al. (2010) proved that as the number of points increases, the resistance distance converges to a meaningless limit function. 
Independently of these observations, a number of authors suggested to generalize Laplacian methods. The observation was that the "standard" Laplacian methods correspond to a vector space setting with L2-norms, and that it might be beneficial to work in a more general Lp setting for p ≠ 2 instead. See Bühler and Hein (2009) for an application to clustering and Herbster and Lever (2009) for an application to label propagation. In this paper we take up several of these loose ends and connect them.

The main object under study in this paper is the family of p-resistances, which is a generalization of the standard resistance distance. Our first major result proves that the family of p-resistances is very rich and contains several special cases. The general picture is that the smaller p is, the more the resistance is concentrated on "short paths". In particular, the case p = 1 corresponds to the shortest path distance in the graph, the case p = 2 to the standard resistance distance, and the case p → ∞ to the inverse s-t-mincut.

Second, we study the behavior of p-resistances in the setting of random geometric graphs like lattice graphs, ε-graphs or k-nearest neighbor graphs. We prove that as the sample size n increases, there are two completely different regimes of behavior. Namely, there exist two critical thresholds p* and p** such that if p < p*, the p-resistances convey useful information about the global topology of the data (such as its cluster properties), whereas for p > p** the resistance distances approximate a limit that does not convey any useful information. We can explicitly compute the value of the critical thresholds: p* := 1 + 1/(d − 1) and p** := 1 + 1/(d − 2). 
This result even holds independently of the exact construction of the geometric graph.

Third, as we will see in Section 5, our results also shed light on the Laplacian regularization and semi-supervised learning setting. As there is a tight relationship between p-resistances and graph Laplacians, we can reformulate the artifacts described in Nadler et al. (2009) in terms of p-resistances. Taken together, our results suggest that standard Laplacian regularization should be replaced by q-Laplacian regularization (where q is such that 1/p* + 1/q = 1).

2 Intuition and main results

Consider an undirected, weighted graph G = (V, E) with n vertices. As is standard in machine learning, the edge weights are supposed to indicate similarity of the adjacent points (not distances). Denote the weight of edge e by w_e ≥ 0 and the degree of vertex u by d_u. The length of a path in the weighted graph is defined as Σ_e 1/w_e, the sum over the edges e of the path. In the electrical network interpretation, a graph is considered as a network where each edge e ∈ E has resistance r_e = 1/w_e. The effective resistance (or resistance distance) R(s, t) between two vertices s and t in the network is defined as the overall resistance one obtains when connecting a unit volt battery to s and t. It can be computed in many ways, but the one most useful for our paper is the following representation in terms of flows (cf. Section IX.1 of Bollobás, 1998):

    R(s, t) = min { Σ_{e∈E} r_e i_e²  |  i = (i_e)_{e∈E} unit flow from s to t }.    (1)

In von Luxburg et al. (2010) it has been proved that in many random graph models, the resistance distance R(s, t) between two vertices s and t converges to the trivial limit expression 1/d_s + 1/d_t as the size of the graph increases. We now want to present some intuition as to how this problem can be resolved in a natural way. 
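For readers who want to experiment with (1): in the p = 2 case the flow representation agrees with the classical closed form via the pseudoinverse of the graph Laplacian, R(s, t) = (e_s − e_t)^T L†(e_s − e_t), the same formula used in the proof of Theorem 6 below. A minimal NumPy sketch; the helper name and example graphs are our own illustrations, not from the paper:

```python
import numpy as np

def effective_resistance(W, s, t):
    """Standard (p = 2) resistance distance via the Laplacian pseudoinverse:
    R(s, t) = (e_s - e_t)^T L^+ (e_s - e_t), with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    L_pinv = np.linalg.pinv(L)
    e = np.zeros(W.shape[0])
    e[s], e[t] = 1.0, -1.0
    return float(e @ L_pinv @ e)

# Path graph 0-1-2 with unit weights: two unit resistors in series.
W_path = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
print(effective_resistance(W_path, 0, 2))  # ~2.0 (series law)

# Triangle with unit weights: 1 Ohm in parallel with 2 Ohm.
W_tri = np.ones((3, 3)) - np.eye(3)
print(effective_resistance(W_tri, 0, 1))   # ~2/3 (parallel law)
```

The pseudoinverse route is convenient for the small examples used throughout this section; for large graphs one would instead solve the linear system L ϕ = e_s − e_t.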
For a subset M ⊂ E of edges we define the contribution of M to the resistance R(s, t) as the part of the sum in (1) that runs over the edges in M. Let i* be a flow minimizing (1). To explain our intuition we separate this flow into two parts: R(s, t) = R(s, t)_local + R(s, t)_global. The part R(s, t)_local stands for the contribution of i* that stems from the edges in small neighborhoods around s and t, whereas R(s, t)_global is the contribution of the remaining edges (exact definition given below). A useful distance function is supposed to encode the global geometry of the graph, for example its cluster properties. Hence, R(s, t)_global should be the most important part in this decomposition. However, in case of the standard resistance distance the contribution of the global part becomes negligible as n → ∞ (for many different models of graph construction). This effect happens because as the graph increases, there are so many different paths between s and t that once the flow has left the neighborhood of s, electricity can flow "without considerable resistance". The "bottleneck" for the flow is the part that comes from the edges in the local neighborhoods of s and t, because here the flow has to concentrate on relatively few edges. So the dominating part is R(s, t)_local.

In order to define a useful distance function, we have to ensure that the global part has a significant contribution to the overall resistance. To this end, we have to avoid that the flow is distributed over "too many paths". In machine learning terms, we would like to achieve a flow that is "sparser" in the number of paths it uses. From this point of view, a natural attempt is to replace the 2-norm optimization problem (1) by a p-norm optimization problem for some p < 2. 
Based on this intuition, our idea is to replace the squares in the flow problem (1) by a general exponent p ≥ 1 and define the following new distance function on the graph.

Definition 1 (p-resistance) On any weighted graph G, for any p ≥ 1 we define

    R_p(s, t) := min { Σ_{e∈E} r_e |i_e|^p  |  i = (i_e)_{e∈E} unit flow from s to t }.    (∗)

As it turns out, our newly defined distance function R_p is closely related but not completely identical to the p-resistance R^H_p defined by Herbster and Lever (2009). A discussion of this issue can be found in Section 6.1.

[Figure 1: The s-t-flows minimizing (∗) in a two-dimensional grid for different values of p: (a) p = 2, (b) p = 1.33, (c) p = 1.1. The smaller p, the more the flow concentrates along the shortest path.]

In toy simulations we can observe that the desired effect of concentrating the flow on fewer paths indeed takes place. In Figure 1 we show how the optimal flow between two points s and t gets propagated through the network. We can see that the smaller p is, the more the flow is concentrated along the shortest path between s and t. We are now going to formally investigate the influence of the parameter p. Our first question is how the family R_p(s, t) behaves as a function of p (that is, on a fixed graph and for fixed s, t). The answer is given in the following theorem.

Theorem 2 (Family of p-resistances) For any weighted graph G the following statements are true:

1. For p = 1, the p-resistance coincides with the shortest path distance on the graph.

2. For p = 2, the p-resistance reduces to the standard resistance distance.

3. For p → ∞, R_p(s, t)^{1/(p−1)} converges to 1/m, where m is the unweighted s-t-mincut.

This theorem shows that our intuition as outlined above was exactly the right one. The smaller p is, the more flow is concentrated along straight paths. The extreme case is p = 1, which yields the shortest path distance. In the other direction, the larger p is, the more widely distributed the flow is. Moreover, the theorem above suggests that for p close to 1, R_p encodes global information about the part of the graph that is concentrated around the shortest path. As p increases, global information is still present, but now describes a larger portion of the graph, say, its cluster structure. This is the regime that is most interesting for machine learning. The larger p becomes, the less global information is present in R_p (because flows even use extremely long paths that take long detours), and in the extreme case p → ∞ we are left with nothing but the information about the minimal s-t-cut. In many large graphs, the latter just contains local information about one of the points s or t (see the discussion at the end of this section). An illustration of the different behaviors can be found in Figure 2.

The next question, inspired by the results of von Luxburg et al. (2010), is what happens to R_p(s, t) if we fix p but consider a family (G_n)_{n∈N} of graphs such that the number n of vertices in G_n tends to ∞. Let us consider geometric graphs such as k-nearest neighbor graphs or ε-graphs. We now give exact definitions of the local and global contributions to the p-resistance. Let r and R be real numbers that depend on n (they will be specified in Section 4) and C ≥ R/r a constant. We define the local neighborhood N(s) of vertex s as the ball with radius C · r around s. We will see later that the condition C ≥ R/r ensures that N(s) contains at least all vertices adjacent to s. 
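The interpolation described in Theorem 2 can be watched directly on the smallest nontrivial network: two parallel edges with resistances r1 and r2. There, problem (∗) reduces to a one-dimensional minimization of r1·x^p + r2·(1−x)^p over the split x of the unit flow, whose first-order condition gives x/(1−x) = (r2/r1)^{1/(p−1)}. A hedged numerical sketch (the two-edge example and the function name are ours, not from the paper):

```python
from scipy.optimize import minimize_scalar

def p_flow_split(r1, r2, p):
    """Solve problem (*) on two parallel edges: a unit flow splits as
    x on edge 1 and (1 - x) on edge 2; minimize r1*x^p + r2*(1-x)^p."""
    obj = lambda x: r1 * abs(x) ** p + r2 * abs(1 - x) ** p
    res = minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded")
    return res.x, res.fun

# p = 2: classical current division, x = r2/(r1+r2), value = parallel law.
x, val = p_flow_split(1.0, 2.0, 2.0)
print(x, val)   # x ~ 2/3, R_2 = 1*2/(1+2) ~ 2/3

# p close to 1: almost all flow on the cheaper edge (shortest path).
x, val = p_flow_split(1.0, 2.0, 1.05)
print(x, val)   # x ~ 1, value ~ 1 = shortest-path length

# large p: the flow spreads out almost evenly, whatever the resistances.
x, val = p_flow_split(1.0, 2.0, 20.0)
print(x, val)   # x ~ 1/2
```

The three regimes of Theorem 2 are already visible here: concentration on the cheapest path for small p, the electrical parallel law at p = 2, and an even spread (driven by the cut, not the resistances) as p grows.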
By abuse of notation we also write e ∈ N(s) if both endpoints of edge e are contained in N(s). Let i* be the optimal flow in Problem (∗). We define

    R^local_p(s) := Σ_{e∈N(s)} r_e |i*_e|^p,    R^local_p(s, t) := R^local_p(s) + R^local_p(t),    and    R^global_p(s, t) := R_p(s, t) − R^local_p(s, t).

Our next result conveys that the behavior of the family of p-resistances shows an interesting phase transition. The statements involve a term τ_n that should be interpreted as the average degree in the graph G_n (exact definition see later).

[Figure 2: Heat plots of the R_p distance matrices for a mixture of two Gaussians in R^10: (a) p = 1, (b) p = 1.11, (c) p = 1.5, (d) p = 2. We can see that the larger p is, the less pronounced the "global information" about the cluster structure is.]

Theorem 3 (Phase transition for p-resistances in large geometric graphs) Consider a family (G_n)_{n∈N} of unweighted geometric graphs on R^d, d > 2, that satisfies some general assumptions (see Section 4 for definitions and details). Fix two vertices s and t. Define the two critical values p* := 1 + 1/(d − 1) and p** := 1 + 1/(d − 2). Then, as n → ∞, the following statements hold:

1. If p < p* and τ_n is sub-polynomial in n, then R^global_p(s, t)/R^local_p(s, t) → ∞, that is the global contribution dominates the local one.

2. 
If p > p** and τ_n → ∞, then R^local_p(s, t)/R^global_p(s, t) → ∞ and R_p(s, t) → 1/d_s^{p−1} + 1/d_t^{p−1}, that is all global information vanishes.

This result is interesting. It shows that there exists a non-trivial point of phase transition in the behavior of p-resistances: if p < p*, then p-resistances are informative about the global topology of the graph, whereas if p > p** the p-resistances converge to trivial distance functions that do not depend on any global properties of the graph. In fact, we believe that p** should be 1 + 1/(d − 1) as well, but our current proof leaves the tiny gap between p* = 1 + 1/(d − 1) and p** = 1 + 1/(d − 2).

Theorem 3 is a substantial extension of the work of von Luxburg et al. (2010), in several respects. First, and most importantly, it shows the complete picture of the full range of p ≥ 1, and not just the single snapshot at p = 2. We can see that there is a range of values for p for which p-resistance distances convey very important information about the global topology of the graph, even in extremely large graphs. Also note how nicely Theorems 2 and 3 fit together. It is well known that as n → ∞, the shortest path distance corresponding to p = 1 converges to the (geodesic) distance of s and t in the underlying space (Tenenbaum et al., 2000), which of course conveys global information. von Luxburg et al. (2010) proved that the standard resistance distance (p = 2) converges to the trivial local limit. Theorem 3 now identifies the point of phase transition p* between the boundary cases p = 1 and p = 2. Finally, for p → ∞, we know by Theorem 2 that the p-resistance converges to the inverse of the s-t-mincut. It is widely believed that the minimal s-t-cut in geometric graphs converges to the minimum of the degrees of s and t as n → ∞ (even though a formal proof has yet to be presented and we cannot point to any reference). This is in alignment with the result of Theorem 3 that the p-resistance converges to 1/d_s^{p−1} + 1/d_t^{p−1}. As p → ∞, only the smaller of the two degrees contributes to the local part, which agrees with the limit for the s-t-mincut.

3 Equivalent optimization problems and proof of Theorem 2

In this section we will consider different optimization problems that are inherently related to p-resistances. All graphs in this section are considered to be weighted.

3.1 Equivalent optimization problems

Consider the following two optimization problems for p > 1:

Flow problem:
    R_p(s, t) := min { Σ_{e∈E} r_e |i_e|^p  |  i = (i_e)_{e∈E} unit flow from s to t }    (∗)

Potential problem:
    C_p(s, t) := min { Σ_{e=(u,v)} (1/r_e^{1/(p−1)}) |ϕ(u) − ϕ(v)|^{1 + 1/(p−1)}  |  ϕ(s) − ϕ(t) = 1 }    (∗∗)

It is well known that these two problems are equivalent for p = 2 (see Section 1.3 of Doyle and Snell, 2000). We will now extend this result to general p > 1.

Proposition 4 (Equivalent optimization problems) For p > 1, the following statements are true:

1. The flow problem (∗) has a unique solution.

2. The solutions of (∗) and (∗∗) satisfy R_p(s, t) = (1/C_p(s, t))^{p−1}.

To prove this proposition, we derive the Lagrange dual of problem (∗) and use the homogeneity of the variables to convert it to the form of problem (∗∗). Details can be found in the supplementary material. With this proposition we can now easily see why Theorem 2 is true.

Proof of Theorem 2. Part (1). If we set p = 1, Problem (∗) coincides with the well-known linear programming formulation of the shortest path problem, see Chapter 12 of Bazaraa et al. (2010).

Part (2). For p = 2, we get the well-known formula for the effective resistance.

Part (3). For p → ∞, the objective function in the dual problem (∗∗) converges to

    C_∞(s, t) := min { Σ_{e=(u,v)} |ϕ(u) − ϕ(v)|  |  ϕ(s) − ϕ(t) = 1 }.

This coincides with the well-known linear programming formulation of the min-cut problem in unweighted graphs. Using Proposition 4 we finally obtain

    lim_{p→∞} R_p(s, t)^{1/(p−1)} = lim_{p→∞} 1/C_p(s, t) = 1/C_∞(s, t) = 1/(s-t-mincut).

4 Geometric graphs and the proof of Theorem 3

In this section we consider the class of geometric graphs. The vertices of such graphs consist of points X_1, ..., X_n ∈ R^d, and vertices are connected by edges if the corresponding points are "close" (for example, they are k-nearest neighbors of each other). In most cases, we consider the set of points as drawn i.i.d. from some density on R^d. Consider the following general assumptions.

General Assumptions: Consider a family (G_n)_{n∈N} of unweighted geometric graphs where G_n is based on X_1, ..., X_n ∈ M ⊂ R^d, d > 2. We assume that there exist 0 < r ≤ R (depending on n and d) such that the following statements about G_n hold simultaneously for all x ∈ {X_1, ..., X_n}:

1. Distribution of points: For ρ ∈ {r, R} the number of sample points in B(x, ρ) is of the order Θ(n · ρ^d).

2. Graph connectivity: x is connected to all sample points inside B(x, r) and x is not connected to any sample point outside B(x, R).

3. Geometry of M: M is a compact, connected set such that M \ ∂M is still connected. The boundary ∂M is regular in the sense that there exist positive constants α > 0 and ε_0 > 0 such that if ε < ε_0, then for all points x ∈ ∂M we have vol(B_ε(x) ∩ M) ≥ α vol(B_ε(x)) (where vol denotes the Lebesgue volume). 
Essentially this condition just excludes the situation where the boundary has arbitrarily thin spikes.

It is a straightforward consequence of these assumptions that there exists some function τ(n) =: τ_n such that r and R are both of the order Θ((τ_n/n)^{1/d}) and all degrees in the graph are of order Θ(τ_n).

4.1 Lower and upper bounds and the proof of Theorem 3

To prove Theorem 3 we need to study the balance between R^local_p and R^global_p. We introduce the shorthand notation

    T_1 = Θ( 1 / (n^{p(1−1/d)−1} τ_n^{p(1+1/d)−1}) ),    T_2 = Θ( (1/τ_n^{2(p−1)}) Σ_{k=1}^{1/r} 1/k^{(d−2)(p−1)} ).

Theorem 5 (General bounds on R^local_p and R^global_p) Consider a family (G_n)_{n∈N} of unweighted geometric graphs that satisfies the general assumptions. Then the following statements are true for any fixed pair s, t of vertices in G_n:

    4C > R^local_p(s, t) ≥ 1/d_s^{p−1} + 1/d_t^{p−1}    and    T_1 + T_2 ≥ R^global_p(s, t) ≥ T_1.

Note that by taking the sum of the two inequalities this theorem also leads to upper and lower bounds for R_p(s, t) itself. The proof of Theorem 5 consists of several parts. To derive lower bounds on R_p(s, t) we construct a second graph G'_n which is a contracted version of G_n. Lower bounds can then be obtained by Rayleigh's monotonicity principle. To get upper bounds on R_p(s, t) we exploit the fact that the p-resistance in an unweighted graph can be upper bounded by Σ_{e∈E} i_e^p, where i is any unit flow from s to t. We construct a particular flow that leads to a good upper bound. Finally, investigating the properties of lower and upper bounds we can derive the individual bounds on R^local_p and R^global_p. Details can be found in the supplementary material.

Theorem 3 can now be derived from Theorem 5 by straightforward computations.

4.2 Applications

Our general results can directly be applied to many standard geometric graph models.

The ε-graph. We assume that X_1, ..., X_n have been drawn i.i.d. from some underlying density f on R^d, where M := supp(f) satisfies Part (3) of the general assumptions. Points are connected by unweighted edges in the graph if their Euclidean distances are smaller than ε. Exploiting standard results on ε-graphs (cf. the appendix in von Luxburg et al., 2010), it is easy to see that the general assumptions (1) and (2) are satisfied with probability at least 1 − c_1 n exp(−c_2 n ε^d) (where c_1, c_2 are constants independent of n and d) with r = R = ε and τ_n = Θ(n ε^d). The probability converges to 1 if n → ∞, ε → 0 and n ε^d / log(n) → ∞.

k-nearest neighbor graphs. We assume that X_1, ..., X_n have been drawn i.i.d. from some underlying density f on R^d, where M := supp(f) satisfies Part (3) of the general assumptions. We connect each point to its k nearest neighbors by an undirected, unweighted edge. Exploiting standard results on kNN-graphs (cf. the appendix in von Luxburg et al., 2010), it is easy to see that the general assumptions (1) and (2) are satisfied with probability at least 1 − c_1 k exp(−c_2 k) with r = Θ((k/n)^{1/d}), R = Θ((k/n)^{1/d}), and τ_n = k. The probability converges to 1 if n → ∞, k → ∞, and k/log(n) → ∞.

Lattice graphs. Consider uniform lattices such as the square lattice or triangular lattice in R^d. These lattices have constant degrees, which means that τ_n = Θ(1). If we denote the edge length of the grid by ε, the total number of nodes in the support will be in the order of n = Θ(1/ε^d). This means that the general assumptions hold for r = R = ε = Θ(1/n^{1/d}) and τ_n = Θ(1). 
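As a quick empirical sanity check of assumption (1) for the ε-graph model above (our own simulation sketch, not from the paper): for points drawn uniformly in the unit square (d = 2), the expected degree of an interior point is about n·π·ε², consistent with τ_n = Θ(n ε^d).

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 2000, 0.05
X = rng.random((n, 2))  # n points uniform in the unit square (d = 2)

# Degree in the eps-graph: neighbors strictly within Euclidean distance eps.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
degrees = (dists < eps).sum(axis=1) - 1  # subtract the self-distance

# Restrict to interior points to avoid the boundary effects of assumption (3).
interior = np.all((X > eps) & (X < 1 - eps), axis=1)
mean_deg = degrees[interior].mean()
print(mean_deg, n * np.pi * eps**2)  # mean degree ~ n*pi*eps^2 ~ 15.7
```

The O(n²) distance matrix is fine for a toy check; for a serious experiment one would use a spatial index (e.g. a k-d tree) to build the graph.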
Note that while the lower bounds of Theorem 3 can be applied to the lattice case, our current upper bounds do not hold because they require that τ_n → ∞.

5 Regularization by p-Laplacians

One of the most popular methods for semi-supervised learning on graphs is based on Laplacian regularization. In Zhu et al. (2003) the label assignment problem is formulated as

    ϕ = argmin_ϕ C(ϕ)    subject to    ϕ(x_i) = y_i, i = 1, ..., l    (2)

where y_i ∈ {±1} and C(ϕ) := ϕ^T L ϕ is the energy function involving the standard (p = 2) graph Laplacian L. This formulation is appealing and works well for small sample problems. However, Nadler et al. (2009) showed that the method is not well posed when the number of unlabeled data points is very large. In this setting, the solution of the optimization problem converges to a constant function with "spikes" at the labeled points. We now present a simple theorem that connects these findings to those concerning the resistance distance.

Theorem 6 (Laplacian regularization in terms of resistance distance) Consider a semi-supervised classification problem with one labeled point per class: ϕ(s) = 1, ϕ(t) = −1. Denote the solution of (2) by ϕ*, and let v be an unlabeled data point. Then

    ϕ*(v) − ϕ*(t) > ϕ*(s) − ϕ*(v)  ⟺  R_2(v, t) > R_2(v, s).

Proof. It is easy to verify that ϕ* = L†(e_s − e_t) and R_2(s, t) = (e_s − e_t)^T L†(e_s − e_t), where L† is the pseudo-inverse of the Laplacian matrix L. Therefore we have ϕ*(v) = e_v^T L†(e_s − e_t) and

    ϕ*(v) − ϕ*(t) > ϕ*(s) − ϕ*(v)  ⟺  (e_v − e_t)^T L†(e_s − e_t) > (e_s − e_v)^T L†(e_s − e_t)
    ⟺(a)  (e_v − e_t)^T L†(e_v − e_t) > (e_v − e_s)^T L†(e_v − e_s)  ⟺  R_2(v, t) > R_2(v, s).

Here in step (a) we use the symmetry of L† to state that e_v^T L† e_s = e_s^T L† e_v.

What does this theorem mean? We have seen above that in case p = 2, if n → ∞,

    R_2(v, t) ≈ 1/d_v + 1/d_t    and    R_2(v, s) ≈ 1/d_v + 1/d_s.

Hence, the theorem states that if we threshold the function ϕ* at 0 to separate the two classes, then all the points will be assigned to the labeled vertex with larger degree.

Our conjecture is that an analogue to Theorem 6 also holds for general p. For a precise formulation, define the matrix ∇ϕ as

    (∇ϕ)_{i,j} = ϕ(i) − ϕ(j) if i ∼ j,    and 0 otherwise,

and introduce the matrix norm ‖A‖_{m,n} = (Σ_i ((Σ_j a_{ij}^m)^{1/m})^n)^{1/n}. Consider q such that 1/p + 1/q = 1. We conjecture that if we used ‖∇ϕ‖_{q,q} as a regularizer for semi-supervised learning, then the corresponding solution ϕ* would satisfy

    ϕ*(v) − ϕ*(t) > ϕ*(s) − ϕ*(v)  ⟺  R_p(v, t) > R_p(v, s).

That is, the solution of the q-regularized problem would assign labels according to the R_p-distances. In particular, using q-regularization for the value q with 1/q + 1/p* = 1 would resolve the artifacts of Laplacian regularization described in Nadler et al. (2009).

It is worth mentioning that this regularization is different from others in the literature. The usual Laplacian regularization term as in Zhu et al. 
(2003) coincides with ‖∇ϕ‖_{2,2}, Zhou and Schölkopf (2005) use the ‖∇ϕ‖_{2,p} norm, and our conjecture is that the ‖∇ϕ‖_{q,q} norm would be a good candidate. Proving whether this conjecture is right or wrong is a subject of future work.

6 Related families of distance functions on graphs

In this section we sketch some relations between p-resistances and other families of distances.

6.1 Comparing Herbster's and our definition of p-resistances

For p' ≤ 2, Herbster and Lever (2009) introduced the following definition of p-resistances:

    R^H_{p'}(s, t) := 1 / C^H_{p'}(s, t)    with    C^H_{p'}(s, t) := min { Σ_{e=(u,v)} |ϕ(u) − ϕ(v)|^{p'} / r_e  |  ϕ(s) − ϕ(t) = 1 }.

In Section 3.1 we have seen that the potential and flow optimization problems are duals of each other. Based on this derivation we believe that the natural way of relating R^H and C^H would be to replace the p' in Herbster's potential formulation by q' such that 1/p' + 1/q' = 1. That is, one would have to consider C^H_{q'} and then define R̂^H_{p'} := 1/C^H_{q'}. In particular, reducing Herbster's p' towards 1 has the same influence as increasing our p to infinity and makes R^H_{p'} converge to the minimal s-t-cut.

To ease further comparison, let us assume for now that we use "our" p in the definition of Herbster's resistances. Then one can see by similar arguments as in Section 3.1 that R^H_p can be rewritten as

    R^H_p(s, t) := min { Σ_{e∈E} r_e^{p−1} |i_e|^p  |  i = (i_e)_{e∈E} unit flow from s to t }.    (H)

Now it is easy to see that the main difference between Herbster's definition (H) and our definition (∗) is that (H) takes the power p − 1 of the resistances r_e, while we keep the resistances with power 1. 
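In particular, on an unweighted graph (all r_e = 1) the two definitions coincide, since r_e^{p−1} = r_e = 1, while on weighted graphs they generally differ. A small illustrative check of ours on two parallel edges, reusing the closed-form optimal split x/(1−x) = (r2/r1)^{1/(p−1)} for the cost r1·x^p + r2·(1−x)^p (the example and helper names are not from the paper):

```python
def parallel_p_cost(r1, r2, p):
    """Minimum of r1*x^p + r2*(1-x)^p over the split x of a unit flow."""
    t = (r2 / r1) ** (1.0 / (p - 1.0))  # first-order condition x/(1-x) = t
    x = t / (1.0 + t)
    return r1 * x ** p + r2 * (1 - x) ** p

def R_star(r1, r2, p):   # our definition (*): resistances enter with power 1
    return parallel_p_cost(r1, r2, p)

def R_H(r1, r2, p):      # Herbster's (H): resistances enter with power p - 1
    return parallel_p_cost(r1 ** (p - 1), r2 ** (p - 1), p)

print(R_star(1.0, 1.0, 1.5), R_H(1.0, 1.0, 1.5))  # equal on unweighted graphs
print(R_star(1.0, 3.0, 1.5), R_H(1.0, 3.0, 1.5))  # differ on weighted graphs
```

At p = 2 both reduce to the ordinary parallel-resistor law r1·r2/(r1 + r2), since then r_e^{p−1} = r_e.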
In many respects, R_p and R^H_p have properties that are similar to each other: they satisfy slightly different versions (with different powers or weights) of the triangle inequality, Rayleigh's monotonicity principle, laws for resistances in series and in parallel, and so on. We will not discuss further details due to space constraints.

6.2 Other families of distances

There also exist other families of distances on graphs that share some of the properties of p-resistances. We will only discuss the ones that are most related to our work; for more references see von Luxburg et al. (2010). The first such family was introduced by Yen et al. (2008), where the authors use a statistical physics approach to reduce the influence of long paths on the distance. This family is parameterized by a parameter θ; it contains the shortest path distance at one end (θ → ∞) and the standard resistance distance at the other end (θ → 0). However, the construction is somewhat ad hoc, and the resulting distances cannot be computed in closed form and do not even satisfy the triangle inequality. A second family is the one of "logarithmic forest distances" by Chebotarev (2011). Even though its derivation is complicated, it has a closed-form solution and can be interpreted intuitively: the contribution of a path to the overall distance is "discounted" by a factor (1/α)^l where l is the length of the path. For α → 0, the logarithmic forest distance converges to the shortest path distance; for α → ∞, it converges to the resistance distance.

At the time of writing this paper, the major disadvantage of both the families introduced by Yen et al. (2008) and Chebotarev (2011) is that it is unknown how their distances behave as the size of the graph increases. 
It is clear that at the one end (shortest path), they convey global information, whereas at the other end (resistance distance) they depend on local quantities only when n → ∞. But what happens for all intermediate parameter values? Do all of them lead to meaningless distances as n → ∞, or is there some interesting phase transition as well? As long as this question has not been answered, one should be careful when using these distances. In particular, it is unclear how the parameters (θ and α, respectively) should be chosen, and it is hard to get an intuition about this.

7 Conclusions

We proved that the family of p-resistances has a wide range of behaviors. In particular, for p = 1 it coincides with the shortest path distance, for p = 2 with the standard resistance distance, and for p → ∞ it is related to the minimal s-t-cut. Moreover, an interesting phase transition takes place: in large geometric graphs such as k-nearest neighbor graphs, the p-resistance is governed by meaningful global properties as long as p < p* := 1 + 1/(d − 1), whereas it converges to the trivial local quantity 1/d_s^{p−1} + 1/d_t^{p−1} if p > p** := 1 + 1/(d − 2). Our suggestion for practice is to use p-resistances with p ≈ p*. For this value of p, the p-resistances encode those global properties of the graph that are most important for machine learning, namely the cluster structure of the graph.

Our findings are interesting on their own, but also help in explaining several artifacts discussed in the literature. They go much beyond the work of von Luxburg et al. (2010) (which only studied the case p = 2) and lead to an intuitive explanation of the artifacts of Laplacian regularization discovered in Nadler et al. (2009). An interesting line of future research will be to connect our results to the ones about p-eigenvectors of p-Laplacians (Bühler and Hein, 2009). 
For p = 2, the resistance distance can be expressed in terms of the eigenvalues and eigenvectors of the Laplacian. We are curious to see whether a refined theory on p-eigenvalues can lead to similarly tight relationships for general values of p.

Acknowledgements

We would like to thank the anonymous reviewers who discovered an inconsistency in our earlier proof, and Bernhard Schölkopf for helpful discussions.

References

M. Bazaraa, J. Jarvis, and H. Sherali. Linear Programming and Network Flows. Wiley-Interscience, 2010.

B. Bollobás. Modern Graph Theory. Springer, 1998.

T. Bühler and M. Hein. Spectral clustering based on the graph p-Laplacian. In Proceedings of the International Conference on Machine Learning (ICML), pages 81–88, 2009.

P. Chebotarev. A class of graph-geodetic distances generalizing the shortest path and the resistance distances. Discrete Applied Mathematics, 159:295–302, 2011.

P. G. Doyle and J. Laurie Snell. Random walks and electric networks, 2000. URL http://www.citebase.org/abstract?id=oai:arXiv.org:math/0001057.

M. Herbster and G. Lever. Predicting the labelling of a graph via minimum p-seminorm interpolation. In Conference on Learning Theory (COLT), 2009.

B. Nadler, N. Srebro, and X. Zhou. Semi-supervised learning with the graph Laplacian: The limit of infinite unlabelled data. In Advances in Neural Information Processing Systems (NIPS), 2009.

J. Tenenbaum, V. de Silva, and J. Langford. Supplementary material to "A Global Geometric Framework for Nonlinear Dimensionality Reduction". Science, 290:2319–2323, 2000. URL http://isomap.stanford.edu/BdSLT.pdf.

U. von Luxburg, A. Radl, and M. Hein. Getting lost in space: Large sample analysis of the commute distance. In Neural Information Processing Systems (NIPS), 2010.

L. Yen, M. Saerens, A. Mantrach, and M. Shimbo. 
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–793, 2008.

D. Zhou and B. Schölkopf. Regularization on discrete spaces. In DAGM-Symposium, pages 361–368, 2005.

X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, pages 912–919, 2003.
", "award": [], "sourceid": 278, "authors": [{"given_name": "Morteza", "family_name": "Alamgir", "institution": null}, {"given_name": "Ulrike", "family_name": "Luxburg", "institution": null}]}