{"title": "Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling", "book": "Advances in Neural Information Processing Systems", "page_first": 1862, "page_last": 1870, "abstract": "Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.", "full_text": "Fiedler Random Fields: A Large-Scale Spectral\n\nApproach to Statistical Network Modeling\n\nAntonino Freno\n\nMikaela Keller\u2217\n\nMarc Tommasi\u2217\n\nINRIA Lille \u2013 Nord Europe\n\n40 avenue Halley \u2013 B\u02c6at A \u2013 Park Plaza\n\n59650 Villeneuve d\u2019Ascq (France)\n\n{antonino.freno, mikaela.keller, marc.tommasi}@inria.fr\n\nAbstract\n\nStatistical models for networks have been typically committed to strong prior as-\nsumptions concerning the form of the modeled distributions. 
Moreover, the vast\nmajority of currently available models are explicitly designed for capturing some\nspeci\ufb01c graph properties (such as power-law degree distributions), which makes\nthem unsuitable for application to domains where the behavior of the target quan-\ntities is not known a priori. The key contribution of this paper is twofold. First,\nwe introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs,\nwhich allows to dispense with any parametric assumption concerning the modeled\nnetwork properties. Second, we use the de\ufb01ned statistic to develop the Fiedler\nrandom \ufb01eld model, which allows for ef\ufb01cient estimation of edge distributions\nover large-scale random networks. After analyzing the dependence structure in-\nvolved in Fiedler random \ufb01elds, we estimate them over several real-world net-\nworks, showing that they achieve a much higher modeling accuracy than other\nwell-known statistical approaches.\n\n1 Introduction\n\nArising from domains as diverse as bioinformatics and web mining, large-scale data exhibiting net-\nwork structure are becoming increasingly available. Network models are commonly used to rep-\nresent the relations among data units and their structural interactions. Recent studies, especially\ntargeted at social network modeling, have focused on random graph models of those networks. 
In\nthe simplest form, a random network is a con\ufb01guration of binary random variables Xuv such that\nthe value of Xuv stands for the presence or absence of a link between nodes u and v in the network.\nThe general idea underlying random graph modeling is that network con\ufb01gurations are generated\nby a stochastic process governed by speci\ufb01c probability laws, so that different models correspond to\ndifferent families of distributions over graphs.\n\nThe simplest random graph model is the Erd\u02ddos-R\u00b4enyi (ER) model [1], which assumes that the prob-\nability of observing a link between two nodes in a given graph is constant for any pair of nodes in\nthat graph, and it is independent of which other edges are being observed. In preferential attachment\nmodels [2], the probability of linking to any speci\ufb01ed node in a graph is proportional to the degree\nof the node in the graph, leading to \u201crich get richer\u201d effects. Small-world models [3] try to capture\ninstead such phenomena often observed in real networks as small diameters and high clustering co-\nef\ufb01cients. An attempt to model potentially complex dependencies between graph edges in the form\nof Gibbs-Boltzmann distributions is made by exponential random graph (ERG) models [4], which\nsubsume the ER model as a special case. 
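The two simplest generative mechanisms in this survey are easy to make concrete. The sketch below is our own toy illustration (function names are ours, not code from the cited works): an ER sampler with constant edge probability, and a preferential-attachment sampler in which targets are chosen with probability proportional to their current degree.

```python
import random

def erdos_renyi(n, p, seed=0):
    """ER model: each of the n*(n-1)/2 possible edges is present
    independently with the same probability p."""
    rng = random.Random(seed)
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}

def preferential_attachment(n, m, seed=0):
    """Barabasi-Albert-style growth: nodes arrive one at a time and
    attach m edges to existing nodes chosen with probability
    proportional to their current degree ("rich get richer")."""
    rng = random.Random(seed)
    edges = set()
    degree_pool = []          # each node appears once per unit of degree
    for u in range(m, n):     # nodes 0..m-1 act as initial seeds
        targets = set()
        while len(targets) < m:
            pool = degree_pool if degree_pool else list(range(m))
            targets.add(rng.choice(pool))
        for v in targets:
            edges.add((min(u, v), max(u, v)))
            degree_pool += [u, v]
    return edges
```

Sampling uniformly from `degree_pool` is what makes high-degree nodes proportionally more likely to receive the next edge.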
Finally, a recent attempt at modeling real networks through a stochastic generative process is made by Kronecker graphs [5], which try to capture phenomena such as heavy-tailed degree distributions and shrinking-diameter properties while paying attention to the temporal dynamics of network growth.\n\nWhile some of these models behave better than others in terms of computational tractability, one basic limitation affecting all of them is a sort of parametric assumption concerning the probability laws underlying the observed network properties. In other words, currently available models of network structure assume that the shape of the probability distribution generating the network is known a priori. For example, typical formulations of ERG models assume that the building blocks of real networks are given by such structures as k-stars and k-triangles, with different weights assigned to different structures, whereas the preferential attachment model is committed to the assumption that the observed degree distributions obey a power law. In such frameworks, estimating the model from data reduces to fitting the model parameters, where the parametric form of the target distribution is fixed a priori. Clearly, in order for such models to deliver accurate estimates of the distributions at hand, their prior assumptions concerning the behavior of the target quantities must be satisfied by the given data. But unfortunately, this is something that we can rarely assess a priori.\n\n\u2217Universit\u00e9 Charles de Gaulle \u2013 Lille 3, Domaine Universitaire du Pont de Bois \u2013 BP 60149, 59653 Villeneuve d'Ascq (France).
To date, the\nknowledge we have concerning large-scale real-world networks does not allow to assess whether\nany particular parametric assumption is capturing in depth the target generative process, although\nsome observed network properties may happen to be modeled fairly well.\n\nThe aim of this paper is twofold. On the one hand, we take a \ufb01rst step toward nonparametric\nmodeling of random networks by developing a novel network statistic, which we call the Fiedler\ndelta statistic. The Fiedler delta function allows to model different graph properties at once in an\nextremely compact form. This statistic is based on the spectral analysis of the graph, and in particular\non the smallest non-zero eigenvalue of the Laplacian matrix, which is known as Fiedler value [6, 7].\nOn the other hand, we use the Fiedler delta statistic to de\ufb01ne a Boltzmann distribution over graphs,\nleading to the Fiedler random \ufb01eld (FRF) model. Roughly speaking, for each binary edge variable\nXuv, potentials in a FRF are functions of the difference determined in the Fiedler value by \ufb02ipping\nthe value of Xuv, where the spectral decomposition is restricted to a suitable subgraph incident to\nnodes u, v. The intuition is that the information encapsulated in the Fiedler delta for Xuv gives\na measure of the role of Xuv in determining the algebraic connectivity of its neighborhood. As\na \ufb01rst step in the theoretical analysis of FRFs, we prove that these models allow to capture edge\ncorrelations at any distance within a given neighborhood, hence de\ufb01ning a fairly general class of\nconditional independence structures over networks.\n\nThe paper is organized as follows. Sec. 2 reviews some theoretical background concerning the\nLaplacian spectrum of graphs. FRFs are then introduced in Sec. 3, where we also analyze their\ndependence structure and present an ef\ufb01cient approach for learning them from data. 
To avoid un-\nwarranted prior assumptions concerning the statistical behavior of the Fiedler delta, potentials are\nmodeled by non-linear functions, which we estimate from data by minimizing a contrastive diver-\ngence objective. FRFs are evaluated experimentally in Sec. 4, showing that they are well suited for\nlarge-scale estimation problems over real-world networks, while Sec. 5 draws some conclusions and\nsketches a few directions for further work.\n\n2 Graphs, Laplacians, and eigenvalues\n\nLet G = (V, E) be an undirected graph with n nodes. In the following we assume that the graph is\nunweighted with adjacency matrix A. The degree du of a node u \u2208 V is de\ufb01ned as the number of\nconnections of u to other nodes, that is du = |{v: {u, v} \u2208 E}|. Accordingly, the degree matrix D of\na graph G corresponds to the diagonal matrix with the vertex degrees d1, . . . , dn on the diagonal. The\nmain tools exploited by the random graph model proposed here are the graph Laplacian matrices.\nDifferent graph Laplacians have been de\ufb01ned in the literature. In this work, we use consistently the\nunnormalized graph Laplacian, given by L = D \u2212 A. Some basic facts related to the unnormalized\nLaplacian matrix can be summarized as follows [7]:\nProposition 1. The unnormalized graph Laplacian L of an undirected graph G has the following\nproperties: (i) L is symmetric and positive semi-de\ufb01nite; (ii) the smallest eigenvalue of L is 0; (iii)\nL has n non-negative, real-valued eigenvalues 0 = \u03bb1 \u2264 . . . 
≤ λn; (iv) the multiplicity of the eigenvalue 0 of L equals the number of connected components in the graph; in particular, λ1 = 0, and λ2 > 0 if and only if G is connected.\n\nIn the following, the (algebraic) multiplicity of an eigenvalue λi will be denoted by M(λi, G). If the graph has one single connected component, then M(0, G) = 1, and the second smallest eigenvalue λ2(G) > 0 is called, in this case, the Fiedler eigenvalue. The Fiedler eigenvalue provides insight into several graph properties: when there is a nontrivial spectral gap, i.e. λ2(G) is clearly separated from 0, the graph has good expansion properties, stronger connectivity, and rapid convergence of random walks in the graph. For example, it is known that λ2(G) ≤ μ(G), where μ(G) is the edge connectivity of the graph (i.e. the size of the smallest edge cut whose removal makes the graph disconnected [7]). Notice that if the graph has more than one connected component, then λ2(G) is also equal to zero. By a slight abuse of terminology, we use the term Fiedler eigenvalue to denote the smallest nonzero eigenvalue, regardless of the number of connected components: in this paper, by Fiedler value we mean the eigenvalue λ_{k+1}(G), where k = M(0, G).\n\nFor any pair of nodes u and v in a graph G = (V, E), we define two corresponding graphs Guv+ and Guv− in the following way: Guv+ = (V, E ∪ {{u, v}}) and Guv− = (V, E \ {{u, v}}). Clearly, we have that either Guv+ = G or Guv− = G. A basic property concerning the Laplacian eigenvalues of Guv+ and Guv− is the following [7, 8, 9]:\nLemma 1.
If Guv+ and Guv− are two graphs with n nodes such that {u, v} ⊆ V, Guv+ = (V, E ∪ {{u, v}}), and Guv− = (V, E \ {{u, v}}), then we have that: (i) ∑_{i=1}^{n} λi(Guv+) − λi(Guv−) = 2; (ii) λi(Guv−) ≤ λi(Guv+) for any i such that 1 ≤ i ≤ n.\n\n3 Fiedler random fields\n\nFiedler random fields are introduced in Sec. 3.1, while in Secs. 3.2–3.3 we discuss their dependence structure and explain how to estimate them from data, respectively.\n\n3.1 Probability distribution\n\nUsing the notions reviewed above, we define the Fiedler delta function Δλ2 in the following way:\nDefinition 1. Given a graph G, let k = M(0, Guv+). Then,\n\nΔλ2(u, v, G) = λ_{k+1}(Guv+) − λ_{k+1}(Guv−) if Xuv = 1, and Δλ2(u, v, G) = λ_{k+1}(Guv−) − λ_{k+1}(Guv+) otherwise. (1)\n\nIn other words, Δλ2(u, v, G) is the variation in the Fiedler eigenvalue of the graph Laplacian that would result from flipping the value of Xuv in G. Concerning the range of the Fiedler delta function, we can easily prove the following proposition:\nProposition 2. For any graph G = (V, E) and any pair of nodes {u, v} such that Xuv = 1, we have that 0 ≤ Δλ2(u, v, G) ≤ 2.\n\nProof. Let k = M(0, G). The proposition follows straightforwardly from Lemma 1, given that Δλ2(u, v, G) = λ_{k+1}(G) − λ_{k+1}(Guv−).\n\nWe now proceed to define FRFs. Given a graph G = (V, E), for each (unordered) pair of nodes {u, v} such that u ≠ v, we take Xuv to denote a binary random variable such that Xuv = 1 if {u, v} ∈ E, and Xuv = 0 otherwise. Since the graph is undirected, Xuv = Xvu. We also say that a subgraph GS of G with edge set ES is incident to Xuv if {u, v} ⊆ ∪_{e ∈ ES} e. Then:\nDefinition 2. Given a graph G, let XG denote the set of random variables defined on G, i.e.
XG = {Xuv: u ≠ v ∧ {u, v} ⊆ V}. For any Xuv ∈ XG, let Guv be a subgraph of G which is incident to Xuv, and let φuv be a two-place real-valued function with parameter vector θ. We say that the probability distribution of XG is a Fiedler random field if it factorizes as\n\nP(XG | θ) = (1/Z(θ)) exp( ∑_{Xuv ∈ XG} φuv(Xuv, Δλ2(u, v, Guv); θ) )  (2)\n\nwhere Z(θ) is the partition function.\n\nIn other words, a FRF is a Gibbs-Boltzmann distribution over graphs, with potential functions defined for each node pair {u, v} along with some neighboring subgraph Guv. In particular, in order to model the dependence of each variable Xuv on Guv, potentials take as argument both the value of Xuv and the Fiedler delta corresponding to {u, v} in Guv. The idea is to treat the Fiedler delta statistic as a (real-valued) random variable defined over subgraph configurations, and to exploit this random variable as a compact representation of those configurations. This means that the dependence structure of a FRF is fixed by the particular choice of subgraphs Guv, so that the set XGuv \ {Xuv} makes Xuv independent of XG \ XGuv. Three fundamental questions then arise. First, how do we fix the subgraph Guv for each pair of nodes {u, v}? Second, how do we choose a shape for the potential functions, so as to fully exploit the information contained in the Fiedler delta while avoiding unwarranted assumptions concerning their parametric form? Third, how does the Fiedler delta statistic behave with respect to the Markov dependence property for random graphs? One basic result related to the third question is presented in Sec. 3.2, while Sec. 3.3 addresses the first two points.\n\n3.2 Dependence structure\n\nWe first recall the definition of Markov dependence for random graphs [10].
Let N(Xuv) denote the set {Xwz: {w, z} ∈ E ∧ |{w, z} ∩ {u, v}| = 1}. Then:\nDefinition 3. A random graph G is said to be a Markov graph (or to have a Markov dependence structure) if, for any pair of variables Xuv and Xwz in G such that {u, v} ∩ {w, z} = ∅, we have that P(Xuv | Xwz, N(Xuv)) = P(Xuv | N(Xuv)).\n\nBased on Def. 3, we say that the dependence structure of a random graph G is non-Markovian if, for disjoint pairs of nodes {u, v} and {w, z}, it does not imply that P(Xuv | Xwz, N(Xuv)) = P(Xuv | N(Xuv)), i.e. if it is consistent with the inequality P(Xuv | Xwz, N(Xuv)) ≠ P(Xuv | N(Xuv)). We can then prove the following proposition:\nProposition 3. There exist Fiedler random fields with non-Markovian dependence structure.\n\nProof sketch. Consider a graph G = (V, E) such that V = {u, v, w, z} and E = {{u, v}, {v, w}, {w, z}, {u, z}}. The proof relies on the following result [6]: if graphs G1 and G2 are, respectively, a path and a circuit of size n, then λ2(G1) = 2(1 − cos(π/n)) and λ2(G2) = 2(1 − cos(2π/n)). Since adding exactly one edge to a path of size 4 can yield a circuit of the same size, this property allows us to derive analytic forms for the Fiedler delta statistic in such graphs, showing that there exist parameterizations of φuv such that φuv(Xuv, Δλ2(u, v, G); θ) ≠ φuv(Xuv, Δλ2(u, v, GS); θ). This means that the dependence structure of G is non-Markovian.1\n\nNote that the proof of Prop. 3 can be straightforwardly generalized to the dependence between two variables Xuv and Xwz in circuits/paths of arbitrary size n, since the expression used for the Fiedler eigenvalues of such graphs holds for any n.
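The closed forms used in this proof sketch, and the Fiedler delta of Def. 1 itself, are easy to check numerically. The following sketch is our own illustration (assuming numpy; all names are ours): it builds the unnormalized Laplacian L = D − A, extracts the Fiedler value λ_{k+1} with k = M(0, G), and computes Δλ2 exactly as in Def. 1, i.e. with the index k + 1 fixed by the graph Guv+.

```python
import numpy as np

def laplacian(n, edges):
    """Unnormalized graph Laplacian L = D - A of an undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def fiedler_value(n, edges, tol=1e-9):
    """Smallest nonzero Laplacian eigenvalue, i.e. lambda_{k+1}
    with k = M(0, G)."""
    ev = np.linalg.eigvalsh(laplacian(n, edges))  # ascending order
    return ev[ev > tol][0]

def fiedler_delta(u, v, n, edges, tol=1e-9):
    """Delta-lambda_2(u, v, G) from Definition 1: the change in
    lambda_{k+1} (with k = M(0, G^{uv+})) caused by flipping X_uv."""
    e = frozenset((u, v))
    E = {frozenset(x) for x in edges}
    lp = np.linalg.eigvalsh(laplacian(n, [tuple(x) for x in E | {e}]))
    lm = np.linalg.eigvalsh(laplacian(n, [tuple(x) for x in E - {e}]))
    k = int(np.sum(lp < tol))   # k = M(0, G^{uv+})
    d = lp[k] - lm[k]           # lambda_{k+1}(G+) - lambda_{k+1}(G-)
    return d if e in E else -d

# Path P4 and circuit C4 from the proof sketch:
# lambda_2 = 2(1 - cos(pi/n)) and 2(1 - cos(2*pi/n)), respectively.
path = [(0, 1), (1, 2), (2, 3)]
cycle = path + [(0, 3)]
```

For the circuit, the delta of the present edge {0, 3} lies in [0, 2], as Prop. 2 requires; for the path, the same pair gives the negated value, since X_03 = 0.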
This fact suggests that FRFs make it possible to model edge correlations at virtually any distance within G, provided that each subgraph Guv is chosen in such a way as to encompass the relevant correlation.\n\n1For a complete proof, see the supplementary material.\n\n3.3 Model estimation\n\nThe problem of learning a FRF from an observed network can be split into the task of estimating the potential functions once the network distribution has been factorized into a particular set of subgraphs, and the task of factorizing the distribution through a suitable set of subgraphs, which corresponds to estimating the dependence structure of the FRF. Here we focus on the problem of learning the FRF potentials, while suggesting a heuristic way to fix the dependence structure of the model.\n\nIn order to estimate the FRF potentials, we need to specify on the one hand a suitable architecture for such functions, and on the other hand the objective function that we want to optimize. As a preliminary step, we tested experimentally a variety of shapes for the potential functions. The tests indicated the importance of avoiding limiting assumptions concerning the form of the potentials, which motivated us to model them by a feed-forward multilayer perceptron (MLP), due to its well-known capability of approximating functions of arbitrary shape [12]. In particular, throughout the applications described in this paper we use a simple MLP architecture with one hidden layer and hyperbolic tangent activation functions. Therefore, our parameter vector θ simply consists of the weights of the chosen MLP architecture. Notice that, as far as the estimation of potentials is concerned, any regression model offering approximation capabilities analogous to the MLP family could be used as well. Here, the only requirement is to avoid unwarranted prior assumptions with respect to the shape of the potential functions.
In this respect, we take our approach to be genuinely nonparametric, since it does not require the parametric form of the target functions to be specified a priori in order to estimate them accurately. Concerning instead the learning objective, the main difficulty we want to avoid is the complexity of computing the partition function involved in the Gibbs-Boltzmann distribution. The approach we adopt to this aim is to minimize a contrastive divergence objective [13]. If G = (V, E) is the network that we want to fit our model to, and Guv = (Vuv, Euv) is a subgraph of G such that {u, v} ⊆ Vuv, let G∗uv denote the graph that we obtain by resampling the value of Xuv in Guv according to the conditional distribution P̂(Xuv | xGuv \ {xuv}; θ) predicted by our model. In other words, G∗uv is the result of performing just one iteration of Gibbs sampling on Xuv using θ, where the configuration xGuv of Guv is used to initialize the (single-step) Markov chain. Then, our goal is to minimize the function ℓCD(θ; G), given by:\n\nℓCD(θ; G) = log{ (1/Z(θ)) exp( ∑_{Xuv ∈ XG} φ(x∗uv, Δλ2(u, v, G∗uv); θ) ) } − log P̂(xG | θ)\n= ∑_{Xuv ∈ XG} { φ(x∗uv, Δλ2(u, v, G∗uv); θ) − φ(xuv, Δλ2(u, v, Guv); θ) }  (3)\n\nwhere φ is the function computed by our MLP architecture.
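To make this objective concrete, here is a minimal sketch (our own, assuming numpy; the architecture and names are illustrative, not the exact implementation used in the paper) of a one-hidden-layer tanh MLP potential and the single-pair contribution to the contrastive-divergence objective. Note that flipping X_uv only swaps the two eigenvalue terms in Def. 1, so the Fiedler delta for X_uv = 0 is the negative of the delta for X_uv = 1.

```python
import numpy as np

def init_theta(n_hidden=5, seed=0):
    """Random weights for a one-hidden-layer tanh MLP taking
    (x_uv, Fiedler delta) as input and returning a scalar potential."""
    r = np.random.default_rng(seed)
    return {"W": r.normal(0.0, 0.1, (n_hidden, 2)),
            "b": r.normal(0.0, 0.1, n_hidden),
            "w": r.normal(0.0, 0.1, n_hidden),
            "c": 0.0}

def phi(x, delta, th):
    """Potential phi(x_uv, delta_lambda_2; theta)."""
    h = np.tanh(th["W"] @ np.array([float(x), delta]) + th["b"])
    return float(th["w"] @ h + th["c"])

def cd_term(x, delta_on, th, rng):
    """One pair's contribution to l_CD: phi at the one-step Gibbs
    reconstruction x* minus phi at the observed value x.
    delta_on is the Fiedler delta evaluated with X_uv = 1; with
    X_uv = 0 the two eigenvalue terms of Def. 1 swap, so the delta
    simply changes sign."""
    deltas = {1: delta_on, 0: -delta_on}
    e1 = np.exp(phi(1, deltas[1], th))
    e0 = np.exp(phi(0, deltas[0], th))
    p1 = e1 / (e1 + e0)              # P(X_uv = 1 | rest; theta)
    x_star = int(rng.random() < p1)  # single Gibbs sampling step
    return phi(x_star, deltas[x_star], th) - phi(x, deltas[x], th)
```

Summing `cd_term` over a sampled set of pairs gives the restricted objective of Sec. 3.3, which can then be minimized by backpropagation through `phi`.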
The appeal of contrastive divergence learning is that, while it does not require computing the partition function, it is known to converge to points which are very close to maximum-likelihood solutions [14].\n\nIf we want our learning objective to be usable in the large-scale setting, then it is not feasible to sum over all node pairs {u, v} in the network, since the number of such pairs grows quadratically with |V|. In this respect, a straightforward approach for scaling to very large networks consists in sampling n objects from the set of all possible pairs of nodes, taking care that the sample contains a good balance between linked and unlinked pairs. Another issue we need to address concerns the way we sample a suitable set of subgraphs Gu1v1, ..., Gunvn for the selected pairs of nodes. Although different sampling techniques could be used in principle [15], our goal is to model correlations between each variable Xuv and some neighboring region Guv in G. Such a neighborhood should be large enough to make Δλ2(u, v, Guv) sufficiently informative with respect to the overall network, but also small enough to keep the spectral decomposition of Guv computationally tractable. Therefore, in order to sample Guv, we propose to draw Vuv by performing k 'snowball waves' on G [16], using u and v as seeds, and then setting Euv to be the edge set induced by Vuv in G (see Algorithm 1 for the details). In this way, we can empirically tune the hyperparameter k so as to trade off the informativeness of Guv against the tractability of its spectral decomposition, where it is known that the complexity of computing Δλ2(u, v, Guv) is cubic in the number of nodes in Guv [17].\n\nAlgorithm 1 SampleSubgraph: Sampling a neighboring subgraph for a given node pair\nInput: Undirected graph G = (V, E); node pair {u, v}; number k of snowball waves.\nOutput: Undirected graph Guv = (Vuv, Euv).\n\nSampleSubgraph(G, {u, v}, k):\n1.
Vuv = {u, v}\n2. for(i = 1 to k)\n3.    Vuv = Vuv ∪ ∪_{w ∈ Vuv} {z ∈ V: {w, z} ∈ E}\n4. Euv = {{w, z} ∈ E: {w, z} ⊆ Vuv}\n5. return (Vuv, Euv)\n\nOnce our training set D = {(xu1v1, Gu1v1), ..., (xunvn, Gunvn)} has been sampled, we learn the MLP weights by minimizing the objective ℓCD(θ; D), which we obtain from ℓCD(θ; G) by restricting the summation in Eq. 3 to the elements of D. Minimization is performed by iterative gradient descent, using standard backpropagation for updating the MLP weights.\n\n4 Experimental evaluation\n\nIn order to investigate the empirical behavior of FRFs as models of large-scale networks, we design two different groups of experiments (in link prediction and graph generation respectively), using collaboration networks drawn from the arXiv e-print repository (http://snap.stanford.edu/data/index.html), where nodes represent scientists and edges represent paper coauthorships. Some basic network statistics are reported in Table 1.\n\nLink prediction. In the first kind of experiments, given a random network G = (V, E), our goal is to measure the accuracy of FRFs at estimating the conditional distribution of variables Xuv given the configuration of neighboring subgraphs Guv of G. This can be seen as a link prediction problem where only local information (given by Guv) can be used for predicting the presence of a link {u, v}. At the same time, we want to understand whether the overall network size (in terms of the number of nodes) has an impact on the number of training examples that is necessary for FRFs to converge to stable prediction accuracy. Recall that FRFs are trained on a data sample D = {(xu1v1, Gu1v1), ..., (xunvn, Gunvn)}, where n ≪ |V|(|V| − 1)/2. Given this, converging to stable predictions for values of n which do not depend on |V| is a crucial requirement for achieving large-scale applicability.
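Algorithm 1 maps directly onto an adjacency-list representation. A minimal sketch (our own, in plain Python; `adj` maps each node to the set of its neighbors, and node labels are assumed orderable):

```python
def sample_subgraph(adj, u, v, k):
    """Algorithm 1 (SampleSubgraph): grow V_uv from seeds u, v by
    k snowball waves, then return the induced subgraph (V_uv, E_uv)."""
    V_uv = {u, v}
    for _ in range(k):
        # one snowball wave: add every neighbor of the current node set
        V_uv = V_uv | {z for w in V_uv for z in adj.get(w, ())}
    # edge set induced by V_uv in G (smaller label stored first)
    E_uv = {(w, z) for w in V_uv for z in adj.get(w, ()) if z in V_uv and w < z}
    return V_uv, E_uv
```

Here k directly trades off how informative Guv is against the cubic cost of its spectral decomposition, as discussed above.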
Let us sample our training set D by \ufb01rst\ndrawing n node pairs from V in such a way that linked and unlinked pairs from G are equally\nrepresented in D, and then extracting the corresponding subgraphs Gui,vi by Algorithm 1 using\none snowball wave. We then learn our model from D as described in Sec. 3.3.\nIn all the ex-\nperiments reported in this work, the number of hidden units in our MLP architecture is set to\n5. A test set T containing m objects (xu1 v1 , GS1), . . . , (xumvm , GSm) is also sampled from G\nso that T \u2229 D = \u2205, where pairs {ui, vi} in T are drawn uniformly at random from V \u00d7 V.\nPredictions are derived from the learned model\nby \ufb01rst computing the conditional probabil-\nity of observing a link for each pair of nodes\n{uj, vj} in T , and then making a decision on\nthe presence/absence of links by thresholding\nthe predicted probability (where the threshold is\ntuned by cross-validation). Prediction accuracy\nis measured by averaging the recognition accu-\nracy for linked and unlinked pairs in T respec-\ntively (where |T | = 10, 000). In Fig. 1, the ac-\ncuracy of FRFs on the test set is plotted against\na growing size n of D (where 12 \u2264 n \u2264 48).\nInterestingly, the number of training examples\nrequired for the accuracy curve to stabilize does\nnot seem to depend at all on the overall network\nsize.\nIndeed, fastest convergence is achieved\nfor the average-sized and the second largest\nnetworks, i.e. HepPh and AstroPh respectively.\nNotice how a training sample containing an extremely small percentage of node pairs is suf\ufb01cient\nfor our learning approach to converge to stable prediction accuracy. 
This result encourages us to think of FRFs as a convenient modeling option for the large-scale setting.\n\nFigure 1: Prediction accuracy of FRFs on the arXiv networks for a growing training set size. [Plot: prediction accuracy on the test set versus training set size, for GrQc (5,242 nodes), HepTh (9,877 nodes), HepPh (12,008 nodes), AstroPh (18,772 nodes), and CondMat (23,133 nodes).]\n\nBesides assessing whether the network size affects the number of training samples needed to accurately learn FRFs, we want to evaluate the usefulness of the dependence structure involved in our model in predicting the conditional distributions of edges given their neighboring subgraphs. That is, we want to ascertain whether the effort of modeling the conditional independence structure of the overall network through the FRF formalism is justified by a suitable gain in prediction accuracy with respect to statistical models that do not focus explicitly on such dependence structure. To this aim, we compare FRFs to two popular statistical models for large-scale networks, namely the Watts-Strogatz (WS) and the Barabási-Albert (BA) models [3, 2]. The WS formalism is mainly aimed at modeling the short-diameter property often observed in real-world networks. Interestingly, the degree distribution of WS networks can be expressed in closed form in terms of two parameters δ and β, related to the average degree distribution and a network rewiring process respectively [18]. On the other hand, the BA model is aimed at explaining the emergence of power-law degree distributions, where such distributions can be expressed in terms of an adaptive parameter α [19].
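The exponent of a power-law degree distribution p(d) ∝ d^(−α) has a closed-form maximum-likelihood estimate under a continuous approximation (a Hill-type estimator, in the style of Clauset, Shalizi, and Newman). The sketch below is our own illustration, not necessarily the estimation procedure used in [19]:

```python
import math

def powerlaw_mle(degrees, d_min=2):
    """Continuous-approximation ML estimate of the power-law exponent:
        alpha_hat = 1 + n_tail / sum_i ln(d_i / d_min)
    computed over the tail degrees d_i >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    s = sum(math.log(d / d_min) for d in tail)
    return 1.0 + len(tail) / s
```

On degrees drawn from an exact power law the estimate recovers the true exponent closely; on real networks the cutoff `d_min` itself also has to be chosen, e.g. by a goodness-of-fit criterion.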
The\nparameters of both the WS and the BA model can be estimated by standard maximum-likelihood\napproaches and then used to predict conditional edge distributions, exploiting information from the\ndegrees observed in the given subgraphs [20, 21]. The ER model is not considered in this group\nof experiments, since the involved independence assumption makes it unusable (i.e. equivalent to\nrandom guessing) for the purposes of conditional estimation tasks. On the other hand, ERG models\nare not suitable for application to the large-scale setting. We tried them out using edge, k-star and\nk-triangle statistics [4], and the tests con\ufb01rmed this point. Although the prohibitive cost of \ufb01tting the\nmodels and computing the involved feature functions could be overcome in principle by sampling\nstrategies similar to the ones we employ for FRFs, the potentials used in ERGs become numerically\nunstable in the large-scale setting, leading to numerical representation issues for which we are not\naware of any off-the-shelf solution. Accuracy values for the different models are reported in Ta-\nble 1. FRFs dramatically outperform the other two models on all networks. Since both the BA and\nthe WS model do not show relevant improvements over simple random guessing, this result clearly\nsuggests that exploiting the dependence structure involved in network edge con\ufb01gurations is crucial\nto accurately predict the presence/absence of links.\n\nTable 1: Edge prediction results on the arXiv networks. 
General network statistics are also reported, where CCG and DG stand for average clustering coefficient and network diameter respectively.\n\nDataset   |V|      |E|      CCG   DG  |  BA       FRF      WS\nAstroPh   18,772   396,160  0.63  14  |  50.98%   89.97%   50.14%\nCondMat   23,133   186,936  0.63  15  |  50.15%   91.62%   56.71%\nGrQc       5,242    28,980  0.52  17  |  52.57%   91.14%   53.72%\nHepPh     12,008   237,010  0.61  13  |  51.61%   86.57%   54.33%\nHepTh      9,877    51,971  0.47  17  |  58.33%   92.25%   50.30%\n\nGraph generation. A second group of experiments is aimed at assessing whether the FRFs learned on the arXiv networks can be considered as plausible models of the degree distribution (DD) and the clustering coefficient distribution (CC) observed in each network [15]. To this aim, we use the estimated FRF models to generate artificial graphs of various sizes, using Gibbs sampling, and then we compare the DD and CC observed in the artificial graphs with those estimated on the whole networks. For scale-free networks such as the ones considered here, the BA model is known to be the most accurate model currently available with respect to DD. On the other hand, for CC both BA and WS are known to be more realistic models than ER random graphs. Therefore, we compare the graphs generated by FRFs to those generated by the BA, ER, and WS models for the same networks. The distance in DD and CC between the artificial graphs on the one hand and the corresponding real network on the other hand is measured using the Kolmogorov-Smirnov D-statistic, following common use in graph mining research [15]. Here we only plot results for the CondMat and HepTh networks, noticing that the results we collected on the other arXiv networks lend themselves to the same interpretation as the ones displayed in Fig. 2.
Values are averaged over 100 samples for each considered graph size, where the standard deviation is typically in the order of 10^−2. The outcome motivates the following considerations. Concerning DD, FRFs are able to improve (at least slightly) on the accuracy of the state-of-the-art BA model, while they are very close to that model with respect to the clustering coefficient. In all cases, both BA and FRFs prove to be far more accurate than ER or WS, where the only advantage of using WS is limited to improving CC over ER. These results are particularly encouraging, since they show how the nonparametric approach motivating the FRF model allows us to accurately estimate network properties (such as DD) that are not aimed for explicitly in the model design. This suggests that the Fiedler delta statistic is a promising direction for building generative models capable of capturing different network properties through a unified approach.\n\nFigure 2: D-statistic values for DD and CC on the CondMat (a–b) and HepTh (c–d) networks. [Four panels: D-statistic for DD and for CC versus artificial graph size, comparing BA, ER, FRF, and WS.]\n\n5 Conclusions and future work\n\nThe main motivation
inspiring this work was the observation that statistical modeling of networks cries for genuinely nonparametric estimation, because of the inaccuracy that often results from unwarranted parametric assumptions. In this respect, we showed how the Fiedler delta statistic offers a powerful building block for designing a nonparametric estimator, which we developed in the form of the FRF model. First, since here we applied FRFs only to collaboration networks, which are typically scale-free, an important option for future work is to assess the flexibility of FRFs in modeling networks from different families. Second, since we addressed the problem of learning the dependence structure of FRFs only in a heuristic way, a stimulating direction for further research consists in designing cleverer techniques for learning the structure of FRFs, e.g. by considering alternative subgraph sampling techniques. Finally, we would like to assess the possibility of modeling networks through mixtures of FRFs, so as to fit different network regions (with possibly conflicting properties) through specialized components of the mixture.

Acknowledgments

This work has been supported by the French National Research Agency (ANR-09-EMER-007). The authors are grateful to Gemma Garriga, Rémi Gilleron, Liva Ralaivola, and Michal Valko for their useful suggestions and comments.

References

[1] P. Erdős and A. Rényi, “On Random Graphs, I,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.

[2] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, pp. 509–512, 1999.

[3] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, pp. 440–442, 1998.

[4] T. A. B. Snijders, P. E. Pattison, G. L. Robins, and M. S.
Handcock, “New Specifications for Exponential Random Graph Models,” Sociological Methodology, vol. 36, pp. 99–153, 2006.

[5] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, “Kronecker graphs: An approach to modeling networks,” Journal of Machine Learning Research, vol. 11, pp. 985–1042, 2010.

[6] M. Fiedler, “Algebraic connectivity of graphs,” Czechoslovak Mathematical Journal, vol. 23, pp. 298–305, 1973.

[7] B. Mohar, “The Laplacian Spectrum of Graphs,” in Graph Theory, Combinatorics, and Applications (Y. Alavi, G. Chartrand, O. R. Oellermann, and A. J. Schwenk, eds.), pp. 871–898, Wiley, 1991.

[8] W. N. Anderson and T. D. Morley, “Eigenvalues of the Laplacian of a graph,” Linear and Multilinear Algebra, vol. 18, pp. 141–145, 1985.

[9] D. M. Cvetković, M. Doob, and H. Sachs, eds., Spectra of Graphs: Theory and Application. New York (NY): Academic Press, 1979.

[10] O. Frank and D. Strauss, “Markov Graphs,” Journal of the American Statistical Association, vol. 81, pp. 832–842, 1986.

[11] J. Besag, “Spatial Interaction and the Statistical Analysis of Lattice Systems,” Journal of the Royal Statistical Society. Series B, vol. 36, pp. 192–236, 1974.

[12] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.

[13] G. E. Hinton, “Training Products of Experts by Minimizing Contrastive Divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[14] M. Á. Carreira-Perpiñán and G. E. Hinton, “On Contrastive Divergence Learning,” in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), pp. 33–40, 2005.

[15] J. Leskovec and C.
Faloutsos, “Sampling from large graphs,” in Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pp. 631–636, 2006.

[16] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models. New York (NY): Springer, 2009.

[17] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, eds., Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Philadelphia (PA): SIAM, 2000.

[18] A. Barrat and M. Weigt, “On the properties of small-world network models,” The European Physical Journal B, vol. 13, pp. 547–560, 2000.

[19] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, vol. 74, pp. 47–97, 2002.

[20] M. E. J. Newman, “Clustering and preferential attachment in growing networks,” Physical Review E, vol. 64, p. 025102, 2001.

[21] A. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, “Evolution of the social network of scientific collaborations,” Physica A, vol. 311, pp. 590–614, 2002.