{"title": "On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability", "book": "Advances in Neural Information Processing Systems", "page_first": 694, "page_last": 702, "abstract": "The problem of predicting connections between a set of data points finds many applications, in systems biology and social network analysis among others. This paper focuses on the \\textit{graph reconstruction} problem, where the prediction rule is obtained by minimizing the average error over all n(n-1)/2 possible pairs of the n nodes of a training graph. Our first contribution is to derive learning rates of order O(log n / n) for this problem, significantly improving upon the slow rates of order O(1/\u221an) established in the seminal work of Biau & Bleakley (2006). Strikingly, these fast rates are universal, in contrast to similar results known for other statistical learning problems (e.g., classification, density level set estimation, ranking, clustering) which require strong assumptions on the distribution of the data. Motivated by applications to large graphs, our second contribution deals with the computational complexity of graph reconstruction. Specifically, we investigate to which extent the learning rates can be preserved when replacing the empirical reconstruction risk by a computationally cheaper Monte-Carlo version, obtained by sampling with replacement B << n\u00b2 pairs of nodes. 
Finally, we illustrate our theoretical results by numerical experiments on synthetic and real graphs.", "full_text": "On Graph Reconstruction via Empirical Risk Minimization: Fast Learning Rates and Scalability\n\nGuillaume Papa, Stéphan Clémençon\n\nLTCI, CNRS, Télécom ParisTech, Université Paris-Saclay\n\n75013, Paris, France\n\nfirst.last@telecom-paristech.fr\n\nAurélien Bellet\n\nINRIA\n\n59650 Villeneuve d’Ascq, France\n\naurelien.bellet@inria.fr\n\nAbstract\n\nThe problem of predicting connections between a set of data points finds many applications, in systems biology and social network analysis among others. This paper focuses on the graph reconstruction problem, where the prediction rule is obtained by minimizing the average error over all n(n − 1)/2 possible pairs of the n nodes of a training graph. Our first contribution is to derive learning rates of order OP(log n/n) for this problem, significantly improving upon the slow rates of order OP(1/√n) established in the seminal work of Biau and Bleakley (2006). Strikingly, these fast rates are universal, in contrast to similar results known for other statistical learning problems (e.g., classification, density level set estimation, ranking, clustering) which require strong assumptions on the distribution of the data. Motivated by applications to large graphs, our second contribution deals with the computational complexity of graph reconstruction. Specifically, we investigate to which extent the learning rates can be preserved when replacing the empirical reconstruction risk by a computationally cheaper Monte-Carlo version, obtained by sampling with replacement B ≪ n² pairs of nodes. 
Finally, we illustrate our theoretical results by numerical experiments on synthetic and real graphs.\n\n1 Introduction\n\nAlthough statistical learning theory mainly focuses on establishing universal rate bounds (i.e., which hold for any distribution of the data) for the accuracy of a decision rule based on training observations, refined concentration inequalities have recently helped in understanding conditions on the data distribution under which learning paradigms such as Empirical Risk Minimization (ERM) lead to faster rates. In binary classification, i.e., the problem of learning to predict a random binary label Y ∈ {−1, +1} from an input random variable X based on independent copies (X1, Y1), . . . , (Xn, Yn) of the pair (X, Y ), rates faster than 1/√n are achieved when little mass in the vicinity of 1/2 is assigned by the distribution of the random variable η(X) = P{Y = +1 | X}. This condition and its generalizations are referred to as the Mammen-Tsybakov noise conditions (see Mammen and Tsybakov, 1999; Tsybakov, 2004; Massart and Nédélec, 2006). It has been shown that a similar phenomenon occurs for various other statistical learning problems. Indeed, specific conditions under which fast rate results hold have been exhibited for density level set estimation (Rigollet and Vert, 2009), (bipartite) ranking (Clémençon et al., 2008; Clémençon and Robbiano, 2011; Agarwal, 2014), clustering (Antos et al., 2005; Clémençon, 2014) and composite hypothesis testing (Clémençon and Vayatis, 2010).\n\nIn this paper, we consider the supervised learning problem on graphs referred to as graph reconstruction, rigorously formulated by Biau and Bleakley (2006). The objective of graph reconstruction is to predict the possible occurrence of connections between a set of objects/individuals known to form the nodes of an undirected graph. 
Precisely, each node is described by a random vector X which defines a form of conditional preferential attachment: one predicts whether two nodes are connected based on their features X and X′. This statistical learning problem is motivated by a variety of applications such as systems biology (e.g., inferring protein-protein interactions or metabolic networks, see Jansen et al., 2003; Kanehisa, 2001) and social network analysis (e.g., predicting future connections between users, see Liben-Nowell and Kleinberg, 2003). It has recently been the subject of a good deal of attention in the machine learning literature (see Vert and Yamanishi, 2004; Biau and Bleakley, 2006; Shaw et al., 2011), and is also known as supervised link prediction (Lichtenwalter et al., 2010; Cukierski et al., 2011). The learning task is formulated as the minimization of a reconstruction risk, whose natural empirical version is the average prediction error over the n(n − 1)/2 pairs of nodes in a training graph of size n. Under standard complexity assumptions on the set of candidate prediction rules, excess risk bounds of the order OP(1/√n) for the empirical risk minimizers have been established by Biau and Bleakley (2006) based on a representation of the objective functional very similar to the first Hoeffding decomposition for second-order U-statistics (see Hoeffding, 1948). However, Biau and Bleakley ignored the computational complexity of finding an empirical risk minimizer, which scales at least as O(n²) since the empirical graph reconstruction risk involves summing over n(n − 1)/2 terms. This makes the approach impractical when dealing with large graphs commonly found in many applications.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nBuilding on the above work, our contributions to statistical graph reconstruction are two-fold:\n\nUniversal fast rates. 
We prove that a fast rate of order OP(log n/n) is always achieved by empirical reconstruction risk minimizers, in the absence of any restrictive condition imposed on the data distribution. This is much faster than the OP(1/√n) rate established by Biau and Bleakley (2006). Our analysis is based on a different decomposition of the excess reconstruction risk of any candidate decision rule, involving the second Hoeffding representation of a U-statistic approximating it, as well as appropriate maximal/concentration inequalities.\n\nScaling-up ERM. We investigate the performance of minimizers of computationally cheaper Monte-Carlo estimates of the empirical reconstruction risk, built by averaging over B ≪ n² pairs of vertices drawn with replacement. The rate bounds we obtain highlight that B plays the role of a tuning parameter to achieve an effective trade-off between statistical accuracy and computational cost. Numerical results based on simulated graphs and real-world networks are presented in order to support these theoretical findings.\n\nThe paper is organized as follows. In Section 2, we present the probabilistic setting for graph reconstruction and recall state-of-the-art results. Section 3 provides our fast rate bound analysis, while Section 4 deals with the problem of scaling up reconstruction risk minimization to large graphs. Numerical experiments are displayed in Section 5, and a few concluding remarks are collected in Section 6. The technical proofs can be found in the Supplementary Material, along with some additional remarks and results.\n\n2 Background and Preliminaries\n\nWe start by describing at length the probabilistic framework we consider for statistical inference on graphs, as introduced by Biau and Bleakley (2006). 
We then briefly recall the related theoretical results documented in the literature.\n\n2.1 A Probabilistic Setup for Preferential Attachment\n\nIn this paper, G = (V, E) is an undirected random graph with a set V = {1, . . . , n} of n ≥ 2 vertices and a set E = {ei,j : 1 ≤ i ≠ j ≤ n} ∈ {0, 1}^{n(n−1)} describing its edges: for all i ≠ j, we have ei,j = ej,i = +1 if the vertices i and j are connected by an edge and ei,j = ej,i = 0 otherwise. We assume that G is a Bernoulli graph, i.e. the random variables ei,j, 1 ≤ i < j ≤ n, are independent labels drawn from a Bernoulli distribution Ber(p) with parameter p = P{ei,j = +1}, the probability that two vertices of G are connected by an edge. The degree of each vertex is thus distributed as a binomial with parameters n − 1 and p, which can be classically approximated by a Poisson distribution of parameter λ > 0 in the limit of large n, when np → λ.\n\nWhereas the marginal distribution of the graph G is that of a Bernoulli graph (also sometimes abusively referred to as a random graph), a form of conditional preferential attachment is also specified in the framework considered here. Precisely, we assume that, for all i ∈ V , a continuous r.v. Xi, taking its values in a separable Banach space X , describes some features related to vertex i. The Xi’s are i.i.d. with common distribution µ(dx) and, for any i ≠ j, the random pair (Xi, Xj) models some information useful for predicting the occurrence of an edge connecting the vertices i and j. Conditioned upon the features (X1, . . . , Xn), any binary variables ei,j and ek,l are independent only if {i, j} ∩ {k, l} = ∅. 
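As a quick illustration of the marginal Bernoulli graph model just described (a minimal sketch of ours, not code from the paper; the function name bernoulli_graph and the toy values n = 200, p = 0.1 are illustrative assumptions), one can sample the edge indicators and check that the mean degree concentrates around (n − 1)p:

```python
import random

def bernoulli_graph(n, p, rng):
    """Sample the edge indicators e_{i,j} = e_{j,i} ~ Ber(p) of a Bernoulli graph."""
    return {(i, j): int(rng.random() < p)
            for i in range(n) for j in range(i + 1, n)}

rng = random.Random(0)  # fixed seed, for reproducibility only
n, p = 200, 0.1
e = bernoulli_graph(n, p, rng)
# Each degree is Binomial(n - 1, p); the mean degree should be close to 19.9.
mean_degree = 2 * sum(e.values()) / n
```

Here each vertex degree is a sum of n − 1 independent Ber(p) labels, matching the binomial (and, for large n with np → λ, approximately Poisson) behavior described above.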
The conditional distribution of ei,j, i ≠ j, is supposed to depend on (Xi, Xj) solely, described by the posterior preferential attachment probability:\n\nη(Xi, Xj) = P{ei,j = +1 | (Xi, Xj)} .   (1)\n\nFor instance, ∀(x1, x2) ∈ X², η(x1, x2) can be a certain function of a specific distance or similarity measure between x1 and x2, as in the synthetic graphs described in Section 5.\n\nThe conditional average degree of the vertex i ∈ V given Xi (respectively, given (X1, . . . , Xn)) is thus (n − 1) ∫_{x∈X} η(Xi, x)µ(dx) (respectively, Σ_{j≠i} η(Xi, Xj)). Observe incidentally that, equipped with these notations, p = ∫_{(x,x′)∈X²} η(x, x′)µ(dx)µ(dx′), and that the 3-tuples (Xi, Xj, ei,j), 1 ≤ i < j ≤ n, are non-i.i.d. copies of a generic random vector (X1, X2, e1,2) whose distribution L is given by the tensorial product µ(dx1) ⊗ µ(dx2) ⊗ Ber(η(x1, x2)), which is fully described by the pair (µ, η). Observe also that the function η is symmetric by construction: ∀(x1, x2) ∈ X², η(x1, x2) = η(x2, x1).\n\nIn this framework, the learning problem introduced by Biau and Bleakley (2006), referred to as graph reconstruction, consists in building a symmetric reconstruction rule g : X² → {0, 1}, from a training graph G, with nearly minimum reconstruction risk
R(g) = P{g(X1, X2) ≠ e1,2} ,   (2)\n\nthus achieving a performance comparable to that of the Bayes rule g∗(x1, x2) = I{η(x1, x2) > 1/2}, whose risk is given by R∗ = E[min{η(X1, X2), 1 − η(X1, X2)}] = inf_g R(g).\n\nRemark 1 (EXTENDED FRAMEWORK) The results established in this paper can be straightforwardly extended to a more general framework, where L = L(n) may depend on the number n of vertices. This allows one to consider a general class of models, accounting for possible accelerating properties exhibited by various non scale-free real networks (Mattick and Gagen, 2005). An asymptotic study can then be carried out with the additional assumption that, as n → +∞, L(n) converges in distribution to a probability measure L(∞) on X × X × {0, 1}, see (Biau and Bleakley, 2006). For simplicity, we restrict our study to the stationary case, i.e. L(n) = L for all n ≥ 2.\n\n2.2 Related Results on Empirical Risk Minimization\n\nA paradigmatic approach in statistical learning, referred to as Empirical Risk Minimization (ERM), consists in replacing (2) by its empirical version based on the labeled sample Dn = {(Xi, Xj, ei,j) : 1 ≤ i < j ≤ n} related to G:\n\nR̂n(g) = (2 / (n(n − 1))) Σ_{i<j} I{g(Xi, Xj) ≠ ei,j} .\n\nAn empirical risk minimizer ĝn is a solution of the optimization problem min_{g∈G} R̂n(g), where G is a class of reconstruction rules of controlled complexity, hopefully rich enough to yield a small bias inf_{g∈G} R(g) − R∗. 
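To make the definition of the empirical reconstruction risk concrete, here is a minimal self-contained Python sketch (ours, not the authors' implementation; the toy features on the real line and the threshold rules are illustrative assumptions) that averages the 0-1 reconstruction error of a candidate rule g over all n(n − 1)/2 pairs:

```python
from itertools import combinations

def empirical_reconstruction_risk(X, e, g):
    """Average of I{g(X_i, X_j) != e_{i,j}} over all n(n - 1)/2 pairs of nodes."""
    pairs = list(combinations(range(len(X)), 2))
    return sum(g(X[i], X[j]) != e[(i, j)] for i, j in pairs) / len(pairs)

# Toy graph on the real line: an edge iff the two features are within distance 1.
X = [0.0, 0.5, 2.0, 2.4]
e = {(i, j): int(abs(X[i] - X[j]) <= 1.0) for i, j in combinations(range(len(X)), 2)}
g_good = lambda x1, x2: int(abs(x1 - x2) <= 1.0)  # matches the generating rule
g_bad = lambda x1, x2: 1                          # always predicts an edge
print(empirical_reconstruction_risk(X, e, g_good))  # 0.0
print(empirical_reconstruction_risk(X, e, g_bad))   # 4 of the 6 pairs are errors
```

Note that, unlike in classification, the summands share features (pairs (i, j) and (i, k) both involve Xi), which is exactly the U-statistic structure exploited in the analysis.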
The performance of ĝn is measured by its excess risk R(ĝn) − inf_{g∈G} R(g), which can be bounded if we can derive probability inequalities for the maximal deviation sup_{g∈G} |R̂n(g) − R(g)|.\n\nTheorem 4 Suppose that G is of finite VC-dimension V < +∞. For all δ > 0, n ≥ 1 and B ≥ 1, we have with probability at least 1 − δ:\n\nsup_{g∈G} |R̃B(g) − R̂n(g)| ≤ √( (log 2 + V log((1 + n(n − 1)/2)/δ)) / (2B) ) .\n\nThe finite VC-dimension hypothesis can be relaxed and a bound of the same order can be proved to hold true under weaker complexity assumptions involving Rademacher averages (see Remark 4).\n\n(a) True graph   (b) Graph with scrambled features   (c) Reconstructed graph\n\nFigure 1: Illustrative experiment with n = 50, q = 2, τ = 0.27 and p = 0. Figure 1(a) shows the training graph, where the position of each node is given by its 2D feature vector. Figure 1(b) depicts the same graph after applying a random transformation R to the features. On this graph, the Euclidean distance with optimal threshold achieves a reconstruction error of 0.1311. In contrast, the reconstruction rule learned from B = 100 pairs of nodes (out of 1225 possible pairs) successfully inverts R and accurately recovers the original graph (Figure 1(c)). Its reconstruction error is 0.008 on the training graph and 0.009 on a held-out graph generated with the same parameters.\n\nRemarkably, with only B = O(n) pairs, the rate in Theorem 4 is of the same order (up to a log factor) as that obtained by Biau and Bleakley (2006) for the maximal deviation sup_{g∈G} |R̂n(g) − R(g)| related to the complete reconstruction risk R̂n(g) with O(n²) pairs. 
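The incomplete risk R̃B can be sketched in the same spirit (again our own illustrative Python with hypothetical toy data, not the authors' code): instead of enumerating all n(n − 1)/2 pairs, it averages the error over B pairs drawn with replacement, at cost O(B):

```python
import random
from itertools import combinations

def incomplete_reconstruction_risk(X, e, g, B, rng):
    """Monte-Carlo estimate of the reconstruction risk from B pairs of distinct
    nodes drawn uniformly with replacement."""
    n = len(X)
    errors = 0
    for _ in range(B):
        i, j = sorted(rng.sample(range(n), 2))  # one uniformly drawn pair, i < j
        errors += g(X[i], X[j]) != e[(i, j)]
    return errors / B

# Toy graph on the real line: an edge iff the two features are within distance 1.
X = [0.0, 0.5, 2.0, 2.4]
e = {(i, j): int(abs(X[i] - X[j]) <= 1.0) for i, j in combinations(range(len(X)), 2)}
g = lambda x1, x2: int(abs(x1 - x2) <= 1.0)
print(incomplete_reconstruction_risk(X, e, g, B=50, rng=random.Random(0)))  # 0.0
```

For a rule with nonzero risk, the estimate fluctuates around the complete risk with standard deviation O(1/√B), which is the scaling appearing in the bound above.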
From Theorem 4, one can get a learning rate of order OP(1/√n) for the minimizer of the incomplete risk involving only O(n) pairs. Unfortunately, such an analysis does not exploit the relationship between conditional variance and expectation formulated in Lemma 2, and is thus not sufficient to show that reconstruction rules minimizing the incomplete risk (7) can achieve learning rates comparable to those stated in Theorem 1. In contrast, the next theorem provides sharper statistical guarantees. We refer to the Supplementary Material for the proof.\n\nTheorem 5 Let g̃B be any minimizer of the incomplete reconstruction risk (7) over a class G of finite VC-dimension V < +∞. Then, for all δ ∈ (0, 1), we have with probability at least 1 − δ: ∀n ≥ 2,\n\nR(g̃B) − R∗ ≤ 2 ( inf_{g∈G} R(g) − R∗ ) + CV log(n/δ) × (1/n + 1/√B) ,\n\nwhere C < +∞ is a universal constant.\n\nThis bound reveals that the number B ≥ 1 of pairs of vertices plays the role of a tuning parameter, ruling a trade-off between statistical accuracy (taking B(n) = O(n²) fully preserves the convergence rate) and computational complexity. This will be confirmed numerically in Section 5.\n\nThe above results can be extended to other sampling techniques, such as Bernoulli sampling and sampling without replacement. We refer to the Supplementary Material for details.\n\n5 Numerical Experiments\n\nIn this section, we present some numerical experiments on large-scale graph reconstruction to illustrate the practical relevance of the idea of incomplete risk introduced in Section 4. 
Following a well-established line of work (Vert and Yamanishi, 2004; Vert et al., 2007; Shaw et al., 2011), we formulate graph reconstruction as a distance metric learning problem (Bellet et al., 2015): we learn a distance function such that we predict an edge between two nodes if the distance between their features is smaller than some threshold. Assuming X ⊆ R^q, let S^q_+ be the cone of symmetric PSD q × q real-valued matrices. The reconstruction rules we consider are parameterized by M ∈ S^q_+ and have the form\n\ngM (x1, x2) = I{DM (x1, x2) ≤ 1} ,\n\nwhere DM (x1, x2) = (x1 − x2)^T M (x1 − x2) is a (pseudo) distance equivalent to the Euclidean distance after a linear transformation L ∈ R^{q×q}, with M = L^T L. Note that gM (x1, x2) can be seen as a linear separator operating on the pairwise representation vec((x1 − x2)(x1 − x2)^T) ∈ R^{q²}, hence the class of learning rules we consider has VC-dimension bounded by q² + 1. We define the reconstruction risk as:\n\nŜn(gM ) = (2 / (n(n − 1))) Σ_{i<j} [(2ei,j − 1)(DM (Xi, Xj) − 1)]+ .\n\nThe process is illustrated in Figure 1. Using this procedure, we generate a training graph with n = 1,000,000 and q = 100. We set the threshold τ such that there is an edge between about 20% of the node pairs, and set p = 0.05. We also generate a test graph using the same parameters. We then sample uniformly with replacement B pairs of nodes from the training graph to construct our incomplete reconstruction risk.\n\nTable 1: Results (averaged over 10 runs) on synthetic graph with n = 1,000,000, q = 100, p = 0.05.\n\n                          B = 0.01n   B = 0.1n   B = n    B = 5n    B = 10n\nReconstruction error      0.2272      0.1543     0.1276   0.1185    0.1159\nRelative improvement      –           32%        17%      7%        2%\nTraining time (seconds)   21          398        5,705    20,815    42,574\n\n
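The reconstruction rule gM used in these experiments can be written in a few lines (our illustrative Python re-implementation, with M represented as a plain nested list; when M is the identity, DM reduces to the squared Euclidean distance):

```python
def D_M(x1, x2, M):
    """(Pseudo) distance (x1 - x2)^T M (x1 - x2) for a symmetric PSD matrix M."""
    d = [a - b for a, b in zip(x1, x2)]
    q = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(q) for j in range(q))

def g_M(x1, x2, M):
    """Reconstruction rule: predict an edge iff D_M(x1, x2) <= 1."""
    return int(D_M(x1, x2, M) <= 1.0)

M = [[1.0, 0.0], [0.0, 1.0]]  # identity: D_M is the squared Euclidean distance
print(g_M([0.0, 0.0], [0.6, 0.6], M))  # 1 (distance 0.72 <= 1)
print(g_M([0.0, 0.0], [1.0, 1.0], M))  # 0 (distance 2.0 > 1)
```

Learning then amounts to choosing M (equivalently, the linear transformation L with M = LᵀL) so that connected pairs fall inside the unit ball of D_M and non-connected pairs fall outside.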
The reconstruction error of the resulting empirical risk minimizer is estimated on 1,000,000 pairs of nodes drawn from the test graph. Table 1 shows the test error (averaged over 10 runs) as well as the training time for several values of B. Consistently with our theoretical findings, B implements a trade-off between statistical accuracy and computational cost. For this dataset, sampling B = 5,000,000 pairs (out of 10¹² possible pairs!) is sufficient to find an accurate reconstruction rule. A larger B would result in increased training time for negligible gains in reconstruction error.\n\nAdditional results. In the Supplementary Material, we present comparisons to a node sampling scheme and to the “dataset splitting” strategy given by (6), as well as experiments on a real network.\n\n6 Conclusion\n\nIn this paper, we proved that the learning rates for ERM in the graph reconstruction problem are always of order OP(log n/n). We also showed how sampling schemes applied to the population of edges (not nodes) can be used to scale up such ERM-based predictive methods to very large graphs by means of a detailed rate bound analysis, further supported by empirical results. A first possible extension of this work would naturally consist in considering more general sampling designs, such as Poisson sampling (which generalizes Bernoulli sampling) used in graph sparsification (cf. Spielman, 2005), and investigating the properties of minimizers of Horvitz-Thompson versions of the reconstruction risk (see Horvitz and Thompson, 1951). 
Another challenging line of future research is to extend the results of this paper to more complex unconditional graph structures in order to account for properties shared by some real-world graphs (e.g., graphs with a power law degree distribution).\n\nAcknowledgments This work was partially supported by the chair “Machine Learning for Big Data” of Télécom ParisTech and by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020.\n\nReferences\n\nAgarwal, S. (2014). Surrogate regret bounds for bipartite ranking via strongly proper losses. JMLR, 15:1653–1674.\n\nAntos, A., Györfi, L., and György, A. (2005). Individual convergence rates in empirical vector quantizer design. IEEE Transactions on Information Theory, 51(11):4013–4023.\n\nArcones, M. and Giné, E. (1994). U-processes indexed by Vapnik-Chervonenkis classes of functions with applications to asymptotics and bootstrap of U-statistics with estimated parameters. Stochastic Processes and their Applications, 52:17–38.\n\nBellet, A., Habrard, A., and Sebban, M. (2015). Metric Learning. Morgan & Claypool Publishers.\n\nBiau, G. and Bleakley, K. (2006). Statistical Inference on Graphs. Statistics & Decisions, 24:209–232.\n\nBoucheron, S., Bousquet, O., Lugosi, G., and Massart, P. (2005). Moment inequalities for functions of independent random variables. Ann. Stat., 33(2):514–560.\n\nClémençon, S. (2014). A statistical view of clustering performance through the theory of U-processes. Journal of Multivariate Analysis, 124:42–56.\n\nClémençon, S. and Robbiano, S. (2011). Minimax learning rates for bipartite ranking and plug-in rules. In ICML.\n\nClémençon, S. and Vayatis, N. (2010). Overlaying classifiers: a practical approach to optimal scoring. Constructive Approximation, 32(3):619–648.\n\nClémençon, S., Lugosi, G., and Vayatis, N. 
(2008). Ranking and Empirical Minimization of U-statistics. Ann. Stat., 36(2):844–874.\n\nCukierski, W., Hamner, B., and Yang, B. (2011). Graph-based features for supervised link prediction. In IJCNN.\n\nDe la Peña, V. and Giné, E. (1999). Decoupling: from dependence to independence. Springer.\n\nHoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 19:293–325.\n\nHorvitz, D. and Thompson, D. (1951). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47:663–685.\n\nJansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N., Chung, S., Emili, A., Snyder, M., Greenblatt, J., and Gerstein, M. (2003). A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302(5644):449–453.\n\nJanson, S. and Nowicki, K. (1991). The asymptotic distributions of generalized U-statistics with applications to random graphs. Probability Theory and Related Fields, 90:341–375.\n\nKanehisa, M. (2001). Prediction of higher order functional networks from genomic data. Pharmacogenomics, 2(4):373–385.\n\nLee, A. J. (1990). U-statistics: Theory and practice. Marcel Dekker, Inc., New York.\n\nLiben-Nowell, D. and Kleinberg, J. (2003). The link prediction problem for social networks. In CIKM.\n\nLichtenwalter, R., Lussier, J., and Chawla, N. (2010). New perspectives and methods in link prediction. In KDD.\n\nMammen, E. and Tsybakov, A. (1999). Smooth discrimination analysis. Ann. Stat., 27(6):1808–1829.\n\nMassart, P. and Nédélec, E. (2006). Risk bounds for statistical learning. Ann. Stat., 34(5).\n\nMattick, J. and Gagen, M. (2005). Accelerating networks. Science, 307(5711):856–858.\n\nRigollet, P. and Vert, R. (2009). Fast rates for plug-in estimators of density level sets. Bernoulli, 14(4):1154–1178.\n\nShaw, B., Huang, B., and Jebara, T. (2011). 
Learning a Distance Metric from a Network. In NIPS.\n\nSpielman, D. (2005). Fast Randomized Algorithms for Partitioning, Sparsification, and Solving Linear Systems. Lecture notes from IPCO Summer School 2005.\n\nTsybakov, A. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Stat., 32(1):135–166.\n\nVert, J.-P., Qiu, J., and Noble, W. S. (2007). A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics, 8(10).\n\nVert, J.-P. and Yamanishi, Y. (2004). Supervised graph inference. In NIPS, pages 1433–1440.\n", "award": [], "sourceid": 390, "authors": [{"given_name": "Guillaume", "family_name": "Papa", "institution": "Télécom ParisTech"}, {"given_name": "Aurélien", "family_name": "Bellet", "institution": "INRIA"}, {"given_name": "Stephan", "family_name": "Clémençon", "institution": "Telecom ParisTech"}]}