{"title": "Multi-Scale Spectral Decomposition of Massive Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 2798, "page_last": 2806, "abstract": "Computing the $k$ dominant eigenvalues and eigenvectors of massive graphs is a key operation in numerous machine learning applications; however, popular solvers suffer from slow convergence, especially when $k$ is reasonably large. In this paper, we propose and analyze a novel multi-scale spectral decomposition method (MSEIGS), which first clusters the graph into smaller clusters whose spectral decomposition can be computed efficiently and independently. We show theoretically as well as empirically that the union of all cluster's subspaces has significant overlap with the dominant subspace of the original graph, provided that the graph is clustered appropriately. Thus, eigenvectors of the clusters serve as good initializations to a block Lanczos algorithm that is used to compute spectral decomposition of the original graph. We further use hierarchical clustering to speed up the computation and adopt a fast early termination strategy to compute quality approximations. Our method outperforms widely used solvers in terms of convergence speed and approximation quality. Furthermore, our method is naturally parallelizable and exhibits significant speedups in shared-memory parallel settings. For example, on a graph with more than 82 million nodes and 3.6 billion edges, MSEIGS takes less than 3 hours on a single-core machine while Randomized SVD takes more than 6 hours, to obtain a similar approximation of the top-50 eigenvectors. Using 16 cores, we can reduce this time to less than 40 minutes.", "full_text": "Multi-Scale Spectral Decomposition of Massive\n\nGraphs\n\nSi Si\u21e4\n\nDepartment of Computer Science\n\nUniversity of Texas at Austin\n\nssi@cs.utexas.edu\n\nDonghyuk Shin\u21e4\n\nDepartment of Computer Science\n\nUniversity of Texas at Austin\ndshin@cs.utexas.edu\n\nInderjit S. 
Dhillon\n\nDepartment of Computer Science\n\nUniversity of Texas at Austin\n\ninderjit@cs.utexas.edu\n\nBeresford N. Parlett\n\nDepartment of Mathematics\n\nUniversity of California, Berkeley\n\nparlett@math.berkeley.edu\n\nAbstract\n\nComputing the k dominant eigenvalues and eigenvectors of massive graphs is\na key operation in numerous machine learning applications; however, popular\nsolvers suffer from slow convergence, especially when k is reasonably large.\nIn this paper, we propose and analyze a novel multi-scale spectral decomposition\nmethod (MSEIGS), which first clusters the graph into smaller clusters whose\nspectral decomposition can be computed efficiently and independently. We show\ntheoretically as well as empirically that the union of all clusters\u2019 subspaces has\nsignificant overlap with the dominant subspace of the original graph, provided\nthat the graph is clustered appropriately. Thus, eigenvectors of the clusters serve\nas good initializations to a block Lanczos algorithm that is used to compute spectral\ndecomposition of the original graph. We further use hierarchical clustering to\nspeed up the computation and adopt a fast early termination strategy to compute\nquality approximations. Our method outperforms widely used solvers in terms of\nconvergence speed and approximation quality. Furthermore, our method is naturally\nparallelizable and exhibits significant speedups in shared-memory parallel\nsettings. For example, on a graph with more than 82 million nodes and 3.6 billion\nedges, MSEIGS takes less than 3 hours on a single-core machine while Randomized\nSVD takes more than 6 hours, to obtain a similar approximation of the top-50\neigenvectors. Using 16 cores, we can reduce this time to less than 40 minutes.\n\n1 Introduction\nSpectral decomposition of large-scale graphs is one of the most informative and fundamental matrix\napproximations. 
Specifically, we are interested in the case where the top-k eigenvalues and\neigenvectors are needed, where k is in the hundreds. This computation is needed in various machine\nlearning applications such as semi-supervised classification, link prediction and recommender\nsystems. The data for these applications is typically given as sparse graphs containing information\nabout dyadic relationships between entities, e.g., friendship between pairs of users. Supporting the\ncurrent big data trend, the scale of these graphs is massive and continues to grow rapidly. Moreover,\nthey are also very sparse and often exhibit clustering structure, which should be exploited. However,\npopular solvers, such as subspace iteration, randomized SVD [7] and the classical Lanczos\nalgorithm [21], are often too slow for very big graphs.\n\n*Equal contribution to the work.\n\nA key insight is that the graph often exhibits a clustering structure and the union of all clusters\u2019 subspaces\nturns out to have significant overlap with the dominant subspace of the original matrix, which\nis shown both theoretically and empirically. Based on this observation, we propose a novel divide-and-conquer\napproach to compute the spectral decomposition of large and sparse matrices, called\nMSEIGS, which exploits the clustering structure of the graph and achieves faster convergence than\nstate-of-the-art solvers. In the divide step, MSEIGS employs graph clustering to divide the graph\ninto several clusters that are manageable in size and allow fast computation of the eigendecomposition\nby standard methods. Then, in the conquer step, eigenvectors of the clusters are combined to\ninitialize the eigendecomposition of the entire matrix via block Lanczos. As shown in our analysis\nand experiments, MSEIGS converges faster than other methods that do not consider the clustering\nstructure of the graph. 
To speedup the computation, we further divide the subproblems into smaller\nones and construct a hierarchical clustering structure; our framework can then be applied recur-\nsively as the algorithm moves from lower levels to upper levels in the hierarchy tree. Moreover, our\nproposed algorithm is naturally parallelizable as the main steps can be carried out independently for\neach cluster. On the SDWeb dataset with more than 82 million nodes and 3.6 billion edges, MSEIGS\ntakes only about 2.7 hours on a single-core machine while Matlab\u2019s eigs function takes about 4.2\nhours and randomized SVD takes more than 6 hours. Using 16 cores, we can cut this time to less\nthan 40 minutes showing that our algorithm obtains good speedups in shared-memory settings.\nWhile our proposed algorithm is capable of computing highly accurate eigenpairs, it can also obtain\na much faster approximate eigendecomposition with modest precision by prematurely terminating\nthe algorithm at a certain level in the hierarchy tree. This early termination strategy is particularly\nuseful as it is suf\ufb01cient in many applications to use an approximate eigendecomposition. We apply\nMSEIGS and its early termination strategy to two real-world machine learning applications: label\npropagation for semi-supervised classi\ufb01cation and inductive matrix completion for recommender\nsystems. We show that both our methods are much faster than other methods while still attaining\ngood performance. For example, to perform semi-supervised learning using label propagation on the\nAloi dataset with 1,000 classes, MSEIGS takes around 800 seconds to obtain an accuracy of 60.03%;\nMSEIGS with early termination takes less than 200 seconds achieving an accuracy of 58.98%, which\nis more than 10 times faster than a conjugate gradient based semi-supervised method [10].\nThe rest of the paper is organized as follows. In Section 2, we review some closely related work. 
We\npresent MSEIGS in Section 3 by describing the single-level case and extending it to the multi-level\nsetting. Experimental results are shown in Section 4 followed by conclusions in Section 5.\n2 Related Work\nThe spectral decomposition of large and sparse graphs is a fundamental tool that lies at the core of\nnumerous algorithms in varied machine learning tasks. Practical examples include spectral cluster-\ning [19], link prediction in social networks [24], recommender systems with side-information [18],\ndensest k-subgraph problem [20] and graph matching [22]. Most of the existing eigensolvers for\nsparse matrices employ the single-vector version of iterative algorithms, such as the power method\nand Lanczos algorithm [21]. The Lanczos algorithm iteratively constructs the basis of the Krylov\nsubspace to obtain the eigendecomposition, which has been extensively investigated and applied in\npopular eigensolvers, e.g., eigs in Matlab (ARPACK) [14] and PROPACK [12]. However, it is well\nknown that single-vector iterative algorithms can only compute the leading eigenvalue/eigenvector\n(e.g., power method) or have dif\ufb01culty in computing multiplicities/clusters of eigenvalues (e.g.,\nLanczos). In contrast, the block version of iterative algorithms using multiple starting vectors, such\nas the randomized SVD [7] and block Lanczos [21], can avoid such problems and utilize ef\ufb01cient\nmatrix-matrix operations (e.g., Level 3 BLAS) with better caching behavior.\nWhile these are the most commonly used methods to compute the spectral decomposition of a\nsparse matrix, they do not scale well to large problems, especially when hundreds of eigenval-\nues/eigenvectors are needed. Furthermore, none of them consider the clustering structure of the\nsparse graph. One exception is the classical divide and conquer algorithm by [3], which partitions\nthe tridiagonal eigenvalue problem into several smaller problems that are solved separately. 
Then it\ncombines the solutions of these smaller problems and uses rank-one modification to solve the original\nproblem. However, this method can only be used for tridiagonal matrices and it is unclear how\nto extend it to general sparse matrices.\n3 Multi-Scale Spectral Decomposition\nSuppose we are given a graph G = (V, E, A), which consists of |V| vertices and |E| edges such\nthat an edge between any two vertices i and j represents their similarity $w_{ij}$. The corresponding\nadjacency matrix A is an $n \\times n$ sparse matrix with (i, j) entry equal to $w_{ij}$ if there is an edge between\ni and j and 0 otherwise. We consider the case where G is an undirected graph, i.e., A is symmetric.\nOur goal is to efficiently compute the top-k eigenvalues $\\lambda_1, \\cdots, \\lambda_k$ ($|\\lambda_1| \\ge \\cdots \\ge |\\lambda_k|$) and their\ncorresponding eigenvectors $u_1, u_2, \\cdots, u_k$ of A, which form the best rank-k approximation of A.\nThat is, $A \\approx U_k \\Sigma_k U_k^T$, where $\\Sigma_k$ is a $k \\times k$ diagonal matrix with the k largest eigenvalues of A and\n$U_k = [u_1, u_2, \\cdots, u_k]$ is an $n \\times k$ orthonormal matrix. In this paper, we propose a novel multi-scale\nspectral decomposition method (MSEIGS), which embodies the clustering structure of A to achieve\nfaster convergence. We begin by first describing the single-level version of MSEIGS.\n3.1 Single-level division\nOur proposed multi-scale spectral decomposition algorithm, which can be used as an alternative\nto Matlab\u2019s eigs function, is based on the divide-and-conquer principle to utilize the clustering\nstructure of the graph. 
It consists of two main phases: in the divide step, we divide the problem into\nseveral smaller subproblems such that each subproblem can be solved efficiently and independently;\nin the conquer step, we use the solutions from each subproblem as a good initialization for the\noriginal problem and achieve faster convergence compared to existing solvers which typically start\nfrom random initialization.\nDivide Step: We first use clustering to partition the sparse matrix A into $c^2$ submatrices as\n\n$$A = D + \\Delta = \\begin{bmatrix} A_{11} & \\cdots & A_{1c} \\\\ \\vdots & \\ddots & \\vdots \\\\ A_{c1} & \\cdots & A_{cc} \\end{bmatrix}, \\quad D = \\begin{bmatrix} A_{11} & \\cdots & 0 \\\\ \\vdots & \\ddots & \\vdots \\\\ 0 & \\cdots & A_{cc} \\end{bmatrix}, \\quad \\Delta = \\begin{bmatrix} 0 & \\cdots & A_{1c} \\\\ \\vdots & \\ddots & \\vdots \\\\ A_{c1} & \\cdots & 0 \\end{bmatrix}, \\qquad (1)$$\n\nwhere each diagonal block $A_{ii}$ is an $m_i \\times m_i$ matrix, D is a block diagonal matrix and $\\Delta$ is the matrix\nconsisting of all off-diagonal blocks of A. We then compute the dominant r ($r \\le k$) eigenpairs of\neach diagonal block $A_{ii}$ independently, such that $A_{ii} \\approx U_r^{(i)} \\Sigma_r^{(i)} (U_r^{(i)})^T$, where $\\Sigma_r^{(i)}$ is an $r \\times r$\ndiagonal matrix with the r dominant eigenvalues of $A_{ii}$ and $U_r^{(i)} = [u_1^{(i)}, u_2^{(i)}, \\cdots, u_r^{(i)}]$ is an\northonormal matrix with the corresponding eigenvectors.\nAfter obtaining the r dominant eigenpairs of each $A_{ii}$, we can sort all cr eigenvalues from the c\ndiagonal blocks and select the k largest eigenvalues (in terms of magnitude) and the corresponding\neigenvectors. 
More specifically, suppose that we select the top-$k_i$ eigenpairs of $A_{ii}$ and construct an\n$m_i \\times k_i$ orthonormal matrix $U_{k_i}^{(i)} = [u_1^{(i)}, u_2^{(i)}, \\cdots, u_{k_i}^{(i)}]$; then we concatenate all $U_{k_i}^{(i)}$\u2019s and form\nan $n \\times k$ orthonormal matrix $\\Omega$ as\n\n$$\\Omega = U_{k_1}^{(1)} \\oplus U_{k_2}^{(2)} \\oplus \\cdots \\oplus U_{k_c}^{(c)}, \\qquad (2)$$\n\nwhere $\\sum_i k_i = k$ and $\\oplus$ denotes direct sum, which can be viewed as the sum of the subspaces\nspanned by $U_{k_i}^{(i)}$. Note that $\\Omega$ is exactly the k dominant eigenvectors of D. After obtaining $\\Omega$, we\ncan use it as a starting subspace for the eigendecomposition of A in the conquer step. We next\nshow that if we use graph clustering to generate the partition of A in (1), then the space spanned\nby $\\Omega$ is close to that of $U_k$, which makes the conquer step more efficient. We use principal angles\n[15] to measure the closeness of two subspaces. Since $\\Omega$ and $U_k$ are orthonormal matrices, the j-th\nprincipal angle between subspaces spanned by $\\Omega$ and $U_k$ is $\\theta_j(\\Omega, U_k) = \\arccos(\\sigma_j)$, where $\\sigma_j$,\n$j = 1, 2, \\cdots, k$, are the singular values of $\\Omega^T U_k$ in descending order. In Theorem 3.1, we show that\n$\\Theta(\\Omega, U_k) = \\mathrm{diag}(\\theta_1(\\Omega, U_k), \\cdots, \\theta_k(\\Omega, U_k))$ is related to the matrix $\\Delta$.\nTheorem 3.1. Suppose $\\lambda_1(D), \\cdots, \\lambda_n(D)$ (in descending order of magnitude) are the eigenvalues\nof D. 
Assume there is an interval $[\\alpha, \\beta]$ and $\\eta \\ge 0$ such that $\\lambda_{k+1}(D), \\cdots, \\lambda_n(D)$ lie entirely in\n$[\\alpha, \\beta]$ and the k dominant eigenvalues of A, $\\lambda_1, \\cdots, \\lambda_k$, lie entirely outside of $(\\alpha - \\eta, \\beta + \\eta)$, then\n\n$$\\|\\sin(\\Theta(\\Omega, U_k))\\|_2 \\le \\frac{\\|\\Delta\\|_2}{\\eta}, \\qquad \\|\\sin(\\Theta(\\Omega, U_k))\\|_F \\le \\frac{\\sqrt{k}\\,\\|\\Delta\\|_F}{\\eta}.$$\n\nThe proof is given in Appendix 6.2. As we can see, $\\Theta(\\Omega, U_k)$ is influenced by $\\Delta$, thus we need to\nfind a partition such that $\\|\\Delta\\|_F$ is small in order for $\\|\\sin(\\Theta(\\Omega, U_k))\\|_F$ to be small. Assuming that\nthe graph has clustering structure, we apply graph clustering algorithms to partition A to generate\nsmall $\\|\\Delta\\|_F$. In general, the goal of graph clustering is to find clusters such that there are many edges\nwithin clusters and only a few between clusters, i.e., make $\\|\\Delta\\|_F$ small. Various graph clustering\nsoftware can be used to generate the partitions, e.g., Graclus [5], Metis [11], Nerstrand [13] and\nGEM [27]. Figure 1(a) shows a comparison of the cosine values of $\\Theta(\\Omega, U_k)$ with different $\\Omega$ for\nthe CondMat dataset, a collaboration network with 21,362 nodes and 182,628 edges. We compute\n$\\Omega$ using random partitioning and graph clustering, where we cluster the graph into 4 clusters using\nMetis and more than 85% of edges appear within clusters. In Figure 1(a), more than 80% of principal\nangles have cosine values that are greater than 0.9 with graph clustering, whereas this ratio drops to\n5% with random partitioning. This illustrates (1) the effectiveness of graph clustering in reducing\n$\\Theta(\\Omega, U_k)$; (2) that the subspace spanned by $\\Omega$ from graph clustering is close to that of $U_k$.\n\n[Figure 1 plots omitted. 
Panels (a) and (b): cosine of principal angles vs. rank k (0 to 100); panel (c): $|\\bar{\\lambda}_i| - |\\lambda_i|$ vs. rank k. Legend of (a): Random Partition, Graph Clustering. Legend of (b) and (c): RSVD, BlkLan, MSEIGS with single level, MSEIGS.]\n\nFigure 1: (a): $\\cos(\\Theta(\\Omega, U_k))$ with graph clustering and random partition. (b) and (c): comparison\nof RSVD, BlkLan, MSEIGS with single level and MSEIGS on the CondMat dataset with the same\nnumber of iterations (5 steps). 
(b) shows $\\cos(\\Theta(\\bar{U}_k, U_k))$, where $\\bar{U}_k$ consists of the computed top-k\neigenvectors and (c) shows the difference between the computed eigenvalues and the exact ones.\n\nConquer Step: After obtaining $\\Omega$ from the clusters (diagonal blocks) of A, we use $\\Omega$ to initialize\nthe spectral decomposition solver for A. In principle, we can use different solvers such as\nrandomized SVD (RSVD) and block Lanczos (BlkLan). In our divide-and-conquer framework,\nwe focus on using block Lanczos due to its superior performance as compared to RSVD. The\nbasic idea of block Lanczos is to use an $n \\times b$ initial matrix $V_0$ to construct the Krylov subspace\nof A. After $j - 1$ steps of block Lanczos, the j-th Krylov subspace of A is given as\n$\\mathcal{K}_j(A, V_0) = \\mathrm{span}\\{V_0, AV_0, \\cdots, A^{j-1}V_0\\}$. As the block Lanczos algorithm proceeds, an orthonormal\nbasis $\\hat{Q}_j$ for $\\mathcal{K}_j(A, V_0)$ is generated as well as a block tridiagonal matrix $\\hat{T}_j$, which is a projection\nof A onto $\\mathcal{K}_j(A, V_0)$. Then the Rayleigh-Ritz procedure is applied to compute the approximate\neigenpairs of A. More details about block Lanczos are given in Appendix 6.1. In contrast, RSVD,\nwhich is equivalent to subspace iteration with a Gaussian random matrix, constructs a basis for\n$A^{j-1}V_0$ and then restricts A to this subspace to obtain the decomposition. As a consequence, block\nLanczos can achieve better performance than RSVD with the same number of iterations.\nIn Figure 1(b), we compare block Lanczos with RSVD in terms of $\\cos(\\Theta(\\bar{U}_k, U_k))$ for the CondMat\ndataset, where $\\bar{U}_k$ consists of the approximate k dominant eigenvectors. Similarly in Figure 1(c),\nwe show that the eigenvalues computed by block Lanczos are closer to the true eigenvalues.\nIn other words, block Lanczos needs fewer iterations than RSVD to achieve similar accuracy. 
For the\nCondMat dataset, block Lanczos takes 7 iterations for the mean of $\\cos(\\Theta(\\bar{U}_k, U_k))$ to reach 0.99,\nwhile RSVD takes more than 10 iterations to obtain similar performance. It is worth noting that\nthere are a number of improved versions of block Lanczos [1, 6], and we show in the experiments\nthat our method achieves superior performance even with the simple version of block Lanczos.\nThe single-level version of our proposed MSEIGS algorithm is given in Algorithm 1. Some remarks\non Algorithm 1 are in order: (1) $\\|A_{ii}\\|_F$ is likely to be different among clusters and larger clusters\ntend to have more influence over the spectrum of the entire matrix. Thus, we select the rank r\nfor each cluster i based on the ratio $\\|A_{ii}\\|_F / \\sum_i \\|A_{ii}\\|_F$; (2) We use a small number of additional\neigenvectors in step 4 (similar to RSVD) to improve the effectiveness of block Lanczos; (3) It is\ntime consuming to test convergence of the Ritz pairs in block Lanczos (steps 7, 8 of Algorithm 3 in\nthe Appendix), thus we test convergence after running a few iterations of block Lanczos; (4) Better\nquality of clustering, i.e., smaller $\\|\\Delta\\|_F$, implies higher accuracy of MSEIGS. We give performance\nresults of MSEIGS with varying cluster quality in Appendix 6.4. From Figures 1(b) and 1(c), we\ncan observe that the single-level MSEIGS performs much better than block Lanczos and RSVD.\nWe can now analyze the approximation quality of Algorithm 1 by first examining the difference\nbetween the eigenvalues computed by Algorithm 1 and the exact eigenvalues of A.\nTheorem 3.2. Let $\\bar{\\lambda}_1 \\ge \\cdots \\ge \\bar{\\lambda}_{kq}$ be the approximate eigenvalues obtained after q steps of block\nLanczos in Algorithm 1. 
According to Kaniel-Paige Convergence Theory [23], we have\n\n$$\\lambda_i \\le \\bar{\\lambda}_i \\le \\lambda_i + \\frac{(\\lambda_1 - \\lambda_i) \\tan^2(\\theta)}{T_{q-1}^2\\left(\\frac{1+\\nu_i}{1-\\nu_i}\\right)}.$$\n\nUsing Theorem 3.1, we further have\n\n$$\\lambda_i \\le \\bar{\\lambda}_i \\le \\lambda_i + \\frac{(\\lambda_1 - \\lambda_i) \\|\\Delta\\|_2^2}{T_{q-1}^2\\left(\\frac{1+\\nu_i}{1-\\nu_i}\\right) (\\eta^2 - \\|\\Delta\\|_2^2)},$$\n\nwhere $T_m(x)$ is the m-th Chebyshev polynomial of the first kind, $\\theta$ is the largest principal angle of\n$\\Theta(\\Omega, U_k)$ and $\\nu_i = \\frac{\\lambda_i - \\lambda_{k+1}}{\\lambda_i - \\lambda_1}$.\n\nNext we show the bound of Algorithm 1 in terms of rank-k approximation error.\nTheorem 3.3. Given an $n \\times n$ symmetric matrix A, suppose by Algorithm 1, we can approximate its k\ndominant eigenpairs and form a rank-k approximation, i.e., $A \\approx \\bar{U}_k \\bar{\\Sigma}_k \\bar{V}_k^T$ with $\\bar{U}_k = [\\bar{u}_1, \\cdots, \\bar{u}_k]$\nand $\\bar{\\Sigma}_k = \\mathrm{diag}(\\bar{\\lambda}_1, \\cdots, \\bar{\\lambda}_k)$. The approximation error can be bounded as\n\n$$\\|A - \\bar{U}_k \\bar{\\Sigma}_k \\bar{V}_k^T\\|_2 \\le 2 \\|A - A_k\\|_2 \\left(1 + \\frac{\\sin^2(\\theta)}{1 - \\sin^2(\\theta)}\\right)^{\\frac{1}{2(q+1)}},$$\n\nwhere q is the number of iterations for block Lanczos and $A_k$ is the best rank-k approximation of A.\nUsing Theorem 3.1, we further have\n\n$$\\|A - \\bar{U}_k \\bar{\\Sigma}_k \\bar{V}_k^T\\|_2 \\le 2 \\|A - A_k\\|_2 \\left(\\frac{\\|\\Delta\\|_2^2}{\\eta^2 - \\|\\Delta\\|_2^2}\\right)^{\\frac{1}{2(q+1)}}.$$\n\nThe proof is given in Appendix 6.3. The above two theorems show that a good initialization is\nimportant for block Lanczos. Using Algorithm 1, we will expect a small $\\|\\Delta\\|_2$ and $\\theta$ (as shown in\nFigure 1(a)) because it embodies the clustering structure of A and constructs a good initialization.\nTherefore, our algorithm can have faster convergence compared to block Lanczos with random\ninitialization. 
The time complexity for Algorithm 1 is $O(|E|k + nk^2)$.\n\nAlgorithm 1: MSEIGS with single level\nInput: $n \\times n$ symmetric sparse matrix A, target rank k and number of clusters c.\nOutput: The approximate dominant k eigenpairs $(\\bar{\\lambda}_i, \\bar{u}_i)$, $i = 1, \\cdots, k$ of A.\n1 Generate c clusters $A_{11}, \\cdots, A_{cc}$ by performing graph clustering on A (e.g., Metis or Graclus).\n2 Compute top-r eigenpairs $(\\lambda_j^{(i)}, u_j^{(i)})$, $j = 1, \\cdots, r$, of $A_{ii}$ using standard eigensolvers.\n3 Select the top-k eigenvalues and their eigenvectors from the c clusters to obtain $U_{k_1}^{(1)}, \\cdots, U_{k_c}^{(c)}$.\n4 Form block diagonal matrix $\\Omega = U_{k_1}^{(1)} \\oplus \\cdots \\oplus U_{k_c}^{(c)}$ ($\\sum_i k_i = k$).\n5 Apply block Lanczos (Algorithm 3 in Appendix 6.1) with initialization $Q_1 = \\Omega$.\n\n3.2 Multi-scale spectral decomposition\nIn this section, we describe our multi-scale spectral decomposition algorithm (MSEIGS). One challenge\nfor Algorithm 1 is the trade-off in choosing the number of clusters c. If c is large, although\ncomputing the top-r eigenpairs of $A_{ii}$ can be very efficient, it is likely to increase $\\|\\Delta\\|$, which in\nturn will result in slower convergence of Algorithm 1. In contrast, larger clusters will emerge when\nc is small, increasing the time to compute the top-r eigendecomposition for each $A_{ii}$. However,\n$\\|\\Delta\\|$ is likely to decrease in this case, resulting in faster convergence of Algorithm 1. To address\nthis issue, we can further partition $A_{ii}$ into c smaller clusters and construct a hierarchy until each\ncluster is small enough to be solved efficiently. After obtaining this hierarchical clustering, we can\nrecursively apply Algorithm 1 as it moves from lower levels to upper levels in the hierarchy tree.\nBy constructing a hierarchy, we can pick a small c to obtain $\\Omega$ with small $\\Theta(\\Omega, U_k)$ (we set c = 4 in\nthe experiments). 
Our MSEIGS algorithm with multiple levels is described in Algorithm 2. Figures\n1(b) and 1(c) show a comparison between MSEIGS and MSEIGS with a single level. For the single\nlevel case, we use the top-r eigenpairs of the c child clusters computed up to machine precision.\nWe can see that MSEIGS performs similarly well compared to the single level case showing the\neffectiveness of our multi-scale approach. To build the hierarchy, we can adopt either top-down or\nbottom-up approaches using existing clustering algorithms. The overhead of clustering is very low,\nusually less than 10% of the total time. For example, MSEIGS takes 1,825 seconds, where clustering\ntakes only 80 seconds, for the FriendsterSub dataset (in Table 1) with 10M nodes and 83M edges.\nEarly Termination of MSEIGS: Computing the exact spectral decomposition of A can be quite\ntime consuming. Furthermore, highly accurate eigenvalues/eigenvectors are not essential for many\napplications. Thus, we propose a fast early termination strategy (MSEIGS-Early) to approximate\nthe eigenpairs of A by terminating MSEIGS at a certain level of the hierarchy tree. Suppose that\nwe terminate MSEIGS at the $\\ell$-th level with $c^\\ell$ clusters. From the top-r eigenpairs of each cluster,\nwe can select the top-k eigenvalues and the corresponding eigenvectors from all $c^\\ell$ clusters as an\napproximate eigendecomposition of A. As shown in Sections 4.2 and 4.3, we can significantly\nreduce the computation time while attaining comparable performance using the early termination\nstrategy for two applications: label propagation and inductive matrix completion.\nMulti-core Parallelization: An important advantage of MSEIGS is that it can be easily parallelized,\nwhich is essential for large-scale eigendecomposition. 
There are two main aspects of parallelism\nin MSEIGS: (1) The eigendecomposition of clusters in the same level of the hierarchy tree can\nbe computed independently; (2) Block Lanczos mainly involves matrix-matrix operations (Level 3\nBLAS), thus efficient parallel linear algebra libraries (e.g., Intel MKL) can be used. We show in\nSection 4 that MSEIGS can achieve significant speedup in shared-memory multi-core settings.\n\nAlgorithm 2: Multi-scale spectral decomposition (MSEIGS)\nInput: $n \\times n$ symmetric sparse matrix A, target rank k, the number of levels $\\ell$ of the\nhierarchy tree and the number of clusters c at each node.\nOutput: The approximate dominant k eigenpairs $(\\bar{\\lambda}_i, \\bar{u}_i)$, $i = 1, \\cdots, k$ of A.\n1 Perform hierarchical clustering on A (e.g., top-down or bottom-up).\n2 Compute the top-r eigenpairs of each leaf node $A_{ii}^{(\\ell)}$ for $i = 1, \\cdots, c^\\ell$, using block Lanczos.\n3 for $i = \\ell - 1, \\cdots, 1$ do\n4     for $j = 1, \\cdots, c^i$ do\n5         Form block diagonal matrix $\\Omega_j^{(i)}$ by (2).\n6         Compute the eigendecomposition of $A_{jj}^{(i)}$ by Algorithm 1 with $\\Omega_j^{(i)}$ as the initial block.\n7     end\n8 end\n\n4 Experimental Results\nIn this section, we empirically demonstrate the benefits of our proposed MSEIGS method. We\ncompare MSEIGS with other popular eigensolvers including Matlab\u2019s eigs function (EIGS) [14],\nPROPACK [12], randomized SVD (RSVD) [7] and block Lanczos with random initialization (BlkLan)\n[21] on three different tasks: approximating the eigendecomposition, label propagation and\ninductive matrix completion. The experimental settings can be found in Appendix 6.5.\n4.1 Approximation results\nFirst, we show in Figure 2 the performance of MSEIGS for approximating the top-k eigenvectors\nfor different types of real-world graphs including web graphs, social networks and road networks\n[17, 28]. 
Summary of the datasets is given in Table 1, where the largest graph contains more than 3.6\nbillion edges. We use the average of the cosine of principal angles $\\cos(\\Theta(\\bar{U}_k, U_k))$ as the evaluation\nmetric, where $\\bar{U}_k$ consists of the computed top-k eigenvectors and $U_k$ represents the \u201ctrue\u201d top-k\neigenvectors computed up to machine precision using Matlab\u2019s eigs function. Larger values of the\naverage $\\cos(\\Theta(\\bar{U}_k, U_k))$ imply smaller principal angles between the subspace spanned by $U_k$ and\nthat of $\\bar{U}_k$, i.e., better approximation. As shown in Figure 2, with the same amount of time, the\neigenvectors computed by MSEIGS consistently yield better principal angles than other methods.\n\nTable 1: Datasets of increasing sizes.\ndataset | CondMat | Amazon | RoadCA | LiveJournal | FriendsterSub | SDWeb\n# of nodes | 21,263 | 334,843 | 1,965,206 | 3,997,962 | 10.00M | 82.29M\n# of nonzeros | 182,628 | 1,851,744 | 5,533,214 | 69,362,378 | 83.67M | 3.68B\nrank k | 100 | 100 | 200 | 500 | 100 | 50\n\nSince MSEIGS divides the problem into independent subproblems, it is naturally parallelizable. In\nFigure 3, we compare MSEIGS with other methods under the shared-memory multi-core setting\nfor the LiveJournal and SDWeb datasets. We vary the number of cores from 1 to 16 and show the\ntime to compute similar approximation of the eigenpairs. As shown in Figure 3, MSEIGS achieves\nalmost linear speedup and outperforms other methods. For example, MSEIGS is the fastest method\nachieving a speedup of 10 using 16 cores for the LiveJournal dataset.\n4.2 Label propagation for semi-supervised learning and multi-label learning\nOne application for MSEIGS is to speed up the label propagation algorithm, which is widely used\nfor graph-based semi-supervised learning [29] and multi-label learning [26]. 
The basic idea of label propagation is to propagate the known labels over an affinity graph (represented as a weighted matrix $W$) constructed using both labeled and unlabeled examples. Mathematically, at the $(t+1)$-th iteration, $F(t+1) = \\alpha S F(t) + (1-\\alpha) Y$, where $S$ is the normalized affinity matrix of $W$; $Y$ is the $n \\times l$ initial label matrix; $F$ is the predicted label matrix; $l$ is the number of labels; $n$ is the total number of samples; $0 \\le \\alpha < 1$. The optimal solution is $F^* = (1-\\alpha)(I - \\alpha S)^{-1} Y$. There are two standard approaches to approximate $F^*$: one is to iterate over $F(t)$ until convergence (truncated method); another is to solve $F^*$ as a system of linear equations by using an iterative solver like conjugate gradient (CG) [10]. However, both methods suffer from slow convergence, especially when the number of labels, i.e., columns of $Y$, grows dramatically. As an alternative, we can apply MSEIGS to generate the top-$k$ eigendecomposition of $S$ such that $S \\approx \\bar{U}_k \\bar{\\Sigma}_k \\bar{U}_k^T$ and approximate $F^*$ as $F^* \\approx \\bar{F} = (1-\\alpha) \\bar{U}_k (I - \\alpha \\bar{\\Sigma}_k)^{-1} \\bar{U}_k^T Y$. Obviously, $\\bar{F}$ is robust to large numbers of labels. In Table 2, we compare MSEIGS and MSEIGS-Early with other methods for label propagation on two public datasets: Aloi and Delicious, where Delicious is a multi-label dataset containing 16,105 samples and 983 labels, and Aloi is a semi-supervised learning dataset containing 108,000 samples with 1,000 classes. More details of the datasets and parameters are given in Appendix 6.6. As we can see in Table 2, MSEIGS and MSEIGS-Early significantly outperform other methods. To achieve similar accuracy, MSEIGS takes much less time. More interestingly, MSEIGS-Early is faster than MSEIGS and almost 10 times faster than other methods with very little degradation of accuracy showing the efficiency of our early-termination strategy.\n\n[Figure 2 (plots omitted): panels (a) CondMat, (b) Amazon, (c) FriendsterSub, (d) RoadCA, (e) LiveJournal, (f) SDWeb; x-axis: Time (sec); y-axis: Avg. cosine of principal angles; methods: EIGS, PROPACK, RSVD, BlkLan, MSEIGS.]\nFigure 2: The $k$ dominant eigenvectors approximation results showing time vs. average cosine of principal angles. For a given time, MSEIGS consistently yields better results than other methods.\n\n[Figure 3 (plots omitted): panels (a) LiveJournal, (b) SDWeb; x-axis: Number of cores; y-axis: Time (sec); methods: EIGS, RSVD, BlkLan, MSEIGS.]\nFigure 3: Shared-memory multi-core results showing number of cores vs. time to compute similar approximation. MSEIGS achieves almost linear speedup and outperforms other methods.\n\n4.3 Inductive matrix completion for recommender systems\nIn the context of recommender systems, Inductive Matrix Completion (IMC) [8] is another important application where MSEIGS can be applied. IMC incorporates side-information of users and items given in the form of feature vectors for matrix factorization, which has been shown to be effective for the gene-disease association problem [18]. Given a user-item ratings matrix $R \\in \\mathbb{R}^{m \\times n}$, where $R_{ij}$ is the known rating of item $j$ by user $i$, IMC is formulated as follows:\n$$\\min_{W \\in \\mathbb{R}^{f_c \\times r},\\, H \\in \\mathbb{R}^{f_d \\times r}} \\sum_{(i,j) \\in \\Omega} (R_{ij} - x_i^T W H^T y_j)^2 + \\frac{\\lambda}{2} (\\|W\\|_F^2 + \\|H\\|_F^2),$$\nwhere $\\Omega$ is the set of observed entries; $\\lambda$ is a regularization parameter; $x_i \\in \\mathbb{R}^{f_c}$ and $y_j \\in \\mathbb{R}^{f_d}$ are feature vectors for user $i$ and item $j$, respectively. We evaluated MSEIGS combined with IMC for recommendation tasks where a social network among users is also available. 
It has been shown that exploiting these social networks improves the quality of recommendations [9, 25]. One way to obtain useful and robust features from the social network is to consider the $k$ principal components, i.e., top-$k$ eigenvectors, of the corresponding adjacency matrix $A$. We compare the recommendation performance of IMC using eigenvectors computed by MSEIGS, MSEIGS-Early and EIGS. We also report results for two baseline methods: standard matrix completion (MC) without user/item features and Katz\u00b9 on the combined network $C = \\begin{bmatrix} A & R \\\\ R^T & 0 \\end{bmatrix}$ as in [25].\n\nTable 2: Label propagation results on two real datasets including Aloi for semi-supervised classification and Delicious for multi-label learning. The graph is constructed using [16], which takes 87.9 seconds for Aloi and 16.1 seconds for Delicious. MSEIGS is about 5 times faster and MSEIGS-Early is almost 20 times faster than EIGS while achieving similar accuracy on the Aloi dataset.\n\nMethod | Aloi ($k = 1500$) time (seconds) | Aloi acc (%) | Delicious ($k = 1000$) time (seconds) | Delicious top3-acc (%) | Delicious top1-acc (%)\nTruncated | 1824.8 | 59.87 | 3385.1 | 45.12 | 48.89\nCG | 2921.6 | 60.01 | 1094.9 | 44.93 | 48.73\nEIGS | 3890.9 | 60.08 | 458.2 | 45.11 | 48.51\nRSVD | 964.1 | 59.62 | 359.8 | 44.11 | 46.91\nBlkLan | 1272.2 | 59.96 | 395.6 | 43.52 | 45.53\nMSEIGS | 767.1 | 60.03 | 235.6 | 44.84 | 49.23\nMSEIGS-Early | 176.2 | 58.98 | 61.36 | 44.71 | 48.22\n\nWe evaluated the recommendation performance on three publicly available datasets shown in Table 6 (see Appendix 6.7 for more details). The Flixster dataset [9] contains user-movie ratings information and the other two datasets [28] are for the user-affiliation recommendation task. We report recall-at-N with N = 20 averaged over 5-fold cross-validation, which is a widely used evaluation metric for top-N recommendation tasks [2]. In Table 3, we can see that IMC outperforms the two baseline methods: Katz and MC. 
For IMC, both MSEIGS and MSEIGS-Early achieve results comparable to other methods, but require much less time to compute the top-$k$ eigenvectors (i.e., user latent features). For the LiveJournal dataset, MSEIGS-Early is almost 8 times faster than EIGS while attaining similar performance as shown in Table 3.\n\nTable 3: Recall-at-20 (RCL@20) and top-$k$ eigendecomposition time (eig-time, in seconds) results on three real-world datasets: Flixster, Amazon and LiveJournal. MSEIGS and MSEIGS-Early require much less time to compute the top-$k$ eigenvectors (latent features) for IMC while achieving similar performance compared to other methods. Note that Katz and MC do not use eigenvectors.\n\nMethod | Flixster ($k = 100$) eig-time | Flixster RCL@20 | Amazon ($k = 500$) eig-time | Amazon RCL@20 | LiveJournal ($k = 500$) eig-time | LiveJournal RCL@20\nKatz | - | 0.1119 | - | 0.3224 | - | 0.2838\nMC | - | 0.0820 | - | 0.4497 | - | 0.2699\nEIGS | 120.51 | 0.1472 | 871.30 | 0.4999 | 12099.57 | 0.4259\nRSVD | 85.31 | 0.1491 | 369.82 | 0.4875 | 7617.98 | 0.4294\nBlkLan | 104.95 | 0.1465 | 882.58 | 0.4687 | 5099.79 | 0.4248\nMSEIGS | 36.27 | 0.1489 | 264.47 | 0.4911 | 2863.55 | 0.4253\nMSEIGS-Early | 21.88 | 0.1481 | 179.04 | 0.4644 | 1545.52 | 0.4246\n\n5 Conclusions\nIn this paper, we proposed a novel divide-and-conquer based framework, multi-scale spectral decomposition (MSEIGS), for approximating the top-$k$ eigendecomposition of large-scale graphs. Our method exploits the clustering structure of the graph and converges faster than state-of-the-art methods. Moreover, our method can be easily parallelized, which makes it suitable for massive graphs. Empirically, MSEIGS consistently outperforms other popular eigensolvers in terms of convergence speed and approximation quality on real-world graphs with up to billions of edges. We also show that MSEIGS is highly effective for two important applications: label propagation and inductive matrix completion. 
Dealing with graphs that cannot fit into memory is one of our future research directions. We believe that MSEIGS can also be efficient in streaming and distributed settings with careful implementation.\nAcknowledgments\nThis research was supported by NSF grant CCF-1117055 and NSF grant CCF-1320746.\n\n\u00b9 The Katz measure is defined as $\\sum_{i=1}^{t} \\beta^i C^i$. We set $\\beta = 0.01$ and $t = 10$.\n\nReferences\n[1] J. Baglama, D. Calvetti, and L. Reichel. IRBL: An implicitly restarted block-Lanczos method for large-scale Hermitian eigenproblems. SIAM J. Sci. Comput., 24(5):1650\u20131677, 2003.\n[2] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-N recommendation tasks. In RecSys, pages 39\u201346, 2010.\n[3] J. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numer. Math., 36(2):177\u2013195, 1980.\n[4] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7(1):1\u201346, 1970.\n[5] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1944\u20131957, 2007.\n[6] R. Grimes, J. Lewis, and H. Simon. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM J. Matrix Anal. Appl., 15(1):228\u2013272, 1994.\n[7] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., 53(2):217\u2013288, 2011.\n[8] P. Jain and I. S. Dhillon. Provable inductive matrix completion. CoRR, abs/1306.0626, 2013.\n[9] M. Jamali and M. Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In RecSys, pages 135\u2013142, 2010.\n[10] M. Karasuyama and H. Mamitsuka. Manifold-based similarity adaptation for label propagation. In NIPS, pages 1547\u20131555, 2013.\n[11] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359\u2013392, 1998.\n[12] R. M. Larsen. Lanczos bidiagonalization with partial reorthogonalization. Technical Report DAIMI PB-357, Aarhus University, 1998.\n[13] D. LaSalle and G. Karypis. Multi-threaded modularity based graph clustering using the multilevel paradigm. Technical Report 14-010, University of Minnesota, 2014.\n[14] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users\u2019 Guide. Society for Industrial and Applied Mathematics, 1998.\n[15] R. Li. Relative perturbation theory: II. eigenspace and singular subspace variations. SIAM J. Matrix Anal. Appl., 20(2):471\u2013492, 1998.\n[16] W. Liu, J. He, and S.-F. Chang. Large graph construction for scalable semi-supervised learning. In ICML, pages 679\u2013686, 2010.\n[17] R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer. Graph structure in the web \u2014 revisited: A trick of the heavy tail. In WWW Companion, pages 427\u2013432, 2014.\n[18] N. Natarajan and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12):i60\u2013i68, 2014.\n[19] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In NIPS, pages 849\u2013856, 2001.\n[20] D. Papailiopoulos, I. Mitliagkas, A. Dimakis, and C. Caramanis. Finding dense subgraphs via low-rank bilinear optimization. In ICML, pages 1890\u20131898, 2014.\n[21] B. N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, 1980.\n[22] R. Patro and C. Kingsford. Global network alignment using multiscale spectral signatures. Bioinformatics, 28(23):3105\u20133114, 2012.\n[23] Y. Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM J. Numer. Anal., 17(5):687\u2013706, 1980.\n[24] D. Shin, S. Si, and I. S. Dhillon. Multi-scale link prediction. In CIKM, pages 215\u2013224, 2012.\n[25] V. Vasuki, N. Natarajan, Z. Lu, B. Savas, and I. Dhillon. Scalable affiliation recommendation using auxiliary networks. ACM Trans. Intell. Syst. Technol., 3(1):3:1\u20133:20, 2011.\n[26] B. Wang, Z. Tu, and J. Tsotsos. Dynamic label propagation for semi-supervised multi-class multi-label classification. In ICCV, pages 425\u2013432, 2013.\n[27] J. J. Whang, X. Sui, and I. S. Dhillon. Scalable and memory-efficient clustering of large-scale social networks. In ICDM, pages 705\u2013714, 2012.\n[28] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 745\u2013754, 2012.\n[29] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Sch\u00f6lkopf. Learning with local and global consistency. In NIPS, pages 321\u2013328, 2004.", "award": [], "sourceid": 1454, "authors": [{"given_name": "Si", "family_name": "Si", "institution": "University of Texas at Austin"}, {"given_name": "Donghyuk", "family_name": "Shin", "institution": "University of Texas at Austin"}, {"given_name": "Inderjit", "family_name": "Dhillon", "institution": "University of Texas"}, {"given_name": "Beresford", "family_name": "Parlett", "institution": "University of California, Berkeley"}]}