{"title": "Structured Graph Learning Via Laplacian Spectral Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 11651, "page_last": 11663, "abstract": "Learning a graph with a specific structure is essential for interpretability and identification of the relationships among data. But structured graph learning from observed samples is an NP-hard combinatorial problem. In this paper, we first show, for a set of important graph families it is possible to convert the combinatorial constraints of structure into eigenvalue constraints of the graph Laplacian matrix. Then we introduce a unified graph learning framework lying at the integration of the spectral properties of the Laplacian matrix with Gaussian graphical modeling, which is capable of learning structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable for big-data specific tasks. Extensive numerical experiments with both synthetic and real datasets demonstrate the effectiveness of the proposed methods. An R package containing codes for all the experimental results is submitted as a supplementary file.", "full_text": "Structured Graph Learning via Laplacian Spectral\n\nConstraints\n\nSandeep Kumar(cid:63)\n\nJiaxi Ying\u2020\n\nJos\u00b4e Vin\u00b4\u0131cius de M. Cardoso\u2020\n\nsandeep0kr@gmail.com\n\njx.ying@connect.ust.hk\n\njvdmc@connect.ust.hk\n\nDaniel P. Palomar(cid:63),\u2020\npalomar@ust.hk\n\nDepartment of Industrial Engineering and Data Analytics(cid:63)\n\nDepartment of Electronic and Computer Engineering\u2020\n\nThe Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong\n\nAbstract\n\nLearning a graph with a speci\ufb01c structure is essential for interpretability and\nidenti\ufb01cation of the relationships among data. It is well known that structured\ngraph learning from observed samples is an NP-hard combinatorial problem. 
In this paper, we first show that for a set of important graph families it is possible to convert the combinatorial structural constraints into eigenvalue constraints on the graph Laplacian matrix. Then we introduce a unified graph learning framework that integrates the spectral properties of the Laplacian matrix with Gaussian graphical modeling and is capable of learning structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable for large-scale semi-supervised and unsupervised graph-based learning tasks. Extensive numerical experiments with both synthetic and real data sets demonstrate the effectiveness of the proposed methods. An R package containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology.

1 Introduction

Graph models constitute an effective representation of data available across numerous domains in science and engineering [1]. Gaussian graphical modeling (GGM) encodes the conditional dependence relationships among a set of p variables. In this framework, an undirected graph is associated with the variables, where each vertex corresponds to one variable, and an edge is present between two vertices if the corresponding random variables are conditionally dependent [2, 3]. GGM is a tool of increasing importance in a number of fields including finance, biology, statistical learning, and computer vision [4, 5].

For improved interpretability and precise identification of the structure in the data, it is desirable to learn a graph with a specific structure. 
For example, gene pathway analysis is studied through multi-component graph structures [6, 7], as genes can be grouped into pathways, and connections within a pathway may be more likely than connections between pathways, forming clusters; a bipartite graph structure yields a more precise model for drug matching and for topic modeling in document analysis [8, 9]; a regular graph structure is suited for designing communication-efficient deep learning architectures [10, 11]; and a sparse yet connected graph structure is required for graph signal processing applications [12].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Structured graph learning from sample data involves both the estimation of structure (graph connectivity) and of parameters (graph weights). While there are a variety of methods for parameter estimation (e.g., maximum likelihood), structure estimation is arguably very challenging due to its combinatorial nature. Structure learning is NP-hard [13, 14] for a general class of graphical models, and efforts have focused on characterizing families of structures for which learning is feasible. In this paper, we present one such characterization based on the so-called spectral properties of the graph Laplacian matrix. Under this framework, structure learning of a large class of graph families can be expressed as an eigenvalue problem of the graph Laplacian matrix. Our contributions in this paper are threefold. First, we introduce a problem formulation that converts the combinatorial problem of structured graph learning into an optimization problem over graph matrix eigenvalues. Second, we discuss various theoretical and practical aspects of the proposed formulation and develop computationally efficient algorithms to solve the problem. 
Finally, we show the effectiveness of the proposed algorithms with numerous synthetic and real data experiments.

As a byproduct of our investigation, we also reinforce the known connections between graph structure representation and Laplacian quadratic methods (for smooth graph signals) by introducing a procedure that maps a priori information about graph signals to spectral constraints on the graph Laplacian. This connection enables us to use a computationally efficient spectral regularization framework to incorporate a priori information into standard graph smoothing problems.

The rest of the paper is organized as follows. In Section 2, we present related background, the problem formulation, and its connection to smooth graph signal analysis. In Section 3, we first propose a tractable formulation of the proposed problem and then develop an efficient algorithm and discuss its various theoretical and practical aspects. In Section 4, we show experimental results with real datasets; additional experiments and the associated convergence proof are presented in the supplementary material. An R package containing the code for all the simulations is made available as an open-source repository.

2 Background and Proposed Formulation

In this section, we review Gaussian graphical models and formulate the problem of structured graph learning via Laplacian spectral constraints.

2.1 Gaussian Graphical Models

Let x = [x1, x2, . . . , xp]ᵀ be a p-dimensional zero-mean random vector associated with an undirected graph G = (V, E), where V = {1, 2, . . . , p} is a set of nodes corresponding to the elements of x, and E ⊆ V × V is the set of edges connecting nodes. 
The GGM method learns a graph by solving the following optimization problem:

    maximize_{Θ ∈ S^p_{++}}   log det(Θ) − tr(ΘS) − α h(Θ),          (1)

where Θ ∈ R^{p×p} denotes the desired graph matrix, S^p_{++} denotes the set of p × p positive definite matrices, S ∈ R^{p×p} is the sample covariance matrix (SCM) obtained from data, h(·) is the regularization term, and α > 0 is the regularization parameter. The optimization in (1) corresponds to the penalized maximum likelihood estimation of the inverse covariance (precision) matrix, also known as a Gaussian Markov random field (GMRF). With the graph G inferred from Θ, the random vector x follows the Markov property: Θij ≠ 0 ⟺ {i, j} ∈ E for all i ≠ j, which implies that xi and xj are conditionally dependent given the rest [2, 3].

2.2 Graph Laplacian

A matrix Θ ∈ R^{p×p} is called a combinatorial graph Laplacian matrix if it belongs to the following set:

    S_Θ = { Θ | Θij = Θji ≤ 0 for i ≠ j,  Θii = −Σ_{j≠i} Θij }.          (2)

The Laplacian matrix Θ is a symmetric, positive semidefinite matrix with zero row sums [15]. The non-zero entries of the matrix encode positive edge weights as −Θij, and Θij = 0 implies no connectivity between vertices i and j. The importance of the graph Laplacian has been well recognized as a tool for embedding, manifold learning, spectral sparsification, clustering and semi-supervised learning [16, 17, 18, 19, 20, 21, 22]. 
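As a concrete illustration of the Laplacian set (2) and of the spectral structure exploited in the rest of the paper, the following minimal Python sketch (an illustrative aid, not part of the paper's R package) builds a combinatorial Laplacian for a small weighted graph and checks the properties stated above, including the standard fact that the multiplicity of the zero eigenvalue equals the number of connected components:

```python
import numpy as np

def laplacian_from_edges(p, edges):
    """Combinatorial graph Laplacian from (i, j, weight) triples, as in the set (2)."""
    theta = np.zeros((p, p))
    for i, j, w in edges:
        theta[i, j] = theta[j, i] = -w           # non-positive off-diagonals
    np.fill_diagonal(theta, -theta.sum(axis=1))  # zero row sums
    return theta

# two disconnected triangles: a graph with 2 connected components
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
theta = laplacian_from_edges(6, edges)

# membership in the set (2): symmetry, sign pattern, zero row sums, PSD
assert np.allclose(theta, theta.T)
assert np.all(theta[~np.eye(6, dtype=bool)] <= 0)
assert np.allclose(theta @ np.ones(6), 0)
eigvals = np.linalg.eigvalsh(theta)
assert eigvals.min() > -1e-9

# multiplicity of the zero eigenvalue equals the number of components (here 2)
assert np.sum(np.abs(eigvals) < 1e-9) == 2

# Laplacian quadratic form: x' Theta x = sum over edges of w_ij (x_i - x_j)^2
x = np.arange(6, dtype=float)
assert np.isclose(x @ theta @ x,
                  sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges))
```

The final assertion is the smoothness quadratic form used later in Section 2.3.1; the component-counting assertion is the spectral fact behind the structural constraints of Section 2.3.2.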
In addition, structural properties of a large class of important graph families are encoded in the eigenvalues of the graph Laplacian matrix, and utilizing these properties under the GGM setting is the main goal of the present work.

2.3 Structured Gaussian Graphical Models

The goal is to learn the matrix Θ as a Laplacian matrix under some eigenvalue constraints. We introduce a general optimization framework

    maximize_Θ   log gdet(Θ) − tr(ΘS) − α h(Θ)
    subject to   Θ ∈ S_Θ,  λ(Θ) ∈ S_λ,          (3)

where gdet(Θ) denotes the generalized determinant [23], defined as the product of the non-zero eigenvalues of Θ; S is the SCM (with the mean removed, i.e., S = xxᵀ) obtained from the data x; S_Θ is the Laplacian structural constraint set (2); λ(Θ) denotes the eigenvalues of Θ; and S_λ is the set containing the spectral constraints on the eigenvalues. Precisely, S_λ facilitates the process of incorporating the spectral properties required for enforcing structure on the graph to be learned.

From the probabilistic perspective, when the data is generated from a Gaussian distribution x ∼ N(0, Θ†), then (3) can be viewed as a penalized maximum likelihood estimation of the structured precision matrix of an improper attractive GMRF model [23]. For arbitrarily distributed data, formulation (3) corresponds to minimizing a penalized log-determinant Bregman divergence, and hence it yields a meaningful graph even for distributions that are not GMRFs.

2.3.1 Laplacian quadratic and smooth graph signals

In the context of graph signal modeling, the widely used assumption is that the signal/data residing on graphs changes smoothly between connected nodes [20, 24, 25, 26]. The trace term in (3) relates to the graph Laplacian quadratic form tr(Θxxᵀ) = Σ_{i<j} −Θij (xi − xj)², also known as the quadratic energy function, which is used for quantifying the smoothness of graph signals [20]. Smooth graph signal methods are an extremely popular family of approaches for semi-supervised learning. The type of graph used to encode relationships in these learning problems is often a more important decision than the particular algorithm or loss function used, yet this choice is not well investigated in the literature [24]. Our proposed framework, which can learn a graph with a specific structure based on a priori information about the problem at hand, is indeed a promising direction for strengthening these approaches.

2.3.2 Graph Structure via Laplacian Spectral Constraints

Now, we introduce various choices of S_λ that enable (3) to learn some important graph structures.
• k-component graph: A graph is said to be k-component connected if its vertex set can be partitioned into k disjoint subsets such that any two nodes belonging to different subsets are not connected. The eigenvalues of the Laplacian matrix of such a graph can be expressed as:

    S_λ = { {λj = 0}_{j=1}^{k},  c1 ≤ λ_{k+1} ≤ · · · ≤ λp ≤ c2 },          (4)

where k ≥ 1 denotes the number of connected components in the graph, and c1, c2 > 0 are constants that depend on the number of edges and their weights [15, 19].
• connected sparse graph: A sparse graph is simply a graph with few connections among the nodes. Often, making a graph highly sparse can split it into several disconnected components, which many times is undesirable [12, 27]. The existing formulation cannot ensure both sparsity and connectedness, and there always exists a trade-off between the two properties. 
We can achieve both sparsity and connectedness by using the following spectral constraint:

    S_λ = { λ1 = 0,  c1 ≤ λ2 ≤ · · · ≤ λp ≤ c2 },          (5)

with a proper choice of c1 > 0, c2 > 0.
• k-component d-regular graph: All the nodes of a d-regular graph have the same weighted degree d, i.e., Σ_{j∈Ni} −Θij = d for all i = 1, 2, . . . , p, where Ni is the set of neighboring nodes connected to node i. This states that the diagonal entries of the matrix Θ are all d, i.e., diag(Θ) = d1. A k-component regular graph structure can be learned by forcing diag(Θ) = d1 along with the following spectral constraints:

    S_λ = { {λj = 0}_{j=1}^{k},  c1 ≤ λ_{k+1} ≤ · · · ≤ λp ≤ c2 },  diag(Θ) = d1.          (6)

• cospectral graphs: In many applications, one is motivated to learn Θ with specific eigenvalues, which is also known as cospectral graph learning [28]. One example is spectral sparsification of graphs [19, 29], which aims to learn a sparse graph Θ that approximates a given graph Θ̄ such that the eigenvalues λi of Θ satisfy λi = f(λ̄i), where {λ̄i}_{i=1}^p are the eigenvalues of the given graph Θ̄ and f is some specific function. Therefore, for cospectral graph learning, we introduce the following constraint:

    S_λ = { λi = f(λ̄i),  ∀ i ∈ [1, p] }.          (7)

2.4 Related work and discussion

The complexity of structure learning depends critically on the underlying graph structure, and the focus has been on characterizing classes of structures for which learning is feasible. 
The seminal work [30] established that structure learning for tree-structured graphs reduces to a maximum weight spanning tree problem, while the work in [14] presented a characterization based on the local separation property and proposed a greedy method based on thresholding of sample statistics for learning the following graph structures: Erdos-Renyi random graphs, power-law graphs, small-world graphs, and other augmented graphs. Sparse graphs have been widely studied in the high-dimensional setting [31]. A sparse graph under the GGM model (1) is typically learned by introducing an ℓ1-norm penalty term, as in Graphical Lasso (GLasso) [32]. But uniform sparsity is not enough when a specific structure is desired [33, 34]. Recent works have extended the GGM to include other structures such as factor models [35], scale-free [36], degree-distribution [37], and overlapping structures with multiple graphical models [34, 38]; however, these methods are restricted to their particular cases and are difficult to extend to other structures.

A feasible characterization that can enable k-component structured graph learning is still lacking. Existing methods employ relaxation-based approaches in which they focus on either structure estimation or parameter estimation. The work in [39] can only perform structure estimation, while the works in [40, 41, 42] estimate parameters with the structure information already known. In recent work, the authors in [6] developed a two-stage approach for learning a multi-component structure. The method is based on integrating expectation maximization (EM) with the GMM method, which can estimate both the structure and the parameters jointly. 
However, the costly EM step makes this approach computationally prohibitive for large-scale problems.

Finally, several recent publications have considered learning different types of graph Laplacians (2) under the GGM setting [26, 43, 44]; however, they do not include spectral constraints and are not able to enforce specific structures onto the graph. Specifically, all these methods are limited to learning a connected graph without structural constraints, or they just learn Laplacian weights for a graph with given structure estimates.

2.4.1 Discussion

The present work identifies the spectral characteristics of graph matrices as a natural and efficient tool for learning structured graphs. The proposed idea is to use these spectral characteristics directly within a graph learning framework. Here, the focus is on utilizing Laplacian spectral constraints under a GGM-type model, but the proposed machinery has a much wider appeal. For example, the proposed framework can be easily extended to learn more non-trivial structures (e.g., bipartite and clustered bipartite graph structures) by considering the spectral properties of other graph matrices, e.g., the adjacency, normalized Laplacian, and signless Laplacian matrices [15, 45, 46, 47]; furthermore, the scope of spectral methods can be extended to other important statistical models such as the Ising model [48], Gaussian covariance graphical models [49], Gaussian graphical models with latent variables [50], the least-squares formulation for graph learning [51], structured linear regression, vector autoregressive models [52], and also to structured graph signal processing applications [21, 53, 54].

3 Optimization Method

We reformulate the optimization problem presented in (3) by introducing a graph Laplacian linear operator L and a spectral penalty, which, as a consequence, transform the combinatorial Laplacian structural constraints into easier-to-handle algebraic constraints.

3.1 Graph 
Laplacian operator L

The Laplacian matrix Θ belonging to S_Θ satisfies i) Θij = Θji ≤ 0 for i ≠ j and ii) Θ1 = 0, implying that the target matrix is symmetric with p(p − 1)/2 degrees of freedom. Therefore, we introduce a linear operator L that transforms a non-negative vector w ∈ R₊^{p(p−1)/2} into the matrix Lw ∈ R^{p×p} that satisfies the Laplacian constraints ([Lw]ij = [Lw]ji ≤ 0 for i ≠ j and [Lw] · 1 = 0) as in (2).

Definition 1. The linear operator L : w ∈ R₊^{p(p−1)/2} → Lw ∈ R^{p×p} is defined as

    [Lw]ij = { −w_{i+dj},   i > j;
               [Lw]ji,      i < j;
               −Σ_{l≠i} [Lw]il,   i = j },

where dj = −j + ((j − 1)/2)(2p − j).

We derive the adjoint operator L* of L so as to satisfy ⟨Lw, Y⟩ = ⟨w, L*Y⟩.

Lemma 1. The adjoint operator L* : Y ∈ R^{p×p} ↦ L*Y ∈ R^{p(p−1)/2} is defined by

    [L*Y]k = y_{i,i} − y_{i,j} − y_{j,i} + y_{j,j},

where i, j ∈ Z₊ satisfy k = i − j + ((j − 1)/2)(2p − j) and i > j.

Lemma 2. The operator norm ‖L‖₂ is √(2p), where ‖L‖₂ = sup_{‖x‖=1} ‖Lx‖_F with x ∈ R^{p(p−1)/2}.

Proof. Follows from the definitions of L and L*; see the supplementary material for the detailed proof.

By the definition of the Laplacian operator L in Definition 1, the set of graph Laplacian constraints in (2) can be expressed as S_Θ = {Lw | w ≥ 0}, where w ≥ 0 means each entry of w is non-negative. We represent the Laplacian matrix Θ ∈ S_Θ as Lw.

To ensure sparsity of edges in the learned graph, we use the ℓ1-regularization function. 
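The operator L of Definition 1 and its adjoint from Lemma 1 can be sanity-checked numerically. A minimal Python sketch (illustrative only; the paper's implementation is in its R package) enumerates the strict lower triangle in the column-major order implied by the index map k = i − j + ((j − 1)/2)(2p − j) and verifies the adjoint identity:

```python
import numpy as np

def L_op(w, p):
    """Laplacian operator (Definition 1): w in R^{p(p-1)/2} -> Lw in R^{p x p}."""
    theta = np.zeros((p, p))
    k = 0
    for j in range(p):               # column-major walk over the strict lower triangle
        for i in range(j + 1, p):
            theta[i, j] = theta[j, i] = -w[k]
            k += 1
    np.fill_diagonal(theta, -theta.sum(axis=1))   # zero row sums
    return theta

def L_adj(Y):
    """Adjoint operator (Lemma 1): [L*Y]_k = Y_ii - Y_ij - Y_ji + Y_jj."""
    p = Y.shape[0]
    out = np.empty(p * (p - 1) // 2)
    k = 0
    for j in range(p):
        for i in range(j + 1, p):
            out[k] = Y[i, i] - Y[i, j] - Y[j, i] + Y[j, j]
            k += 1
    return out

# numerically verify the adjoint identity <Lw, Y> = <w, L*Y>
rng = np.random.default_rng(1)
p = 5
w = rng.uniform(size=p * (p - 1) // 2)
Y = rng.normal(size=(p, p))
assert np.isclose(np.sum(L_op(w, p) * Y), w @ L_adj(Y))
```

The identity holds for arbitrary Y because each weight w_k touches exactly four entries of Lw: the pair (i, j), its mirror (j, i), and the two diagonal entries (i, i) and (j, j).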
Observe that the sign of Lw is fixed by the constraints ([Lw]ij ≤ 0 for i ≠ j and [Lw]ij ≥ 0 for i = j), so the regularization term α‖Lw‖₁ can be written as tr(LwH), where H = α(2I − 11ᵀ), which implies

    tr(ΘS) + α h(Lw) = tr(LwK),  where K = S + H.

3.2 Reformulating problem (3) with the graph Laplacian operator

To solve (3) for learning a graph Laplacian Θ with the desired spectral properties, we propose the following Laplacian spectral constrained optimization problem:

    minimize_{w, λ, U}   −log gdet(UDiag(λ)Uᵀ) + tr(KLw) + (β/2)‖Lw − UDiag(λ)Uᵀ‖²_F
    subject to   w ≥ 0,  λ ∈ S_λ,  UᵀU = I,          (8)

where Lw is the desired Laplacian matrix, which seeks to admit the decomposition Lw = UDiag(λ)Uᵀ; Diag(λ) ∈ R^{p×p} is a diagonal matrix containing {λi}_{i=1}^p on its diagonal; and U ∈ R^{p×p} is a matrix satisfying UᵀU = I. We incorporate specific spectral properties on {λi}_{i=1}^p via the spectral penalty term (β/2)‖Lw − UDiag(λ)Uᵀ‖²_F, with S_λ containing the a priori spectral information of the desired graph structure. We introduce the term (β/2)‖Lw − UDiag(λ)Uᵀ‖²_F to keep Lw close to UDiag(λ)Uᵀ instead of exactly enforcing the constraint. Note that this relaxation can be made tight by choosing a sufficiently large β or by iteratively increasing β. The penalty term can also be understood as a spectral regularization term, which aims to provide direct control over the eigenvalues, allowing additional information to be incorporated via priors. 
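The ℓ1-to-trace identity above follows because the sign pattern of a Laplacian is fixed and its row sums are zero, so ‖Θ‖₁ = 2 tr(Θ) and tr(Θ11ᵀ) = 1ᵀΘ1 = 0. A small numerical check (an illustrative sketch using an arbitrary random Laplacian, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
p, alpha = 6, 0.3

# random combinatorial Laplacian: non-positive off-diagonals, zero row sums
W = np.triu(rng.uniform(size=(p, p)), k=1)    # non-negative edge weights
W = W + W.T
theta = np.diag(W.sum(axis=1)) - W

H = alpha * (2 * np.eye(p) - np.ones((p, p)))

# for any matrix in the Laplacian set (2): alpha * ||Theta||_1 == tr(Theta H)
assert np.isclose(alpha * np.abs(theta).sum(), np.trace(theta @ H))
```

This is what lets the non-smooth ℓ1 penalty be absorbed into the single linear term tr(LwK) with K = S + H.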
Such spectral regularization has been successfully used in matrix factorization applications; see [55, 56, 57, 58, 59] for more details.

We consider solving (8) for learning a k-component graph structure utilizing the constraints in (4), where the first k eigenvalues are zero. There are a total of q = p − k non-zero eigenvalues, ordered in the given set S_λ = {c1 ≤ λ_{k+1} ≤ · · · ≤ λp ≤ c2}. Collecting the variables in three blocks as X = (w ∈ R^{p(p−1)/2}, λ ∈ R^q, U ∈ R^{p×q}), we develop an algorithm based on the block successive upper-bound minimization (BSUM) framework [60], which updates each block sequentially while keeping the other blocks fixed.

3.3 Update of w

At iteration t + 1, treating w as a variable with λ and U fixed, and ignoring the terms independent of w, we have the following sub-problem:

    minimize_{w ≥ 0}   tr(KLw) + (β/2)‖Lw − UDiag(λ)Uᵀ‖²_F.          (9)

Problem (9) is equivalent to the non-negative quadratic program

    minimize_{w ≥ 0}   f(w) = (1/2)‖Lw‖²_F − cᵀw,          (10)

where c = L*(UDiag(λ)Uᵀ − β⁻¹K). It is easy to check that the sub-problem (10) is strictly convex. However, due to the non-negativity constraint (w ≥ 0), there is no closed-form solution, and thus we derive a majorization function via the following lemma.

Lemma 3. The function f(w) in (10) is majorized at wᵗ by the function

    g(w|wᵗ) = f(wᵗ) + (w − wᵗ)ᵀ∇f(wᵗ) + (L/2)‖w − wᵗ‖²,

where wᵗ is the update from the previous iteration and L = ‖L‖₂² = 2p. The majorization condition can be easily checked [61, 62]. 
After ignoring the constant terms in Lemma 3, the problem (10) is majorized at wᵗ as

    minimize_{w ≥ 0}   (1/2)wᵀw − aᵀw,  where a = wᵗ − (1/(2p))∇f(wᵗ) and ∇f(wᵗ) = L*(Lwᵗ) − c.          (11)

Lemma 4. From the KKT optimality conditions we can obtain the optimal solution as

    wᵗ⁺¹ = ( wᵗ − (1/(2p))∇f(wᵗ) )⁺,  where (a)⁺ = max(a, 0).          (12)

3.4 Update of U

At iteration t + 1, treating U as a variable and fixing w and λ, we obtain the following sub-problem:

    maximize_U   tr(UᵀLwUDiag(λ))   subject to   UᵀU = I_q.          (13)

Lemma 5. From the KKT optimality conditions, the solution to (13) is given by

    Uᵗ⁺¹ = eigenvectors(Lw)[k + 1 : p],          (14)

that is, the p − k eigenvectors of the matrix Lw in increasing order of eigenvalue magnitude [63, 64].

3.5 Update of λ

We obtain the following sub-problem for the update of λ for given w and U:

    minimize_{λ ∈ S_λ}   −log det(Diag(λ)) + (β/2)‖Uᵀ(Lw)U − Diag(λ)‖²_F.          (15)

Since λ now contains only the non-zero eigenvalues in increasing order, we can replace the generalized determinant with the determinant on Diag(λ). For notational brevity, we index the non-zero eigenvalues λi from 1 to q = p − k instead of k + 1 to p. The sub-problem (15) can then be further written as

    minimize_{c1 ≤ λ1 ≤ · · · ≤ λq ≤ c2}   −Σ_{i=1}^{q} log λi + (β/2)‖λ − d‖²₂,          (16)

where λ = [λ1, . . . , λq]ᵀ and d = [d1, . . . , dq]ᵀ, with di the i-th diagonal element of Diag(Uᵀ(Lw)U). The sub-problem (16) is a convex optimization problem and the solution can be obtained from the KKT optimality conditions. 
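Putting the three block updates together, one sweep of the scheme can be sketched in Python as below. This is an illustrative re-implementation under simplifying assumptions, not the paper's algorithm verbatim: the λ-update solves (16) elementwise via the stationary point λi = (di + √(di² + 4/β))/2 of −log λ + (β/2)(λ − d)² and then clips to [c1, c2], ignoring the ordering constraint (the paper's exact iterative procedure is derived in its supplementary material), and the input matrix K is just a sample covariance stand-in with the sparsity term omitted:

```python
import numpy as np

def L_op(w, p):
    """Laplacian operator: w -> Lw (column-major lower-triangle indexing)."""
    theta = np.zeros((p, p))
    k = 0
    for j in range(p):
        for i in range(j + 1, p):
            theta[i, j] = theta[j, i] = -w[k]
            k += 1
    np.fill_diagonal(theta, -theta.sum(axis=1))
    return theta

def L_adj(Y):
    """Adjoint operator: [L*Y]_k = Y_ii - Y_ij - Y_ji + Y_jj."""
    p = Y.shape[0]
    out = np.empty(p * (p - 1) // 2)
    k = 0
    for j in range(p):
        for i in range(j + 1, p):
            out[k] = Y[i, i] - Y[i, j] - Y[j, i] + Y[j, j]
            k += 1
    return out

p, k_comp = 6, 2                     # learn a 2-component graph on 6 nodes
beta, c1, c2 = 10.0, 0.1, 10.0
rng = np.random.default_rng(3)
K = np.cov(rng.normal(size=(50, p)), rowvar=False)   # SCM stand-in (alpha = 0)

w = rng.uniform(size=p * (p - 1) // 2)
for _ in range(50):
    theta = L_op(w, p)
    # U-update (14): eigenvectors of Lw, dropping the k "zero" eigenvectors
    _, U_all = np.linalg.eigh(theta)
    U = U_all[:, k_comp:]
    # lambda-update: elementwise stationary point of (16), clipped to [c1, c2]
    d = np.diag(U.T @ theta @ U)
    lam = np.clip((d + np.sqrt(d ** 2 + 4.0 / beta)) / 2.0, c1, c2)
    # w-update (12): one projected majorization step with step size 1/(2p)
    c = L_adj(U @ np.diag(lam) @ U.T - K / beta)
    grad = L_adj(L_op(w, p)) - c
    w = np.maximum(w - grad / (2 * p), 0.0)

theta = L_op(w, p)
assert np.all(w >= 0)                        # feasibility is preserved
assert np.allclose(theta @ np.ones(p), 0)    # the iterate is always a valid Laplacian
```

Because d is sorted ascending (it comes from an eigendecomposition) and the elementwise map is monotone, the clipped solution stays ordered in this sketch, which is why dropping the ordering constraint is harmless here.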
One can solve the convex problem (16) with a general-purpose solver, but that is not suitable for large-scale problems. We derive a tailor-made, computationally efficient algorithm, which updates λ following an iterative procedure that terminates in at most q + 1 iterations. Please refer to the supplementary material for the detailed derivation of the algorithm.

3.6 SGL Algorithm Summary

Algorithm 1, which we denote by SGL, summarizes the implementation of structured graph learning via Laplacian spectral constraints.

Algorithm 1 SGL
  Input: SCM S, k, c1, c2, β
  t ← 0
  while stopping criteria not met do
    Update wᵗ⁺¹ as in (12).
    Update Uᵗ⁺¹ as in (14).
    Update λᵗ⁺¹ as discussed in Section 3.5.
    t ← t + 1
  end while
  Output: Θ̂ᵗ⁺¹ = Lwᵗ⁺¹

Note that the eigen-decomposition for the update of U is the most demanding task in our algorithm, with a complexity of O(p³). This is very efficient considering the fact that the total number of parameters to estimate is O(p²), which are also required to satisfy complex combinatorial structural constraints. Computationally efficient graph learning algorithms such as GLasso [32] and GGL [26] have a similar worst-case complexity, though they learn a graph without any structural constraints. This implies that the algorithm is applicable to any problem where an eigenvalue decomposition can be performed, which nowadays is possible at large scale.

Remark 1. Apart from learning a k-component graph, the SGL algorithm can also be easily adapted to learn other graph structures with the aforementioned spectral constraints in (5) to (7). Furthermore, SGL can also be utilized to learn classical connected graph structures (e.g., Erdos-Renyi graphs, modular graphs, grid graphs, etc.) simply by setting the eigenvalue constraints corresponding to a one-component graph (i.e., k = 1) and setting c1, c2 to very small and very large values, respectively.

Theorem 1. 
The sequence (wᵗ, Uᵗ, λᵗ) generated by Algorithm 1 converges to the set of KKT points of (8).
Proof: The detailed proof is deferred to the supplementary material.

4 Experiments

In this section, we illustrate the advantages of incorporating spectral information directly into a graph learning framework with real-data experiments. We apply SGL to learn similarity graphs from a real categorical animals dataset [65] with binary entries, to highlight that it can obtain a meaningful graph for non-Gaussian data as well. We also apply our method to detect biologically meaningful clusters from the complex and high-dimensional PANCAN cancer dataset [66]. Performance is evaluated by visual inspection and by clustering accuracy (ACC). Additional experiments with different performance measures (e.g., relative error and F-score) for several structures, such as grid, modular, multi-component, and noisy multi-component graph structures, are shown in the supplementary material.

4.1 Animals data set

Herein, the animals data set [65, 67] is considered for learning weighted graphs. The data set consists of binary values (categorical non-Gaussian data) which are the answers to questions such as "is warm-blooded?", "has lungs?", etc. There are a total of 102 such questions, which make up the features for 33 animal categories. Figure 1 shows the results of estimating the graph of the animals data set using the SGL algorithm, GGL¹, and GLasso. Graph vertices denote animals, and edge weights represent the similarity among them. The input for all the algorithms is the sample covariance matrix plus an identity matrix scaled by 1/3 (see [26]). The evaluation of the estimated graphs is based on visual inspection. It is expected that similar animals such as (ant, cockroach), (bee, butterfly), and (trout, salmon) would be grouped together. 
Based on this premise, it can be seen that the SGL algorithm yields a clearer graph than the ones learned by GGL and GLasso.

(a) GLasso [32]  (b) GGL [26]  (c) SGL (k = 1)  (d) SGL (k = 5)

Figure 1: Perceptual graphs of animal connections obtained by (a) GLasso, (b) GGL, (c) SGL with k = 1, and (d) SGL with k = 5. GLasso and GGL split the graph into multiple components due to the sparsity regularization, while SGL with k = 1 (connectedness) yields a sparse yet connected graph. SGL with k = 5 obtains a graph with 5 components, which depicts a more fine-grained representation of the animal connections by grouping similar animals in their respective components. Furthermore, since the animal data is categorical (non-Gaussian) and does not follow the GMRF assumption, the above result also establishes the capability of SGL under a mismatch of the data model.

4.2 Cancer Genome data set

We consider the RNA-Seq Cancer Genome Atlas Research Network [66] data set available at the UC-Irvine Machine Learning Database [68]. 
This data set consists of genetic features which map 5 types of cancer, namely: breast carcinoma (BRCA), kidney renal clear-cell carcinoma (KIRC), lung adenocarcinoma (LUAD), colon adenocarcinoma (COAD), and prostate adenocarcinoma (PRAD). In Figure 2, they are labeled with the colors black, blue, red, violet, and green, respectively. The data set consists of 801 labeled samples, in which every sample has 20531 genetic features, and the goal is to classify and group the samples, according to their tumor type, on the basis of those genetic features.

¹The state-of-the-art algorithm for learning generalized graph Laplacian [26].

(a) CLR [51]  (b) SGL (proposed)

Figure 2: Clustering with (a) the CLR method, where there are two misclassified points in the black group and 10 misclassified points in the red group, and (b) the clustered graph learned with the proposed SGL with k = 5, which shows perfect clustering. Furthermore, the graph for the BRCA (black) data samples highlights an inner sub-grouping, suggesting further biological investigation.

We compare the SGL performance against the state-of-the-art method for graph-based clustering, i.e., the constrained Laplacian rank algorithm CLR [51]. 
CLR uses a well-curated similarity measure as the
input to the algorithm, which is obtained by solving a separate optimization problem, while SGL
takes the sample covariance matrix as its input. Still, the SGL method outperforms CLR, even though
the latter is a specialized clustering algorithm. The clustering accuracies (ACC) [51] of the two
methods are CLR = 0.9862 and SGL = 0.99875. The improved performance of SGL can be attributed
to two main reasons: i) SGL is able to estimate the graph structure and weights simultaneously, which is
essentially an optimal joint procedure; ii) SGL is able to capture the conditional dependencies (i.e.,
inverse covariance matrix entries) among nodes, which provide a global view of the relationships, while
CLR encodes the connectivity via direct pairwise distances. Such conditional dependence
relationships are expected to yield improved performance in clustering tasks [6].
To the best of our knowledge, SGL is the first single-stage algorithm that can learn a clustered
graph directly from sample covariance data without any additional pre-processing (e.g., learning an
optimized similarity matrix) or post-processing steps (e.g., thresholding). This makes SGL highly
favorable for large-scale unsupervised learning applications.

5 Conclusion
In this paper, we have shown how to convert the combinatorial constraints of structured graph learning
into analytical constraints on the graph matrix eigenvalues. We presented the SGL algorithm, which
can learn structured graphs directly from sample data. Extensive numerical experiments with both
synthetic and real datasets demonstrate the effectiveness of the proposed methods. The algorithm
enjoys comprehensive theoretical convergence properties along with low computational complexity.

Acknowledgments

This work was supported by the Hong Kong GRF 16207019 research grant.

References
[1] E. D. Kolaczyk and G.
Cs\u00b4ardi, Statistical analysis of network data with R. Springer, 2014, vol. 65.\n\n[2] S. L. Lauritzen, Graphical models. Clarendon Press, 1996, vol. 17.\n\n9\n\n\f[3] A. P. Dempster, \u201cCovariance selection,\u201d Biometrics, pp. 157\u2013175, 1972.\n\n[4] O. Banerjee, L. E. Ghaoui, and A. d\u2019Aspremont, \u201cModel selection through sparse maximum likelihood\nestimation for multivariate Gaussian or binary data,\u201d Journal of Machine Learning Research, vol. 9, no.\nMar, pp. 485\u2013516, 2008.\n\n[5] Y. Park, D. Hallac, S. Boyd, and J. Leskovec, \u201cLearning the network structure of heterogeneous data via\npairwise exponential markov random \ufb01elds,\u201d Proceedings of machine learning research, vol. 54, p. 1302,\n2017.\n\n[6] B. Hao, W. W. Sun, Y. Liu, and G. Cheng, \u201cSimultaneous clustering and estimation of heterogeneous\n\ngraphical models,\u201d Journal of Machine Learning Research, vol. 18, no. 217, pp. 1\u201358, 2018.\n\n[7] B. M. Marlin and K. P. Murphy, \u201cSparse gaussian graphical models with unknown block structure,\u201d in\nProceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 705\u2013712.\n\n[8] G. A. Pavlopoulos, P. I. Kontou, A. Pavlopoulou, C. Bouyioukos, E. Markou, and P. G. Bagos, \u201cBipartite\ngraphs in systems biology and medicine: a survey of methods and applications,\u201d GigaScience, vol. 7, no. 4,\np. giy014, 2018.\n\n[9] F. Nie, X. Wang, C. Deng, and H. Huang, \u201cLearning a structured optimal bipartite graph for co-clustering,\u201d\n\nin Advances in Neural Information Processing Systems, 2017, pp. 4132\u20134141.\n\n[10] A. Prabhu, G. Varma, and A. Namboodiri, \u201cDeep expander networks: Ef\ufb01cient deep networks from graph\n\ntheory,\u201d in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 20\u201335.\n\n[11] Y.-T. Chow, W. Shi, T. Wu, and W. 
Yin, “Expander graph and communication-efficient decentralized
optimization,” in Signals, Systems and Computers, 2016 50th Asilomar Conference on. IEEE, 2016, pp.
1715–1720.

[12] M. Sundin, A. Venkitaraman, M. Jansson, and S. Chatterjee, “A connectedness constraint for learning
sparse graphs,” in Signal Processing Conference (EUSIPCO), 2017 25th European. IEEE, 2017, pp.
151–155.

[13] A. Bogdanov, E. Mossel, and S. Vadhan, “The complexity of distinguishing Markov random fields,” in
Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques. Springer,
2008, pp. 331–342.

[14] A. Anandkumar, V. Y. Tan, F. Huang, and A. S. Willsky, “High-dimensional Gaussian graphical model
selection: Walk summability and local separation criterion,” Journal of Machine Learning Research,
vol. 13, no. Aug, pp. 2293–2337, 2012.

[15] F. R. Chung, Spectral graph theory. American Mathematical Soc., 1997, no. 92.

[16] M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning
from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, no. Nov, pp. 2399–
2434, 2006.

[17] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in
Advances in Neural Information Processing Systems, 2002, pp. 585–591.

[18] A. J. Smola and R. Kondor, “Kernels and regularization on graphs,” in Learning Theory and Kernel
Machines. Springer, 2003, pp. 144–158.

[19] D. A. Spielman and S.-H. Teng, “Spectral sparsification of graphs,” SIAM Journal on Computing, vol. 40,
no. 4, pp. 981–1025, 2011.

[20] X. Zhu, Z. Ghahramani, and J. D.
Lafferty, “Semi-supervised learning using Gaussian fields and harmonic
functions,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp.
912–919.

[21] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, “Graph signal processing:
Overview, challenges, and applications,” Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.

[22] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, “Learning Laplacian matrix in smooth graph signal
representations,” IEEE Transactions on Signal Processing, vol. 64, no. 23, pp. 6160–6173, Dec 2016.

[23] H. Rue and L. Held, Gaussian Markov random fields: theory and applications. CRC Press, 2005.

[24] A. Chin, Y. Chen, K. M. Altenburger, and J. Ugander, “Decoupled smoothing on graphs,” in The World
Wide Web Conference, ser. WWW ’19. New York, NY, USA: ACM, 2019, pp. 263–272. [Online].
Available: http://doi.acm.org/10.1145/3308558.3313748

[25] V. Kalofolias, “How to learn a graph from smooth signals,” in Artificial Intelligence and Statistics, 2016,
pp. 920–929.

[26] H. E. Egilmez, E. Pavez, and A. Ortega, “Graph learning from data under Laplacian and structural
constraints,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 6, pp. 825–841, 2017.

[27] S. Hassan-Moghaddam, N. K. Dhingra, and M. R. Jovanović, “Topology identification of undirected
consensus networks via sparse inverse covariance estimation,” in Decision and Control (CDC), 2016 IEEE
55th Conference on. IEEE, 2016, pp. 4624–4629.

[28] C. D. Godsil and B. McKay, “Constructing cospectral graphs,” Aequationes Mathematicae, vol. 25, no. 1,
pp. 257–268, 1982.

[29] A. Loukas and P.
Vandergheynst, “Spectrally approximating large graphs with smaller graphs,” arXiv
preprint arXiv:1802.07510, 2018.

[30] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees,” IEEE
Transactions on Information Theory, vol. 14, no. 3, pp. 462–467, 1968.

[31] N. Meinshausen, P. Bühlmann et al., “High-dimensional graphs and variable selection with the lasso,” The
Annals of Statistics, vol. 34, no. 3, pp. 1436–1462, 2006.

[32] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,”
Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.

[33] O. Heinävaara, J. Leppä-Aho, J. Corander, and A. Honkela, “On the inconsistency of ℓ1-penalised sparse
precision matrix estimation,” BMC Bioinformatics, vol. 17, no. 16, p. 448, 2016.

[34] D. A. Tarzanagh and G. Michailidis, “Estimation of graphical models through structured norm minimiza-
tion,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 7692–7739, 2017.

[35] Z. Meng, B. Eriksson, and A. Hero, “Learning latent variable Gaussian graphical models,” in International
Conference on Machine Learning, 2014, pp. 1269–1277.

[36] Q. Liu and A. Ihler, “Learning scale free networks by reweighted ℓ1 regularization,” in Proceedings of the
Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 40–48.

[37] B. Huang and T. Jebara, “Maximum likelihood graph structure estimation with degree distributions,” in
Analyzing Graphs: Theory and Applications, NIPS Workshop, vol. 14, 2008.

[38] K. Mohan, P. London, M. Fazel, D. Witten, and S.-I. Lee, “Node-based learning of multiple Gaussian
graphical models,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 445–488, 2014.

[39] C. Ambroise, J.
Chiquet, C. Matias et al., “Inferring sparse Gaussian graphical models with latent structure,”
Electronic Journal of Statistics, vol. 3, pp. 205–238, 2009.

[40] J. Wang, “Joint estimation of sparse multivariate regression and conditional graphical models,” Statistica
Sinica, pp. 831–851, 2015.

[41] T. T. Cai, H. Li, W. Liu, and J. Xie, “Joint estimation of multiple high-dimensional precision matrices,”
Statistica Sinica, vol. 26, no. 2, p. 445, 2016.

[42] P. Danaher, P. Wang, and D. M. Witten, “The joint graphical lasso for inverse covariance estimation across
multiple classes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 76,
no. 2, pp. 373–397, 2014.

[43] M. Slawski and M. Hein, “Estimation of positive definite M-matrices and structure learning for attractive
Gaussian Markov random fields,” Linear Algebra and its Applications, vol. 473, pp. 145–179, 2015.

[44] E. Pavez, H. E. Egilmez, and A. Ortega, “Learning graphs with monotone topology properties and multiple
connected components,” IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2399–2413, 2018.

[45] S. Kumar, J. Ying, J. V. d. M. Cardoso, and D. Palomar, “A unified framework for structured graph learning
via spectral constraints,” arXiv preprint arXiv:1904.09792, 2019.

[46] B. Mohar, “Some applications of Laplace eigenvalues of graphs,” in Graph Symmetry: Algebraic Methods
and Applications. Springer, 1997, pp. 225–275.

[47] D. Cvetković, P. Rowlinson, and S. K. Simić, “Signless Laplacians of finite graphs,” Linear Algebra and its
Applications, vol. 423, no. 1, pp. 155–171, 2007.

[48] P. Ravikumar, M. J. Wainwright, J. D.
Lafferty et al., “High-dimensional Ising model selection using
ℓ1-regularized logistic regression,” The Annals of Statistics, vol. 38, no. 3, pp. 1287–1319, 2010.

[49] M. Drton and T. S. Richardson, “Graphical methods for efficient likelihood inference in Gaussian covariance
models,” Journal of Machine Learning Research, vol. 9, no. May, pp. 893–914, 2008.

[50] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky, “Latent variable graphical model selection via convex
optimization,” in 2010 48th Annual Allerton Conference on Communication, Control, and Computing
(Allerton). IEEE, 2010, pp. 1610–1613.

[51] F. Nie, X. Wang, M. I. Jordan, and H. Huang, “The constrained Laplacian rank algorithm for graph-based
clustering,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[52] S. Basu, A. Shojaie, and G. Michailidis, “Network Granger causality with inherent grouping structure,” The
Journal of Machine Learning Research, vol. 16, no. 1, pp. 417–453, 2015.

[53] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The emerging field of signal
processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,”
IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.

[54] O. Teke and P. Vaidyanathan, “Uncertainty principles and sparse eigenvectors of graphs,” IEEE Transactions
on Signal Processing, vol. 65, no. 20, pp. 5406–5420, 2017.

[55] R. Mazumder, T. Hastie, and R. Tibshirani, “Spectral regularization algorithms for learning large
incomplete matrices,” J. Mach. Learn. Res., vol. 11, pp. 2287–2322, Aug. 2010. [Online]. Available:
http://dl.acm.org/citation.cfm?id=1756006.1859931

[56] A. Todeschini, F. Caron, and M.
Chavent, “Probabilistic low-rank matrix completion with adaptive spectral
regularization algorithms,” in Advances in Neural Information Processing Systems 26, C. J. C. Burges,
L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp.
845–853.

[57] A. Argyriou, M. Pontil, Y. Ying, and C. A. Micchelli, “A spectral regularization framework for multi-task
structure learning,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller,
Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 25–32.

[58] S. Yang and J. Zhu, “Bayesian matrix completion via adaptive relaxed spectral regularization,” in The 30th
AAAI Conference on Artificial Intelligence. AAAI, 2016.

[59] J. Ying, J.-F. Cai, D. Guo, G. Tang, Z. Chen, and X. Qu, “Vandermonde factorization of Hankel matrix
for complex exponential signal recovery — application in fast NMR spectroscopy,” IEEE Transactions on
Signal Processing, vol. 66, no. 21, pp. 5520–5533, 2018.

[60] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization
methods for nonsmooth optimization,” SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.

[61] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communi-
cations, and machine learning,” IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 794–816, Feb.
2016.

[62] J. Song, P. Babu, and D. P. Palomar, “Sparse generalized eigenvalue problem via smooth optimization,”
IEEE Transactions on Signal Processing, vol. 63, no. 7, pp. 1627–1642, 2015.

[63] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton
University Press, 2009.

[64] K. Benidis, Y. Sun, P. Babu, and D. P.
Palomar, “Orthogonal sparse PCA and covariance estimation via
Procrustes reformulation,” IEEE Transactions on Signal Processing, vol. 64, no. 23, pp. 6211–6226, 2016.

[65] D. N. Osherson, J. Stern, O. Wilkie, M. Stob, and E. E. Smith, “Default probability,” Cognitive Science,
vol. 15, no. 2, pp. 251–269, 1991.

[66] J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich,
C. Sander, J. M. Stuart, C. G. A. R. Network et al., “The cancer genome atlas pan-cancer analysis project,”
Nature Genetics, vol. 45, no. 10, p. 1113, 2013.

[67] B. Lake and J. Tenenbaum, “Discovering structure by learning sparse graphs,” in Proceedings of the 33rd
Annual Cognitive Science Conference, 2010.

[68] D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository,” 2017. [Online]. Available:
https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq#