{"title": "Learning with Hypergraphs: Clustering, Classification, and Embedding", "book": "Advances in Neural Information Processing Systems", "page_first": 1601, "page_last": 1608, "abstract": null, "full_text": "Learning with Hypergraphs: Clustering,\n\nClassi\ufb01cation, and Embedding\n\nDengyong Zhou\u2020, Jiayuan Huang\u2021, and Bernhard Sch\u00a8olkopf\u00a7\n\n\u2020NEC Laboratories America, Inc.\n\n4 Independence Way, Suite 200, Princeton, NJ 08540, USA\n\n\u2021School of Computer Science, University of Waterloo\n\nWaterloo ON, N2L3G1, Canada\n\n\u00a7Max Planck Institute for Biological Cybernetics\n\n{dengyong.zhou, jiayuan.huang, bernhard.schoelkopf}@tuebingen.mpg.de\n\nSpemannstr. 38, 72076 T\u00a8ubingen, Germany\n\nAbstract\n\nWe usually endow the investigated objects with pairwise relationships,\nwhich can be illustrated as graphs. In many real-world problems, however,\nrelationships among the objects of our interest are more complex than pair-\nwise. Naively squeezing the complex relationships into pairwise ones will\ninevitably lead to loss of information which can be expected valuable for\nour learning tasks however. Therefore we consider using hypergraphs in-\nstead to completely represent complex relationships among the objects of\nour interest, and thus the problem of learning with hypergraphs arises. Our\nmain contribution in this paper is to generalize the powerful methodology\nof spectral clustering which originally operates on undirected graphs to hy-\npergraphs, and further develop algorithms for hypergraph embedding and\ntransductive classi\ufb01cation on the basis of the spectral hypergraph cluster-\ning approach. Our experiments on a number of benchmarks showed the\nadvantages of hypergraphs over usual graphs.\n\n1\n\nIntroduction\n\nIn machine learning problem settings, we generally assume pairwise relationships among the\nobjects of our interest. 
An object set endowed with pairwise relationships can be naturally illustrated as a graph, in which the vertices represent the objects and any two vertices that stand in some relationship are joined by an edge. The graph is undirected or directed according to whether the pairwise relationships among the objects are symmetric or not. A finite set of points in Euclidean space together with a kernel matrix is a typical example of an undirected graph. As for directed graphs, a well-known instance is the World Wide Web: a hyperlink can be thought of as a directed edge because, given an arbitrary hyperlink, we cannot expect an inverse one to exist; that is, hyperlink-based relationships are asymmetric [20].\n\nIn many real-world problems, however, representing a set of complex relational objects as an undirected or directed graph is not complete. To illustrate this point, consider the problem of grouping a collection of articles into different topics. Given an article, assume the only information we have is who wrote it. One may construct an undirected graph in which two vertices are joined by an edge if their corresponding articles have at least one author in common (Figure 1), and then apply an undirected graph based clustering approach, e.g. spectral graph techniques [7, 11, 16]. The undirected graph may be further embellished by assigning to each edge a weight equal to the\n\nFigure 1: Hypergraph vs. simple graph. Left: an author set E = {e1, e2, e3} and an article set V = {v1, v2, v3, v4, v5, v6, v7}. The entry (vi, ej) is set to 1 if ej is an author of article vi, and 0 otherwise. Middle: an undirected graph in which two articles are joined by an edge if they have at least one author in common. This graph cannot tell us whether the same person is the author of three or more articles or not. 
Right: a hypergraph which completely illustrates the complex relationships among authors and articles.\n\nnumber of authors in common. The above method may sound natural, but within its graph representation we obviously lose the information on whether the same person co-wrote three or more articles. Such information loss is undesirable, because articles by the same person are likely to belong to the same topic, and hence this information is useful for our grouping task.\n\nA natural way of remedying this information loss is to represent the data as a hypergraph instead. A hypergraph is a graph in which an edge can connect more than two vertices [2]. In other words, an edge is a subset of vertices. In what follows, we shall uniformly refer to the usual undirected or directed graphs as simple graphs; moreover, unless otherwise stated, the simple graphs we refer to are undirected. Obviously, a simple graph is a special kind of hypergraph in which each edge contains exactly two vertices. In the article clustering problem stated above, it is quite straightforward to construct a hypergraph with the vertices representing the articles and the edges the authors (Figure 1): each edge contains all articles by its corresponding author. Beyond that, we can put positive weights on the edges to encode any prior knowledge we have about the authors' work. For instance, for a person working on a broad range of fields, we may assign a relatively small weight to the corresponding edge.\n\nNow we can completely represent the complex relationships among objects by using hypergraphs. However, a new problem arises: how to partition a hypergraph? This is the main problem that we want to solve in this paper. A powerful technique for partitioning simple graphs is spectral clustering. 
Therefore, we generalize spectral clustering techniques to hypergraphs, more specifically, the normalized cut approach of [16]. Moreover, as in the case of simple graphs, a real-valued relaxation of the hypergraph normalized cut criterion leads to the eigendecomposition of a positive semidefinite matrix, which can be regarded as an analogue of the so-called Laplacian for simple graphs (cf. [5]); hence we suggestively call it the hypergraph Laplacian. Consequently, we develop algorithms for hypergraph embedding and transductive inference based on the hypergraph Laplacian.\n\nThere is in fact a large body of literature on hypergraph partitioning, arising from a variety of practical problems, such as partitioning circuit netlists [11], clustering categorical data [9], and image segmentation [1]. Unlike the present work, however, those methods generally transform hypergraphs into simple graphs, using either the heuristic we discussed at the beginning or other domain-specific heuristics, and then apply simple graph based spectral clustering techniques. [9] proposed an iterative approach that was indeed designed for hypergraphs; nevertheless it is not a spectral method. In addition, [6] and [17] considered propagating label distributions on hypergraphs.\n\nThe structure of the paper is as follows. We first introduce some basic notions on hypergraphs in Section 2. In Section 3, we generalize the simple graph normalized cut to hypergraphs. As shown in Section 4, the hypergraph normalized cut has an elegant probabilistic interpretation based on a random walk naturally associated with a hypergraph. In Section 5, we introduce the real-valued relaxation used to approximately obtain hypergraph normalized cuts, and the hypergraph Laplacian derived from this relaxation. In Section 6, we develop a spectral hypergraph embedding technique based on the hypergraph Laplacian. 
In Section 7, we address transductive inference on hypergraphs, that is, classifying the vertices of a hypergraph provided that some of its vertices have been labeled. Experimental results are shown in Section 8, and we conclude this paper in Section 9.\n\n2 Preliminaries\n\nLet V denote a finite set of objects, and let E be a family of subsets e of V such that ∪_{e∈E} e = V. Then we call G = (V, E) a hypergraph with the vertex set V and the hyperedge set E. A hyperedge containing just two vertices is a simple graph edge. A weighted hypergraph is a hypergraph that has a positive number w(e) associated with each hyperedge e, called the weight of hyperedge e. We denote a weighted hypergraph by G = (V, E, w). A hyperedge e is said to be incident with a vertex v when v ∈ e. For a vertex v ∈ V, its degree is defined by d(v) = Σ_{e∈E : v∈e} w(e). Given an arbitrary set S, let |S| denote the cardinality of S. For a hyperedge e ∈ E, its degree is defined to be δ(e) = |e|. We say that there is a hyperpath between vertices v1 and vk when there is an alternating sequence of distinct vertices and hyperedges v1, e1, v2, e2, ..., e_{k−1}, vk such that {vi, vi+1} ⊆ ei for 1 ≤ i ≤ k−1. A hypergraph is connected if there is a hyperpath between every pair of vertices. In what follows, the hypergraphs we consider are always assumed to be connected. A hypergraph G can be represented by a |V| × |E| matrix H with entries h(v, e) = 1 if v ∈ e and 0 otherwise, called the incidence matrix of G. 
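As a concrete illustration of these definitions, the incidence matrix and the two degree notions can be computed directly. The incidence pattern below is a small made-up example in the spirit of Figure 1 (articles as vertices, authors as hyperedges), with unit hyperedge weights assumed:

```python
import numpy as np

# A small made-up incidence matrix in the spirit of Figure 1:
# 7 articles (vertices v1..v7), 3 authors (hyperedges e1..e3).
H = np.array([
    [1, 0, 0],   # v1 in e1
    [1, 0, 0],   # v2 in e1
    [1, 1, 0],   # v3 in e1 and e2
    [0, 1, 0],   # v4 in e2
    [0, 1, 1],   # v5 in e2 and e3
    [0, 0, 1],   # v6 in e3
    [0, 0, 1],   # v7 in e3
], dtype=float)
w = np.ones(3)               # unit hyperedge weights (an assumption)

d = H @ w                    # vertex degrees d(v) = sum of w(e) over e containing v
delta = H.T @ np.ones(7)     # hyperedge degrees delta(e) = |e|

print(d.tolist())            # [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]
print(delta.tolist())        # [3.0, 3.0, 3.0]
```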
Then d(v) = Σ_{e∈E} w(e) h(v, e) and δ(e) = Σ_{v∈V} h(v, e). Let Dv and De denote the diagonal matrices containing the vertex and hyperedge degrees respectively, and let W denote the diagonal matrix containing the weights of the hyperedges. Then the adjacency matrix A of the hypergraph G is defined as A = H W H^T − Dv, where H^T is the transpose of H.\n\n3 Normalized hypergraph cut\n\nFor a vertex subset S ⊂ V, let S^c denote the complement of S. A cut of a hypergraph G = (V, E, w) is a partition of V into two parts S and S^c. We say that a hyperedge e is cut if it is incident with the vertices in S and S^c simultaneously. Given a vertex subset S ⊂ V, define the hyperedge boundary ∂S of S to be the set of hyperedges which are cut, i.e. ∂S := {e ∈ E | e ∩ S ≠ ∅, e ∩ S^c ≠ ∅}, and define the volume vol S of S to be the sum of the degrees of the vertices in S, that is, vol S := Σ_{v∈S} d(v). Moreover, define the volume of ∂S by\n\nvol ∂S := Σ_{e∈∂S} w(e) |e ∩ S| |e ∩ S^c| / δ(e).    (1)\n\nClearly, we have vol ∂S = vol ∂S^c. The definition given by Equation (1) can be understood as follows. Let us imagine each hyperedge e as a clique, i.e. a fully connected subgraph. To avoid unnecessary confusion, we call the edges in such an imaginary subgraph the subedges, and we assign the same weight w(e)/δ(e) to all subedges. Then, when a hyperedge e is cut, |e ∩ S| |e ∩ S^c| subedges are cut, and hence a single term of the sum in Equation (1) is the sum of the weights over the subedges which are cut. Naturally, we seek a partition in which the connection among the vertices within the same cluster is dense while the connection between the two clusters is sparse. 
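The subedge reading of Equation (1) can be checked numerically: expanding each hyperedge into a clique whose subedges all carry weight w(e)/δ(e), the total weight of the cut subedges equals vol ∂S. The hypergraph below is a small made-up example:

```python
import numpy as np
from itertools import combinations

# Hypothetical hypergraph: vertices 0..4, three weighted hyperedges.
edges = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
w = [1.0, 2.0, 1.0]
S = {0, 1}                                   # one side of the cut
Sc = {2, 3, 4}

# vol(boundary of S) computed via Equation (1).
vol_boundary = sum(
    we * len(e & S) * len(e & Sc) / len(e)
    for e, we in zip(edges, w)
    if e & S and e & Sc
)

# The same quantity via the clique expansion: every subedge of a
# hyperedge e carries weight w(e)/delta(e); sum over cut subedges.
cut_subedges = sum(
    we / len(e)
    for e, we in zip(edges, w)
    for u, v in combinations(sorted(e), 2)
    if (u in S) != (v in S)
)

assert abs(vol_boundary - cut_subedges) < 1e-12
print(vol_boundary)
```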
Using the above introduced definitions, we may formalize this natural partition as\n\nargmin_{∅≠S⊂V} c(S) := vol ∂S · (1/vol S + 1/vol S^c).    (2)\n\nFor a simple graph, |e ∩ S| = |e ∩ S^c| = 1 and δ(e) = 2, so the right-hand side of Equation (2) reduces to the simple graph normalized cut [16] up to a factor 1/2. In what follows, we explain the hypergraph normalized cut in terms of random walks.\n\n4 Random walk explanation\n\nWe associate each hypergraph with a natural random walk which has the following transition rule. Given the current position u ∈ V, first choose a hyperedge e over all hyperedges incident with u, with probability proportional to w(e), and then choose a vertex v ∈ e uniformly at random. Obviously, this generalizes the natural random walk defined on simple graphs. Let P denote the transition probability matrix of this hypergraph random walk. Then each entry of P is\n\np(u, v) = Σ_{e∈E} w(e) (h(u, e)/d(u)) (h(v, e)/δ(e)).    (3)\n\nIn matrix notation, P = Dv^{−1} H W De^{−1} H^T. The stationary distribution π of the random walk is\n\nπ(v) = d(v)/vol V,    (4)\n\nwhich follows from\n\nΣ_{u∈V} π(u) p(u, v) = Σ_{u∈V} (d(u)/vol V) Σ_{e∈E} w(e) h(u, e) h(v, e) / (d(u) δ(e)) = (1/vol V) Σ_{e∈E} w(e) (h(v, e)/δ(e)) Σ_{u∈V} h(u, e) = (1/vol V) Σ_{e∈E} w(e) h(v, e) = d(v)/vol V.\n\nWe may rewrite c(S) as\n\nc(S) = (vol ∂S / vol V) (1/(vol S / vol V) + 1/(vol S^c / vol V)).\n\nFrom Equation (4), we have\n\nvol S / vol V = Σ_{v∈S} d(v)/vol V = Σ_{v∈S} π(v),    (5)\n\nthat is, the ratio vol S / vol V is the probability with which the random walk occupies some vertex in S. Moreover, from Equations (3) and (4), we have\n\nvol ∂S / vol V = Σ_{e∈∂S} (w(e)/vol V) |e ∩ S| |e ∩ S^c| / δ(e) = Σ_{e∈∂S} (w(e)/vol V) Σ_{u∈e∩S} Σ_{v∈e∩S^c} h(u, e) h(v, e) / δ(e) = Σ_{e∈∂S} Σ_{u∈e∩S} Σ_{v∈e∩S^c} (d(u)/vol V) (w(e) h(u, e)/d(u)) (h(v, e)/δ(e)) = Σ_{u∈S} Σ_{v∈S^c} Σ_{e∈E} π(u) (w(e) h(u, e)/d(u)) (h(v, e)/δ(e)) = Σ_{u∈S} Σ_{v∈S^c} π(u) p(u, v),    (6)\n\nthat is, the ratio vol ∂S / vol V is the probability with which one sees a jump of the random walk from S to S^c under the stationary distribution. From Equations (5) and (6), we can understand the hypergraph normalized cut criterion as follows: look for a cut such that the probability with which the random walk crosses different clusters is as small as possible, while the probability with which the random walk stays in the same cluster is as large as possible. It is worth pointing out that this random walk view is consistent with that for the simple graph normalized cut [13]. 
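The transition matrix of Equation (3) and the stationary distribution of Equation (4) can be verified numerically on any small example; the incidence matrix and hyperedge weights below are made up:

```python
import numpy as np

# Hypergraph random walk of Section 4: P = Dv^{-1} H W De^{-1} H^T.
# The incidence matrix and hyperedge weights here are illustrative.
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
w = np.array([1.0, 2.0])

d = H @ w                                  # vertex degrees
delta = H.sum(axis=0)                      # hyperedge degrees
P = np.diag(1 / d) @ H @ np.diag(w) @ np.diag(1 / delta) @ H.T

pi = d / d.sum()                           # pi(v) = d(v) / vol V
assert np.allclose(P.sum(axis=1), 1.0)     # each row is a distribution
assert np.allclose(pi @ P, pi)             # pi is stationary, as Eq. (4) claims
```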
The consistency means that our generalization of the normalized cut approach from simple graphs to hypergraphs is reasonable.\n\n5 Spectral hypergraph partitioning\n\nAs in [16], the combinatorial optimization problem given by Equation (2) is NP-complete, but it can be relaxed into the real-valued optimization problem\n\nargmin_{f∈R^{|V|}} (1/2) Σ_{e∈E} Σ_{u,v∈e} (w(e)/δ(e)) (f(u)/√d(u) − f(v)/√d(v))²\n\nsubject to Σ_{v∈V} f(v)² = 1, Σ_{v∈V} f(v) √d(v) = 0.\n\nWe define the matrices Θ = Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2} and Δ = I − Θ, where I denotes the identity matrix. Then it can be verified that\n\n(1/2) Σ_{e∈E} Σ_{u,v∈e} (w(e)/δ(e)) (f(u)/√d(u) − f(v)/√d(v))² = f^T Δ f.\n\nNote that this also shows that Δ is positive semidefinite. We can check that the smallest eigenvalue of Δ is 0, and its corresponding eigenvector is just √d. Therefore, from standard results in linear algebra, we know that the solution to the optimization problem is an eigenvector Φ of Δ associated with its smallest nonzero eigenvalue. Hence, the vertex set is clustered into the two parts S = {v ∈ V | Φ(v) ≥ 0} and S^c = {v ∈ V | Φ(v) < 0}. For a simple graph, the edge degree matrix De reduces to 2I. Thus\n\nΔ = I − (1/2) Dv^{−1/2} H W H^T Dv^{−1/2} = I − (1/2) Dv^{−1/2} (Dv + A) Dv^{−1/2} = (1/2) (I − Dv^{−1/2} A Dv^{−1/2}),\n\nwhich coincides with the simple graph Laplacian up to a factor of 1/2. So we suggestively call Δ the hypergraph Laplacian.\n\nAs in [20], where the spectral clustering methodology is generalized from undirected to directed simple graphs, we may consider generalizing the present approach to directed hypergraphs [8]. A directed hypergraph is a hypergraph in which each hyperedge e is an ordered pair (X, Y), where X ⊆ V is the tail of e and Y ⊆ V \\ X is the head. Directed hypergraphs have been used to model various practical problems from biochemical networks [15] to natural language parsing [12].\n\n6 Spectral hypergraph embedding\n\nAs in the simple graph case [4, 10], it is straightforward to extend the spectral hypergraph clustering approach to k-way partitioning. Denote a k-way partition by (V1, ..., Vk), where V1 ∪ V2 ∪ ... ∪ Vk = V and Vi ∩ Vj = ∅ for all i ≠ j. We may obtain a k-way partition by minimizing\n\nc(V1, ..., Vk) = Σ_{i=1}^{k} vol ∂Vi / vol Vi\n\nover all k-way partitions. Similarly, this combinatorial optimization problem can be relaxed into a real-valued one, whose solution can be any orthonormal basis of the linear space spanned by the eigenvectors of Δ associated with the k smallest eigenvalues.\n\nTheorem 1. Assume a hypergraph G = (V, E, w) with |V| = n. Denote the eigenvalues of the Laplacian Δ of G by λ1 ≤ λ2 ≤ ... ≤ λn. Define c_k(G) = min c(V1, ..., Vk), where the minimization is over all k-way partitions. Then Σ_{i=1}^{k} λi ≤ c_k(G).\n\nProof. Let ri be the n-dimensional vector defined by ri(v) = 1 if v ∈ Vi, and 0 otherwise. Then\n\nc(V1, ..., Vk) = Σ_{i=1}^{k} ri^T (Dv − H W De^{−1} H^T) ri / (ri^T Dv ri).\n\nDefine si = Dv^{1/2} ri and fi = si/‖si‖, where ‖·‖ denotes the usual Euclidean norm. 
Thus\n\nc(V1, ..., Vk) = Σ_{i=1}^{k} fi^T Δ fi = tr F^T Δ F,\n\nwhere F = [f1 ··· fk]. Clearly, F^T F = I. If we allow the elements of ri to take arbitrary continuous values rather than Boolean ones only, we have\n\nc_k(G) = min c(V1, ..., Vk) ≥ min_{F^T F = I} tr F^T Δ F = Σ_{i=1}^{k} λi.\n\nThe last equality follows from standard results in linear algebra. This completes the proof.\n\nThe above result also shows that the optimum of the real-valued optimization problem derived from the relaxation is a lower bound on that of the original combinatorial optimization problem. Unlike in 2-way partitioning, however, it is unclear how to utilize multiple eigenvectors simultaneously to obtain a k-way partition. Many heuristics have been proposed for the situation of simple graphs, and they can be applied here as well. Perhaps the most popular among them is the following [14]. First form a matrix X = [Φ1 ··· Φk], where the Φi's are the eigenvectors of Δ associated with the k smallest eigenvalues. Then the row vectors of X are regarded as the representations of the graph vertices in k-dimensional Euclidean space. The vectors corresponding to the vertices are generally expected to be well separated, and consequently we can obtain a good partition simply by running k-means on them once. [18] resorted to a semidefinite relaxation model for the k-way normalized cut, instead of the relatively loose spectral relaxation, and thereby obtained a more accurate solution. It sounds reasonable to expect that the improved solution will lead to improved clustering. As reported in [18], however, the expected improvement does not occur in practice.\n\n7 Transductive inference\n\nWe have established algorithms for spectral hypergraph clustering and embedding. Now we consider transductive inference on hypergraphs. 
Specifically, given a hypergraph G = (V, E, w) in which the vertices in a subset S ⊂ V have labels in L = {1, −1}, our task is to predict the labels of the remaining unlabeled vertices. Basically, we should try to assign the same label to all vertices contained in the same hyperedge. It is actually straightforward to derive a transductive inference approach from a clustering scheme. Let f : V → R denote a classification function, which assigns the label sign f(v) to a vertex v ∈ V. Given an objective functional Ω(·) from some clustering approach, one may choose a classification function by\n\nargmin_{f∈R^{|V|}} {R_emp(f) + µ Ω(f)},\n\nwhere R_emp(f) denotes a chosen empirical loss, such as the least squares loss or the hinge loss, and µ > 0 is the regularization parameter. Since normalized cuts are generally thought to be superior to mincuts, the transductive inference approach that we use in the later experiments is built on the above spectral hypergraph clustering method. Consequently, as shown in [20], with the least squares loss function, the classification function is finally given by f = (I − ξΘ)^{−1} y, where the elements of y denote the initial labels and ξ is a parameter in (0, 1). For a survey on transductive inference, we refer the reader to [21].\n\n8 Experiments\n\nAll datasets except a particular version of the 20-newsgroup one are from the UCI Machine Learning Repository. They are usually referred to as so-called categorical data. Specifically, each instance in those datasets is described by one or more attributes, and each attribute takes only a small number of values, each corresponding to a specific category. Attribute values cannot be naturally ordered linearly as numerical values can [9]. In our experiments, we constructed a hypergraph for each dataset, where attribute values were regarded as hyperedges. 
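A minimal sketch of this pipeline on a tiny made-up categorical dataset: each attribute value becomes a hyperedge, and the classification function f = (I − ξΘ)^{−1} y from Section 7 is applied. The data, labels, and the value of ξ are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

# Toy categorical dataset: 6 instances, 2 attributes. Each attribute
# value (e.g. "red") becomes one hyperedge containing the instances
# taking that value. Data, labels, and xi below are assumptions.
data = [("red", "round"), ("red", "round"), ("red", "square"),
        ("blue", "square"), ("blue", "square"), ("blue", "round")]
values = sorted({v for row in data for v in row})
H = np.array([[1.0 if v in row else 0.0 for v in values] for row in data])
w = np.ones(H.shape[1])                       # unit hyperedge weights

d = H @ w                                     # vertex degrees
delta = H.sum(axis=0)                         # hyperedge degrees
Dv_isqrt = np.diag(d ** -0.5)
Theta = Dv_isqrt @ H @ np.diag(w) @ np.diag(1 / delta) @ H.T @ Dv_isqrt

y = np.array([1.0, 0, 0, -1.0, 0, 0])         # two labeled vertices
xi = 0.5                                      # illustrative value in (0, 1)
f = np.linalg.solve(np.eye(len(y)) - xi * Theta, y)
print(np.round(f, 3))
```

The labels spread along shared hyperedges: the unlabeled "red" instance gets a positive score and the unlabeled "blue" instances negative ones, while the two symmetric mixed instances are ties in this deliberately symmetric toy example.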
The weights for all hyperedges were simply set to 1; how to choose suitable weights is certainly an important problem requiring further exploration. We also constructed a simple graph for each dataset, and the simple graph spectral clustering based approach [19] was then used as the baseline. Those simple graphs were constructed in the way discussed at the beginning of Section 1, which essentially defines pairwise relationships among the objects by the adjacency matrices of the hypergraphs. The first task we addressed is to embed the animals in the zoo dataset into Euclidean space. This dataset contains 100 animals with 17 attributes, including hair, feathers, eggs, milk, legs, tail, etc. The animals have been manually classified into 7 different categories. We embedded those animals into Euclidean space by using the eigenvectors of the hypergraph Laplacian associated with the smallest eigenvalues (Figure 2). For animals having the same attributes, we randomly chose one as their representative to put in the figures. It is apparent that those animals are well separated in their Euclidean representations. Moreover, it deserves a further look that seal and dolphin are significantly\n\nFigure 2: Embedding the zoo dataset. Left panel: the eigenvectors with the 2nd and 3rd smallest eigenvalues; right panel: the eigenvectors with the 3rd and 4th smallest eigenvalues. Note that dolphin is between class 1 (denoted by ◦), containing the animals having milk and living on land, and class 4, containing the animals living in the sea.\n\nFigure 3: Classification on complex relational data. (a) mushroom; (b) 20-newsgroup; (c) letter; (d) α (letter). (a)-(c) Results from both the hypergraph based approach and the simple graph based approach. 
(d) The influence of α in letter recognition with 100 labeled instances.\n\nmapped to positions between class 1, consisting of the animals having milk and living on land, and class 4, consisting of the animals living in the sea. A similar observation also holds for seasnake. The second task is classification on the mushroom dataset, which contains 8124 instances described by 22 categorical attributes, such as shape, color, etc. We removed the 11th attribute, which has missing values. Each instance is labeled as edible or poisonous; the two classes contain 4208 and 3916 instances respectively. The third task is text categorization on a modified 20-newsgroup dataset with binary occurrence values for 100 words across 16242 articles (see http://www.cs.toronto.edu/~roweis). The articles belong to 4 different topics corresponding to the highest level of the original 20 newsgroups, with sizes 4605, 3519, 2657 and 5461 respectively. The final task is to guess the letter categories in the letter dataset, in which each instance is described by 16 primitive numerical attributes (statistical moments and edge counts). We used a subset containing the instances of the letters from A to E, with sizes 789, 766, 736, 805 and 768 respectively. The experimental results of these classification tasks are shown in Figures 3(a)-3(c). The regularization parameter α is fixed at 0.1, and each test error is averaged over 20 trials. The results show that the hypergraph based method is consistently better than the baseline. The influence of α in the letter recognition task is shown in Figure 3(d). It is interesting that α influences the baseline much more than the hypergraph based approach.\n\n9 Conclusion\n\nWe generalized spectral clustering techniques to hypergraphs, and developed algorithms for hypergraph embedding and transductive inference. 
It is interesting to consider applying the present methodology to a broader range of practical problems. We are particularly interested in the following problems. One is biological network analysis [17]. Biological networks are mainly modeled as simple graphs so far. It might be more sensible to model them as hypergraphs instead such that complex interactions will be completely taken into account. The other is social network analysis. As recently pointed out by [3], many social transactions are supra-dyadic; they either involve more than two actors or they involve numerous aspects of the setting of interaction. So standard network techniques are not adequate in analyzing these networks. 
Consequently, they resorted to the concept of a hypergraph, and showed how the concept of network centrality can be adapted to hypergraphs.\n\nReferences\n\n[1] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In IEEE Conf. on Computer Vision and Pattern Recognition, 2005.\n\n[2] C. Berge. Hypergraphs. North-Holland, Amsterdam, 1989.\n\n[3] P. Bonacich, A.C. Holdren, and M. Johnston. Hyper-edges and multi-dimensional centrality. Social Networks, 26(3):189–203, 2004.\n\n[4] P.K. Chan, M.D.F. Schlag, and J. Zien. Spectral k-way ratio cut partitioning and clustering. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 13(9):1088–1096, 1994.\n\n[5] F. Chung. Spectral Graph Theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Society, Providence, RI, 1997.\n\n[6] A. Corduneanu and T. Jaakkola. Distributed information regularization on graphs. In Advances in Neural Information Processing Systems 17, Cambridge, MA, 2005. MIT Press.\n\n[7] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.\n\n[8] G. Gallo, G. Longo, and S. Pallottino. Directed hypergraphs and applications. Discrete Applied Mathematics, 42(2):177–201, 1993.\n\n[9] D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamical systems. VLDB Journal, 8(3-4):222–236, 2000.\n\n[10] M. Gu, H. Zha, C. Ding, X. He, and H. Simon. Spectral relaxation models and structure analysis for k-way graph clustering and bi-clustering. Technical Report CSE-01-007, Department of Computer Science and Engineering, Pennsylvania State University, 2001.\n\n[11] L. Hagen and A.B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 11(9):1074–1085, 1992.\n\n[12] D. Klein and C. 
Manning. Parsing and hypergraphs. In Proc. 7th Intl. Workshop on Parsing Technologies, 2001.\n\n[13] M. Meila and J. Shi. A random walks view of spectral segmentation. In Proc. 8th Intl. Workshop on Artificial Intelligence and Statistics, 2001.\n\n[14] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.\n\n[15] J.S. Oliveira, J.B. Jones-Oliveira, D.A. Dixon, C.G. Bailey, and D.W. Gull. Hyperdigraph-theoretic analysis of the EGFR signaling network: Initial steps leading to GTP:Ras complex formation. Journal of Computational Biology, 11(5):812–842, 2004.\n\n[16] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.\n\n[17] K. Tsuda. Propagating distributions on a hypergraph by dual information regularization. In Proc. 22nd Intl. Conf. on Machine Learning, 2005.\n\n[18] E.P. Xing and M.I. Jordan. On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Technical Report CSD-03-1265, Division of Computer Science, University of California, Berkeley, 2003.\n\n[19] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, Cambridge, MA, 2004. MIT Press.\n\n[20] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unlabeled data on a directed graph. In Proc. 22nd Intl. Conf. on Machine Learning, 2005.\n\n[21] X. Zhu. Semi-supervised learning literature survey. 
Technical Report Computer Sciences 1530, University of Wisconsin-Madison, 2005.", "award": [], "sourceid": 3128, "authors": [{"given_name": "Dengyong", "family_name": "Zhou", "institution": null}, {"given_name": "Jiayuan", "family_name": "Huang", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}