{"title": "Semi-supervised Learning on Directed Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1633, "page_last": 1640, "abstract": null, "full_text": "Semi-supervised Learning on Directed Graphs\n\nDengyong Zhouy, Bernhard Sch\u00a4olkopfy, and Thomas Hofmannzy\n\nyMax Planck Institute for Biological Cybernetics\n\n72076 Tuebingen, Germany\n\nfdengyong.zhou, bernhard.schoelkopfg@tuebingen.mpg.de\n\nzDepartment of Computer Science, Brown University\n\nProvidence, RI 02912 USA\n\nth@cs.brown.edu\n\nAbstract\n\nGiven a directed graph in which some of the nodes are labeled, we inves-\ntigate the question of how to exploit the link structure of the graph to infer\nthe labels of the remaining unlabeled nodes. To that extent we propose a\nregularization framework for functions de(cid:2)ned over nodes of a directed\ngraph that forces the classi(cid:2)cation function to change slowly on densely\nlinked subgraphs. A powerful, yet computationally simple classi(cid:2)cation\nalgorithm is derived within the proposed framework. The experimental\nevaluation on real-world Web classi(cid:2)cation problems demonstrates en-\ncouraging results that validate our approach.\n\n1\n\nIntroduction\n\nWe consider semi-supervised classi(cid:2)cation problems on weighted directed graphs, in which\nsome nodes in the graph are labeled as positive or negative, and where the task consists in\nclassifying unlabeled nodes. Typical examples of this kind are Web page categorization\nbased on hyperlink structure [4, 11] and document classi(cid:2)cation or recommendation based\non citation graphs [10], yet similar problems exist in other domains such as computational\nbiology. For the sake of concreteness, we will mainly focus on the Web graph in the sequel,\ni.e. the considered graph represents a subgraph of the Web, where nodes correspond to Web\npages and directed edges represent hyperlinks between them (cf. 
[3]).
We refrain from utilizing attributes or features associated with each node, which may or may not be available in applications, and rather focus on the analysis of the connectivity of the graph as a means for classifying unlabeled nodes. Such an approach inevitably needs to make some a priori premise about how connectivity and categorization of individual nodes may be related in real-world graphs. The fundamental assumption of our framework is the category similarity of co-linked nodes in a directed graph. This is a slightly more complex concept than in the case of undirected (weighted) graphs [1, 18, 12, 15, 17], where a typical assumption is that an edge connecting two nodes increases the likelihood of the nodes belonging to the same category. Co-linkage, on the other hand, seems a more suitable and promising concept in directed graphs, as is witnessed by its successful use in Web page categorization [4] as well as in co-citation analysis for information retrieval [10]. Notice that co-linkage comes in two flavors: sibling structures, i.e. nodes with common parents, and co-parent structures, i.e. nodes with common children. In most Web and citation graph related applications, the first assumption, namely that nodes with highly overlapping parent sets are likely to belong to the same category, seems to be more relevant (cf. [4]), but in general this will depend on the specific application.
One possible way of designing classifiers based on graph connectivity is to construct a kernel matrix based on pairwise links [11] and then to adopt a standard kernel method, e.g. Support Vector Machines (SVMs) [16], as the learning algorithm. However, a kernel matrix as the one proposed in [11] only represents local relationships among nodes, but completely ignores the global structure of the graph.
The idea of exploiting global rather than local graph structure is widely used in other Web-related techniques, including Web page ranking [2, 13], finding similar Web pages [7], detecting Web communities [13, 9], and so on. The major innovation of this paper is a general regularization framework on directed graphs, in which the directionality and global relationships are considered, and a computationally attractive classification algorithm, which is derived from the proposed regularization framework.

2 Regularization Framework

2.1 Preliminaries

A directed graph $\Gamma = (V, E)$ consists of a set of vertices, denoted by $V$, and a set of edges, denoted by $E \subseteq V \times V$. Each edge is an ordered pair of nodes $[u, v]$ representing a directed connection from $u$ to $v$. We do not allow self-loops, i.e. $[v, v] \notin E$ for all $v \in V$. In a weighted directed graph, a weight function $w : V \times V \to \mathbb{R}^+$ is associated with $\Gamma$, satisfying $w([u, v]) = 0$ if and only if $[u, v] \notin E$. Typically, we can equip a directed graph with a canonical weight function by defining $w([u, v]) \equiv 1$ if and only if $[u, v] \in E$. The in-degree $p(v)$ and out-degree $q(v)$ of a vertex $v \in V$ are defined, respectively, as

$$p(v) \equiv \sum_{\{u \mid [u, v] \in E\}} w([u, v]), \quad \text{and} \quad q(v) \equiv \sum_{\{u \mid [v, u] \in E\}} w([v, u]). \quad (1)$$

Let $H(V)$ denote the space of functions $f : V \to \mathbb{R}$, which assign a real value $f(v)$ to each vertex $v$. A function $f$ can be represented as a column vector in $\mathbb{R}^{|V|}$, where $|V|$ denotes the number of vertices in $V$.
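As an aside, the degree definitions in (1) can be read off directly from a weighted adjacency matrix. The following sketch (assuming numpy; the small four-node graph is a hypothetical example, not from the paper) computes $p$ and $q$ as column and row sums:

```python
import numpy as np

# Weighted adjacency matrix of a small directed graph:
# W[u, v] = w([u, v]); zero entries mean "no edge".
# Edges: 0->1, 0->2, 1->3, 2->3 (canonical unit weights).
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])

# In-degree p(v): total weight of edges pointing into v (column sums).
p = W.sum(axis=0)
# Out-degree q(v): total weight of edges leaving v (row sums).
q = W.sum(axis=1)

print(p.tolist())  # [0.0, 1.0, 1.0, 2.0]
print(q.tolist())  # [2.0, 1.0, 1.0, 0.0]
```

With the canonical weight function these sums reduce to ordinary in- and out-link counts.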
The function space $H(V)$ can be endowed with the usual inner product:

$$\langle f, g \rangle = \sum_{v} f(v) g(v). \quad (2)$$

Accordingly, the norm of a function induced by the inner product is $\|f\| = \sqrt{\langle f, f \rangle}$.

2.2 Bipartite Graphs

A bipartite graph $G = (H, A, L)$ is a special type of directed graph that consists of two sets of vertices, denoted by $H$ and $A$ respectively, and a set of edges (or links), denoted by $L \subseteq H \times A$. In a bipartite graph, each edge connects a vertex in $H$ to a vertex in $A$. Any directed graph $\Gamma = (V, E)$ can be regarded as a bipartite graph using the following simple construction [14]: $H \equiv \{h \mid h \in V, q(h) > 0\}$, $A \equiv \{a \mid a \in V, p(a) > 0\}$, and $L \equiv E$. Figure 1 depicts the construction of the bipartite graph. Notice that vertices of the original graph $\Gamma$ may appear in both vertex sets $H$ and $A$ of the constructed bipartite graph.

Figure 1: Constructing a bipartite graph from a directed one. Left: directed graph. Right: bipartite graph. The hub set H = {1, 3, 5, 6} and the authority set A = {2, 3, 4}. Notice that the vertex indexed by 3 is simultaneously in the hub and authority set.

The intuition behind the construction of the bipartite graph is provided by the so-called hub and authority Web model introduced by Kleinberg [13]. The model distinguishes between two types of Web pages: authoritative pages, which are pages relevant to some topic, and hub pages, which are pages pointing to relevant pages. Note that some Web pages can simultaneously be both hub and authority pages (see Figure 1). Hubs and authorities exhibit a mutual reinforcement relationship: a good hub node points to many good authorities, and a good authority node is pointed to by many good hubs. It is interesting to note that in general there is no direct link from one authority to another.
It is the hub pages that glue together authorities on a common topic.
According to Kleinberg's model, we suggestively call the vertex set $H$ in the bipartite graph the hub set, and the vertex set $A$ the authority set.

2.3 Smoothness Functionals

If two distinct vertices $u$ and $v$ in the authority set $A$ are co-linked by a vertex $h$ in the hub set $H$, as shown in the left panel of Figure 2, then we think that $u$ and $v$ are likely to be related, and the co-linkage strength induced by $h$ between $u$ and $v$ can be measured by

$$c_h([u, v]) = \frac{w([h, u]) \, w([h, v])}{q(h)}. \quad (3)$$

In addition, we define $c_h(v, v) = 0$ for all $v$ in the authority set $A$ and for all $h$ in the hub set $H$. Such a relevance measure can be naturally understood in the setting of citation networks. If two articles are simultaneously cited by some other article, then this should make it more likely that both articles deal with a similar topic. Moreover, the more articles cite both articles together, the more significant the connection. A natural question arising in this context is why the relevance measure is further normalized by out-degree. Let us consider the following two Web sites: Yahoo! and kernel machines. General interest portals like Yahoo! consist of pages having a large number of diverse hyperlinks. The fact that two Web pages are co-linked by Yahoo! does not establish a significant connection between them. In contrast, the pages on the kernel machines Web site have much fewer hyperlinks, but the Web pages pointed to are closely related in topic.

Figure 2: Link and relevance. Left panel: vertices u and v in the authority set A are co-linked by vertex h in the hub set H. Right panel: vertices u and v in the hub set H co-link vertex a in the authority set A.

Let $f$ denote a function defined on the authority set $A$. The smoothness of the function $f$ can be measured by the following functional:

$$\Omega_A(f) = \frac{1}{2} \sum_{u, v} \sum_{h} c_h([u, v]) \left( \frac{f(u)}{\sqrt{p(u)}} - \frac{f(v)}{\sqrt{p(v)}} \right)^2. \quad (4)$$

The smoothness functional penalizes large differences in function values for vertices in the authority set $A$ that are strongly related. Notice that the function values are normalized by in-degree. For the Web graph, the explanation is similar to the one given before. Many Web pages contain links to popular sites like the Google search engine. This does not mean, though, that all these Web pages share a common topic. However, if two Web pages point to a Web page like the one of the Learning with Kernels book, they are likely to express a common interest in kernel methods.
Now define a linear operator $T : H(A) \to H(H)$ by

$$(Tf)(h) = \sum_{a} \frac{w([h, a])}{\sqrt{q(h) \, p(a)}} f(a). \quad (5)$$

Then its adjoint $T^* : H(H) \to H(A)$ is given by

$$(T^* f)(a) = \sum_{h} \frac{w([h, a])}{\sqrt{q(h) \, p(a)}} f(h). \quad (6)$$

These two operators $T$ and $T^*$ were also implicitly suggested by [8] for developing a new Web page ranking algorithm. Further define the operator $S_A : H(A) \to H(A)$ by composing $T$ and $T^*$, i.e.

$$S_A = T^* T, \quad (7)$$

and the operator $\Delta_A : H(A) \to H(A)$ by

$$\Delta_A = I - S_A, \quad (8)$$

where $I$ denotes the identity operator. Then we can show the following (see Appendix A for the proof):
Proposition 1. $\Omega_A(f) = \langle f, \Delta_A f \rangle$.
Comparing with the combinatorial Laplace operator defined on undirected graphs [5], we can think of the operator $\Delta_A$ as a Laplacian defined on the authority set of directed graphs.
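Proposition 1 can be checked numerically on a toy graph. The sketch below (assuming numpy; the graph and function values are illustrative) builds $T$ as a $|H| \times |A|$ matrix, forms $S_A = T^* T$, and compares $\langle f, (I - S_A) f \rangle$ against the explicit double sum (4):

```python
import numpy as np

# Directed graph with edges 0->1, 0->2, 1->3, 2->3.
W = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
p, q = W.sum(axis=0), W.sum(axis=1)
H = np.where(q > 0)[0]  # hub set: vertices with outgoing links
A = np.where(p > 0)[0]  # authority set: vertices with incoming links

# Operator T : H(A) -> H(H) as a |H| x |A| matrix,
# T[h, a] = w([h, a]) / sqrt(q(h) p(a)).
T = W[np.ix_(H, A)] / np.sqrt(np.outer(q[H], p[A]))
S_A = T.T @ T  # S_A = T* T on the authority set

f = np.array([1.0, -1.0, 2.0])  # an arbitrary function on A

# Smoothness via the operator form of Proposition 1.
omega_op = f @ (np.eye(len(A)) - S_A) @ f

# Smoothness via the explicit double sum (4), with c_h(v, v) = 0.
omega_sum = 0.0
for i, u in enumerate(A):
    for j, v in enumerate(A):
        c = sum(W[h, u] * W[h, v] / q[h] for h in H if u != v)
        omega_sum += 0.5 * c * (f[i] / np.sqrt(p[u]) - f[j] / np.sqrt(p[v])) ** 2

print(abs(omega_op - omega_sum) < 1e-9)  # True: the two forms agree
```

Note that vertices may appear in both `H` and `A`, exactly as in the bipartite construction of Section 2.2.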
Note that Proposition 1 also shows that the Laplacian $\Delta_A$ is positive semi-definite. In fact, we can further show that the eigenvalues of the operator $S_A$ are scattered in $[0, 1]$, and accordingly the eigenvalues of the Laplacian $\Delta_A$ fall into $[0, 1]$.
Similarly, if two distinct vertices $u$ and $v$ co-link a vertex $a$ in the authority set $A$, as shown in the right panel of Figure 2, then $u$ and $v$ are also thought to be related. The co-linkage strength between $u$ and $v$ induced by $a$ can be measured by

$$c_a([u, v]) = \frac{w([u, a]) \, w([v, a])}{p(a)}, \quad (9)$$

and the smoothness of a function $f$ on the hub set $H$ can be measured by

$$\Omega_H(f) = \frac{1}{2} \sum_{u, v} \sum_{a} c_a([u, v]) \left( \frac{f(u)}{\sqrt{q(u)}} - \frac{f(v)}{\sqrt{q(v)}} \right)^2. \quad (10)$$

As before, one can define the operators $S_H = T T^*$ and $\Delta_H = I - S_H$, leading to the corresponding statement:
Proposition 2. $\Omega_H(f) = \langle f, \Delta_H f \rangle$.

Convexly combining the two smoothness functionals (4) and (10), we obtain a smoothness measure for functions $f$ defined on the whole vertex set $V$:

$$\Omega_\gamma(f) = \gamma \, \Omega_A(f) + (1 - \gamma) \, \Omega_H(f), \quad 0 \leq \gamma \leq 1, \quad (11)$$

where the parameter $\gamma$ weighs the relative importance of $\Omega_A(f)$ and $\Omega_H(f)$. Extend the operator $T$ to $H(V)$ by defining $(Tf)(v) = 0$ if $v$ is only in the authority set $A$ and not in the hub set $H$. Similarly, extend $T^*$ by defining $(T^* f)(v) = 0$ if $v$ is only in the hub set $H$ and not in the authority set $A$. Then, if the remaining operators are extended correspondingly, one can define the operator $S_\gamma : H(V) \to H(V)$ by

$$S_\gamma = \gamma S_A + (1 - \gamma) S_H, \quad (12)$$

and the Laplacian on directed graphs $\Delta_\gamma : H(V) \to H(V)$ by

$$\Delta_\gamma = I - S_\gamma. \quad (13)$$

Clearly, $\Delta_\gamma = \gamma \Delta_A + (1 - \gamma) \Delta_H$. By Propositions 1 and 2, it is easy to see that:
Proposition 3. $\Omega_\gamma(f) = \langle f, \Delta_\gamma f \rangle$.

2.4 Regularization

Define a function $y$ in $H(V)$ with $y(v) = 1$ or $-1$ if vertex $v$ is labeled as positive or negative, respectively, and $y(v) = 0$ if it is not labeled. The classification problem can be regarded as the problem of finding a function $f$ which reproduces the target function $y$ to a sufficient degree of accuracy while being smooth in the sense quantified by the above smoothness functional. A formalization of this idea leads to the following optimization problem:

$$f^* = \operatorname{argmin}_{f \in H(V)} \left\{ \Omega_\gamma(f) + \frac{\mu}{2} \|f - y\|^2 \right\}. \quad (14)$$

The final classification of a vertex $v$ is obtained as $\operatorname{sign} f^*(v)$. The first term in the brackets is called the smoothness term or regularizer, which measures the smoothness of the function $f$, and the second term is called the fitting term, which measures its closeness to the given function $y$. The trade-off between these two competing terms is captured by a positive parameter $\mu$. Successively smoother solutions $f^*$ are obtained as $\mu \to 0$.
Theorem 4. The solution $f^*$ of the optimization problem (14) satisfies

$$\Delta_\gamma f^* + \mu (f^* - y) = 0.$$

Proof. By Proposition 3, we have

$$(\Delta_\gamma f)(v) = \left. \frac{\partial \Omega_\gamma(f)}{\partial f} \right|_{v}.$$

Differentiating the cost function in the brackets of (14) with respect to the function $f$ completes the proof.
Corollary 5.
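The extension to $H(V)$ and the spectral claims above can be sanity-checked numerically. In this sketch (assuming numpy; the four-node graph is a hypothetical example), rows of $T$ with $q(v) = 0$ and columns with $p(u) = 0$ are zero, matching the extension convention:

```python
import numpy as np

# Two hubs (0, 1) both pointing at two authorities (2, 3).
W = np.zeros((4, 4))
W[0, 2] = W[0, 3] = W[1, 2] = W[1, 3] = 1.0
p, q = W.sum(axis=0), W.sum(axis=1)

# Extended operator T on H(V): safe divisors make rows with q(v) = 0
# and columns with p(u) = 0 vanish (the numerator W is already zero there).
qs = np.where(q > 0, q, 1.0)
ps = np.where(p > 0, p, 1.0)
T = W / np.sqrt(np.outer(qs, ps))

gamma = 0.5
S = gamma * (T.T @ T) + (1 - gamma) * (T @ T.T)  # S_gamma, eq. (12)
L = np.eye(4) - S                                 # Delta_gamma, eq. (13)

eig_S = np.linalg.eigvalsh(S)
eig_L = np.linalg.eigvalsh(L)
print(eig_S.min() >= -1e-12 and eig_S.max() <= 1 + 1e-12)  # spectrum of S in [0, 1]
print(eig_L.min() >= -1e-12)                               # Delta_gamma is PSD
```

Both checks print True on this graph, consistent with the positive semi-definiteness of the directed Laplacian.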
The solution $f^*$ of the optimization problem (14) is

$$f^* = (1 - \alpha)(I - \alpha S_\gamma)^{-1} y, \quad \text{where } \alpha = 1/(1 + \mu).$$

It is worth noting that the closed-form solution presented in Corollary 5 has the same appearance as the algorithm proposed in [17], which operates on undirected graphs.

3 Experiments

We considered the Web page categorization task on the WebKB dataset [6]. We only addressed the subset containing pages from four universities: Cornell, Texas, Washington, and Wisconsin. We removed pages without incoming or outgoing links, resulting in 858, 825, 1195, and 1238 pages respectively, for a total of 4116. These pages were manually classified into the following seven categories: student, faculty, staff, department, course, project, and other. We investigated two different classification tasks. The first is used to illustrate the significance of connectivity information in classification, whereas the second one stresses the importance of preserving the directionality of edges. We could assign a weight to each hyperlink according to the textual content of Web pages or the anchor text contained in hyperlinks. However, here we are only interested in how much we can obtain from link structure alone and hence adopt the canonical weight function defined in Section 2.1.
We first study an extreme classification problem: predicting which university the pages belong to from very few labeled training examples. Since pages within a university are well-linked, and cross-links between different universities are rare, we can expect that few training labels are enough to classify pages correctly based on link information only. For each of the universities, we in turn viewed the corresponding pages as positive examples and the pages from the remaining universities as negative examples.
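Corollary 5 yields a one-line implementation. The sketch below (assuming numpy; the eight-node sibling graph, labels, and parameter values are illustrative, not from the paper's experiments) labels one authority in each of two co-linked pairs and lets the closed-form solution propagate the labels to the unlabeled siblings:

```python
import numpy as np

# Hubs 0, 1 both point at authorities 2, 3; hubs 4, 5 both point at 6, 7.
W = np.zeros((8, 8))
for u, v in [(0, 2), (0, 3), (1, 2), (1, 3), (4, 6), (4, 7), (5, 6), (5, 7)]:
    W[u, v] = 1.0

p, q = W.sum(axis=0), W.sum(axis=1)
qs, ps = np.where(q > 0, q, 1.0), np.where(p > 0, p, 1.0)
T = W / np.sqrt(np.outer(qs, ps))            # extended operator T on H(V)

gamma, alpha = 0.5, 0.9                       # illustrative parameter choices
S = gamma * (T.T @ T) + (1 - gamma) * (T @ T.T)

y = np.zeros(8)
y[2], y[6] = 1.0, -1.0                        # one positive, one negative label

# Closed-form solution of Corollary 5: f* = (1 - alpha)(I - alpha S)^(-1) y.
f = (1 - alpha) * np.linalg.solve(np.eye(8) - alpha * S, y)

# The unlabeled siblings 3 and 7 inherit the labels of 2 and 6
# through the shared-parent (co-linkage) structure.
print(np.sign(f[3]), np.sign(f[7]))  # 1.0 -1.0
```

Using `np.linalg.solve` rather than an explicit inverse is the usual numerically preferable way to apply $(I - \alpha S_\gamma)^{-1}$.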
We randomly drew two pages as training examples under the constraint that there is at least one labeled instance for each class. The parameters were set to $\gamma = 0.50$ and $\alpha = 0.95$. (In fact, in this experiment the tuning parameters have almost no influence on the result.) Since the Web graph is not connected, some small isolated subgraphs may not contain labeled instances. The values of our classifying function on the pages contained in these subgraphs will be zero, and we simply treat these pages as negative examples. This is consistent with the search engine ranking techniques [2, 13]. We compare our method with SVMs using a kernel matrix $K$ constructed as $K = W^T W$ [11], where $W$ denotes the adjacency matrix of the Web graph and $W^T$ denotes the transpose of $W$. The test errors averaged over 100 training sample sets for both our method and SVMs are summarized in the following table:

                 Cornell         Texas           Washington      Wisconsin
our method       0.03 (± 0.00)   0.02 (± 0.01)   0.01 (± 0.00)   0.02 (± 0.00)
SVMs             0.42 (± 0.03)   0.39 (± 0.03)   0.40 (± 0.02)   0.43 (± 0.02)

However, to be fair, we should state that the kernel matrix we used in the SVM may not be the best possible kernel matrix for this task; this is an ongoing research issue which is not the topic of the present paper.
The other investigated task is to discriminate the student pages in a university from the non-student pages in the same university. As a baseline we applied our regularization method to the undirected graph [17] obtained by treating links as undirected or bidirectional, i.e., the affinity matrix is defined to be $W^T + W$. We use AUC scores to measure the performance of the algorithms. The experimental results in Figures 3(a)-3(d) clearly demonstrate that taking the directionality of edges into account can yield substantial accuracy gains.
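For contrast, the baseline kernel $K = W^T W$ from [11] used in the comparison above is a purely local co-citation count; a minimal sketch (assuming numpy; the five-page graph is hypothetical):

```python
import numpy as np

# Pages 2 and 3 share two parents (0 and 1); page 4 also links to 2.
W = np.zeros((5, 5))
for u, v in [(0, 2), (0, 3), (1, 2), (1, 3), (4, 2)]:
    W[u, v] = 1.0

K = W.T @ W      # K[u, v] = weighted number of common parents of u and v
print(K[2, 3])   # 2.0 -- pages 2 and 3 are co-cited by hubs 0 and 1
print(K[2, 2])   # 3.0 -- page 2 has three in-links
```

Each entry depends only on immediate shared parents, which is exactly the local-versus-global distinction drawn in the introduction.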
In addition, we also studied the influence of different choices for the parameters $\gamma$ and $\alpha$; we used the Cornell Web for that purpose and sampled 10 labeled training pages. Figure 3(e) shows that relatively small values of $\alpha$ are more suitable. We think that this is because the subgraphs in each university are quite small, limiting the information conveyed in the graph structure. The influence of $\gamma$ is shown in Figure 3(f). The performance curve shows that large values of $\gamma$ are preferable. This confirms the conjecture that co-link structure among authority pages is much more important than that within the hub set.

[Figure 3 panels: AUC vs. number of labeled points for (a) Cornell, (b) Texas, (c) Washington, (d) Wisconsin, comparing directed (γ=1, α=0.10) and undirected (α=0.10) regularization; (e) AUC vs. α (Cornell); (f) AUC vs. γ (Cornell).]

Figure 3: Classification on the WebKB dataset. Figures (a)-(d) depict the AUC scores of the directed and undirected regularization methods on the classification problem student vs. non-student in each university.
Figures (e)-(f) illustrate the influence of different choices of the parameters $\alpha$ and $\gamma$.

4 Conclusions

We proposed a general regularization framework on directed graphs, which has been validated on a real-world Web dataset. The remaining problem is how to choose suitable values for the parameters contained in this approach. In addition, it is worth noticing that this framework can be applied without any essential changes to bipartite graphs, e.g. to graphs describing customers' purchase behavior in market basket analysis. Moreover, in the absence of labeled instances, this framework can be utilized in an unsupervised setting as a (spectral) clustering method for directed or bipartite graphs. Due to lack of space, we have not been able to give a thorough discussion of these topics.

Acknowledgments. We would like to thank David Gondek for his help with this work.

References
[1] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and regression on large graphs. In COLT, 2004.
[2] S. Brin and L. Page. The anatomy of a large scale hypertextual web search engine. In Proc. 7th Intl. WWW Conf., 1998.
[3] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the Web. In Proc. 9th Intl. WWW Conf., 2000.
[4] S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In Proc. ACM SIGMOD Conf., 1998.
[5] F. Chung. Spectral Graph Theory. Number 92 in Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[6] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the World Wide Web. In Proc. 15th National Conf. on Artificial Intelligence, 1998.
[7] J. Dean and M. Henzinger. Finding related Web pages in the World Wide Web. In Proc. 8th Intl. WWW Conf., 1999.
[8] C. Ding, X. He, P. Husbands, H. Zha, and H. D. Simon. PageRank, HITS and a unified framework for link analysis. In Proc. 25th ACM SIGIR Conf., 2001.
[9] G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of Web communities. IEEE Computer, 35(3):66-71, 2002.
[10] C. Lee Giles, K. Bollacker, and S. Lawrence. CiteSeer: An automatic citation indexing system. In Proc. 3rd ACM Conf. on Digital Libraries, 1998.
[11] T. Joachims, N. Cristianini, and J. Shawe-Taylor. Composite kernels for hypertext categorisation. In ICML, 2001.
[12] T. Joachims. Transductive learning via spectral graph partitioning. In ICML, 2003.
[13] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999.
[14] R. Lempel and S. Moran. SALSA: the stochastic approach for link-structure analysis. ACM Transactions on Information Systems, 19(2):131-160, 2001.
[15] A. Smola and R. Kondor. Kernels and regularization on graphs. In Learning Theory and Kernel Machines. Springer-Verlag, Berlin-Heidelberg, 2003.
[16] V. N. Vapnik. Statistical Learning Theory. Wiley, NY, 1998.
[17] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS, 2003.
[18] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.

A Proof of Proposition 1

Expanding the right side of Eq. (4):

$$\Omega_A(f) = \sum_{u,v} \sum_h c_h([u, v]) \left( \frac{f^2(u)}{p(u)} - \frac{f(u) f(v)}{\sqrt{p(u) p(v)}} \right) = \sum_u \left( \sum_v \sum_h c_h([u, v]) \right) \frac{f^2(u)}{p(u)} - \sum_{u,v} \sum_h c_h([u, v]) \frac{f(u) f(v)}{\sqrt{p(u) p(v)}}. \quad (15)$$

By substituting Eq. (3) and using $\sum_v w([h, v]) = q(h)$ and $\sum_h w([h, u]) = p(u)$, the first term in the above equality can be rewritten as

$$\sum_u \left( \sum_v \sum_h \frac{w([h, u]) \, w([h, v])}{q(h)} \right) \frac{f^2(u)}{p(u)} = \sum_u \left( \sum_h \frac{w([h, u])}{p(u)} \right) f^2(u) = \sum_u f^2(u). \quad (16)$$

In addition, the second term in Eq. (15) can be transformed into

$$\sum_{u,v} \sum_h \frac{w([h, u]) \, w([h, v])}{q(h)} \frac{f(u) f(v)}{\sqrt{p(u) p(v)}} = \sum_{u,v} \sum_h f(u) \frac{w([h, u])}{\sqrt{q(h) p(u)}} \frac{w([h, v])}{\sqrt{q(h) p(v)}} f(v). \quad (17)$$

Substituting Eqs. (16) and (17) into (15), we have

$$\Omega_A(f) = \sum_u f^2(u) - \sum_{u,v} \sum_h f(u) \frac{w([h, u])}{\sqrt{q(h) p(u)}} \frac{w([h, v])}{\sqrt{q(h) p(v)}} f(v) = \langle f, (I - S_A) f \rangle = \langle f, \Delta_A f \rangle. \quad (18)$$

This completes the proof.", "award": [], "sourceid": 2718, "authors": [{"given_name": "Dengyong", "family_name": "Zhou", "institution": null}, {"given_name": "Thomas", "family_name": "Hofmann", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}