{"title": "Testing Unfaithful Gaussian Graphical Models", "book": "Advances in Neural Information Processing Systems", "page_first": 2681, "page_last": 2689, "abstract": "The global Markov property for Gaussian graphical models ensures graph separation implies conditional independence. Specifically if a node set $S$ graph separates nodes $u$ and $v$ then $X_u$ is conditionally independent of $X_v$ given $X_S$. The opposite direction need not be true, that is, $X_u \\perp X_v \\mid X_S$ need not imply $S$ is a node separator of $u$ and $v$. When it does, the relation $X_u \\perp X_v \\mid X_S$ is called faithful. In this paper we provide a characterization of faithful relations and then provide an algorithm to test faithfulness based only on knowledge of other conditional relations of the form $X_i \\perp X_j \\mid X_S$.", "full_text": "Testing Unfaithful Gaussian Graphical Models\n\nDe Wen Soh\n\nYale University\n\nSekhar Tatikonda\n\nYale University\n\nDepartment of Electrical Engineering\n\nDepartment of Electrical Engineering\n\n17 Hillhouse Ave, New Haven, CT 06511\n\ndewen.soh@yale.edu\n\n17 Hillhouse Ave, New Haven, CT 06511\n\nsekhar.tatikonda@yale.edu\n\nAbstract\n\nThe global Markov property for Gaussian graphical models ensures graph separa-\ntion implies conditional independence. Speci\ufb01cally if a node set S graph separates\nnodes u and v then Xu is conditionally independent of Xv given XS. The oppo-\nsite direction need not be true, that is, Xu \u22a5 Xv | XS need not imply S is a node\nseparator of u and v. When it does, the relation Xu \u22a5 Xv | XS is called faithful.\nIn this paper we provide a characterization of faithful relations and then provide\nan algorithm to test faithfulness based only on knowledge of other conditional\nrelations of the form Xi \u22a5 Xj | XS.\n\n1\n\nIntroduction\n\nGraphical models [1, 2, 3] are a popular and important means of representing certain conditional\nindependence relations between random variables. 
In a Gaussian graphical model, each variable is\nassociated with a node in a graph, and any two nodes are connected by an undirected edge if and\nonly if their two corresponding variables are not independent conditioned on the rest of the variables.\nAn edge between two nodes therefore corresponds directly to a non-zero entry of the precision\nmatrix \u2126 = \u03a3^{-1}, where \u03a3 is the covariance matrix of the multivariate Gaussian distribution in\nquestion. With the graphical model defined in this way, the Gaussian distribution satisfies the global\nMarkov property: for any pair of nodes i and j, if all paths between the two pass through a set of\nnodes S, then the variables associated with i and j are conditionally independent given the variables\nassociated with S.\nThe converse of the global Markov property does not always hold. When it does hold for a conditional\nindependence relation, that relation is called faithful. If it holds for all relations in a model,\nthat model is faithful. Faithfulness is important in structural estimation of graphical models, that is,\nidentifying the zeros of \u2126. It can be challenging to simply invert \u03a3. With faithfulness, to determine\nwhether there is an edge between nodes i and j, one could run through all possible separator sets S and test for\nconditional independence. If S is small, the computation becomes more accurate. In the work of\n[4, 5, 6, 7], different assumptions are used to bound the size of S to this end.\nThe main problem of faithfulness in graphical models is one of identifiability. Can we distinguish\nbetween a faithful graphical model and an unfaithful one? The idea of faithfulness was first explored\nfor conditional independence relations that were satisfied in a family of graphs, using the notion of\n\u03b8-Markov perfectness [8, 9]. For Gaussian graphical models with a tree topology, the distribution\nhas been shown to be faithful [10, 11]. 
In directed graphical models, the class of unfaithful distributions\nhas been studied in [12, 13]. In [14, 15], a notion of strong-faithfulness as a means of relaxing\nthe conditions of faithfulness is defined.\nIn this paper, we study the identifiability of a conditional independence relation. In [6], the authors\nrestrict their study of Gaussians to walk-summable ones. In [7], the authors restrict their class\nof distributions to loosely connected Markov random fields. These restrictions are such that the\nlocal conditional independence relations imply something about the global structure of the graphical\nmodel. In our discussion, we assume no such restrictions. We provide a testable condition for\nthe faithfulness of a conditional independence relation in a Gaussian undirected graphical model.\nChecking this condition requires only using other conditional independence relations in the graph.\nWe can think of these conditional independence relations as local patches of the covariance matrix\n\u03a3. To check if a local patch reflects the global graph (that is, a local patch is faithful) we have to\nmake use of other local patches. Our algorithm is the first algorithm, to the best of our knowledge,\nthat is able to distinguish between faithful and unfaithful conditional independence relations without\nany restrictions on the topology or assumptions on spatial mixing of the Gaussian graphical model.\nThis paper is structured as follows: In Section 2, we discuss some preliminaries. In Section 3, we\nstate our main theorem and proofs, as well as key lemmas used in the proofs. In Section 4, we lay out\nan algorithm that detects unfaithful conditional independence relations in Gaussian graphical models\nusing only local patches of the covariance matrix. We also describe a graph learning algorithm for\nunfaithful graphical models. 
In Section 5, we discuss possible future directions of research.\n\n2 Preliminaries\n\nWe \ufb01rst de\ufb01ne some linear algebra and graph notation. For a matrix M, let M T denote its transpose\nand let |M| denote its determinant. If I is a subset of its row indices and J a subset of its column\nindices, then we de\ufb01ne the submatrix M IJ as the |I| \u00d7 |J| matrix with elements with both row and\ncolumn indices from I and J respectively. If I = J, we use the notation M I for convenience. Let\nM (\u2212i,\u2212j) be the submatrix of M with the i-th row and j-th column deleted. Let M (\u2212I,\u2212J)\nbe the submatrix with rows with indices from I and columns with indices from J removed. In the\nsame way, for a vector v, we de\ufb01ne vI to be the subvector of v with indices from I. Similarly, we\nde\ufb01ne v(\u2212I) to be the subvector of v with indices not from I. For two vectors v and w, we denote\nthe usual dot product by v \u00b7 w.\nLet G = (W,E) be an undirected graph, where W = {1, . . . , n} is the set of nodes and E is the set\nof edges, namely, a subset of the set of all unordered pairs {(u, v) | u, v \u2208 W}. In our paper we are\ndealing with graphs that have no self-loops and no multiple edges between the same pair of nodes.\nFor I \u2286 W, we denote the induced subgraph on nodes I by GI. For any two distinct nodes u and v,\nwe say that the node set S \u2286 W \\ {u, v} is a node separator of u and v if all paths from u to v must\npass through some node in S.\nLet X = (X1, . . . , Xn) be a multivariate Gaussian distribution with mean \u00b5 and covariance matrix\n\u03a3. Let \u2126 = \u03a3\u22121 be the precision or concentration matrix of the graph. For any set S \u2282 W, we\nde\ufb01ne X S = {Xi | i \u2208 S}. We note here that \u03a3uv = 0 if and only if Xu is independent of Xv,\nwhich we denote by Xu \u22a5 Xv. If Xu is independent of Xv conditioned on some random variable\nZ, we denote this independence relation by Xu \u22a5 Xv | Z. 
Note that \u2126uv = 0 if and only if\nXu \u22a5 Xv | XW\\{u,v}.\nFor any set S \u2286 W, the conditional distribution of XW\\S given XS = xS follows a multivariate\nGaussian distribution with conditional mean $\\mu_{W \\setminus S} + \\Sigma_{(W \\setminus S)S} \\Sigma_S^{-1} (x_S - \\mu_S)$ and conditional\ncovariance matrix $\\Sigma_{W \\setminus S} - \\Sigma_{(W \\setminus S)S} \\Sigma_S^{-1} \\Sigma_{S(W \\setminus S)}$. For distinct nodes u, v \u2208 W and any set\nS \u2286 W \\ {u, v}, the following property easily follows.\nProposition 1 Xu \u22a5 Xv | XS if and only if $\\Sigma_{uv} = \\Sigma_{uS} \\Sigma_S^{-1} \\Sigma_{Sv}$.\nThe concentration graph G\u03a3 = (W,E) of a multivariate Gaussian distribution X is defined as\nfollows: We have node set W = {1, . . . , n}, with random variable Xu associated with node u, and\nedge set E where unordered pair (u, v) is in E if and only if \u2126uv \u2260 0. The multivariate Gaussian\ndistribution, along with its concentration graph, is also known as a Gaussian graphical model. Any\nGaussian graphical model satisfies the global Markov property, that is, if S is a node separator of\nnodes u and v in G\u03a3, then Xu \u22a5 Xv | XS. The converse is not necessarily true, and therefore, this\nmotivates us to define faithfulness in a graphical model.\nDefinition 1 The conditional independence relation Xu \u22a5 Xv | XS is said to be faithful if S is a\nnode separator of u and v in the concentration graph G\u03a3. Otherwise, it is unfaithful. A multivariate\nGaussian distribution is faithful if all its conditional independence relations are faithful. The\ndistribution is unfaithful if it is not faithful.\n\nFigure 1: Even though \u03a3S\u222a{u,v} is a submatrix of \u03a3, G\u03a3S\u222a{u,v} need not be a subgraph of G\u03a3.\nEdge properties do not translate as well. That means the local patch \u03a3S\u222a{u,v} need not reflect the\nedge properties of the global graph structure of \u03a3.\n\n
Example 1 (Example of an unfaithful Gaussian distribution) Consider the multivariate Gaussian\ndistribution X = (X1, X2, X3, X4) with zero mean and positive definite covariance matrix\n$$\\Sigma = \\begin{bmatrix} 3 & 2 & 1 & 2 \\\\ 2 & 4 & 2 & 1 \\\\ 1 & 2 & 7 & 1 \\\\ 2 & 1 & 1 & 6 \\end{bmatrix}. \\qquad (1)$$\nBy Proposition 1, we have X1 \u22a5 X3 | X2 since $\\Sigma_{13} = \\Sigma_{12} \\Sigma_{22}^{-1} \\Sigma_{23}$. However, the precision matrix\n\u2126 = \u03a3^{-1} has no zero entries, so the concentration graph is a complete graph. This means that\nnode 2 is not a node separator of nodes 1 and 3. The independence relation X1 \u22a5 X3 | X2 is thus\nnot faithful and the distribution X is not faithful as well.\nWe can think of the submatrix \u03a3S\u222a{u,v} as a local patch of the covariance matrix \u03a3. When Xu \u22a5\nXv | XS, nodes u and v are not connected by an edge in the concentration graph of the local patch\n\u03a3S\u222a{u,v}, that is, we have $(\\Sigma_{S \\cup \\{u,v\\}}^{-1})_{uv} = 0$. This does not imply that u and v are not connected\nin the concentration graph G\u03a3. If Xu \u22a5 Xv | XS is faithful, then the implication follows. If\nXu \u22a5 Xv | XS is unfaithful, then u and v may be connected in G\u03a3 (see Figure 1).\nFaithfulness is important in structural estimation, especially in high-dimensional settings. If we assume\nfaithfulness, then finding a node set S such that Xu \u22a5 Xv | XS would imply that there is no\nedge between u and v in the concentration graph. 
When we have access only to the sample covariance\ninstead of the population covariance matrix, if the size of S is small compared to n, the error\nof computing Xu \u22a5 Xv | XS is much less than the error of inverting the entire covariance matrix.\nThis method of searching through all possible node separator sets of a certain size is employed in\n[6, 7]. As mentioned before, these authors impose other restrictions on their models to overcome the\nproblem of unfaithfulness. We do not place any restriction on the Gaussian models. However, we\ndo not provide probabilistic bounds when dealing with samples, which they do.\n\n3 Main Result\n\nIn this section, we will state our main theoretical result. This result is the backbone for our algorithm\nthat differentiates a faithful conditional independence relation from an unfaithful one. Our main\ngoal is to decide if a conditional independence relation Xu \u22a5 Xv | XS is faithful or not. For\nconvenience, we will denote G\u03a3 simply by G = (W,E) for the rest of this paper. Now let us\nsuppose that it is faithful; S is a node separator for u and v in G. Then we should not be able to find\na path from u to v in the induced subgraph GW\\S. The main idea therefore is to search for a path\nbetween u and v in GW\\S. If this fails, then we know that the conditional independence relation is\nfaithful.\nBy the global Markov property, for any two distinct nodes i, j \u2208 W \\ S, if Xi \u22a5\u0338 Xj | XS, then we\nknow that there is a path between i and j in GW\\S. Thus, if we find some w \u2208 W \\ (S \u222a {u, v}) such\nthat Xu \u22a5\u0338 Xw | XS and Xv \u22a5\u0338 Xw | XS, then a path exists from u to w and another exists from\nv to w, so u and v are connected in GW\\S. This would imply that Xu \u22a5 Xv | XS is unfaithful.\nHowever, testing for paths this way does not necessarily rule out all possible paths in GW\\S. 
The\nproblem is that some paths may be obscured by other unfaithful conditional independence relations.\nThere may be some w whereby Xu \u22a5\u0338 Xw | XS and Xv \u22a5 Xw | XS, but the latter relation is\nunfaithful. This path from u to v through w is thus not detected by these two independence relations.\nWe will show, however, that if there is no path from u to v in GW\\S, then we cannot find a series of\ndistinct nodes w1, . . . , wt \u2208 W \\ (S \u222a {u, v}) for some natural number t > 0 such that Xu \u22a5\u0338 Xw1 |\nXS, Xw1 \u22a5\u0338 Xw2 | XS, . . ., Xwt\u22121 \u22a5\u0338 Xwt | XS, Xwt \u22a5\u0338 Xv | XS. This is to be expected because\nof the global Markov property. What is more surprising about our result is that the converse is true.\nIf we cannot find such nodes w1, . . . , wt, then u and v are not connected by a path in GW\\S. This\nmeans that if there is a path from u to v in GW\\S, even though it may be hidden by some unfaithful\nconditional independence relations, ultimately there are enough conditional dependence relations\nto reveal that u and v are connected by a path in GW\\S. This gives us an equivalent condition for\nfaithfulness that is in terms of conditional independence relations.\nNot being able to find a series of nodes w1, . . . , wt that form a string of conditional dependencies\nfrom u to v as described in the previous paragraph is equivalent to the following: we can find a\npartition (U, V) of W \\ S with u \u2208 U and v \u2208 V such that for all i \u2208 U and j \u2208 V, we have\nXi \u22a5 Xj | XS. Our main result uses the existence of this partition as a test for faithfulness.\nTheorem 1 Let X = (X1, . . . , Xn) be a Gaussian distribution with mean zero, covariance matrix\n\u03a3 and concentration matrix \u2126. Let u, v be two distinct elements of W and S \u2282 W \\ {u, v} such that\nXu \u22a5 Xv | XS. 
Then Xu \u22a5 Xv | XS is faithful if and only if there exists a partition of W \\ S into\ntwo disjoint sets U and V such that u \u2208 U, v \u2208 V, and Xi \u22a5 Xj | XS for any i \u2208 U and j \u2208 V.\nProof of Theorem 1. One direction is easy. Suppose Xu \u22a5 Xv | XS is faithful, so that S separates\nu and v in G. Let U be the set of all nodes reachable from u in GW\\S, including u. Let V =\nW \\ (S \u222a U). Then v \u2208 V since S separates u and v in G. Also, for any i \u2208 U and j \u2208 V, S\nseparates i and j in G, and by the global Markov property, Xi \u22a5 Xj | XS.\nNext, we prove the opposite direction. Suppose that there exists a partition of W \\ S into two sets\nU and V such that u \u2208 U, v \u2208 V, and Xi \u22a5 Xj | XS for any i \u2208 U and j \u2208 V. Our goal\nis to show that S separates u and v in the concentration graph G of X. Let \u2126W\\S = \u2126\u2032 where\nthe latter is the submatrix of the precision matrix \u2126. Let the h-th column vector of \u2126\u2032 be \u03c9(h), for\nh = 1, . . . , |W \\ S|.\nStep 1: We first solve the trivial case where |U| = |V| = 1. If |U| = |V| = 1, then S = W \\ {u, v},\nand trivially, Xu \u22a5 Xv | XW\\{u,v} implies S separates u and v, and we are done. Thus, we assume\nfor the rest of the proof that U and V are not both of size one.\nStep 2: We deal with a second trivial case in our proof, which is the case where \u03c9(i)(\u2212i) is identically\nzero for some i \u2208 U. In the case where i = u, we have \u2126uj = 0 for all j \u2208 W \\ (S \u222a {u}).\nThis implies that u is an isolated node in GW\\S, and so trivially, S must separate u and v, and we\nare done. In the case where i \u2260 u, we can manipulate the sets U and V so that \u03c9(i)(\u2212i) is not\nidentically zero for any i \u2208 U, i \u2260 u. 
If there is some i\u2032 \u2208 U, i\u2032 \u2260 u, such that Xi\u2032 \u22a5 Xh | XS\nfor all h \u2208 U, h \u2260 i\u2032, then we can simply move i\u2032 from U into V to form a new partition (U\u2032, V\u2032)\nof W \\ S. This new partition still satisfies u \u2208 U\u2032, v \u2208 V\u2032, and Xi \u22a5 Xj | XS for all i \u2208 U\u2032 and\nj \u2208 V\u2032. We can therefore shift nodes one by one over from U to V until either |U| = 1, or for any\ni \u2208 U, i \u2260 u, there exists an h \u2208 U, h \u2260 i, such that Xi \u22a5\u0338 Xh | XS. By the global Markov property, this\ncondition implies that every node i \u2208 U, i \u2260 u, is connected by a path to some other node in U, which\nmeans it must be connected to some node in W \\ (S \u222a {i}) by an edge. Thus, for all i \u2208 U, i \u2260 u, the\nvector \u03c9(i)(\u2212i) is non-zero.\nStep 3: We can express the conditional independence relations in terms of elements in the precision\nmatrix \u2126, since the topology of G can be read off the non-zero entries of \u2126. The proof of the\nfollowing Lemma 1 uses the matrix block inversion formula and we omit the proof due to space constraints.\nLemma 1 Xi \u22a5 Xj | XS if and only if |\u2126\u2032(\u2212i,\u2212j)| = 0.\nFrom Lemma 1, observe that the conditional independence relations Xi \u22a5 Xj | XS are all statements\nabout the cofactors of the matrix \u2126\u2032. It follows immediately from Lemma 1 that the vector\nsets {\u03c9(h)(\u2212i) : h \u2208 W \\ S, h \u2260 j} are linearly dependent for all i \u2208 U and j \u2208 V. Each of these\nvector sets consists of the i-th entry truncated column vectors of \u2126\u2032, with the j-th column vector\nexcluded. 
Assume that the matrix \u2126\u2032 is partitioned as follows,\n$$\\Omega' = \\begin{bmatrix} \\Omega_{UU} & \\Omega_{UV} \\\\ \\Omega_{VU} & \\Omega_{VV} \\end{bmatrix}. \\qquad (2)$$\nThe strategy of this proof is to use these linear dependencies to show that the submatrix \u2126VU has to\nbe zero. This would imply that no node in U is connected to any node in V by an edge.\nTherefore, S is a node separator of u and v in G, which is our goal.\nStep 4: Let us fix i \u2208 U. Consider the vector sets of the form {\u03c9(h)(\u2212i) : h \u2208 W \\ S, h \u2260 j},\nj \u2208 V. There are |V| such sets. The intersection of these sets is the vector set {\u03c9(h)(\u2212i) : h \u2208 U}.\nWe want to use the |V| linearly dependent vector sets to say something about the linear dependency\nof {\u03c9(h)(\u2212i) : h \u2208 U}. With that in mind, we have the following lemma.\nLemma 2 The vector set {\u03c9(h)(\u2212i) : h \u2208 U} is linearly dependent for any i \u2208 U.\n\nStep 5: Our final step is to show that these linear dependencies imply that \u2126UV = 0. We now have\n|U| vector sets {\u03c9(h)(\u2212i) : h \u2208 U} that are linearly dependent. These sets are truncated versions\nof the vector set {\u03c9(h) : h \u2208 U}, and they are specifically truncated by taking out entries only in U\nand not in V. The set {\u03c9(h) : h \u2208 U} must be linearly independent since \u2126\u2032 is invertible. Observe\nthat the entries of \u2126VU are contained in {\u03c9(h)(\u2212i) : h \u2208 U} for all i \u2208 U. We can now use these\nvector sets to say something about the entries of \u2126VU.\n\nLemma 3 The vector components $\\omega^{(i)}_j = \\Omega_{ij}$ are zero for all i \u2208 U and j \u2208 V.\n\nThis implies that no node in U is connected to any node in V by an edge. 
Therefore, S separates\nu and v in G and the relation Xu \u22a5 Xv | XS is faithful.\n\u25a1\n\n4 Algorithm for Testing Unfaithfulness\n\nIn this section, we will describe a novel algorithm for testing faithfulness of a conditional independence\nrelation Xu \u22a5 Xv | XS. The algorithm tests the necessary and sufficient conditions for\nfaithfulness, namely, that we can find a partition (U, V) of W \\ S such that u \u2208 U, v \u2208 V, and\nXi \u22a5 Xj | XS for all i \u2208 U and j \u2208 V.\nAlgorithm 1 (Testing Faithfulness) Input covariance matrix \u03a3.\n\n1. Define a new graph G\u0304 = (W\u0304, E\u0304), where W\u0304 = W \\ S and E\u0304 = {(i, j) : i, j \u2208 W \\ S, i \u2260 j, Xi \u22a5\u0338 Xj | XS}.\n2. Generate set U to be the set of all nodes in W\u0304 that are connected to u by a path in G\u0304, including u. (A breadth-first search could be used.)\n3. If v \u2208 U, there exists a path from u to v in G\u0304; output Xu \u22a5 Xv | XS as unfaithful.\n4. If v \u2209 U, let V = W\u0304 \\ U. Output Xu \u22a5 Xv | XS as faithful.\n\nIf we consider each test of whether two nodes are conditionally independent given XS as one step,\nthe running time of the algorithm is that of the algorithm used to determine set U. If a breadth-first\nsearch is used, the running time is O(|W \\ S|^2).\nTheorem 2 Suppose Xu \u22a5 Xv | XS. If S is a node separator of u and v in the concentration\ngraph, then Algorithm 1 will classify Xu \u22a5 Xv | XS as faithful. Otherwise, Algorithm 1 will\nclassify Xu \u22a5 Xv | XS as unfaithful.\nProof. If Algorithm 1 determines that Xu \u22a5 Xv | XS is faithful, that means that it has found\na partition (U, V) of W \\ S such that u \u2208 U, v \u2208 V, and Xi \u22a5 Xj | XS for any i \u2208 U and\nj \u2208 V. By Theorem 1, this implies that Xu \u22a5 Xv | XS is faithful and so Algorithm 1 is correct.\nIf Algorithm 1 decides that Xu \u22a5 Xv | XS is unfaithful, it does so by finding a series of nodes\n$w_{\\ell_1}, \\ldots, w_{\\ell_t}$ \u2208 W \\ (S \u222a {u, v}) for some natural number t > 0 such that $X_u \\not\\perp X_{w_{\\ell_1}} \\mid X_S$,\n$X_{w_{\\ell_1}} \\not\\perp X_{w_{\\ell_2}} \\mid X_S$, . . ., $X_{w_{\\ell_{t-1}}} \\not\\perp X_{w_{\\ell_t}} \\mid X_S$, $X_{w_{\\ell_t}} \\not\\perp X_v \\mid X_S$, where \u21131, . . . , \u2113t are t distinct\nindices from R. By the global Markov property, this means that u is connected to v by a path in GW\\S,\nso this implies that Xu \u22a5 Xv | XS is unfaithful and Algorithm 1 is correct.\n\u25a1\n\nFigure 2: The concentration graph of the distribution in Example 4.\n\n
Example 2 (Testing an Unfaithful Distribution (1)) Let us take a look again at the 4-dimensional\nGaussian distribution in Example 1. Suppose we want to test if X1 \u22a5 X3 | X2 is faithful or not.\nFrom its covariance matrix, we have $\\Sigma_{14} - \\Sigma_{12} \\Sigma_{22}^{-1} \\Sigma_{24} = 2 - 2 \\cdot \\tfrac{1}{4} \\cdot 1 = \\tfrac{3}{2} \\neq 0$, so this implies\nthat X1 \u22a5\u0338 X4 | X2. Similarly, X3 \u22a5\u0338 X4 | X2. So there exists a path from X1 to X3 in G{1,3,4} (it\nis trivially the edge (1, 3)), so the relation X1 \u22a5 X3 | X2 is unfaithful.\nExample 3 (Testing an Unfaithful Distribution (2)) Consider a 6-dimensional Gaussian distribution\nX = (X1, . . . , X6) that has the covariance matrix\n$$\\Sigma = \\begin{bmatrix} 7 & 1 & 2 & 2 & 3 & 4 \\\\ 1 & 8 & 2 & 1 & 2.25 & 3 \\\\ 2 & 2 & 10 & 4 & 3 & 8 \\\\ 2 & 1 & 4 & 9 & 1 & 6 \\\\ 3 & 2.25 & 3 & 1 & 11 & 9 \\\\ 4 & 3 & 8 & 6 & 9 & 12 \\end{bmatrix}. \\qquad (3)$$\nWe want to test if the relation X1 \u22a5 X2 | X6 is faithful or unfaithful. Working out the\nnecessary conditional independence relations to obtain G\u0304 with S = {6}, we observe that\n(1, 3), (3, 5), (5, 4), (4, 2) \u2208 E\u0304. This means that 2 is reachable from 1 in G\u0304, so the relation is unfaithful. 
In fact, the concentration graph is the complete graph K6, and 6 is not a node separator of\n1 and 2.\n\nExample 4 (Testing a Faithful Distribution) We consider a 6-dimensional Gaussian distribution\nX = (X1, . . . , X6) that has a covariance matrix which is similar to the distribution in Example 3,\n$$\\Sigma = \\begin{bmatrix} 7 & 1 & 2 & 2 & 3 & 4 \\\\ 1 & 8 & 2 & 1 & 2.25 & 3 \\\\ 2 & 2 & 10 & 4 & 6 & 8 \\\\ 2 & 1 & 4 & 9 & 1 & 6 \\\\ 3 & 2.25 & 6 & 1 & 11 & 9 \\\\ 4 & 3 & 8 & 6 & 9 & 12 \\end{bmatrix}. \\qquad (4)$$\nObserve that only \u03a335 (and, by symmetry, \u03a353) is changed. We again test the relation X1 \u22a5 X2 | X6. Running the algorithm\nproduces a viable partition with U = {1, 3} and V = {2, 4, 5}. This agrees with the concentration\ngraph, as shown in Figure 2.\n\nWe now include an algorithm that learns the topology of a class of (possibly) unfaithful Gaussian\ngraphical models using local patches. Let us fix a natural number K < n \u2212 2. We consider graphical\nmodels that satisfy the following assumption: for any nodes i and j that are not connected by an\nedge in G, there exists a vertex set S with |S| \u2264 K such that S is a vertex separator of i and j.\nCertain graphs have this property, including graphs with bounded degree and some random graphs\nwith high probability, like the Erd\u0151s-R\u00e9nyi graph. The following algorithm learns the edges of a\ngraphical model that satisfies the above assumption.\n\nAlgorithm 2 (Edge Learning) Input covariance matrix \u03a3. For each node pair (i, j),\n\n1. Let F = {S \u2282 W \\ {i, j} : |S| = K, Xi \u22a5 Xj | XS, and it is faithful}.\n2. If F \u2260 \u2205, output (i, j) \u2209 E. If F = \u2205, output (i, j) \u2208 E.\n3. Output E.\n\nAgain, considering a computation of a conditional independence relation as one step, the running\ntime of the algorithm is O(n^{K+4}). 
This comes from exhaustively checking through all $\\binom{n-2}{K}$ possible\nseparation sets S for each of the $\\binom{n}{2}$ (i, j) pairs. Each time there is a conditional independence\nrelation, we have to check for faithfulness using Algorithm 1, and the running time for that is O(n^2).\nThe novelty of the algorithm is in its ability to learn graphical models that are unfaithful.\nTheorem 3 Algorithm 2 recovers the concentration graph G.\nProof. If F \u2260 \u2205, there exists an S such that Xi \u22a5 Xj | XS is faithful. Therefore,\nS separates i and j in G and (i, j) \u2209 E. If F = \u2205, then for any S \u2286 W \\ {i, j}, |S| \u2264 K, we have either\nXi \u22a5\u0338 Xj | XS or Xi \u22a5 Xj | XS but it is unfaithful. In both cases, S does not separate i and j in\nG, for any S \u2286 W \\ {i, j}, |S| \u2264 K. By the assumption on the graphical model, (i, j) must be in E. This\nshows that Algorithm 2 will correctly output the edges of G.\n\u25a1\n\n5 Conclusion\n\nWe have presented an equivalence condition for faithfulness in Gaussian graphical models and an\nalgorithm to test whether a conditional independence relation is faithful or not. Gaussian distributions\nare special because their conditional independence relations depend on their covariance matrix,\nwhose inverse, the precision matrix, provides us with a graph structure. The question of faithfulness\nin other Markov random fields, like Ising models, is an area of study that has much to be explored.\nThe same questions can be asked, such as when unfaithful conditional independence relations occur,\nand whether they can be identified. In the future, we plan to extend some of these results to other\nMarkov random fields. 
Determining statistical guarantees is another important direction to explore.\n\n6 Appendix\n\n6.1 Proof of Lemma 2\nCase 1: |V| = 1. In this case, |U| > 1 since |U| and |V| cannot both be one. The vector set\n{\u03c9(h)(\u2212i) : h \u2208 W \\ S, h \u2260 j} is the vector set {\u03c9(h)(\u2212i) : h \u2208 U}.\nCase 2: |V| > 1. Let us fix i \u2208 U. Note that \u03c9(i)(\u2212j) \u2260 0 for all j \u2208 W \\ (S \u222a {i}), since the\ndiagonal entries of a positive definite matrix are non-zero, that is, $\\omega^{(i)}_i \\neq 0$. Also, \u03c9(i)(\u2212i) \u2260 0\nfor all i \u2208 U as well by Step 2 of the proof of Theorem 1. As such, the linear dependency of\n{\u03c9(h)(\u2212i) : h \u2208 W \\ S, h \u2260 j} for any i \u2208 U and j \u2208 V implies that there exist scalars $c^{(i,j)}_1, \\ldots, c^{(i,j)}_{j-1}, c^{(i,j)}_{j+1}, \\ldots, c^{(i,j)}_{|W \\setminus S|}$, not all zero, such that\n$$\\sum_{1 \\le h \\le |W \\setminus S|,\\, h \\neq j} c^{(i,j)}_h \\omega^{(h)}(-i) = 0. \\qquad (5)$$\nIf $c^{(i,j)}_i = 0$, the vector set {\u03c9(h)(\u2212i) : 1 \u2264 h \u2264 |W \\ S|, h \u2260 i, j} is linearly dependent. This\nimplies that the principal submatrix \u2126\u2032(\u2212i,\u2212i) has zero determinant, which contradicts \u2126\u2032 being\npositive definite. Thus, we have $c^{(i,j)}_i \\neq 0$ for all i \u2208 U and j \u2208 V. For each i \u2208 U and j \u2208 V, this\nallows us to manipulate (5) such that \u03c9(i)(\u2212i) is expressed in terms of the other vectors in (5).\nMore precisely, let $\\bar{c}^{(i,j)} = -[c^{(i,j)}_i]^{-1} (c^{(i,j)}_1, \\ldots, c^{(i,j)}_{i-1}, c^{(i,j)}_{i+1}, \\ldots, c^{(i,j)}_{j-1}, c^{(i,j)}_{j+1}, \\ldots, c^{(i,j)}_{|W \\setminus S|})$, for i \u2208\nU and j \u2208 V. Note that \u2126\u2032(\u2212i,\u2212{i, j}) has the form [\u03c9(1)(\u2212i), . . ., \u03c9(i\u22121)(\u2212i), \u03c9(i+1)(\u2212i), . . .,\n\u03c9(j\u22121)(\u2212i), \u03c9(j+1)(\u2212i), . . ., \u03c9(|W\\S|)(\u2212i)], where the vectors in the notation described above are\ncolumn vectors. 
From (5), for any distinct j1, j2 \u2208 V, we can generate equations\n$$\\omega^{(i)}(-i) = \\Omega'(-i, -\\{i, j_1\\}) \\bar{c}^{(i,j_1)} = \\Omega'(-i, -\\{i, j_2\\}) \\bar{c}^{(i,j_2)}, \\qquad (6)$$\nor effectively,\n$$\\Omega'(-i, -\\{i, j_1\\}) \\bar{c}^{(i,j_1)} - \\Omega'(-i, -\\{i, j_2\\}) \\bar{c}^{(i,j_2)} = 0. \\qquad (7)$$\nThis is a linear equation in terms of the column vectors {\u03c9(h)(\u2212i) : h \u2260 i, h \u2208 W \\ S}. These vectors\nmust be linearly independent, otherwise |\u2126\u2032(\u2212i,\u2212i)| = 0. Therefore, the coefficient of each of the\nvectors must be zero. Specifically, the coefficient of \u03c9(j2)(\u2212i) in (7) is $-c^{(i,j_1)}_{j_2} / c^{(i,j_1)}_i$, which must be zero, which\nimplies that $c^{(i,j_1)}_{j_2}$ is zero, as required. Similarly, $c^{(i,j_2)}_{j_1}$ is zero as well. Since this holds for any\ndistinct j1, j2 \u2208 V, this implies that for any j \u2208 V, $c^{(i,j)}_h = 0$ for all h \u2208 V, h \u2260 j.\nThere are now two cases to consider. The first is where |U| = 1. Here, i = u. Then, by (5),\n$c^{(u,j)}_h = 0$ for all distinct j, h \u2208 V implies that \u03c9(u)(\u2212u) = 0, which is a contradiction. Therefore\n|U| \u2260 1, so |U| must be greater than 1. We then substitute $c^{(i,j)}_h = 0$, for all distinct j, h \u2208 V, into\n(5) to deduce that {\u03c9(h)(\u2212i) : h \u2208 U} is indeed linearly dependent for any i \u2208 U.\n\u25a1\n\n6.2 Proof of Lemma 3\nLet |U| = k > 1. We arrange the indices of the column vectors of \u2126\u2032 so that U = {1, . . . , k}. 
For\neach i \u2208 U, since {\u03c9(h)(\u2212i) : h \u2208 U} is linearly dependent and {\u03c9(h) : h \u2208 U} is linearly independent,\nthere exists a non-zero vector $d^{(i)} = (d^{(i)}_1, \\ldots, d^{(i)}_k) \\in \\mathbb{R}^k$ such that $\\sum_{h=1}^{k} d^{(i)}_h \\omega^{(h)}(-i) = 0$.\nLet $y^{(i)} = (\\omega^{(1)}_i, \\ldots, \\omega^{(k)}_i) \\in \\mathbb{R}^k$. Note that $y^{(i)} = \\omega^{(i)}_U$, since \u2126\u2032 is symmetric, and so is a\nnon-zero vector for all i = 1, . . . , k. Because \u03c9(1), . . . , \u03c9(k) are linearly independent, for each\ni = 1, . . . , k, we have d(i) \u00b7 y(h) = 0 for all h \u2260 i, h \u2208 U and d(i) \u00b7 y(i) \u2260 0.\nWe next show that vectors d(1), . . . , d(k) are linearly independent. Suppose that they are not. Then\nthere exists some index i \u2208 U and scalars a1, . . . , ai\u22121, ai+1, . . . , ak, not all zero, such that $d^{(i)} = \\sum_{1 \\le j \\le k,\\, j \\neq i} a_j d^{(j)}$.\nWe then have $0 \\neq d^{(i)} \\cdot y^{(i)} = \\sum_{1 \\le j \\le k,\\, j \\neq i} a_j d^{(j)} \\cdot y^{(i)} = 0$, a contradiction.\nTherefore, d(1), . . . , d(k) are linearly independent.\nFor each j such that k+1 \u2264 j \u2264 |W \\ S| (that is, j \u2208 V), let us define $y_j = (\\omega^{(1)}_j, \\ldots, \\omega^{(k)}_j)$. Let us\nfix j. Observe that d(h) \u00b7 yj = 0 for all h = 1, . . . , k. Since d(1), . . . , d(k) are linearly independent,\nthis implies that yj is the zero vector. Since this holds for all j such that k + 1 \u2264 j \u2264 |W \\ S|,\nwe have $\\omega^{(i)}_j = 0$ for all 1 \u2264 i \u2264 k and k + 1 \u2264 j \u2264 |W \\ S|.\n\u25a1\n\nAcknowledgments\n\nThis work was partially supported by the National Science Foundation under Grant CNS-0963989\nand Grant CCF-1217023.\n\nReferences\n[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.\n[2] S. L. Lauritzen, Graphical models. New York: Oxford University Press, 1996.\n[3] J. Whittaker, Graphical Models in Applied Multivariate Statistics. 
Wiley, 1990.\n[4] N. Meinshausen and P. B\u00fchlmann, \u201cHigh dimensional graphs and variable selection with the lasso,\u201d Annals of Statistics, vol. 34, no. 3, pp. 1436\u20131462, 2006.\n[5] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu, \u201cHigh dimensional covariance estimation by minimizing $\\ell_1$-penalized log-determinant divergence,\u201d Electronic Journal of Statistics, vol. 4, pp. 935\u2013980, 2011.\n[6] A. Anandkumar, V. Tan, F. Huang, and A. Willsky, \u201cHigh-dimensional Gaussian graphical model selection: walk-summability and local separation criterion,\u201d Journal of Machine Learning Research, vol. 13, pp. 2293\u20132337, Aug 2012.\n[7] R. Wu, R. Srikant, and J. Ni, \u201cLearning loosely connected Markov random fields,\u201d Stochastic Systems, vol. 3, 2013.\n[8] M. Frydenberg, \u201cMarginalisation and collapsibility in graphical interaction models,\u201d Annals of Statistics, vol. 18, pp. 790\u2013805, 1990.\n[9] G. Kauermann, \u201cOn a dualization of graphical Gaussian models,\u201d Scandinavian Journal of Statistics, vol. 23, no. 1, pp. 105\u2013116, 1996.\n[10] A. Becker, D. Geiger, and C. Meek, \u201cPerfect tree-like Markovian distributions,\u201d Probability and Mathematical Statistics, vol. 25, no. 2, pp. 231\u2013239, 2005.\n[11] D. Malouche and B. Rajaratnam, \u201cGaussian covariance faithful Markov trees,\u201d Technical report, Department of Statistics, Stanford University, 2009.\n[12] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search. New York: Springer-Verlag, 1993.\n[13] C. Meek, \u201cStrong completeness and faithfulness in Bayesian networks,\u201d in Proceedings of the Eleventh International Conference on Uncertainty in Artificial Intelligence, 1995.\n[14] C. Uhler, G. Raskutti, P. B\u00fchlmann, and B. Yu, \u201cGeometry of faithfulness assumption in causal inference,\u201d Annals of Statistics, vol. 41, pp. 
436\u2013463, 2013.\n[15] S. Lin, C. Uhler, B. Sturmfels, and P. B\u00fchlmann, \u201cHypersurfaces and their singularities in partial correlation testing,\u201d Preprint.", "award": [], "sourceid": 1389, "authors": [{"given_name": "De Wen", "family_name": "Soh", "institution": "Yale University"}, {"given_name": "Sekhar", "family_name": "Tatikonda", "institution": "Yale University"}]}