{"title": "Identification and Overidentification of Linear Structural Equation Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1587, "page_last": 1595, "abstract": "In this paper, we address the problems of identifying linear structural equation models and discovering the constraints they imply. We first extend the half-trek criterion to cover a broader class of models and apply our extension to finding testable constraints implied by the model. We then show that any semi-Markovian linear model can be recursively decomposed into simpler sub-models, resulting in improved identification and constraint discovery power. Finally, we show that, unlike the existing methods developed for linear models, the resulting method subsumes the identification and constraint discovery algorithms for non-parametric models.", "full_text": "Identi\ufb01cation and Overidenti\ufb01cation of\n\nLinear Structural Equation Models\n\nBryant Chen\n\nUniversity of California, Los Angeles\n\nComputer Science Department\n\nLos Angeles, CA, 90095-1596, USA\n\nAbstract\n\nIn this paper, we address the problems of identifying linear structural equation\nmodels and discovering the constraints they imply. We \ufb01rst extend the half-trek\ncriterion to cover a broader class of models and apply our extension to \ufb01nding\ntestable constraints implied by the model. We then show that any semi-Markovian\nlinear model can be recursively decomposed into simpler sub-models, resulting\nin improved identi\ufb01cation and constraint discovery power. 
Finally, we show that, unlike the existing methods developed for linear models, the resulting method subsumes the identification and constraint discovery algorithms for non-parametric models.

1 Introduction

Many researchers, particularly in economics, psychology, and the social sciences, use linear structural equation models (SEMs) to describe the causal and statistical relationships between a set of variables, predict the effects of interventions and policies, and estimate parameters of interest. When modeling with linear SEMs, researchers typically specify the causal structure (i.e., exclusion restrictions and independence restrictions between error terms) from domain knowledge, leaving the structural coefficients (representing the strength of the causal relationships) as free parameters to be estimated from data. If these coefficients are known, then total effects, direct effects, and counterfactuals can be computed from them directly (Balke and Pearl, 1994). However, in some cases, the causal assumptions embedded in the model are not enough to uniquely determine one or more coefficients from the probability distribution, and those coefficients therefore cannot be estimated from data. In such cases, we say that the coefficient is not identified or not identifiable1.
In other cases, a coefficient may be overidentified in addition to being identified, meaning that there are at least two minimal sets of logically independent assumptions in the model that are sufficient for identifying the coefficient, and the identified expressions for the coefficient are distinct functions of the covariance matrix (Pearl, 2004). 
As a result, the model imposes a testable constraint on the probability distribution: the two (or more) identified expressions for the coefficient must be equal.
As compact and transparent representations of the model's structure, causal graphs provide a convenient tool to aid in the identification of coefficients. First utilized as a causal inference tool by Wright (1921), graphs have more recently been applied to identify causal effects in non-parametric causal models (Pearl, 2009) and enabled the development of causal effect identification algorithms that are complete for non-parametric models (Huang and Valtorta, 2006; Shpitser and Pearl, 2006). These algorithms can be applied to the identification of coefficients in linear SEMs by identifying non-parametric direct effects, which are closely related to structural coefficients (Tian, 2005; Chen and Pearl, 2014). Algorithms designed specifically for the identification of linear SEMs were developed by Brito and Pearl (2002), Brito (2004), Tian (2005, 2007, 2009), Foygel et al. (2012), and Chen et al. (2014).
Graphs have also proven to be valuable tools in the discovery of testable implications. It is well known that conditional independence relationships can be easily read from the causal graph using d-separation (Pearl, 2009), and Kang and Tian (2009) gave a procedure for linear SEMs that enumerates a set of conditional independences that imply all others. In non-parametric models without latent variables or correlated error terms, these conditional independence constraints represent all of the testable implications of the model (Pearl, 2009). 

1We will also use the term "identified" with respect to individual variables and the model as a whole.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
In models with latent variables and/or correlated error terms, there may be additional constraints implied by the model. These non-independence constraints, often called Verma constraints, were first noted by Verma and Pearl (1990), and Tian and Pearl (2002b) and Shpitser and Pearl (2008) developed graphical algorithms for systematically discovering such constraints in non-parametric models. In the case of linear models, Chen et al. (2014) applied their aforementioned identification method to the discovery of overidentifying constraints, which in some cases are equivalent to the non-parametric constraints enumerated in Tian and Pearl (2002b) and Shpitser and Pearl (2008).
Surprisingly, naively applying algorithms designed for non-parametric models to linear models enables the identification of coefficients and constraints that the aforementioned methods developed for linear models are unable to identify, despite those methods utilizing the additional assumption of linearity. In this paper, we first extend the half-trek identification method of Foygel et al. (2012) and apply it to the discovery of half-trek constraints, which generalize the overidentifying constraints given in Chen et al. (2014). Our extensions can be applied to Markovian, semi-Markovian, and non-Markovian models. We then demonstrate how recursive c-component decomposition, which was first utilized in identification algorithms developed for non-parametric models (Tian, 2002; Huang and Valtorta, 2006; Shpitser and Pearl, 2006), can be incorporated into our linear identification and constraint discovery methods for Markovian and semi-Markovian models. We show that doing so allows the identification of additional models and constraints. 
Further, we will demonstrate that, unlike existing algorithms, our method subsumes the aforementioned identification and constraint discovery methods developed for non-parametric models when applied to linear SEMs.

2 Preliminaries

A linear structural equation model consists of a set of equations of the form X = ΛX + ε, where X = [x1, ..., xn]ᵗ is a vector containing the model variables, Λ is a matrix containing the coefficients of the model, which convey the strength of the causal relationships, and ε = [ε1, ..., εn]ᵗ is a vector of error terms, which represent omitted or latent variables. The matrix Λ contains zeroes on the diagonal, and Λij = 0 whenever xi is not a cause of xj. The error terms are normally distributed random variables and induce the probability distribution over the model variables. The covariance matrix of X will be denoted by Σ and the covariance matrix over the error terms, ε, by Ω.
An instantiation of a model M is an assignment of values to the model parameters (i.e., Λ and the non-zero elements of Ω). For a given instantiation mi, let Σ(mi) denote the covariance matrix implied by the model and λk(mi) the value of coefficient λk.
Definition 1. A coefficient, λk, is identified if for any two instantiations of the model, mi and mj, we have λk(mi) = λk(mj) whenever Σ(mi) = Σ(mj).

In other words, λk is identified if it can be uniquely determined from the covariance matrix, Σ. Now, we define when a structural coefficient, λk, is overidentified.
Definition 2. 
(Pearl, 2004) A coefficient, λk, is overidentified if there are two or more distinct sets of logically independent assumptions in M such that

(i) each set is sufficient for deriving λk as a function of Σ, λk = f(Σ),
(ii) each set induces a distinct function λk = f(Σ), and
(iii) each assumption set is minimal, that is, no proper subset of those assumptions is sufficient for the derivation of λk.

The causal graph or path diagram of an SEM is a graph, G = (V, D, B), where V are vertices or nodes, D directed edges, and B bidirected edges. The vertices represent model variables. Directed edges represent the direction of causality, and for each coefficient Λij ≠ 0, an edge is drawn from xi to xj. Each directed edge, therefore, is associated with a coefficient in the SEM, which we will often refer to as its structural coefficient. The error terms, εi, are not represented in the graph. However, a bidirected edge between two variables indicates that their corresponding error terms may be statistically dependent, while the lack of a bidirected edge indicates that the error terms are independent. When the causal graph is acyclic and has no bidirected edges, we say that the model is Markovian. Graphs with bidirected edges are non-Markovian, while acyclic graphs with bidirected edges are additionally called semi-Markovian.
We will use standard graph terminology, with Pa(y) denoting the parents of y, Anc(y) denoting the ancestors of y, De(y) denoting the descendants of y, and Sib(y) denoting the siblings of y, the variables that are connected to y via a bidirected edge. He(E) denotes the heads of a set of directed edges, E, while Ta(E) denotes the tails. Additionally, for a node v, the set of edges for which He(E) = v is denoted Inc(v). 
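To make these preliminaries concrete, the following sketch (a toy example of ours, not taken from the paper; the equation-form convention X = ΛᵀX + ε, so that Λij is the coefficient on the edge xi → xj, and all numeric values are assumptions for illustration) builds Λ and Ω for the instrumental-variable graph z → x → y with a bidirected edge x ↔ y, and computes the covariance matrix Σ that the model implies.

```python
import numpy as np

# Toy path diagram: z -> x -> y, with a bidirected edge x <-> y.
# Convention assumed here: Lam[i, j] is the coefficient on edge x_i -> x_j,
# so the structural equations read X = Lam^T X + eps.
z, x, y = 0, 1, 2
a, b, c = 0.7, 0.5, 0.8      # z -> x, x -> y, and Cov(eps_x, eps_y)

Lam = np.zeros((3, 3))
Lam[z, x] = a
Lam[x, y] = b

Omega = np.eye(3)            # error-term covariance matrix
Omega[x, y] = Omega[y, x] = c   # bidirected edge x <-> y

# Solving X = Lam^T X + eps gives X = (I - Lam^T)^{-1} eps, so the
# implied covariance is Sigma = (I - Lam^T)^{-1} Omega (I - Lam^T)^{-T}.
B = np.linalg.inv(np.eye(3) - Lam.T)
Sigma = B @ Omega @ B.T

# b is identified by the classic instrumental-variable ratio, while the
# naive regression of y on x is confounded by the bidirected edge:
iv_estimate = Sigma[z, y] / Sigma[z, x]        # equals b exactly
naive_estimate = Sigma[x, y] / Sigma[x, x]     # equals b + c / (a**2 + 1)
print(iv_estimate, naive_estimate)
```

Here Σzy/Σzx recovers b exactly while Σxy/Σxx does not; distinctions of this kind are what the identification machinery in the paper formalizes.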
Lastly, we will utilize d-separation (Pearl, 2009), and we establish a couple of preliminary definitions around half-treks. These definitions and illustrative examples can also be found in Foygel et al. (2012) and Chen et al. (2014).
Definition 3. (Foygel et al., 2012) A half-trek, π, from x to y is a path from x to y that either begins with a bidirected arc and then continues with directed edges towards y or is simply a directed path from x to y.

We will denote by htr(v) the set of nodes that are reachable by half-trek from v.
Definition 4. (Foygel et al., 2012) For any half-trek, π, let Right(π) be the set of vertices in π that have an outgoing directed edge in π (as opposed to a bidirected edge), union the last node in the trek. In other words, if the trek is a directed path, then every node in the path is a member of Right(π). If the trek begins with a bidirected edge, then every node other than the first node is a member of Right(π).
Definition 5. (Foygel et al., 2012) A system of half-treks, π1, ..., πn, has no sided intersection if for all πi, πj ∈ {π1, ..., πn} such that πi ≠ πj, Right(πi) ∩ Right(πj) = ∅.
Definition 6. (Chen et al., 2014) For an arbitrary variable, v, let Pa1, Pa2, ..., Pak be the unique partition of Pa(v) such that any two parents are placed in the same subset, Pai, whenever they are connected by an unblocked path (given the empty set). A connected edge set with head v is a set of directed edges from Pai to v for some i ∈ {1, 2, ..., k}.

3 General Half-Trek Criterion

The half-trek criterion is a graphical condition that can be used to determine the identifiability of recursive and non-recursive linear models (Foygel et al., 2012). Foygel et al. 
(2012) use the half-trek criterion to identify the model variables one at a time, where each identified variable may be able to aid in the identification of other variables. If any variable is not identifiable using the half-trek criterion, then their algorithm returns that the model is not HTC-identifiable. Otherwise, the algorithm returns that the model is identifiable. Their algorithm subsumes the earlier methods of Brito and Pearl (2002) and Brito (2004). In this section, we extend the half-trek criterion to allow the identification of arbitrary subsets of edges belonging to a variable. As a result, our algorithm can be utilized to identify as many coefficients as possible, even when the model is not identified. Additionally, this extension improves our ability to identify entire models, as we will show.
Definition 7. (General Half-Trek Criterion) Let E be a set of directed edges sharing a single head y. A set of variables Z satisfies the general half-trek criterion with respect to E if
(i) |Z| = |E|,
(ii) Z ∩ (y ∪ Sib(y)) = ∅,
(iii) there is a system of half-treks with no sided intersection from Z to Ta(E), and
(iv) (Pa(y) \ Ta(E)) ∩ htr(Z) = ∅.

Figure 1: The above model is identified using the g-HTC but not the HTC.

A set of directed edges, E, sharing a head y is identifiable if there exists a set, ZE, that satisfies the general half-trek criterion (g-HTC) with respect to E, and ZE consists only of "allowed" nodes. Intuitively, a node z is allowed if Ezy is identified or empty, where Ezy ⊆ Inc(z) is the set of edges belonging to z that lie on half-treks from y to z or lie on unblocked paths (given the empty set) between z and Pa(y) \ Ta(E).2 The following definition formalizes this notion.
Definition 8. 
A node, z, is g-HT allowed (or simply allowed) for directed edges E with head y if Ezy = ∅ or there exist sequences of sets of nodes, (Z1, ..., Zk), and sets of edges, (E1, ..., Ek), with Ezy ⊆ E1 ∪ ... ∪ Ek such that
(i) Zi satisfies the g-HTC with respect to Ei for all i ∈ {1, ..., k},
(ii) EZ1y1 = ∅, where yi = He(Ei) for all i ∈ {1, ..., k}, and
(iii) EZiyi ⊆ (E1 ∪ ... ∪ Ei−1) for all i ∈ {1, ..., k}.
When a set of allowed nodes, ZE, satisfies the g-HTC for a set of edges E, then we will say that ZE is a g-HT admissible set for E.
Theorem 1. If a g-HT admissible set for directed edges Ey with head y exists, then Ey is g-HT identifiable. Further, let ZEy = {z1, ..., zk} be a g-HT admissible set for Ey, Ta(Ey) = {p1, ..., pk}, and Σ be the covariance matrix of the model variables. Define A as

  Aij = [(I − Λ)ᵀΣ]_{zi,pj} if E_{zi,y} ≠ ∅;  Aij = Σ_{zi,pj} if E_{zi,y} = ∅   (1)

and b as

  bi = [(I − Λ)ᵀΣ]_{zi,y} if E_{zi,y} ≠ ∅;  bi = Σ_{zi,y} if E_{zi,y} = ∅   (2)

Then A is an invertible matrix and A · Λ_{Ta(Ey),y} = b.

Proof. See Appendix for proofs of all theorems and lemmas.

The g-HTC improves upon the HTC because subsets of a variable's coefficients may be identifiable even when the variable is not. By identifying subsets of a variable's coefficients, we not only allow the identification of as many coefficients as possible in unidentified models, but we are also able to identify additional models as a whole. For example, Figure 1 is not identifiable using the HTC. In order to identify Y, Z2 needs to be identified first, as it is the only variable with a half-trek to X2 that is not a sibling of Y. However, to identify Z2, either Y or W1 needs to be identified. Finally, to identify W1, Y needs to be identified. 
This cycle implies that the model is not HTC-identifiable. It is, however, g-HTC identifiable, since the g-HTC allows d to be identified independently of f, using {Z1} as a g-HT admissible set, which in turn allows {Y} to be a g-HT admissible set for W1's coefficient, a.
Finding a g-HT admissible set for directed edges, E, with head, y, from a set of allowed nodes, AE, can be accomplished by utilizing the max-flow algorithm described in Chen et al. (2014)3, which we call MaxFlow(G, E, AE). This algorithm returns a maximal set of allowed nodes that satisfies (ii)-(iv) of the g-HTC.
In some cases, there may be no g-HT admissible set for E′. In other cases, there may be no g-HT admissible set of variables for a set of edges E, but there may be a g-HT admissible set of variables for E′ with E ⊂ E′. As a result, if a g-HT admissible set does not exist for Ey, where Ey = Inc(y) for some node y, we may have to check whether such a set exists for all possible subsets of Ey in order to identify as many coefficients in Ey as possible. This process can be somewhat simplified by noting that if E is a connected edge set with no g-HT admissible set, then there is no superset E′ with a g-HT admissible set.
An algorithm that utilizes the g-HTC and Theorem 1 to identify as many coefficients as possible in recursive or non-recursive linear SEMs is given in the Appendix. Since we may need to check the identifiability of all subsets of a node's edges, the algorithm's complexity is polynomial time if the degree of each node is bounded.

2We will continue to use the EZy notation and allow Z to be a set of nodes.
3Brito (2004) utilized a similar max-flow construction in his identification algorithm.

Figure 2: (a) The graph is not identified using the g-HTC and cannot be decomposed (b) After removing V6 we are able to decompose the graph (c) Graph for c-component, {V2, V3, V5} (d) Graph for c-component, {V1, V4}

4 Generalizing Overidentifying Constraints

Chen et al. (2014) discovered overidentifying constraints by finding two HT-admissible sets for a given connected edge set. When two such sets exist, we obtain two distinct expressions for the identified coefficients, and equating the two expressions gives the overidentifying constraint. However, we may be able to obtain constraints even when |ZE| < |E| and E is not identified. The algorithm, MaxFlow, returns a maximal set, ZE, for which the equations, A · Λ_{Ta(E),y} = b, are linearly independent, regardless of whether |ZE| = |E| and E is identified or not. Therefore, if we are able to find an allowed node w that satisfies the conditions below, then the equation a_w · Λ_{Ta(E),y} = b_w will be a linear combination of the equations, A · Λ_{Ta(E),y} = b.
Theorem 2. Let ZE be a set of maximal size that satisfies conditions (ii)-(iv) of the g-HTC for a set of edges, E, with head y. If there exists a node w such that there exists a half-trek from w to Ta(E), w ∉ (y ∪ Sib(y)), and w is g-HT allowed for E, then we obtain the equality constraint a_w A⁻¹_right b = b_w, where A⁻¹_right is the right inverse of A.

We will call these generalized overidentifying constraints half-trek constraints or HT-constraints. An algorithm that identifies coefficients and finds HT-constraints for a recursive or non-recursive linear SEM is given in the Appendix.

5 Decomposition

Tian showed that the identification problem could be simplified in semi-Markovian linear structural equation models by decomposing the model into sub-models according to their c-components (Tian, 2005). 
Each coefficient is identifiable if and only if it is identifiable in the sub-model to which it belongs (Tian, 2005). In this section, we show that the c-component decomposition can be applied recursively to the model after marginalizing certain variables. This idea was first used to identify interventional distributions in non-parametric models by Tian (2002) and Tian and Pearl (2002a), and adapting this technique for linear models will allow us to identify models that the g-HTC, even coupled with (non-recursive) c-component decomposition, is unable to identify. Further, it ensures the identification of all coefficients identifiable using methods developed for non-parametric models, a guarantee that none of the existing methods developed for linear models satisfy.
The graph in Figure 2a consists of a single c-component, and we are unable to decompose it. As a result, we are able to identify a but no other coefficients using the g-HTC. Moreover, f = ∂/∂v4 E[v5|do(v6, v4, v3, v2, v1)] is identified using identification methods developed for non-parametric models (e.g., do-calculus) but not the g-HTC or other methods developed for linear models.
However, if we remove v6 from the analysis, then the resulting model can be decomposed. Let M be the model depicted in Figure 2a, P(v) be the distribution induced by M, and M′ be a model that is identical to M except that the equation for v6 is removed. M′ induces the distribution ∫ P(V) dv6, and its associated graph G′ yields two c-components, as shown in Figure 2b.
Now, decomposing G′ according to these c-components yields the sub-models depicted by Figures 2c and 2d. Both of these sub-models are identifiable using the half-trek criterion. Thus, all coefficients other than h have been shown to be identifiable. 
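Computing the c-components used in this decomposition is simple: they are the connected components of the graph restricted to its bidirected edges. A minimal sketch (the node and bidirected-edge lists below are assumptions chosen only so that the output mirrors the two c-components of Figure 2b; the figure's exact edges are not reproduced in the text):

```python
from collections import defaultdict

def c_components(nodes, bidirected):
    """Partition `nodes` into c-components: the connected components of the
    subgraph containing only the bidirected edges (nodes with no bidirected
    edge form singleton components)."""
    adj = defaultdict(set)
    for u, v in bidirected:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]   # depth-first search from `start`
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical bidirected edges yielding the two c-components of Figure 2b:
print(c_components(["v1", "v2", "v3", "v4", "v5"],
                   [("v2", "v3"), ("v3", "v5"), ("v1", "v4")]))
# two components: {v1, v4} and {v2, v3, v5}
```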
Returning to the graph prior to removal, depicted in Figure 2a, we are now able to identify h because both v4 and v5 are now allowed nodes for h, and the model is identified4.
As a result, we can improve our identification and constraint-discovery algorithm by recursively decomposing, using the g-HTC and Theorem 2, and removing descendant sets5. Note, however, that we must consider every descendant set for removal. It is possible that removing D1 will allow identification of a coefficient but removing a superset D2 with D1 ⊂ D2 will not. Additionally, it is possible that removing D2 will allow identification but removing a subset D1 will not.
After recursively decomposing the graph, if some of the removed variables were unidentified, we may be able to identify them by returning to the original graph prior to removal, since we may have a larger set of allowed nodes. For example, we were able to identify h in Figure 2a by "un-removing" v6 after the other coefficients were identified. In some cases, however, we may need to again recursively decompose and remove descendant sets. As a result, in order to fully exploit the power of decomposition and the g-HTC, we must repeat the recursive decomposition process on the original model until all marginalized nodes are identified or no new coefficients are identified in an iteration.
Clearly, recursive decomposition also aids in the discovery of HT-constraints in the same way that it aids in the identification of coefficients using the g-HTC. However, note that recursive decomposition may also introduce additional d-separation constraints. Prior to decomposition, if a node Z is d-separated from a node V, then we trivially obtain the constraint that ΣZV = 0. However, in some cases, Z may become d-separated from V after decomposition. 
In this case, the independence constraint on the covariance matrix of the decomposed c-component corresponds to a constraint in the original joint distribution P(V) that is not a conditional independence constraint. It is for this reason that we output independence constraints in Algorithm 2 (see Appendix).
For example, consider the graph depicted in Figure 3a. Theorem 2 does not yield any constraints for the edges of V7. However, after decomposing the graph, we obtain the c-component for {V2, V5, V7}, shown in Figure 3b. In this graph, V1 is d-separated from V7, yielding a non-independence constraint in the original model.
We can systematically identify coefficients and HT-constraints using recursive c-component decomposition by repeating the following steps for the model's graph G until the model has been identified or no new coefficients are identified in an iteration:
(i) Decompose the graph into c-components, {Si}
(ii) For each c-component, utilize the g-HTC and Theorems 1 and 2 to identify coefficients and find HT-constraints
(iii) For each descendant set, marginalize the descendant set and repeat steps (i)-(iii) until all variables have been marginalized

4While v4 and v5 are technically not allowed according to Definition 8, they can be used in g-HT admissible sets to identify h using Theorem 1 since their coefficients have been identified.
5Only removing descendant sets has the ability to break up c-components. For example, removing {v2} from Figure 2a does not break the c-component because removing v2 would relegate its influence to the error term of its child, v3. 
As a result, the graph of the resulting model would include a bidirected arc between v3 and v6, and we would still have a single c-component.

Figure 3: (a) V1 cannot be d-separated from V7 (b) V1 is d-separated from V7 in the graph of the c-component, {V2, V5, V7}

If a coefficient α can be identified using the above method (see also Algorithm 3 in the Appendix, which utilizes recursive decomposition to identify coefficients and output HT-constraints), then we will say that α is g-HTC identifiable.
We now show that any direct effect identifiable using non-parametric methods is also g-HTC identifiable.
Theorem 3. Let M be a linear SEM with variables V. Let M′ be a non-parametric SEM with structure identical to M. If the direct effect of x on y for x, y ∈ V is identified in M′, then the coefficient Λxy in M is g-HTC identifiable and can be identified using Algorithm 3 (see Appendix).

6 Non-Parametric Verma Constraints

Tian and Pearl (2002b) and Shpitser and Pearl (2008) provided algorithms for discovering Verma constraints in recursive, non-parametric models. In this section, we will show that the constraints obtained by the above method and Algorithm 3 (see Appendix) subsume the constraints discovered by both methods when applied to linear models. First, we will show that the constraints identified in Tian and Pearl (2002b), which we call Q-constraints, are subsumed by HT-constraints. Second, we will show that the constraints given by Shpitser and Pearl (2008), called dormant independences, are, in fact, equivalent to the constraints given by Tian and Pearl (2002b) for linear models. 
As a result, both dormant independences and Q-constraints are subsumed by HT-constraints.

6.1 Q-Constraints

We refer to the constraints enumerated in Tian and Pearl (2002b) as Q-constraints since they are discovered by identifying Q-factors, which are defined below.
Definition 9. For any subset, S ⊆ V, the Q-factor, QS, is given by

  QS = ∫_{εS} ∏_{i|Vi∈S} P(vi|pai, εi) P(εS) dεS,   (3)

where εS contains the error terms of the variables in S.

A Q-factor, QS, is identifiable whenever S is a c-component (Tian and Pearl, 2002a).
Lemma 1. (Tian and Pearl, 2002a) Let {v1, ..., vn} be sorted topologically, S be a c-component, V(i) = {v1, ..., vi}, and V(0) = ∅. Then QS can be computed as QS = ∏_{i|vi∈S} P(vi|V(i−1)).

For example, consider again Figure 2b. We have that Q1 = P(v1)P(v4|v3, v2, v1) and Q2 = P(v2|v1)P(v3|v2, v1)P(v5|v4, v3, v2, v1).
A Q-factor can also be identified by marginalizing out descendant sets (Tian and Pearl, 2002a). Suppose that QS is identified and D is a descendant set in GS; then

  Q_{S\D} = ∑_D Q_S.   (4)

If the marginalization over D yields additional c-components in the marginalized graph, then we can again compute each of them from Q_{S\D} (Tian and Pearl, 2002b).

Figure 4: The above graph induces the Verma constraint, Q[v4] is not a function of v1, and equivalently, v4 ⊥ v1|do(v3).

Tian's method recursively computes the Q-factors associated with c-components, marginalizes descendant sets in the graph for the computed Q-factor, and again computes Q-factors associated with c-components in the marginalized graph. The Q-constraint is obtained in the following way. The definition of a Q-factor, QS, given by Equation 3 is a function of Pa(S) only. 
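The product in Lemma 1 is mechanical to write down once a topological order is fixed. The sketch below (the function and variable names are our own, for illustration) returns the conditional factors whose product is QS, reproducing the Q1 and Q2 expressions given above for Figure 2b:

```python
def q_factor_terms(order, component):
    """Per Lemma 1: with v_1, ..., v_n sorted topologically, the Q-factor of
    a c-component S can be computed as prod_{i | v_i in S} P(v_i | V^(i-1)),
    where V^(i-1) = {v_1, ..., v_{i-1}}. Returns those factors as strings."""
    terms = []
    for i, v in enumerate(order):
        if v in component:
            given = ", ".join(order[:i])   # V^(i-1) under the given order
            terms.append(f"P({v}|{given})" if given else f"P({v})")
    return terms

order = ["v1", "v2", "v3", "v4", "v5"]
print(q_factor_terms(order, {"v1", "v4"}))        # factors of Q1
print(q_factor_terms(order, {"v2", "v3", "v5"}))  # factors of Q2
```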
However, the equivalent expression given by Lemma 1 and Equation 4 may be a function of additional variables. For example, in Figure 4, {v2, v4} is a c-component, so we can identify Q_{v2,v4} = P(v4|v3, v2, v1)P(v2|v1). The decomposition also makes v2 a leaf node in G_{v2,v4}. As a result, we can identify Q_{v4} = ∫_{v2} P(v4|v3, v2, v1)P(v2|v1) dv2. Since v1 is not a parent of v4 in G_{v4}, we have that Q_{v4} = ∫_{v2} P(v4|v3, v2, v1)P(v2|v1) dv2 ⊥ v1.

Theorem 4. Any Q-constraint, QS ⊥ Z, in a linear SEM has an equivalent set of HT-constraints that can be discovered using Algorithm 3 (see Appendix).

6.2 Dormant Independences

Dormant independences have a natural interpretation as independence and conditional independence constraints within identifiable interventional distributions (Shpitser and Pearl, 2008). For example, in Figure 4, the distribution after intervention on v3 can be represented graphically by removing the edge from v2 to v3, since v3 is no longer a function of v2 but is instead a constant. In the resulting graph, v4 is d-separated from v1, implying that v4 is independent of v1 in the distribution P(v4, v2, v1|do(v3)). In other words, P(v4|do(v3), v1) = P(v4|do(v3)). Now, it is not hard to show that P(v4|v1, do(v3)) is identifiable and equal to ∑_{v2} P(v4|v3, v2, v1)P(v2|v1), and we obtain the constraint that ∑_{v2} P(v4|v3, v2, v1)P(v2|v1) is not a function of v1, which is exactly the Q-constraint we obtained above.
It turns out that dormant independences among singletons and Q-constraints are equivalent, as stated by the following lemma.
Lemma 2. Any dormant independence, x ⫫ y | w, do(Z), with x and y singletons has an equivalent Q-constraint and vice versa.

Since pairwise independence implies independence in normal distributions, Lemma 2 and Theorem 4 imply the following theorem.
Theorem 5. 
Any dormant independence among sets, x ⫫ y | W, do(Z), in a linear SEM has an equivalent set of HT-constraints that can be discovered by incorporating recursive c-component decomposition with Algorithm 3 (see Appendix).

7 Conclusion

In this paper, we extend the half-trek criterion (Foygel et al., 2012) and generalize the notion of overidentification to discover constraints using the generalized half-trek criterion, even when the coefficients are not identified. We then incorporate recursive c-component decomposition and show that the resulting identification method is able to identify more models and constraints than the existing linear and non-parametric algorithms.
Finally, we note that while we were preparing this manuscript for submission, Drton and Weihs (2016) independently introduced an idea similar to the recursive decomposition discussed in this paper, which they called ancestor decomposition. While ancestor decomposition is more efficient, recursive decomposition is more general in that it enables the identification of a larger set of coefficients.

8 Acknowledgments

I would like to thank Jin Tian and Judea Pearl for helpful comments and discussions. This research was supported in parts by grants from NSF #IIS-1302448 and #IIS-1527490 and ONR #N00014-13-1-0153 and #N00014-13-1-0153.

References

BALKE, A. and PEARL, J. (1994). Probabilistic evaluation of counterfactual queries. In Proceedings of the Twelfth National Conference on Artificial Intelligence, vol. I. MIT Press, Menlo Park, CA, 230–237.

BRITO, C. (2004). Graphical methods for identification in structural equation models. Ph.D. thesis, Computer Science Department, University of California, Los Angeles, CA. URL http://ftp.cs.ucla.edu/pub/stat_ser/r314.pdf

BRITO, C. and PEARL, J. (2002). Generalized instrumental variables. 
In Uncertainty in Artificial Intelligence, Proceedings of the Eighteenth Conference (A. Darwiche and N. Friedman, eds.). Morgan Kaufmann, San Francisco, 85–93.

CHEN, B. and PEARL, J. (2014). Graphical tools for linear structural equation modeling. Tech. Rep. R-432, Department of Computer Science, University of California, Los Angeles, CA. Forthcoming, Psychometrika.

CHEN, B., TIAN, J. and PEARL, J. (2014). Testable implications of linear structural equation models. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (C. E. Brodley and P. Stone, eds.). AAAI Press, Palo Alto, CA.

DRTON, M. and WEIHS, L. (2016). Generic identifiability of linear structural equation models by ancestor decomposition. Scandinavian Journal of Statistics. doi:10.1111/sjos.12227.
URL http://dx.doi.org/10.1111/sjos.12227

FOYGEL, R., DRAISMA, J. and DRTON, M. (2012). Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics 40 1682–1713.

HUANG, Y. and VALTORTA, M. (2006). Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. Richardson, eds.). AUAI Press, Corvallis, OR, 217–224.

KANG, C. and TIAN, J. (2009). Markov properties for linear causal models with correlated errors. The Journal of Machine Learning Research 10 41–70.

PEARL, J. (2004). Robustness of causal claims. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (M. Chickering and J. Halpern, eds.). AUAI Press, Arlington, VA, 446–453.

PEARL, J. (2009). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, New York.

SHPITSER, I. and PEARL, J. (2006). Identification of conditional interventional distributions.
In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. Richardson, eds.). AUAI Press, Corvallis, OR, 437–444.

SHPITSER, I. and PEARL, J. (2008). Dormant independence. In Proceedings of the Twenty-Third Conference on Artificial Intelligence. AAAI Press, Menlo Park, CA, 1081–1087.

TIAN, J. (2002). Studies in Causal Reasoning and Learning. Ph.D. thesis, Computer Science Department, University of California, Los Angeles, CA.

TIAN, J. (2005). Identifying direct causal effects in linear models. In Proceedings of the National Conference on Artificial Intelligence, vol. 20. AAAI Press/The MIT Press, Menlo Park, CA.

TIAN, J. (2007). A criterion for parameter identification in structural equation models. In Proceedings of the Twenty-Third Annual Conference on Uncertainty in Artificial Intelligence (UAI-07). AUAI Press, Corvallis, OR.

TIAN, J. (2009). Parameter identification in a class of linear structural equation models. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09).

TIAN, J. and PEARL, J. (2002a). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Menlo Park, CA, 567–573.

TIAN, J. and PEARL, J. (2002b). On the testable implications of causal models with hidden variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (A. Darwiche and N. Friedman, eds.). Morgan Kaufmann, San Francisco, CA, 519–527.

VERMA, T. and PEARL, J. (1990). Equivalence and synthesis of causal models. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence. Cambridge, MA. Also in P. Bonissone, M. Henrion, L.N. Kanal and J.F.
Lemmer (Eds.), Uncertainty in Artificial Intelligence 6, Elsevier Science Publishers, B.V., 255–268, 1991.

WRIGHT, S. (1921). Correlation and causation. Journal of Agricultural Research 20 557–585.
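As a numerical illustration of the dormant-independence constraint of Section 6.2, the sketch below simulates a linear-Gaussian version of a Figure 4-style model: a chain v1 → v2 → v3 → v4 with correlated errors between v2 and v4 (the coefficients a, b, c and error covariance rho are illustrative choices, not taken from the paper). Marginalizing v2 out of P(v4|v3, v2, v1)P(v2|v1) leaves a coefficient of β1 + β2·γ on v1, which the Q-constraint says must vanish; the code estimates the regression coefficients from simulated observational data and checks this.

```python
# Sketch: checking the Q-constraint of Section 6.2 by simulation.
# ASSUMED model (illustrative, not the paper's exact Figure 4 parameters):
#   v1 -> v2 -> v3 -> v4, with correlated errors on v2 and v4.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b, c, rho = 0.8, 1.2, 0.5, 0.6  # structural coefficients, error covariance

# Correlated errors e2, e4 with covariance rho; e1, e3 independent.
e1, e3 = rng.normal(size=n), rng.normal(size=n)
e2 = rng.normal(size=n)
e4 = rho * e2 + np.sqrt(1 - rho**2) * rng.normal(size=n)

v1 = e1
v2 = a * v1 + e2
v3 = b * v2 + e3
v4 = c * v3 + e4

# In the Gaussian case, the regression v4 ~ v3 + v2 + v1 recovers
# E[v4 | v3, v2, v1], and v2 ~ v1 recovers E[v2 | v1].
X = np.column_stack([v3, v2, v1, np.ones(n)])
beta3, beta2, beta1, _ = np.linalg.lstsq(X, v4, rcond=None)[0]
gamma = np.cov(v1, v2)[0, 1] / np.var(v1)

# Coefficient of v1 after marginalizing v2 out of P(v4|v3,v2,v1)P(v2|v1);
# the Q-constraint (equivalently, v4 independent of v1 under do(v3))
# says this quantity is zero in the population.
print(beta1 + beta2 * gamma)  # ≈ 0 up to sampling error
```

With the parameterization above, beta1 ≈ -a·rho and beta2 ≈ rho while gamma ≈ a, so the two terms cancel exactly in the population, mirroring the d-separation of v4 from v1 in the post-intervention graph.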