{"title": "Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets", "book": "Advances in Neural Information Processing Systems", "page_first": 12477, "page_last": 12486, "abstract": "One of the most common mistakes made when performing data analysis is attributing causal meaning to regression coefficients. Formally, a causal effect can only be computed if it is identifiable from a combination of observational data and structural knowledge about the domain under investigation (Pearl, 2000, Ch. 5). Building on the literature of instrumental variables (IVs), a plethora of methods has been developed to identify causal effects in linear systems. Almost invariably, however, the most powerful such methods rely on exponential-time procedures. In this paper, we investigate graphical conditions to allow efficient identification in arbitrary linear structural causal models (SCMs). In particular, we develop a method to efficiently find unconditioned instrumental subsets, which are generalizations of IVs that can be used to tame the complexity of many canonical algorithms found in the literature. Further, we prove that determining whether an effect can be identified with TSID (Weihs et al., 2017), a method more powerful than unconditioned instrumental sets and other efficient identification algorithms, is NP-Complete. 
Finally, building on the idea of flow constraints, we introduce a new and efficient criterion called Instrumental Cutsets (IC), which is able to solve for parameters missed by all other existing polynomial-time algorithms.", "full_text": "Efficient Identification in Linear Structural Causal Models with Instrumental Cutsets\n\nDaniel Kumor\nPurdue University\ndkumor@purdue.edu\n\nBryant Chen\nBrex Inc.\nbryant@brex.com\n\nElias Bareinboim\nColumbia University\neb@cs.columbia.edu\n\nAbstract\n\nOne of the most common mistakes made when performing data analysis is attributing causal meaning to regression coefficients. Formally, a causal effect can only be computed if it is identifiable from a combination of observational data and structural knowledge about the domain under investigation (Pearl, 2000, Ch. 5). Building on the literature of instrumental variables (IVs), a plethora of methods has been developed to identify causal effects in linear systems. Almost invariably, however, the most powerful such methods rely on exponential-time procedures. In this paper, we investigate graphical conditions to allow efficient identification in arbitrary linear structural causal models (SCMs). In particular, we develop a method to efficiently find unconditioned instrumental subsets, which are generalizations of IVs that can be used to tame the complexity of many canonical algorithms found in the literature. Further, we prove that determining whether an effect can be identified with TSID (Weihs et al., 2017), a method more powerful than unconditioned instrumental sets and other efficient identification algorithms, is NP-Complete. 
Finally, building on the idea of flow constraints, we introduce a new and efficient criterion called Instrumental Cutsets (IC), which is able to solve for parameters missed by all other existing polynomial-time algorithms.\n\n1 Introduction\n\nPredicting the effects of interventions is one of the fundamental tasks in the empirical sciences. Controlled experimentation, in which one physically intervenes in the system and learns about the corresponding effects, is considered the \u201cgold standard\u201d. In practice, however, experimentation is not always possible due to costs, ethical constraints, or technical feasibility \u2013 e.g., a self-driving car should not need to crash to recognize that doing so has negative consequences. In such cases, the agent must uniquely determine the effect of an action using observational data and its structural knowledge of the environment. This leads to the problem of identification (Pearl, 2000; Bareinboim & Pearl, 2016).\n\nStructural knowledge is usually represented as a structural causal model (SCM)^1 (Pearl, 2000), which represents the set of observed and unobserved variables and their corresponding causal relations. We focus on the problem of generic identification in linear, acyclic SCMs. In such systems, the value of each observed variable is determined by a linear combination of the values of its direct causes along with a latent error term \u03b5. This leads to a system of equations X = \u039b^T X + \u03b5, where X is the vector of variables, \u039b^T is a lower triangular matrix whose ijth element \u03bbij \u2013 called a structural parameter \u2013 is 0 whenever xi is not a direct cause of xj, and \u03b5 is a vector of latent variables.\n\nMethods for identification in linear SCMs generally assume that variables are normally distributed (Chen & Pearl, 2014), meaning that the observational data can be summarized with a covariance matrix \u03a3. 
This covariance matrix can be linked to the underlying structural parameters through the system of equations\n\n\u03a3 = E[XX^T] = (I \u2212 \u039b)^{\u2212T} \u2126 (I \u2212 \u039b)^{\u22121}    (1)\n\nwhere \u2126\u2019s elements, \u03b5ij, represent \u03c3_{\u03b5i \u03b5j}, and I is the identity matrix (Foygel et al., 2012). The task of causal effect identification in linear SCMs can, therefore, be seen as solving for the target structural parameter \u03bbij using Eq. (1). If the parameter can be expressed in terms of \u03a3 alone, then it is said to be generically identifiable.\n\n1 Such models are also referred to as structural equation models, or SEM, in the literature.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nSuch systems of polynomial equations can be approached directly through the application of Gr\u00f6bner bases (Garc\u00eda-Puente et al., 2010). In practice, however, these methods are doubly-exponential (Bardet, 2002) in the number of structural parameters, and become computationally intractable very quickly, incapable of handling causal graphs with more than 4 or 5 nodes (Foygel et al., 2012).\n\nIdentification in linear SCMs has been a topic of great interest for nearly a century (Wright, 1921), including much of the early work in econometrics (Wright, 1928; Fisher, 1966; Bowden & Turkington, 1984; Bekker et al., 1994). The computational aspects of the problem, however, have only more recently received attention from computer scientists and statisticians (Pearl, 2000, Ch. 5).\n\nSince then, there has been a growing body of literature developing successively more sophisticated methods with increasingly stronger identification power, i.e., capable of covering a larger spectrum of identifiable effects. 
Deciding whether a certain structural parameter can be identified in polynomial time is currently an open problem.\n\nThe most popular identification method found in the literature today is known as the instrumental variable (IV) (Wright, 1928). A number of extensions of IVs have been proposed, including conditional IV (cIV) (Bowden & Turkington, 1984; Van der Zander et al., 2015), unconditioned instrumental sets (IS) (Brito, 2004), and the half-trek criterion (HTC) (Foygel et al., 2012), all of which are accompanied by efficient, polynomial-time algorithms. In contrast, generalized instrumental sets (gIS) (Brito & Pearl, 2002) were developed as a graphical criterion, without an efficient algorithm. Van der Zander & Liskiewicz (2016) proved that checking existence of conditioning sets that satisfy the gIS given a set of instruments is NP-Hard. They further proposed a simplified version of the criterion (scIS), for which finding a conditioning set can be done efficiently. It remains an open problem whether instruments satisfying the gIS can be found in polynomial time.\n\nThe generalized HTC (gHTC) (Chen, 2016; Weihs et al., 2017) and auxiliary variables (AVS) (Chen et al., 2016, 2017) were developed with algorithms that are polynomial, provided that the number of incoming edges to each node in the causal graph is bounded. The corresponding algorithms are exponential without this restriction, since they enumerate all subsets of each node\u2019s incident edges.\n\nMore recently, Weihs et al. (2017) showed how constraints stemming from determinants of minors in the covariance matrix (Sullivant et al., 2010) can be exploited for identification (TSID). Still, the complexity of their method was left as an open problem. We use the term tsIV to refer to the unnamed criterion underlying the TSID algorithm. 
Against this background, our contributions are as follows:\n\n\u2022 We develop an efficient algorithm for finding instrumental subsets, which overcomes the need for enumerating all subsets of incoming edges into each node. This leads to efficient identification algorithms exploiting the gHTC and AVS criteria.\n\u2022 We prove NP-Completeness of finding tsIVs and scIS, which shows they are impractical for use in large graphs without constraining their search space.\n\u2022 Finally, we introduce a new criterion, called Instrumental Cutsets, and develop an associated polynomial-time identification algorithm. 
We show that ICs subsume both gHTC and AVS.\n\nFor the sake of clarity, a summary of our results in relation to existing literature is shown in Table 1.\n\nIdentification Power & Efficiency\n\nAlgorithm | Efficient? | Power\nIV^a | \u2713 | low\ncIV^b | \u2713^c | medium\nIS^d | \u2713 | medium\nscIS^c | ? \u2192 \u2717* | high\ngIS^d | ?\u2020 | high\nHTC^e | \u2713 | high\ngHTC^f,g | ? \u2192 \u2713\u2021 | very high\ncAV & AVS^h | ? \u2192 \u2713\u2021 | very high\nOur Method | \u2713 | very high\ngAVS^h | ?\u2020 | very high\nTSID & gHTC^g | ? \u2192 \u2717* | very high\nGr\u00f6bner^i | \u2717 | complete\n\na Wright (1928); b Bowden & Turkington (1984); c Van der Zander et al. (2015); Van der Zander & Liskiewicz (2016); d Brito & Pearl (2002); e Foygel et al. (2012); f Chen (2016); g Weihs et al. (2017); h Chen et al. (2017); i Garc\u00eda-Puente et al. (2010)\n\u2020 Finding conditioning set for candidates shown to be NP-hard by (c), but complexity of search is open question\n\u2021 Previous algorithms only polynomial if node input degree bounded\n* One of our contributions is proving NP-Completeness of this method\n\nTable 1: Our contributions in relation to existing literature are shown in red; Ordered roughly by identification power: ? \u2192 represents methods for which we determined complexity in this work.\n\nFigure 1: Conversion of an instrumental variable (a) into a trek-encoding flow graph shown in (b) if one ignores the colorings. From it, we can deduce that \u03c3bc = \u03bbab \u03b5aa \u03bbab \u03bbbc + \u03b5bb \u03bbbc. (c) shows another way of drawing the sets from (b), which facilitates interpretation in more complex settings. (d) z1, z2 can be used as an instrumental set to solve for \u03bbx1y.\n\n2 Preliminaries\n\nThe causal graph of an SCM is defined as a triple G = (V, D, B), where V represents the nodes, D the directed edges, and B the bidirected ones. A linear SCM\u2019s graph has a node vi for each variable xi, a directed edge from vi to vj for each non-zero \u03bbij, and a bidirected edge between vi and vj for each non-zero \u03b5ij (Fig. 1a). Each edge in the graph, therefore, corresponds to an unknown coefficient, which we call a structural parameter. When clear from the context, we will use \u03bbij and \u03b5ij to refer to the corresponding directed and bidirected edges in the graph. 
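As a small numerical illustration of Eq. (1) and of the instrumental variable in Fig. 1a, the following Python sketch builds the covariance matrix of the chain a to b to c with a bidirected edge between b and c, and compares the regression coefficient of c on b with the IV ratio. The parameter values and all helper names are assumptions made for this example only; they do not come from the paper.

```python
# Hypothetical parameter values for the chain a to b to c with a latent
# confounder between b and c (the bidirected edge).
lam_ab, lam_bc = 0.8, 0.5
omega = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.3],
         [0.0, 0.3, 1.0]]

# Solving X = Lambda^T X + eps for X gives X = total * eps, where row i of
# `total` holds the total effect of each error term on variable i.
total = [[1.0, 0.0, 0.0],
         [lam_ab, 1.0, 0.0],
         [lam_ab * lam_bc, lam_bc, 1.0]]


def covariance(total, omega):
    # Sigma = total * Omega * total^T, which is Eq. (1) written out.
    n = len(total)
    return [[sum(total[i][k] * omega[k][l] * total[j][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


sigma = covariance(total, omega)
a, b, c = 0, 1, 2
naive = sigma[b][c] / sigma[b][b]   # regression of c on b
iv = sigma[a][c] / sigma[a][b]      # instrumental-variable ratio
print(round(naive, 4), round(iv, 4))
```

With these values the regression slope is biased by the latent confounding between b and c, while the ratio sigma_ac / sigma_ab recovers lam_bc, because the only trek from a to c passes through the edge from b to c.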
We define Pa(xi) as the set of parents of xi, An(xi) as ancestors of xi, De(xi) as descendants of xi, and Sib(xi) as variables connected to xi with bidirected edges (i.e., variables with latent common causes).\n\nWe will refer to paths in the graph as \u201cunblocked\u201d conditioned on a set W (which may be empty), if they contain a collider (a \u2192 b \u2190 c, a \u2194 b \u2194 c, a \u2192 b \u2194 c) only when b \u2208 W \u222a An(W), and if they do not otherwise contain vertices from W (see d-separation, Koller & Friedman (2009)). Unblocked paths without conditioning do not contain colliders, and are referred to as treks (Sullivant et al., 2010).\n\nThe computable covariances of observable variables and the unknown structural parameters given in Eq. (1) have a graphical interpretation in terms of a sum over all treks between nodes in the causal graph, namely \u03c3xy = \u2211 \u03c0(x, y), where \u03c0 is the product of structural parameters along the trek.\n\nSince unblocked paths in the causal graph have a non-trivial relation to arrow directions, we follow Foygel et al. (2012) in constructing an alternate DAG, called the \u201cflow graph\u201d (Gflow), which encodes treks as directed paths between nodes (see Fig. 1b, where the blue path shows a trek between A and B in Fig. 1a, meaning \u03c3ab = \u03b5aa \u03bbab). When referencing the flow graph, we call the \u201ctop\u201d nodes (e.g., a, b, c in Fig. 1b) the \u201csource nodes\u201d, and the \u201cbottom\u201d nodes (a\u2032, b\u2032, c\u2032) the \u201csink nodes\u201d.\n\nTreks between two sets of nodes in G are said to have \u201cno sided intersection\u201d if they do not intersect in Gflow. The red and blue paths of Fig. 1b show such a set from {a, b} to {b\u2032, c\u2032}. 
Non-intersecting path sets are related to minors of the covariance matrix, denoted with \u03a3_{(a,b),(b,c)}.^2 We visually denote the source and sink sets in the original graph with a dot near the top of the node if it is a source, and a dot near the bottom if it is a sink. By these conventions, Fig. 1b can be represented by Fig. 1c.\n\nFor simplicity, we will demonstrate many of our contributions in the context of instrumental sets:\n\nDefinition 2.1. (Brito & Pearl, 2002) A set Z is called an instrumental set (IS) relative to X \u2286 Pa(y) if (i) there exists an unblocked path set without sided intersection between Z and X, and (ii) there does not exist an unblocked path from any z \u2208 Z to y in G with edges X \u2192 y removed.\n\nIn Fig. 1d, {z1, z2} is an instrumental set relative to {x1, x2}, leading to a system of equations solvable for \u03bbx1y, \u03bbx2y. A conditioning set W can be added to block paths from Z to y, creating the simple conditional IS (scIS). If each zi has its own conditioning set, it is called a generalized IS (gIS).\n\nA set of identified structural parameters \u039b\u2217 can be used to create \u201cauxiliary variables\u201d (Chen et al., 2016) which subtract out parents of variables whose effect is known: x\u2217_i = x_i \u2212 \u2211_{\u03bb_{xj xi} \u2208 \u039b\u2217} \u03bb_{xj xi} x_j. Using these variables as instruments leads to AV sets (AVS), which are equivalent to the gHTC^3.\n\nFinally, we will build upon the tsIV, which exploits flow constraints in Gflow to identify parameters:\n\n
Definition 2.2. (Weihs et al., 2017) Sets S, T \u2282 V, |S| = |T| + 1 = k are a tsIV with respect to \u03bbxy if (i) De(y) \u2229 T = \u2205, (ii) the max flow between S and T\u2032 \u222a {x\u2032} in Gflow is k, and (iii) the max flow between S and T\u2032 \u222a {y\u2032} in Gflow with x\u2032 \u2192 y\u2032 and w\u2032i \u2192 y\u2032 for \u03bbwiy \u2208 \u039b\u2217 removed is less than k.\n\n2 Refer to Sullivant et al. (2010) and the Gessel-Viennot-Lindstr\u00f6m Lemma (A.1) (Gessel & Viennot, 1989).\n3 Full definitions for the methods mentioned in this section are available in Appendix A.1.\n\nFigure 2: (a): only a subset of edges can be identified with an instrumental set, and finding the maximal subset in arbitrary graphs was exponential in previous algorithms. (b): \u03bbx1y could previously only be identified using tsIVs, which we show are NP-hard to find. (c): \u03bbxy cannot be identified using tsIVs, but is identified through iterative application of Theorem 5.1 (\u03bbbx using a, \u03bbyc using x\u2217, and \u03bbxy using c\u2217). (d): \u03bbx1y is identifiable with cAV but is not captured by IC.\n\n3 Efficiently Finding Instrumental Subsets\n\nWhen identifying a causal effect, \u03bbx1y, using instrumental sets, it is often the case that no instrument exists for x1, but an instrumental set does exist for a subset of y\u2019s parents that includes x1. For example, in Fig. 1d, there does not exist any IV for {x1}, but {z1, z2} is an instrumental set for {x1, x2}, allowing identification of both \u03bbx1y and \u03bbx2y. Likewise, in Fig. 
2a, {x1, x2, x5} is the only subset of Pa(y) which has a valid instrumental set ({z1, z3, z4}).\n\nOne method for finding sets satisfying a criterion like IS would be to list all subsets of y\u2019s incident edges, and for each subset, check if there exist corresponding variables {z1, ..., zk} satisfying all requirements. This is indeed the approach that algorithms developed for the gHTC (Chen, 2016; Weihs et al., 2017) and AVS (Chen et al., 2016) take. However, enumerating all subsets is clearly exponential in the number of structural parameters / edges pointing to y. In this section, we show that finding this parameter subset can instead be performed in polynomial time.\n\nFirst, we define the concept of \u201cmatch-blocking\u201d, which generalizes the above problem to arbitrary source and sink sets in a DAG, and can be used to create algorithms for finding valid subsets applicable to IS, the gHTC, AVS, and our own identification criterion, instrumental cutsets (IC).\n\nDefinition 3.1. Given a directed acyclic graph G = (V, D), a set of source nodes S and sink nodes T, the sets Sf \u2286 S and Tf \u2286 T, with |Sf| = |Tf| = k, are called match-blocked iff for each si \u2208 Sf, all elements of T reachable from si are in the set Tf, and the max flow between Sf and Tf is k in G where each vertex has capacity 1.\n\nTo efficiently find a match-block^4, we observe that if a max flow is computed from a set of variables S to T, then any element of T that has 0 flow through it cannot be part of a match-block, and therefore none of its ancestors in S can be part of the match-block either:\n\nTheorem 3.1. 
Given a directed acyclic graph G = (V, D), a set of source nodes S, sink nodes T, and a max flow F from S to T in G with vertex capacity 1, if a node ti \u2208 T has 0 flow crossing it in F, then there do not exist subsets Sm \u2286 S, Tm \u2286 T where Sm, Tm are match-blocked and ti \u2208 Tm. Furthermore, for any match-block (Sm, Tm), we have |Sm \u2229 An(ti)| = 0.\n\nThis suggests an algorithm for finding the match-block: find a max flow from S to T, then remove elements of T that did not have a flow through them, and all of their ancestors from S, and repeat until no new elements are removed. The procedure, given in Algorithm 1, runs in polynomial time.\n\n4 While there exist methods for finding solvable subsystems of equations (Duff & Reid, 1978; Sridhar et al., 1996; Gon\u00e7alves & Porto, 2016), they cannot be applied to our situation due to the requirement of nonintersecting paths in a full arbitrary DAG.\n\nFigure 3: The graph in (a) with edge \u03bbab known has an auxiliary variable B\u2217 shown in (b). (c) shows a modified flow graph, which encodes the treks from B\u2217 excluding the known edge. This corresponds to a new encoding of the AV, shown in (d). Finally, (e) gives the auxiliary flow graph as described in Definition 3.2.\n\n
Algorithm 1 Find Maximal Match-Block given DAG G, source nodes S and target nodes T\n\nfunction MAXMATCHBLOCK(G, S, T)\n    do\n        F \u2190 MAXFLOW(G, S, T)\n        T\u2032 \u2190 {ti | F has 0 flow through ti \u2208 T}\n        T \u2190 T \\ T\u2032\n        S \u2190 S \\ An(T\u2032)\n    while at least one element of T was removed in this iteration\n    return (S, T)\nend function\n\nThe match-block can be exploited to find instrumental subsets by using Gflow with ancestors of y that don\u2019t have back-paths to siblings of y as S and Pa(y)\u2032 as T. This procedure is shown to find an IS if one exists in Corollary A.3, and is implemented in Algorithm 3 of the appendix.\n\n3.1 Extending IS to AVS with the Auxiliary Flow Graph\n\nA match-block operates upon a directed graph. When using instrumental sets, one can convert the SCM to the flow graph Gflow, but the AVS algorithm exploits auxiliary variables, which are not encoded in this graph. The covariance of an auxiliary variable with another variable y can be written:\n\n\u03c3_{b\u2217y} = E[b\u2217 y] = E[(b \u2212 \u2211_{\u03bb_{xi b} \u2208 \u039b\u2217} \u03bb_{xi b} x_i) y] = \u03c3_{by} \u2212 \u2211_{\u03bb_{xi b} \u2208 \u039b\u2217} \u03bb_{xi b} \u03c3_{xi y}\n\nThis quantity behaves like \u03c3by with treks from b starting with the known \u03bbxib removed. We can therefore construct a flow graph which encodes this intuition explicitly using Definition 3.2. An example of an auxiliary flow graph can be seen in Fig. 3, which uses b\u2217 = b \u2212 \u03bbab a. In Fig. 3c, B\u2217 no longer has the edge \u03bbab to A, but it still has all other edges, giving it all of the same treks as B, except the ones subtracted out in the AV. The original variable, B, has an edge with weight 1 to B\u2217, making its treks identical to the standard flow graph. 
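The covariance identity for auxiliary variables can be checked numerically. The Python sketch below uses a three-variable model in the spirit of Fig. 3a, assumed here to be A to B to C with a bidirected edge between A and B, and treats lam_ab as already identified. All numeric values and helper names are hypothetical and serve only as an illustration.

```python
# Hypothetical parameters: a to b to c, with a bidirected edge between a and b.
lam_ab, lam_bc = 0.8, 0.5
eps_ab = 0.3
omega = [[1.0, eps_ab, 0.0],
         [eps_ab, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

# Row i holds the total effect of each error term on variable i.
total = [[1.0, 0.0, 0.0],
         [lam_ab, 1.0, 0.0],
         [lam_ab * lam_bc, lam_bc, 1.0]]


def covariance(total, omega):
    # Sigma = total * Omega * total^T
    n = len(total)
    return [[sum(total[i][k] * omega[k][l] * total[j][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


sigma = covariance(total, omega)
a, b, c = 0, 1, 2

# Covariances of the auxiliary variable b* = b - lam_ab * a, computed from
# observed covariances and the known parameter only.
sigma_bstar_a = sigma[b][a] - lam_ab * sigma[a][a]
sigma_bstar_c = sigma[b][c] - lam_ab * sigma[a][c]
print(round(sigma_bstar_a, 4), round(sigma_bstar_c, 4))
```

In this example sigma_bstar_a reduces to the bidirected-edge weight eps_ab: the treks from b that start with the known edge lam_ab have been subtracted out, which is the behaviour the auxiliary flow graph is meant to encode.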
With this new graph, and MAXMATCHBLOCK, the algorithm for instrumental sets can easily be extended to find AVS (Algorithm 4 in appendix), which in turn is equivalent to the gHTC. We have therefore shown that both of these methods can be efficiently applied for identification, without a restriction on the number of edges incoming into a node.\n\nDefinition 3.2. (Auxiliary Flow Graph) Given a linear SCM (\u039b, \u2126) with causal graph G = (V, D, B), and set of known structural parameters \u039b\u2217, the auxiliary flow graph is a weighted DAG with vertices V \u222a V\u2217 \u222a V\u2032 \u222a V\u2032\u2217 and edges\n\n1. j \u2192 i and i\u2032 \u2192 j\u2032 both with weight \u03bbij if (i \u2192 j) \u2208 D, and \u03bbij \u2208 \u039b\u2217\n2. j\u2217 \u2192 i and i\u2032 \u2192 j\u2032\u2217 both with weight \u03bbij if (i \u2192 j) \u2208 D, and \u03bbij \u2209 \u039b\u2217\n3. i \u2192 i\u2217 and i\u2032\u2217 \u2192 i\u2032 with weight 1, and i\u2217 \u2192 i\u2032 with weight \u03b5ii for i \u2208 V\n4. i\u2217 \u2192 j\u2032\u2217 with weight \u03b5ij if (i \u2194 j) \u2208 B\n\nThis graph is referred to as Gaux. 
The nodes without \u2032 are called \u201csource\u201d, or \u201ctop\u201d nodes, and the nodes with \u2032 are called \u201csink\u201d or \u201cbottom\u201d nodes. The nodes with \u2217 are called \u201cauxiliary\u201d nodes.\n\nFigure 4: The model in (a) encodes the basic structure we exploit in our proof. The set {z1, z2} is a simple conditional instrumental set only if z1 \u2194 b and z2 \u2194 a do not exist, since it opens the collider from z1 through b to y. (b) shows the full encoding of a 1-in-3SAT for (a \u2228 b \u2228 \u00acc) \u2227 (c \u2228 d \u2228 e). We have removed bidirected edges from all top nodes to y in (b) for clarity.\n\n4 On Searching for tsIVs (Weihs et al., 2017)\n\nThere exist structural parameters that can be identified using tsIVs, which cannot be found with gHTC, AVS, or any other efficient method. However, there currently does not exist an efficient algorithm for finding tsIVs in arbitrary DAGs. This section can be summarized with Corollary 4.1:\n\nCorollary 4.1. Given an SCM and target structural parameter \u03bbxy, determining whether there exists a tsIV which can be used to solve for \u03bbxy in G is an NP-Complete problem.\n\nWe show this by encoding 1-in-3SAT, which is NP-Complete (Schaefer, 1978), into a graph, such that finding a tsIV is equivalent to solving for a satisfying boolean assignment.\n\nSince tsIVs can be difficult to intuitively visualize, we will illustrate the ideas behind the proof with simple conditional instrumental sets (scIS), which we also show are NP-Complete to find (Corollary A.5). We observe a property of the graph in Fig. 4a: with a \u2194 z2 and z1 \u2194 b removed (blue), {z1, z2} can be used as scIS for \u03bbx1y, \u03bbx2y, since their back-paths to y (z1 \u2190 a \u2194 y and z2 \u2190 b \u2194 y) can be blocked by conditioning on W = {a, b}. However, if the bidirected edges are not removed, conditioning on a or b opens a path to y using them as a collider (z1 \u2194 b \u2194 y and z2 \u2194 a \u2194 y). A simple conditional instrumental set for \u03bbx1y exists in this graph if and only if none of the instruments has bidirected edges to another instrument\u2019s required conditioning variable.\n\nWe exploit this property to construct the graph in Fig. 
4a for each literal li in the 1-in-3SAT formula (a \u2228 b \u2228 \u00acc) \u2227 (c \u2228 d \u2228 e). Each clause is designed so that usage of any potential instrument, zi, precludes usage of the other two potential instruments in the clause. For example, the bidirected edges in Fig. 4b from z2 to a and b disallow usage of z1 and z2 as instruments once z2 is used. Likewise, there are bidirected edges between the c and \u00acc structures, since if c is true, \u00acc cannot be true. Similarly, b has bidirected edges to d and e, since if a is true, then \u00acc is false and c is true, so d and e must be disabled. Finally, y has 2 parents, x1, x2, corresponding to the two clauses. Each element of Z has edges to all parents of y, meaning that any scIS existing in the graph must have as many instruments as there are clauses. Thus, finding an scIS for y in this graph corresponds to finding a satisfying assignment for the formula. The full procedure for generating the graph, which is the same for both scIS and tsIV, is given in Theorem 4.1.\n\nTheorem 4.1. Given a boolean formula F in conjunctive normal form, if a graph G is constructed as follows, starting from a target node y:\n\n1. For each clause ci \u2208 F, a node xi is added with edges xi \u2192 y and xi \u2194 y\n2. For ci \u2208 F, take each literal lj \u2208 ci, and add nodes zij, wij, with edges wij \u2192 zij, wij \u2194 y, and zij \u2192 xk \u2200xk\n3. For ci \u2208 F, lj, lk \u2208 ci where j \u2260 k add bidirected edge zij \u2194 wik\n4. \u2200ci, cm \u2208 F, lj \u2208 ci, ln \u2208 cm with i \u2260 m, add a bidirected edge zij \u2194 wmn if\n    (a) lj = \u00acln, or\n    (b) \u2203lq \u2208 cm with q \u2260 n and lj = lq, or\n    (c) \u2203lp \u2208 ci, lq \u2208 cm with p \u2260 j and q \u2260 n where lp = \u00aclq\n\nThen a tsIV exists for \u03bbx1y in G if and only if there is a truth assignment to the variables of F such that there is exactly one true literal in each clause of F.\n\n5 The Instrumental Cutset Identification Criterion\n\nWhile finding a tsIV is NP-hard, it is possible to create a new criterion, which both includes constraints from the tsIV that can be efficiently found, and exploits knowledge of previously identified edges similarly to AVS. This criterion is described in the following theorem.\n\n
Theorem 5.1. (Instrumental Cutset) Let M = (\u039b, \u2126) be a linear SCM with associated causal graph G = (V, D, B), a set of identified structural parameters \u039b\u2217, and a target structural parameter \u03bbxy. Define Gaux as the auxiliary flow graph for (G, \u039b\u2217). Suppose that there exist subsets S \u2282 V \u222a V\u2217, with V\u2217 representing the set of AVs, and T \u2286 Pa(y\u2217) \\ {x} with |S| = |T| + 1 = k such that\n\n1. There exists a flow of size k in Gaux from S to T \u222a {x}\n2. There does not exist a flow of size k from S to T \u222a {y} in Gaux with x\u2032 \u2192 y\u2032\u2217 removed\n3. No element of {y} \u222a Sib(y) has a directed path to si \u2208 S in G\n\nthen \u03bbxy is generically identifiable by the equation:\n\n\u03bbxy = det \u03a3_{S, T\u222a{y\u2217}} / det \u03a3_{S, T\u222a{x}}\n\nInstrumental Cutsets (ICs) differ from tsIVs in two fundamental ways:\n\n1. We allow auxiliary variables, enabling exploitation of previously identified structural parameters incoming to si \u2208 S for identification. 
An example that is identifiable with ICID, but not with TSID, is shown in Fig. 2c.\n2. We require that T is a subset of the parents of target node y, and that y has no half-treks to S. A version of IC which avoids these requirements is given in Theorem A.4 in the appendix. While this version is strictly more powerful than tsIV, finding satisfying sets can be shown to be NP-hard by a modified version of the arguments given in Section 4 (Appendix A.4.1).\n\n\u03bbx1y in Fig. 2b is an example of a parameter that can be identified using ICs. To see this, consider the paths from {z1, z2} to y that do not have sided intersection anywhere but at y. One such path set is z1 \u2192 x1 \u2192 y and z2 \u2192 w \u2192 x2 \u2192 y. After removing the edge x1 \u2192 y, there no longer exist 2 separate nonintersecting paths to y, because the node w forms a bottleneck, or cut, allowing only one path to pass to y. According to Theorem 5.1, this is sufficient to uniquely solve for \u03bbx1y.\n\nIn contrast, previously known efficient algorithms cannot identify \u03bbx1y. z1, which is the only possible instrument for x1, has unblockable paths to y through x2 and x3 (w cannot be conditioned, since it has a bidirected edge to y). Furthermore, only z2 is a possible additional instrument for x2 or x3, giving 2 candidate instruments {z1, z2} for a set of 3 parents of y, {x1, x2, x3}, all of which need to be matched to an instrument to enable solving for \u03bbx1y.\n\nIn general, any coefficient that can be identified using the gHTC or AVS can also be identified using ICs. ICs, therefore, strictly subsume gHTC and AVS.\n\nLemma 5.1. If a structural parameter \u03bbxy of linear SCM M is identifiable using the gHTC or AVS then \u03bbxy is identified using IC. 
There also exists a model M\u2032 such that \u03bbxy is identifiable using IC,\nbut cannot be identified using gHTC or AVS.\n\nAlgorithm 2 IC solves for edges incoming to y given a set of known edges \u039b\u2217\n\nfunction IC(G, y, \u039b\u2217)\n    Gaux \u2190 AUXILIARYFLOWGRAPH(G, \u039b\u2217)\n    T \u2190 all sink-node parents of y\u2032\u2217 in Gaux\n    G^y_aux \u2190 Gaux with edges ti \u2208 T to y\u2032\u2217 removed\n    S \u2190 Source nodes in G^y_aux which are not ancestors of y\u2032\u2217\n    C \u2190 CLOSESTMINVERTEXCUT(Gaux, S, T)\n    Sf \u2190 elements of S that have a full flow to C\n    (Cm, Tm) \u2190 MAXMATCHBLOCK(Gaux with edges to C removed, C, T)\n    Tf \u2190 elements of T that are part of a full flow between C \\ Cm and T \\ Tm\n    return (Sf, Tf \u222a Tm, Tm)\nend function\n\nLastly, we discuss the identification power of ICs with respect to cAVs (Chen et al., 2017), which\nare single auxiliary conditional instruments, and can be found in polynomial-time. While there are\nmany examples of parameters that ICs, and even the gHTC and AVS, can identify that cAVs cannot,\nit turns out there are also examples that the cAV can identify that ICs cannot. This is because ICs,\nwhich operate on sets of variables, do not include conditioning. In Fig. 2d, z1 is a cAV for \u03bbx1y\nwhen conditioned on {w, z2}, but no IC exists, because \u03bbx2x1 cannot be identified, and therefore the\nback path from x1 through w \u2194 y cannot be eliminated. While a version of ICs with conditioning\ncould be developed, the algorithmic complexity of finding parameters identifiable by such a criterion\nis unclear. A version of ICs with a single conditioning set would be NP-hard to find, which can\nbe shown using a modified version of our results from Section 4 (see Appendix A.4.1). 
On the\nother hand, a version with multiple conditioning sets (one for each si \u2208 S) would require additional\nalgorithmic breakthroughs, due to its similarity to the as-yet unsolved gIS.\n\n5.1 Efficient Algorithm for Finding Instrumental Cutsets\n\nTo demonstrate the efficiency of IC, we develop a polynomial-time algorithm that finds all structural\nparameters identifiable through iterative application of Theorem 5.1. To do so, we show that the\nconditions required by Theorem 5.1 can be reduced to finding a match-block in Gaux:\n\nTheorem 5.2. Given a directed graph G = (V, D), a target edge x \u2192 y, a set of \u201ccandidate sources\u201d\nS, and the vertex min-cut C between S and Pa(y) closest to Pa(y), there exist subsets Sf \u2286 S\nand Tf \u2286 Pa(y) where |Sf| = |Tf| + 1 = k such that\n\n1. the max-flow from Sf to Tf \u222a {x} is k in G, and\n2. the max-flow from Sf to Tf \u222a {y} in G\u2032 where x \u2192 y is removed is k \u2212 1\n\nif and only if x is part of a match-block between C and Pa(y) in G with all edges incoming to ci \u2208 C\nremoved.\n\nNote that the \u201cclosest min-cut\u201d C required by Theorem 5.2 can be found using the Ford-Fulkerson\nalgorithm with Pa(y) as source and S as sink (Picard & Queyranne, 1982).\n\nTheorem 5.2 is proven by explicitly constructing the sets Sf and Tf using a match-block. The\nprocedure for doing so is given in Algorithm 2. It works by finding a set Sf which has a full flow to\nC, which in turn has a match-block to Pa(y) (due to the requirement that none of the Sf have paths\nto y through Sib(y) \u2194 y). The min-cut ensures that once x \u2192 y is removed, all paths to y must still\ngo through the set C, and the match-block from C to Pa(y) ensures that there is no way to reorder\nthe paths to create a flow to y through a different parent. This guarantees that the flow constraints are\nsatisfied, so there is a corresponding IC. 
The full algorithm for finding all edges identifiable with ICs\ncan be constructed by recursively applying the procedure on the auxiliary flow graph, as shown in\nAlgorithm 5 (ICID). A Python implementation is available at https://github.com/dkumor/instrumental-cutsets.\n\n6 Conclusion\n\nWe have developed a new, polynomial-time algorithm for identification in linear SCMs. Previous\nalgorithms with similar identification power had either exponential or unknown complexity, with\nexisting implementations using exponential components. We also showed that the promising\nmethod called tsIV cannot handle arbitrarily large graphs due to its inherent computational complexity.\n\nAcknowledgements\n\nBareinboim and Kumor are supported in part by grants from NSF IIS-1704352, IIS-1750807\n(CAREER), IBM Research, and Adobe Research. Part of Chen\u2019s contributions were made while at\nIBM Research.\n\nReferences\n\nBardet, Magali. On the Complexity of a Gr\u00f6bner Basis Algorithm. pp. 8, 2002.\n\nBareinboim, Elias and Pearl, Judea. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27):7345\u20137352, July 2016.\n\nBekker, Paul A., Merckens, Arjen, and Wansbeek, Tom J. Identification, Equivalent Models, and Computer Algebra. Academic Press, 1994.\n\nBowden, Roger J and Turkington, Darrell A. Instrumental Variables, volume 8. Cambridge University Press, 1984.\n\nBrito, Carlos and Pearl, Judea. Generalized instrumental variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pp. 85\u201393. Morgan Kaufmann Publishers Inc., 2002.\n\nBrito, Carlos Eduardo Fisch. Graphical Methods for Identification in Structural Equation Models. PhD Thesis, University of California Los Angeles, CA 90095-1596, USA, 2004.\n\nChen, B., Pearl, J., and Bareinboim, E. Incorporating knowledge into structural equation models using auxiliary variables. 
In Kambhampati, S. (ed.), Proceedings of the Twenty-fifth International Joint Conference on Artificial Intelligence, pp. 3577\u20133583. AAAI Press, New York, NY, 2016.\n\nChen, Bryant. Identification and Overidentification of Linear Structural Equation Models. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 29, pp. 1587\u20131595. Curran Associates, Inc., 2016.\n\nChen, Bryant and Pearl, Judea. Graphical tools for linear structural equation modeling. Technical report, DTIC Document, 2014.\n\nChen, Bryant, Kumor, Daniel, and Bareinboim, Elias. Identification and Model Testing in Linear Structural Equation Models Using Auxiliary Variables. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML\u201917, pp. 757\u2013766. JMLR.org, 2017.\n\nDraisma, Jan, Sullivant, Seth, and Talaska, Kelli. Positivity for Gaussian graphical models. Advances in Applied Mathematics, 50(5):661\u2013674, May 2013. ISSN 0196-8858. doi: 10.1016/j.aam.2013.03.001.\n\nDuff, I. S. and Reid, J. K. An Implementation of Tarjan\u2019s Algorithm for the Block Triangularization of a Matrix. ACM Transactions on Mathematical Software, 4(2):137\u2013147, June 1978.\n\nFisher, Franklin M. The Identification Problem in Econometrics. McGraw-Hill, 1966.\n\nFoygel, Rina, Draisma, Jan, and Drton, Mathias. Half-trek criterion for generic identifiability of linear structural equation models. The Annals of Statistics, pp. 1682\u20131713, 2012.\n\nGarc\u00eda-Puente, Luis D., Spielvogel, Sarah, and Sullivant, Seth. Identifying Causal Effects with Computer Algebra. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI\u201910, pp. 193\u2013200, Arlington, Virginia, United States, 2010. AUAI Press. ISBN 978-0-9749039-6-5.\n\nGessel, Ira M and Viennot, Xavier. Determinants, paths, and plane partitions. 
1989.\n\nGon\u00e7alves, Bernardo and Porto, Fabio. A note on the complexity of the causal ordering problem. Artificial Intelligence, 238:154\u2013165, 2016.\n\nKoller, Daphne and Friedman, Nir. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.\n\nPearl, J. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009.\n\nPicard, Jean-Claude and Queyranne, Maurice. On the structure of all minimum cuts in a network and applications. Mathematical Programming, 22(1):121\u2013121, December 1982.\n\nSchaefer, Thomas J. The complexity of satisfiability problems. In Proceedings of the Tenth Annual ACM Symposium on Theory of Computing - STOC \u201978, pp. 216\u2013226, San Diego, California, United States, 1978. ACM Press.\n\nSridhar, Natarajan, Agrawal, Rajiv, and Kinzel, Gary L. Algorithms for the structural diagnosis and decomposition of sparse, underconstrained design systems. Computer-Aided Design, 28(4):237\u2013249, April 1996.\n\nSullivant, Seth, Talaska, Kelli, and Draisma, Jan. Trek separation for Gaussian graphical models. The Annals of Statistics, pp. 1665\u20131685, 2010.\n\nVan der Zander, Benito and Liskiewicz, Maciej. On Searching for Generalized Instrumental Variables. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS-16), 2016.\n\nVan der Zander, Benito, Textor, Johannes, and Liskiewicz, Maciej. Efficiently Finding Conditional Instruments for Causal Inference. IJCAI 2015, Proceedings of the 24th International Joint Conference on Artificial Intelligence, 2015.\n\nWeihs, Luca, Robinson, Bill, Dufresne, Emilie, Kenkel, Jennifer, Kubjas, Kaie, McGee II, Reginald, Nguyen, Nhan, Robeva, Elina, and Drton, Mathias. Determinantal Generalizations of Instrumental Variables. Journal of Causal Inference, 6(1), 2017. doi: 10.1515/jci-2017-0009.\n\nWright, Philip G. 
Tariff on Animal and Vegetable Oils. Macmillan Company, New York, 1928.\n\nWright, Sewall. Correlation and causation. Journal of agricultural research, 20(7):557\u2013585, 1921.", "award": [], "sourceid": 6781, "authors": [{"given_name": "Daniel", "family_name": "Kumor", "institution": "Purdue University"}, {"given_name": "Bryant", "family_name": "Chen", "institution": "Brex"}, {"given_name": "Elias", "family_name": "Bareinboim", "institution": "Purdue"}]}