{"title": "Local Causal Discovery of Direct Causes and Effects", "book": "Advances in Neural Information Processing Systems", "page_first": 2512, "page_last": 2520, "abstract": "We focus on the discovery and identification of direct causes and effects of a target variable in a causal network. State-of-the-art algorithms generally need to find the global causal structures in the form of complete partial directed acyclic graphs in order to identify the direct causes and effects of a target variable. While these algorithms are effective, it is often unnecessary and wasteful to find the global structures when we are only interested in one target variable (such as class labels). We propose a new local causal discovery algorithm, called Causal Markov Blanket (CMB), to identify the direct causes and effects of a target variable based on Markov Blanket Discovery. CMB is designed to conduct causal discovery among multiple variables, but focuses only on finding causal relationships between a specific target variable and other variables. Under standard assumptions, we show both theoretically and experimentally that the proposed local causal discovery algorithm can obtain the comparable identification accuracy as global methods but significantly improve their efficiency, often by more than one order of magnitude.", "full_text": "Local Causal Discovery of Direct Causes and Effects\n\nTian Gao\n\nQiang Ji\n\nDepartment of ECSE\n\nRensselaer Polytechnic Institute, Troy, NY 12180\n\n{gaot, jiq}@rpi.edu\n\nAbstract\n\nWe focus on the discovery and identi\ufb01cation of direct causes and effects of a target\nvariable in a causal network. State-of-the-art causal learning algorithms generally\nneed to \ufb01nd the global causal structures in the form of complete partial directed\nacyclic graphs (CPDAG) in order to identify direct causes and effects of a target\nvariable. 
While these algorithms are effective, it is often unnecessary and wasteful to find the global structures when we are only interested in the local structure of one target variable (such as class labels). We propose a new local causal discovery algorithm, called Causal Markov Blanket (CMB), to identify the direct causes and effects of a target variable based on Markov Blanket Discovery. CMB is designed to conduct causal discovery among multiple variables, but focuses only on finding causal relationships between a specific target variable and other variables. Under standard assumptions, we show both theoretically and experimentally that the proposed local causal discovery algorithm can obtain comparable identification accuracy to global methods but significantly improve their efficiency, often by more than one order of magnitude.

1 Introduction

Causal discovery is the process of identifying the causal relationships among a set of random variables. It not only can aid predictions and classifications like feature selection [4], but can also help predict the consequences of given actions, facilitate counterfactual inference, and help explain the underlying mechanisms of the data [13]. Much research effort has been focused on predicting causality from observational data [13, 18]. This work can be roughly divided into two sub-areas: causal discovery between a pair of variables and among multiple variables. We focus on multivariate causal discovery, which searches for correlations and dependencies among variables in causal networks [13]. Causal networks can be used for local or global causal prediction, and thus they can be learned locally and globally. Many causal discovery algorithms for causal networks have been proposed, and the majority of them are global learning algorithms, as they seek to learn global causal structures.
The Spirtes-Glymour-Scheines (SGS) [18] and Peter-Clark (P-C) algorithms [19] test for the existence of edges between every pair of nodes in order to first find the skeleton, or undirected edges, of a causal network, and then discover all the V-structures, resulting in a partially directed acyclic graph (PDAG). The last step of these algorithms is to orient the remaining edges as much as possible using Meek rules [10] while maintaining consistency with the existing edges. Given a causal network, causal relationships among variables can be directly read off the structure. Due to the complexity of the P-C algorithm and unreliable high-order conditional independence tests [9], several works [23, 15] have incorporated Markov Blanket (MB) discovery into causal discovery with a local-to-global approach. The Growth and Shrink (GS) [9] algorithm uses the MBs of each node to build the skeleton of a causal network, discovers all the V-structures, and then uses the Meek rules to complete the global causal structure. The max-min hill climbing (MMHC) [23] algorithm also finds the MBs of each variable first, but then uses the MBs as constraints to reduce the search space for score-based standard hill climbing structure learning. In [15], the authors use Markov Blankets with Collider Sets (CS) to improve the efficiency of the GS algorithm by combining the spouse and V-structure discovery. All these local-to-global methods rely on the global structure to find the causal relationships and require finding the MBs for all nodes in a graph, even if the interest is the causal relationships between one target variable and the other variables. Different MB discovery algorithms can be used, and they fall into two different approaches: non-topology-based and topology-based.
Non-topology-based methods [5, 9], used by the CS and GS algorithms, greedily test the independence between each variable and the target by directly using the definition of the Markov Blanket. In contrast, more recent topology-based methods [22, 1, 11] aim to improve data efficiency while maintaining a reasonable time complexity by finding the parents and children (PC) set first and then the spouses to complete the MB.
Local learning of causal networks generally aims to identify a subset of causal edges in a causal network. The Local Causal Discovery (LCD) algorithm and its variants [3, 17, 7] aim to find causal edges by testing the dependence/independence relationships among every four-variable set in a causal network. Bayesian Local Causal Discovery (BLCD) [8] explores the Y-structures among MB nodes to infer causal edges [6]. While LCD/BLCD algorithms aim to identify a subset of causal edges via special structures among all variables, we focus on finding all the causal edges adjacent to one target variable. In other words, we want to find the causal identities of each node, in terms of direct causes and effects, with respect to one target node. We first use Markov Blankets to find the direct causes and effects, and then propose a new Causal Markov Blanket (CMB) discovery algorithm, which determines the exact causal identities of the MB nodes of a target node by tracking their conditional independence changes, without finding the global causal structure of the causal network. The proposed CMB algorithm is a complete local discovery algorithm and can identify the same direct causes and effects for a target variable as global methods under standard assumptions.
CMB is more scalable than global methods, more efficient than local-to-global methods, and is complete in identifying direct causes and effects of one target while other local methods are not.

2 Background

We use V to represent the variable space, capital letters (such as X, Y) to represent variables, bold letters (such as Z, MB) to represent variable sets, and |Z| to represent the size of set Z. X ⊥⊥ Y and X ⊥̸⊥ Y represent independence and dependence between X and Y, respectively. We assume readers are familiar with the related concepts in causal network learning, and only review a few major ones here. In a causal network or causal Bayesian network [13], nodes correspond to the random variables in a variable set V. Two nodes are adjacent if they are connected by an edge. A directed edge from node X to node Y, (X, Y) ∈ V, indicates that X is a parent or direct cause of Y and Y is a child or direct effect of X [12]. Moreover, if there is a directed path from X to Y, then X is an ancestor of Y and Y is a descendant of X. If nonadjacent X and Y have a common child, X and Y are spouses. Three nodes X, Y, and Z form a V-structure [12] if Y has two incoming edges from X and Z, forming X → Y ← Z, and X is not adjacent to Z. Y is a collider in a path if Y has two incoming edges in this path. Y with nonadjacent parents X and Z is an unshielded collider. A path J from node X to Y is blocked [12] by a set of nodes Z if either of the following holds: 1) there is a non-collider node in J belonging to Z; 2) there is a collider node C on J such that neither C nor any of its descendants belongs to Z. Otherwise, J is unblocked or active.
A PDAG is a graph which may have both undirected and directed edges and has at most one edge between any pair of nodes [10].
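The blocking definition above is mechanical enough to execute. The sketch below is an illustrative helper of ours, not part of the paper: it enumerates simple paths in the skeleton of a small DAG and applies the two blocking clauses, so `d_separated` returns True exactly when every path between two nodes is blocked by Z. It is meant for toy graphs only, since path enumeration is exponential in general.

```python
def _skeleton_neighbors(edges):
    """Adjacency map of the undirected skeleton of a DAG given as (u, v) pairs."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def _simple_paths(edges, start, end):
    """All simple paths between start and end in the skeleton."""
    nbrs, stack = _skeleton_neighbors(edges), [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                stack.append(path + [n])

def _descendants(edges, node):
    out = set()
    for u, v in edges:
        if u == node:
            out |= {v} | _descendants(edges, v)
    return out

def blocked(edges, path, Z):
    """The definition from the text: a path is blocked by Z iff it contains a
    non-collider in Z, or a collider with neither itself nor a descendant in Z."""
    Z = set(Z)
    for a, b, c in zip(path, path[1:], path[2:]):
        collider = (a, b) in edges and (c, b) in edges
        if collider and b not in Z and not (_descendants(edges, b) & Z):
            return True
        if not collider and b in Z:
            return True
    return False

def d_separated(edges, x, y, Z):
    """True iff every skeleton path between x and y is blocked by Z."""
    return all(blocked(edges, p, Z) for p in _simple_paths(edges, x, y))
```

On the chain X → T → Y, conditioning on T blocks the only path; on a collider X → T ← Y, conditioning on T (or on a descendant of T) unblocks it.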
CPDAGs [2] represent Markov equivalence classes of DAGs, capturing the same conditional independence relationships with the same skeleton but potentially different edge orientations. CPDAGs contain directed edges that have the same orientation in every DAG of the equivalence class, and undirected edges that have reversible orientations within the equivalence class. Let G be the causal DAG of a causal network with variable set V and P be the joint probability distribution over the variables in V. G and P satisfy the Causal Markov condition [13] if and only if, ∀X ∈ V, X is independent of the non-effects of X given its direct causes. The causal faithfulness condition [13] states that G and P are faithful to each other if all and every independence and conditional independence entailed by P is present in G. It enables the recovery of G from sampled data of P. Another widely used assumption of existing causal discovery algorithms is causal sufficiency [12]. A set of variables X ⊆ V is causally sufficient if no set of two or more variables in X shares a common cause variable outside V. Without the causal sufficiency assumption, latent confounders between adjacent nodes would be modeled by bi-directed edges [24]. We also assume no selection bias [20], and that we can capture the same independence relationships among variables from the sampled data as those from the entire population.
Many concepts and properties of a DAG hold in causal networks, such as d-separation and the MB. A Markov Blanket [12] of a target variable T, MBT, in a causal network is the minimal set of nodes conditioned on which all other nodes are independent of T, denoted as X ⊥⊥ T | MBT, ∀X ∈ {V \ T} \ MBT.
Given an unknown distribution P that satisfies the Markov condition with respect to an unknown DAG G0, Markov Blanket Discovery is the process of estimating the MB of a target node in G0 from independently and identically distributed (i.i.d.) data D of P. Under the causal faithfulness assumption between G0 and P, the MB of a target node T is unique and is the set of parents, children, and spouses of T (i.e., the other parents of the children of T) [12]. In addition, the parents and children set of T, PCT, is also unique. Intuitively, the MB can directly facilitate causal discovery. If conditioning on the MB of a target variable T renders a variable X independent of T, then X cannot be a direct cause or effect of T. From the local causal discovery point of view, although the MB may contain nodes with different causal relationships to the target, it is reasonable to believe that we can identify their relationships exactly, up to Markov equivalence, with further tests.
Lastly, existing causal network learning algorithms all use the three Meek rules [10], which we assume the readers are familiar with, to orient as many edges as possible given all V-structures in a PDAG to obtain the CPDAG. The basic idea is to orient the edges so that 1) the edge directions do not introduce new V-structures, 2) the no-cycle property of a DAG is preserved, and 3) 3-fork V-structures are enforced.

3 Local Causal Discovery of Direct Causes and Effects

Existing MB discovery algorithms do not directly offer the exact causal identities of the learned MB nodes of a target. Although the topology-based methods can find the PC set of the target within the MB set, they can only provide the causal identities of some children and spouses that form V-structures. Nevertheless, following existing works [4, 15], under standard assumptions, every PC variable of a target can only be its direct cause or effect:
Theorem 1. Causality within a MB.
Under the causal faithfulness, sufficiency, correct independence tests, and no selection bias assumptions, the parent and child nodes within a target's MB set in a causal network contain all and only the direct causes and effects of the target variable.

The proof can be directly derived from the PC set definition of a causal network. Therefore, using the topology-based MB discovery methods, if we can discover the exact causal identities of the PC nodes within the MB, causal discovery of the direct causes and effects of the target can be successfully accomplished.
Building on MB discovery, we propose a new local causal discovery algorithm, Causal Markov Blanket (CMB) discovery, as shown in Algorithm 1. It identifies the direct causes and effects of a target variable without the need to find the global structure or the MBs of all other variables in a causal network. CMB has three major steps: 1) find the MB set of the target and identify some direct causes and effects by tracking the independence relationship changes among a target's PC nodes before and after conditioning on the target node; 2) repeat Step 1 but conditioned on one PC node's MB set; and 3) repeat Steps 1 and 2 with unidentified neighboring nodes as new targets to identify more direct causes and effects of the original target.
Step 1: Initial identification. CMB first finds the MB nodes of a target T, MBT, using a topology-based MB discovery algorithm that also finds PCT. CMB then uses the CausalSearch subroutine, shown in Algorithm 2, to get initial causal identities of the variables in PCT by checking every variable pair in PCT according to Lemma 1.
Lemma 1. Let (X, Y) ∈ PCT, the PC set of the target T ∈ V in a causal DAG.
The independence relationships between X and Y can be divided into the following four conditions:
C1 X ⊥⊥ Y and X ⊥⊥ Y | T; this condition cannot happen.
C2 X ⊥⊥ Y and X ⊥̸⊥ Y | T ⇒ X and Y are both parents of T.
C3 X ⊥̸⊥ Y and X ⊥⊥ Y | T ⇒ at least one of X and Y is a child of T.
C4 X ⊥̸⊥ Y and X ⊥̸⊥ Y | T ⇒ their identities are inconclusive and need further tests.

Algorithm 1 Causal Markov Blanket Discovery Algorithm
1: Input: D: data; T: target variable
2: Output: IDT: the causal identities of all nodes with respect to T
   {Step 1: Establish initial ID}
3: IDT = zeros(|V|, 1);
4: (MBT, PCT) ← FindMB(T, D);
5: Z ← ∅;
6: IDT ← CausalSearch(D, T, PCT, Z, IDT)
   {Step 2: Further test variables with idT = 4}
7: for one X in each pair (X, Y) with idT = 4 do
8:    MBX ← FindMB(X, D);
9:    Z ← {MBX \ T} \ Y;
10:   IDT ← CausalSearch(D, T, PCT, Z, IDT);
11:   if no element of IDT is equal to 4, break;
12: for every pair of parents (X, Y) of T do
13:   if ∃Z s.t. (X, Z) and (Y, Z) are idT = 4 pairs then
14:     IDT(Z) = 1;
15: IDT(X) ← 3, ∀X such that IDT(X) = 4;
   {Step 3: Resolve variable set with idT = 3}
16: for each X with idT = 3 do
17:   Recursively find IDX, without going back to the already queried variables;
18:   update IDT according to IDX;
19:   if IDX(T) = 2 then
20:     IDT(X) = 1;
21:     for every Y in idT = 3 variable pairs (X, Y) do
22:       IDT(Y) = 2;
23:   if no element of IDT is equal to 3, break;
24: Return: IDT

Algorithm 2 CausalSearch Subroutine
1: Input: D: data; T: target variable; PCT: the PC set of T; Z: the conditioned variable set; ID: current ID
2: Output: IDT: the new causal identities of all nodes with respect to T
   {Step 1: Single PC}
3: if |PCT| = 1 then
4:    IDT(PCT) ← 3;
   {Step 2: Check C2 & C3}
5: for every X, Y ∈ PCT do
6:    if X ⊥⊥ Y | Z and X ⊥̸⊥ Y | T ∪ Z then
7:      IDT(X) ← 1; IDT(Y) ← 1;
8:    else if X ⊥̸⊥ Y | Z and X ⊥⊥ Y | T ∪ Z then
9:      if IDT(X) = 1 then
10:       IDT(Y) ← 2
11:     else if IDT(Y) ≠ 2 then
12:       IDT(Y) ← 3
13:     if IDT(Y) = 1 then
14:       IDT(X) ← 2
15:     else if IDT(X) ≠ 2 then
16:       IDT(X) ← 3
17:     add (X, Y) to pairs with idT = 3
18:   else
19:     if IDT(X) & IDT(Y) = 0 or 4 then
20:       IDT(X) ← 4; IDT(Y) ← 4
21:       add (X, Y) to pairs with idT = 4
   {Step 3: Identify idT = 3 pairs with known parents}
22: for every X such that IDT(X) = 1 do
23:   for every Y in idT = 3 variable pairs (X, Y) do
24:     IDT(Y) ← 2;
25: Return: IDT

C1 does not happen because the path X − T − Y is unblocked whether or not we condition on T, and the unblocked path makes X and Y dependent on each other. C2 implies that X and Y form a V-structure with T as the corresponding collider, such as node C in Figure 1a, which has two parents A and B.
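The case analysis of Lemma 1 amounts to two CI queries per pair. The following sketch is our own illustration, not the paper's code: `ci(x, y, Z)` stands for any conditional-independence oracle, here hand-coded (an assumption) for a toy DAG in which T has parents A, B, P and child C.

```python
def lemma1_condition(ci, x, y, t):
    """Classify a pair (x, y) from PC_T by Lemma 1, given a CI oracle
    ci(x, y, Z) that returns True iff x and y are independent given Z."""
    marginal = ci(x, y, set())
    given_t = ci(x, y, {t})
    if marginal and given_t:
        raise ValueError("C1: cannot happen for members of PC_T")
    if marginal and not given_t:
        return "C2"  # x and y are both parents of t (collider at t)
    if not marginal and given_t:
        return "C3"  # at least one of x and y is a child of t
    return "C4"      # inconclusive: another unblocked path may exist

# Hand-coded toy oracle (an assumption, for illustration) for the DAG
# A -> T <- B, P -> T -> C: parent pairs are marginally independent but
# become dependent given T; the parent-child pair (P, C) is dependent but
# becomes independent given T. Unlisted pairs are treated as dependent
# both ways, i.e., they fall into condition C4.
def toy_ci(x, y, Z):
    pair = frozenset({x, y})
    parent_pairs = [frozenset({"A", "B"}), frozenset({"A", "P"}),
                    frozenset({"B", "P"})]
    if pair in parent_pairs:
        return "T" not in Z      # unshielded collider at T
    if pair == frozenset({"P", "C"}):
        return "T" in Z          # chain P -> T -> C, blocked by T
    return False                 # default: condition C4
```

With a statistical CI test in place of `toy_ci`, the same function implements the pair classification performed at Lines 6∼21 of Algorithm 2.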
C3 indicates that the paths between X and Y are blocked conditioned on T, which means that either one of (X, Y) is a child of T and the other is a parent, or both of (X, Y) are children of T. For example, nodes D and F in Figure 1a satisfy this condition with respect to E. C4 shows that there may be another unblocked path between X and Y besides X − T − Y. For example, in Figure 1b, nodes D and C have multiple paths between them besides D − T − C. Further tests are needed to resolve this case.
Notation-wise, we use IDT to represent the causal identities of all the nodes with respect to T, IDT(X) for variable X's causal identity to T, and the lower-case idT for the individual ID of a node with respect to T. We also use IDX to represent the causal identities of nodes with respect to node X. To avoid changing the already identified PCs, CMB establishes a priority system¹. We use idT = 1 to represent nodes that are parents of T, idT = 2 for children of T, idT = 3 to represent a pair of nodes that cannot both be parents (and/or ambiguous pairs from Markov equivalent structures, to be discussed in Step 2), and idT = 4 to represent inconclusiveness. A lower number id cannot be changed into a higher number (Lines 11∼15 of Algorithm 2). If a variable pair satisfies C2, both are labeled as parents (Line 7 of Algorithm 2). If a variable pair satisfies C3, one of them is labeled idT = 2 only if the other variable within the pair is already identified as a parent; otherwise, both are labeled idT = 3 (Lines 9∼12 and 15∼17 of Algorithm 2). If a PC node remains inconclusive with idT = 0, it is labeled idT = 4 at Line 20 of Algorithm 2. Note that if T has only one PC node, it is labeled idT = 3 (Line 4 of Algorithm 2). Non-PC nodes always have idT = 0.

¹Note that the identification number is slightly different from the condition number in Lemma 1.

Figure 1: a) A sample causal network. b) A sample network with C4 nodes. The only active path between D and C conditioned on MBC \ {T, D} is D − T − C.

Step 2: Resolve idT = 4. Lemma 1 alone cannot identify the variable pairs in PCT with idT = 4 due to other possible unblocked paths, and we have to seek other information. Fortunately, by definition, the MB set of one of the target's PC nodes can block all paths to that PC node.
Lemma 2. Let (X, Y) ∈ PCT, the PC set of the target T ∈ V in a causal DAG. The independence relationships between X and Y, conditioned on the MB of X minus {Y, T}, MBX \ {Y, T}, can be divided into the following four conditions:
C1 X ⊥⊥ Y | MBX \ {Y, T} and X ⊥⊥ Y | T ∪ MBX \ Y; this condition cannot happen.
C2 X ⊥⊥ Y | MBX \ {Y, T} and X ⊥̸⊥ Y | T ∪ MBX \ Y ⇒ X and Y are both parents of T.
C3 X ⊥̸⊥ Y | MBX \ {Y, T} and X ⊥⊥ Y | T ∪ MBX \ Y ⇒ at least one of X and Y is a child of T.
C4 X ⊥̸⊥ Y | MBX \ {Y, T} and X ⊥̸⊥ Y | T ∪ MBX \ Y ⇒ X and Y are directly connected.
C1∼3 are very similar to those in Lemma 1. C4 is true because, conditioned on T and the MB of X minus Y, the only potentially unblocked paths between X and Y are X − T − Y and/or X − Y. If C4 happens, then the path X − T − Y has no impact on the relationship between X and Y, and hence X − Y must be directly connected. If X and Y are not directly connected, then the only potentially unblocked path between X and Y is X − T − Y, and X and Y will be identified by Line 10 of Algorithm 1 with idT ∈ {1, 2, 3}. For example, in Figure 1b, conditioned on MBC \ {T, D}, i.e., {A, B}, the only path between C and D is through T.
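When the DAG is known, Lemma 2's two queries can be run against a d-separation oracle. The sketch below uses illustrative helper names of our own: it reads MBX off the graph as parents ∪ children ∪ spouses, builds the two conditioning sets, and classifies a pair. On a toy collider A → T ← B it reports C2, on a chain X → T → Y it reports C3, and on the fully connected clique X → Y, X → T, Y → T it reports C4, i.e., X and Y are directly connected.

```python
def _nbrs(edges):
    n = {}
    for u, v in edges:
        n.setdefault(u, set()).add(v)
        n.setdefault(v, set()).add(u)
    return n

def _paths(edges, x, y):
    """Simple paths between x and y in the skeleton (toy graphs only)."""
    nb, stack = _nbrs(edges), [[x]]
    while stack:
        p = stack.pop()
        if p[-1] == y:
            yield p
            continue
        stack.extend(p + [m] for m in nb.get(p[-1], ()) if m not in p)

def _desc(edges, node):
    out = set()
    for u, v in edges:
        if u == node:
            out |= {v} | _desc(edges, v)
    return out

def indep(edges, x, y, Z):
    """d-separation oracle: True iff every path between x and y is blocked."""
    Z = set(Z)
    def blocked(p):
        for a, b, c in zip(p, p[1:], p[2:]):
            collider = (a, b) in edges and (c, b) in edges
            if collider and b not in Z and not (_desc(edges, b) & Z):
                return True
            if not collider and b in Z:
                return True
        return False
    return all(blocked(p) for p in _paths(edges, x, y))

def markov_blanket(edges, t):
    """Parents, children, and spouses of t in a DAG of directed (u, v) pairs."""
    pa = {u for u, v in edges if v == t}
    ch = {v for u, v in edges if u == t}
    sp = {u for u, v in edges if v in ch} - {t}
    return pa | ch | sp

def lemma2_condition(edges, x, y, t):
    """Classify a PC pair (x, y) of target t by the two tests of Lemma 2."""
    z1 = markov_blanket(edges, x) - {y, t}       # MB_X \ {Y, T}
    z2 = (markov_blanket(edges, x) - {y}) | {t}  # T ∪ MB_X \ Y
    i1, i2 = indep(edges, x, y, z1), indep(edges, x, y, z2)
    if i1 and i2:
        raise ValueError("C1: cannot happen for members of PC_T")
    if i1:
        return "C2"  # both parents of t
    if i2:
        return "C3"  # at least one is a child of t
    return "C4"      # x and y are directly connected
```

In practice the oracle would be a statistical CI test on data; the graph-based oracle here only serves to check the case analysis.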
However, if X and Y are directly connected, they will remain with idT = 4 (such as nodes D and E in Figure 1b). In this case, X, Y, and T form a fully connected clique, and the edges among the variables of a fully connected clique can take many different orientation combinations without affecting the conditional independence relationships. Therefore, this case needs further tests to ensure the Meek rules are satisfied. The third Meek rule (enforcing 3-fork V-structures) is first enforced by Line 14 of Algorithm 1. Then the rest of the idT = 4 nodes are changed to idT = 3 by Line 15 of Algorithm 1 and are further processed (even though they could both be parents at the same time) with neighbor nodes' causal identities. Therefore, Step 2 of Algorithm 1 makes all variable pairs with idT = 4 become identified either as parents, as children, or with idT = 3 after taking some neighbors' MBs into consideration. Note that Step 2 of CMB only needs to find the MBs for a small subset of the PC variables (in fact, only one MB for each variable pair with idT = 4).
Step 3: Resolve idT = 3. After Step 2, some PC variables may still have idT = 3. This can happen because of the existence of Markov equivalence structures. Below we show the condition under which CMB can resolve the causal identities of all PC nodes.

Lemma 3. The Identifiability Condition. For Algorithm 1 to fully identify all the causal relationships within the PC set of a target T: 1) T must have at least two nonadjacent parents, 2) one of T's single ancestors must contain at least two nonadjacent parents, or 3) T has 3 parents that form a 3-fork pattern as defined in the Meek rules.

We use single ancestors to denote ancestor nodes that do not have a spouse with a mutual child that is also an ancestor of T.
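The first clause of the identifiability condition can be checked mechanically on a known DAG. The sketch below is an illustrative helper of ours (not part of CMB): it looks for a pair of parents of T with no edge between them in either direction, which is what guarantees a detectable V-structure at T.

```python
def has_two_nonadjacent_parents(edges, t):
    """Clause 1) of Lemma 3 on a DAG given as directed (u, v) pairs:
    some two parents of t share no edge in either direction."""
    parents = sorted({u for u, v in edges if v == t})
    return any(
        (p, q) not in edges and (q, p) not in edges
        for i, p in enumerate(parents) for q in parents[i + 1:]
    )
```

For an unshielded collider A → T ← B this holds, while adding the shielding edge A → B (a fully connected clique) makes it fail, matching the discussion of cliques above.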
If the target does not meet any of the conditions in Lemma 3, C2 will never be satisfied and all PC variables within the MB will have idT = 3. Without a single parent identified, it is impossible to infer the identities of children nodes using C3. Therefore, all the identities of the PC nodes are uncertain, even though the resulting structure can be a CPDAG.
Step 3 of CMB searches for a non-single ancestor of T to infer the causal directions. For each node X with idT = 3, CMB tries to identify its local causal structure recursively. If X's PC nodes are all identified, it returns to the target with the resolved identities; otherwise, it continues to search for a non-single ancestor of X. Note that CMB will not go back to already-searched variables with unresolved PC nodes without providing new information. Step 3 of CMB checks the identifiability condition for all the ancestors of the target. If a graph structure does not meet the conditions of Lemma 3, the final IDT will contain some idT = 3, which indicates reversible edges in CPDAGs. The causal graph found by CMB is a PDAG after Step 2 of Algorithm 1, and it is a CPDAG after Step 3 of Algorithm 1.
Case Study. The procedure of using CMB to identify the direct causes and effects of E in Figure 1a has the following 3 steps. Step 1: CMB finds the MB and PC set of E. The PC set contains nodes D and F. Then, IDE(D) = 3 and IDE(F) = 3. Step 2: to resolve the variable pair D and F with idE = 3, 1) CMB finds the PC set of D, containing C, E, and G. Their idD are all 3's, since D contains only one parent. 2) To resolve IDD, CMB checks the causal identities of nodes C and G (without going back to E). The PC set of C contains A, B, and D. CMB identifies IDC(A) = 1, IDC(B) = 1, and IDC(D) = 2. Since C resolves all its PC nodes, CMB returns to node D with IDD(C) = 1.
3) With the new parent C, IDD(G) = 2 and IDD(E) = 2, and CMB returns to node E with IDE(D) = 1. Step 3: since IDE(D) = 1, after resolving the pair with idE = 3, IDE(F) = 2.

Theorem 2. The Soundness and Completeness of the CMB Algorithm. If the identifiability condition is satisfied, then, using a sound and complete MB discovery algorithm, CMB will identify the direct causes and effects of the target under the causal faithfulness, sufficiency, correct independence tests, and no selection bias assumptions.

Proof. A sound and complete MB discovery algorithm finds all and only the MB nodes of a target. Using it and under the causal sufficiency assumption, the learned PC set contains all and only the cause-effect variables by Theorem 1. When Lemma 3 is satisfied, all parent nodes are identifiable through V-structure independence changes, either by Lemma 1 or by Lemma 2. Also, since children cannot be conditionally independent of another PC node given its MB minus the target node (C2), all parents identified by Lemmas 1 and 2 will be true positive direct causes. Therefore, all and only the true positive direct causes will be correctly identified by CMB. Since PC variables can only be direct causes or direct effects, all and only the direct effects are identified correctly by CMB.

In the cases where CMB fails to identify all the PC nodes, global causal discovery methods cannot identify them either. Specifically, structures failing to satisfy Lemma 3 can have different orientations on some edges while preserving the skeleton and V-structures, hence leading to Markov equivalent structures. For the cases where T has all single ancestors, the edge directions among all single ancestors can always be reversed without introducing new V-structures or DAG violations, in which case the Meek rules cannot identify the causal directions either.
For the cases with fully connected cliques, these cliques do not meet the nonadjacent-parents requirement of the first Meek rule (no new V-structures), and the second Meek rule (preserving DAGs) can always be satisfied within a clique by changing the direction of one edge. Since CMB orients the 3-fork V-structure of the third Meek rule correctly by Lines 12∼14 of Algorithm 1, CMB can identify the same structure as the global methods that use the Meek rules.

Theorem 3. Consistency between CMB and Global Causal Discovery Methods. For the same DAG G, Algorithm 1 will correctly identify all the direct causes and effects of a target variable T as the global and local-to-global causal discovery methods² that use the Meek rules [10], up to G's CPDAG, under the causal faithfulness, sufficiency, correct independence tests, and no selection bias assumptions.

Proof. It has been shown that causal methods using the Meek rules [10] can identify up to a graph's CPDAG. Since the Meek rules cannot identify the structures that fail Lemma 3, the global and local-to-global methods can only identify the same structures as CMB. Since CMB is sound and complete in identifying these structures by Theorem 2, CMB will identify all direct causes and effects up to G's CPDAG.

3.1 Complexity

The complexity of the CMB algorithm is dominated by the step of finding the MB, which can have exponential complexity [1, 16]. All other steps of CMB are trivial in comparison. If we assume a uniform distribution over neighbor sizes in a network with N nodes, then the expected time complexity of Step 1 of CMB is O((1/N) Σ_{i=1}^{N} 2^i) = O(2^N/N), while local-to-global methods are O(2^N). In later steps, CMB also needs to find MBs for a small subset of nodes that includes 1) one node from every pair of nodes that meets C4, and 2) a subset of the target's neighboring nodes that provide additional clues for the target. Let l be the total size of this set of nodes; then CMB reduces the cost by N/l times asymptotically.

4 Experiments

We use benchmark causal learning datasets to evaluate the accuracy and efficiency of CMB against the four other causal discovery algorithms discussed, P-C, GS, MMHC, and CS, and the local causal discovery algorithm LCD2 [7]. Due to the page limit, we show the results of the causal algorithms on four medium-to-large datasets: ALARM, ALARM3, CHILD3, and INSUR3. They contain 37 to 111 nodes. We use 1000 data samples for all datasets. For each global or local-to-global algorithm, we find the global structure of a dataset and then extract the causal identities of all nodes with respect to a target node. CMB finds the causal identities of every variable with respect to the target directly. We repeat the discovery process for each node in the datasets, and compare the discovered causal identities of all the algorithms against all the Markov equivalent structures of the known ground truth structure. We use the edge scores [15] to measure the number of missing edges, extra edges, and reversed edges³ in each node's local causal structure, and report average values along with their standard deviations over all the nodes in a dataset. We use the existing implementation [21] of the HITON-MB discovery algorithm to find the MB of a target variable for all the algorithms. We also use the existing implementations [21] of the P-C, MMHC, and LCD2 algorithms. We implement GS, CS, and the proposed CMB algorithm in MATLAB on a machine with a 2.66GHz CPU and 24GB memory.
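Circling back to the complexity estimate in Section 3.1, the claimed N/l reduction can be illustrated numerically. The snippet below is a back-of-the-envelope sketch under the uniform neighbor-size assumption; the value l = 3 is our assumption, mirroring the 2-to-3 MB invocations CMB needs in the experiments.

```python
def expected_mb_cost(n):
    """Expected cost of one MB call when the neighbor size is uniform
    over 1..n: (1/n) * sum_{i=1..n} 2^i, i.e., on the order of 2^n / n."""
    return sum(2 ** i for i in range(1, n + 1)) / n

n, l = 30, 3
local_to_global = n * expected_mb_cost(n)  # MB discovery for all n nodes
cmb_total = l * expected_mb_cost(n)        # CMB: only l MB calls
speedup = local_to_global / cmb_total      # ~ n / l, here 10
```

The per-call cost is exponential either way; the saving comes entirely from the number of MB calls, which is why the ratio is n/l regardless of n's magnitude.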
Following the existing protocol [15], we measure the efficiency of the various algorithms by the number of conditional independence tests needed (or, for the score-based search method MMHC, the number of scores computed) to find the causal structures given the MBs^4, and by the number of times that MB discovery algorithms are invoked. We use mutual-information-based conditional independence tests with a standard significance level of 0.02 for all the datasets, without parameter tuning.

As shown in Table 1, CMB consistently outperforms the global discovery algorithms on the benchmark causal networks and has edge accuracy comparable to the local-to-global algorithms. Although CMB makes slightly more total edge errors than CS on the ALARM and ALARM3 datasets, it is the best method on CHILD3 and INSUR3. Since LCD2 is an incomplete algorithm, it never finds extra or reversed edges but misses the largest number of edges. Efficiency-wise, CMB achieves more than one order of magnitude speedup over the global methods, and sometimes two orders of magnitude, as on CHILD3 and INSUR3. Compared to the local-to-global methods, CMB also achieves more than one order of magnitude speedup on ALARM3, CHILD3, and INSUR3. In addition, on these datasets CMB invokes MB discovery algorithms only 2 to 3 times, drastically reducing the MB calls of the local-to-global algorithms. Since the independence-test comparison is unfair to LCD2, which uses neither MB discovery nor moral graphs, we also compared the time efficiency of LCD2 and CMB: CMB is 5 times faster on ALARM, 4 times faster on ALARM3 and CHILD3, and 8 times faster on INSUR3.

^2 We specify the global and local-to-global causal methods to be P-C [19], GS [9], and CS [15].
^3 If an edge is reversible in the equivalence class of the original graph but not in the equivalence class of the learned graph, it is also counted as a reversed edge.
^4 For global methods, this is the number of tests needed or scores computed given the moral graph of the global structure. For LCD2, it is the total number of tests, since LCD2 uses neither moral graphs nor MBs.

Table 1: Performance of Various Causal Discovery Algorithms on Benchmark Networks

                 ------------------ Edge errors ------------------   -------- Efficiency --------
Dataset  Method  Extra        Missing      Reversed     Total        No. Tests      No. MB
Alarm    P-C     1.59±0.19    2.19±0.14    0.32±0.10    4.10±0.19    4.0e3±4.0e2    -
         MMHC    1.29±0.18    1.94±0.09    0.24±0.06    3.46±0.23    1.8e3±1.7e3    37±0
         GS      0.39±0.44    0.87±0.48    1.13±0.23    2.39±0.44    586.5±72.2     37±0
         CS      0.42±0.10    0.64±0.10    0.38±0.08    1.43±0.10    331.4±61.9     37±0
         LCD2    0.00±0.00    2.49±0.00    0.00±0.00    2.49±0.00    1.4e3±0        -
         CMB     0.69±0.13    0.61±0.11    0.51±0.10    1.81±0.11    53.7±4.5       2.61±0.12
Alarm3   P-C     3.71±0.57    2.21±0.25    1.37±0.04    7.30±0.68    1.6e4±4.0e2    -
         MMHC    2.36±0.11    2.45±0.08    0.72±0.08    5.53±0.27    3.7e3±6.1e2    111±0
         GS      1.24±0.23    1.41±0.05    0.99±0.14    3.64±0.13    2.1e3±1.2e2    111±0
         CS      1.26±0.16    1.47±0.08    0.63±0.14    3.38±0.13    699.1±60.4     111±0
         LCD2    0.00±0.00    3.85±0.00    0.00±0.00    3.85±0.00    1.2e4±0        -
         CMB     1.41±0.13    1.55±0.27    0.78±0.25    3.73±0.11    50.3±6.2       2.58±0.09
Child3   P-C     4.32±0.68    2.69±0.08    0.84±0.10    7.76±0.98    8.3e4±2.9e3    -
         MMHC    1.98±0.10    1.57±0.04    0.43±0.04    4.00±0.93    6.6e3±8.2e2    60±0
         GS      0.88±0.04    0.75±0.08    1.03±0.08    2.66±0.33    2.1e3±2.5e2    60±0
         CS      0.94±0.20    0.91±0.14    0.53±0.08    2.37±0.33    1.0e3±4.8e2    60±0
         LCD2    0.00±0.00    2.63±0.00    0.00±0.00    2.63±0.00    3.6e3±0        -
         CMB     0.92±0.12    0.84±0.16    0.60±0.10    2.36±0.31    78.2±15.2      2.53±0.15
Insur3   P-C     4.76±1.33    2.50±0.11    1.29±0.11    8.55±0.81    2.5e5±1.2e4    -
         MMHC    2.39±0.18    2.53±0.06    0.76±0.07    5.68±0.43    3.1e4±5.2e2    81±0
         GS      1.94±0.06    1.44±0.05    1.19±0.10    4.57±0.33    4.5e4±2.2e3    81±0
         CS      1.92±0.08    1.56±0.06    0.89±0.09    4.37±0.23    2.6e4±3.9e3    81±0
         LCD2    0.00±0.00    5.03±0.00    0.00±0.00    5.03±0.00    6.6e3±0        -
         CMB     1.72±0.07    1.39±0.06    1.19±0.05    4.30±0.21    159.8±38.5     2.46±0.11

In practice, the performance of CMB depends on two factors: the accuracy of the independence tests and that of the MB discovery algorithms. First, independence tests may not always be accurate and can introduce errors when checking the four conditions of Lemmas 1 and 2, especially with insufficient data samples. Second, causal discovery performance depends heavily on the MB discovery step, as its errors can propagate to the later steps of CMB. Improvements in both areas could further improve CMB's accuracy.
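A mutual-information-based conditional independence test of the kind used in the experiments can be sketched with the G-statistic, which equals 2N times the empirical conditional mutual information. This is an illustrative sketch assuming discrete variables; the helper names are ours, not from the Causal Explorer implementation [21].

```python
import math
from collections import Counter

def g_statistic(data, x, y, z=()):
    """G-statistic for testing X _||_ Y | Z on discrete data.
    `data` is a list of tuples; x and y are column indices, z a tuple of
    conditioning column indices. G = 2 * N * I(X; Y | Z), where I is the
    empirical (conditional) mutual information."""
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for row in data:
        zv = tuple(row[i] for i in z)       # conditioning configuration
        n_xyz[(row[x], row[y], zv)] += 1
        n_xz[(row[x], zv)] += 1
        n_yz[(row[y], zv)] += 1
        n_z[zv] += 1
    g = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        # Expected count under X _||_ Y | Z within this z-configuration.
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g += 2.0 * n * math.log(n / expected)
    return g

# Perfectly dependent sample (Y copies X): large G.
dependent = [(0, 0)] * 50 + [(1, 1)] * 50
# Exactly independent sample: G is zero.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```

Under independence, G is asymptotically chi-square distributed with (|X|-1)(|Y|-1) times the number of conditioning configurations as degrees of freedom, so the test rejects independence when G exceeds the critical value at the chosen significance level (0.02 in the experiments above).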
Efficiency-wise, CMB's complexity can still be exponential and is dominated by the MB discovery phase; thus its worst-case complexity can be the same as that of the local-to-global approaches for some special structures.

5 Conclusion

We propose a new local causal discovery algorithm, CMB. We show that CMB can identify the same causal structure as the global and local-to-global causal discovery algorithms under the same identification condition, but at a fraction of their cost. We further prove the soundness and completeness of CMB. Experiments on benchmark datasets show the comparable accuracy and greatly improved efficiency of CMB for local causal discovery. Possible future work could study relaxing the assumptions, especially the causal sufficiency assumption, for example by using a procedure similar to the FCI algorithm and the improved CS algorithm [14] to handle latent variables in CMB.

References
[1] Constantin F. Aliferis, Ioannis Tsamardinos, and Alexander Statnikov. HITON: a novel Markov blanket algorithm for optimal variable selection. In AMIA Annual Symposium Proceedings, 2003.
[2] David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 2002.
[3] Gregory F. Cooper. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery, 1(2):203–224, 1997.
[4] Isabelle Guyon, André Elisseeff, and Constantin Aliferis. Causal feature selection. 2007.
[5] Daphne Koller and Mehran Sahami. Toward optimal feature selection. In ICML 1996, pages 284–292. Morgan Kaufmann, 1996.
[6] Subramani Mani, Constantin F. Aliferis, and Alexander R. Statnikov. Bayesian algorithms for causal data mining.
In NIPS Causality: Objectives and Assessment, pages 121–136, 2010.
[7] Subramani Mani and Gregory F. Cooper. A study in causal discovery from population-based infant birth and death records. In Proceedings of the AMIA Symposium, page 315. American Medical Informatics Association, 1999.
[8] Subramani Mani and Gregory F. Cooper. Causal discovery using a Bayesian local causal discovery algorithm. Medinfo, 11(Pt 1):731–735, 2004.
[9] Dimitris Margaritis and Sebastian Thrun. Bayesian network induction via local neighborhoods. In Advances in Neural Information Processing Systems 12, pages 505–511. MIT Press, 1999.
[10] Christopher Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 403–410. Morgan Kaufmann Publishers Inc., 1995.
[11] Teppo Niinimaki and Pekka Parviainen. Local structure discovery in Bayesian networks. In Proceedings of Uncertainty in Artificial Intelligence, Workshop on Causal Structure Learning, pages 634–643, 2012.
[12] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers, Inc., 2nd edition, 1988.
[13] Judea Pearl. Causality: models, reasoning and inference, volume 29. Cambridge University Press, 2000.
[14] Jean-Philippe Pellet and André Elisseeff. Finding latent causes in causal networks: an efficient approach based on Markov blankets. In Advances in Neural Information Processing Systems, pages 1249–1256, 2009.
[15] Jean-Philippe Pellet and André Elisseeff. Using Markov blankets for causal structure learning. Journal of Machine Learning Research, 2008.
[16] Jose M. Peña, Roland Nilsson, Johan Björkegren, and Jesper Tegnér. Towards scalable and data efficient learning of Markov boundaries. Int. J. Approx.
Reasoning, 45(2):211–232, July 2007.
[17] Craig Silverstein, Sergey Brin, Rajeev Motwani, and Jeff Ullman. Scalable techniques for mining causal structures. Data Mining and Knowledge Discovery, 4(2-3):163–192, 2000.
[18] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, 2nd edition, 2000.
[19] Peter Spirtes, Clark Glymour, Richard Scheines, Stuart Kauffman, Valerio Aimale, and Frank Wimberly. Constructing Bayesian network models of gene expression networks from microarray data, 2000.
[20] Peter Spirtes, Christopher Meek, and Thomas Richardson. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 499–506. Morgan Kaufmann Publishers Inc., 1995.
[21] Alexander Statnikov, Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. Causal Explorer: a MATLAB library of algorithms for causal discovery and variable selection for classification. In Causation and Prediction Challenge at WCCI, 2008.
[22] Ioannis Tsamardinos, Constantin F. Aliferis, and Alexander Statnikov. Time and sample efficient discovery of Markov blankets and direct causal relations. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 673–678, New York, NY, USA, 2003. ACM.
[23] Ioannis Tsamardinos, Laura E. Brown, and Constantin F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
[24] Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias.
Artificial Intelligence, 172(16):1873–1896, 2008.