{"title": "An Algorithm to Learn Polytree Networks with Hidden Nodes", "book": "Advances in Neural Information Processing Systems", "page_first": 15110, "page_last": 15119, "abstract": "Ancestral graphs are a prevalent mathematical tool to take into account latent (hidden) variables in a probabilistic graphical model. In ancestral graph representations, the nodes are only the observed (manifest) variables and the notion of m-separation fully characterizes the conditional independence relations among such variables, bypassing the need to explicitly consider latent variables. However, ancestral graph models do not necessarily represent the actual causal structure of the model, and do not contain information about, for example, the precise number and location of the hidden variables. Being able to detect the presence of latent variables while also inferring their precise location within the actual causal structure model is a more challenging task that provides more information about the actual causal relationships among all the model variables, including the latent ones. In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions. Therefore, this article proposes an approach for the full identification of hidden variables in a polytree. 
We also show that the algorithm is complete in the sense that when such degree conditions are not met, there exists another polytree with fewer latent nodes satisfying the degree conditions and entailing the same independence relations among the observed variables, making it indistinguishable from the actual polytree.", "full_text": "An Algorithm to Learn Polytree Networks with\n\nHidden Nodes\n\nFiroozeh Sepehr\nDepartment of EECS\n\nUniversity of Tennessee Knoxville\n\n1520 Middle Dr, Knoxville, TN 37996\n\nDonatello Materassi\nDepartment of EECS\n\nUniversity of Tennessee Knoxville\n\n1520 Middle Dr, Knoxville, TN 37996\n\ndawn@utk.edu\n\ndmateras@utk.edu\n\nAbstract\n\nAncestral graphs are a prevalent mathematical tool to take into account latent (hidden) variables in a probabilistic graphical model. In ancestral graph representations, the nodes are only the observed (manifest) variables and the notion of m-separation fully characterizes the conditional independence relations among such variables, bypassing the need to explicitly consider latent variables. However, ancestral graph models do not necessarily represent the actual causal structure of the model, and do not contain information about, for example, the precise number and location of the hidden variables. Being able to detect the presence of latent variables while also inferring their precise location within the actual causal structure model is a more challenging task that provides more information about the actual causal relationships among all the model variables, including the latent ones. In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions. Therefore, this article proposes an approach for the full identification of hidden variables in a polytree. 
We also show that the algorithm is complete in the sense that when such degree conditions are not met, there exists another polytree with fewer latent nodes satisfying the degree conditions and entailing the same independence relations among the observed variables, making it indistinguishable from the actual polytree.\n\n1\n\nIntroduction\n\nThe presence of unmeasured variables is a fundamental challenge in the discovery of causal relationships [1, 2, 3]. When the causal diagram is a Directed Acyclic Graph (DAG) with unmeasured variables, a common approach is to use ancestral graphs to describe the independence relations among the measured variables [2]. The main advantage of ancestral graphs is that they involve only the measured variables and successfully encode all their conditional independence relations via m-separation. Furthermore, complete algorithms have been devised to obtain ancestral graphs from observational data, e.g., the work in [3]. However, ancestral graphs somewhat circumvent the question of recovering the actual structure of the original DAG. For example, it might be known that the actual causal diagram has a polytree structure including the hidden nodes, but the ancestral graph associated with the measured variables might not even be a polytree [4]. In contrast, the recovery of causal diagrams including the location of their hidden variables is a very challenging task, and algorithmic solutions are available only for specific scenarios [5, 6, 7, 8]. For example, in the case of specific distributions (i.e., Gaussian and Binomial) when the causal diagram is known to be a rooted tree, the problem has been solved by exploiting the additivity of a metric along the paths of the tree [6, 7, 8, 9]. In the case of generic distributions, though, additive metrics might be too difficult to define or cannot be defined in general. 
Furthermore, rooted trees can be considered a rather limiting class of networks\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fsince they represent probability distributions which can only be factorized according to second order\nconditional distributions [10].\nThis article makes a novel contribution towards the recovery of more general causal diagrams.\nIndeed, it provides an algorithm to learn causal diagrams making no assumptions on the underlying\nprobability distribution, and considering polytree structures which can represent factorizations\ninvolving conditional distributions of arbitrarily high order. Furthermore, it is shown that a causal\ndiagram with a polytree structure can be exactly recovered if and only if each hidden node satis\ufb01es\nthe following conditions: (i) the node has at least two children; (ii) if the node has exactly one parent,\nsuch a parent is not hidden; (iii) the node has at least degree 3, or each of its two children has at least\nanother parent. The provided algorithm recovers every polytree structure with hidden nodes satisfying\nthese conditions, and, remarkably, makes use only of third order statistics. If the degree conditions are\nnot satis\ufb01ed, then it is shown that there exists another polytree with fewer number of hidden random\nvariables which entails the same independence relations among the observed variables. Indeed, in\nthis case, when no additional information/observations are provided, no test can be constructed to\ndetermine the true structure. Another main advantage of this proposed approach lies in the fact that it\nfollows a form of Occam\u2019s razor principle since in the case where the degree conditions on the hidden\nnodes are not met, then a polytree with minimal number of hidden nodes is selected. 
We \ufb01nd this\nproperty quite relevant in application scenarios since Occam\u2019s razor is arguably one of the cardinal\nprinciples in all sciences.\n\n2 Preliminaries, Assumptions and Problem De\ufb01nition\n\nIn order to formulate our problem, we \ufb01rst introduce a generalization of the notions of directed and\nundirected graphs (see for example [11, 12]) which also considers a partition of the set of nodes into\nvisible and hidden nodes.\nDe\ufb01nition 1 (Latent partially directed graph). A latent partially directed graph \u00afG(cid:96) is a 4-ple\n(V, L, E, (cid:126)E) where\n\n\u2022 the disjoint sets V and L are named the set of visible nodes and the set of hidden nodes,\n\u2022 the set E is the set of undirected edges containing unordered pairs of (V \u222a L) \u00d7 (V \u222a L),\n\u2022 the set (cid:126)E is the set of directed edges containing ordered pairs of (V \u222a L) \u00d7 (V \u222a L).\n\nWe denote the unordered pair of two elements yi, y j \u2208 V \u222a L as yi \u2212 y j, and the ordered pair of yi, y j\n(when yi precedes y j) as yi \u2192 y j. In a latent partially directed graph the sets E and (cid:126)E do not share\nany edges. Namely, yi \u2212 y j \u2208 E implies that both yi \u2192 y j and y j \u2192 yi are not in (cid:126)E. (cid:3)\nA latent partially directed graph is a fully undirected graph when (cid:126)E = \u2205, and we simplify the notation\nby writing G(cid:96) = (V, L, E). Similarly, when E = \u2205, we have a fully directed graph, and we denote\nit by (cid:126)G(cid:96) = (V, L, (cid:126)E). Furthermore, if we drop the distinction between visible and hidden nodes and\nconsider V \u222a L as the set of nodes, we recover the standard notions of undirected and directed graphs.\nThus, latent partially directed graphs inherit, in a natural way, all notions associated with standard\ngraphs (e.g., path, degree, neighbor, etc., see for example [11]). 
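As a concrete illustration of Definitions 1 and 2, the 4-ple (V, L, E, E⃗) can be captured by a small data structure. The following Python sketch uses our own illustrative names and is not part of the paper's algorithms:

```python
# Minimal sketch of Definition 1 (latent partially directed graph) and
# Definition 2 (restriction). All class and method names are illustrative.

class LatentPartiallyDirectedGraph:
    def __init__(self, visible, hidden, undirected=(), directed=()):
        self.V = set(visible)   # visible (manifest) nodes
        self.L = set(hidden)    # hidden (latent) nodes
        if self.V & self.L:
            raise ValueError("V and L must be disjoint")
        # undirected edges as unordered pairs, directed edges as ordered pairs
        self.E = {frozenset(e) for e in undirected}
        self.E_dir = {tuple(e) for e in directed}
        # Definition 1 requires E and E_dir not to share any edge
        for (a, b) in self.E_dir:
            if frozenset((a, b)) in self.E:
                raise ValueError(f"edge {a}-{b} is both directed and undirected")

    def is_undirected(self):
        # fully undirected graph: no directed edges
        return not self.E_dir

    def is_directed(self):
        # fully directed graph: no undirected edges
        return not self.E

    def restriction(self, A):
        """Definition 2: keep only the nodes in A and the edges whose
        endpoints are both in A."""
        A = set(A)
        return LatentPartiallyDirectedGraph(
            self.V & A, self.L & A,
            [e for e in self.E if e <= A],
            [e for e in self.E_dir if set(e) <= A],
        )
```

For instance, restricting a graph with edges y1 − h1 and h1 → y2 to {y1, h1} keeps only the undirected edge y1 − h1.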
In the scope of this article, we denote\ndegree, outdegree, indegree, children, parents, descendants and ancestors of a node y in graph (cid:126)G\nusing deg(cid:126)G (y), deg+\n(y), ch(cid:126)G (y), pa(cid:126)G (y), de(cid:126)G (y) and an(cid:126)G (y), respectively (see [11, 12] for\n(cid:126)G\nprecise de\ufb01nitions). Furthermore, the notion of restriction of a graph to a subset of nodes follows\nimmediately.\nDe\ufb01nition 2 (Restriction of a latent partially directed graph). The restriction of a latent partially\ndirected graph \u00afG(cid:96) = (V, L, E, (cid:126)E) with respect to a set of nodes A \u2286 V \u222a L is the latent partially\ndirected graph obtained by considering only the nodes in A and the edges linking pairs of nodes\nwhich are both in A. (cid:3)\n\n(y), deg\u2212\n\n(cid:126)G\n\nMoreover, a latent partially directed graph is called a latent partially directed tree when there exists\nexactly one path connecting any pair of nodes.\nDe\ufb01nition 3 (Latent partially directed tree). A latent partially directed tree (cid:126)P(cid:96) is a latent partially\ndirected graph \u00afG(cid:96) = (V, L, E, (cid:126)E) where every pair of nodes yi, y j \u2208 V \u222a L is connected by exactly one\npath.\n\n2\n\n\fTrivially, latent partially directed trees generalize the notions of undirected trees and polytrees\n(directed trees) [13]. In a latent partially directed tree, we de\ufb01ne a hidden cluster as a group of hidden\nnodes that are connected to each other via a path constituted exclusively of hidden nodes.\nDe\ufb01nition 4 (Hidden cluster). A hidden cluster in a latent partially directed tree (cid:126)P(cid:96) = (V, L, E, (cid:126)E)\nis a set C \u2286 L such that for each distinct pair of nodes yi, y j \u2208 C the unique path connecting them\ncontains only nodes in C and no node in C is linked to a node which is in L \\ C. 
(cid:3)\nObserve that each node in a hidden cluster has neighbors which are either visible or hidden nodes\nof the same cluster. Figure 1 (a) depicts a latent directed tree (or a latent polytree) and its hidden\nclusters C1 and C2 highlighted by the dotted lines.\n\n(a)\n\n(b)\n\n(c)\n\nFigure 1: A hidden polytree (a) and its collapsed hidden polytree (b), a minimal hidden polytree (c).\n\nFurthermore, we introduce the set of (visible) neighbors of a hidden cluster, its closure and its degree.\nDe\ufb01nition 5 (Neighbors, closure, and degree of a hidden cluster). In a latent partially directed tree,\nthe set of all visible nodes linked to any of the nodes of a hidden cluster C is the set of neighbors of C\nand is denoted by N(C). We de\ufb01ne the degree of the hidden cluster as |N(C)|, namely, the number of\nneighbors of the cluster. We refer to the restriction of a latent polytree to a hidden cluster and its\nneighbors as the closure of the hidden cluster. (cid:3)\n\nObserve that the neighbors of C1 are shaded with orange color in Figure 1 (a). We also remind the\nnotion of a root node and de\ufb01ne the notion of a root of a hidden cluster.\nDe\ufb01nition 6 (Root of a latent polytree, and root of a hidden cluster in a latent polytree). In a latent\npolytree (cid:126)P(cid:96) = (V, L, (cid:126)E), a root is a node yr \u2208 V \u222a L with indegree equal to zero. Also, we de\ufb01ne any\nroot of the restriction of the polytree to one of its hidden clusters as the root of the hidden cluster. (cid:3)\n\nFor example, in Figure 1 (a), node y1 is a root of the latent polytree and node yh3 is a root of the hidden\ncluster C1. In this article, we make extensive use of the restriction of a polytree to the descendants of\none of its roots. We de\ufb01ne such a restriction as the rooted subtree of the polytree associated with\nthat root. 
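The notions of hidden cluster, neighbors, and degree (Definitions 4 and 5) amount to connected components of the subgraph induced by the hidden nodes. A minimal Python sketch, with illustrative names of our own:

```python
# Sketch of Definitions 4-5: hidden clusters are the connected components
# of the subgraph induced by the hidden nodes; N(C) is the set of nodes
# outside C linked to C (visible, by Definition 4, in a valid latent tree).
# Edges are unordered pairs; all names are illustrative.

def hidden_clusters(hidden, edges):
    """Partition the hidden nodes into clusters: maximal sets connected
    through paths made exclusively of hidden nodes."""
    hidden = set(hidden)
    adj = {h: set() for h in hidden}
    for a, b in edges:                    # keep only hidden-hidden edges
        if a in hidden and b in hidden:
            adj[a].add(b)
            adj[b].add(a)
    clusters, seen = [], set()
    for h in hidden:
        if h in seen:
            continue
        stack, comp = [h], set()          # depth-first search
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

def cluster_neighbors(cluster, edges):
    """N(C): endpoints outside the cluster of edges crossing its boundary.
    The degree of the cluster is len(cluster_neighbors(...))."""
    return {b if a in cluster else a
            for a, b in edges
            if (a in cluster) != (b in cluster)}
```

On a chain h1 − h2 − h3 with visible attachments y1 − h1 and h3 − y2, the single cluster {h1, h2, h3} has neighbors {y1, y2} and degree 2.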
Additionally, given a latent partially directed tree, we de\ufb01ne its collapsed representation\nby replacing each hidden cluster with a single hidden node. The formal de\ufb01nition is as follows and\nFigure 1 (b) depicts the collapsed representation of the latent polytree of Figure 1 (a).\nDe\ufb01nition 7 (Collapsed representation). We de\ufb01ne the collapsed representation of (cid:126)P(cid:96) = (V, L, E, (cid:126)E)\nas the latent partially directed tree (cid:126)Pc = (V, Lc, Ec, (cid:126)Ec) where nc is the number of hidden clusters\nC1, ..., Cnc, and Lc := C1 \u222a ... \u222a Cnc, and\n\nEc := {yi \u2212 y j \u2208 E | yi, y j \u2208 V} \u222a {yi \u2212 Ck | \u2203y j \u2208 Ck, yi \u2212 y j \u2208 E} \u222a {Ck \u2212 y j | \u2203yi \u2208 Ck, yi \u2212 y j \u2208 E}\n(cid:126)Ec := {yi \u2192 y j \u2208 (cid:126)E | yi, y j \u2208 V} \u222a {yi \u2192 Ck | \u2203y j \u2208 Ck, yi \u2192 y j \u2208 (cid:126)E} \u222a {Ck \u2192 y j | \u2203yi \u2208 Ck, yi \u2192 y j \u2208 (cid:126)E}.(cid:3)\n\nIn this article, we show the cases where graphical models with polytree structures can be recovered\nfrom the independence relations involving only visible nodes. Speci\ufb01cally, we assume that a polytree\nis a perfect map (see [14, 12]) for a probabilistic model de\ufb01ned over the variables V \u222a L where V and\nL are disjoint sets. We \ufb01nd conditions under which it is possible to recover information about the\nperfect map of the probabilistic model considering only independence relations of the form I(yi,\u2205, y j)\n(read yi and y j are independent) and I(yi, yk, y j) (read yi and y j are conditionally independent given\nyk) for all nodes yi, y j, yk \u2208 V. One of the fundamental requirements of solving this problem is that\nall hidden nodes need to satisfy certain degree conditions summarized in the following de\ufb01nition.\nDe\ufb01nition 8 (Minimal latent polytree). 
A latent polytree P⃗ℓ = (V, L, E⃗) is minimal if every hidden node yh ∈ L satisfies one of the following conditions:\n\n• deg+(yh) ≥ 2 and deg(yh) ≥ 3 and, if |pa(yh)| = 1, then pa(yh) ⊆ V;\n• deg+(yh) = 2 and deg−(yh) = 0 and deg−(yc1), deg−(yc2) ≥ 2, where ch(yh) = {yc1, yc2}. □\n\nNote that the nodes yh2, yh4, yh5, yh7 in Figure 1 (a) do not satisfy the minimality conditions and therefore the hidden polytree is not minimal. Instead, Figure 1 (c) shows a minimal latent polytree.\nThe algorithm we propose to recover the structure of a latent polytree can be decomposed into several tasks, and the hidden nodes which are roots with outdegree equal to 2 and at least one visible child require special treatment in the last task of the algorithm. Therefore, we define the following two types of hidden nodes to make this distinction.\nDefinition 9 (Type-I and type-II hidden nodes). In a minimal latent polytree, we classify a hidden node yh as type-II when deg(yh) = 2 with at least one visible child. 
All other hidden nodes are\nclassi\ufb01ed as type-I.\n\nIn the minimal latent polytree of Figure 2 (a), the hidden nodes yh2 and yh3 are type-II hidden nodes,\nwhile all the other hidden nodes are type-I.\n\n(a)\n\n(b)\n\n(c)\n\nFigure 2: A minimal latent polytree (cid:126)P(cid:96) with type-II hidden nodes (a), its quasi-skeleton (b), and its\ncollapsed quasi-skeleton (c).\n\nWe de\ufb01ne the quasi-skeleton of a minimal latent polytree to deal with type-II hidden nodes separately.\nDe\ufb01nition 10 (Quasi-skeleton of a latent polytree). In a minimal latent polytree (cid:126)P(cid:96) = (V, L, (cid:126)E), the\nquasi-skeleton of (cid:126)P(cid:96) is the undirected graph obtained by removing the orientation of all edges in (cid:126)P(cid:96),\nand removing all the type-II hidden nodes and then linking its two children together. (cid:3)\n\nIn Figure 2 (b), we have the quasi-skeleton of the polytree of Figure 2 (a). Observe that we can\neasily de\ufb01ne the collapsed representation of a quasi-skeleton of a latent polytree by \ufb01nding the\nquasi-skeleton \ufb01rst and then \ufb01nding its collapsed representation as in Figure 2 (c).\nAs it is well known in the theory of graphical models, in the general case, from a set of conditional\nindependence statements (formally, a semi-graphoid) faithful to a Directed Acyclic Graph (DAG), it\nis not possible to recover the full DAG [15, 1]. What can be recovered for sure is the pattern of the\nDAG, namely the skeleton and the v-structures (i.e., yi \u2192 yk \u2190 y j) of the DAG [15, 1]. In this article,\nwe show that, similarly, in the case of a minimal latent polytree, we are able to recover the pattern of\nthe polytree from the independence statements involving only the visible variables.\nDe\ufb01nition 11 (Pattern of a polytree). Let (cid:126)P = (N, (cid:126)E) be a polytree. 
The pattern of P⃗ is a partially directed graph in which all the v-structures (i.e., yi → yk ← yj) are oriented, and as many of the remaining undirected edges as possible are oriented in such a way that the alternative orientation would result in a new v-structure. □\n\nNow we have all the necessary tools to formulate the problem.\nProblem Formulation. Assume a semi-graphoid defined over a set of variables V ∪ L. Let the latent polytree P⃗ℓ = (V, L, E⃗) be faithful to the semi-graphoid and assume that the nodes in L satisfy the minimality conditions. Recover the pattern of P⃗ℓ from conditional independence relations involving only nodes in V.\nRemark 12. The proposed solution makes use only of the conditional independence relations of the form I(yi,∅,yj) and I(yi,yk,yj) for all yi, yj, yk ∈ V.\n\n3 An Algorithm to Reconstruct Minimal Hidden Polytrees\n\nOur algorithm for learning the pattern of a minimal latent polytree is made of the following 5 tasks:\n\n1. Using the independence statements involving the visible nodes, determine the number of rooted subtrees in the latent polytree and their respective sets of visible nodes;\n\n2. Given all the visible nodes belonging to each rooted subtree, determine the collapsed quasi-skeleton of each rooted subtree;\n\n3. Merge the overlapping hidden clusters in the collapsed quasi-skeleton of each rooted subtree to obtain the collapsed quasi-skeleton of the latent polytree;\n\n4. Determine the quasi-skeleton of the latent polytree from the collapsed quasi-skeleton of the latent polytree (recover type-I hidden nodes);\n\n5. 
Obtain the pattern of the latent polytree from the recovered quasi-skeleton of the latent polytree\n\n(recover type-II hidden nodes and edge orientations).\n\nFigure 3 shows the stage of the recovery of the polytree structure at the end of each task. The\nfollowing subsections provide more details about each task, but the most technical results are in the\nSupplemental Material. We stress that the \ufb01rst two tasks mostly leverage previous work about rooted\ntrees and the main novelty of this article lies in tasks 3, 4 and 5.\n\n(True)\n\n(Task 1)\n\n(Task 2)\n\n(Task 3)\n\n(Task 4)\n\n(Task 5)\n\nFigure 3: The actual minimal latent polytree (True); the lists of visible nodes for each rooted subtree\n(Task 1), collapsed quasi-skeletons of the rooted subtrees (Task 2), merging of the overlapping hidden\nclusters (Task 3), detection of type-I hidden nodes (Task 4), detection of type-II hidden nodes along\nwith orientation of the edges to obtain the pattern of the hidden polytree (Task 5). Observe that the\nfull polytree is not recovered at the end of task 5 since the edge y9 \u2212 y18 is left undirected.\n\n3.1 Task 1: Determine the visible nodes of each rooted subtree\n\nThis \ufb01rst task can be performed by the Pairwise-Finite Distance Algorithm (PFDA), presented in [16]\nand reported in the Supplementary Material as Algorithm 4. As shown in [16], PFDA takes as input\nthe set of visible nodes of a latent polytree and outputs sets of visible nodes with the property that\neach set corresponds to the visible descendants of a root of the latent polytree, when the polytree is\nminimal. In the following theorem, we show that the output of PFDA applied to the independence\nstatements is the same as described above. See Supplementary Material for the proof of this theorem.\nTheorem 13. Consider a latent polytree (cid:126)P(cid:96) = (V, L, (cid:126)E) faithful to a probabilistic model. Assume that\nthe hidden nodes in L satisfy the minimality conditions. 
Then PFDA, applied to the independence statements of the probabilistic model with the form I(yi,∅,yj) for all yi, yj ∈ V, outputs a collection of sets, such that each of them is given by all the visible descendants of a root of P⃗ℓ.\n\n3.2 Task 2: Determine the collapsed quasi-skeleton of each rooted subtree\n\nThe second task is performed by the Reconstruction Algorithm for Latent Rooted Trees in [17]. We report it as Algorithm 5 in the Supplementary Material for completeness. The input of this algorithm is the set Vr of the visible nodes belonging to a rooted subtree Tr and independence relations of the form I(yi, yk, yj) or ¬I(yi, yk, yj) for distinct yi, yj, yk ∈ Vr. Its output is the collapsed quasi-skeleton of Tr. Thus, we can call this algorithm on all of the sets of visible nodes V1, ..., Vnr, where nr is the number of roots, obtained from Task 1, and find the collapsed quasi-skeletons of all the rooted subtrees of the latent polytree. This result is formalized in the following theorem. See Supplementary Material for the proof of this theorem.\n\nTheorem 14. Let P⃗ℓ = (V, L, E⃗) be a minimal latent polytree. Consider a root yr of P⃗ℓ and let Vr = V ∩ de(yr), the set of visible descendants of yr in P⃗ℓ. 
The output of Reconstruction Algorithm for Latent Rooted Trees applied to Vr is\nthe collapsed quasi-skeleton of the rooted subtree with root node yr.\n\n3.3 Task 3: Merge the overlapping hidden clusters of the collapsed rooted trees\n\nBy applying the Reconstruction Algorithm for Latent Rooted Trees on each set of visible nodes in\nthe same rooted tree, we have, as an output, the collapsed quasi-skeletons of all rooted subtrees in the\noriginal hidden polytree. In the general case, some hidden clusters in the collapsed quasi-skeleton of\nthe rooted subtrees might overlap, namely, they might share some hidden nodes in the original hidden\npolytree. The following theorem provides a test on the sets of visible nodes of the rooted subtrees in\na minimal latent polytree to determine if two hidden clusters in two distinct collapsed quasi-skeletons\nof two rooted subtrees belong to the same cluster in the collapsed quasi-skeleton of the polytree. See\nSupplementary Material for the proof of this theorem.\n\nTheorem 15. Consider a minimal latent polytree (cid:126)P(cid:96). Let C1 and C2 be two distinct hidden clusters\nin the collapsed quasi-skeletons of two rooted subtrees of (cid:126)P(cid:96). If the set of neighbors of C1 and the set\nof neighbors of C2 share at least a pair of visible nodes, i.e., |N(C1) \u2229 N(C2)| \u2265 2, then the nodes in\nC1 and C2 belong to the same hidden cluster in the collapsed quasi-skeleton of (cid:126)P(cid:96).\n\nThis theorem is the enabling result for the Hidden Cluster Merging Algorithm (HCMA), presented in\nAlgorithm 1, which merges all the collapsed quasi-skeletons associated with the individual rooted\nsubtrees, obtained from Task 2, into the collapsed quasi-skeleton of the polytree. 
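The merging criterion of Theorem 15 suggests a simple fixed-point loop. The following Python sketch is our own simplified rendering, operating only on the clusters' neighbor sets rather than on the full quasi-skeletons:

```python
# Sketch of the merging loop behind HCMA (Theorem 15): repeatedly merge
# two clusters whenever their visible-neighbor sets share at least two
# nodes. Clusters are represented here only by N(C); names are ours.

def merge_hidden_clusters(neighbor_sets):
    """neighbor_sets: list of sets N(C_i) coming from the rooted subtrees.
    Returns the neighbor sets of the merged hidden clusters."""
    clusters = [set(s) for s in neighbor_sets]
    merged = True
    while merged:                       # repeat until no merge happens
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Theorem 15 test: |N(Ci) ∩ N(Cj)| >= 2
                if len(clusters[i] & clusters[j]) >= 2:
                    clusters[i] |= clusters.pop(j)   # merge Cj into Ci
                    merged = True
                    break
            if merged:
                break
    return clusters
```

For example, neighbor sets {1,2,3} and {2,3,4} merge (they share two neighbors), while {5,6} and {6,7} stay separate (they share only one).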
This algorithm starts with the collapsed quasi-skeleton of the rooted subtrees, then finds pairs of clusters that overlap by testing whether they share at least one pair of visible neighbors (see Theorem 15), and then merges the overlapping pairs. This procedure is repeated until no clusters are merged anymore.\n\nAlgorithm 1 Hidden Cluster Merging Algorithm\nInput the collapsed quasi-skeleton of the rooted subtrees Ti = (Vi, Li, Ei) for i = 1, ..., nr\nOutput the collapsed quasi-skeleton P of the latent polytree\n1: Initialize the set of clusters P with the hidden clusters of all Ti, i.e., P := {{C1}, {C2}, ..., {Ck}}\n2: while there are two elements Ci, Cj ∈ P such that |N(Ci) ∩ N(Cj)| ≥ 2 do\n3: remove Ci, Cj from P and add Ci ∪ Cj to P\n4: define N(Ci ∪ Cj) := N(Ci) ∪ N(Cj)\n5: end while\n6: Define the polytree P = (∪i Vi, P, E) where E := {{ya, yb} | ∃ i : ya, yb ∈ Vi, ya − yb ∈ Ei} ∪ {{ya, Cb} | ∃ i, h : ya ∈ Vi, yh ∈ Li, Li ⊆ Cb, Cb ∈ P, ya − yh ∈ Ei}\n\nThe following theorem guarantees that, for a minimal latent polytree, the output of HCMA is the collapsed quasi-skeleton of the polytree. See Supplementary Material for the proof of this theorem.\n\nTheorem 16. Let P⃗ℓ = (V, L, E⃗) be a minimal latent polytree and let Ti = (Vi, Li, Ei) for i = 1, ..., nr be the collapsed quasi-skeletons of the rooted subtrees of P⃗ℓ. Then HCMA outputs the collapsed quasi-skeleton of P⃗ℓ.\n\n3.4 Task 4: Determine the quasi-skeleton of the latent polytree from the collapsed quasi-skeleton of the latent polytree (recover type-I hidden nodes)\n\nAfter performing the HCMA, the output is the collapsed quasi-skeleton of the latent polytree; thus, the structure of the hidden nodes within each hidden cluster is not known yet. 
Note that the restriction of the original polytree to the closure of a hidden cluster is a smaller polytree. The goal of this task is to recover the structure of the hidden clusters by focusing on each individual closure (i.e., recover type-I hidden nodes and their connectivities). Given the closure of a hidden cluster, the basic strategy is to detect one root of the hidden cluster along with the visible nodes (if any) linked to this root. Then, we label such a root as a visible node, add edges between this node and its visible neighbors, and subsequently apply the same strategy recursively to the descendants of such a detected root.\nSince we focus on the closure of a specific hidden cluster, say C, we define the sets Ṽr = Vr ∩ N(C) for r = 1, ..., nr where nr is the number of rooted subtrees in the latent polytree and Vr are the sets of visible nodes in each rooted subtree (obtained from Task 1). A fundamental result for the detection of a root of a hidden cluster is the following theorem. See Supplementary Material for the proof of this theorem.\nTheorem 17. Let P⃗ℓ be a minimal latent polytree and let T⃗r = (Vr, Lr, E⃗r) with r = 1, ..., nr be all the rooted subtrees of P⃗ℓ. Let C be a hidden cluster in the collapsed quasi-skeleton of P⃗ℓ. Define Ṽr := Vr ∩ N(C) for r = 1, ..., nr where nr is the number of roots in P⃗ℓ. Then, Tr contains a hidden root of C if and only if Ṽr ≠ ∅ and for all Ṽr′ with r′ ≠ r we have |Ṽr \\ Ṽr′| > 1 or |Ṽr′ \\ Ṽr| ≤ 1.\nTo make the application of this theorem more clear, consider the latent polytree introduced in Figure 3 (True). After applying the first three tasks, we obtain the collapsed quasi-skeleton of the latent polytree as depicted in Figure 3 (Task 3). 
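The test of Theorem 17 is a finite check on the sets Ṽr. A direct Python sketch, with illustrative names rather than the paper's implementation:

```python
# Sketch of the root test of Theorem 17: given the sets V~_r = V_r ∩ N(C),
# decide which rooted subtrees contain a hidden root of the cluster C.

def subtrees_with_hidden_root(V_tilde):
    """V_tilde: dict mapping subtree index r to the set V~_r.
    Subtree r contains a hidden root of C iff V~_r is nonempty and, for
    every other index r', |V~_r \\ V~_r'| > 1 or |V~_r' \\ V~_r| <= 1."""
    roots = []
    for r, Vr in V_tilde.items():
        if not Vr:
            continue
        if all(len(Vr - Vp) > 1 or len(Vp - Vr) <= 1
               for rp, Vp in V_tilde.items() if rp != r):
            roots.append(r)
    return roots
```

Note that several subtrees may pass the test, since a hidden cluster can have more than one root.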
Observe that the rooted subtrees T⃗1 (with root y1) and T⃗2 (with root y2) satisfy the conditions of Theorem 17, indicating that they contain a root of the hidden cluster. The following lemma allows one to find the visible nodes linked to a hidden root in the closure of a hidden cluster. See Supplementary Material for the proof of this lemma.\nLemma 18. Let P⃗ℓ be a minimal latent polytree. Consider a hidden root yh of a hidden cluster C in the collapsed quasi-skeleton of P⃗ℓ where yh belongs to the rooted subtree Tr = (Vr, Lr, E⃗r). Define Ṽr′ := Vr′ ∩ N(C) for r′ = 1, ..., nr where nr is the number of roots in P⃗ℓ. The visible nodes linked to yh are given by the set W \\ W̄ where\n\nI := {r} ∪ {r′ such that |Ṽr \\ Ṽr′| = |Ṽr′ \\ Ṽr| = 1}, W := ∪i∈I Ṽi, W̄ := ∪i∉I Ṽi.\n\nWe follow the example of Figure 3 to show the steps of Task 4 in more detail. Without loss of generality, choose Tr = T1. Consider the closure of CA′ obtained at the end of Task 3 and then apply Lemma 18 to obtain I = {1, 2}, W = {y1, y2, y10, y12, y13, y14, y15, y16, y17}, W̄ = {y5, y6, y9, y11, y12, y13, y14, y15, y16, y17}, and thus W \\ W̄ = {y1, y2, y10}. Therefore, the visible nodes linked to the hidden root in T1 are y1, y2 and y10. Now we introduce the Hidden Cluster Learning Algorithm (HCLA), presented in Algorithm 2, to learn the structure of a hidden cluster.\nAgain, consider the closure of the hidden cluster CA′ as depicted in Figure 4 (Task 4a) which we obtained at the end of Task 3. Then, apply the Hidden Node Detection procedure to CA′ and observe that the output at the end of Step 23 of Algorithm 2 is in Figure 4 (Task 4b). 
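Lemma 18 likewise reduces to elementary set operations. The sketch below uses hypothetical Ṽr sets of our own, chosen only to be consistent with the worked example above (I = {1, 2} and resulting visible nodes {y1, y2, y10}); the actual sets of Figure 3 are not fully listed in the text:

```python
# Sketch of Lemma 18: given a subtree index r containing a hidden root of
# the cluster C and the sets V~_r', compute the visible nodes linked to
# that root as W \ W-bar. Illustrative names throughout.

def visible_nodes_linked_to_root(r, V_tilde):
    Vr = V_tilde[r]
    # I collects r together with every r' whose V~ differs from V~_r by
    # exactly one element in each direction
    I = {r} | {rp for rp, Vp in V_tilde.items()
               if rp != r and len(Vr - Vp) == 1 and len(Vp - Vr) == 1}
    W = set().union(*(V_tilde[i] for i in I))           # union over i in I
    W_bar = set().union(*(V_tilde[i] for i in V_tilde   # union over i not in I
                          if i not in I))
    return W - W_bar
```

With Ṽ sets reconstructed to match the example, the function indeed returns {y1, y2, y10}.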
The output of the merging in Steps 24-27 is depicted in Figure 4 (Task 4c) and the output of the merging in Step 28 is depicted in Figure 4 (Task 4d). Now, we can apply the same procedure recursively to the remaining hidden clusters to obtain the final output of Task 4, the quasi-skeleton of the polytree, as depicted in Figure 3 (Task 4).

Here, we show that the output of HCLA is the quasi-skeleton of the latent polytree. See the Supplementary Material for the proof of this theorem.

Theorem 19. Let $\vec P_\ell = (V, L, \vec E)$ be a minimal latent polytree. When HCLA is applied to all hidden clusters of the collapsed quasi-skeleton of $\vec P_\ell$, the output $P = (V, E)$ is the quasi-skeleton of $\vec P_\ell$. Furthermore, HCLA also outputs, for each pair $y_i, y_j \in V$, the relation $I(y_i, \emptyset, y_j)$ if and only if the path connecting $y_i$ and $y_j$ in $\vec P_\ell$ contains an inverted fork.

Algorithm 2 Hidden Cluster Learning Algorithm
Input: the collapsed quasi-skeleton of a minimal polytree $\vec P_\ell$, the collapsed quasi-skeletons of the rooted subtrees $T_i = (V_i, L_i, E_i)$ for $i = 1, \dots, n_r$, and the set of the hidden clusters $\mathcal{P} = \{C_1, \dots, C_{n_C}\}$
Output: $\mathcal{P}$ and the independence relations of the form $I(y_a, \emptyset, y_b)$ or $\neg I(y_a, \emptyset, y_b)$ for all nodes $y_a, y_b \in \bigcup_i V_i$
1: while $\mathcal{P} \neq \emptyset$ do
2:   Call Hidden Node Detection Procedure($C_1$) where $C_1$ is the first element of $\mathcal{P}$
3: end while
4: procedure Hidden Node Detection($C$)
5:   Compute $\tilde V_i = V_i \cap N(C)$
6:   Find $\tilde V_r$ which satisfies $|\tilde V_r \setminus \tilde V_{r'}| > 1$ or $|\tilde V_{r'} \setminus \tilde V_r| \leq 1$ for all $r' \neq r$ (as in Theorem 17)
7:   Initialize $W := \tilde V_r$, $\overline{W} := \emptyset$, and $I := \{r\}$
8:   for all $i = 1, \dots, n_r$ with $i \neq r$ do
9:     if $|\tilde V_r \setminus \tilde V_i| = 1$ and $|\tilde V_i \setminus \tilde V_r| = 1$ (as in Lemma 18) then
10:      $W := W \cup \tilde V_i$ and $I := I \cup \{i\}$
11:    else
12:      $\overline{W} := \overline{W} \cup \tilde V_i$
13:    end if
14:  end for
15:  A new hidden node $y_h$ is revealed
16:  Add $y_h$ to all the rooted trees $T_i$ with $i \in I$, namely $V_i := V_i \cup \{y_h\}$
17:  Link all nodes in $W \setminus \overline{W}$ to $y_h$ in all $T_i$ with $i \in I$, namely $E_i := E_i \cup \{\{y_h, y\} \mid y \in W \setminus \overline{W}\}$
18:  Add the independence relation $\neg I(y_h, \emptyset, y)$ for all $y \in V_i$ with $i \in I$, and add the independence relation $I(y_h, \emptyset, y)$ for all other nodes $y$
19:  for all $i \in I$ do
20:    create $n_k = |W \cap \overline{W}|$ new clusters: $C^{(i)}_1, \dots, C^{(i)}_{n_k}$
21:    link $y_h$ to $C^{(i)}_1, \dots, C^{(i)}_{n_k}$
22:    link each cluster $C^{(i)}_1, \dots, C^{(i)}_{n_k}$ to a distinct element in $W \cap \overline{W}$
23:  end for
24:  while $\exists\, y_a, y_b \in N(C^{(i)}_j) \cup N(C^{(i)}_k)$ such that $y_a, y_b \in \tilde V_m$ where $m \notin I$ do
25:    merge the two hidden clusters $C^{(i)}_j$ and $C^{(i)}_k$
26:    update the structure of $T_i$ with the new hidden clusters
27:  end while
28:  Let $P = (V, \mathcal{P}, E)$ be the output of HCMA applied to $T_i = (V_i, L_i, E_i)$, for $i = 1, \dots, n_r$
29: end procedure

3.5 Task 5: Obtain the pattern of the latent polytree from the recovered quasi-skeleton of the latent polytree (recover type-II hidden nodes and edge orientations)

Once the quasi-skeleton of the latent polytree has been obtained, the only nodes still missing from the full skeleton are the type-II hidden nodes of the original polytree. Interestingly, the detection of such hidden nodes can be performed concurrently with the recovery of the edge orientations. In particular, we apply Rebane and Pearl's algorithm in [13] to orient the edges of the quasi-skeleton of the polytree. Then, any edge receiving a double orientation implies the presence of a type-II hidden node between the two linked nodes.
Thus, the Hidden Root Recovery Algorithm (HRRA), presented in Algorithm 3, is simply an implementation of Rebane and Pearl's algorithm (Steps 1-4), as depicted in Figure 4 (Task 5a), with the additional detection of type-II hidden nodes (Steps 5-10).

As a consequence, we have the final result stated in Theorem 20, proving that HRRA outputs the pattern of the latent polytree. See the Supplementary Material for the proof of this theorem.

Theorem 20. Let $\vec P_\ell$ be a minimal latent polytree. When the input is the quasi-skeleton of $\vec P_\ell$ with the independence statements of the form $I(y_i, \emptyset, y_j)$ or $\neg I(y_i, \emptyset, y_j)$ for all the pairs of nodes $y_i$ and $y_j$, the output of HRRA is the pattern of $\vec P_\ell$.

For a complete step-by-step example of this algorithm, see the Supplementary Material.

Figure 4: The closure of the hidden cluster $C_{A'}$ of the latent polytree in Figure 3 (True) obtained after Task 3 (Task 4a), the hidden clusters obtained after Step 23 of HCLA (Task 4b), merging of the overlapping hidden clusters as in Steps 24-27 of HCLA (Task 4c), merging of the overlapping hidden clusters as in Step 28 of HCLA (Task 4d), orienting the edges in the quasi-skeleton of the latent polytree as in Steps 1-4 of HRRA (Task 5a), and recovering type-II hidden nodes (Task 5b).

Algorithm 3 Hidden Root Recovery Algorithm
Input: $P = (V, E)$, the quasi-skeleton of a latent polytree, and the independence relations of the form $I(y_i, \emptyset, y_j)$ or $\neg I(y_i, \emptyset, y_j)$ for all nodes $y_i, y_j \in V$
Output: the partially directed polytree $\bar P = (V, E, \vec E)$
1: while additional edges are oriented do
2:   if $y_i - y_k, y_j - y_k \in E$ and $I(y_i, \emptyset, y_j)$, then add $y_i \to y_k$ and $y_j \to y_k$ to $\vec E$
3:   if $y_i \to y_k \in \vec E$, $y_k - y_j \in E$ and $\neg I(y_i, \emptyset, y_j)$, then add $y_k \to y_j$ to $\vec E$
4: end while
5: Remove the edges that are oriented in $\vec E$ from $E$
6: for all $y_i, y_j$ such that $y_i \to y_j, y_j \to y_i \in \vec E$ do
7:   a new hidden node of type II is detected which is a parent of $y_i$ and $y_j$
8:   remove $y_i \to y_j, y_j \to y_i$ from $\vec E$
9:   add a new node $y_h$ to $V$
10:  add $y_h \to y_j, y_h \to y_i$ to $\vec E$
11: end for

4 Conclusions and Discussion

We have provided an algorithm to reconstruct the pattern of a latent polytree graphical model. The algorithm only requires the second- and third-order statistics of the observed variables, and no prior information about the number and location of the hidden nodes is assumed. An important property of the proposed approach is that the algorithm is sound under specific degree conditions on the hidden variables. If such degree conditions are not met, it is shown in the Supplementary Material that there exists another latent polytree with fewer hidden nodes entailing the same independence relations. In this sense, the proposed algorithm always recovers a graphical model that is minimal in the number of hidden nodes, following a form of Occam's razor principle. Future work will study how this algorithm performs under a limited amount of data and how to deal with situations where the measurements are not exact.

Acknowledgments

This work has been partially supported by NSF (CNS CAREER #1553504).

References

[1] Judea Pearl. Causality: Models, Reasoning and Inference.
Cambridge University Press, 2nd edition, 2009.

[2] Thomas Richardson and Peter Spirtes. Ancestral graph Markov models. The Annals of Statistics, 30(4):962-1030, 2002.

[3] Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16):1873-1896, 2008.

[4] Guangdi Li, Anne-Mieke Vandamme, and Jan Ramon. Learning ancestral polytrees. In Workshop on Learning Tractable Probabilistic Models at the 31st International Conference on Machine Learning (ICML 2014), Beijing, China, 2014.

[5] Péter L. Erdős, Michael A. Steel, László A. Székely, and Tandy J. Warnow. A few logs suffice to build (almost) all trees: Part II. Theoretical Computer Science, 221(1-2):77-118, 1999.

[6] Animashree Anandkumar and Ragupathyraj Valluvan. Learning loopy graphical models with latent variables: Efficient methods and guarantees. The Annals of Statistics, 41(2):401-435, 2013.

[7] Elchanan Mossel. Distorted metrics on trees and phylogenetic forests. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(1):108-116, January 2007.

[8] Yacine Jernite, Yonatan Halpern, and David Sontag. Discovering hidden variables in noisy-or networks using quartet tests. In Advances in Neural Information Processing Systems 26, pages 2355-2363. Curran Associates, Inc., 2013.

[9] Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky. Learning latent tree graphical models. Journal of Machine Learning Research, 12:1771-1812, 2011.

[10] Raphaël Mourad, Christine Sinoquet, Nevin Lianwen Zhang, Tengfei Liu, and Philippe Leray. A survey on latent tree models and applications.
Journal of Artificial Intelligence Research, 47:157-203, 2013.

[11] Reinhard Diestel. Graph Theory. Springer, 5th edition, 2010.

[12] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.

[13] G. Rebane and J. Pearl. The recovery of causal polytrees from statistical data. In Proceedings of the Third Conference on Uncertainty in Artificial Intelligence, pages 222-228, 1987.

[14] T. Verma and J. Pearl. Causal networks: Semantics and expressiveness. In Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence, pages 352-359, 1988.

[15] P. Spirtes, C. N. Glymour, and R. Scheines. Causation, Prediction, and Search, volume 81. The MIT Press, 2000.

[16] F. Sepehr and D. Materassi. Learning networks of linear dynamic systems with tree topologies. IEEE Transactions on Automatic Control, 2019. doi: 10.1109/TAC.2019.2915153.

[17] D. Materassi. Reconstructing tree structures of dynamic systems with hidden nodes under nonlinear dynamics. In 2016 24th Mediterranean Conference on Control and Automation (MED), pages 1331-1336, June 2016.