Efficient Projection onto the Perfect Phylogeny Model

Advances in Neural Information Processing Systems (NeurIPS 2018), pages 4104–4114.

Bei Jia* (jiabe@bc.edu), Surjyendu Ray (raysc@bc.edu), Sam Safavi (safavisa@bc.edu), José Bento (jose.bento@bc.edu)
Boston College

Abstract

Several algorithms build on the perfect phylogeny model to infer evolutionary trees. This problem is particularly hard when evolutionary trees are inferred from the fraction of genomes that have mutations in different positions, across different samples. Existing algorithms might do extensive searches over the space of possible trees. At the center of these algorithms is a projection problem that assigns a fitness cost to phylogenetic trees. In order to perform a wide search over the space of the trees, it is critical to solve this projection problem fast. In this paper, we use Moreau's decomposition for proximal operators, and a tree reduction scheme, to develop a new algorithm to compute this projection. Our algorithm terminates with an exact solution in a finite number of steps, and is extremely fast. In particular, it can search over all evolutionary trees with fewer than 11 nodes, a size relevant for several biological problems (more than 2 billion trees), in about 2 hours.

1 Introduction

The perfect phylogeny model (PPM) [1, 2] is used in biology to study evolving populations. It assumes that the same position in the genome never mutates twice, hence mutations only accumulate. Consider a population of organisms evolving under the PPM. The evolution process can be described by a labeled rooted tree, $T = (r, V, E)$, where $r$ is the root, i.e., the common oldest ancestor, the nodes $V$ are the mutants, and the edges $E$ are mutations acquired between older and younger mutants. Since each position in the genome only mutates once, we can associate with each node $v \neq r$ a unique mutated position: the mutation associated to the ancestral edge of $v$. By convention, we associate with the root $r$ a null mutation that is shared by all mutants in $T$. This allows us to refer to each node $v \in V$ both as a mutation at a position in the genome (the mutation associated to the ancestral edge of $v$) and as a mutant (the mutant with the fewest mutations that has mutation $v$). Hence, without loss of generality, $V = \{1, \ldots, q\}$ and $E = \{2, \ldots, q\}$, where $q$ is the length of the genome, and $r = 1$ refers to both the oldest common ancestor and the null mutation shared by all.

One very important use of the PPM is to infer how mutants of a common ancestor evolve [3–8]. A common type of data used for this purpose is the frequency with which different positions in the genome mutate across multiple samples, obtained, e.g., from whole-genome or targeted deep sequencing [9].
Consider a sample $s$, one of $p$ samples, obtained at a given stage of the evolution process. This sample has many mutants, some with the same genome, some with different genomes. Let $F \in \mathbb{R}^{q \times p}$ be such that $F_{v,s}$ is the fraction of genomes in $s$ with a mutation in position $v$ of the genome. Let $M \in \mathbb{R}^{q \times p}$ be such that $M_{v,s}$ is the fraction of mutant $v$ in $s$. By definition, the columns of $M$ must sum to 1. Let $U \in \{0,1\}^{q \times q}$ be such that $U_{v,v'} = 1$ if and only if mutant $v$ is an ancestor of mutant $v'$, or if $v = v'$. We denote the set of all possible $U$ matrices, $M$ matrices and labeled rooted trees $T$ by $\mathcal{U}$, $\mathcal{M}$ and $\mathcal{T}$, respectively. See Figure 1 for an illustration. The PPM implies

$$F = UM. \quad (1)$$

Our work contributes to the problem of inferring clonal evolution from mutation frequencies: how do we infer $M$ and $U$ from $F$? Note that finding $U$ is the same as finding $T$ (see Lemma B.2).

*Bei Jia is currently with Element AI.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Black lines are genomes. Red circles indicate mutations. $g_i$ is the mutant with fewest mutations with position $i$ mutated. Mutation 1, the mutation in the null position, $i = 1$, is shared by all mutants. $g_1$ is the organism before mutant evolution starts. In sample $s = 3$, $2/10$ of the mutants are of type $g_2$, hence $M_{2,3} = 2/10$, and $3/10$ of the mutations occur in position 7, hence $F_{7,3} = 3/10$. The tree shows the mutants' evolution.

Although model (1) is simple, simultaneously inferring $M$ and $U$ from $F$ can be hard [3].
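As a quick numerical check of (1), the ancestry matrix $U$ of a small tree can be built from parent pointers and multiplied against a column of mutant fractions $M$; a minimal NumPy sketch (the 4-node tree and the fractions below are invented for illustration, and are not the tree of Figure 1):

```python
import numpy as np

# Hypothetical 4-node tree; node 1 is the root (null mutation shared by all),
# and parent[v] is the closest ancestor of v.
parent = {2: 1, 3: 1, 4: 3}
q = 4

# U[v, w] = 1 iff v is an ancestor of w, or v == w.
U = np.eye(q, dtype=int)
for v in parent:
    a = v
    while a in parent:       # walk up to the root, marking every ancestor
        a = parent[a]
        U[a - 1, v - 1] = 1

# A single sample (p = 1): fractions of each mutant; columns of M sum to 1.
M = np.array([[0.4], [0.2], [0.3], [0.1]])
F = U @ M                    # fraction of genomes mutated at each position

assert np.isclose(M.sum(axis=0), 1.0).all()
print(F.ravel())             # F[0] = 1.0: the null mutation appears in every genome
```

Note that $F_{1,s} = 1$ always holds under (1), since the null mutation of the root is shared by every mutant.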
One popular inference approach is the following optimization problem over $U$, $M$ and $F$:

$$\min_{U \in \mathcal{U}} C(U), \quad (2)$$

$$C(U) = \min_{M, F \in \mathbb{R}^{q \times p}} \|\hat{F} - F\| \ \text{ subject to } \ F = UM,\ M \ge 0,\ M^\top \mathbf{1} = \mathbf{1}, \quad (3)$$

where $\|\cdot\|$ is the Frobenius norm, and $\hat{F} \in \mathbb{R}^{q \times p}$ contains the measured fractions of mutations per position in each sample, which are known and fixed. In a nutshell, we want to project our measurement $\hat{F}$ onto the space of valid PPM models.

Problem (2) is a hard mixed integer-continuous optimization problem. To approximately solve it, we might find a finite subset $\{U_i\} \subset \mathcal{U}$ that corresponds to a "heuristically good" subset of trees, $\{T_i\} \subset \mathcal{T}$, and, for each fixed matrix $U_i$, solve (3), which is a convex optimization problem. We can then return $T_x$, where $x \in \arg\min_i C(U_i)$. Fortunately, in many biological applications, e.g., [3–8], the reconstructed evolutionary tree involves a very small number of mutated positions, e.g., $q \le 11$. In practice, a position $v$ might be an effective position that is a cluster of multiple real positions in the genome. For a small $q$, we can compute $C(U)$ for many trees, and hence approximate $M$, $U$, and get uncertainty measures for these estimates. This is important, since data is generally scarce and noisy.

Contributions: (i) we propose a new algorithm to compute $C(U)$ exactly in $O(q^2 p)$ steps, the first non-iterative algorithm to compute $C(U)$; (ii) we compare its performance against state-of-the-art iterative algorithms, and observe a much faster convergence.
In particular, our algorithm scales much faster than $O(q^2 p)$ in practice; (iii) we implement our algorithm on a GPU, and show that it computes the cost of all (more than 2 billion) trees with $\le 11$ nodes in $\le 2.5$ hours.

2 Related work

A problem related to ours, but somewhat different, is that of inferring a phylogenetic tree from single-cell whole-genome sequencing data. Given all the mutations in a set of mutants, the problem is to arrange the mutants in a phylogenetic tree [10, 11]. Mathematically, this corresponds to inferring $T$ from partial or corrupted observations of $U$. If the PPM is assumed, and all the mutations of all the mutants are correctly observed, this problem can be solved in linear time, e.g., [12]. In general, this problem is equivalent to finding a minimum-cost Steiner tree on a hypercube whose nodes and edges represent mutants and mutations, respectively, a problem known to be hard [13].

We mention a few works on clonality inference, based on the PPM, that try to infer both $U$ and $M$ from $\hat{F}$. No previous work solves problem (2) exactly in general, even for trees of size $q \le 11$. Using our fast projection algorithm, we can solve (2) exactly by searching over all trees, if $q \le 11$. Ref. [3] (AncesTree) reduces the space of possible trees $\mathcal{T}$ to subtrees of a heuristically constructed DAG. The authors use the element-wise 1-norm in (3) and, after introducing more variables to linearize the product $UM$, reduce this search to solving an MILP, which they try to solve via branch and bound. Ref. [6] (CITUP) searches the space of all unlabeled trees and, for each unlabeled tree, tries to solve an MIQP, again using branch-and-bound techniques, which finds a labeling for the unlabeled tree and simultaneously minimizes the distance $\|\hat{F} - F\|$. Refs. [5] and [14] (PhyloSub/PhyloWGS) use a stochastic model to sample trees that are likely to explain the data.
Their model is based on [15], which generates hierarchical clusterings of objects, and from which lineage trees can be formed. A score is then computed for these trees, and the highest-scoring trees are returned.

Procedure (2) can be justified as MLE if we assume the stochastic model $\hat{F} = F + N(0, I\sigma^2)$, where $F$, $U$ and $M$ satisfy the PPM model, and $N(0, I\sigma^2)$ represents additive, component-wise, Gaussian measurement noise with zero mean and covariance $I\sigma^2$. Alternative stochastic models can be assumed, e.g., $M - U^{-1}\hat{F} = N(0, I\sigma^2)$, where $M$ is non-negative and its columns must sum to one, and $N(0, I\sigma^2)$ is as described before. For this model, and for each matrix $U$, the cost $C(U)$ is a projection of $U^{-1}\hat{F}$ onto the probability simplex $M \ge 0$, $M^\top \mathbf{1} = \mathbf{1}$. Several fast algorithms are known for this problem, e.g., [16–20] and references therein. In a $pq$-dimensional space, the exact projection onto the simplex can be done in $O(qp)$ steps.

Our algorithm is the first to solve (3) exactly in a finite number of steps. We can also use iterative methods to solve (3). One advantage of our algorithm is that it has no tuning parameters, and requires no effort to check for convergence for a given accuracy.
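For intuition, the simplex projection mentioned above can be sketched with the classic sort-and-threshold method; this is the $O(q \log q)$ variant, not the linear-time algorithms of [16–20], and the input vector below is arbitrary:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto {x : x >= 0, sum(x) = 1}.

    Sort y, find the largest k for which the shifted top-k entries stay
    positive, then subtract that threshold and clip at zero.
    """
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, y.size + 1)
    k = np.nonzero(u * ks > css - 1.0)[0][-1]   # last index satisfying the condition
    tau = (css[k] - 1.0) / (k + 1.0)
    return np.maximum(y - tau, 0.0)

x = project_simplex(np.array([0.8, 1.2, -0.4]))
print(x, x.sum())    # components are nonnegative and sum to 1
```

A point already on the simplex is, of course, its own projection, which gives a cheap sanity check of the threshold computation.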
Since iterative algorithms can converge very fast, we numerically compare the speed of our algorithm with different implementations of the Alternating Direction Method of Multipliers (ADMM) [21], which, if properly tuned, has a convergence rate that equals the fastest convergence rate among all first-order methods [22] under some convexity assumptions, and is known to produce good solutions for several other kinds of problems, even non-convex ones [23–29].

3 Main results

We now state our main results and explain the ideas behind their proofs. Detailed proofs can be found in the Appendix.

Our algorithm computes $C(U)$ and the minimizers of (3), resp. $M^*$ and $F^*$, by solving an equivalent problem. Without loss of generality, we assume that $p = 1$, since, by squaring the objective in (3), it decomposes into $p$ independent problems. Sometimes we denote $C(U)$ by $C(T)$, since given $U$ we can specify $T$, and vice versa. Let $\bar{i}$ be the closest ancestor of $i$ in $T = (r, V, E)$. Let $\Delta_i$ be the set of all the ancestors of $i$ in $T$, plus $i$. Let $\partial i$ be the set of children of $i$ in $T$.

Theorem 3.1 (Equivalent formulation). Problem (3) can be solved by solving

$$\min_{t \in \mathbb{R}} \ t + L(t), \quad (4)$$

$$L(t) = \min_{Z \in \mathbb{R}^q} \ \frac{1}{2} \sum_{i \in V} (Z_i - Z_{\bar{i}})^2 \ \text{ subject to } \ Z_i \le t - N_i, \ \forall i \in V, \quad (5)$$

where $N_i = \sum_{j \in \Delta_i} \hat{F}_j$, and, by convention, $Z_{\bar{i}} = 0$ for $i = r$. In particular, if $t^*$ minimizes (4), $Z^*$ minimizes (5) for $t = t^*$, and $M^*$, $F^*$ minimize (3), then

$$M^*_i = -Z^*_i + Z^*_{\bar{i}} + \sum_{r \in \partial i} (Z^*_r - Z^*_{\bar{r}}) \ \text{ and } \ F^*_i = -Z^*_i + Z^*_{\bar{i}}, \ \forall i \in V. \quad (6)$$

Furthermore, $t^*$, $M^*$, $F^*$ and $Z^*$ are unique.

Theorem 3.1 comes from a dual form of (3), which we build using Moreau's decomposition [30].

3.1 Useful observations

Let $Z^*(t)$ be the unique minimizer of (5) for some $t$. The main ideas behind our algorithm depend on a few simple properties of the paths $\{Z^*(t)\}$ and $\{L'(t)\}$, the derivative of $L(t)$ with respect to $t$. Note that $L$ is also a function of $N$, as defined in Theorem 3.1, which depends on the input data $\hat{F}$.

Lemma 3.2. $L(t)$ is a convex function of $t$ and $N$. Furthermore, $L(t)$ is continuous in $t$ and $N$, and $L'(t)$ is non-decreasing in $t$.

Lemma 3.3. $Z^*(t)$ is continuous as a function of $t$ and $N$. $Z^*(t^*)$ is continuous as a function of $N$.

Let $B(t) = \{i : Z^*(t)_i = t - N_i\}$, i.e., the set of components of the solution at the boundary of (5). Variables in $B$ are called fixed, and we call the other variables free. Free (resp. fixed) nodes are the nodes corresponding to free (resp. fixed) variables.

Lemma 3.4. $B(t)$ is piecewise constant in $t$.

Consider dividing the tree $T = (r, V, E)$ into subtrees, each with at least one free node, using $B(t)$ as separation points. See Figure 4 in Appendix A for an illustration. Each $i \in B(t)$ belongs to at most $\mathrm{degree}(i)$ different subtrees, where $\mathrm{degree}(i)$ is the degree of node $i$, and each $i \in V \setminus B(t)$ belongs to exactly one subtree. Let $T_1, \ldots, T_k$ be the set of resulting (rooted, labeled) trees. Let $T_w = (r_w, V_w, E_w)$, where the root $r_w$ is the closest node in $T_w$ to $r$. We call $\{T_w\}$ the subtrees induced by $B(t)$. We define $B_w(t) = B(t) \cap V_w$ and, when it does not create ambiguity, we drop the index $t$ in $B_w(t)$. Note that different $B_w(t)$'s might have elements in common.
Also note that, by construction, if $i \in B_w$, then $i$ must be a leaf of $T_w$, or the root of $T_w$.

Definition 3.5. The $(T_w, B_w)$-problem is the optimization problem over the $|V_w \setminus B(t)|$ variables

$$\min_{\{Z_j : j \in V_w \setminus B(t)\}} \ \frac{1}{2} \sum_{j \in V_w} (Z_j - Z_{\bar{j}})^2, \quad (7)$$

where $\bar{j}$ is the parent of $j$ in $T_w$, $Z_{\bar{j}} = 0$ if $j = r_w$, and $Z_j = Z^*(t)_j = t - N_j$ if $j \in B_w(t)$.

Lemma 3.6. Problem (5) decomposes into $k$ independent problems. In particular, the minimizers $\{Z^*(t)_j : j \in V_w \setminus B(t)\}$ are determined as the solution of the $(T_w, B_w)$-problem. If $j \in V_w$, then $Z^*(t)_j = c_1 t + c_2$, where $c_1$ and $c_2$ depend on $j$ but not on $t$, and $0 \le c_1 \le 1$.

Lemma 3.7. $Z^*(t)$ and $L'(t)$ are piecewise linear and continuous in $t$. Furthermore, $Z^*(t)$ and $L'(t)$ change linear segments if and only if $B(t)$ changes.

Lemma 3.8. If $t \le t'$, then $B(t') \subseteq B(t)$. In particular, $B(t)$ changes at most $q$ times with $t$.

Lemma 3.9. $Z^*(t)$ and $L'(t)$ have fewer than $q + 1$ different linear segments.

3.2 The Algorithm

In a nutshell, our algorithm computes the solution path $\{Z^*(t)\}_{t \in \mathbb{R}}$ and the derivative $\{L'(t)\}_{t \in \mathbb{R}}$. From these paths, it finds the unique $t^*$, at which

$$\frac{d}{dt}\big(t + L(t)\big)\Big|_{t = t^*} = 0 \ \Leftrightarrow \ L'(t^*) = -1. \quad (8)$$

It then evaluates the path $Z^*(t)$ at $t = t^*$, and uses this value, along with (6), to find $M^*$ and $F^*$, the unique minimizers of (3). Finally, we compute $C(T) = \|\hat{F} - F^*\|$.

We know that $\{Z^*(t)\}$ and $\{L'(t)\}$ are continuous piecewise linear, with a finite number of different linear segments (Lemmas 3.7, 3.8 and 3.9). Hence, to describe $\{Z^*(t)\}$ and $\{L'(t)\}$, we only need to evaluate them at the critical values, $t_1 > t_2 > \cdots > t_k$, at which $Z^*(t)$ and $L'(t)$ change linear segments.
We will later use Lemma 3.7 as a criterion to find the critical values. Namely, $\{t_i\}$ are the values of $t$ at which, as $t$ decreases, new variables become fixed, and $B(t)$ changes. Note that variables never become free once fixed, by Lemma 3.8, which also implies that $k \le q$.

The values $\{Z^*(t_i)\}$ and $\{L'(t_i)\}$ are computed sequentially as follows. If $t$ is very large, the constraint in (5) is not active, and $Z^*(t) = L(t) = L'(t) = 0$. Lemma 3.7 tells us that, as we decrease $t$, the first critical value is the largest $t$ for which this constraint becomes active, and at which $B(t)$ changes for the first time. Hence, for $i = 1$, we have $t_i = \max_s \{N_s\}$, $Z^*(t_i) = L'(t_i) = 0$, and $B(t_i) = \arg\max_s \{N_s\}$. Once we have $t_i$, we compute the rates $Z'^*(t_i)$ and $L''(t_i)$ from $B(t_i)$ and $T$, as explained in Section 3.3. Since the paths are piecewise linear, derivatives are not defined at critical points. Hence, here, and throughout this section, these derivatives are taken from the left, i.e., $Z'^*(t_i) = \lim_{t \uparrow t_i} (Z^*(t_i) - Z^*(t))/(t_i - t)$ and $L''(t_i) = \lim_{t \uparrow t_i} (L'(t_i) - L'(t))/(t_i - t)$.

Since $Z'^*(t)$ and $L''(t)$ are constant for $t \in (t_{i+1}, t_i]$, for $t \in (t_{i+1}, t_i]$ we have

$$Z^*(t) = Z^*(t_i) + (t - t_i)\, Z'^*(t_i), \qquad L'(t) = L'(t_i) + (t - t_i)\, L''(t_i), \quad (9)$$

and the next critical value, $t_{i+1}$, is the largest $t < t_i$ for which new variables become fixed, and $B(t)$ changes. The value $t_{i+1}$ is found by solving for $t < t_i$ in

$$Z^*(t)_r = Z^*(t_i)_r + (t - t_i)\, Z'^*(t_i)_r = t - N_r, \quad (10)$$

and keeping the largest solution among all $r \notin B$. Once $t_{i+1}$ is computed, we update $B$ with the new variables that became fixed, and we obtain $Z^*(t_{i+1})$ and $L'(t_{i+1})$ from (9).
The process then repeats. By Lemma 3.2, $L'$ never increases. Hence, we stop this process (a) as soon as $L'(t_i) < -1$, or (b) when all the variables are in $B$, and thus there are no more critical values to compute. In case (a), let $t_k$ be the last critical value with $L'(t_k) > -1$, and in case (b), let $t_k$ be the last computed critical value. We use $t_k$ and (9) to compute $t^*$, at which $L'(t^*) = -1$, and also $Z^*(t^*)$. From $Z^*(t^*)$ we then compute $M^*$, $F^*$ and $C(U) = \|\hat{F} - F^*\|$.

The algorithm is shown compactly in Alg. 1. Its inputs are $\hat{F}$ and $T$, represented, e.g., using a linked-nodes data structure. Its outputs are minimizers of (3). It makes use of a procedure ComputeRates, which we will explain later. This procedure terminates in $O(q)$ steps and uses $O(q)$ memory. Line 5 comes from solving (10) for $t$. In line 14, the symbols $M^*(Z^*, T)$ and $F^*(Z^*, T)$ remind us that $M^*$ and $F^*$ are computed from $Z^*$ and $T$ using (6). The correctness of Alg. 1 follows from the Lemmas in Section 3.1 and the explanation above. In particular, since there are at most $q + 1$ different linear regimes, the bound $q$ in the for-loop does not prevent us from finding any critical value. Its time complexity is $O(q^2)$, since each line completes in $O(q)$ steps and is executed at most $q$ times.

Theorem 3.10 (Complexity). Algorithm 1 finishes in $O(q^2)$ steps, and requires $O(q)$ memory.

Theorem 3.11 (Correctness). Algorithm 1 outputs the solution to (3).

Algorithm 1 Projection onto the PPM (input: $T$ and $\hat{F}$; output: $M^*$ and $F^*$)
1: $N_i = \sum_{j \in \Delta_i} \hat{F}_j$ for all $i \in V$  ▷ This takes $O(q)$ steps using a DFS, see proof of Theorem 3.10
2: $i = 1$, $t_i = \max_r \{N_r\}$, $B(t_i) = \arg\max_r \{N_r\}$, $Z^*(t_i) = 0$, $L'(t_i) = 0$  ▷ Initialize
3: for $i = 1$ to $q$ do
4:  $(Z'^*(t_i), L''(t_i)) = \text{ComputeRates}(B(t_i), T)$  ▷ Update rates of change
5:  $P = \{P_r\}$, where $P_r = \dfrac{N_r + Z^*(t_i)_r - t_i Z'^*(t_i)_r}{1 - Z'^*(t_i)_r}$ if $r \notin B(t_i)$ and $P_r < t_i$, and $P_r = -\infty$ otherwise
6:  $t_{i+1} = \max_r P_r$  ▷ Update next critical value from (9)
7:  $B(t_{i+1}) = B(t_i) \cup \arg\max_r P_r$  ▷ Update list of fixed variables
8:  $Z^*(t_{i+1}) = Z^*(t_i) + (t_{i+1} - t_i)\, Z'^*(t_i)$  ▷ Update solution path
9:  $L'(t_{i+1}) = L'(t_i) + (t_{i+1} - t_i)\, L''(t_i)$  ▷ Update objective's derivative
10:  if $L'(t_{i+1}) < -1$ then break  ▷ If already past $t^*$, exit the for-loop
11: end for
12: $t^* = t_i - (1 + L'(t_i))/L''(t_i)$  ▷ Find solution to (8)
13: $Z^* = Z^*(t_i) + (t^* - t_i)\, Z'^*(t_i)$  ▷ Find minimizers of (5) for $t = t^*$
14: return $M^*(Z^*, T)$, $F^*(Z^*, T)$  ▷ Return solution to (3) using (6), which takes $O(q)$ steps

3.3 Computing the rates

We now explain how the procedure ComputeRates works. Recall that it takes as input the tree $T$ and the set $B(t_i)$, and it outputs the derivatives $Z'^*(t_i)$ and $L''(t_i)$.

A simple calculation shows that if we compute $Z'^*(t_i)$, then computing $L''(t_i)$ is easy.

Lemma 3.12. $L''(t_i)$ can be computed from $Z'^*(t_i)$ in $O(q)$ steps and with $O(1)$ memory as

$$L''(t_i) = \sum_{j \in V} \big(Z'^*(t_i)_j - Z'^*(t_i)_{\bar{j}}\big)^2, \quad (11)$$

where $\bar{j}$ is the closest ancestor of $j$ in $T$.

We note that if $j \in B(t_i)$, then, by definition, $Z'^*(t_i)_j = 1$. Assume now that $j \in V \setminus B(t_i)$. Lemma 3.6 implies we can find $Z'^*(t_i)_j$ by solving the $(T_w = (r_w, V_w, E_w), B_w)$-problem as a function of $t$, where $w$ is such that $j \in V_w$. In a nutshell, ComputeRates is a recursive procedure to solve all the $(T_w, B_w)$-problems as an explicit function of $t$. It suffices to explain how ComputeRates solves one particular $(T_w, B_w)$-problem explicitly. To simplify notation, in the rest of this section, we refer to $T_w$ and $B_w$ as $T$ and $B$. Recall that, by the definition of $T = T_w$ and $B = B_w$, if $i \in B$, then $i$ must be a leaf of $T$, or the root of $T$.

Definition 3.13. Consider a rooted tree $T = (r, V, E)$, a set $B \subseteq V$, and variables $\{Z_j : j \in V\}$ such that, if $j \in B$, then $Z_j = \alpha_j t + \beta_j$ for some $\alpha$ and $\beta$. We define the $(T, B, \alpha, \beta, \gamma)$-problem as

$$\min_{\{Z_j : j \in V \setminus B\}} \ \frac{1}{2} \sum_{j \in V} \gamma_j (Z_j - Z_{\bar{j}})^2, \quad (12)$$

where $\gamma > 0$, $\bar{j}$ is the closest ancestor of $j$ in $T$, and $Z_{\bar{j}} = 0$ if $j = r$.

We refer to the solution of the $(T, B, \alpha, \beta, \gamma)$-problem as $\{Z^*_j : j \in V \setminus B\}$, which uniquely minimizes (12).
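To make Definition 3.13 concrete, consider a hypothetical chain tree $1 \to 2 \to 3$ with unit weights $\gamma = 1$, $B = \{1, 3\}$, and fixed boundary values $Z_1 = t - N_1$, $Z_3 = t - N_3$ (the values of $N$ below are made up). The single free variable $Z_2$ has a closed-form minimizer, affine in $t$; a small sketch:

```python
import numpy as np

# Chain tree 1 -> 2 -> 3, gamma = 1, B = {1, 3}; only Z2 is free.
N1, N3 = 0.5, 2.0

def objective(z2, t):
    z1, z3 = t - N1, t - N3                  # fixed (boundary) variables
    # (1/2) * sum over edges of (Z_j - Z_jbar)^2, with Z_rbar = 0 at the root
    return 0.5 * (z1 ** 2 + (z2 - z1) ** 2 + (z3 - z2) ** 2)

def z2_star(t):
    # Stationarity in z2: (z2 - z1) - (z3 - z2) = 0  =>  z2 = (z1 + z3) / 2
    return ((t - N1) + (t - N3)) / 2.0

# The minimizer is affine in t; here its slope is 1, within [0, 1] as Lemma 3.6 states.
ts = np.array([0.0, 1.0, 2.0])
zs = np.array([z2_star(t) for t in ts])
print(zs, zs[1] - zs[0])                     # slope between consecutive integer t's
```

The same stationarity computation is what Lemma 3.15 below packages for a general star, and what the recursion reduces every subtree to.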
Note that (12) is unconstrained and its solution, $Z^*$, is a linear function of $t$. Furthermore, the $(T_w, B_w)$-problem is the same as the $(T_w, B_w, 1, -N, 1)$-problem, which is what we actually solve. We now state three useful lemmas that help us solve any $(T, B, \alpha, \beta, \gamma)$-problem efficiently.

Lemma 3.14 (Pruning). Consider the solution $Z^*$ of the $(T, B, \alpha, \beta, \gamma)$-problem. Let $j \in V \setminus B$ be a leaf. Then $Z^*_j = Z^*_{\bar{j}}$. Furthermore, consider the $(\tilde{T}, B, \alpha, \beta, \gamma)$-problem, where $\tilde{T} = (\tilde{r}, \tilde{V}, \tilde{E})$ is equal to $T$ with node $j$ pruned, and let its solution be $\tilde{Z}^*$. We have that $Z^*_i = \tilde{Z}^*_i$ for all $i \in \tilde{V}$.

Lemma 3.15 (Star problem). Let $T$ be a star such that node 1 is the center node, node 2 is the root, and nodes $3, \ldots, r$ are leaves. Let $B = \{2, \ldots, r\}$. Let $Z^*_1 \in \mathbb{R}$ be the solution of the $(T, B, \alpha, \beta, \gamma)$-problem. Then,

$$Z^*_1 = \left( \frac{\gamma_1 \alpha_2 + \sum_{i=3}^{r} \gamma_i \alpha_i}{\gamma_1 + \sum_{i=3}^{r} \gamma_i} \right) t + \left( \frac{\gamma_1 \beta_2 + \sum_{i=3}^{r} \gamma_i \beta_i}{\gamma_1 + \sum_{i=3}^{r} \gamma_i} \right). \quad (13)$$

In particular, to find the rate at which $Z^*_1$ changes with $t$, we only need to know $\alpha$ and $\gamma$, not $\beta$.

Lemma 3.16 (Reduction). Consider the $(T, B, \alpha, \beta, \gamma)$-problem such that $j, \bar{j} \in V \setminus B$, and such that $j$ has all its children $1, \ldots, r \in B$. Let $Z^*$ be its solution. Consider the $(\tilde{T}, \tilde{B}, \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma})$-problem, where $\tilde{T} = (\tilde{r}, \tilde{V}, \tilde{E})$ is equal to $T$ with nodes $1, \ldots, r$ removed, and $\tilde{B} = (B \setminus \{1, \ldots, r\}) \cup \{j\}$. Let $\tilde{Z}^*$ be its solution. If $(\tilde{\alpha}_i, \tilde{\beta}_i, \tilde{\gamma}_i) = (\alpha_i, \beta_i, \gamma_i)$ for all $i \in B \setminus \{1, \ldots, r\}$, and $\tilde{\alpha}_j$, $\tilde{\beta}_j$ and $\tilde{\gamma}_j$ satisfy

$$\tilde{\alpha}_j = \frac{\sum_{i=1}^{r} \gamma_i \alpha_i}{\sum_{i=1}^{r} \gamma_i}, \qquad \tilde{\beta}_j = \frac{\sum_{i=1}^{r} \gamma_i \beta_i}{\sum_{i=1}^{r} \gamma_i}, \qquad \tilde{\gamma}_j = \left( (\gamma_j)^{-1} + \Big( \sum_{i=1}^{r} \gamma_i \Big)^{-1} \right)^{-1}, \quad (14)$$

then $Z^*_i = \tilde{Z}^*_i$ for all $i \in V \setminus \{j\}$.

Lemma 3.15 and Lemma 3.16 allow us to recursively solve any $(T, B, \alpha, \beta, \gamma)$-problem, and obtain for it an explicit solution of the form $Z^*(t) = c_1 t + c_2$, where $c_1$ and $c_2$ do not depend on $t$.

Assume that we have already repeatedly pruned $T$, by repeatedly invoking Lemma 3.14, so that if $i$ is a leaf, then $i \in B$. See Figure 2-(left). First, we find some node $j \in V \setminus B$ such that all of its children are in $B$. If $\bar{j} \in B$, then $\bar{j}$ must be the root, and the $(T, B, \alpha, \beta, \gamma)$-problem must be a star problem as in Lemma 3.15, which we can use to solve it explicitly. Alternatively, if $\bar{j} \in V \setminus B$, then we invoke Lemma 3.16, and reduce the $(T, B, \alpha, \beta, \gamma)$-problem to a strictly smaller $(\tilde{T}, \tilde{B}, \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma})$-problem, which we solve recursively. Once the $(\tilde{T}, \tilde{B}, \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma})$-problem is solved, we have an explicit expression $Z^*_i(t) = c_{1i} t + c_{2i}$ for all $i \in V \setminus \{j\}$, and, in particular, an explicit expression $Z^*_{\bar{j}}(t) = c_{1\bar{j}} t + c_{2\bar{j}}$. The only free variable of the $(T, B, \alpha, \beta, \gamma)$-problem left to be determined is $Z^*_j(t)$. To compute $Z^*_j(t)$, we apply Lemma 3.15 to the $(\overset{\approx}{T}, \overset{\approx}{B}, \overset{\approx}{\alpha}, \overset{\approx}{\beta}, \overset{\approx}{\gamma})$-problem, where $\overset{\approx}{T}$ is a star around $j$, $\overset{\approx}{B}$ are all the neighbors of $j$, $\overset{\approx}{\gamma}$ are the components of $\gamma$ corresponding to the neighbors of $j$, and $\overset{\approx}{\alpha}$ and $\overset{\approx}{\beta}$ are such that $Z^*_i(t) = \overset{\approx}{\alpha}_i t + \overset{\approx}{\beta}_i$ for all $i$ that are neighbors of $j$, for which $Z^*_i(t)$ is already known. See Figure 2-(right).

The algorithm is compactly described in Alg. 2. It is slightly different from the description above, for computational efficiency. Instead of computing $Z^*(t) = c_1 t + c_2$, we keep track only of $c_1$, the rates, and we do so only for the variables in $V \setminus B$. The algorithm assumes that the input $T$ has been pruned. The inputs $T$, $B$, $\alpha$, $\beta$ and $\gamma$ are passed by reference. They are modified inside the algorithm but, once ComputeRatesRec finishes, they keep their initial values. Throughout the execution of the algorithm, $T = (r, V, E)$ encodes (a) a doubly-linked list where each node points to its children and its parent, which we call $T.a$, and (b) a doubly-linked list of all the nodes in $V \setminus B$ for which all the children are in $B$, which we call $T.b$. In the proof of Theorem 3.17, we show how this representation of $T$ can be kept updated with little computational effort. The input $Y$, also passed by reference, starts as an uninitialized array of size $q$, where we will store the rates $\{Z'^*_i\}$.
At the end, we read $Z'^*$ from $Y$.

Algorithm 2 ComputeRatesRec (input: $T = (r, V, E)$, $B$, $\alpha$, $\beta$, $\gamma$, $Y$)
1: Let $j$ be some node in $V \setminus B$ whose children are all in $B$  ▷ We read $j$ from $T.b$ in $O(1)$ steps
2: if $\bar{j} \in B$ then
3:  Set $Y_j$ using (13) in Lemma 3.15  ▷ If $\bar{j} \in B$, then the $(T, B, \alpha, \beta, \gamma)$-problem is star-shaped
4: else
5:  Modify $(T, B, \alpha, \beta, \gamma)$ to match $(\tilde{T}, \tilde{B}, \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma})$ defined by Lemma 3.16 for the $j$ in line 1
6:  ComputeRatesRec($T$, $B$, $\alpha$, $\beta$, $\gamma$, $Y$)  ▷ Sets $Y_i = Z'^*_i$ for all $i \in V \setminus B$; $Y_j$ is not yet defined
7:  Restore $(T, B, \alpha, \beta, \gamma)$ to its original value before line 5 was executed
8:  Compute $Y_j$ from (13), using for $\alpha, \beta, \gamma$ in (13) the values $\overset{\approx}{\alpha}, \overset{\approx}{\beta}, \overset{\approx}{\gamma}$, where $\overset{\approx}{\gamma}$ are the components of $\gamma$ corresponding to the neighbors of $j$ in $T$, and $\overset{\approx}{\alpha}, \overset{\approx}{\beta}$ are such that $Z^*_i = \overset{\approx}{\alpha}_i t + \overset{\approx}{\beta}_i$ for all $i$ that are neighbors of $j$ in $T$, and for which $Z^*_i$ is already known
9: end if

Let $q$ be the number of nodes of the tree $T$ that is the input at the zeroth level of the recursion.

Theorem 3.17. Algorithm 2 correctly computes $Z'^*$ for the $(T, B, \alpha, \beta, \gamma)$-problem, and it can be implemented to finish in $O(q)$ steps, and to use $O(q)$ memory.

The correctness of Algorithm 2 follows from Lemmas 3.14–3.16, and the explanation above. Its complexity is bounded by the total time spent on the two lines that actually compute rates during the whole recursion, lines 3 and 8. All the other lines only transform the input problem into a more computable form. Lines 3 and 8 solve a star-shaped problem with at most $\mathrm{degree}(j)$ variables, which, by inspecting (13), we know can be done in $O(\mathrm{degree}(j))$ steps. Since $j$ never takes the same value twice, the overall complexity is bounded by $O(\sum_{j \in V} \mathrm{degree}(j)) = O(|E|) = O(q)$. The $O(q)$ bound on memory is possible because all the variables that occupy significant memory are passed by reference, and are modified in place during the whole recursive procedure.

Figure 2: Red squares represent fixed nodes, and black circles free nodes. (Left) By repeatedly invoking Lemma 3.14, we can remove nodes 2, 3, and 4 from the original problem, since their associated optimal values are equal to the optimal value for node 1. (Right) We can compute the rates for all the free nodes of a subtree recursively by applying Lemma 3.16 and Lemma 3.15. We know the linear behavior of the variables associated to red squares.

The following lemma shows how the recursive procedure that solves a $(T, B, \alpha, \beta, \gamma)$-problem can be used to compute the rates of change of $Z^*(t)$ of a $(T, B)$-problem. Its proof follows from the observation that the rate of change of the solution with $t$ in (13) in Lemma 3.15 only depends on $\alpha$ and $\gamma$, and that the reduction equations (14) in Lemma 3.16 never make $\tilde{\alpha}$ or $\tilde{\gamma}$ depend on $\beta$.

Lemma 3.18 (Rates only). Let $Z^*(t)$ be the solution of the $(T, B)$-problem, and let $\tilde{Z}^*(t)$ be the solution of the $(T, B, 1, 0, 1)$-problem. Then, $Z^*(t) = c_1 t + c_2$ and $\tilde{Z}^*(t) = c_1 t$, for some $c_1$ and $c_2$.

We finally present the full algorithm to compute $Z'^*(t_i)$ and $L''(t_i)$ from $T$ and $B(t_i)$.

Algorithm 3 ComputeRates (input: $T$ and $B(t_i)$; output: $Z'^*(t_i)$ and $L''(t_i)$)
1: $Z'^*(t_i)_j = 1$ for all $j \in B(t_i)$
2: for each $(T_w, B_w)$-problem induced by $B(t_i)$ do
3:  Set $\tilde{T}_w$ to be $T_w$ pruned of all leaf nodes in $B_w$, by repeatedly invoking Lemma 3.14
4:  ComputeRatesRec($\tilde{T}_w$, $B_w$, $1$, $0$, $1$, $\tilde{Z}'^*$)
5:  $Z'^*(t_i)_j = \tilde{Z}'^*_j$ for all $j \in V_w \setminus B$
6: end for
7: Compute $L''(t_i)$ from $Z'^*(t_i)$ using Lemma 3.12
8: return $Z'^*(t_i)$ and $L''(t_i)$

The following theorem follows almost directly from Theorem 3.17.

Theorem 3.19. Alg. 3 correctly computes $Z'^*(t_i)$ and $L''(t_i)$ in $O(q)$ steps, and uses $O(q)$ memory.

4 Reducing computation time in practice

Our numerical results are obtained for an improved version of Algorithm 1. We now explain the main idea behind this algorithm.

The bulk of the complexity of Alg. 1 comes from line 4, i.e., computing the rates $\{Z'^*(t_i)_j\}_{j \in V \setminus B(t_i)}$ from $B(t_i)$ and $T$. For a fixed $j \in V \setminus B(t_i)$, and by Lemma 3.6, the rate $Z'^*(t_i)_j$ depends only on one particular $(T_w = (r_w, V_w, E_w), B_w)$-problem induced by $B(t_i)$.
If exactly this same problem is induced by both B(ti) and B(ti+1), which happens if the new nodes that become fixed in line 7 of round i of Algorithm 1 are not in Vw\Bw, then we can save computation time in round i + 1 by not recomputing any rates for j ∈ Vw\Bw, and by using for Z′∗(ti+1)j the value Z′∗(ti)j.
Furthermore, if only a few {Z′∗j} change from round i to round i + 1, then we can also save computation time in computing L′′ from Z′∗, by subtracting from the sum on the right-hand side of equation (11) the terms that depend on the previous, now changed, rates, and adding new terms that depend on the new rates.
Finally, if the rate Z′∗j does not change, then the value of t < ti at which Z∗j(t) might intersect t − Nj, and become fixed, given by Pj in line 5, also does not change. (Note that this is not obvious from the formula for Pr in line 5.) If not all {Pr} change from round i to round i + 1, we can also save computation time in computing the maximum, and the maximizers, in line 7 by storing P in a maximum binary heap, and executing lines 5 and 7 by extracting all the maximal values from the top of the heap. Each time any Pr changes, the heap needs to be updated.

5 Numerical results

Our algorithm to solve (3) exactly in a finite number of steps is of interest in itself. Still, it is interesting to compare it with other algorithms.
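The maximum binary heap bookkeeping described in Section 4 can be realized with lazy deletion: when some Pr changes, the stale heap entry is left in place and simply skipped when it surfaces at the top. The sketch below is illustrative only (the class and names are ours, not the paper's C implementation; Python's heapq is a min-heap, so priorities are negated):

```python
import heapq

class LazyMaxHeap:
    """Max-heap with lazy deletion: pushing a new priority for a key
    leaves the old entry in the heap; stale entries are skipped on pop."""
    def __init__(self):
        self._heap = []      # entries (-priority, key); heapq is a min-heap
        self._current = {}   # key -> its current (valid) priority

    def push(self, key, priority):
        # Updating a key's priority just pushes a fresh entry;
        # the old entry becomes stale and is filtered out in pop_max.
        self._current[key] = priority
        heapq.heappush(self._heap, (-priority, key))

    def pop_max(self):
        while self._heap:
            neg_p, key = heapq.heappop(self._heap)
            if self._current.get(key) == -neg_p:  # entry still valid?
                del self._current[key]
                return key, -neg_p
        raise IndexError("pop from empty heap")

h = LazyMaxHeap()
h.push("a", 1.0)
h.push("b", 3.0)
h.push("b", 0.5)   # P_b changed; the (3.0, "b") entry is now stale
assert h.pop_max() == ("a", 1.0)
assert h.pop_max() == ("b", 0.5)
```

Each update costs O(log q), and only the entries that actually change are touched, which is the amortized saving described above.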
In particular, we compare the convergence rate of our algorithm with two popular methods that solve (3) iteratively: the Alternating Direction Method of Multipliers (ADMM), and the Projected Gradient Descent (PGD) method. We apply the ADMM and the PGD to both the primal formulation (3) and the dual formulation (4). We implemented all the algorithms in C, and derived closed-form updates for ADMM and PGD; see Appendix F. We ran all algorithms on a single core of an Intel Core i5 2.5 GHz processor.
Figure 5-(left) compares the different algorithms for a random Galton–Watson input tree truncated to have q = 1000 nodes, with the number of children of each node chosen uniformly within a fixed range, and for a random input F̂ ∈ Rq, with entries chosen i.i.d. from a normal distribution. We observed the same behavior for all random instances tested. We gave ADMM and PGD an advantage by optimally tuning them for each individual problem instance tested. In contrast, our algorithm requires no tuning, which is a clear advantage. At each iteration, the error is measured as maxj{|Mj − M∗j|}. Our algorithm is about 74× faster than its closest competitor (PGD-primal) for 10⁻³ accuracy. In Figure 5-(right), we show the average run time of our algorithm versus the problem size, for random inputs of the same form. The scaling of our algorithm is (almost) linear, and much faster than our O(q²p), p = 1, theoretical bound.

Figure 5: (Left) Time that the different algorithms take to solve our problem for trees with 1000 nodes. (Right) Average run time of our algorithm for problems of different sizes.
For each size, each point is averaged over 500 random problem instances.
Finally, we use our algorithm to exactly solve (2) by computing C(U) for all trees and a given input F̂. Exactly solving (2) is very important for biology, since several relevant phylogenetic tree inference problems deal with trees of small sizes. We use an NVIDIA Quadro P5000 GPU to compute the cost of all possible trees with q nodes in parallel, and return the tree with the smallest cost. Basically, we assign to each GPU virtual thread a unique tree, using Prüfer sequences [31], and then have each thread compute the cost for its tree. For q = 10, we compute the cost of all 100 million trees in about 8 minutes, and for q = 11, we compute the cost of all 2.5 billion trees in slightly less than 2.5 hours.
Code to solve (3) using Alg. 1, with the improvements of Section 4, can be found in [32]. More results using our algorithm can be found in Appendix G.

6 Conclusions and future work

We propose a new direct algorithm that, for a given tree, computes how close the matrix of frequencies of mutations per position is to satisfying the perfect phylogeny model. Our algorithm is faster than the state-of-the-art iterative methods for the same problem, even if we optimally tune them. We use the proposed algorithm to build a GPU-based phylogenetic tree inference engine for trees of biologically relevant sizes. Unlike existing algorithms, which only heuristically search a small part of the space of possible trees, our algorithm performs a complete search over all trees relatively fast.
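The Prüfer-sequence encoding used above to assign a unique tree to each GPU thread rests on the classical bijection between sequences of length q − 2 over {1,…,q} and labeled trees on q nodes (Cayley's formula: q^(q−2) trees, i.e., 10⁸ = 100 million for q = 10). A minimal sketch of the standard decoder, with illustrative names (the GPU kernel itself is not shown):

```python
import heapq

def prufer_to_tree(seq, n):
    """Decode a Prüfer sequence (length n - 2, entries in 0..n-1) into
    the edge list of the unique labeled tree on n nodes it encodes."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        u = heapq.heappop(leaves)  # smallest current leaf
        edges.append((u, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # The two remaining leaves form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

# By Cayley's formula there are n**(n - 2) labeled trees on n nodes:
# for q = 10 nodes that is 10**8 = 100 million trees.
assert len(prufer_to_tree([3, 3, 3], 5)) == 4  # star centered at node 3
```

In the GPU setting, each thread can decode its own index, interpreted in base q as a Prüfer sequence, into a tree and evaluate that tree's cost independently.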
It is an open problem to find direct algorithms that can provably solve our problem in linear time on average, or even for a worst-case input.
Acknowledgement: This work was partially funded by NIH/1U01AI124302, NSF/IIS-1741129, and an NVIDIA hardware grant.

References

[1] Richard R Hudson. Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology, 23(2):183–201, 1983.

[2] Motoo Kimura. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61(4):893–903, 1969.

[3] Mohammed El-Kebir, Layla Oesper, Hannah Acheson-Field, and Benjamin J Raphael. Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics, 31(12):i62–i70, 2015.

[4] Mohammed El-Kebir, Gryte Satas, Layla Oesper, and Benjamin J Raphael. Multi-state perfect phylogeny mixture deconvolution and applications to cancer sequencing. arXiv preprint arXiv:1604.02605, 2016.

[5] Wei Jiao, Shankar Vembu, Amit G Deshwar, Lincoln Stein, and Quaid Morris. Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics, 15(1):35, 2014.

[6] Salem Malikic, Andrew W McPherson, Nilgun Donmez, and Cenk S Sahinalp. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics, 31(9):1349–1356, 2015.

[7] Victoria Popic, Raheleh Salari, Iman Hajirasouliha, Dorna Kashef-Haghighi, Robert B West, and Serafim Batzoglou. Fast and scalable inference of multi-sample cancer lineages. Genome Biology, 16(1):91, 2015.

[8] Gryte Satas and Benjamin J Raphael.
Tumor phylogeny inference using tree-constrained importance sampling. Bioinformatics, 33(14):i152–i160, 2017.

[9] Anna Schuh, Jennifer Becq, Sean Humphray, Adrian Alexa, Adam Burns, Ruth Clifford, Stephan M Feller, Russell Grocock, Shirley Henderson, Irina Khrebtukova, et al. Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood, 120(20):4191–4196, 2012.

[10] David Fernández-Baca. The perfect phylogeny problem. In Steiner Trees in Industry, pages 203–234. Springer, 2001.

[11] Dan Gusfield. Efficient algorithms for inferring evolutionary trees. Networks, 21(1):19–28, 1991.

[12] Zhihong Ding, Vladimir Filkov, and Dan Gusfield. A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem. Journal of Computational Biology, 13(2):522–553, 2006.

[13] Michael R Garey and David S Johnson. Computers and Intractability, volume 29. W. H. Freeman, New York, 2002.

[14] Amit G Deshwar, Shankar Vembu, Christina K Yung, Gun Ho Jang, Lincoln Stein, and Quaid Morris. PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biology, 16(1):35, 2015.

[15] Zoubin Ghahramani, Michael I Jordan, and Ryan P Adams. Tree-structured stick breaking for hierarchical data. In Advances in Neural Information Processing Systems, pages 19–27, 2010.

[16] Laurent Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 158(1-2):575–585, 2016.

[17] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279. ACM, 2008.

[18] Pinghua Gong, Kun Gai, and Changshui Zhang.
Efficient Euclidean projections via piecewise root finding and its application in gradient projection. Neurocomputing, 74(17):2754–2766, 2011.

[19] Jun Liu and Jieping Ye. Efficient Euclidean projections in linear time. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 657–664. ACM, 2009.

[20] Christian Michelot. A finite algorithm for finding the projection of a point onto the canonical simplex of Rn. Journal of Optimization Theory and Applications, 50(1):195–200, 1986.

[21] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[22] Guilherme França and José Bento. An explicit rate bound for over-relaxed ADMM. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 2104–2108. IEEE, 2016.

[23] Ning Hao, AmirReza Oghbaee, Mohammad Rostami, Nate Derbinsky, and José Bento. Testing fine-grained parallelism for the ADMM on a factor-graph. In Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International, pages 835–844. IEEE, 2016.

[24] Guilherme França and José Bento. How is distributed ADMM affected by network topology? arXiv preprint arXiv:1710.00889, 2017.

[25] Laurence Yang, José Bento, Jean-Christophe Lachance, and Bernhard Palsson. Genome-scale estimation of cellular objectives. arXiv preprint arXiv:1807.04245, 2018.

[26] Charles JM Mathy, Felix Gonda, Dan Schmidt, Nate Derbinsky, Alexander A Alemi, José Bento, Francesco M Delle Fave, and Jonathan S Yedidia. Sparta: Fast global planning of collision-avoiding robot trajectories.

[27] Daniel Zoran, Dilip Krishnan, Jose Bento, and Bill Freeman.
Shape and illumination from shading using the generic viewpoint assumption. In Advances in Neural Information Processing Systems, pages 226–234, 2014.

[28] José Bento, Nate Derbinsky, Charles Mathy, and Jonathan S Yedidia. Proximal operators for multi-agent path planning. In AAAI, pages 3657–3663, 2015.

[29] José Bento, Nate Derbinsky, Javier Alonso-Mora, and Jonathan S Yedidia. A message-passing algorithm for multi-agent trajectory planning. In Advances in Neural Information Processing Systems, pages 521–529, 2013.

[30] Jean-Jacques Moreau. Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires. C. R. Acad. Sci. Paris, 255:238–240, 1962.

[31] H Prüfer. Neuer Beweis eines Satzes über Permutationen. Archiv der Mathematik und Physik, 27:742–744, 1918.

[32] GitHub repository for the PPM projection algorithm, https://github.com/bentoayr/efficient-projection-onto-the-perfect-phylogeny-model, Accessed: 2018-10-26.

[33] Neal Parikh, Stephen Boyd, et al. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.

[34] Yuchao Jiang, Yu Qiu, Andy J Minn, and Nancy R Zhang. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proceedings of the National Academy of Sciences, 113(37):E5528–E5537, 2016.

[35] Mohammed El-Kebir, Gryte Satas, Layla Oesper, and Benjamin J Raphael. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Systems, 3(1):43–53, 2016.

[36] Iman Hajirasouliha, Ahmad Mahmoody, and Benjamin J Raphael. A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics, 30(12):i78–i86, 2014.

[37] Paola Bonizzoni, Anna Paola Carrieri, Gianluca Della Vedova, Riccardo Dondi, and Teresa M Przytycka.
When and how the perfect phylogeny model explains evolution. In Discrete and Topological Models in Molecular Biology, pages 67–83. Springer, 2014.

[38] AncesTree data used, https://github.com/raphael-group/ancestree/tree/master/data/simulated/cov_1000_samples_4_mut_100_clone_10_pcr_removed, Accessed: 2018-10-26.