{"title": "Mixture Matrix Completion", "book": "Advances in Neural Information Processing Systems", "page_first": 2193, "page_last": 2203, "abstract": "Completing a data matrix X has become an ubiquitous problem in modern data science, with motivations in recommender systems, computer vision, and networks inference, to name a few. One typical assumption is that X is low-rank. A more general model assumes that each column of X corresponds to one of several low-rank matrices. This paper generalizes these models to what we call mixture matrix completion (MMC): the case where each entry of X corresponds to one of several low-rank matrices. MMC is a more accurate model for recommender systems, and brings more flexibility to other completion and clustering problems. We make four fundamental contributions about this new model. First, we show that MMC is theoretically possible (well-posed). Second, we give its precise information-theoretic identifiability conditions. Third, we derive the sample complexity of MMC. Finally, we give a practical algorithm for MMC with performance comparable to the state-of-the-art for simpler related problems, both on synthetic and real data.", "full_text": "Mixture Matrix Completion\n\nDaniel Pimentel-Alarc\u00f3n\n\nDepartment of Computer Science\n\nGeorgia State University\n\nAtlanta, GA, 30303\npimentel@gsu.edu\n\nAbstract\n\nCompleting a data matrix X has become an ubiquitous problem in modern data\nscience, with motivations in recommender systems, computer vision, and networks\ninference, to name a few. One typical assumption is that X is low-rank. A more\ngeneral model assumes that each column of X corresponds to one of several low-\nrank matrices. This paper generalizes these models to what we call mixture matrix\ncompletion (MMC): the case where each entry of X corresponds to one of several\nlow-rank matrices. 
MMC is a more accurate model for recommender systems, and brings more flexibility to other completion and clustering problems. We make four fundamental contributions about this new model. First, we show that MMC is theoretically possible (well-posed). Second, we give its precise information-theoretic identifiability conditions. Third, we derive the sample complexity of MMC. Finally, we give a practical algorithm for MMC with performance comparable to the state-of-the-art for simpler related problems, both on synthetic and real data.\n\n1 Introduction\n\nMatrix completion aims to estimate the missing entries of an incomplete data matrix X. One of its main motivations arises in recommender systems, where each row represents an item, and each column represents a user. We only observe an entry in X whenever a user rates an item, and the goal is to predict unseen ratings in order to make good recommendations.\n\nRelated Work. In 2009, Candès and Recht [1] introduced low-rank matrix completion (LRMC), arguably the most popular model for this task. LRMC assumes that each column (user) can be represented as a linear combination of a few others, whence X is low-rank. Later, in 2012, Eriksson et al. [2] introduced high-rank matrix completion (HRMC), also known as subspace clustering with missing data. This more general model assumes that each column of X comes from one of several low-rank matrices, thus allowing several types of users. Since their inceptions, both LRMC and HRMC have attracted a tremendous amount of attention (see [1–27] for a very incomplete list).\n\nPaper contributions. This paper introduces an even more general model: mixture matrix completion (MMC), which assumes that each entry in X (rather than each column) comes from one out of several low-rank matrices, and the goal is to recover the matrices in the mixture. Figure 1 illustrates the generalization from LRMC to HRMC and to MMC. 
One of the main motivations behind MMC is that users often share the same account, and so each column in X may contain ratings from several users. Nonetheless, as we show in Section 2, MMC is also a more accurate model for many other contemporary applications, including networks inference, computer vision, and metagenomics. This paper makes several fundamental contributions about MMC:\n\n– Well-posedness. First, we show that MMC is theoretically possible if we observe the right entries and the mixture is generic (precise definitions below).\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\n\fFigure 1: In LRMC, X is a low-rank matrix. In HRMC, each column of X comes from one of several low-rank matrices. In MMC, each entry comes from one of several low-rank matrices X1, . . . , XK; we only observe XΩ, and our goal is to recover the columns of X1, . . . , XK that have observations in XΩ.\n\n– Identifiability conditions. We provide precise information-theoretical conditions on the entries that need to be observed such that a mixture of K low-rank matrices is identifiable. These extend similar recent results of LRMC [3] and HRMC [4] to the setting of MMC. The subtlety in proving these results is that there could exist false mixtures that agree with the observed entries, even if the sampling is uniquely completable for LRMC and HRMC (see Example 1). In other words, there exist samplings that are identifiable for LRMC (and HRMC) but are not identifiable for MMC, and so in general it is not enough to simply have K times more samples. Hence, it was necessary to derive identifiability conditions for MMC, similar to those of LRMC in [3] and HRMC in [4]. 
We point out that in contrast to typical completion theory [1, 2, 5–20], this type of identifiability condition is deterministic (not restricted to uniform sampling), and makes no coherence assumptions.\n\n– Sample complexity. If X ∈ R^{d×n} is a mixture of K rank-r matrices, we show that with high probability, our identifiability conditions will be met if each entry is observed with probability O((K/d) max{r, log d}), thus deriving the sample complexity of MMC, which is the same as the sample complexity of HRMC [4], and simplifies to O((1/d) max{r, log d}) in the case of K = 1, which corresponds to the sample complexity of LRMC [3]. Intuitively, this means that information-theoretically, we virtually pay no price for mixing low-rank matrices.\n\n– Practical algorithm. Our identifiability results follow from a combinatorial analysis that is infeasible in practice. To address this, we give a practical alternating algorithm for MMC whose performance (in the more difficult problem of MMC) is comparable to state-of-the-art algorithms for the much simpler problems of HRMC and LRMC.\n\n2 Motivating Applications\n\nBesides recommender systems, there are many important applications where data can be modeled as a mixture of low-rank matrices. Here are a few examples motivated by current data science challenges.\n\nNetworks Inference. Estimating the topology of a network (internet, sensor networks, biological networks, social networks) has been the subject of a large body of research in recent years [28–34]. To this end, companies routinely collect distances between nodes (e.g., computers) that connect with monitors (e.g., Google, Amazon, Facebook) in a data matrix X. In a simplified model, if node j is in subnet k, then the jth column can be modeled as the sum of (i) the distance between node j and router k, and (ii) the distance between router k and each of the monitors. 
Hence, the columns (nodes) corresponding to each subnet form a low-rank matrix, which is precisely the model assumed by HRMC. However, depending on the network's traffic, each node may use different routes to communicate at different times. Consequently, the same column in X may contain measurements from different low-rank matrices. In other words, distance matrices of networks are a mixture of low-rank matrices.\n\nComputer Vision. Background segmentation is one of the most fundamental and crucial tasks in computer vision, yet it can be tremendously challenging. The vectorized frames of a video can be modeled as columns with some entries (pixels) in a low-rank background, and some outlier entries corresponding to the foreground. Typical methods, like the acclaimed Robust PCA (principal component analysis) [35–46], assume that the foreground is sparse and has no particular structure. However, in many situations this is not the case. For instance, since the location of an object in consecutive frames is highly correlated, the foreground can be highly structured. Similarly, the foreground may not be sparse, especially if there are foreground objects moving close to the camera (e.g., in a selfie). Even state-of-the-art methods fail in scenarios like these, which are not covered by current models (see Figure 3 for an example). In contrast, MMC allows using one matrix in the mixture to represent the background, other matrices to represent foreground objects (small or large, even dominant), and yet other matrices to account for occlusions and other illumination/visual artifacts. Hence, MMC can be a more accurate model for video segmentation and other image processing tasks, including inpainting [47] and face clustering, which we explore in our experiments.\n\nMetagenomics. 
One contemporary challenge in Biology is to quantify the presence of different types of bacteria in a system (e.g., the human gut microbiome) [48–52]. The main idea is to collect several DNA samples from such a system, and use their genomic information to count the number of bacteria of each type (the genome of each bacterium determines its type). In practice, to obtain an organism's genome (e.g., a person's genome), biologists feed a DNA sample (e.g., blood or hair) to a sequencer machine that produces a series of reads, which are short genomic sequences that can later be assembled and aligned to recover the entire genome. The challenge arises when the sequencer is provided a sample with DNA from multiple organisms, as is the case in the human gut microbiome, where any sample will contain a mixture of DNA from multiple bacteria that cannot be disentangled into individual bacteria. In this case, each read produced by the sequencer may correspond to a different type of bacteria. Consequently, each DNA sample (column) may contain genes (rows) from different types of bacteria, which is precisely the model that MMC describes.\n\n3 Problem Statement\n\nLet X1, . . . , XK ∈ R^{d×n} be a set of rank-r matrices, and let Ω1, . . . , ΩK ∈ {0, 1}^{d×n} indicate disjoint sets of observed entries. Suppose X1, . . . , XK and Ω1, . . . , ΩK are unknown, and we only observe XΩ, defined as follows:\n\n– If the (i, j)th entry of Ωk is 1, then the (i, j)th entry of XΩ is equal to the (i, j)th entry of Xk.\n\n– If the (i, j)th entry of Ωk is 0 for every k = 1, . . . , K, then the (i, j)th entry of XΩ is missing.\n\nThis way Ωk indicates the entries of XΩ that correspond to Xk, and Ω := Σ_{k=1}^{K} Ωk indicates the set of all observed entries. Since Ω1, . . . , ΩK are disjoint, Ω ∈ {0, 1}^{d×n}. 
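The observation model just defined can be simulated in a few lines. The sketch below (our own toy dimensions and sampling scheme, not from the paper) builds an observed matrix XΩ from K = 2 generic rank-r matrices with disjoint masks Ω1, Ω2:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, r, K = 30, 30, 2, 2   # illustrative sizes, not the paper's

# K generic rank-r matrices Xk = Uk @ Thetak (genericity via Gaussian factors).
X = [rng.standard_normal((d, r)) @ rng.standard_normal((r, n)) for _ in range(K)]

# Disjoint masks Omega_k: each entry is missing with probability 1 - p,
# and each observed entry is assigned to exactly one matrix in the mixture.
p = 0.6
labels = rng.integers(0, K, size=(d, n))    # which Xk an entry would come from
observed = rng.random((d, n)) < p           # which entries are observed at all
Omega = [(labels == k) & observed for k in range(K)]

# X_Omega: observed entries take their value from the matrix they belong to.
X_obs = np.full((d, n), np.nan)
for k in range(K):
    X_obs[Omega[k]] = X[k][Omega[k]]
```

By construction the masks are disjoint and their union is exactly the set of observed entries, so there are no collisions, as required by the model.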
Equivalently, each observed entry of XΩ corresponds to an entry in either X1 or X2 or . . . or XK (i.e., there are no collisions). In words, XΩ contains a mixture of entries from several low-rank matrices.\n\nThe goal of MMC is to recover all the columns of X1, . . . , XK that have observations in XΩ (see Figure 1 to build some intuition). In our recommendations example, a column x_ω ∈ XΩ will contain entries from Xk whenever x_ω contains ratings from a user of the kth type. Similarly, the same column will contain entries from Xℓ whenever it also contains ratings from a user of the ℓth type. We would like to predict the preferences of both users, or more generally, all users that have ratings in x_ω. On the other hand, if x_ω has no entries from Xk, then x_ω involves no users of the kth type, and so it would be impossible (and futile) to try to recover such a column of Xk. In MMC, the matrices Ω1, . . . , ΩK play the role of the hidden variables constantly present in mixture problems. Notice that if we knew Ω1, . . . , ΩK, then we could partition XΩ accordingly, and estimate X1, . . . , XK using standard LRMC. The challenge is that we do not know Ω1, . . . , ΩK.\n\n3.1 The Subtleties of MMC\n\nThe main theoretical difficulty of MMC is that depending on the pattern of missing data, there could exist false mixtures. That is, matrices ˜X1, . . . , ˜XK, other than X1, . . . , XK, that agree with XΩ, even if X1, . . . , XK are observed on uniquely completable patterns for LRMC.\nExample 1. 
Consider the following rank-1 matrices X1, X2, and their partially observed mixture XΩ (where · denotes a missing entry):\n\nX1 =\n1 2 3 4\n1 2 3 4\n1 2 3 4\n1 2 3 4\n1 2 3 4\n\nX2 =\n1  2  3  4\n2  4  6  8\n3  6  9 12\n4  8 12 16\n5 10 15 20\n\nXΩ =\n1  ·  3  4\n1  2  ·  8\n3  2  3  ·\n4  8  3  4\n·  10 15 4\n\n\fWe can verify that X1 and X2 are observed on uniquely completable sampling patterns for LRMC [3]. Nonetheless, we can construct the following false rank-1 matrices that agree with XΩ:\n\n˜X1 =\n60  40   15    4\n 1  2/3  1/4  1/15\n 3   2   3/4   1/5\n12   8    3    4/5\n60  40   15    4\n\n˜X2 =\n 1  1/4    3   1\n 8   2    24   8\n 1  1/4    3   1\n 4   1    12   4\n40  10   120  40\n\nThis shows that even with unlimited computational power, if we exhaustively search all the identifiable patterns for LRMC, we can end up with false mixtures. Hence the importance of studying the identifiable patterns for MMC.\n\nFalse mixtures arise because we do not know a priori which entries of XΩ correspond to each Xk. Hence, it is possible that a rank-r matrix ˜X agrees with some entries from X1, other entries from X2, and so on. Furthermore, ˜X may even be the only rank-r matrix that agrees with such a combination of entries, as in Example 1.\n\nRemark 1. Recall that LRMC and HRMC are tantamount to identifying the subspace(s) containing the columns of X [3, 4]. In fact, if we knew such subspaces, LRMC and HRMC become almost trivial problems (see Appendix A for details). 
Similarly, if no data is missing, HRMC simplifies to subspace clustering, which has been studied extensively, and is now reasonably well-understood [53–62]. In contrast, MMC remains challenging even if the subspaces corresponding to the low-rank matrices in the mixture are known, and even if X is fully observed. We refer the curious reader to Appendix A, and point out the bottom row and the last column in Figure 2, which show the MMC error when the underlying subspaces are known, and when X is fully observed.\n\n4 Main Theoretical Results\n\nExample 1 shows the importance of studying the identifiable patterns for MMC, which we do now. First recall that r + 1 samples per column are necessary for LRMC [3]. This implies that even if an oracle told us Ω1, . . . , ΩK, if we intend to recover a column of Xk, we need to observe it on at least r + 1 entries. Hence we assume without loss of generality that:\n\n(A1) Each column of Ωk has either 0 or r + 1 non-zero entries.\n\nIn words, A1 requires that each column of Xk to be recovered is observed on exactly r + 1 entries. Of course, observing more entries may only aid completion. Hence, rather than an assumption, A1 describes the most difficult scenario where we have the bare minimum amount of information required for completion. We use A1 to ease notation, exposition and analysis. All our results can be easily extended to the case where A1 is dropped (see Remark 2).\n\nWithout further assumptions on X, completion (of any kind) may be impossible. To see this, consider the simple example where X is only supported on the ith row. Then it would be impossible to recover X unless all columns were observed on the ith row. In most completion applications this would be unlikely. 
For example, in a movie recommender system like Netflix, this would require that all the users watched (and rated) the same movie.\n\nTo rule out scenarios like these, typical completion theory requires incoherence and uniform sampling. Incoherence guarantees that the information is well-spread over the matrix. Uniform sampling guarantees that all rows and columns are sufficiently sampled. However, it is usually unclear (and generally unverifiable) whether an incomplete matrix is coherent. Furthermore, observations are hardly ever uniformly distributed. For instance, we do not expect children to watch adult movies.\n\nTo avoid these issues, instead of incoherence we will assume that X is a generic mixture of low-rank matrices. More precisely, we assume that:\n\n(A2) X1, . . . , XK are drawn independently according to an absolutely continuous distribution with respect to the Lebesgue measure on the determinantal variety (the set of all d × n rank-r matrices).\n\n\fA2 essentially requires that each Xk is a generic rank-r matrix. This type of genericity assumption is becoming increasingly common in studies of LRMC, HRMC, and related problems [3, 4, 23–27, 46]. See Appendix C for a further discussion of A2, and its relation to other common assumptions from the literature.\n\nWith this, we are ready to present our main theorem. It gives a deterministic condition on Ω to guarantee that X1, . . . , XK can be identified from XΩ. This provides information-theoretic requirements for MMC. The proof is in Appendix B.\n\nTheorem 1. Let A1-A2 hold. 
Suppose there exist matrices {Ωτ}, τ = 1, . . . , r + 1, formed with disjoint subsets of (d − r + 1) columns of Ωk, such that for every τ:\n\n(†) Every matrix Ω′ formed with a proper subset of the columns in Ωτ has at least r fewer columns than non-zero rows.\n\nThen all the columns of Xk that have observations in XΩ are identifiable.\n\nIn words, Theorem 1 states that MMC is possible as long as we observe the right entries in each Xk. The intuition is that each of these entries imposes a constraint on what X1, . . . , XK may be, and the pattern in Ω determines whether these constraints are redundant. Patterns satisfying the conditions of Theorem 1 guarantee that X1, . . . , XK is the only mixture that satisfies the constraints produced by the observed entries.\n\nRemark 2. Recall that r + 1 samples per column are strictly necessary for completion. A1 requires that we have exactly that minimum number of samples. If Xk is observed on more than r + 1 entries per column, it suffices that Ωk contains a pattern satisfying the conditions of Theorem 1.\n\nTheorem 1 shows that MMC is possible if the samplings satisfy certain combinatorial conditions. Our next result shows that if each entry of Xk is observed on XΩ with probability O((1/d) max{r, log d}), then with high probability Ωk will satisfy such conditions. The proof is in Appendix B.\n\nTheorem 2. Suppose r ≤ d/6 and n ≥ (r + 1)(d − r + 1). Let ε > 0 be given. Suppose that an entry of XΩ is equal to the corresponding entry of Xk with probability\n\np ≥ (2/d) max{2r, 12(log(d/ε) + 1)}.\n\nThen Ωk satisfies the sampling conditions of Theorem 1 with probability ≥ 1 − 2(r + 1)ε.\n\nTheorem 2 shows that the sample complexity of MMC is O(K max{r, log d}) observations per column of XΩ. 
This is exactly the same as the sample complexity of HRMC [4], and simplifies to O(max{r, log d}) if K = 1, corresponding to the sample complexity of LRMC [3]. Intuitively, this means that information-theoretically, we virtually pay no price for mixing low-rank matrices.\n\n5 Alternating Algorithm for MMC\n\nTheorems 1 and 2 show that MMC is theoretically possible under reasonable conditions (virtually the same as LRMC and HRMC). However, these results follow from a combinatorial analysis that is infeasible in practice (see Appendix B for details). To address this, we derive a practical alternating algorithm for MMC, which we call AMMC (alternating mixture matrix completion).\n\nThe main idea is that MMC, like most mixture problems, can be viewed as a clustering task: if we could determine the entries of XΩ that correspond to each Xk, then we would be able to partition XΩ into K incomplete low-rank matrices, and then complete them using standard LRMC. The question is how to determine which entries of XΩ correspond to each Xk, i.e., how to determine Ω1, . . . , ΩK.\n\nTo address this, let Uk ∈ R^{d×r} be a basis for the subspace containing the columns of Xk, and let x_ω denote the jth column of XΩ, observed only on the entries indexed by ω ⊂ {1, . . . , d}. For any subspace, matrix or vector that is compatible with a set of indices ·, we use the subscript · to denote its restriction to the coordinates/rows in ·. For example, Uk_ω ∈ R^{|ω|×r} denotes the restriction of Uk to the indices in ω. Suppose x_ω contains entries from Xk, and let ωk ⊂ ω index such entries. Then our goal is to determine ωk, as that would tell us the jth column of Ωk. 
Since x_ωk ∈ span{Uk_ωk}, we can restate our goal as finding the set ωk ⊂ ω such that x_ωk ∈ span{Uk_ωk}.\n\nTo find ωk, let υ ⊂ ω, and let Pk_υ := Uk_υ((Uk_υ)^T Uk_υ)^{-1}(Uk_υ)^T denote the projection operator onto span{Uk_υ}. Recall that ‖Pk_υ x_υ‖ ≤ ‖x_υ‖, with equality if and only if x_υ ∈ span{Uk_υ}. It follows that ωk is the largest set υ such that ‖Pk_υ x_υ‖ = ‖x_υ‖. In other words, ωk is the solution to\n\narg max_{υ ⊂ ω} ‖Pk_υ x_υ‖ − ‖x_υ‖ + |υ|.   (1)\n\nHowever, (1) is non-convex. Hence, in order to find the solution to (1), we propose the following erasure strategy. The main idea is to start our search with υ = ω, and then iteratively remove the entries (coordinates) of υ that most increase the gap between ‖Pk_υ x_υ‖ and ‖x_υ‖ (hence the term erasure). We stop this procedure when ‖Pk_υ x_υ‖ is equal to ‖x_υ‖ (or close enough). More precisely, we initialize υ = ω, and then iteratively redefine υ as the set\n\nυ = υ\\i,   where   i = arg max_{i ∈ υ} ‖Pk_{υ\\i} x_{υ\\i}‖ − ‖x_{υ\\i}‖.   (2)\n\nIn words, i is the coordinate of the vector x_υ such that if ignored, the gap between the remaining vector x_{υ\\i} and its projection Pk_{υ\\i} x_{υ\\i} is reduced the most. At each iteration we remove (erase) such coordinate i from υ. The intuition behind this approach is that the coordinates of x_υ that do not correspond to Xk are more likely to increase the gap between ‖Pk_υ x_υ‖ and ‖x_υ‖. 
Notice that if Uk is in general position (guaranteed by A2) and |υ| ≤ r, then span{Uk_υ} = R^{|υ|} (because Uk is r-dimensional). In such case, it is trivially true that x_υ ∈ span{Uk_υ}, whence ‖Pk_υ x_υ‖ = ‖x_υ‖. Hence the procedure above is guaranteed to terminate after at most |ω| − r iterations. At such point, |υ| = r, and we know that we were unable to find ωk (or a subset of it). One alternative is to start with a different υ0 ⊊ ω, and search again.\n\nThis procedure may remove some entries from ωk along the way, so in general, the output of this process will be a set υ ⊂ ωk. However, finding a subset of ωk is enough to find ωk. To see this, recall that since x_ωk ∈ span{Uk_ωk}, there is a coefficient vector θk ∈ R^r such that x_ωk = Uk_ωk θk. Since υ ⊂ ωk, it follows that x_υ = Uk_υ θk. Furthermore, since |υ| ≥ r, we can find θk as θk = ((Uk_υ)^T Uk_υ)^{-1}(Uk_υ)^T x_υ. Since x_ωk = Uk_ωk θk, at this point we can identify ωk by simple inspection (the matching entries in x_ω and Uk_ω θk). Recall that ωk determines the jth column of Ωk. Hence, if we repeat the procedure above for each column in XΩ and each k, we can recover Ω1, . . . , ΩK. After this, we can use standard LRMC on XΩ1, . . . , XΩK to recover X1, . . . , XK (which is the ultimate goal of MMC).\n\nThe catch here is that this procedure requires knowing Uk, which we do not know. So essentially we have a chicken and egg problem: (i) if we knew Uk, we would be able to find Ωk. (ii) If we knew Ωk, we would be able to find Uk (and Xk, using standard LRMC on XΩk). 
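Step (i), finding ωk when Uk is known, can be sketched as follows. This is a minimal illustration of the erasure strategy (2) and the subsequent recovery of ωk via θk (function names and the toy instance are ours, not the paper's Algorithm 1):

```python
import numpy as np

def proj_gap(U, x, idx):
    """||x_idx|| - ||Pk_idx x_idx||: residual of projecting x_idx onto span{U_idx}."""
    Uv, xv = U[idx, :], x[idx]
    coef, *_ = np.linalg.lstsq(Uv, xv, rcond=None)  # least-squares projection
    return np.linalg.norm(xv - Uv @ coef)

def erase(U, x, omega, tol=1e-8):
    """Erasure strategy (2): drop, one coordinate at a time, the index whose
    removal most reduces the gap, until x_upsilon lies in span{U_upsilon}."""
    upsilon = list(omega)
    r = U.shape[1]
    while len(upsilon) > r and proj_gap(U, x, upsilon) > tol:
        i = min(upsilon,
                key=lambda i: proj_gap(U, x, [j for j in upsilon if j != i]))
        upsilon.remove(i)
    return upsilon

def grow(U, x, upsilon, omega, tol=1e-8):
    """Recover omega_k from a subset upsilon (|upsilon| >= r): solve for theta_k
    and keep the coordinates of omega where x matches U theta_k."""
    theta, *_ = np.linalg.lstsq(U[upsilon, :], x[upsilon], rcond=None)
    return [i for i in omega if abs(x[i] - U[i, :] @ theta) < tol]

# Toy instance: one column with a single entry from "another matrix".
rng = np.random.default_rng(1)
d, r = 6, 2
U = rng.standard_normal((d, r))
x = U @ rng.standard_normal(r)
x[3] += 1.0                       # entry 3 does not belong to this subspace
omega = list(range(d))
upsilon = erase(U, x, omega)      # erases coordinate 3
omega_k = grow(U, x, upsilon, omega)
```

Here the greedy step provably removes the corrupted coordinate first, since dropping it makes the remaining subvector lie exactly in the span.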
Since we know neither, we use a common technique for this kind of problem: alternate between finding Ωk and Uk. More precisely, we start with some initial guesses ˆU1, . . . , ˆUK, and then alternate between the following two steps until convergence:\n\n(i) Cluster. Let x_ω be the jth column in XΩ. For each k = 1, . . . , K, we first erase entries from ω to obtain a set υ ⊂ ω indicating entries likely to correspond to Xk. This erasure procedure initializes υ = ω, and then repeats (2) (replacing Pk with ˆPk, which denotes the projection operator onto span{ˆUk}) until we obtain a set υ ⊂ ω such that the projection ‖ˆPk_υ x_υ‖ is close to ‖x_υ‖. This way, the entries of x_υ are likely to correspond to Xk. Using these entries, we can estimate the coefficient of the jth column of Xk with respect to Uk, given by ˆθk = ((ˆUk_υ)^T ˆUk_υ)^{-1}(ˆUk_υ)^T x_υ. With ˆθk we can also estimate the jth column of Xk as ˆxk := ˆUk ˆθk. Notice that both υ and ˆxk are obtained using ˆUk, which may be different from Uk. It follows that υ may contain some entries that do not correspond to Xk, and ˆxk may be inaccurate. Hence, in general, x_ω and ˆxk_ω will have no matching entries, and so we cannot identify ωk by simple inspection, as before. However, we can repeat our procedure for each k to obtain estimates ˆx1_ω, . . . , ˆxK_ω, and then assign each entry of x_ω to its closest match. More precisely, our estimate ˆωk ⊂ ω (indicating the entries of x_ω that we estimate correspond to Xk) will contain entry i ∈ ω if |x_i − ˆxk_i| ≤ |x_i − ˆxℓ_i| for every ℓ = 1, . . . , K. Repeating this procedure for each column of XΩ will produce estimates ˆΩ1, . . . , ˆΩK. Specifically, the jth column of ˆΩk ∈ {0, 1}^{d×n} will contain a 1 in the rows indicated by ˆωk.\n\n\fFigure 2: Left: Success rate (average over 100 trials) of AMMC as a function of the fraction of observed entries p and the distance δ between the true subspaces Uk and their initial estimates. Lightest represents 100% success rate; darkest represents 0%. Right: Comparison of state-of-the-art algorithms for LRMC (LMaFit), HRMC (GSSC), and MMC (AMMC, this paper), in their respective settings; see Figure 1. The performance of AMMC (in the more difficult problem of MMC) is comparable to the performance of state-of-the-art algorithms in the simpler problems of LRMC and HRMC.\n\n(ii) Complete. For each k, complete X_ˆΩk using your favorite LRMC algorithm. 
Then compute a\n\nnew estimate \u02c6Uk given by the leading r left singular vectors of the completion of X \u02c6\u2126k .\n\nThe entire procedure is summarized in Algorithm 1, in Appendix D, where we also discuss initializa-\ntion, generalizations to noise and outliers, and other simple extensions to improve performance.\n\n6 Experiments\n\nSimulations. We \ufb01rst present a series of synthetic experiments to study the performance of AMMC\n(Algorithm 1). In our simulations we \ufb01rst generate matrices Uk \u2208 Rd\u00d7r and \u0398k \u2208 Rr\u00d7n with\ni.i.d. N(0, 1) entries to use as bases and coef\ufb01cients of the low-rank matrices in the mixture, i.e.,\nXk = Uk\u0398k \u2208 Rd\u00d7n. Here d = n = 100, r = 5 and K = 2. With probability (1 \u2212 p), the (i, j)th\nentry of X\u2126 will be missing, and with probability p/K it will be equal to the corresponding entry in Xk.\nRecall that similar to EM and other alternating approaches, AMMC depends on initialization. Hence,\nwe study the performance of AMMC as a function of both p and the distance \u03b4 \u2208 [0, 1] between {Uk}\nand their initial estimates (measured as the normalized Frobenius norm of the difference between their\nprojection operators). We measure accuracy using the normalized Frobenius norm of the difference\nbetween each Xk and its completion. We considered a success if this quantity was below 10\u22128. The\nresults of 100 trials are summarized in Figure 2.\n\nNotice that the performance of AMMC decays nicely with the distance \u03b4 between the true subspaces\nUk and their initial estimates. We can see this type of behavior in similar state-of-the-art alternating\nalgorithms for the simpler problem of HRMC [19]. Since MMC is highly non-convex, it is not\nsurprising that if the initial estimates are poor (far from the truth), then AMMC may converge to a\nlocal minimum. Similarly, the performance of AMMC decays nicely with the fraction of observed\nentries p. 
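The accuracy metric used in these simulations (the normalized Frobenius norm of the completion error, with success declared below 10^-8) can be computed as in this short sketch (helper name ours; the toy arrays are illustrative):

```python
import numpy as np

def completion_error(X_true, X_hat):
    """Normalized Frobenius error between a matrix and its completion."""
    return np.linalg.norm(X_true - X_hat) / np.linalg.norm(X_true)

# Toy check: a 10% uniform over-estimate gives error 0.1, far above the
# 1e-8 success threshold used in the simulations above.
err = completion_error(np.ones((5, 5)), 1.1 * np.ones((5, 5)))
success = err < 1e-8
```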
Notice that even if X is fully observed (p = 1), if the initial estimates are very far from the true subspaces (δ = 1), then AMMC performs poorly. This shows, consistent with our discussion in Remark 1, that in practice MMC is a challenging problem even if X is fully observed. Hence, it is quite remarkable that AMMC works most of the time with as little as p ≈ 0.6, corresponding to observing ≈ 0.3 of the entries in each Xk. To put this in perspective, notice (Figure 2) that this is comparable to the amount of missing data tolerated by GSSC [19] and LMaFit [11], which are state-of-the-art for the simpler problems of HRMC (the special case of MMC where all entries in each column of X correspond to the same Xk) and LRMC (the special case where there is only one Xk).\n\n\fMixture / Reconstructions / Original / Robust PCA / MMC (this paper)\n\nFigure 3: Left 3: Reconstructed images from a mixture. Right 3: Original frame and segmented foreground.\n\nTo obtain Figure 2 we replicated the same setup as above, but with data generated according to the HRMC and LRMC models. Hence, we conclude that the performance of AMMC (in the more difficult problem of MMC) is comparable to the performance of state-of-the-art algorithms for the much simpler problems of HRMC and LRMC.\n\nWe point out that according to Theorems 1 and 2, MMC is theoretically possible with p ≥ 1/2. However, we can see that (even if U1, . . . , UK are known, corresponding to δ = 0 in Figure 2) the performance of AMMC is quite poor if p < 0.6. This shows two things: (i) MMC is challenging even if U1, . . . , UK are known (as discussed in Remark 1), and (ii) there is a gap between what is information-theoretically possible and what is currently possible in practice (with AMMC). In future work we will explore algorithms that can approach the information-theoretic limits.\n\nReal Data: Face Clustering and Inpainting. 
It is well-known that images of an individual's face are approximately low-rank [63]. Natural images, however, usually contain faces of multiple individuals, often partially occluding each other, resulting in a mixture of low-rank matrices. In this experiment we demonstrate the power of MMC in two tasks: first, classifying partially occluded faces in an image, and second, image inpainting [47]. To this end, we use the Yale B dataset [64], containing 2432 photos of 38 subjects (64 photos per subject), each photo of size 48 × 42. We randomly select two subjects, and vectorize and concatenate their images to obtain two approximately rank-10 matrices X1, X2 ∈ R^(2016×64). Next we combine them into X ∈ R^(2016×64), each entry of which equals the corresponding entry of X1 or X2 with equal probability. This way, each column of X contains a mixed image with pixels from multiple individuals. We pursue two goals: (i) classify the entries of X according to X1 and X2, which in turn means locating and classifying the face of each individual in each image, and (ii) recover X1 and X2 from X, thus reconstructing the unobserved pixels in each image (inpainting). We repeat this experiment 30 times using AMMC (with Gaussian random initialization, known to produce near-orthogonal subspaces with high probability), obtaining a pixel classification error of 2.98% and a reconstruction error of 4.1%, which is remarkable given that the ideal rank-10 approximation (no mixture, full data) achieves 1.8%. Figure 3 shows an example, with more in Figure 4 in Appendix E. Notice that in this case we cannot compare against other methods, as AMMC is the first, and currently the only, method for MMC.

Real Data: MMC for Background Segmentation.
As discussed in Section 2, Robust PCA models a video as the superposition of a low-rank background plus a sparse, unstructured foreground. MMC brings more flexibility, allowing multiple low-rank matrices to model the background, structured foreground objects (sparse or abundant), and illumination artifacts, while also accounting for outliers (the entries/pixels assigned to no matrix in the mixture). In fact, contrary to Robust PCA, MMC allows a very large (even dominant) fraction of outliers. In this experiment we test AMMC on the task of background segmentation, using the Wallflower [65] and I2R [66] datasets, containing videos of traffic cameras, lobbies, and pedestrians in the street. For each video, we compare AMMC (with Gaussian random initialization) against the best result amongst the following state-of-the-art algorithms for Robust PCA: [35–39]. We chose these methods based on the comprehensive review in [40], and on previous reports [41–43] indicating that these algorithms typically performed as well as or better than several others, including [44, 45]. In most cases, Robust PCA and AMMC perform quite similarly (see Figure 5 in Appendix E). However, in one case AMMC achieves 87.67% segmentation accuracy (compared with the manually segmented ground truth), while Robust PCA only achieves 74.88% (Figure 3). Our hypothesis is that this is due to the large portion of outliers (foreground). Collecting real datasets with similar properties, where AMMC can be further tested, is beyond the scope of this paper but of interest for future work. We point out, however, that AMMC is orders of magnitude slower than Robust PCA. Our future work will also focus on developing faster methods for MMC.

References

[1] E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 2009.
[2] B. Eriksson, L. Balzano and R.
Nowak, High-rank matrix completion and subspace clustering with missing data, Artificial Intelligence and Statistics, 2012.
[3] D. Pimentel-Alarcón, N. Boston and R. Nowak, A characterization of deterministic sampling patterns for low-rank matrix completion, IEEE Journal of Selected Topics in Signal Processing, 2016.
[4] D. Pimentel-Alarcón and R. Nowak, The information-theoretic requirements of subspace clustering with missing data, International Conference on Machine Learning, 2016.
[5] E. Candès and T. Tao, The power of convex relaxation: near-optimal matrix completion, IEEE Transactions on Information Theory, 2010.
[6] J. Cai, E. Candès and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 2010.
[7] R. Keshavan, A. Montanari and S. Oh, Matrix completion from a few entries, IEEE Transactions on Information Theory, 2010.
[8] L. Balzano, R. Nowak and B. Recht, Online identification and tracking of subspaces from highly incomplete information, Allerton Conference on Communication, Control and Computing, 2010.
[9] B. Recht, A simpler approach to matrix completion, Journal of Machine Learning Research, 2011.
[10] S. Ma, D. Goldfarb and L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Mathematical Programming, 2011.
[11] Z. Wen, W. Yin and Y. Zhang, Solving a low-rank factorization model for matrix completion by a non-linear successive over-relaxation algorithm, Mathematical Programming Computation, 2012.
[12] Y. Shen, Z. Wen and Y. Zhang, Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization, International Conference on Numerical Optimization and Numerical Linear Algebra, 2014.
[13] E. Chunikhina, R. Raich and T.
Nguyen, Performance analysis for matrix completion via iterative hard-thresholded SVD, IEEE Statistical Signal Processing, 2014.
[14] Y. Chen, S. Bhojanapalli, S. Sanghavi and R. Ward, Coherent matrix completion, International Conference on Machine Learning, 2014.
[15] Y. Chen, Incoherence-optimal matrix completion, IEEE Transactions on Information Theory, 2015.
[16] P. Jain, P. Netrapalli and S. Sanghavi, Low-rank matrix completion using alternating minimization, ACM Symposium on Theory of Computing, 2013.
[17] L. Balzano, A. Szlam, B. Recht and R. Nowak, K-subspaces with missing data, IEEE Statistical Signal Processing, 2012.
[18] D. Pimentel-Alarcón, L. Balzano and R. Nowak, On the sample complexity of subspace clustering with missing data, IEEE Statistical Signal Processing, 2014.
[19] D. Pimentel-Alarcón, L. Balzano, R. Marcia, R. Nowak and R. Willett, Group-sparse subspace clustering with missing data, IEEE Statistical Signal Processing, 2016.
[20] C. Yang, D. Robinson and R. Vidal, Sparse subspace clustering with missing entries, International Conference on Machine Learning, 2015.
[21] E. Elhamifar, High-rank matrix completion and clustering under self-expressive models, Advances in Neural Information Processing Systems, 2016.
[22] G. Ongie, R. Willett, R. Nowak and L. Balzano, Algebraic variety models for high-rank matrix completion, International Conference on Machine Learning, 2017.
[23] D. Pimentel-Alarcón, N. Boston and R. Nowak, Deterministic conditions for subspace identifiability from incomplete sampling, IEEE International Symposium on Information Theory, 2015.
[24] F. Király, L. Theran and R. Tomioka, The algebraic combinatorial approach for low-rank matrix completion, Journal of Machine Learning Research, 2015.
[25] D. Pimentel-Alarcón and R.
Nowak, A converse to low-rank matrix completion, IEEE International Symposium on Information Theory, 2016.
[26] M. Ashraphijuo, X. Wang and V. Aggarwal, A characterization of sampling patterns for low-rank multiview data completion problem, IEEE International Symposium on Information Theory, 2017.
[27] M. Ashraphijuo, V. Aggarwal and X. Wang, A characterization of sampling patterns for low-Tucker-rank tensor completion problem, IEEE International Symposium on Information Theory, 2017.
[28] R. Govindan and H. Tangmunarunkit, Heuristics for Internet map discovery, IEEE INFOCOM, 2000.
[29] P. Barford, A. Bestavros, J. Byers and M. Crovella, On the marginal utility of network topology measurements, Proceedings of the ACM Internet Measurement Workshop, 2001.
[30] N. Spring, R. Mahajan, D. Wetherall and T. Anderson, Measuring ISP topologies with Rocketfuel, IEEE/ACM Transactions on Networking, 2004.
[31] D. Alderson, L. Li, W. Willinger and J. Doyle, Understanding Internet topology: principles, models and validation, IEEE/ACM Transactions on Networking, 2005.
[32] R. Sherwood, A. Bender and N. Spring, DisCarte: a disjunctive Internet cartographer, ACM SIGCOMM, 2008.
[33] B. Eriksson, P. Barford and R. Nowak, Network discovery from passive measurements, ACM SIGCOMM, 2008.
[34] B. Eriksson, P. Barford, J. Sommers and R. Nowak, DomainImpute: inferring unseen components in the Internet, IEEE INFOCOM, 2011.
[35] Z. Lin, M. Chen, L. Wu and Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, University of Illinois at Urbana-Champaign Technical Report, 2009.
[36] Z. Lin, R. Liu and Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation, Advances in Neural Information Processing Systems, 2011.
[37] X. Ding, L. He and L.
Carin, Bayesian robust principal component analysis, IEEE Transactions on Image Processing, 2011.
[38] X. Shu, F. Porikli and N. Ahuja, Robust orthonormal subspace learning: efficient recovery of corrupted low-rank matrices, International Conference on Computer Vision and Pattern Recognition, 2014.
[39] Y. Yang, Y. Feng and J. Suykens, A nonconvex relaxation approach to robust matrix completion, preprint, 2014.
[40] T. Bouwmans, A. Sobral, S. Javed, S. Jung and E. Zahzah, Decomposition into low-rank plus additive matrices for background/foreground separation: a review for a comparative evaluation with a large-scale dataset, Computer Science Review, 2016.
[41] E. Candès, X. Li, Y. Ma and J. Wright, Robust principal component analysis?, Journal of the ACM, 2011.
[42] T. Bouwmans and E. Zahzah, Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance, Computer Vision and Image Understanding, 2014.
[43] Y. Ma, Low-rank matrix recovery and completion via convex optimization, available at http://perception.csl.illinois.edu/matrix-rank/home.html.
[44] X. Yuan and J. Yang, Sparse and low-rank matrix decomposition via alternating direction methods, available at http://www.optimization-online.org/DB_HTML/2009/11/2447.html, 2009.
[45] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen and Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix, Computational Advances in Multi-Sensor Adaptive Processing, 2009.
[46] D. Pimentel-Alarcón and R. Nowak, Random consensus robust PCA, Electronic Journal of Statistics, 2017.
[47] J. Mairal, F. Bach, J. Ponce and G. Sapiro, Online dictionary learning for sparse coding, International Conference on Machine Learning, 2009.
[48] S.
Highlander, High throughput sequencing methods for microbiome profiling: application to food animal systems, Animal Health Research Reviews, 2012.
[49] S. Mande, M. Mohammed and T. Ghosh, Classification of metagenomic sequences: methods and challenges, Briefings in Bioinformatics, 2012.
[50] R. Ranjan, A. Rani, A. Metwally, H. McGee and D. Perkins, Analysis of the microbiome: advantages of whole genome shotgun versus 16S amplicon sequencing, Biochemical and Biophysical Research Communications, 2016.
[51] N. Nguyen, T. Warnow, M. Pop and B. White, A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity, Biofilms and Microbiomes, 2016.
[52] G. Marçais, A. Delcher, A. Phillippy, R. Coston, S. Salzberg and A. Zimin, MUMmer4: a fast and versatile genome alignment system, PLoS Computational Biology, 2018.
[53] R. Vidal, Subspace clustering, IEEE Signal Processing Magazine, 2011.
[54] G. Liu, Z. Lin and Y. Yu, Robust subspace segmentation by low-rank representation, International Conference on Machine Learning, 2010.
[55] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu and Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[56] M. Soltanolkotabi, E. Elhamifar and E. Candès, Robust subspace clustering, Annals of Statistics, 2014.
[57] C. Qu and H. Xu, Subspace clustering with irrelevant features via robust Dantzig selector, Advances in Neural Information Processing Systems, 2015.
[58] X. Peng, Z. Yi and H. Tang, Robust subspace clustering via thresholding ridge regression, AAAI Conference on Artificial Intelligence, 2015.
[59] Y. Wang and H. Xu, Noisy sparse subspace clustering, International Conference on Machine Learning, 2013.
[60] Y. Wang, Y. Wang and A.
Singh, Differentially private subspace clustering, Advances in Neural Information Processing Systems, 2015.
[61] H. Hu, J. Feng and J. Zhou, Exploiting unsupervised and supervised constraints for subspace clustering, IEEE Pattern Analysis and Machine Intelligence, 2015.
[62] E. Elhamifar and R. Vidal, Sparse subspace clustering: algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[63] R. Basri and D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.
[64] A. Georghiades, P. Belhumeur and D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
[65] K. Toyama, J. Krumm, B. Brumitt and B. Meyers, Wallflower: principles and practice of background maintenance, IEEE International Conference on Computer Vision, 1999. Dataset available at: http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.htm
[66] L. Li, W. Huang, I. Gu and Q. Tian, Statistical modeling of complex backgrounds for foreground object detection, IEEE Transactions on Image Processing, 2004. Dataset available at: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
[67] A. Dempster, N. Laird and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 1977.
[68] M. Tipping and C. Bishop, Mixtures of probabilistic principal component analysers, Neural Computation, 1999.
[69] X. Yi, C. Caramanis and S. Sanghavi, Alternating minimization for mixed linear regression, International Conference on Machine Learning, 2014.