{"title": "When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?", "book": "Advances in Neural Information Processing Systems", "page_first": 1141, "page_last": 1148, "abstract": "", "full_text": "When Does Non-Negative Matrix Factorization\n\nGive a Correct Decomposition into Parts?\n\nDavid Donoho\n\nDepartment of Statistics\n\nStanford University\nStanford, CA 94305\n\nVictoria Stodden\n\nDepartment of Statistics\n\nStanford University\nStanford, CA 94305\n\ndonoho@stat.stanford.edu\n\nvcs@stat.stanford.edu\n\nAbstract\n\nWe interpret non-negative matrix factorization geometrically, as the\nproblem of \ufb01nding a simplicial cone which contains a cloud of data\npoints and which is contained in the positive orthant. We show that under\ncertain conditions, basically requiring that some of the data are spread\nacross the faces of the positive orthant, there is a unique such simpli-\ncial cone. We give examples of synthetic image articulation databases\nwhich obey these conditions; these require separated support and facto-\nrial sampling. For such databases there is a generative model in terms\nof \u2018parts\u2019 and NMF correctly identi\ufb01es the \u2018parts\u2019. We show that our\ntheoretical results are predictive of the performance of published NMF\ncode, by running the published algorithms on one of our synthetic image\narticulation databases.\n\n1\n\nIntroduction\n\nIn a recent article in Nature [4], Lee and Seung proposed the notion of non-negative matrix\nfactorization (NMF) as a way to \ufb01nd a set of basis functions for representing non-negative\ndata. They claimed that the notion is particularly applicable to image articulation libraries\nmade up of images showing a composite object in many articulations and poses. 
They\nsuggested (in the very title of the article) that when used in the analysis of such data, NMF\nwould \ufb01nd the intrinsic \u2018parts\u2019 underlying the object being pictured.\n\nNMF is akin to other matrix decompositions which have been proposed previously, such\nas positive matrix factorization (PMF) of Juvela, Lehtinen, and Paatero [2], [3] and various\nminimum-volume transforms used in the analysis of remote-sensing data [1]. Numerous\napplications of these methods have been attempted [6], [7], [9].\n\nDespite all the literature and discussion of this method, two fundamental questions appear\nnot to have been posed clearly, let alone answered:\n\n\u2022 Under what assumptions is the notion of non-negative matrix factorization well-de\ufb01ned,\nfor example is the factorization in some sense unique?\n\n\u2022 Under what assumptions is the factorization correct, recovering the \u2018right answer\u2019?\n\nIn this paper, we develop a geometric view of the setting underlying NMF factorization\nand derive geometric conditions under which the factorization is essentially unique, so\nNMF makes sense no matter what algorithm is being employed. We then consider those\nconditions in the setting of image articulation libraries. We describe a class of image\nlibraries which are created by an NMF-style generative model, where different parts have\nseparate support, and where all different combinations of parts are exhaustively sampled.\nOur theory shows that, in such Separable Factorial Articulation Families, non-negative\nfactorization is effectively unique. In such libraries, NMF will indeed successfully \u2018\ufb01nd the\nparts\u2019. We construct such a library, showing a stick \ufb01gure with four limbs going through a\nrange of various motions, and verify that our theoretical analysis is predictive of the actual\nperformance of the Lee and Seung algorithm on this image library. 
Our viewpoint also\nexplains relations between NMF and other ideas for obtaining non-negative factorizations\nand explains why uniqueness and stability may fail under other conditions.\n\nWe note that Plumbley [5] has in some sense already validated NMF for datasets which\nare not only non-negative but which obey an independent components model. However,\nin our view, this is actually a result about independent components analysis, not NMF. For\nexample, for the kinds of image articulation families where each part is viewed in one of\nmany positions, the underlying exclusion principle \u2013 that a certain part can only be present\nin one particular articulation \u2013 guarantees that an ICA model does not apply. And this\nparts-based setting is exactly the setting for NMF envisioned by Seung and Lee.\n\n2 Non-Negative Matrix Factorization\n\nNMF seeks to decompose a non-negative n \u00d7 p matrix X, where each row contains the p\npixel values for one of the n images, into\n\nX = A\u03a8\n\n(1)\n\nwhere A is n \u00d7 r and \u03a8 is r \u00d7 p, and both A and \u03a8 have non-negative entries. The rows of\n\u03a8, denoted (\u03c8_j)_{j=1}^{r}, are basis elements in R^p and the rows of A, (\u03b1^i)_{i=1}^{n}, belong to R^r and\ncan be thought of as coef\ufb01cient sequences representing the images in that basis. Recalling\nthat the rows of X, (x_i), are individual images stored as row vectors, the representation\ntakes the form\n\nx_i = \u2211_{j=1}^{r} \u03b1^i_j \u03c8_j.\n\nIndexing the pixels by k = 1, . . . , p, non-negativity of \u03b1^i and \u03c8_j can be written as:\n\n\u03c8_j(k) \u2265 0, j = 1, . . . , r, k = 1, . . . , p; \u03b1^i_j \u2265 0, j = 1, . . . , r, i = 1, . . . 
, n.\n\n(2)\n\nIt is clear that as a generative model, this approach makes sense; each of us can think of\nsome admittedly very simple imaging settings where the scene is composed out of \u2018standard\nparts\u2019 in a variety of positions, where these are represented by the \u03c8j and each image is\nmade by superposing some of those \u2018parts\u2019. In this setting each part is either present or\nabsent, and the corresponding coef\ufb01cient is thus positive or zero. An example of this kind\nwill be given in Section 4 below.\n\nWhat is less clear is whether, when the generative model actually holds and we generate a\nsynthetic dataset based on that model, the NMF matrix factorization of the dataset will yield\nunderlying basis elements which have some connection to the true generative elements. In\nthis paper we investigate this question and exhibit conditions under which NMF will in fact\nsuccessfully recover the true generative elements.\n\n\f3 Geometric Interpretation of the NMF Setting\n\nWe now describe a geometric viewpoint which will help explain the issues involved.\n\nEach image in our database of images can be thought of as a point in a p-dimensional space,\nwhose p coordinates are given by the intensity values in each of the p pixels. The fact that\nimage data are non-negative means that every such point lies in the positive orthant P of\nRp.\nThe factorization X = A\u03a8 says that there are vectors \u03c8j in Rp such that all the data points\nxi have a representation as non-negative linear combinations of the \u03c8j. This algebraic\ncharacterization has a geometric counterpart.\nDe\ufb01nition. 
The simplicial cone generated by vectors \u03a6 = (\u03c6_j)_{j=1}^{r} is\n\n\u0393 = \u0393_\u03a6 = {x : x = \u2211_j \u03b1_j \u03c6_j, \u03b1_j \u2265 0}.\n\nThe factorization (1) tells us geometrically that the (x_i) all lie in the simplicial cone \u0393_\u03a8\ngenerated by the (\u03c8_j).\n\nNow in general, for a given dataset (x_i), there will be many possible simplicial cones\ncontaining the points in that dataset. Indeed, if \u0393_\u03a8 is a simplicial cone containing the data,\nand \u0393_\u03a6 is another cone containing the \ufb01rst, so that\n\u0393_\u03a8 \u2282 \u0393_\u03a6,\n\nthen the corresponding vectors \u03a6 = (\u03c6_j) also can furnish a representation of the dataset\n(x_i). Now for any simplicial cone, there can always be another cone containing it strictly,\nso there are an in\ufb01nite number of factorizations X = A\u03a8 with non-negative A, and various\n\u03a8 which are nontrivially different. Hence the constraint A \u2265 0 is not enough to lead to a\nwell-de\ufb01ned notion.\n\nHowever, the geometric viewpoint we are developing does not so far include the positivity\nconstraint \u03a8 \u2265 0 on the generating vectors of the simplicial cone. Geometrically, this\nconstraint demands that the simplicial cone \u0393_\u03a8 lies inside the positive orthant P. Can we\nobtain uniqueness with this extra constraint?\n\nNot if the data values are strictly positive, so that\n\nX_{i,k} \u2265 \u03b5 > 0 \u2200 i, k.\n\n(3)\nGeometrically, this condition places the data points x_i well inside the interior of the positive\northant P. It is then evident by visual inspection that there will be many simplicial cones\ncontaining the data. For example, P itself is a simplicial cone, and it contains the data\npoints. However, many other cones will also contain the data points. 
Indeed, for \u03b4 > 0\nconsider the collection of vectors \u03a6_\u03b4 with individual vectors\n\n\u03c6^\u03b4_j = e_j + \u03b4 1\n\nwhere e_j denotes the usual vector in the standard basis, and 1 denotes the vector of all\nones. Then, for \u03b4 < \u03b5, the cone \u0393_{\u03a6_\u03b4} also contains all the data points. Geometrically \u0393_{\u03a6_\u03b4}\nis a dilation of the positive orthant that shrinks it slightly towards the main diagonal. Since\nthe positivity constraint (3) places all the data well inside the interior of the positive orthant,\nfor slight enough shrinkage it will still contain the data.\n\nIt follows from the geometric-algebraic correspondence that under the strict positivity\ncondition (3), there are many distinct representations X = A\u03a8 where A \u2265 0 and \u03a8 \u2265 0.\nIn short, we must look for situations where the data do not obey strict positivity in order to\nhave uniqueness.\n\n4 An Example of Uniqueness\n\nWhen we take the non-negativity constraint on the generating elements (the extreme rays\nof the simplicial cone) into account, it can happen that there will only be one simplicial\ncone containing the data. This is completely clear if the data somehow \u2018\ufb01ll out\u2019 the positive\northant. What is perhaps surprising is that uniqueness can hold even when the data only\n\u2018\ufb01ll out\u2019 a proper subset of the positive orthant.\n\nHere is an example of how that can occur. Consider the \u2018ice-cream cone\u2019\n\nC = {x : x'1 \u2265 \u221a(p \u2212 1) ||x||}\n\nwhere p is again the dimensionality of the dataspace.\nLemma 1. There is a unique simplicial cone which both contains C and is itself contained\nin the positive orthant.\n\nIndeed that unique cone is P itself; no other simplicial cone contained inside P contains all of\nC!\n\nTo give a full proof, we introduce notions from the subject of convex duality [8]. 
Associated\nwith the primal domain of points x we have been dealing with so far, there is also the dual\ndomain of linear functionals \u03be acting on points x via \u03be'x. If we have a convex set C, its\ndual C\u2217 is de\ufb01ned as the collection of linear functionals which are non-negative on C:\n\nC\u2217 = {\u03be : \u03be'x \u2265 0 \u2200x \u2208 C}\n\nThe following facts are easily veri\ufb01ed:\nLemma 2.\n\n\u2022 If K is closed and convex then (K\u2217)\u2217 = K.\n\u2022 The dual of a simplicial cone with p linearly independent generators is another\nsimplicial cone with p generators.\n\u2022 The positive orthant is self-dual: P\u2217 = P.\n\u2022 Duality reverses set inclusion:\n\nB \u2282 C \u21d2 C\u2217 \u2282 B\u2217.\n\n(4)\n\nWe also need\nDe\ufb01nition. Given a pointset (x_i), its conical hull is the simplicial hull generated by the\nvectors (x_i) themselves.\nLet X be the conical hull of a pointset. An abstraction of the NMF problem is:\nPrimal-Simplicial-Cone(r, X) Find a simplicial cone with r generators contained in P\nand containing X.\nConsider now a problem in the dual domain, posed with reversed inclusions:\nDual-Simplicial-Cone(r, \u039e) Find a simplicial cone with r generators contained in \u039e and\ncontaining P.\nThe two problems are indeed dual:\nLemma 3. Every solution to Primal-Simplicial-Cone(r, X) is dual to a solution of\nDual-Simplicial-Cone(r, X\u2217), and vice-versa.\n\nProof. This is effectively the invocation of \u2018reversal of inclusion under duality\u2019 (4). Suppose\nwe \ufb01nd a simplicial cone \u0393 obeying\n\nX \u2282 \u0393 \u2282 P.\n\nThen (4) says that\n\nP\u2217 \u2282 \u0393\u2217 \u2282 X\u2217,\n\nand so a solution to the primal solves the dual. 
In the other direction, if we \ufb01nd a simplicial\ncone \u0393\u2217 obeying\n\nP\u2217 \u2282 \u0393\u2217 \u2282 X\u2217\n\nthen we have by (4)\n\n(X\u2217)\u2217 \u2282 (\u0393\u2217)\u2217 \u2282 (P\u2217)\u2217;\n\nwe simply apply (K\u2217)\u2217 = K three times to see that a solution to the dual corresponds to a\nsolution to the primal. QED\n\nOur motivation in introducing duality is to see something we couldn\u2019t in the primal: we\ncan see that even if X is properly contained in P, there can be a unique simplicial hull for\nX which lies inside P.\nThis follows from a simple observation about simplicial cones contained in convex cones.\nDe\ufb01nition. An extreme ray of a convex cone \u0393 is a ray R_x = {ax : a \u2265 0} where\nx \u2208 \u0393 cannot be represented as a proper convex combination of two points x_0 and x_1\nwhich belong to \u0393 but not R_x.\nFor example, a simplicial cone with r linearly independent generators has r extreme rays;\neach ray consists of all positive multiples of one generator.\nLemma 4. Suppose that \u0393 and G are convex cones, that \u0393 \u2282 G \u2282 R^r, that \u0393 is a simplicial\ncone with r generators and that G intersects \u0393 in exactly r rays which are extreme rays of G.\nThen (a) these rays are also extreme rays of \u0393 and (b) no simplicial cone with r generators\n\u0393' \u2260 \u0393 can satisfy \u0393 \u2282 \u0393' \u2282 G.\nProof. (a) Since the rays in question are extreme rays of G, which contains \u0393, they are also\nextreme rays of \u0393. (b) Any simplicial cone \u0393' with r generators and lying \u2018in between\u2019 \u0393\nand G would have to also intersect G in the same r rays as \u0393 does. Those r rays would also\nhave to be extreme rays for \u0393', because they are extreme rays for G, which by hypothesis\ncontains \u0393'. But a simplicial cone with r generators is completely determined by its r\nextreme rays. As \u0393 and \u0393' have the same extreme rays, \u0393 = \u0393'. QED\nWe can now prove Lemma 1. 
Recall the cone C de\ufb01ned above. Its dual is\n\nC\u2217 = {\u03be : \u03be'1 \u2265 ||\u03be||}\n\nNote (a) that every boundary ray of C\u2217 is extreme; and (b) that C\u2217 intersects P\u2217 on the\np unit vectors e_j. So by Lemma 4, P\u2217 uniquely solves the Dual-Simplicial-Cone(p, C\u2217)\nproblem and P solves the Primal-Simplicial-Cone(p, C) problem uniquely. QED.\n\n5 Uniqueness for Separable Factorial Articulation Families\n\nWe now describe families of articulated images which have at least a few \u2018realistic\u2019 features,\nand which, because of the relevant convex geometry, offer an essentially unique NMF.\n\nThe families of images we have in mind consist of black-and-white images with P parts,\neach exercised systematically through A articulations. As an illustration, Figure 1 shows\nsome sample images from the Swimmer dataset, which depicts a \ufb01gure with four moving\nparts (limbs), each able to exhibit four articulations (different positions).\n\nDe\ufb01nition. A Separable Factorial Articulation Family is a collection X of points x\nobeying these rules:\n\n[R1] Generative Model. Each image x in the database has a representation\n\nx = \u2211_{q=1}^{P} \u2211_{a=1}^{A} \u03b1_{q,a} \u03c8_{q,a}\n\nwhere the generators \u03c8_{q,a} \u2208 R^p obey the non-negativity constraint \u03c8_{q,a} \u2265 0\nalong with the coef\ufb01cients \u03b1_{q,a} \u2265 0. We speak of \u03c8_{q,a} as the q-th part in the\na-th articulation.\n\n[R2] Separability. For each q, a there exists a pixel k_{q,a} such that\n\n\u03c8_{q',a'}(k_{q,a}) = 1{a=a', q=q'}\n\n(5)\n\nI.e. each part/articulation pair\u2019s presence or absence in the image is indicated by\na certain pixel associated to that pair.\n\n[R3] Complete Factorial Sampling. 
The dataset contains all A^P images in which the P\nparts appear in all combinations of A articulations.\n\nFigure 1: Sample images from the Swimmer database depicting four stick \ufb01gures with four\nlimbs; the panels illustrate different articulations of the limbs.\n\nThe Swimmer dataset obeys these rules except for one disagreement: every image contains\nan invariant region (the torso). As it turns out this is of small importance.\nWe note that assumption [R2] forces the generators \u03c8_{q,a} to be linearly independent, which\nforces p \u2265 A \u00b7 P. Consequently, the linear span of the generators is some subspace V \u2282 R^p.\nTheorem 1. Given a database obeying rules [R1]-[R3], there is a unique simplicial hull\nwith r = A \u00b7 P generators which contains all the points of the database, and is contained\nin P \u2229 V.\nSince the generative model [R1] implies that a particular simplicial hull with a speci\ufb01c\nchoice of r generators contains the dataset, and a successful application of NMF also gives\na simplicial hull with r generators containing the dataset, and the theorem says these must\nbe the same hull, in this setting NMF recovers the generative model. Formally,\n\nCorollary. Let X be generated by rules [R1]-[R3]. Any factorization obeying (1) and (2)\nmust recover the correct generators (\u03c8_{q,a}) modulo permutation of labels and rescaling.\n\n6 Proof of Theorem 1.\n\nWe need to introduce the notion of duality relative to a vector space V \u2282 R^p. In the case\nof V \u2261 R^p this is just the notion of duality already introduced. Suppose that we have a set\nK \u2282 V; its relative dual K^v is the set of linear functionals \u03be which, viewed as members of\nR^p, also belong to V, and which obey \u03be'x \u2265 0 for x \u2208 K. In effect, the relative dual is the\nordinary dual taken within V rather than R^p. As a result, all the properties of Lemma 2 hold\nfor relative duality provided we talk about sets which are subsets of V; e.g. 
(K^v)^v = K if\nK is a closed convex subset of V.\nDe\ufb01ne P_V = V \u2229 P; this is a simplicial cone in V with r generators.\nLet again X denote the conical hull of the dataset (x_i) and suppose that every (r\u22121)-dimensional\nface of P_V contains r \u2212 1 linearly independent points from the dataset. Since the face of a cone\nis a linear subspace, the face is uniquely determined by these r \u2212 1 points. The face is\npart of a supporting hyperplane to P_V which is also a supporting hyperplane to X. The\nsupporting hyperplane de\ufb01nes a point \u03be \u2208 V which is in common between the duals P^v_V\nand X^v. Similar statements hold for all the r different (r\u22121)-faces of P_V. But more is true.\nBecause of the linear independence mentioned above, the different supporting hyperplanes\nin primal space correspond in fact to extreme rays in dual space \u2013 extreme rays for both\nP^v_V and X^v. As this is true for all r of the (r \u2212 1)-dimensional faces, we are in a position\nto apply Lemma 4 with G = X^v and \u0393 = P^v_V. This gives the conclusion that P^v_V is the\nunique simplicial cone with r generators contained in X^v and containing P^v_V. Theorem 1\nthen follows by duality.\n\nIt remains to establish the assumption about existence of r \u2212 1 linearly independent points\non each (r \u2212 1)-face. The faces of P_V are exactly the r different subspaces\n\nF_{q,a} = {x \u2208 V : \u03b1_{q,a} = 0}.\n\nBy the Complete Factorial Sampling assumption [R3], there are A^{P\u22121}(A \u2212 1) points of X\nin such a face. De\ufb01ne, for each (q', a') \u2260 (q, a),\n\n\u03c6_{q',a';q,a} = Ave{x \u2208 X : \u03b1_{q,a} = 0, \u03b1_{q',a'} = 1}.\n\nThere are r \u2212 1 such terms, one for each part/articulation pair besides (q, a). By the\nSeparability assumption [R2]:\n\n\u03c6_{q',a';q,a}(k_{q'',a''}) = 1{q'=q'', a'=a''}.\n\nHence the (\u03c6_{q',a';q,a} : (q', a') \u2260 (q, a)) are linearly independent. At the same time,\n\n\u03c6_{q',a';q,a}(k_{q,a}) = 0\n\nso that each \u03c6_{q',a';q,a} \u2208 F_{q,a}. 
Hence we have the required linearly independent subset in\neach face. QED\n\n7 Empirical Veri\ufb01cation\n\nWe built the Swimmer image library of 256 32\u00d732 images. Each image contains a \u2018torso\u2019\nof 12 pixels in the center and four \u2018arms\u2019 of 6 pixels that can be in one of 4 positions. All\ncombinations of all possible arm positions give 4^4 = 256 images. See Figure 1 for examples.\n\nThis collection of images has four \u2018parts\u2019. It deviates slightly from the rules [R1]-[R3]\nbecause there is an invariant region (the torso). Figure 2 shows that the 16 different\npart/articulation pairs are properly resolved, but that the torso is not properly resolved.\n\nFigure 2: NMF Generators recovered from Swimmer database. The 16 images shown\nagree well with the known list of generators (4 \u2018limbs\u2019 in 4 positions each). The presence\nof the torso (i.e. an invariant region) violates our conditions for a Separable Factorial\nArticulation Family, and, not unexpectedly, ghosts of the torso contaminate several of the\nreconstructed generators. Lee and Seung\u2019s code [4] was used.\n\nAcknowledgments\n\nThis work was partially supported by NSF grants DMS-0077261, DMS-0140698, and\nANI-008584 and a contract from DARPA ACMP. We would like to thank Aapo Hyv\u00e4rinen for\nnumerous helpful discussions.\n\nReferences\n\n[1] M. Craig. Minimum-volume transforms for remotely sensed data. IEEE Transactions on\nGeoscience and Remote Sensing, 32(3):542-552, May 1994.\n\n[2] M. Juvela, K. Lehtinen, and P. Paatero. The use of positive matrix factorization in the analysis\nof molecular line spectra from the Thumbprint Nebula. In D. P. Clemens and R. Barvainis, editors,\nClouds, Cores, and Low Mass Stars, volume 65 of ASP Conference Series, 176-180, 1994.\n\n[3] M. Juvela, K. Lehtinen, and P. Paatero. The use of positive matrix factorization in the analysis of\nmolecular line spectra. MNRAS, 280:616-626, 1996.\n\n[4] D. Lee and S. Seung. 
Learning the parts of objects by non-negative matrix factorization. Nature,\n401:788-791, 1999.\n\n[5] M. Plumbley. Conditions for nonnegative independent components analysis. IEEE Signal Processing\nLetters, 9(6):177-180, 2002.\n\n[6] A. Polissar, P. Hopke, W. Malm, and J. Sisler. Atmospheric aerosol over Alaska 1. Spatial and\nseasonal variability. Journal of Geophysical Research, 103(D15):19035-19044, August 1998.\n\n[7] A. Polissar, P. Hopke, W. Malm, and J. Sisler. Atmospheric aerosol over Alaska 2. Elemental\ncomposition and sources. Journal of Geophysical Research, 103(D15):19045-19057, August 1998.\n\n[8] R. T. Rockafellar. Convex Analysis, Princeton University Press, 1970.\n\n[9] W. Size. Use and Abuse of Statistical Methods in the Earth Sciences, chapter 3, pages 33-46.\nOxford University Press, 1987.\n\n", "award": [], "sourceid": 2463, "authors": [{"given_name": "David", "family_name": "Donoho", "institution": null}, {"given_name": "Victoria", "family_name": "Stodden", "institution": null}]}