{"title": "Natural Images, Gaussian Mixtures and Dead Leaves", "book": "Advances in Neural Information Processing Systems", "page_first": 1736, "page_last": 1744, "abstract": "Simple Gaussian Mixture Models (GMMs) learned from pixels of natural image patches have been recently shown to be surprisingly strong performers in modeling the statistics of natural images. Here we provide an in depth analysis of this simple yet rich model. We show that such a GMM model is able to compete with even the most successful models of natural images in log likelihood scores, denoising performance and sample quality. We provide an analysis of what such a model learns from natural images as a function of number of mixture components --- including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models.", "full_text": "Natural Images, Gaussian Mixtures and Dead Leaves \n\nDaniel Zoran \n\nYair Weiss \n\nInterdisciplinary Center for Neural Computation \n\nSchool of Computer Science and Engineering \n\nHebrew University of Jerusalem \n\nIsrael \n\nHebrew University of Jerusalem \n\nIsrael \n\nhttp : //www . cs . hu j i . ac .i l/ daniez \n\nyweiss@cs . huj i. ac . i l \n\nAbstract \n\nSimple Gaussian Mixture Models (GMMs) learned from pixels of natural image \npatches have been recently shown to be surprisingly strong performers in modeling \nthe statistics of natural images. Here we provide an in depth analysis of this simple \nyet rich model. We show that such a GMM model is able to compete with even \nthe most successful models of natural images in log likelihood scores, denoising \nperformance and sample quality. 
We provide an analysis of what such a model learns from natural images as a function of the number of mixture components - including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models. \n\n1 GMMs and natural image statistics models \n\nMany models for the statistics of natural image patches have been suggested in recent years. Finding good models for natural images is important to many different research areas - computer vision, biological vision and neuroscience among others. Recently, there has been a growing interest in comparing different aspects of models for natural images, such as log-likelihood and multi-information reduction performance, and much progress has been achieved [1, 2, 3, 4, 5, 6]. Out of these results there is one which is particularly interesting: simple, unconstrained Gaussian Mixture Models (GMMs) with a relatively small number of mixture components learned from image patches are extraordinarily good at modeling image statistics [6, 4]. This is a surprising result due to the simplicity of GMMs and their ubiquity. Another surprising aspect of this result is that many of the current models may be thought of as GMMs with an exponential or infinite number of components, having different constraints on the covariance structure of the mixture components. \n\nIn this work we study the nature of GMMs learned from natural image patches. We start with a thorough comparison to some popular and cutting edge image models. We show that indeed, GMMs are excellent performers in modeling natural image patches. 
We then analyze what properties of natural images these GMMs capture, their dependence on the number of components in the mixture and their relation to the structure of the world around us. Finally, we show that the learned GMM suggests a strong connection between natural image statistics and a simple variant of the dead leaves model [7, 8], explicitly modeling occlusions and explaining some of the success of GMMs in modeling natural images. \n\n[Figure 1: bar plots comparing the models listed in Section 2; panels: (a) Log Likelihood, (b) Denoising] \n\nFigure 1: (a) Log likelihood comparison - note how the GMM is able to outperform (or equal) all other models despite its simplicity. (b) Denoising performance comparison - the GMM outperforms all other models here as well, and denoising performance is more or less consistent with likelihood performance. See text for more details. \n\n2 Natural image statistics models - a comparison \n\nAs a motivation for this work, we start by rigorously comparing current models for natural images with GMMs. While some comparisons have been reported before with a limited number of components in the GMM [6], we want to compare to state-of-the-art models while also varying the number of components systematically. \n\nEach model was trained on 8 x 8 or 16 x 16 patches randomly sampled from the Berkeley Segmentation Database training images (a data set of millions of patches). 
The DC component of all patches was removed, and we discard it in all calculations. In all experiments, evaluation was done on the same, unseen test set of 1000 patches sampled from the Berkeley test images. We removed patches having standard deviation below 0.002 (intensity values are between 0 and 1) as these are totally flat patches due to saturation and contain no structure (only 8 patches were removed from the test set). We do not perform any further preprocessing. The models we compare are: White Gaussian Noise (Ind. G), PCA/Gaussian (PCA G), PCA/Laplace (PCA L), ICA (ICA) [9, 10, 11], 2x overcomplete sparse coding (2xOCSC) [9], Gaussian Scale Mixture (GSM), Mixture of Gaussian Scale Mixtures (MoGSM) [6], Karklin and Lewicki (KL) [12] and the GMM (with 200 components). \n\nWe compare the models using three criteria - log likelihood on unseen data, denoising results on unseen data and visual quality of samples from each model. The complete details of training, testing and comparisons may be found in the supplementary material of this paper - we encourage the reader to read these details. All models and code are available online at: www.cs.huji.ac.il/~daniez \n\nLog likelihood The first experiment we conduct is a log likelihood comparison. For most of the models above, a closed form calculation of the likelihood is possible, but for the 2xOCSC and KL models, we resort to Hamiltonian Annealed Importance Sampling (HAIS) [13]. HAIS allows us to estimate likelihoods for these models accurately, and we have verified that the approximation given by HAIS is relatively accurate in cases where exact calculations are feasible (see supplementary material for details). The results of the experiment may be seen in Figure 1a. There are several interesting results in this figure. First, the important thing to note here is that the GMM outperforms or equals all of the other models, and is similar in performance to Karklin and Lewicki. 
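For the models with a closed form likelihood, the per-patch log likelihood under a GMM is a log-sum-exp over the component Gaussians. The following is a minimal numpy sketch of that computation (our own illustration, not the paper's code; the function name and toy parameters are ours), using the log-sum-exp trick for numerical stability:

```python
import numpy as np

def gmm_loglik(X, weights, means, covs):
    """Per-sample log p(x) under a Gaussian mixture with full covariances.

    X: (n, d) data; weights: (K,); means: (K, d); covs: (K, d, d).
    """
    n, d = X.shape
    comp = np.empty((len(weights), n))
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        diff = X - mu
        _, logdet = np.linalg.slogdet(S)
        # Mahalanobis distance of each sample under component k
        maha = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(S), diff)
        comp[k] = np.log(w) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    m = comp.max(axis=0)                     # log-sum-exp over components
    return m + np.log(np.exp(comp - m).sum(axis=0))

# sanity check: a 1-component "mixture" must equal the Gaussian log-density
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
ll = gmm_loglik(X, np.array([1.0]), np.zeros((1, 2)), np.eye(2)[None])
expected = -0.5 * (2 * np.log(2 * np.pi) + (X ** 2).sum(axis=1))
assert np.allclose(ll, expected)
```

In practice the per-component terms are computed once per covariance (e.g. via a Cholesky factorization) rather than inverting inside the loop; the sketch keeps the naive form for clarity.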
In [6] a GMM with far fewer components (2-5) has been compared to some other models (notably Restricted Boltzmann Machines, which the GMM outperforms, and MoGSMs, which slightly outperform the GMMs in that work). Second, ICA with its learned Gabor like filters [10] gives a very minor improvement when compared to PCA filters with the same marginals. This has been noted before in [1]. Finally, overcomplete sparse coding is actually a bit worse than complete sparse coding - while this is counter intuitive, this result has been reported before as well [14, 2]. \n\nDenoising We compare the denoising performance of the different models. We added independent white Gaussian noise with known standard deviation σ_n = 25/255 to each of the patches x in the test set. We then calculate the MAP estimate x̂ of each model given the noisy patch. This can be done in closed form for some of the models, and for those models where the MAP estimate does not have a closed form, we resort to numerical approximation (see supplementary material for more details). The performance of each model was measured using Peak Signal to Noise Ratio (PSNR): PSNR = 10 log10(1 / ||x - x̂||^2). Results can be seen in Figure 1b. Again, the GMM performs extraordinarily well, outperforming all other models. As can be seen, results are consistent with the log likelihood experiment - models with better likelihood tend to perform better in denoising [4]. \n\nSample Quality As opposed to log likelihood and denoising, generating samples from all the models compared here is easy. While it is more of a subjective measure, the visual quality of samples may be an indicator of how well interesting structures are captured by a model. Figure 2 depicts 16 x 16 samples from a subset of the models compared here. Note that the GMM samples capture a lot of the structure of natural images such as edges and textures, visible on the far right of the figure. 
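Sampling from a GMM is ancestral: draw a component index from the mixing weights, then draw from that component's Gaussian. A small numpy sketch (our own toy parameters, not a model fitted to image patches):

```python
import numpy as np

def sample_gmm(n, weights, means, covs, rng):
    """Ancestral sampling: pick a component per sample, then draw from its Gaussian."""
    ks = rng.choice(len(weights), size=n, p=weights)
    samples = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])
    return samples, ks

rng = np.random.default_rng(1)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])   # two well-separated components
covs = np.stack([np.eye(2), np.eye(2)])
samples, ks = sample_gmm(1000, weights, means, covs, rng)
assert samples.shape == (1000, 2)
```

For a GMM over image patches, each `means[k]`/`covs[k]` would be a d-dimensional mean and d x d covariance (d = 64 for 8 x 8 patches), and each drawn vector is reshaped into a patch.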
\nThe Karklin and Lewicki model produces rather structured patches as well. GSM seems to capture \nthe contrast variation of images, but the patches themselves have very little structure (similar results \nobtained with MoGSM, not shown). PCA lacks any meaningful structure, other than 1/ f power \nspectrum. \n\nAs can be seen in the results we have just presented, the GMM is a very strong performer in modeling \nnatural image patches. While we are not claiming Gaussian Mixtures are the best models for natural \nimages, we do think this is an interesting result, and as we shall see later, it relates intimately to the \nstructure of natural images. \n\n3 Analysis of results \n\nSo far we have seen that despite their simplicity, GMMs are very capable models for natural images. \nWe now ask - what do these models learn about natural images, and how does this affect their \nperformance? \n\n3.1 How many mixture components do we need? \n\nWhile we try to learn our GMMs with as few a priori assumptions as possible, we do need to set \none important parameter - the number of components in the mixture. As noted above, many of the \ncurrent models of natural images can be written in the form of GMMs with an exponential or infinite \nnumber of components and different kinds of constraints on the covariance structure. Given this, \nit is quite surprising that a GMM with a relatively small number of component (as above) is able \nto compete with these models. Here we again evaluate the GMM as in the previous section but \nnow systematically vary the number of components and the size of the image patch. Results for the \n16 x 16 model are shown in figure 3, see supplementary material for other patch sizes. \n\nAs can be seen, moving from one component to two already gives a tremendous boost in performance, \nalready outperforming lCA but still not enough to outperform GSM, which is outperformed at around \n16 components. 
As we add more and more components to the mixture, performance increases, but it seems to be converging to some upper bound (which is not reached here; see the supplementary material for smaller patch sizes, where it is reached). This shows that a small number of components is indeed [...] \n\n[Figure 2: sample patches from PCA G, GSM, KL, GMM and natural images] \n\nFigure 2: Samples generated from some of the models compared in this work. PCA G produces no structure other than the 1/f power spectrum. GSM captures the contrast variation of image patches nicely, but the patches themselves have no structure. The GMM and KL models produce quite structured patches - compare with the natural image samples on the right. \n\n[Figure 3: log likelihood and denoising performance (PSNR) of the 16 x 16 GMM as a function of log10(Num Components); panels: (a) Log Likelihood, (b) Denoising] \n\n[...] one with µ1 and the other with µ2. Figure 7a depicts the generative process for both kinds of patches and Figure 7b depicts samples from the model. \n\n4.3 Gaussian mixtures and dead leaves \n\nIt can be easily seen that the mini dead leaves model is, in fact, a GMM. For each configuration of the hidden variables (denoting whether the patch is \"flat\" or \"edge\", the scalar multiplier z and, if it is an edge patch, the second scalar multiplier z2, r and θ), we have a Gaussian for which we know the covariance matrix exactly. Together, all configurations form a GMM - the interesting thing here is how the structure of the covariance matrix given the hidden variable relates to natural images. 
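The generative process just described can be sketched as follows. This is our own toy rendition under stated assumptions: plain white noise stands in for the stationary texture process, contrast multipliers are log-normal, and the edge mask is a straight line parametrized by an offset r and orientation θ, following the hidden variables named in the text:

```python
import numpy as np

def sample_patch(d, rng, p_edge=0.5):
    """One patch from a toy mini-dead-leaves sketch.

    'Flat' patch: contrast-scaled noise texture.
    'Edge' patch: two independently scaled textures joined along a random line.
    """
    z = np.exp(rng.normal())                    # contrast multiplier
    if rng.random() >= p_edge:                  # "flat" patch
        return z * rng.normal(size=(d, d))
    theta = rng.uniform(0, np.pi)               # edge orientation
    r = rng.uniform(-d / 4, d / 4)              # edge offset from patch centre
    y, x = np.mgrid[:d, :d] - (d - 1) / 2.0
    mask = (x * np.cos(theta) + y * np.sin(theta)) > r   # occlusion mask
    z2 = np.exp(rng.normal())                   # occluding object's contrast
    patch = z * rng.normal(size=(d, d))
    patch[mask] = z2 * rng.normal(size=int(mask.sum()))
    return patch

rng = np.random.default_rng(2)
patches = np.stack([sample_patch(8, rng) for _ in range(100)])
assert patches.shape == (100, 8, 8)
```

Conditioned on the hidden variables (flat/edge, z, z2, r, θ) each patch is Gaussian, so marginalizing over them yields exactly the mixture-of-Gaussians structure discussed above.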
\n\nFor Flat patches, the covariance is trivial- it is merely the texture of the stationary texture process L; \nmultiplied by the corresponding contrast scalar z. Since we require the texture to be stationary its \neigenvectors are the Fourier basis vectors [18] (up to boundary effects), much like the ones visible in \nthe first two components in Figure 5. \n\nFor Edge patches, given the hidden variable we know which pixel belongs to which \"object\" in the \npatch, that is, we know the shape of the occlusion mask exactly. If i and j are two pixels in different \nobjects, we know they will be independent, and as such uncorrelated, resulting in zero entries in the \ncovariance matrix. Thus, if we arrange the pixels by their object assignment, the eigenvectors of such \na covariance matrix would be of the form: \n\nwhere v is an eigenvector of the stationary (within-object) covariance and the rest of the entries are \nzeros, thus eigenvectors of the covariance will be zero on one side of the occlusion mask and Fourier(cid:173)\nlike on the other side. Figure 7c depicts the eigenvector of such an edge component covariance - note \nthe similar structure to Figure 7d and 5. This block structure is a common structure in the GMM \nlearned from natural images, showing that indeed such a dead leaves model is consistent with what \nwe find in GMMs learned on natural images. \n\n7 \n\n\f(a) Log Likelihood Comparison \n\n(b) Mini Dead Leaves - ICA \n\n(c) Natural Images - ICA \n\nFigure 8: (a) Log likelihood comparison with mini dead leaves data. We train a GMM with a varying \nnumber of components from mini dead leaves samples, and test its likelihood on a test set. We \ncompare to a PCA, ICA and a GSM model, all trained on mini dead leaves samples - as can be seen, \nthe GMM outperforms these considerably. Both PCA and ICA seek linear transformations, but since \nthe underlying generative process is non-linear (see Figure 7a), they fail. 
The GSM captures the contrast variation of the data, but does not capture occlusions, which are an important part of this model. (b) and (c) ICA filters learned on mini dead leaves and natural image patches respectively; note the high similarity. \n\n4.4 From mini dead leaves to natural images \n\nWe repeat the log likelihood experiment from Sections 2 and 3, comparing PCA, ICA and GSM models to GMMs. This time, however, both the training set and test set are generated from the mini dead leaves model. Results can be seen in Figure 8a. Both ICA and PCA do the best job that they can in terms of finding linear projections that decorrelate the data (or make it as sparse as possible). But because the true generative process for the mini dead leaves is not a linear transformation of IID variables, neither of these does a very good job in terms of log likelihood. Interestingly, ICA filters learned on mini dead leaves samples are astonishingly similar to those obtained when trained on natural images - see Figures 8b and 8c. The GSM model can capture the contrast variation of the data easily, but not the structure due to occlusion. A GMM with enough components, on the other hand, is capable of explicitly modeling contrast and occlusion using covariance functions such as in Figure 7c, and thus gives much better log likelihood to the dead leaves data. This exact same pattern of results can be seen in natural image patches (Figure 2), suggesting that the main reason for the excellent performance of GMMs on natural image patches is their ability to model both contrast and occlusions. \n\n5 Discussion \n\nIn this paper we have provided some additional evidence for the surprising success of GMMs in modeling natural images. We have investigated the causes for this success and the different properties of natural images which are captured by the model. 
We have also presented an analytical generative model for image patches which explains many of the features learned by the GMM from natural images, as well as the shortcomings of other models. \n\nOne may ask - is the mini dead leaves model a good model for natural images? Does it explain everything learned by the GMM? While the mini dead leaves model definitely explains some of the properties learned by the GMM, in the simple form presented here it is not a much better model than a simple GSM model. When adding the occlusion process into the model, the mini dead leaves gains about 0.1 bit/pixel when compared to the GSM texture process it uses on its own. This makes it as good as a 32 component GMM, but significantly worse than the 200 component model (for 8 x 8 patches). There are two possible explanations for this. One is that the GSM texture process is just not enough, and a richer texture process is needed (much like the one learned by the GMM). The second is that the simple occlusion model we use here is too simplistic, and does not allow for capturing the variable structures of occlusion present in natural images. Both of these may serve as a starting point for a more efficient and explicit model for natural images, handling occlusions and different texture processes explicitly. There have been several works in this direction already [19, 20, 21], and we feel this may hold promise for creating links to higher level visual tasks such as segmentation, recognition and more. \n\nAcknowledgments \n\nThe authors wish to thank the Charitable Gatsby Foundation and the ISF for support. \n\nReferences \n\n[1] M. Bethge, \"Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?\" vol. 23, no. 6, pp. 1253-1268, June 2006. \n\n[2] P. Berkes, R. Turner, and M. Sahani, \"On sparsity and overcompleteness in image models,\" in NIPS, 2007. \n\n[3] S. Lyu and E. P. 
Simoncelli, \"Nonlinear extraction of iindependent componentsuof natural images using \n\nradial Gaussianization,\" Neural Computation, vol. 21 , no. 6, pp. 1485-1519, Jun 2009. \n\n[4] D. Zoran and Y. Weiss, \"From learning models of natural image patches to whole image restoration,\" in \n\nComputer Vision (ICCV), 2011 IEEE International Conference on. \n\nIEEE, 2011, pp. 479-486. \n\n[5] B. Culpepper, J. Sohl-Dickstein, and B. Olshausen, \"Building a better probabilistic model of images by \n\nfactorization,\" in Computer Vision (ICCV), 20111EEE International Conference on. \n\nIEEE, 2011. \n\n[6] L. Theis, S. Gerwinn, F. Sinz, and M. Bethge, \"In all likelihood, deep belief is not enough,\" The Journal of \n\nMachine Learning Research, vol. 999888, pp. 3071-3096, 2011. \n\n[7] G. Matheron, Random sets and integral geometry. Wiley New York, 1975, vol. 1. \n[8] X. Pitkow, \"Exact feature probabilities in images with occlusion,\" Journal of Vision, vol. 10, no. 14,2010. \n[9] B. 01shausen et al., \"Emergence of simple-cell receptive field properties by learning a sparse code for \n\nnatural images,\" Nature, vol. 381, no. 6583, pp. 607-609, 1996. \n\n[10] A. J. Bell and T. J. Sejnowski, \"The independent components of natural scenes are edge filters,\" Vision \n\nResearch, vol. 37, pp. 3327-3338, 1997. \n\n[11] A. Hyvarinen and E. Oja, \"Independent component analysis: algorithms and applications,\" Neural networks, \n\nvol. 13, no. 4-5, pp. 411-430, 2000. \n\n[12] Y. Karklin and M. Lewicki, \"Emergence of complex cell properties by learning to generalize in natural \n\nscenes,\" Nature, November 2008. \n\n[13] J. Sohl-Dickstein and B. Culpepper, \"Hamiltonian annealed importance sampling for partition function \n\nestimation,\" 2011. \n\n[14] M. Lewicki and B. Olshausen, \"Probabilistic framework for the adaptation and comparison of image codes,\" \n\nJOSA A, vol. 16, no. 7, pp. 1587-1601 , 1999. \n\n[15] A. Lee, D. Mumford, and J. 
Huang, \"Occlusion models for natural images: A statistical study of a \nscale-invariant dead leaves model,\" International Journal of Computer Vision, vol. 41, no. 1, pp. 35-59, \n2001. \n\n[16] C. Zetzsche, E. Barth, and B. Wegmann, \"The importance of intrinsically two-dimensional image features \nin biological vision and picture coding,\" in Digital images and human vision. MIT Press, 1993, p. 138. \n[17] E. Simoncelli, \"Bayesian denoising of visual images in the wavelet domain,\" Lecture Notes in Statistics -\n\nNew York-Springer Verlag, pp. 291-308,1999. \n\n[18] D. Field, \"What is the goal of sensory coding?\" Neural computation, vol. 6, no. 4, pp. 559-601, 1994. \n[19] J. Lucke, R. Turner, M. Sahani, and M. Henniges, \"Occlusive components analysis,\" Advances in Neural \n\nInformation Processing Systems, vol. 22, pp. 1069-1077, 2009~ \n\n[20] G. Puertas, J. Bornschein, and 1. Lucke, \"The maximal causes of natural scenes are edge filters,\" in NIPS, \n\nvol. 23 , 2010,pp. 1939-1947. \n\n[21] N. Le Roux, N. Heess, J. Shotton, and J. Winn, \"Learning a generative model of images by factoring \n\nappearance and shape,\" Neural Computation, vol. 23, no. 3, pp. 593-650, 2011. \n\n9 \n\n\f", "award": [], "sourceid": 844, "authors": [{"given_name": "Daniel", "family_name": "Zoran", "institution": null}, {"given_name": "Yair", "family_name": "Weiss", "institution": null}]}