{"title": "Assignment of Multiplicative Mixtures in Natural Images", "book": "Advances in Neural Information Processing Systems", "page_first": 1217, "page_last": 1224, "abstract": null, "full_text": "Assignment of Multiplicative Mixtures in Natural Images

Odelia Schwartz
HHMI and Salk Institute
La Jolla, CA 92014
odelia@salk.edu

Terrence J. Sejnowski
HHMI and Salk Institute
La Jolla, CA 92014
terry@salk.edu

Peter Dayan
GCNU, UCL
17 Queen Square, London
dayan@gatsby.ucl.ac.uk

Abstract

In the analysis of natural images, Gaussian scale mixtures (GSM) have been used to account for the statistics of filter responses, and to inspire hierarchical cortical representational learning schemes. GSMs pose a critical assignment problem, working out which filter responses were generated by a common multiplicative factor. We present a new approach to solving this assignment problem through a probabilistic extension to the basic GSM, and show how to perform inference in the model using Gibbs sampling. We demonstrate the efficacy of the approach on both synthetic and image data.

Understanding the statistical structure of natural images is an important goal for visual neuroscience. Neural representations in early cortical areas decompose images (and likely other sensory inputs) in a way that is sensitive to sophisticated aspects of their probabilistic structure. This structure also plays a key role in methods for image processing and coding. A striking aspect of natural images that has reflections in both top-down and bottom-up modeling is coordination across nearby locations, scales, and orientations.
From a top-down perspective, this structure has been modeled using what is known as a Gaussian Scale Mixture model (GSM).1-3 GSMs involve a multi-dimensional Gaussian (each dimension of which captures local structure as in a linear filter), multiplied by a spatialized collection of common hidden scale variables or mixer variables* (which capture the coordination). GSMs have wide implications in theories of cortical receptive field development, eg the comprehensive bubbles framework of Hyvärinen.4 The mixer variables provide the top-down account of two bottom-up characteristics of natural image statistics, namely the 'bowtie' statistical dependency,5,6 and the fact that the marginal distributions of receptive field-like filters have high kurtosis.7,8 In hindsight, these ideas also bear a close relationship with Ruderman and Bialek's multiplicative bottom-up image analysis framework9 and statistical models for divisive gain control.6 Coordinated structure has also been addressed in other image work,10-14 and in other domains such as speech15 and finance.16

Many approaches to the unsupervised specification of representations in early cortical areas rely on the coordinated structure.17-21 The idea is to learn linear filters (eg modeling simple cells as in22,23), and then, based on the coordination, to find combinations of these (perhaps non-linearly transformed) as a way of finding higher-order filters (eg complex cells). One critical facet whose specification from data is not obvious is the neighborhood arrangement, ie which linear filters share which mixer variables.

*Mixer variables are also called multipliers, but are unrelated to the scales of a wavelet.

Here, we suggest a method for finding the neighborhood based on Bayesian inference of the GSM random variables.
In section 1, we consider estimating these components based on information from different-sized neighborhoods and show the modes of failure when inference is too local or too global. Based on these observations, in section 2 we propose an extension to the GSM generative model, in which the mixer variables can overlap probabilistically. We solve the neighborhood assignment problem using Gibbs sampling, and demonstrate the technique on synthetic data. In section 3, we apply the technique to image data.

1 GSM inference of Gaussian and mixer variables

In a simple, n-dimensional version of a GSM, filter responses l are synthesized† by multiplying an n-dimensional Gaussian with values g = {g1 ... gn} by a common mixer variable v:

l = v g    (1)

We assume the g are uncorrelated (σ² along the diagonal of the covariance matrix). For the analytical calculations, we assume that v has a Rayleigh distribution:

p[v] ∝ [v exp(−v²/2)]^a, where 0 < a ≤ 1 parameterizes the strength of the prior    (2)

For ease, we develop the theory for a = 1. As is well known,2 and repeated in figure 1(B), the marginal distribution of the resulting GSM is sparse and highly kurtotic. The joint conditional distribution of two elements l1 and l2 follows a bowtie shape, with the width of the distribution of one dimension increasing for larger values (both positive and negative) of the other dimension.

The inverse problem is to estimate the n+1 variables g1 ... gn, v from the n filter responses l1 ... ln. It is formally ill-posed, though regularized through the prior distributions.
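The generative model of equations 1-2 is straightforward to simulate. The sketch below is our own illustration, not code from the paper (NumPy's standard Rayleigh sampler corresponds to a = 1); it reproduces the two bottom-up signatures just mentioned: a sparse, highly kurtotic marginal and the bowtie conditional dependency.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gsm(n_filters=2, n_samples=100_000, sigma=1.0):
    """Draw GSM samples l = v * g (equation 1): independent Gaussians g
    share a single Rayleigh mixer v (equation 2 with a = 1) per sample."""
    g = rng.normal(0.0, sigma, size=(n_samples, n_filters))
    v = rng.rayleigh(1.0, size=(n_samples, 1))   # common mixer per sample
    return v * g

def excess_kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0

l = sample_gsm()
print(excess_kurtosis(l[:, 0]))   # positive: heavier-tailed than a Gaussian

# Bowtie dependency: the spread of l2 widens as |l1| grows.
big = np.abs(l[:, 0]) > 2.0
print(l[big, 1].std() > l[~big, 1].std())   # True
```

Analytically, the marginal's excess kurtosis for this mixer is 3 (since E[v⁴]/E[v²]² = 2 for a Rayleigh variable), against 0 for a Gaussian.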
Four posterior distributions are particularly relevant, and can be derived analytically from the model (writing l = √(Σi li²) for the norm of the filter responses):

p[v|l1]:
  distribution: exp(−v²/2 − l1²/(2v²σ²)) / [√(|l1|/σ) B(1/2, |l1|/σ)]
  posterior mean: √(|l1|/σ) B(1, |l1|/σ) / B(1/2, |l1|/σ)

p[v|l]:
  distribution: v^(−(n−1)) exp(−v²/2 − l²/(2v²σ²)) / [(l/σ)^(1−n/2) B(1 − n/2, l/σ)]
  posterior mean: √(l/σ) B(3/2 − n/2, l/σ) / B(1 − n/2, l/σ)

p[|g1| | l1]:
  distribution: √(σ|l1|) exp(−g1²/(2σ²) − l1²/(2g1²)) / [g1² B(−1/2, |l1|/σ)]
  posterior mean: √(σ|l1|) B(0, |l1|/σ) / B(−1/2, |l1|/σ)

p[|g1| | l]:
  distribution: g1^(n−3) exp(−g1²l²/(2σ²l1²) − l1²/(2g1²)) / [(σ|l1|/l)^(n−2) (l/σ)^(n/2−1) B(n/2 − 1, l/σ)]
  posterior mean: |l1| √(σ/l) B(n/2 − 1/2, l/σ) / B(n/2 − 1, l/σ)

where B(n, x) is the modified Bessel function of the second kind (see also24), and gi is forced to have the same sign as li, since the mixer variables are always positive. Note that p[v|l1] and p[g1|l1] (rows 1, 3) are local estimates, while p[v|l] and p[g|l] (rows 2, 4) are estimates according to the filter outputs {l1 ... ln}. The posterior p[v|l] has also been estimated numerically in noise removal for other mixer priors, by Portilla et al.25

The full GSM specifies a hierarchy of mixer variables. Wainwright2 considered a pre-specified tree-based hierarchical arrangement.
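The multi-filter posterior means in the table can be sanity-checked numerically. The sketch below is our own illustration, not the paper's code; the modified Bessel function is computed by quadrature from its integral representation (scipy.special.kv is an equivalent substitute), and σ = 1 is assumed.

```python
import numpy as np

def bessel_k(nu, x):
    """Modified Bessel function of the second kind, B(nu, x) in the table,
    via its integral representation; scipy.special.kv(nu, x) is equivalent."""
    t = np.linspace(0.0, 15.0, 6001)
    return np.trapz(np.exp(-x * np.cosh(t)) * np.cosh(nu * t), t)

def posterior_mean_v(l, sigma=1.0):
    """E[v|l] = sqrt(l/sigma) B(3/2 - n/2, l/sigma) / B(1 - n/2, l/sigma),
    where l (scalar) is the norm sqrt(sum_i li^2) of the n responses."""
    n = len(l)
    c = np.sqrt(np.sum(np.square(l))) / sigma
    return np.sqrt(c) * bessel_k(1.5 - n / 2, c) / bessel_k(1.0 - n / 2, c)

def posterior_mean_g1(l, sigma=1.0):
    """E[g1|l] = l1 sqrt(sigma/l) B(n/2 - 1/2, l/sigma) / B(n/2 - 1, l/sigma);
    the estimate inherits the sign of l1 because the mixer is positive."""
    n = len(l)
    c = np.sqrt(np.sum(np.square(l))) / sigma
    return l[0] * bessel_k(n / 2 - 0.5, c) / (np.sqrt(c) * bessel_k(n / 2 - 1.0, c))

l = np.array([2.0, -1.0, 0.5])   # three responses assumed to share one mixer
print(posterior_mean_v(l), posterior_mean_g1(l))
```

Both closed forms agree with direct numerical integration of p[v|l] against v and 1/v, which is one way to verify the table's Bessel-function ratios.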
In practice, for natural sensory data, given a heterogeneous collection of li, it is advantageous to learn the hierarchical arrangement from examples. In an approach related to that of the GSM, Karklin and Lewicki19 suggested generating log mixer values for all the filters and learning the linear combinations of a smaller collection of underlying values. Here, we consider the problem in terms of multiple mixer variables, with the linear filters being clustered into groups that share a single mixer. This poses a critical assignment problem of working out which filter responses share which mixer variables. We first study this issue using synthetic data in which two groups of filter responses l1 ... l20 and l21 ... l40 are generated by two mixer variables vα and vβ (figure 1). We attempt to infer the components of the GSM model from the synthetic data.

†We describe the l as being filter responses even in the synthetic case, to facilitate comparison with images.

Figure 1: A Generative model: each filter response is generated by multiplying its Gaussian variable by either mixer variable vα or mixer variable vβ. B Marginal and joint conditional statistics (bowties) of sample synthetic filter responses. For the joint conditional statistics, intensity is proportional to the bin counts, except that each column is independently re-scaled to fill the range of intensities. C-E Left: actual distributions of mixer and Gaussian variables; other columns: estimates based on different numbers of filter responses. C Distribution of the estimate of the mixer variable vα. Note that mixer variable values are by definition positive. D Distribution of the estimate of one of the Gaussian variables, g1. E Joint conditional statistics of the estimates of Gaussian variables g1 and g2.

Figure 1C,D shows the empirical distributions of estimates of the conditional means of a mixer variable E(vα|{l}) and one of the Gaussian variables E(g1|{l}) based on different assumed assignments. For estimation based on too few filter responses, the estimates do not match the actual distributions well. For example, for a local estimate based on a single filter response, the Gaussian estimate peaks away from zero. For assignments including more filter responses, the estimates become good. However, inference is also compromised if the estimates for vα are too global, including filter responses actually generated from vβ (C and D, last column). In (E), we consider the joint conditional statistics of two components, each estimating their respective g1 and g2. Again, as the number of filter responses increases, the estimates improve, provided that they are taken from the right group of filter responses with the same mixer variable. Specifically, the mean estimates of g1 and g2 become more independent (E, third column).

Figure 2: A Generative model in which each filter response is generated by multiplication of its Gaussian variable by a mixer variable. The mixer variable, vα, vβ, or vγ, is chosen probabilistically upon each filter response sample, from a Rayleigh distribution with a = 0.1. B Top: actual probability of filter associations with vα, vβ, and vγ; Bottom: Gibbs estimates of the probability of filter associations corresponding to vα, vβ, and vγ. C Statistics of generated filter responses, and of Gaussian and mixer estimates from Gibbs sampling.
Note that for estimations based on a single filter response, the joint conditional distribution of the Gaussian appears correlated rather than independent (E, second column); for estimation based on too many filter responses (40 in this example), the joint conditional distribution of the Gaussian estimates shows a dependent (rather than independent) bowtie shape (E, last column). Mixer variable joint statistics also deviate from the actual when the estimations are too local or too global (not shown).

We have observed qualitatively similar statistics for estimation based on coefficients in natural images. Neighborhood size has also been discussed in the context of the quality of noise removal, assuming a GSM model.26

2 Neighborhood inference: solving the assignment problem

The plots in figure 1 suggest that it should be possible to infer the assignments, ie work out which filter responses share common mixers, by learning from the statistics of the resulting joint dependencies. Hard assignment problems (in which each filter response pays allegiance to just one mixer) are notoriously computationally brittle. Soft assignment problems (in which there is a probabilistic relationship between filter responses and mixers) are computationally better behaved. Further, real-world stimuli are likely better captured by the possibility that filter responses are coordinated in somewhat different collections in different images.

We consider a richer, mixture GSM as a generative model (figure 2). To model the generation of filter responses li for a single image patch, we multiply each Gaussian variable gi by a single mixer variable from the set v1 ... vm. We assume that gi has association probability pij (satisfying Σj pij = 1, ∀i) of being assigned to mixer variable vj. The assignments are assumed to be made independently for each patch.
We use si ∈ {1, 2, ..., m} for the assignments:

li = gi vsi    (3)

Inference and learning in this model proceed in two stages, according to the expectation maximization algorithm. First, given a filter response li, we use Gibbs sampling for the E phase to find possible appropriate (posterior) assignments. Williams et al.27 suggested using Gibbs sampling to solve a similar assignment problem in the context of dynamic tree models. Second, for the M phase, given the collection of assignments across multiple filter responses, we update the association probabilities pij. Given sample mixer assignments, we can estimate the Gaussian and mixer components of the GSM using the table of section 1, but restricting the filter response samples just to those associated with each mixer variable.

We tested the ability of this inference method to find the associations in the probabilistic mixer variable synthetic example shown in figure 2 (A, B). The true generative model specifies probabilistic overlap of 3 mixer variables. We generated 5000 samples for each filter according to the generative model. We ran the Gibbs sampling procedure, setting the number of possible neighborhoods to 5 (ie, more than the true 3); after 500 iterations the weights converged near to the proper probabilities. In (B, top), we plot the actual probability distributions for the filter associations with each of the mixer variables. In (B, bottom), we show the estimated associations: the three non-zero estimates closely match the actual distributions; the other two estimates are zero (not shown). The procedure consistently finds correct associations even in larger examples of data generated with up to 10 mixer variables. In (C) we show an example of the actual and estimated distributions of the mixer and Gaussian components of the GSM.
Note that the joint conditional statistics of both mixer and Gaussian are independent, since the variables were generated as such in the synthetic example. The Gibbs procedure can be adjusted for data generated with different parameters a of equation 2, and for related mixers,2 allowing for a range of image coefficient behaviors.

3 Image data

Having validated the inference model using synthetic data, we turned to natural images. We derived linear filters from a multi-scale oriented steerable pyramid,28 with 100 filters, at 2 preferred orientations, 25 non-overlapping spatial positions (with spatial subsampling of 8 pixels), two phases (quadrature pairs), and a single spatial frequency peaked at 1/6 cycles/pixel. The image ensemble is 4 images from a standard image compression database (boats, goldhill, plant leaves, and mountain) and 4000 samples.

We ran our method with the same parameters as for synthetic data, with 7 possible neighborhoods and Rayleigh parameter a = 0.1 (as in figure 2). Figure 3 depicts the association weights pij of the coefficients for each of the obtained mixer variables. In (A), we show a schematic (template) of the association representation that will follow in (B, C) for the actual data. Each mixer variable neighborhood is shown for coefficients of two phases and two orientations along a spatial grid (one grid for each phase). The neighborhood is illustrated via the probability of each coefficient being generated from a given mixer variable. For the first two neighborhoods (B), we also show the image patches that yielded the maximum log likelihood of P(v|patch). The first neighborhood (in B) prefers vertical patterns across most of its "receptive field", while the second has a more localized region of horizontal preference. This can also be seen by averaging the 200 image patches with the maximum log likelihood.
Strikingly, all the mixer variables group together the two phases of a quadrature pair (B, C). Quadrature pairs have also been extracted from cortical data, and are the components of ideal complex cell models. Another tendency is to group orientations across space.

Figure 3: A Schematic of the mixer variable neighborhood representation. The probability that each coefficient is associated with the mixer variable ranges from 0 (black) to 1 (white). Left: vertical and horizontal filters, at two orientations, and two phases. Each phase is plotted separately, on a 38 by 38 pixel spatial grid. Right: summary of representation, with filter shapes replaced by oriented lines. Filters are approximately 6 pixels in diameter, with the spacing between filters 8 pixels. B First two image ensemble neighborhoods obtained from Gibbs sampling. Also shown are four 38×38 pixel patches that had the maximum log likelihood of P(v|patch), and the average of the first 200 maximal patches. C Other image ensemble neighborhoods. D Statistics of representative coefficients of two spatially displaced vertical filters, and of inferred Gaussian and mixer variables.
The phase and iso-orientation grouping bear some interesting similarity to other recent suggestions,17,18 as do the maximal patches.19 Wavelet filters have the advantage that they can span a wider spatial extent than is possible with current ICA techniques, and the analysis of parameters such as phase grouping is more controlled. We are comparing the analysis with an ICA first-stage representation, which has other obvious advantages. We are also extending the analysis to correlated wavelet filters,25 and to simulations with a larger number of neighborhoods.

From the obtained associations, we estimated the mixer and Gaussian variables according to our model. In (D) we show representative statistics of the coefficients and of the inferred variables. The learned distributions of Gaussian and mixer variables are quite close to our assumptions. The Gaussian estimates exhibit joint conditional statistics that are roughly independent, and the mixer variables are weakly dependent.

We have thus far demonstrated neighborhood inference for an image ensemble, but it is also interesting and perhaps more intuitive to consider inference for particular images or image classes. In figure 4 (A-B) we demonstrate example mixer variable neighborhoods derived from learning patches of a zebra image (Corel CD-ROM). As before, the neighborhoods are composed of quadrature pairs; however, the spatial configurations are richer and have not been previously reported with unsupervised hierarchical methods: for example, in (A), the mixture neighborhood captures a horizontal-bottom/vertical-top spatial configuration. This appears particularly relevant in segmenting regions of the front zebra, as shown by marking in the image the patches that yielded the maximum log likelihood of P(v|patch). In (B), the mixture neighborhood captures a horizontal configuration, more focused on the horizontal stripes of the front zebra. This example demonstrates the logic behind a probabilistic mixture: coefficients corresponding to the bottom horizontal stripes might be linked with top vertical stripes (A) or with more horizontal stripes (B).

Figure 4: Example of Gibbs sampling on the zebra image. The image is 151×151 pixels, and each spatial neighborhood spans 38×38 pixels. A, B Example mixer variable neighborhoods. Left: example mixer variable neighborhood, and the average of 200 patches that yielded the maximum likelihood of P(v|patch). Right: the image, with example patches that yielded the maximum likelihood of P(v|patch) marked on top of it.

4 Discussion

Work on the study of natural image statistics has recently evolved from issues about scale-space hierarchies, wavelets, and their ready induction through unsupervised learning models (loosely based on cortical development) towards the coordinated statistical structure of the wavelet components. This includes bottom-up (eg bowties, hierarchical representations such as complex cells) and top-down (eg GSM) viewpoints. The resulting new insights inform a wealth of models and ideas and form the essential backdrop for the work in this paper. They also link to impressive engineering results in image coding and processing.

A most critical aspect of a hierarchical representational model is the way that the structure of the hierarchy is induced. We addressed the hierarchy question using a novel extension to the GSM generative model in which mixer variables (at one level of the hierarchy) enjoy probabilistic assignments to filter responses (at a lower level).
We showed how these assignments can be learned (using Gibbs sampling), and illustrated some of their attractive properties using both synthetic and a variety of image data. We grounded our method firmly in Bayesian inference of the posterior distributions over the two classes of random variables in a GSM (mixer and Gaussian), placing particular emphasis on the interplay between the generative model and the statistical properties of its components.

An obvious question raised by our work is the neural correlate of the two different posterior variables. The Gaussian variable has characteristics resembling those of the output of divisively normalized simple cells;6 the mixer variable is more obviously related to the output of quadrature pair neurons (such as orientation energy or motion energy cells, which may also be divisively normalized). How these different information sources may subsequently be used is of great interest.

Acknowledgements This work was funded by the HHMI (OS, TJS) and the Gatsby Charitable Foundation (PD). We are very grateful to Patrik Hoyer, Mike Lewicki, Zhaoping Li, Simon Osindero, Javier Portilla and Eero Simoncelli for discussion.

References
[1] D Andrews and C Mallows. Scale mixtures of normal distributions. J. Royal Stat. Soc., 36:99-102, 1974.
[2] M J Wainwright and E P Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Adv. Neural Information Processing Systems, volume 12, pages 855-861, Cambridge, MA, May 2000. MIT Press.
[3] M J Wainwright, E P Simoncelli, and A S Willsky. Random cascades on wavelet trees and their use in modeling and analyzing natural imagery. Applied and Computational Harmonic Analysis, 11(1):89-123, July 2001. Special issue on wavelet applications.
[4] A Hyvärinen, J Hurri, and J Vayrynen.
Bubbles: a unifying framework for low-level statistical properties of natural image sequences. Journal of the Optical Society of America A, 20:1237-1252, May 2003.
[5] R W Buccigrossi and E P Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Trans Image Proc, 8(12):1688-1701, December 1999.
[6] O Schwartz and E P Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819-825, August 2001.
[7] D J Field. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379-2394, 1987.
[8] H Attias and C E Schreiner. Temporal low-order statistics of natural sounds. In M Jordan, M Kearns, and S Solla, editors, Adv in Neural Info Processing Systems, volume 9, pages 27-33. MIT Press, 1997.
[9] D L Ruderman and W Bialek. Statistics of natural images: Scaling in the woods. Phys. Rev. Letters, 73(6):814-817, 1994.
[10] C Zetzsche, B Wegmann, and E Barth. Nonlinear aspects of primary vision: Entropy reduction beyond decorrelation. In Int'l Symposium, Society for Information Display, volume XXIV, pages 933-936, 1993.
[11] J Huang and D Mumford. Statistics of natural images and models. In CVPR, page 547, 1999.
[12] J. Romberg, H. Choi, and R. Baraniuk. Bayesian wavelet domain image modeling using hidden Markov trees. In Proc. IEEE Int'l Conf on Image Proc, Kobe, Japan, October 1999.
[13] A Turiel, G Mato, N Parga, and J P Nadal. The self-similarity properties of natural images resemble those of turbulent flows. Phys. Rev. Lett., 80:1098-1101, 1998.
[14] J Portilla and E P Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. Int'l Journal of Computer Vision, 40(1):49-71, 2000.
[15] Helmut Brehm and Walter Stammler.
Description and generation of spherically invariant speech-model signals. Signal Processing, 12:119-141, 1987.
[16] T Bollerslev, R F Engle, and D Nelson. ARCH models. In R F Engle and D McFadden, editors, Handbook of Econometrics V. 1994.
[17] A Hyvärinen and P Hoyer. Emergence of topography and complex cell properties from natural images using extensions of ICA. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Adv. Neural Information Processing Systems, volume 12, pages 827-833, Cambridge, MA, May 2000. MIT Press.
[18] P Hoyer and A Hyvärinen. A multi-layer sparse coding network learns contour coding from natural images. Vision Research, 42(12):1593-1605, 2002.
[19] Y Karklin and M S Lewicki. Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14:483-499, 2003.
[20] L Wiskott and T Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715-770, 2002.
[21] C Kayser, W Einhäuser, O Dümmer, P König, and K P Körding. Extracting slow subspaces from natural videos leads to complex cells. In G Dorffner, H Bischof, and K Hornik, editors, Proc. Int'l Conf. on Artificial Neural Networks (ICANN-01), pages 1075-1080, Vienna, Aug 2001. Springer-Verlag, Heidelberg.
[22] B A Olshausen and D J Field. Emergence of simple-cell receptive field properties by learning a sparse factorial code. Nature, 381:607-609, 1996.
[23] A J Bell and T J Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37(23):3327-3338, 1997.
[24] U Grenander and A Srivastava. Probability models for clutter in natural images. IEEE Trans. on Patt. Anal. and Mach. Intel., 23:423-429, 2002.
[25] J Portilla, V Strela, M Wainwright, and E Simoncelli.
Adaptive Wiener denoising using a Gaussian scale mixture model in the wavelet domain. In Proc 8th IEEE Int'l Conf on Image Proc, pages 37-40, Thessaloniki, Greece, Oct 7-10 2001. IEEE Computer Society.
[26] J Portilla, V Strela, M Wainwright, and E P Simoncelli. Image denoising using a scale mixture of Gaussians in the wavelet domain. IEEE Trans Image Processing, 12(11):1338-1351, November 2003.
[27] C K I Williams and N J Adams. Dynamic trees. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Adv. Neural Information Processing Systems, volume 11, pages 634-640, Cambridge, MA, 1999. MIT Press.
[28] E P Simoncelli, W T Freeman, E H Adelson, and D J Heeger. Shiftable multi-scale transforms. IEEE Trans Information Theory, 38(2):587-607, March 1992. Special Issue on Wavelets.
", "award": [], "sourceid": 2604, "authors": [{"given_name": "Odelia", "family_name": "Schwartz", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}, {"given_name": "Peter", "family_name": "Dayan", "institution": null}]}