{"title": "Reducing statistical dependencies in natural signals using radial Gaussianization", "book": "Advances in Neural Information Processing Systems", "page_first": 1009, "page_last": 1016, "abstract": "We consider the problem of efficiently encoding a signal by transforming it to a new representation whose components are statistically independent. A widely studied linear solution, independent components analysis (ICA), exists for the case when the signal is generated as a linear transformation of independent non- Gaussian sources. Here, we examine a complementary case, in which the source is non-Gaussian but elliptically symmetric. In this case, no linear transform suffices to properly decompose the signal into independent components, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We then demonstrate this methodology in the context of natural signal statistics. We first show that the joint distributions of bandpass filter responses, for both sound and images, are better described as elliptical than linearly transformed independent sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.", "full_text": "Reducing statistical dependencies in natural signals\n\nusing radial Gaussianization\n\nSiwei Lyu\n\nComputer Science Department\nUniversity at Albany, SUNY\n\nAlbany, NY 12222\n\nlsw@cs.albany.edu\n\nAbstract\n\nEero P. Simoncelli\n\nCenter for Neural Science\n\nNew York University\nNew York, NY 10003\neero@cns.nyu.edu\n\nWe consider the problem of transforming a signal to a representation in which\nthe components are statistically independent. 
When the signal is generated as a\nlinear transformation of independent Gaussian or non-Gaussian sources, the solu-\ntion may be computed using a linear transformation (PCA or ICA, respectively).\nHere, we consider a complementary case, in which the source is non-Gaussian\nbut elliptically symmetric. Such a source cannot be decomposed into indepen-\ndent components using a linear transform, but we show that a simple nonlinear\ntransformation, which we call radial Gaussianization (RG), is able to remove all\ndependencies. We apply this methodology to natural signals, demonstrating that\nthe joint distributions of nearby bandpass \ufb01lter responses, for both sounds and im-\nages, are closer to being elliptically symmetric than linearly transformed factorial\nsources. Consistent with this, we demonstrate that the reduction in dependency\nachieved by applying RG to either pairs or blocks of bandpass \ufb01lter responses is\nsigni\ufb01cantly greater than that achieved by PCA or ICA.\n\n1 Introduction\n\nSignals may be manipulated, transmitted or stored more e\ufb03ciently if they are transformed to a rep-\nresentation in which there is no statistical redundancy between the individual components. In the\ncontext of biological sensory systems, the e\ufb03cient coding hypothesis [1, 2] proposes that the princi-\nple of reducing redundancies in natural signals can be used to explain various properties of biological\nperceptual systems. Given a source model, the problem of deriving an appropriate transformation\nto remove statistical dependencies, based on the statistics of observed samples, has been studied for\nmore than a century. The most well-known example is principal components analysis (PCA), a lin-\near transformation derived from the second-order signal statistics (i.e., the covariance structure), that\ncan fully eliminate dependencies for Gaussian sources. 
Over the past two decades, a more general\nmethod, known as independent component analysis (ICA), has been developed to handle the case\nwhen the signal is sampled from a linearly transformed factorial source. ICA and related methods\nhave shown success in many applications, especially in deriving optimal representations for natural\nsignals [3, 4, 5, 6].\n\nAlthough PCA and ICA bases may be computed for nearly any source, they are only guaranteed to\neliminate dependencies when the assumed source model is correct. And even in cases where these\nmethodologies seem to produce an interesting solution, the components of the resulting represen-\ntation may be far from independent. A case in point is that of natural images, for which derived ICA\ntransformations consist of localized oriented basis functions that appear similar to the receptive \ufb01eld\ndescriptions of neurons in mammalian visual cortex [3, 5, 4]. Although dependency between the\nresponses of such linear basis functions is reduced compared to that of the original pixels, this reduc-\ntion is only slightly more than that achieved with PCA or other bandpass \ufb01lters [7, 8]. Furthermore,\nthe responses of ICA and related \ufb01lters still exhibit striking higher-order dependencies [9, 10, 11].\n\nFig. 1. Venn diagram of the relationship between density models. The two circles represent the linearly\ntransformed factorial densities as assumed by the ICA methods, and elliptically symmetric densities\n(ESDs). The intersection of these two classes is the set of all Gaussian densities. The factorial densities\nform a subset of the linearly transformed factorial densities and the spherically symmetric densities\nform a subset of the ESDs.\n\nHere, we consider the dependency elimination problem for the class of source models known as\nelliptically symmetric densities (ESDs) [12]. 
For ESDs, linear transforms have no e\ufb00ect on the\ndependencies beyond second-order, and thus ICA decompositions o\ufb00er no advantage over PCA. We\nintroduce an alternative nonlinear procedure, which we call radial Gaussianization (RG). In RG,\nthe norms of whitened signal vectors are nonlinearly adjusted to ensure that the resulting output\ndensity is a spherical Gaussian, whose components are statistically independent. We \ufb01rst show that\nthe joint statistics of proximal bandpass \ufb01lter responses for natural signals (sounds and images) are\nbetter described as an ESD than linearly transformed factorial sources. Consistent with this, we\ndemonstrate that the reduction in dependency achieved by applying RG to such data is signi\ufb01cantly\ngreater than that achieved by PCA or ICA. A preliminary version of portions of this work was\ndescribed in [13].\n\n2 Elliptically Symmetric Densities\n\nThe density of a random vector x \u2208 Rd with zero mean is elliptically symmetric if it is of the form:\n\np(x) = (1 / (\u03b1 |\u03a3|^{1/2})) f(\u2212x^T \u03a3^{\u22121} x / 2),   (1)\n\nwhere \u03a3 is a positive de\ufb01nite matrix, f(\u00b7) is the generating function satisfying f(\u00b7) \u2265 0 and\n\u222b_0^\u221e f(\u2212r^2/2) r^{d\u22121} dr < \u221e, and the normalizing constant \u03b1 is chosen so that the density integrates\nto one [12]. The de\ufb01nitive characteristic of an ESD is that the level sets of constant probability are\nellipsoids determined by \u03a3. 
In the special case when \u03a3 is a multiple of the identity matrix, the level\nsets of p(x) are hyper-spheres and the density is known as a spherically symmetric density (SSD).\nAssuming x has \ufb01nite second-order statistics, \u03a3 is a multiple of the covariance matrix, which implies\nthat any ESD can be transformed into an SSD by a PCA/whitening operation.\n\nWhen the generating function is an exponential, the resulting ESD is a zero-mean multivariate Gaus-\nsian with covariance matrix \u03a3. In this case, x can also be regarded as a linear transformation of a\nvector s containing independent unit-variance Gaussian components, as: x = \u03a3^{1/2} s. In fact, the\nGaussian is the only density that is both elliptically symmetric and linearly decomposable into inde-\npendent components [14]. In other words, the Gaussian densities correspond to the intersection of\nthe class of ESDs and the class assumed by the ICA methods. As a special case, a spherical Gaussian\nis the only spherically symmetric density that is also factorial (i.e., has independent components).\nThese relationships are illustrated in a Venn diagram in Fig. 1.\n\nApart from the special case of Gaussian densities, a linear transformation such as PCA or ICA cannot\ncompletely eliminate dependencies in the ESDs. In particular, PCA and whitening can transform\nan ESD variable to a spherically symmetric variable, xwht, but the resulting density will not be\nfactorial unless it is Gaussian. And ICA would apply an additional rotation (i.e., an orthogonal\n\nFig. 2. Radial Gaussianization procedure for 2D data. (a,e): 2D joint densities of a spherical Gaussian\nand a non-Gaussian SSD, respectively. The plots are arranged such that a spherical Gaussian has equal-\nspaced contours. (b,f): radial marginal densities of the spherical Gaussian in (a) and the SSD in (e),\nrespectively. 
Shaded regions correspond to shaded annuli in (a) and (e). (c): the nonlinear mapping\nthat transforms the radii of the source to those of the spherical Gaussian. (d): log marginal densities of\nthe Gaussian in (a) and the SSD in (e), as red dashed line and green solid line, respectively.\n\nmatrix) to transform xwht to a new set of coordinates maximizing a higher-order contrast function\n(e.g., kurtosis). However, for spherically symmetric xwht, p(xwht) is invariant to rotation, and thus\nuna\ufb00ected by orthogonal transformations.\n\n3 Radial Gaussianization\n\nGiven that linear transforms are ine\ufb00ective in removing dependencies from a spherically symmetric\nvariable xwht (and hence the original ESD variable x), we need to consider non-linear mappings. As\ndescribed previously, a spherical Gaussian is the only SSD with independent components. Thus, a\nnatural solution for eliminating the dependencies in a non-Gaussian spherically symmetric xwht is to\ntransform it to a spherical Gaussian.\n\nSelecting such a non-linear mapping without any further constraint is a highly ill-posed problem.\nIt is natural to restrict to nonlinear mappings that act radially, preserving the spherical symme-\ntry. Speci\ufb01cally, one can show that the generating function of p(xwht) is completely determined\nby its radial marginal distribution: pr(r) = (r^{d\u22121}/\u03b2) f(\u2212r^2/2), where r = ||xwht||, \u0393(\u00b7) is the standard\nGamma function, and \u03b2 is the normalizing constant that ensures that the density integrates to one.\nIn the special case of a spherical Gaussian of unit variance, the radial marginal is a chi-density\nwith d degrees of freedom: p\u03c7(r) = (r^{d\u22121} / (2^{d/2\u22121} \u0393(d/2))) exp(\u2212r^2/2). We de\ufb01ne the radial Gaussianization\n(RG) transformation as xrg = g(||xwht||) xwht / ||xwht||, where the nonlinear function g(\u00b7) is selected to map the\nradial marginal density of xwht to the chi-density. 
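The two-step procedure just defined (whitening, then a radial remapping of norms onto the chi density) can be sketched numerically. This is a minimal sketch, not the authors' implementation: the elliptical test source (a Gaussian scale mixture) and the rank-based empirical CDF are assumptions made here for illustration.

```python
import numpy as np
from scipy.stats import chi

rng = np.random.default_rng(0)
d, n = 4, 100_000

# Assumed elliptical source: a Gaussian vector scaled by a random scalar,
# then linearly mixed (a Gaussian scale mixture has elliptical contours).
A = rng.normal(size=(d, d))                          # arbitrary mixing matrix
z = rng.normal(size=(n, d)) * np.sqrt(rng.exponential(size=(n, 1)))
x = z @ A.T                                          # elliptically symmetric samples

# Step 1: PCA/whitening maps the ESD to a spherically symmetric density.
evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
x_wht = (x @ evecs) / np.sqrt(evals)

# Step 2: map the radial marginal onto the chi density with d degrees of
# freedom; ranks serve as an empirical CDF of the radii (the paper uses a
# histogram-based look-up table instead).
r = np.linalg.norm(x_wht, axis=1)
F_r = (np.argsort(np.argsort(r)) + 0.5) / n          # empirical CDF values of r
g_r = chi.ppf(F_r, df=d)                             # radial map onto chi quantiles
x_rg = x_wht * (g_r / r)[:, None]                    # radially Gaussianized data
```

After the radial map, the squared norms of `x_rg` average to d, as they would for a d-dimensional spherical Gaussian.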
Solving for a monotonic g(\u00b7) is a standard one-\ndimensional density-mapping problem, and the unique solution is the composition of the inverse\ncumulative distribution function (CDF) of p\u03c7 with the CDF of pr: g(r) = F\u03c7^{\u22121}(Fr(r)). An illustration of\nthe procedure is provided in Fig. 2. In practice, we can estimate Fr(r) from a histogram computed\nfrom training data, and use this to construct a numerical approximation (i.e., a look-up table) of the\ncontinuous function \u02c6g(r). Note that the accuracy of the estimated RG transformation will depend on\nthe number of data samples, but is independent of the dimensionality of the data vectors.\n\nIn summary, a non-Gaussian ESD signal can be radially Gaussianized by \ufb01rst applying PCA and\nwhitening operations to remove second-order dependency (yielding an SSD), followed by a nonlin-\near transformation that maps the radial marginal to a chi-density.\n\n4 Application to Natural Signals\n\nAn understanding of the statistical behaviors of source signals is bene\ufb01cial for many problems in\nsignal processing, and can also provide insights into the design and functionality of biological sen-\nsory systems. Gaussian signal models are widely used, because they are easily characterized and\noften lead to clean and e\ufb03cient solutions. But many naturally occurring signals exhibit striking\nnon-Gaussian statistics, and much recent literature focuses on the problem of characterizing and\nexploiting these behaviors. Speci\ufb01cally, ICA methodologies have been used to derive linear repre-\nsentations for natural sound and image signals whose coe\ufb03cients are maximally sparse or indepen-\ndent [3, 5, 6]. 
These analyses generally produced basis sets containing bandpass \ufb01lters resembling\nthose used to model the early transformations of biological auditory and visual systems.\n\nDespite the success of ICA methods in providing a fundamental motivation for sensory receptive\n\ufb01elds, there are a number of simple observations that indicate inconsistencies in this interpreta-\ntion. First, the responses of ICA or other bandpass \ufb01lters exhibit striking dependencies, in which\nthe variance of one \ufb01lter response can be predicted from the amplitude of another nearby \ufb01lter re-\nsponse [10, 15]. This suggests that although the marginal densities of the bandpass \ufb01lter responses are\nheavy-tailed, their joint density is not consistent with the linearly transformed factorial source model\nassumed by ICA. Furthermore, the marginal distributions of a wide variety of bandpass \ufb01lters (even\na \u201c\ufb01lter\u201d with randomly selected zero-mean weights) are all highly kurtotic [7]. This would not be\nexpected for the ICA source model: projecting the local data onto a random direction should result\nin a density that becomes more Gaussian as the neighborhood size increases, in accordance with a\ngeneralized version of the central limit theorem [16]. A recent quantitative study [8] further showed\nthat the oriented bandpass \ufb01lters obtained through ICA optimization on images lead to a surprisingly\nsmall improvement in reducing dependency relative to decorrelation methods such as PCA. Taken\ntogether, all of these observations suggest that the \ufb01lters obtained through ICA optimization repre-\nsent a \u201cshallow\u201d optimum, and are perhaps not as uniquely suited for image or sound representation\nas initially believed. 
Consistent with this, recently developed models of local image statistics describe\nlocal groups of image bandpass \ufb01lter responses with non-Gaussian ESDs [e.g., 17, 18, 11, 19, 20].\nThese all suggest that RG might provide an appropriate means of eliminating dependencies in natu-\nral signals. Below, we test this empirically.\n\n4.1 Dependency Reduction in Natural Sounds\n\nWe \ufb01rst apply RG to natural sounds. We used sound clips from commercial CDs, which have a\nsampling frequency of 44100 Hz and typical lengths of 15\u201320 seconds, with content including\nanimal vocalizations and recordings in natural environments. These sound clips were \ufb01ltered with a\nbandpass gammatone \ufb01lter, which is commonly used to model the peripheral auditory system [21].\nIn our experiments, analysis was based on a \ufb01lter with center frequency of 3078 Hz.\n\nShown in the top row of column (a) in Fig. 3 are contour plots of the joint histograms obtained\nfrom pairs of coe\ufb03cients of a bandpass-\ufb01ltered natural sound, separated by di\ufb00erent time inter-\nvals. Similar to the empirical observations for natural images [17, 11], the joint densities are non-\nGaussian, and have roughly elliptically symmetric contours for temporally proximal pairs. Shown\nin the top row of column (b) in Fig. 3 are the conditional histograms corresponding to the same pair\nof signals. The \u201cbow-tie\u201d shaped conditional distribution, which has also been observed in natural\nimages [10, 11, 15], indicates that the conditional variance of one signal depends on the value of the\nother. This is a highly non-Gaussian behavior, since the conditional variances of a jointly Gaussian\ndensity are always constant, independent of the value of the conditioning variable. For pairs that\nare distant, both the second-order correlation and the higher-order dependency become weaker. 
As\na result, the corresponding joint histograms show more resemblance to the factorial product of two\none-dimensional super-Gaussian densities (bottom row of column (a) in Fig. 3), and the shape of the\ncorresponding conditional histograms (column (b)) is more constant, all as would be expected for\ntwo independent random variables.\n\nAs described in previous sections, the statistical dependencies in an elliptically symmetric random\nvariable can be e\ufb00ectively removed by a linear whitening operation followed by a nonlinear radial\nGaussianization, the latter being implemented as a histogram transform of the radial marginal den-\nsity of the whitened signal. Shown in columns (c) and (d) in Fig. 3 are the joint and conditional\nhistograms of the transformed data. First, note that when the two signals are nearby, RG is highly\ne\ufb00ective, as suggested by the roughly Gaussian joint density (equally spaced circular contours), and\nby the consistent vertical cross-sections of the conditional histogram. However, as the temporal sep-\naration between the two signals increases, the e\ufb00ects of RG become weaker (middle row, Fig. 3).\nWhen the two signals are distant (bottom row, Fig. 3), they are nearly independent, and applying RG\ncan actually increase dependency, as suggested by the irregular shape of the conditional densities\n(bottom row, column (d)).\n\nFig. 3. Radial Gaussianization of natural sounds. (a): Contour plots of joint histograms of pairs\nof band-pass \ufb01lter responses of a natural sound clip. Each row corresponds to pairs with di\ufb00erent\ntemporal separation (0.1 msec / 4 samples, 1.5 msec / 63 samples, 3.5 msec / 154 samples), and\nlevels are chosen so that a spherical Gaussian density will have equally spaced contours. (c) Joint\nhistograms after whitening and RG transformation. 
(b,d): Conditional histograms\nof the same data shown in (a,c), computed by independently normalizing each column of the joint\nhistogram. Histogram intensities are proportional to probability, except that each column of pixels is\nindependently rescaled so that the largest probability value is displayed as white.\n\nTo quantify more precisely the dependency reduction achieved by RG, we measure the statistical\ndependency of our multivariate sources using the multi-information (MI) [22], which is de\ufb01ned as\nthe Kullback-Leibler divergence [23] between the joint distribution and the product of its marginals:\n\nI(x) = DKL( p(x) || \u220f_k p(xk) ) = \u2211_{k=1}^{d} H(xk) \u2212 H(x),\n\nwhere H(x) = \u2212\u222b p(x) log(p(x)) dx is the dif-\nferential entropy of x, and H(xk) denotes the di\ufb00erential entropy of the kth component of x. As\na measure of statistical dependency among the elements of x, MI is non-negative, and is zero if\nand only if the components of x are mutually independent. Furthermore, MI is invariant to any\ntransformation on individual components of x (e.g., element-wise rescaling).\n\nTo compare the e\ufb00ect of di\ufb00erent dependency reduction methods, we estimated the MI of pairs of\nbandpass \ufb01lter responses with di\ufb00erent temporal separations. This is achieved with a non-parametric\n\u201cbin-less\u201d method based on the order statistics [24], which alleviates the strong bias and variance\nintrinsic to the more traditional binning (i.e., \u201cplug-in\u201d) estimators. It is especially e\ufb00ective in this\ncase, where the data dimensionality is two. We computed the MI for each pair of raw signals, as well\nas pairs of the PCA, ICA and RG transformed signals. The ICA transformation was obtained using\nRADICAL [25], an algorithm that directly optimizes the MI using a smoothed grid search over a\nnon-parametric estimate of entropy.\n\nThe results, averaged over all 10 sounds, are plotted in Fig. 4. 
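As a sanity check on the MI measure itself, the Gaussian special case admits a closed form, for which PCA provably drives MI to zero. This is a sketch for that special case only; the experiments above instead use the non-parametric "bin-less" estimator of [24].

```python
import numpy as np

def gaussian_mi(C):
    # Multi-information of a zero-mean Gaussian with covariance C (in nats):
    # I(x) = sum_k H(x_k) - H(x) = 0.5 * (sum_k log C_kk - log det C)
    C = np.asarray(C, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(C))) - np.linalg.slogdet(C)[1])

# A correlated Gaussian pair has MI = -0.5 * log(1 - rho^2) nats.
rho = 0.9
C = np.array([[1.0, rho], [rho, 1.0]])
mi_raw = gaussian_mi(C)

# PCA diagonalizes the covariance, so the MI of the rotated pair is zero,
# consistent with PCA fully eliminating dependencies for Gaussian sources.
evals, evecs = np.linalg.eigh(C)
mi_pca = gaussian_mi(evecs.T @ C @ evecs)
```

For non-Gaussian pairs, as in the experiments here, this formula only accounts for the second-order part of the dependency, which is why a non-parametric estimator is needed.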
First, we note that PCA produces a\nrelatively modest reduction in MI: roughly 20% for small separations, decreasing gradually as the\nseparation increases. We also see that ICA o\ufb00ers very little additional reduction over PCA for small\nseparations. In contrast, the nonlinear RG transformation achieves an impressive reduction (nearly\n100%) in MI for pairs separated by less than 0.5 msec. This can be understood by considering the\njoint and conditional histograms in Fig. 3. Since the joint density of nearby pairs is approximately\nelliptically symmetric, ICA cannot provide much improvement beyond what is obtained with PCA,\nwhile RG is expected to perform well. On the other hand, the joint densities of more distant pairs\n(beyond 2.5 msec) are roughly factorial, as seen in the bottom row of Fig. 3. In this case, neither\nPCA nor ICA is e\ufb00ective in further reducing dependency, as is seen in the plots of Fig. 4, but RG\nmakes the pairs more dependent, as indicated by an increase in MI above that of the original pairs\nfor separations over 2.5 msec. This is a direct result of the fact that the data do not adhere to the\nelliptically symmetric source model assumptions underlying the RG procedure. For intermediate\nseparations (0.2 to 2 msec), there is a transition of the joint densities from elliptically symmetric\nto factorial (second row in Fig. 3), and ICA is seen to o\ufb00er a modest improvement over PCA. We\n\nFig. 4. Left: Multi-information (in bits/coe\ufb03cient) for pairs of bandpass \ufb01lter responses of natural\naudio signals, as a function of temporal separation. 
Shown are the MI of the raw \ufb01lter response pairs,\nas well as the MI of the pairs transformed with PCA, ICA, and RG. Results are averaged over 10\nnatural sound signals. Right: Same analysis for pairs of bandpass \ufb01lter responses averaged over 8\nnatural images.\n\nFig. 5. Reduction of MI (bits/pixel) achieved with ICA and RG transforms, compared to that achieved\nwith PCA, for pixel blocks of various sizes (3 \u00d7 3, 7 \u00d7 7, and 15 \u00d7 15). The x-axis corresponds to \u2206Ipca.\nPluses denote \u2206Irg, and circles denote \u2206Iica. Each plotted symbol corresponds to the result from one\nimage in our test set.\n\nfound qualitatively similar behaviors (right column in Fig. 4) when analyzing pairs of bandpass \ufb01lter\nresponses of natural images using the data sets described in the next section.\n\n4.2 Dependency Reduction in Natural Images\n\nWe have also examined the ability of RG to reduce dependencies of image pixel blocks with lo-\ncal mean removed. We examined eight images of natural woodland scenes from the van Hateren\ndatabase [26]. We extracted the central 1024 \u00d7 1024 region from each, computed the log of the in-\ntensity values, and then subtracted the local mean [8] by convolving with an isotropic bandpass \ufb01lter\nthat captures an annulus of frequencies in the Fourier domain ranging from \u03c0/4 to \u03c0 radians/pixel.\nWe denote blocks taken from these bandpass \ufb01ltered images as xraw. These blocks were then trans-\nformed with PCA (denoted xpca), ICA (denoted xica) and RG (denoted xrg). 
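The local-mean-removal step described above can be sketched as an annular mask in the Fourier domain. The \u03c0/4 and \u03c0 radians/pixel cutoffs follow the text; the hard-edged (ideal) mask is an assumption made here for illustration, not necessarily the authors' filter.

```python
import numpy as np

def isotropic_bandpass(img, lo=np.pi / 4, hi=np.pi):
    """Keep only Fourier components whose radial frequency lies in [lo, hi]."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * 2 * np.pi                # frequencies in radians/pixel
    fx = np.fft.fftfreq(w) * 2 * np.pi
    rad = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = (rad >= lo) & (rad <= hi)                  # annulus; DC (rad=0) is excluded
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))
```

Because the DC term falls outside the annulus, the filtered image has zero mean, so blocks extracted from it already have their local mean removed.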
These block data are\nof signi\ufb01cantly higher dimension than the \ufb01lter response pairs examined in the previous section.\nFor this reason, we switched our ICA computations from RADICAL to the more e\ufb03cient FastICA\nalgorithm [27], with a contrast function g(u) = 1 \u2212 exp(\u2212u^2) and using the symmetric approach for\noptimization.\n\nWe would like to compare the dependency reduction performance of each of these methods using\nmulti-information. However, direct estimation of MI becomes di\ufb03cult and less accurate with higher\ndata dimensionality. Instead, as in [8], we can avoid direct estimation of MI by evaluating and\ncomparing the di\ufb00erences in MI of the various transformed blocks relative to xraw. Speci\ufb01cally, we\nuse \u2206Ipca = I(xraw) \u2212 I(xpca) as a reference value, and compare this with \u2206Iica = I(xraw) \u2212 I(xica) and\n\u2206Irg = I(xraw) \u2212 I(xrg). Full details of this computation are described in [13].\n\nShown in Fig. 5 are scatter plots of \u2206Ipca versus \u2206Iica (red circles) and \u2206Irg (blue pluses) for various\nblock sizes. Each point corresponds to MI computation over blocks from one of eight bandpass-\n\ufb01ltered test images. As the \ufb01gure shows, RG achieves signi\ufb01cant reduction in MI for most images,\nand this holds over a range of block sizes, whereas ICA shows only a very small improvement\nrelative to PCA1. We again conclude that ICA does not o\ufb00er much advantage over second-order\ndecorrelation algorithms such as PCA, while RG o\ufb00ers signi\ufb01cant improvements. These results may\nbe attributed to the fact that the joint density of local pixel blocks tends to be close to elliptically\nsymmetric [17, 11].\n\n5 Conclusion\n\nWe have introduced a new signal transformation known as radial Gaussianization (RG), which can\neliminate dependencies of sources with elliptically symmetric densities. 
Empirically, we have shown\nthat the RG transform is highly e\ufb00ective at removing dependencies between pairs of samples in band-\npass \ufb01ltered sounds and images, and within local blocks of bandpass \ufb01ltered images.\n\nOne important issue underlying our development of this methodology is the intimate relation be-\ntween source models and dependency reduction methods. The class of elliptically symmetric densi-\nties represents a generalization of the Gaussian family that is complementary to the class of linearly\ntransformed factorial densities (see Fig. 1). The three dependency reduction methods we have dis-\ncussed (PCA, ICA and RG) are each associated with one of these classes, and are each guaranteed\nto produce independent responses when applied to signals drawn from a density belonging to the\ncorresponding class. But applying one of these methods to a signal with an incompatible source\nmodel may not achieve the expected reduction in dependency (e.g., applying ICA to an ESD), and\nin some cases can even increase dependencies (e.g., applying RG to a factorial density).\n\nSeveral recently published methods are related to RG. An iterative Gaussianization scheme trans-\nforms any source model to a spherical Gaussian by alternating between linear ICA transformations\nand nonlinear histogram matching to map marginal densities to Gaussians [28]. However, in gen-\neral, the overall transformation of iterative Gaussianization is an alternating concatenation of many\nlinear/nonlinear transformations, and results in a substantial distortion of the original source space.\nFor the special case of ESDs, RG provides a simple one-step procedure with minimal distortion.\nAnother nonlinear transform that has also been shown to be able to reduce higher-order dependen-\ncies in natural signals is divisive normalization [15]. 
In the extended version of this paper [13], we\nshow that there is no ESD source model whose dependencies can be completely eliminated by\ndivisive normalization. On the other hand, divisive normalization provides a rough approximation\nto RG, which suggests that RG might provide a more principled justi\ufb01cation for normalization-like\nnonlinear behaviors seen in biological sensory systems.\n\nThere are a number of extensions of RG that are worth considering in the context of signal repre-\nsentation. First, we are interested in speci\ufb01c sub-families of ESD for which the nonlinear mapping\nof signal amplitudes in RG may be expressed in closed form. Second, the RG methodology pro-\nvides a solution to the e\ufb03cient coding problem for ESD signals in the noise-free case, and it is\nworthwhile to consider how the solution would be a\ufb00ected by the presence of sensor and/or chan-\nnel noise. Third, we have shown that RG substantially reduces dependency for nearby samples of\nbandpass \ufb01ltered image/sound, but that performance worsens as the coe\ufb03cients become more sep-\narated, where their joint densities are closer to factorial. Recent models of natural images [29, 30]\nhave used Markov random \ufb01elds based on local elliptically symmetric models, and these are seen to\nprovide a natural transition of pairwise joint densities from elliptically symmetric to factorial. We\nare currently exploring extensions of the RG methodology to such global models. And \ufb01nally, we\nare currently examining the statistics of signals after local RG transformations, with the expectation\nthat remaining statistical regularities (e.g., orientation and phase dependencies in images) can be\nstudied, modeled and removed with additional transformations.\n\nReferences\n[1] F Attneave. Some informational aspects of visual perception. Psych. 
Rev., 61:183\u2013193, 1954.\n\n1Similar results for the comparison of ICA to PCA were obtained with a slightly di\ufb00erent method of remov-\ning the mean values of each block [8].\n\n[2] H B Barlow. Possible principles underlying the transformation of sensory messages. In W A Rosenblith,\n\neditor, Sensory Communication, pages 217\u2013234. MIT Press, Cambridge, MA, 1961.\n\n[3] B A Olshausen and D J Field. Emergence of simple-cell receptive \ufb01eld properties by learning a sparse\n\ncode for natural images. Nature, 381:607\u2013609, 1996.\n\n[4] A van der Schaaf and J H van Hateren. Modelling the power spectra of natural images: Statistics and\n\ninformation. Vision Research, 28(17):2759\u20132770, 1996.\n\n[5] A J Bell and T J Sejnowski. The \u2019independent components\u2019 of natural scenes are edge \ufb01lters. Vision\n\nResearch, 37(23):3327\u20133338, 1997.\n\n[6] M S Lewicki. E\ufb03cient coding of natural sounds. Nature Neuroscience, 5(4):356\u2013363, 2002.\n[7] R. Baddeley. Searching for \ufb01lters with \u201cinteresting\u201d output distributions: an uninteresting direction to\n\nexplore. Network, 7:409\u2013421, 1996.\n\n[8] Matthias Bethge. Factorial coding of natural images: how e\ufb00ective are linear models in removing higher-\n\norder dependencies? J. Opt. Soc. Am. A, 23(6):1253\u20131268, 2006.\n\n[9] B Wegmann and C Zetzsche. Statistical dependence between orientation \ufb01lter outputs used in a human\nvision based image code. In Proc Visual Comm. and Image Processing, volume 1360, pages 909\u2013922,\nLausanne, Switzerland, 1990.\n\n[10] E P Simoncelli. Statistical models for images: Compression, restoration and synthesis. In Proc 31st Asilo-\nmar Conf on Signals, Systems and Computers, volume 1, pages 673\u2013678, Paci\ufb01c Grove, CA, November\n2-5 1997. IEEE Computer Society.\n\n[11] M J Wainwright and E P Simoncelli. Scale mixtures of Gaussians and the statistics of natural im-\nages. In S. A. Solla, T. K. 
Leen, and K.-R. M\u00fcller, editors, Adv. Neural Information Processing Systems\n(NIPS*99), volume 12, pages 855\u2013861, Cambridge, MA, May 2000. MIT Press.\n\n[12] K.T. Fang, S. Kotz, and K.W. Ng. Symmetric Multivariate and Related Distributions. Chapman and Hall,\n\nLondon, 1990.\n\n[13] S. Lyu and E. P. Simoncelli. Nonlinear extraction of \u201cindependent components\u201d of elliptically symmet-\nric densities using radial Gaussianization. Technical Report TR2008-911, Computer Science Technical\nReport, Courant Inst. of Mathematical Sciences, New York University, April 2008.\n\n[14] D. Nash and M. S. Klamkin. A spherical characterization of the normal distribution. Journal of Multi-\n\nvariate Analysis, 55:56\u2013158, 1976.\n\n[15] O Schwartz and E P Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience,\n\n4(8):819\u2013825, August 2001.\n\n[16] William Feller. An Introduction to Probability Theory and Its Applications, volume 1. Wiley, January\n\n1968.\n\n[17] C Zetzsche and G Krieger. The atoms of vision: Cartesian or polar? J. Opt. Soc. Am. A, 16(7), July 1999.\n[18] J. Huang and D. Mumford. Statistics of natural images and models. In IEEE International Conference on\n\nComputer Vision and Pattern Recognition (CVPR), 1999.\n\n[19] A Srivastava, X Liu, and U Grenander. Universal analytical forms for modeling image probability. IEEE\n\nPat. Anal. Mach. Intell., 24(9):1200\u20131214, Sep 2002.\n\n[20] Y. Teh, M. Welling, and S. Osindero. Energy-based models for sparse overcomplete representations.\n\nJournal of Machine Learning Research, 4:1235\u20131260, 2003.\n\n[21] P I M Johannesma. The pre-response stimulus ensemble of neurons in the cochlear nucleus. In Symposium\n\non Hearing Theory (IPO), pages 58\u201369, Eindhoven, Holland, 1972.\n\n[22] M. Studeny and J. Vejnarova. The multiinformation function as a tool for measuring stochastic depen-\ndence. In M. I. 
Jordan, editor, Learning in Graphical Models, pages 261\u2013297. Dordrecht: Kluwer, 1998.\n\n[23] T. Cover and J. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.\n[24] A. Kraskov, H. St\u00f6gbauer, and P. Grassberger. Estimating mutual information. Phys. Rev. E, 69(6):66\u201382,\n\nJun 2004.\n\n[25] E. G. Learned-Miller and J. W. Fisher. ICA using spacings estimates of entropy. Journal of Machine\n\nLearning Research, 4(1):1271\u20131295, 2000.\n\n[26] J H van Hateren and A van der Schaaf. Independent component \ufb01lters of natural images compared with\n\nsimple cells in primary visual cortex. Proc. R. Soc. Lond. B, 265:359\u2013366, 1998.\n\n[27] A. Hyv\u00e4rinen. Fast and robust \ufb01xed-point algorithms for independent component analysis. IEEE Trans-\n\nactions on Neural Networks, 10(3):626\u2013634, 1999.\n\n[28] Scott Saobing Chen and Ramesh A. Gopinath. Gaussianization. In Advances in Neural Information\nProcessing Systems (NIPS), pages 423\u2013429, 2000.\n\n[29] S. Roth and M. Black. Fields of experts: A framework for learning image priors. In IEEE Conference on\n\nComputer Vision and Pattern Recognition (CVPR), volume 2, pages 860\u2013867, 2005.\n\n[30] S Lyu and E P Simoncelli. Statistical modeling of images with \ufb01elds of Gaussian scale mixtures. In\nB Sch\u00f6lkopf, J Platt, and T Hofmann, editors, Adv. Neural Information Processing Systems 19, volume 19,\nCambridge, MA, May 2007. MIT Press.\n", "award": [], "sourceid": 120, "authors": [{"given_name": "Siwei", "family_name": "Lyu", "institution": null}, {"given_name": "Eero", "family_name": "Simoncelli", "institution": null}]}