{"title": "Emergence of Topography and Complex Cell Properties from Natural Images using Extensions of ICA", "book": "Advances in Neural Information Processing Systems", "page_first": 827, "page_last": 833, "abstract": null, "full_text": "Emergence of Topography and Complex \n\nCell Properties from Natural Images \n\nusing Extensions of ICA \n\nAapo Hyviirinen and Patrik Hoyer \n\nNeural Networks Research Center \nHelsinki University of Technology \n\nP.O. Box 5400, FIN-02015 HUT, Finland \n\naapo.hyvarinen~hut.fi, patrik.hoyer~hut.fi \n\nhttp://www.cis.hut.fi/projects/ica/ \n\nAbstract \n\nIndependent component analysis of natural images leads to emer(cid:173)\ngence of simple cell properties, Le. linear filters that resemble \nwavelets or Gabor functions. \nIn this paper, we extend ICA to \nexplain further properties of VI cells. First, we decompose natural \nimages into independent subspaces instead of scalar components. \nThis model leads to emergence of phase and shift invariant fea(cid:173)\ntures, similar to those in VI complex cells. Second, we define a \ntopography between the linear components obtained by ICA. The \ntopographic distance between two components is defined by their \nhigher-order correlations, so that two components are close to each \nother in the topography if they are strongly dependent on each \nother. This leads to simultaneous emergence of both topography \nand invariances similar to complex cell properties. \n\n1 \n\nIntroduction \n\nA fundamental approach in signal processing is to design a statistical generative \nmodel of the observed signals. Such an approach is also useful for modeling the \nproperties of neurons in primary sensory areas. The basic models that we consider \nhere express a static monochrome image J (x, y) as a linear superposition of some \nfeatures or basis functions bi (x, y): \n\nn \n\nJ(x, y) = 2: bi(x, Y)Si \n\ni=l \n\n(1) \n\nwhere the Si are stochastic coefficients, different for each image J(x, y). 
Estimation of the model in Eq. (1) consists of determining the values of s_i and b_i(x, y) for all i and (x, y), given a sufficient number of observations of images, or in practice, image patches I(x, y). We restrict ourselves here to the basic case where the b_i(x, y) form an invertible linear system. Then we can invert s_i = <w_i, I>, where the w_i denote the inverse filters, and <w_i, I> = sum_{x,y} w_i(x, y) I(x, y) denotes the dot-product. \n\n828 A. Hyvärinen and P. Hoyer \n\nThe w_i(x, y) can then be identified as the receptive fields of the model simple cells, and the s_i are their activities when presented with a given image patch I(x, y). \nIn the basic case, we assume that the s_i are nongaussian and mutually independent. This type of decomposition is called independent component analysis (ICA) [3, 9, 1, 8], or sparse coding [13]. Olshausen and Field [13] showed that when this model is estimated with input data consisting of patches of natural scenes, the obtained filters w_i(x, y) have the three principal properties of simple cells in V1: they are localized, oriented, and bandpass (selective to scale/frequency). Van Hateren and van der Schaaf [15] compared quantitatively the obtained filters w_i(x, y) with those measured by single-cell recordings of the macaque cortex, and found a good match for most of the parameters. \nWe show in this paper that simple extensions of the basic ICA model explain emergence of further properties of V1 cells: topography and the invariances of complex cells. Due to space limitations, we can only give the basic ideas in this paper. More details can be found in [6, 5, 7]. \nFirst, using the method of feature subspaces [11], we model the response of a complex cell as the norm of the projection of the input vector (image patch) onto a linear subspace, which is equivalent to the classical energy models. 
Then we maximize the independence between the norms of such projections, or energies. Thus we obtain features that are localized in space, oriented, and bandpass, like those given by simple cells, or Gabor analysis. In contrast to simple linear filters, however, the obtained feature subspaces also show emergence of phase invariance and (limited) shift or translation invariance. Maximizing the independence, or equivalently, the sparseness of the norms of the projections to feature subspaces thus allows for the emergence of exactly those invariances that are encountered in complex cells. \nSecond, we extend this model of independent subspaces so that we have overlapping subspaces, and every subspace corresponds to a neighborhood on a topographic grid. This is called topographic ICA, since it defines a topographic organization between components. Components that are far from each other on the grid are independent, like in ICA. In contrast, components that are near to each other are not independent: they have strong higher-order correlations. This model shows emergence of both complex cell properties and topography from image data. \n\n2 Independent subspaces as complex cells \n\nIn addition to the simple cells that can be modelled by basic ICA, another important class of cells in V1 is complex cells. The two principal properties that distinguish complex cells from simple cells are phase invariance and (limited) shift invariance. The purpose of the first model in this paper is to explain the emergence of such phase and shift invariant features using a modification of the ICA model. The modification is based on combining the principle of invariant-feature subspaces [11] and the model of multidimensional independent component analysis [2]. \n\nInvariant feature subspaces. The principle of invariant-feature subspaces states that one may consider an invariant feature as a linear subspace in a feature space. 
The value of the invariant, higher-order feature is given by (the square of) the norm of the projection of the given data point on that subspace, which is typically spanned by lower-order features. A feature subspace, as any linear subspace, can always be represented by a set of orthogonal basis vectors, say w_i(x, y), i = 1, ..., m, where m is the dimension of the subspace. Then the value F(I) of the feature F with input vector I(x, y) is given by F(I) = sum_{i=1}^m <w_i, I>^2, where a square root \n\nEmergence of V1 properties using Extensions of ICA 829 \n\nmight be taken. In fact, this is equivalent to computing the distance between the input vector I(x, y) and a general linear combination of the basis vectors (filters) w_i(x, y) of the feature subspace [11]. In [11], it was shown that this principle, when combined with competitive learning techniques, can lead to emergence of invariant image features. \n\nMultidimensional independent component analysis. In multidimensional independent component analysis [2] (see also [12]), a linear generative model as in Eq. (1) is assumed. In contrast to ordinary ICA, however, the components (responses) s_i are not assumed to be all mutually independent. Instead, it is assumed that the s_i can be divided into couples, triplets or in general m-tuples, such that the s_i inside a given m-tuple may be dependent on each other, but dependencies between different m-tuples are not allowed. Every m-tuple of s_i corresponds to m basis vectors b_i(x, y). The m-dimensional probability densities inside the m-tuples of s_i are not specified in advance in the general definition of multidimensional ICA [2]. In the following, let us denote by J the number of independent feature subspaces, and by S_j, j = 1, ..., J the set of the indices of the s_i belonging to the subspace of index j. \n\nIndependent feature subspaces. Invariant-feature subspaces can be embedded 
in multidimensional independent component analysis by considering probability distributions for the m-tuples of s_i that are spherically symmetric, i.e. depend only on the norm. In other words, the probability density p_j(.) of the m-tuple with index j in {1, ..., J} can be expressed as a function of the sum of the squares of the s_i, i in S_j, only. For simplicity, we assume further that the p_j(.) are equal for all j, i.e. for all subspaces. \nAssume that the data consists of K observed image patches I_k(x, y), k = 1, ..., K. Then the logarithm of the likelihood L of the data given the model can be expressed as \n\nlog L(w_i(x, y), i = 1, ..., n) = sum_{k=1}^K sum_{j=1}^J log p(sum_{i in S_j} <w_i, I_k>^2) + K log |det W|    (2) \n\nwhere p(sum_{i in S_j} s_i^2) = p_j(s_i, i in S_j) gives the probability density inside the j-th m-tuple of s_i, and W is a matrix containing the filters w_i(x, y) as its columns. As in basic ICA, prewhitening of the data allows us to consider the w_i(x, y) to be orthonormal, and this implies that log |det W| is zero [6]. Thus we see that the likelihood in Eq. (2) is a function of the norms of the projections of I_k(x, y) on the subspaces indexed by j, which are spanned by the orthonormal basis sets given by w_i(x, y), i in S_j. Since the norm of the projection of visual data on practically any subspace has a supergaussian distribution, we need to choose the probability density p in the model to be sparse [13], i.e. supergaussian [8]. For example, we could use the following probability distribution \n\nlog p(sum_{i in S_j} s_i^2) = -alpha [sum_{i in S_j} s_i^2]^{1/2} + beta,    (3) \n\nwhich could be considered a multi-dimensional version of the exponential distribution. Now we see that the estimation of the model consists of finding subspaces such that the norms of the projections of the (whitened) data on those subspaces have maximally sparse distributions. 
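As a concrete illustration, the subspace energies and the sparse log-likelihood of Eqs. (2)-(3) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the function names, the default subspace dimension, and the parameters alpha and beta are illustrative assumptions, and W is assumed to contain orthonormal filters for whitened data so that the K log |det W| term vanishes.

```python
import numpy as np

def isa_log_likelihood(W, patches, subspace_dim=4, alpha=1.0, beta=0.0):
    """Sketch of the independent-subspace log-likelihood of Eq. (2),
    with the sparse density log p(u) = -alpha*sqrt(u) + beta of Eq. (3).
    W: (n, d) array of orthonormal filters w_i (so log|det W| = 0);
    patches: (K, d) array of whitened image patches I_k."""
    responses = patches @ W.T                      # s_i = <w_i, I_k> for every patch
    n = W.shape[0]
    J = n // subspace_dim                          # number of subspaces
    # Energy of each subspace: sum of squares inside each m-tuple S_j.
    energies = (responses.reshape(len(patches), J, subspace_dim) ** 2).sum(axis=2)
    return np.sum(-alpha * np.sqrt(energies) + beta)

def complex_cell_responses(W, patch, subspace_dim=4):
    """Norm of the projection on each subspace: the model's complex-cell
    response (sum_{i in S_j} s_i^2)^{1/2} for a single patch."""
    s = W @ patch
    J = len(s) // subspace_dim
    return np.sqrt((s.reshape(J, subspace_dim) ** 2).sum(axis=1))
```

Estimating the model then amounts to maximizing `isa_log_likelihood` over orthonormal W, i.e. finding subspaces whose projection norms are maximally sparse.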
\n\nThe introduced \"independent (feature) subspace analysis\" is a natural generalization of ordinary ICA. In fact, if the projections on the subspaces are reduced to dot-products, i.e. projections on 1-D subspaces, the model reduces to ordinary ICA (provided that, in addition, the independent components are assumed to have non-skewed distributions). It is to be expected that the norms of the projections on the subspaces represent some higher-order, invariant features. The exact nature of the invariances has not been specified in the model but will emerge from the input data, using only the prior information on their independence. \n\nWhen independent subspace analysis is applied to natural image data, we can identify the norms of the projections (sum_{i in S_j} s_i^2)^{1/2} as the responses of the complex cells. If the individual filter vectors w_i(x, y) are identified with the receptive fields of simple cells, this can be interpreted as a hierarchical model where the complex cell response is computed from simple cell responses s_i, in a manner similar to the classical energy models for complex cells. Experiments (see below and [6]) show that the model does lead to emergence of those invariances that are encountered in complex cells. \n\n3 Topographic ICA \n\nThe independent subspace analysis model introduces a certain dependence structure for the components s_i. Let us assume that the distribution in the subspace is sparse, which means that the norm of the projection is most of the time very near to zero. This is the case, for example, if the densities inside the subspaces are specified as in (3). Then the model implies that two components s_i and s_j that belong to the same subspace tend to be nonzero simultaneously. In other words, s_i^2 and s_j^2 are positively correlated. This seems to be a preponderant structure of dependency in most natural data. 
For image data, this has also been noted by Simoncelli [14]. \nNow we generalize the model defined by (2) so that it models this kind of dependence not only inside the m-tuples, but among all \"neighboring\" components. A neighborhood relation defines a topographic order [10]. (A different generalization based on an explicit generative model is given in [5].) We define the model by the following likelihood: \n\nlog L(w_i(x, y), i = 1, ..., n) = sum_{k=1}^K sum_{j=1}^n G(sum_{i=1}^n h(i, j) <w_i, I_k>^2) + K log |det W|    (4) \n\nHere, h(i, j) is a neighborhood function, which expresses the strength of the connection between the i-th and j-th units. The neighborhood function can be defined in the same way as with the self-organizing map [10]. Neighborhoods can thus be defined as one-dimensional or two-dimensional; 2-D neighborhoods can be square or hexagonal. A simple example is to define a 1-D neighborhood relation by \n\nh(i, j) = 1 if |i - j| <= m, and 0 otherwise.    (5) \n\nThe constant m defines here the width of the neighborhood. \nThe function G has a similar role as the log-density of the independent components in classic ICA. For image data, or other data with a sparse structure, G should be chosen as in independent subspace analysis, see Eq. (3). \n\nProperties of the topographic ICA model. Here, we consider for simplicity only the case of sparse data. The first basic property is that all the components s_i are uncorrelated, as can be easily proven by symmetry arguments [5]. Moreover, their variances can be defined to be equal to unity, as in classic ICA. Second, components s_i and s_j that are near to each other, i.e. such that h(i, j) is significantly non-zero, tend to be active (non-zero) at the same time. In other words, their energies s_i^2 and s_j^2 are positively correlated. 
Third, latent variables that are far from each other are practically independent. Higher-order correlation decreases as a function of distance, assuming that the neighborhood is defined in a way similar to that in (5). For details, see [5]. \n\nLet us note that our definition of topography by higher-order correlations is very different from the one used in practically all existing topographic mapping methods. Usually, the distance is defined by basic geometrical relations like Euclidean distance or correlation. Interestingly, our principle makes it possible to define a topography even among a set of orthogonal vectors whose Euclidean distances are all equal. Such orthogonal vectors are actually encountered in ICA, where the basis vectors and filters can be constrained to be orthogonal in the whitened space. \n\n4 Experiments with natural image data \n\nWe applied our methods on natural image data. The data was obtained by taking 16 x 16 pixel image patches at random locations from monochrome photographs depicting wild-life scenes (animals, meadows, forests, etc.). Preprocessing consisted of removing the DC component and reducing the dimension of the data to 160 by PCA. For details on the experiments, see [6, 5]. \nFig. 1 shows the basis vectors of the 40 feature subspaces (complex cells), when the subspace dimension was chosen to be 4. It can be seen that the basis vectors associated with a single complex cell all have approximately the same orientation and frequency. Their locations are not identical, but close to each other. The phases differ considerably. Every feature subspace can thus be considered a generalization of a quadrature-phase filter pair as found in the classical energy models, enabling the cell to be selective to some given orientation and frequency, but invariant to phase and somewhat invariant to shifts. Using 4 dimensions instead of 2 greatly enhances the shift invariance of the feature subspace. 
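To make the neighborhood-based likelihood of Section 3 concrete, the 1-D neighborhood of Eq. (5) and the topographic likelihood of Eq. (4) can be sketched in NumPy. This is a minimal sketch under stated assumptions: the function names are ours, G(u) = -alpha*sqrt(u) is the sparse choice suggested by Eq. (3), and W is assumed orthonormal for whitened data so that the K log |det W| term vanishes.

```python
import numpy as np

def neighborhood(i, j, m=1):
    """1-D neighborhood function of Eq. (5): h(i, j) = 1 if |i - j| <= m, else 0."""
    return 1.0 if abs(i - j) <= m else 0.0

def topographic_log_likelihood(W, patches, m=1, alpha=1.0):
    """Sketch of the topographic ICA log-likelihood of Eq. (4).
    W: (n, d) array of orthonormal filters w_i (so log|det W| = 0);
    patches: (K, d) array of whitened image patches I_k."""
    n = W.shape[0]
    # Neighborhood matrix h(i, j) over all pairs of units.
    h = np.array([[neighborhood(i, j, m) for i in range(n)] for j in range(n)])
    s2 = (patches @ W.T) ** 2        # <w_i, I_k>^2 for every patch k and unit i
    local_energy = s2 @ h.T          # sum_i h(i, j) <w_i, I_k>^2 for each unit j
    return np.sum(-alpha * np.sqrt(local_energy))   # G(u) = -alpha*sqrt(u)
```

With m = 0 the neighborhood matrix reduces to the identity, so each local energy is a single s_i^2 and the expression collapses to an ordinary sparse ICA log-likelihood, mirroring how topographic ICA generalizes basic ICA.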
\nIn topographic ICA, the neighborhood function was defined so that every neighborhood consisted of a 3 x 3 square of 9 units on a 2-D torus lattice [10]. The obtained basis vectors are shown in Fig. 2. The basis vectors are similar to those obtained by ordinary ICA of image data [13, 1]. In addition, they have a clear topographic organization. Moreover, the connection to independent subspace analysis is clear from Fig. 2. Two neighboring basis vectors in Fig. 2 tend to be of the same orientation and frequency. Their locations are near to each other as well. In contrast, their phases are very different. This means that a neighborhood of such basis vectors, i.e. simple cells, is similar to an independent subspace. Thus it functions as a complex cell. This was demonstrated in detail in [5]. \n\n5 Discussion \n\nWe introduced here two extensions of ICA that are especially useful for image modelling. The first model uses a subspace representation to model invariant features. It turns out that the independent subspaces of natural images are similar to complex cells. The second model is a further extension of the independent subspace model. This topographic ICA model is a generative model that combines topographic mapping with ICA. As in all topographic mappings, the distance in the representation space (on the topographic \"grid\") is related to some measure of distance between represented components. In topographic ICA, the distance between represented components is defined by higher-order correlations, which gives the natural distance measure in the context of ICA. \n\nAn approach closely related to ours is given by Kohonen's Adaptive Subspace Self-Organizing Map [11]. 
However, the emergence of shift invariance in [11] was conditional on restricting consecutive patches to come from nearby locations in the image, giving the input data a temporal structure like in a smoothly changing image sequence. Similar developments were given by Földiák [4]. In contrast to these two theories, we formulated an explicit image model. This independent subspace analysis model shows that emergence of complex cell properties is possible using patches at random, independently selected locations, which proves that there is enough information in static images to explain the properties of complex cells. Moreover, by extending this subspace model to model topography, we showed that the emergence of both topography and complex cell properties can be explained by a single principle: neighboring cells should have strong higher-order correlations. \n\nReferences \n[1] A. J. Bell and T. J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37:3327-3338, 1997. \n[2] J.-F. Cardoso. Multidimensional independent component analysis. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'98), Seattle, WA, 1998. \n[3] P. Comon. Independent component analysis - a new concept? Signal Processing, 36:287-314, 1994. \n[4] P. Földiák. Learning invariance from transformation sequences. Neural Computation, 3:194-200, 1991. \n[5] A. Hyvärinen and P. O. Hoyer. Topographic independent component analysis. 1999. Submitted, available at http://www.cis.hut.fi/~aapo/. \n[6] A. Hyvärinen and P. O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 2000. (in press). \n[7] A. Hyvärinen, P. O. Hoyer, and M. Inki. The independence assumption: Analyzing the independence of the components by topography. In M. 
Girolami, editor, Advances in Independent Component Analysis. Springer-Verlag, 2000. (in press). \n[8] A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483-1492, 1997. \n[9] C. Jutten and J. Herault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991. \n[10] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, New York, 1995. \n[11] T. Kohonen. Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map. Biological Cybernetics, 75:281-291, 1996. \n[12] J. K. Lin. Factorizing multivariate function classes. In Advances in Neural Information Processing Systems, volume 10, pages 563-569. The MIT Press, 1998. \n[13] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996. \n[14] E. P. Simoncelli and O. Schwartz. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. In Advances in Neural Information Processing Systems 11, pages 153-159. MIT Press, 1999. \n[15] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Royal Society ser. B, 265:359-366, 1998. \n\nFigure 1: Independent subspaces of natural image data. The model gives Gabor-like basis vectors for image windows. Every group of four basis vectors corresponds to one independent feature subspace, or complex cell. 
Basis vectors in a subspace are similar in orientation, location and frequency. In contrast, their phases are very different. \n\nFigure 2: Topographic ICA of natural image data. This gives Gabor-like basis vectors as well. Basis vectors that are similar in orientation, location and/or frequency are close to each other. The phases of nearby basis vectors are very different, giving each neighborhood properties similar to a complex cell. \n", "award": [], "sourceid": 1670, "authors": [{"given_name": "Aapo", "family_name": "Hyv\u00e4rinen", "institution": null}, {"given_name": "Patrik", "family_name": "Hoyer", "institution": null}]}