{"title": "Source Separation and Density Estimation by Faithful Equivariant SOM", "book": "Advances in Neural Information Processing Systems", "page_first": 536, "page_last": 542, "abstract": null, "full_text": "Source Separation and Density \n\nEstimation by Faithful Equivariant SOM \n\nJuan K. Lin \n\nDepartment of Physics \nUniversity of Chicago \n\nChicago, IL 60637 \njk-lin@uchicago.edu \n\nDavid G. Grier \n\nDepartment of Physics \nUniversity of Chicago \n\nChicago, IL 60637 \n\nd-grier@uchicago.edu \n\nJack D. Cowan \n\nDepartment of Math \nUniversity of Chicago \n\nChicago, IL 60637 \n\nj-cowan@uchicago.edu \n\nAbstract \n\nWe couple the tasks of source separation and density estimation \nby extracting the local geometrical structure of distributions ob(cid:173)\ntained from mixtures of statistically independent sources. Our \nmodifications of the self-organizing map (SOM) algorithm results \nin purely digital learning rules which perform non-parametric his(cid:173)\ntogram density estimation. The non-parametric nature of the sep(cid:173)\naration allows for source separation of non-linear mixtures. An \nanisotropic coupling is introduced into our SOM with the role of \naligning the network locally with the independent component con(cid:173)\ntours. This approach provides an exact verification condition for \nsource separation with no prior on the source distributions. \n\n1 \n\nINTRODUCTION \n\nMuch of the current work on visual cortex modeling has focused on the generation of \ncoding which captures statistical independence and sparseness (Bell and Sejnowski \n1996, Olshausen and Field 1996). The Bell and Sejnowski model suffers from the \nparametric and intrinsically non-local nature of their source separation algorithm, \nwhile the Olshausen and Field model does not achieve true sparse-distributed cod(cid:173)\ning where each cell has the same response probability (Field 1994). 
In this paper, we construct an extensively modified SOM with equipartition of activity as a steady-state for the task of local statistical independence processing and sparse-distributed coding. \n\nSOFM for Density Approximation and ICA \n\n537 \n\nRitter and Schulten (1986) demonstrated that the density of the Kohonen SOM units is not proportional to the input density in the steady-state. In one dimension the Kohonen net under-represents high density and over-represents low density regions. Thus SOM's are generally not used for density estimation. Several modifications for controlling the magnification of the representation have appeared. Recently, Bauer et al. (1996) used an \"adaptive step size\", and Lin and Cowan (1996) used an Lp-norm weighting to control the magnification. Here we concentrate on the latter's \"faithful representation\" algorithms for source separation and density estimation. \n\n2 SHARPLY PEAKED DISTRIBUTIONS \n\nMixtures of sharply peaked source distributions will contain high density contours which correspond to the independent component contours. Blind separation can be performed rapidly for this case in a net with one dimensional branched topology. A digital learning rule where the updates only take on discrete values was used:1 \n\n\\Delta w_i = \\kappa \\, A(\\epsilon) \\, \\mathrm{sgn}(\\zeta - w_i), \\qquad (1) \n\nwhere \\kappa is the learning rate, A(\\epsilon) the neighborhood function, \\{w\\} the SOM unit positions, and \\zeta the input. \n\nFigure 1: Left: linear source separation by branched net. Dashed lines correspond to the independent component axes. Net configuration is shown every 200 points. Dots denote the unit positions after 4000 points. Right: Voronoi partition of the vector space by the SOM units. 
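The digital update of Eqn. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a single one-dimensional branch with a Gaussian neighborhood over branch distance, and the function and parameter names (digital_som_update, kappa, sigma) are ours.

```python
import numpy as np

def digital_som_update(w, zeta, kappa=0.05, sigma=2.0):
    """One update of the digital SOM rule (Eqn. 1).

    w    : (n_units, dim) array of unit positions along one branch
    zeta : (dim,) input sample
    Every unit takes a discrete step kappa * A(eps) * sgn(zeta - w_i),
    where eps is the lattice distance to the winning (nearest) unit and
    A is a Gaussian neighborhood; sgn acts component-wise, so in the
    steady state each unit sits at a component-wise median of the
    distribution it represents."""
    i_star = np.argmin(np.linalg.norm(w - zeta, axis=1))  # winning unit
    eps = np.abs(np.arange(len(w)) - i_star)              # branch distance
    A = np.exp(-eps**2 / (2.0 * sigma**2))                # neighborhood
    return w + kappa * A[:, None] * np.sign(zeta - w)
```

Because the step depends on the input only through sgn(zeta - w), the update is scale-invariant, which is the "digital" property exploited later in Sec. 4.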
\n\nWe performed source separation and coding of two mixed signals in a net with the topology of two cross-linked branches (see Fig. (1)). \n\n1 The sign function sgn(x) takes on a value of 1 for x > 0, 0 for x = 0 and -1 for x < 0. Here the sign function acts component-wise on the vector. \n\nThe neighborhood function A(\\epsilon) is taken to be Gaussian, where \\epsilon is the distance to the winning unit along the branch structure. Two speech audio files were randomly mixed and pre-whitened first to decorrelate the two mixtures. Since pre-whitening tends to orthogonalize the independent component axes, much of the processing that remains is rotation to find the independent component coordinate system. A typical simulation is shown in Fig. (1). The branches of the net quickly zero in on the high density directions. As seen from the nearest-neighbor Voronoi partition of the distribution (Fig. 1b), the branched SOM essentially performs a one dimensional equipartition of the mixture. The learning rule Eqn. 1 attempts to place each unit at the component-wise median of the distribution encompassed by its Voronoi partition. For sharply peaked sources, the algorithm will place the units directly on top of the high density ridges. \n\nTo demonstrate the generality of our non-parametric approach, we perform source separation and density coding of a non-linear mixture. Because our network has local dynamics, with enough units, the network can follow the curved \"independent component contours\" of the input distribution. The result is shown in Fig. (2). \n\nFigure 2: Source separation of a non-linear mixture. The mixture is given by \\xi_1 = -2\\,\\mathrm{sgn}(s_1)\\,s_1^2 + 1.1 s_1 - s_2, \\xi_2 = -2\\,\\mathrm{sgn}(s_2)\\,s_2^2 + s_1 + 1.1 s_2. Left: the SOM configuration is shown periodically in the figure, with the configuration after 12000 points indicated by the dots. 
Dashed lines denote two independent component contours. Right: the sources (s_1, s_2), mixtures (\\xi_1, \\xi_2) and pseudo-histogram-equalized representations (o_1, o_2). \n\nTo unmix the input, a parametric separation approach can be taken where a least squares fit to the branch contours is used. For the source separation in Fig. (1a), assuming linear mixing and inserting the branch coordinate system into an unmixing matrix, we find a reduction of the amplitudes of the mixtures to less than one percent of the signal. This is typical of the quality of separation obtained in our simulations. For the non-linear source separation in Fig. (2), parametric unmixing can similarly be accomplished by a least squares fit to polynomial contours with quadratic terms. Alternatively, taking full advantage of the non-parametric nature of the SOM approach, an approximation of the independent sources can be constructed from the positions w_{i^*} of the winning unit. Or, as we show in Fig. (2b), the cell labels i^* can be used to give a pseudo-histogram-equalized source representation. This non-parametric approach is thus much more general in the sense that no model is needed of the mixing transformation. Since there is only one winning unit along one branch, only one output channel is active at any given time. For sharply peaked source distributions such as speech, this does not significantly hinder the fidelity of the source representation since the input sources hover around zero most of the time. This property also has the potential for utilization in compression. However, for a fully rigorous histogram-equalized source representation, we must turn to a network with a topology that matches the dimensionality of the input. \n\n3 ARBITRARY DISTRIBUTIONS \n\nFor mixtures of sources with arbitrary distributions, we seek a full N dimensional equipartition. 
We define an (M, N) partition of \\mathbb{R}^N to be a partition of \\mathbb{R}^N into (M + 1)^N regions by M parallel cuts normal to each of N distinct directions. The simplest equipartition of a source mixture is the trivial equipartition along the independent component axes (ICA). Our goal is to achieve this trivial ICA-aligned equipartition using a hypercube architecture SOM with M + 1 units per dimension. For an (M, N) equipartition, since the number of degrees of freedom to define the MN hyperplanes grows quadratically in N, while the number of constraints grows exponentially in N, for large enough M the desired trivial equipartition will be the unique (M, N) equipartition. We postulate that M = 2 suffices for uniqueness. Complementary to this claim, it is known that a (1, N) equipartition does not exist for arbitrary distributions for N \\geq 5 (Ramos 1996). The uniqueness of the (M, N) equipartition of source mixtures thus provides an exact verification condition for noiseless source separation. \n\nWith \\epsilon = i^* - i the lattice displacement from the winning unit i^*, the digital equipartition learning rule is given by: \n\n\\Delta w_i = \\kappa \\, A(\\epsilon) \\, \\mathrm{sgn}(\\epsilon), \\quad i \\neq i^*, \\qquad (2) \n\n\\Delta w_{i^*} = \\sum_{i \\neq i^*} \\Delta w_i, \\qquad (3) \n\nwhere \n\nA(\\epsilon) = A(-\\epsilon). \\qquad (4) \n\nEquipartition of the input distribution can easily be shown to be a steady-state of the dynamics. Let q_k be the probability measure of unit k. For the steady-state: \n\n\\langle \\Delta w_k \\rangle = 0 = \\sum_i q_i \\, A(i - k) \\, \\mathrm{sgn}(i - k) + q_k \\sum_i A(k - i) \\, \\mathrm{sgn}(k - i) = \\sum_i (q_i - q_k) \\, A(i - k) \\, \\mathrm{sgn}(i - k), \n\nfor all units k. By inspection, equipartition, where q_i = q_k for all units i and k, is a solution to the equation above. It has been shown that equipartition is the only steady-state of the learning rule in two dimensional rectangular SOM's (Lin and Cowan 1996), though with the highly overconstrained steady-state equations, the result should be much more general. \n\nOne further modification of the SOM is required. 
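On a two-dimensional grid, the digital equipartition rule of Eqns. (2)-(4) can be sketched as below. This is our own minimal reconstruction: the sign conventions are inferred from the steady-state condition, and the names (equipartition_update, kappa, sigma) are hypothetical.

```python
import numpy as np

def equipartition_update(w, i_star, kappa=0.05, sigma=1.5):
    """One step of the digital equipartition rule (Eqns. 2-4).

    w      : (rows, cols, 2) grid of unit positions
    i_star : lattice index (row, col) of the winning unit
    For each unit i, eps = i_star - i is the lattice displacement to
    the winner; the unit steps by kappa * A(eps) * sgn(eps), with sgn
    acting component-wise, and the winner takes the sum of the other
    units' updates.  A is Gaussian, hence symmetric: A(eps) = A(-eps)."""
    rows, cols = w.shape[0], w.shape[1]
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    eps = np.stack([i_star[0] - ii, i_star[1] - jj], axis=-1)
    A = np.exp(-np.sum(eps ** 2, axis=-1) / (2.0 * sigma ** 2))  # Eqn. (4)
    dw = kappa * A[..., None] * np.sign(eps).astype(float)       # Eqn. (2)
    # sgn(0) = 0, so dw at i_star is zero here; replace it by the sum
    # of the other units' updates (Eqn. (3)).
    dw[i_star] = dw.sum(axis=(0, 1))
    return w + dw
```

Note that the update depends on the data only through the identity of the winner, which is the scale-invariant "counting" property stressed in the Discussion.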
The desired trivial ICA equipartition is not a proper Voronoi partition except when the independent component axes are orthogonal. To obtain the desired equipartition, it is necessary to change the definition of the winning unit i^*. Let \\Omega(w_i) be the winning region of the unit at w_i. (5) Since a histogram-equalized representation independent of the mixing transformation A is desired, we require that \n\n\\{A\\,\\Omega(w)\\} = \\{\\Omega(Aw)\\}, \\qquad (6) \n\ni.e., \\Omega is equivariant under the action of A (see e.g. Golubitsky 1988). \n\nFigure 3: Left: Voronoi and equivariant partitions of a primitive cell. Right: configuration of the SOM after 4000 points. Initially the units of the SOM were equally spaced and aligned along the two mixture coordinate directions. \n\nIn two dimensions, we modify the tessellation by dividing up a primitive cell amongst its constituent units along lines joining the midpoints of the sides. For a primitive cell composed of units at a, b, c and d, the region of the primitive cell represented by a is the simply connected polygon defined by vertices at a, (a + b)/2, (a + d)/2 and (a + b + c + d)/4. The two partitions are contrasted in Fig. (3a). Our modified equivariant partition satisfies Eqn. (6) for all non-singular linear transformations. \n\nThe learning rule given above was shown to have an equipartition steady state. It remains, however, to align the partitions so that it becomes a valid (M, N) partition. The addition of a local anisotropic coupling which physically, in analogy to elastic nets, might correspond to a bending modulus along the network's axes, will tend to align the partitions and enhance convergence to the desired steady state. 
We supplemented the digital learning rule (Eqs. (2)-(3)) with a movement of the units towards the intersections of least squares line fits to the SOM grid. \n\nNumerics are shown in Fig. 3b, where alignment with the independent component coordinate system and density estimation in the form of equipartition can be seen. The aligned equipartition representation formed by the network gives histogram-equalized representations of the independent sources, which, because of the equivariant nature of the SOM, will be independent of the mixing matrix. \n\n4 DISCUSSION \n\nMost source separation algorithms are parametric density estimation approaches (e.g. Bell and Sejnowski 1995, Pearlmutter and Parra 1996). Alternatively, in parallel with this work, the standard SOM was used for the separation of both discrete and uniform sources (Herrmann and Yang 1996, Pajunen et al. 1996). The source separation approach taken here is very general in the sense that no a priori assumptions about the individual source distributions and mixing transformation are made. Our approach's local non-parametric nature allows for source separation of non-linear mixtures and also possibly the separation of more sharply peaked sources from fewer mixtures. The low to high dimensional map required for the latter task will be prohibitively difficult for parametric unmixing approaches. \n\nFor density estimation in the form of equipartition, we point out the importance of a digital scale-invariant algorithm. Direct dependence on \\zeta and w_i has been extracted out of the learning rule. Because the update depends only upon the partition, the network learns from its own coarse response to stimuli. This, along with the equivariant partition modification, underscores the dynamic partition nature of our algorithm. More direct computational geometry partitioning algorithms are currently being pursued. 
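The equivariant primitive-cell subdivision of Sec. 3 is simple enough to state directly in code. The sketch below is our own illustration with a hypothetical name (subcell); the check at the end verifies the equivariance condition of Eqn. (6) for one sample non-singular linear map.

```python
import numpy as np

def subcell(a, b, c, d):
    """Equivariant partition of a primitive cell (Sec. 3): the region
    assigned to corner unit a of the cell a-b-c-d (corners in cyclic
    order, so a-b and a-d are edges) is the quadrilateral with vertices
    at a, the two adjacent edge midpoints, and the cell centroid."""
    return np.stack([a, (a + b) / 2.0, (a + b + c + d) / 4.0, (a + d) / 2.0])

# Equivariance check (Eqn. 6): every vertex is a fixed linear combination
# of the corners, so the partition commutes with any linear mixing map A.
rng = np.random.default_rng(1)
a, b, c, d = rng.normal(size=(4, 2))
A = np.array([[2.0, 0.5], [-0.3, 1.0]])
omega_of_Aw = subcell(a @ A.T, b @ A.T, c @ A.T, d @ A.T)  # Omega(Aw)
A_of_omega = subcell(a, b, c, d) @ A.T                     # A Omega(w)
assert np.allclose(omega_of_Aw, A_of_omega)
```

By contrast, the Voronoi partition uses distances, which are not preserved by a general linear map, which is why it fails Eqn. (6) when the independent component axes are not orthogonal.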
It is also clear that a hybrid local parametric density estimation approach will work for the separation of sharply peaked sources (Bishop et al. 1996, Utsugi 1996). \n\n5 CONCLUSIONS \n\nWe have extracted the local geometrical structure of transformations of product distributions. By modifying the SOM algorithm we developed a network with the capability of non-parametrically separating out non-linear source mixtures. Sharply peaked sources allow for quick separation via a branched SOM network. For arbitrary source distributions, we introduce the (M, N) equipartition, the uniqueness of which provides an exact verification condition for source separation. \n\nFundamentally, equipartition of activity is a very sensible resource allocation principle. In this work, the local equipartition coding and source separation processing proceed in tandem, resulting in optimal coding and processing of source mixtures. We believe the digital \"counting\" aspect of the learning rule, the learning based on the network's own coarse response to stimuli, the local nature of the dynamics, and the coupling of coding and processing make this an attractive approach from both computational and neural modeling perspectives. \n\nReferences \n\nBauer, H.-U., Der, R., and Herrmann, M. 1996. Controlling the magnification factor of self-organizing feature maps. Neural Comp. 8, 757-771. \nBell, A. J., and Sejnowski, T. J. 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Comp. 7, 1129-1159. \nBell, A. J., and Sejnowski, T. J. 1996. Edges are the \"independent components\" of natural scenes. NIPS 9. \nBishop, C. M. and Williams, C. 1996. GTM: A principled alternative to the self-organizing map. NIPS 9. \nField, D. J. 1994. What is the goal of sensory coding? Neural Comp. 6, 559-601. 
\nGolubitsky, M., Stewart, I., and Schaeffer, D. G. 1988. Singularities and Groups in Bifurcation Theory. Springer-Verlag, Berlin. \nHerrmann, M. and Yang, H. H. 1996. Perspectives and limitations of self-organizing maps in blind separation of source signals. Proc. ICONIP'96. \nHertz, J., Krogh, A., and Palmer, R. G. 1991. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City. \nKohonen, T. 1995. Self-Organizing Maps. Springer-Verlag, Berlin. \nLin, J. K. and Cowan, J. D. 1996. Faithful representation of separable input distributions. To appear in Neural Computation. \nOlshausen, B. A. and Field, D. J. 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609. \nPajunen, P., Hyvarinen, A. and Karhunen, J. 1996. Nonlinear blind source separation by self-organizing maps. Proc. ICONIP'96. \nPearlmutter, B. A. and Parra, L. 1996. Maximum likelihood blind source separation: a context-sensitive generalization of ICA. NIPS 9. \nRamos, E. A. 1996. Equipartition of mass distributions by hyperplanes. Discrete Comput. Geom. 15, 147-167. \nRitter, H., and Schulten, K. 1986. On the stationary state of Kohonen's self-organizing sensory mapping. Biol. Cybern. 54, 99-106. \nUtsugi, A. 1996. Hyperparameter selection for self-organizing maps. To appear in Neural Computation. \n", "award": [], "sourceid": 1190, "authors": [{"given_name": "Juan", "family_name": "Lin", "institution": null}, {"given_name": "Jack", "family_name": "Cowan", "institution": null}, {"given_name": "David", "family_name": "Grier", "institution": null}]}