{"title": "Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior", "book": "Advances in Neural Information Processing Systems", "page_first": 293, "page_last": 300, "abstract": null, "full_text": "Interpreting Neural Response Variability as\n\nMonte Carlo Sampling of the Posterior\n\nPatrik O. Hoyer and Aapo Hyv\u00a8arinen\n\nNeural Networks Research Centre\nHelsinki University of Technology\n\nP.O. Box 9800, FIN-02015 HUT, Finland\n\nhttp://www.cis.hut.\ufb01/phoyer/\n\npatrik.hoyer@hut.\ufb01\n\nAbstract\n\nThe responses of cortical sensory neurons are notoriously variable, with\nthe number of spikes evoked by identical stimuli varying signi\ufb01cantly\nfrom trial to trial. This variability is most often interpreted as \u2018noise\u2019,\npurely detrimental to the sensory system. In this paper, we propose an al-\nternative view in which the variability is related to the uncertainty, about\nworld parameters, which is inherent in the sensory stimulus. Speci\ufb01-\ncally, the responses of a population of neurons are interpreted as stochas-\ntic samples from the posterior distribution in a latent variable model. In\naddition to giving theoretical arguments supporting such a representa-\ntional scheme, we provide simulations suggesting how some aspects of\nresponse variability might be understood in this framework.\n\n1 Introduction\n\nDuring the past half century, a wealth of data has been collected on the response properties\nof cortical sensory neurons. The majority of this research has focused on how the mean\n\ufb01ring rates of individual neurons depend on the sensory stimulus. Similarly, mathematical\nmodels have mainly focused on describing how the mean \ufb01ring rate could be computed\nfrom the input. One aspect which this research does not address is the high variability of\ncortical neural responses. 
The trial-to-trial variation in responses to identical stimuli is significant [1, 2], and several trials are typically required to get an adequate estimate of the mean firing rate.

The standard interpretation is that this variability reflects 'noise' which limits the accuracy of the sensory system [2, 3]. In the standard model, the firing rate is given by

rate = f(stimulus) + noise,    (1)

where f is the 'tuning function' of the cell in question. Here, the magnitude of the noise may depend on the stimulus. Experimental results [1, 2] seem to suggest that the amount of variability depends only on the mean firing rate, i.e. f(stimulus), and not on the particular stimulus that evoked it. Specifically, spike count variances tend to grow in proportion to spike count means [1, 2]. This has been taken as evidence for something like a Poisson process for neural firing.

Current address: 4 Washington Place, Rm 809, New York, NY 10003, USA

This standard view is not completely satisfactory. First, the exquisite sensitivity and the reliability of many peripheral neurons (see, e.g. [3]) show that neurons in themselves need not be very unreliable. In vitro experiments [4] also suggest that the large variability does not have its origin in the neurons themselves, but is a property of intact cortical circuits. One is thus tempted to point at synaptic 'background' activity as the culprit, attributing the variability of individual neurons to variable inputs. 
This seems reasonable, but it is not\nquite clear why such modulation of \ufb01ring should be considered meaningless noise rather\nthan re\ufb02ecting complex neural computations.\n\nSecond, the above model does a poor job of explaining neural responses in the phenomenon\nknown as \u2019visual competition\u2019: When viewing ambiguous (bistable) \ufb01gures, perception,\nand the responses of many neurons with it, oscillates between two distinct states (for a\nreview, see [5]). In other words, a single stimulus can yield two very different \ufb01ring rates\nin a single neuron depending on how the stimulus is interpreted. In the above model, this\nmeans that either (a) the noise term needs to have a bimodal distribution, or (b) we are\nforced to accept the fact that neurons can be tuned to stimulus interpretations, rather than\nstimuli themselves. The former solution is clearly unattractive. The latter seems sensible,\nbut we have then simply transformed the problem of oscillating \ufb01ring rates into a problem\nof oscillating interpretations: Why should there be variability (over time, and over trials) in\nthe interpretation of a stimulus?\n\nWhat would be highly desirable is a theoretical framework in which the variability of re-\nsponses could be shown to have a speci\ufb01c purpose. One suggestion [6] is that variability\ncould improve the signal to noise ratio through a phenomenon known as \u2018stochastic reso-\nnance\u2019. Another recent suggestion is that variability contributes to the contrast invariance\nof visual neurons [7].\n\nIn this paper, we will propose an alternative explanation for the variability of neural re-\nsponses. This hypothesis attempts to account for both aspects of variability described\nabove: the Poisson-like \u2018noise\u2019 and the oscillatory responses to ambiguous stimuli. Our\nsuggestion is based on the idea that cortical circuits implement Bayesian inference in la-\ntent variable models [8, 9, 10]. 
Speci\ufb01cally, we propose that neural \ufb01ring rates might be\nviewed as representing Monte Carlo samples from the posterior distribution over the latent\nvariables, given the observed input. In this view, the response variability is related to the\nuncertainty, about world parameters, which is inherent in any stimulus. This representa-\ntion would allow not only the coding of parameter values but also of their uncertainties.\nThe latter could be accomplished by pooling responses over time, or over a population of\nredundant cells.\n\nOur proposal has a direct connection to Monte Carlo methods widely used in engineering.\nThese methods use built-in randomness to solve dif\ufb01cult problems that cannot be solved\nanalytically. In particular, such methods are one of the main options for performing ap-\nproximate inference in Bayesian networks [11]. With that in mind, it is perhaps even a bit\nsurprising that Monte Carlo sampling has not, to our knowledge, previously been suggested\nas an explanation for the randomness of neural responses.\n\nAlthough the approach proposed is not speci\ufb01c to sensory modality, we will here, for con-\ncreteness, exclusively concentrate on vision. We shall start by, in the next section, review-\ning the basic probabilistic approach to vision. Then we will move on to further explain the\nproposal of this contribution.\n\n\f2 The latent variable approach to vision\n\n2.1 Bayesian models of high-level vision\n\nRecently, a growing number of researchers have argued for a probabilistic approach to\nvision, in which the functioning of the visual system is interpreted as performing Bayesian\ninference in latent variable models, see e.g. [8, 9, 10]. The basic idea is that the visual\ninput is seen as the observed data in a probabilistic generative model. The goal of vision\nis to estimate the latent (i.e. 
unobserved or hidden) variables that caused the given sensory stimulation.

In this framework, there are a number of world parameters that contribute to the observed data. These could be, for example, object identities, dimensions and locations, surface properties, lighting direction, and so forth. These parameters are not directly available to the sensory system, but must be estimated from the effects that they have on the images projected onto the retinas. Collecting all the unknown world variables into the vector w and all sensory data into the vector z, the probability that a given set of world parameters caused a given sensory stimulus is

P(w | z) = P(z | w) P(w) / P(z),    (2)

where P(w) is the prior probability of the set of world parameters w, and P(z | w) describes how sensory data is generated from the world parameters. The distribution P(w | z) is known as the posterior distribution.

A specific perceptual task then consists of estimating some subset of the world variables, given the observed data [10]. In face recognition, for example, one wants to know the identity of a person but one does not care about the specific viewpoint or the direction of lighting. Note, however, that sometimes one might specifically want to estimate viewpoint or lighting, disregarding identity, so one cannot just automatically throw out that information [10]. In a latent variable model, all relevant information is contained in the complete posterior distribution P(identity, viewpoint, lighting | sensory data). To estimate the identity one must use the marginal posterior P(identity | sensory data), obtained by integrating out the viewpoint and lighting variables. 
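As a toy numerical illustration of this marginalization (our sketch with made-up identities and probabilities, not from the paper), a discrete joint posterior over identity and a nuisance variable such as lighting can be summed out directly:

```python
# Hypothetical discrete joint posterior P(identity, lighting | data).
# The identities ("alice", "bob") and all numbers are invented for illustration.
joint = {
    ("alice", "left"):  0.30,
    ("alice", "right"): 0.25,
    ("bob",   "left"):  0.10,
    ("bob",   "right"): 0.35,
}

def marginal_identity(joint):
    """Marginal posterior P(identity | data): sum the joint over lighting."""
    marg = {}
    for (identity, _lighting), p in joint.items():
        marg[identity] = marg.get(identity, 0.0) + p
    return marg

print(marginal_identity(joint))
```

Here the marginal assigns 'alice' probability 0.55 and 'bob' 0.45, even though no single joint entry says so: the nuisance variable has been integrated out.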
Bayesian models of high-level vision model the visual system as performing these types of computations, but typically do not specify how they might be neurally implemented.

2.2 Neural network models of low-level vision

This probabilistic approach has not only been suggested as an abstract framework for vision, but in fact also as a model for interpreting actual neural firing patterns in the early visual cortex [12, 13]. In this line of research, the hypothesis is that the activity of individual neurons can be associated with hidden state variables, and that the neural circuitry implements probabilistic inference.1

The model of Olshausen and Field [12], known as sparse coding or independent component analysis (ICA) [14], depending on the viewpoint taken, is perhaps the most influential latent variable model of early visual processing to date. The hidden variables s_i are independent and sparse, distributed, for instance, according to the double-sided exponential distribution P(s_i) ∝ exp(-|s_i|). The observed data vector x is then given by a linear combination of the s_i, plus additive isotropic Gaussian noise. That is,

x = As + n,

where A is a matrix of model parameters (weights), and n is Gaussian with zero mean and covariance matrix σ^2 I.

1 Here, it must be stressed that in these low-level neural network models, the hidden variables that the neurons represent are not what we would typically consider to be the 'causal' variables of a visual scene. Rather, they are low-level visual features similar to the optimal stimuli of neurons in the early visual cortex. The belief is that more complex hierarchical models will eventually change this.

How does this abstract probabilistic model relate to neural processing? Olshausen and Field showed that when the model parameters are estimated (learned) from natural image data, the basis vectors (columns of A) come to resemble V1 simple cell receptive fields. Moreover, the latent variables s_i relate to the activities of the corresponding cells. Specifically, Olshausen and Field suggested [12] that the firing rates of the neurons correspond to the maximum a posteriori (MAP) estimate of the latent variables, given the image input: ŝ = arg max_s P(s | x).

An important problem with this kind of a MAP representation is that it attempts to represent a complex posterior distribution using only a single point (at the maximum). Such a representation cannot adequately represent multimodal posterior distributions, nor does it provide any way of coding the uncertainty of the value (the width of the peak). Many other proposed neural representations of probabilities face similar problems [11] (however, see [15] for a recent interesting approach to representing distributions). Indeed, it has been said [10, 16] that how probabilities actually are represented in the brain is one of the most important unanswered questions in the probabilistic approach to perception. 
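The inadequacy of a single-point representation can be made concrete with a small numerical sketch (our toy example, not from the paper): for a bimodal posterior, the MAP estimate picks out one peak, while a set of samples captures both modes and their spread.

```python
import math
import random

random.seed(0)

# A bimodal 1-D 'posterior': an (unnormalized) mixture of two Gaussians.
# The density itself is made up purely for illustration.
def posterior(s):
    return 0.5 * math.exp(-(s + 2.0) ** 2) + 0.5 * math.exp(-(s - 2.0) ** 2)

# MAP representation: a single point at (one of) the peaks, via grid search.
grid = [i * 0.01 - 5.0 for i in range(1001)]
s_map = max(grid, key=posterior)

# Sample-based representation: simple rejection sampling on [-5, 5].
# The unnormalized density never exceeds 1 here, so accepting a uniform
# proposal s with probability posterior(s) samples the right distribution.
samples = []
while len(samples) < 5000:
    s = random.uniform(-5.0, 5.0)
    if random.uniform(0.0, 1.0) < posterior(s):
        samples.append(s)

# The MAP is one point near a single peak; the samples reveal both modes.
frac_left = sum(s < 0 for s in samples) / len(samples)
print(s_map, frac_left)
```

The point estimate sits near one mode at ±2, whereas roughly half the samples fall near each mode, so the samples carry the multimodality and the width that the single point discards.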
In the next section we suggest an answer based on the idea that probability distributions might be represented using response variability.

3 Neural responses as samples from the posterior distribution?

As discussed in the previous section, the distribution of primary interest to a sensory system is the posterior distribution over world parameters. In all but absolutely trivial models, computing and representing such a distribution requires approximate methods, of which one major option is Monte Carlo methods. These generate stochastic samples from a given distribution, without explicitly calculating it, and such samples can then be used to approximately represent or perform computations on that distribution [11].

Could the brain use a Monte Carlo approach to perform Bayesian inference? If neural firing rates are used (even indirectly) to represent continuous-valued latent variables, one possibility would be for firing rate variability to represent a probability distribution over these variables. Here, there are two main possibilities:

(a) Variability over time. A single neuron could represent a continuous distribution if its firing rate fluctuated over time in accordance with the distribution to be represented. At each instant in time, the instantaneous firing rate would be a random sample from the distribution to be represented.

(b) Variability over neurons. A distribution could be instantaneously represented if the firing rate of each neuron in a pool of identical cells was independently and randomly drawn from the distribution to be represented.

Note that these are not mutually exclusive; both types of variability could potentially coexist. Also note that both cases lead to trial-to-trial variability, as all samples are assumed independent.

Both possibilities have their advantages. 
The first option is much more efficient in terms of the number of cells required, which is particularly important for representing high-dimensional distributions. In this case, dependencies between variables can naturally be represented as temporal correlations between neurons representing different parameters. This is not nearly as straightforward for case (b). On the other hand, in terms of processing speed, this latter option is clearly preferred to the former. Any decisions should optimally be based on the whole posterior distribution, and in case (a) this would require collecting samples over an extended period of time.

Figure 1: Variance of response versus mean response, on log-log axes, for 4 representative model neurons. Each dot gives the mean (horizontal axis) and variance (vertical axis) of the response of the model neuron in question to one particular stimulus. Note that the scale of responses is completely arbitrary.

We will now explain how both aspects of response variability described in the introduction can be understood in this framework. First, we will show how a simple mean-variance relationship can arise through sampling in the independent component analysis model. Then, we will consider how the variability associated with the phenomenon of visual competition can be interpreted using sampling.

3.1 Example 1: Posterior sampling in ICA

Here, we sample the posterior distribution in the ICA model of natural images, and show how this might relate to the conspicuous variance-mean relation of neural response variability. 
First, we used standard ICA methods [17] to estimate a complete basis A for the 40-dimensional principal subspace of natural image patches. Motivated by the non-negativity of neural firing rates we modified the model to assume single-sided exponential priors P(s_i) ∝ exp(-s_i) for s_i ≥ 0 [18], and augmented the basis so that a pair of neurons coded separately for the positive and negative parts of each original independent component. We then took 50 random natural image patches and sampled the posterior distribution P(s | x) for each patch, taking a total of 1000 samples in each case.2

From the 1000 collected samples, we calculated the mean and variance of the response of each neuron to each stimulus separately. We then plotted the variance against the mean independently for each neuron in log-log coordinates. Figure 1 shows the plots from 4 randomly selected neurons. The crucial thing to note is that, as for real neurons [1], the variance of the response is systematically related to the mean response, and does not seem to depend on the particular stimulus used to elicit a given mean response. This feature of neural variability is perhaps the single most important reason to believe that the variability is meaningless noise inherent in neural firing; yet we have shown that something like this might arise through sampling in a simple probabilistic model.

Following [1, 2], we fitted lines to the plots, modeling the variance as var = α · mean^β, and computed the mean and population standard deviation of α and β over the whole population (80 model neurons). Although the fitted values do not actually match those obtained from physiology (most reports give values of α between 1 and 2, and β close to 1, see [1, 2]), this is to be expected. First, the values of these parameters probably depend on the specifics of the ICA model, such as its dimensionality and the noise level; we did not optimize these to attempt to fit physiology. Second, and more importantly, we do not believe that ICA is an exact model of V1 function. Rather, the visual cortex would be expected to employ a much more complicated, hierarchical, image model. Thus, our main goal was not to show that the particular parameters of the variance-mean relation could be explained in this framework, but rather the surprising fact that such a simple relation might arise as a result of posterior sampling in a latent variable model.

2 This was accomplished using a Markov Chain Monte Carlo method, as described in the Appendix. However, the technical details of this method are not very relevant to this argument.

3.2 Example 2: Visual competition as sampling

As described in the introduction, in addition to the mean-variance relationship observed throughout the visual cortex, a second sort of variability is that observed in visual competition. This phenomenon arises when viewing a bistable figure, such as the famous Necker cube or Rubin's vase/face figure. These figures each have two interpretations (explanations) that cannot both reasonably explain the image simultaneously. In a latent variable image model, this corresponds to the case of a bimodal posterior distribution.

When such figures are viewed, perception oscillates between the two interpretations (for a review of this phenomenon, see [5]). This corresponds to jumping from mode to mode in the posterior distribution. 
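This mode-jumping behaviour is exactly what a Markov chain sampler does on a bimodal density. A minimal sketch (ours, with a made-up asymmetric bimodal posterior, not the model from the paper): a Metropolis sampler dwells in one mode, occasionally jumps to the other, and spends time in each roughly in proportion to its probability mass.

```python
import math
import random

random.seed(1)

# Toy bimodal 'posterior' over an interpretation variable s: two modes
# at -2 and +2, the first carrying more probability mass (weights made up).
def density(s):
    return (0.7 * math.exp(-((s + 2.0) ** 2) / 2.0)
            + 0.3 * math.exp(-((s - 2.0) ** 2) / 2.0))

# Metropolis sampling: propose a random step, accept it with probability
# proportional to the density ratio. The chain jumps between the modes.
s, samples = 0.0, []
for _ in range(20000):
    proposal = s + random.gauss(0.0, 1.0)
    if random.random() < density(proposal) / density(s):
        s = proposal
    samples.append(s)

# Fraction of time spent in the heavier mode (s < 0).
frac_dominant = sum(x < 0 for x in samples) / len(samples)
print(frac_dominant)
```

The fraction of time spent near the heavier mode tends toward that mode's probability mass (about 0.7 here), mirroring the observation that the more natural interpretation dominates for a proportionally longer period.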
This can directly be interpreted as sampling of the posterior. When the stimulus is modified so that one interpretation is slightly more natural than the other, the former is dominant for a relatively longer period compared with the latter (again, see [5]), just as proper sampling takes relatively more samples from the mode which has larger probability mass. Although the above might be considered purely 'perceptual' sampling, animal studies indicate that especially in higher-level visual areas many neurons modulate their responses in sync with the animal's perceptions [5, 19]. This link indicates that some form of sampling is taking place on the level of neural firing rates as well.

Note that this phenomenon might be considered as evidence for sampling scheme (a) and against (b). If we could instantaneously represent whole distributions, we should be able to keep both interpretations in mind simultaneously. This is in fact (weak) evidence against any scheme of representing whole distributions instantaneously, by the same logic.

4 Conclusions

One of the key unanswered questions in theoretical neuroscience seems to be: How are probabilities represented by the brain? In this paper, we have proposed that probability distributions might be represented using response variability. If true, this would also present a functional explanation for the significant amount of cortical neural 'noise' observed. Although it is clear that the variability degrades performance on many laboratory perceptual tasks, it might well be that it plays an important function in everyday sensory tasks. Our proposal would be one possible way in which it might do so.

Do actual neurons employ such a computational scheme? Although our arguments and simulations suggest that it might be possible (and should be kept in mind), future research will be needed to answer that question. 
As we see it, key experiments would compare\nmeasured \ufb01ring rate variability statistics (single unit variances, or perhaps two-unit covari-\nances) to those predicted by latent variable models. Of particular interest are cases where\ncontextual information reduces the uncertainty inherent in a given stimulus; our hypothesis\npredicts that in such cases neural variability is also reduced.\n\nA \ufb01nal question concerns how neurons might actually implement Monte Carlo sampling\nin practice. Because neurons cannot have global access to the activity of all other neurons\nin the population, the only possibility seems to be something akin to Gibbs sampling [20].\nSuch a scheme might require only relatively local information and could thus conceivably\nbe implemented in actual neural networks.\nAcknowledgements \u2014 Thanks to Paul Hoyer, Jarmo Hurri, Bruno Olshausen, Liam Panin-\nski, Phil Sallee, Eero Simoncelli, and Harri Valpola for discussions and comments.\n\n\fReferences\n\n[1] A. F. Dean. The variability of discharge of simple cells in the cat striate cortex. Experimental\n\nBrain Research, 44:437\u2013440, 1981.\n\n[2] D. J. Tolhurst, J. A. Movshon, and A. F. Dean. The statistical reliability of signals in single\n\nneurons in cat and monkey visual cortex. Vision Research, 23:775\u2013785, 1983.\n\n[3] A. J. Parker and W. T. Newsome. Sense and the single neuron: Probing the physiology of\n\nperception. Annual Review of Neuroscience, 21:227\u2013277, 1998.\n\n[4] G. R. Holt, W. R. Softky, C. Koch, and R. J. Douglas. Comparison of discharge variability\nin vitro and in vivo in cat visual cortex neurons. Journal of Neurophysiology, 75:1806\u20131814,\n1996.\n\n[5] R. Blake and N. K. Logothetis. Visual competition. Nature Reviews Neuroscience, 3:13\u201321,\n\n2002.\n\n[6] M. Rudolph and A. Destexhe. Do neocortical pyramidal neurons display stochastic resonance?\n\nJournal of Computational Neuroscience, 11:19\u201342, 2001.\n\n[7] J. S. Anderson, I. Lampl, D. 
C. Gillespie, and D. Ferster. The contribution of noise to contrast\n\ninvariance of orientation tuning in cat visual cortex. Science, 290:1968\u20131972, 2000.\n\n[8] D. C. Knill and W. Richards, editors. Perception as Bayesian Inference. Cambridge University\n\nPress, 1996.\n\n[9] R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki, editors. Probabilistic Models of the Brain.\n\nMIT Press, 2002.\n\n[10] D. Kersten and P. Schrater. Pattern inference theory: A probabilistic approach to vision. In\n\nR. Mausfeld and D. Heyer, editors, Perception and the Physical World. Wiley & Sons, 2002.\n\n[11] P. Dayan. Recognition in hierarchical models. In F. Cucker and M. Shub, editors, Foundations\n\nof Computational Mathematics. Springer, Berlin, Germany, 1997.\n\n[12] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy\n\nemployed by V1? Vision Research, 37:3311\u20133325, 1997.\n\n[13] R. P. N. Rao and D. H. Ballard. Predictive coding in the visual cortex: a functional interpretation\n\nof some extra-classical receptive \ufb01eld effects. Nature Neuroscience, 2(1):79\u201387, 1999.\n\n[14] A. J. Bell and T. J. Sejnowski. The \u2018independent components\u2019 of natural scenes are edge \ufb01lters.\n\nVision Research, 37:3327\u20133338, 1997.\n\n[15] R. S. Zemel, P. Dayan, and A. Pouget. Probabilistic interpretation of population codes. Neural\n\nComputation, 10(2):403\u2013430, 1998.\n\n[16] H. B. Barlow. Redundancy reduction revisited. Network: Computation in Neural Systems,\n\n12:241\u2013253, 2001.\n\n[17] A. Hyv\u00a8arinen. Fast and robust \ufb01xed-point algorithms for independent component analysis.\n\nIEEE Trans. on Neural Networks, 10(3):626\u2013634, 1999.\n\n[18] P. O. Hoyer. Modeling receptive \ufb01elds with non-negative sparse coding.\n\nIn E. De Schutter,\neditor, Computational Neuroscience: Trends in Research 2003. Elsevier, Amsterdam, 2003. In\npress.\n\n[19] N. K. Logothetis and J. D. Schall. 
Neuronal correlates of subjective visual perception. Science, 245:761–763, 1989.

[20] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

Appendix: MCMC sampling of the non-negative ICA posterior

The posterior probability of s, upon observing x, is given by

P(s | x) ∝ P(x | s) P(s) ∝ exp(-||x - As||^2 / (2 σ^2)) exp(-1^T s),  for s ≥ 0.    (3)

Taking the (natural) logarithm yields

log P(s | x) = const - ||x - As||^2 / (2 σ^2) - 1^T s,    (4)

where 1 is a vector of all ones. The crucial thing to note is that this function is quadratic in s. Thus, the posterior distribution has the form of a gaussian, except that of course it is only defined for non-negative s. Rejection sampling might look tempting, but unfortunately does not work well in high dimensions. Thus, we will instead opt for a Markov Chain Monte Carlo approach. Implementing Gibbs sampling [20] is quite straightforward. The posterior distribution of s_i, given x and all other hidden variables, is a one-dimensional density that we will call cut-gaussian,

p(s_i) ∝ exp(-(s_i - μ)^2 / (2 v))  if s_i ≥ 0,  and  p(s_i) = 0  if s_i < 0.    (5)

In this case, we have the following parameter values:

μ = (a_i^T (x - A s_-i) - σ^2) / ||a_i||^2,    v = σ^2 / ||a_i||^2,    (6)

where a_i denotes the i:th column of A, and s_-i denotes the current state vector but with s_i set to zero. Sampling from such a one-dimensional distribution is relatively simple. Just as one can sample the corresponding (uncut) gaussian by taking uniformly distributed samples on the interval (0, 1) and passing them through the inverse of the gaussian cumulative distribution function, the same can be done for a cut-gaussian distribution by constraining the uniform sampling interval suitably.

Hence Gibbs sampling is feasible, but, as is well known, Gibbs sampling exhibits problems when there are significant correlations between the sampled variables. Thus we choose to use a sampling scheme based on a rotated co-ordinate system. The basic idea is to update the state vector not in the directions of the component axes, as in standard Gibbs sampling, but rather in the directions of the eigenvectors of A^T A. Thus we start by calculating these eigenvectors, and cycle through them one at a time. Denoting the current unit-length eigenvector to be updated by u, we have, as a function of the step length λ,

log P(s + λu | x) = const - ||x - A(s + λu)||^2 / (2 σ^2) - 1^T (s + λu).    (7)

Again, note how this is a quadratic function of λ. Again, the non-negativity constraints on s require us to sample a cut-gaussian distribution. 
But this time there is an additional complication: When the basis is overcomplete, some of the eigenvectors will be associated with zero eigenvalues, and the logarithmic probability will be linear instead of quadratic. Thus, in such a case we must sample a cut-exponential distribution,

p(λ) ∝ exp(-βλ)  on the interval of step lengths allowed by the non-negativity constraints, and p(λ) = 0 outside it.    (8)

Like in the cut-gaussian case, this can be done by uniformly sampling the corresponding interval and then applying the inverse of the exponential cumulative distribution function.

In summary: We start by calculating the eigensystem of the matrix A^T A, and set the state vector s to random non-negative values. Then we cycle through the eigenvectors indefinitely, sampling λ from cut-gaussian or cut-exponential distributions depending on the eigenvalue corresponding to the current eigenvector u, and updating the state vector to s + λu. MATLAB code performing and verifying this sampling is available at:

http://www.cis.hut.fi/phoyer/code/samplingpack.tar.gz", "award": [], "sourceid": 2152, "authors": [{"given_name": "Patrik", "family_name": "Hoyer", "institution": null}, {"given_name": "Aapo", "family_name": "Hyv\u00e4rinen", "institution": null}]}