{"title": "No evidence for active sparsification in the visual cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 108, "page_last": 116, "abstract": "The proposal that cortical activity in the visual cortex is optimized for sparse neural activity is one of the most established ideas in computational neuroscience. However, direct experimental evidence for optimal sparse coding remains inconclusive, mostly due to the lack of reference values on which to judge the measured sparseness. Here we analyze neural responses to natural movies in the primary visual cortex of ferrets at different stages of development, and of rats while awake and under different levels of anesthesia. In contrast with prediction from a sparse coding model, our data shows that population and lifetime sparseness decrease with visual experience, and increase from the awake to anesthetized state. These results suggest that the representation in the primary visual cortex is not actively optimized to maximize sparseness.", "full_text": "No evidence for active sparsification in the visual cortex

Pietro Berkes, Benjamin L. White, and József Fiser

Volen Center for Complex Systems
Brandeis University, Waltham, MA 02454

Abstract

The proposal that cortical activity in the visual cortex is optimized for sparse neural activity is one of the most established ideas in computational neuroscience. However, direct experimental evidence for optimal sparse coding remains inconclusive, mostly due to the lack of reference values on which to judge the measured sparseness. Here we analyze neural responses to natural movies in the primary visual cortex of ferrets at different stages of development, and of rats while awake and under different levels of anesthesia. 
In contrast with prediction from a sparse coding model, our data shows that population and lifetime sparseness decrease with visual experience, and increase from the awake to anesthetized state. These results suggest that the representation in the primary visual cortex is not actively optimized to maximize sparseness.

1 Introduction

It is widely believed that one of the main principles underlying the functional organization of the early visual system is the reduction of the redundancy of the input relayed from the retina. Such a transformation would form an optimally efficient code, in the sense that the amount of information transmitted to higher visual areas would be maximal. Sparse coding refers to a possible implementation of this general principle, whereby each stimulus is encoded by a small subset of neurons. This would allow the visual system to transmit information efficiently and with a small number of spikes, improving the signal-to-noise ratio, reducing the energy cost of encoding, improving the detection of "suspicious coincidences", and increasing storage capacity in associative memories [1, 2]. Computational models that optimize the sparseness of the responses of hidden units to natural images have been shown to reproduce the basic features of the receptive fields (RFs) of simple cells in V1 [3, 4, 5]. Moreover, manipulation of the statistics of the environment of developing animals leads to changes in the RF structure that can be predicted by sparse coding models [6].

Unfortunately, attempts to verify this principle experimentally have so far remained inconclusive. Electrophysiological studies performed in primary visual cortex agree in reporting high sparseness values for neural activity [7, 8, 9, 10, 11, 12]. However, it is contested whether the high degree of sparseness is due to a neural representation which is optimally sparse, or is an epiphenomenon due to neural selectivity [10, 12]. 
This controversy is mostly due to a lack of reference measurements with which to judge the sparseness of the neural representation in relative, rather than absolute, terms. Another problem is that most of these studies have been performed on anesthetized animals [7, 9, 10, 11, 12], even though the effect of anesthesia might bias sparseness measurements (cf. Sec. 6).

In this paper, we report results from electrophysiological recordings from primary visual cortex (V1) of ferrets at various stages of development, from eye opening to adulthood, and of rats at different levels of anesthesia, from awake to deeply anesthetized, with the goal of testing the optimality of the neural code by studying changes in sparseness under different conditions. We compare this data with theoretical predictions: 1) sparseness should increase with visual experience, and thus with age, as the visual system adapts to the statistics of the visual environment; 2) sparseness should be maximal in the "working regime" of the animal, i.e., for alert animals, and decrease with deeper levels of anesthesia. In both cases, the neural data shows a trend opposite to the one expected in a sparse coding system, suggesting that the visual system is not actively optimizing the sparseness of its representation.

The paper is organized as follows: We first introduce and discuss the lifetime and population sparseness measures we will be using throughout the paper. Next, we present the classical, linear sparse coding model of natural images, and derive an equivalent, stochastic neural network whose output firing rates correspond to Monte Carlo samples from the posterior distribution of visual elements given an image. 
In the rest of the paper, we make use of this neural architecture in order to predict changes in sparseness over development and under anesthesia, and compare these predictions with electrophysiological recordings.

2 Lifetime and population sparseness

The diverse benefits of sparseness mentioned in the introduction rely on different aspects of the neural code, which are captured to a different extent by two sparseness measures, referred to as lifetime and population sparseness. Lifetime sparseness measures the distribution of the response of an individual cell to a set of stimuli, and is thus related to the cell's selectivity. This quantity characterizes the energy costs of coding with a set of neurons. On the other hand, the assessment of coding efficiency, as used by Treves and Rolls [13], is based upon the assumption that different stimuli activate small, distinct subsets of cells. These requirements of efficient coding concern the instantaneous population activity evoked by a stimulus, and thus need to take into account the population sparseness of the neural response. Average lifetime and population sparseness are identical if the units are statistically independent, in which case the distribution is called ergodic [10, 14]. In practice, neural dependencies (Fig. 3C) and residual dependencies in models [15] cause the two measures to differ.

Here we will use three measures of sparseness: two quantifying population sparseness, and one lifetime sparseness. 
To make a comparison with previous studies easier, we computed population and lifetime sparseness using a common measure introduced by Treves and Rolls [13] and perfected by Vinje and Gallant [8]:

TR = \left[ 1 - \frac{\left( \sum_{i=1}^{N} |r_i| / N \right)^2}{\sum_{i=1}^{N} r_i^2 / N} \right] \Bigg/ \left( 1 - \frac{1}{N} \right) ,   (1)

where r_i represents firing rates, and i indexes time in the case of lifetime sparseness, and neurons for population sparseness. TR is defined between zero (less sparse) and one (more sparse), and depends on the shape of the distribution. For monotonic, non-negative distributions, such as that of firing rates, an exponential decay corresponds to TR = 0.5, and values smaller and larger than 0.5 indicate distributions with lighter and heavier tails, respectively [14]. For population sparseness, we rescale the firing rate distributions by their standard deviation in time for the modelling results, and by \sqrt{\sum_{t=1}^{T} r_t^2 / T} for the experimental data, as firing rate is non-negative. Moreover, in neural recordings we discard bins with no neural activity, as population TR is undefined in this case. TR does not depend on multiplicative changes in firing rate, since it is invariant to rescaling the rates by a constant factor. However, it is not invariant to additive firing rate changes. This seems to be adequate for our purposes, as the arguments for sparseness involve metabolic costs and coding arguments, like redundancy reduction, that are sensitive to overall firing rates. 
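For concreteness, the TR measure of Eq. 1 can be applied along either axis of a rates matrix: along time it gives lifetime sparseness, along neurons it gives population sparseness. The following is a minimal sketch with toy data (not the paper's analysis code; the rescaling and empty-bin handling described above are simplified away):

```python
import numpy as np

def treves_rolls(r):
    """TR sparseness of Eq. 1 for a non-negative rate vector r.
    0 = maximally dense (all rates equal), 1 = maximally sparse (one active unit).
    Undefined for all-zero input, which the paper discards."""
    r = np.asarray(r, dtype=float)
    n = r.size
    a = (np.sum(np.abs(r)) / n) ** 2 / (np.sum(r ** 2) / n)
    return (1.0 - a) / (1.0 - 1.0 / n)

# toy rates matrix: neurons x time bins (hypothetical values, not recorded data)
rates = np.array([[0.0, 5.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0, 1.0],
                  [0.0, 0.0, 8.0, 0.0]])

lifetime = [treves_rolls(row) for row in rates]      # one value per neuron
population = [treves_rolls(col) for col in rates.T]  # one value per time bin
```

A constant response profile gives TR = 0, while a one-hot profile gives TR = 1, matching the bounds stated in the text.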
Previous studies have shown that alternative measures of population and lifetime sparseness are highly correlated, therefore our choice does not affect the final results [15, 10].

We also report a second measure of population sparseness known as activity sparseness (AS), which is a direct translation of the definition of sparse codes as having a small number of neurons active at any time [15]:

AS = 1 - n_t / N ,   (2)

where n_t is defined as the number of neurons with activity larger than a given threshold at time t, and N is the number of units. AS = 1 means that no neuron was active above the threshold, while AS = 0 means that all neurons were active. The threshold is set to be one standard deviation for the modeling results, or equivalently the upper 68th percentile of the distribution for neural firing rates. AS gives a very intuitive account of population sparseness, and is invariant to both multiplicative and additive changes in firing rate. However, since it discards most of the information about the shape of the distribution, it is a less sensitive measure than TR.

Figure 1: Generative weights of the sparse coding model at the beginning (A) and end (B) of learning.

3 Sparse coding model

The sparseness assumption that natural scenes can be described by a small number of elements is generally translated into a model with sparsely distributed hidden units x_k, representing visual elements, that combine linearly to form an image y [3]:

p(x_k) = p_{\text{sparse}}(x_k) \propto \exp(f(x_k)) , \quad k = 1, \ldots, K   (3)
p(y|x) = \text{Normal}(y; Gx, \sigma_y^2) ,   (4)

where K is the number of hidden units, G is the mixing matrix (also called the generative weights), and σ_y² is the variance of the input noise. 
Here we set the sparse prior distribution to a Student-t distribution with α degrees of freedom,

p(x_k) = \frac{1}{Z} \left( 1 + \frac{1}{\alpha} \left( \frac{x_k}{\lambda} \right)^2 \right)^{-\frac{\alpha + 1}{2}} ,   (5)

with λ chosen such that the distribution has unit variance. This is a common prior for sparse coding models [3], and its analytical form allows the development of efficient inference and learning algorithms [16, 17].

The goal of learning is to adapt the model's parameters in order to best explain the observed data, i.e., to maximize the marginal likelihood

\sum_t \log p(y_t | G) = \sum_t \log \int p(y_t | x, G) \, p(x) \, dx   (6)

with respect to G. We learn the weights using a Variational Expectation Maximization (VEM) algorithm, as described by Berkes et al. [17], with the difference that the generative weights are not treated as random variables, but as parameters with norm fixed to 1, in order to avoid potential confounds in subsequent analyses.

The model was applied to 9 × 9 pixel natural image patches, randomly chosen from 36 natural images from the van Hateren database, preprocessed as described in [5]. The dimensionality of the patches was reduced to 36 and the variances normalized by Principal Component Analysis. The model parameters were chosen to be K = 48 and α = 2.5, a very sparse, slightly overcomplete representation. These parameters are very close to the ones that were found to be optimal for natural images [17]. The input noise was fixed to σ_y² = 0.08. The generative weights were initialized at random, with norm 1. We performed 1500 iterations of the VEM algorithm, using a new batch of 3600 patches at each iteration. Fig. 1 shows the generative weights at the start and at the end of learning. 
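To make the unit-variance constraint on Eq. 5 concrete: a Student-t variable with α degrees of freedom and scale λ has variance λ²·α/(α − 2), so unit variance requires λ = √((α − 2)/α). A small Python sketch (illustrative only, not the authors' code; the sampler uses NumPy's standard Student-t generator):

```python
import math
import numpy as np

alpha = 2.5                                 # degrees of freedom used in the paper
lam = math.sqrt((alpha - 2.0) / alpha)      # scale lambda giving unit variance

def log_prior(x):
    """Log of the unnormalized Student-t prior of Eq. 5,
    i.e. f(x) = -(alpha + 1)/2 * log(1 + (x/lam)^2 / alpha)."""
    return -(alpha + 1.0) / 2.0 * np.log1p((x / lam) ** 2 / alpha)

# draw unit-variance samples: standard_t has scale 1, so rescale by lam
rng = np.random.default_rng(0)
x = lam * rng.standard_t(alpha, size=200_000)
```

The heavy tails of this prior (compared with a Gaussian of the same variance) are what push most hidden units toward zero, producing a sparse representation.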
As expected from previous studies [3, 5], after learning the basis vectors are shaped like Gabor wavelets and resemble simple cell RFs.

Figure 2: Neural implementation of Gibbs sampling in a sparse coding model. A) Neural network architecture. B) Mode of the activation probability of a neuron as a function of the total (feed-forward and recurrent) input, for a Student-t prior with α = 2.05 and unit variance.

4 Sampling, sparse coding neural network

In order to gain some intuition about the neural operations that may underlie inference in this model, we derive an equivalent neural network architecture. It has been suggested that neural activity is best interpreted as samples from the posterior probability of an internal, probabilistic model of the sensory input. This assumption is consistent with many experimental observations, including high trial-by-trial variability and spontaneous activity in awake animals [18, 19, 20]. Moreover, sampling can be performed in parallel and asynchronously, making it suitable for a neural architecture. Assuming that neural activity corresponds to Gibbs sampling from the posterior probability over visual elements in the sparse coding model, we obtain the following expression for the distribution of the firing rate of a neuron, given a visual stimulus and the current state of the other neurons representing the image [18]:

p(x_k | x_{i \neq k}, y) \propto p(y|x) \, p(x_k)   (7)
\propto \exp\left( -\frac{1}{2\sigma_y^2} \left( y^T y - 2 y^T G x - x^T R x \right) + f(x_k) \right) ,   (8)

where R = −G^T G. Expanding the exponent, eliminating the terms that do not depend on x_k, and noting that R_{kk} = −1, since the generative weights have unit norm, we get

p(x_k | x_{i \neq k}, y) \propto \exp\left( \frac{1}{\sigma_y^2} \Big( \sum_i G_{ik} y_i \Big) x_k + \frac{1}{\sigma_y^2} \Big( \sum_{j \neq k} R_{jk} x_j \Big) x_k - \frac{1}{2\sigma_y^2} x_k^2 + f(x_k) \right) .   (9)

Sampling in a sparse coding model can thus be achieved by a simple neural network, where the k-th neuron integrates visual information through feed-forward connections from the inputs y_i with weights G_{ik}/σ_y², and information from other neurons via recurrent connections R_{jk}/σ_y² (Fig. 2A). Neural activity is then generated stochastically according to Eq. 9: The exponential activation function gives higher probability to higher rates with increasing input to the neuron, while the terms depending on x_k² and f(x_k) penalize large firing rates. Fig. 2B shows the mode of the activation probability (Eq. 9) as a function of the total input to a neuron.

5 Active sparsification over learning

A simple, intuitive prediction for a system that optimizes for sparseness is that the sparseness of its representation should increase over learning. Since a sparse coding system, including our model, might not directly maximize our measures of sparseness, TR and AS, we verify this intuition by analyzing the model's representation of images at various stages of learning. We selected at random a new set of 1800 patches to be used as test stimuli. For every patch, we collected 50 Monte Carlo samples, using Gibbs sampling (Eq. 9) combined with an annealing scheme that starts by drawing samples from the model's prior distribution and continues to sample as the prior is deformed into the posterior [21]. 
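One way to realize the conditional of Eq. 9 computationally is to evaluate its unnormalized log-density on a grid of activity values and sample from the normalized result. The sketch below is a schematic stand-in with hypothetical toy weights, not the authors' sampler or their annealing schedule:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2_y = 0.08                              # input noise variance, as in the model
alpha = 2.5
lam = np.sqrt((alpha - 2.0) / alpha)         # unit-variance Student-t scale

def f(x):
    """Log Student-t prior (Eq. 5), up to an additive constant."""
    return -(alpha + 1.0) / 2.0 * np.log1p((x / lam) ** 2 / alpha)

def gibbs_step(k, x, y, G, R, grid):
    """Resample unit k from p(x_k | x_{i!=k}, y) of Eq. 9, discretized on `grid`."""
    ff = (G[:, k] @ y) / sigma2_y                        # feed-forward drive
    rec = (R[:, k] @ x - R[k, k] * x[k]) / sigma2_y      # recurrent drive, j != k
    logp = (ff + rec) * grid - grid ** 2 / (2.0 * sigma2_y) + f(grid)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    x[k] = rng.choice(grid, p=p)
    return x

# toy problem: 2 inputs, 2 units with unit-norm generative weights (made-up values)
G = np.array([[0.8, 0.6],
              [0.6, -0.8]])
R = -G.T @ G                                 # recurrent weights; R_kk = -1
y = G @ np.array([1.5, 0.0])                 # image generated by a sparse cause
x = np.zeros(2)
grid = np.linspace(-4.0, 4.0, 401)
for sweep in range(50):
    for k in range(2):
        x = gibbs_step(k, x, y, G, R, grid)
```

Note how the update combines exactly the three ingredients named in the text: feed-forward input through G, recurrent input through R, and the self-inhibitory x² and prior terms.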
This procedure ensures that the final samples come from the whole posterior distribution, which is highly multimodal in overcomplete models, and therefore that our analysis is not biased by the posterior distribution becoming more (or less) complex over learning. Fig. 3A shows the evolution of sparseness with learning. As anticipated, both population and lifetime sparseness increase monotonically.

Figure 3: Development of sparseness, (A) over learning for the sparse coding model of natural images and (B) over age for neural responses in ferrets. (A) The lines indicate the average sparseness over units and samples. Error bars are one standard deviation over samples. Since the three measures have very different values, we report the change in sparseness in percent of the first iteration. Colored text: absolute values of sparseness at the end of learning. (B) The lines indicate the average sparseness for different animals. Error bars represent standard error of the mean (SEM). (C) KL divergence between the distribution of neural responses and the factorized distribution of neural responses. Error bars are SEM.

Having confirmed our intuition with the sparse coding model, we turn to data from electrophysiological recordings. We analyzed multi-unit recordings from arrays of 16 electrodes implanted in the primary visual cortex of 15 ferrets at various stages of development, from eye opening at postnatal day 29 or 30 (P29-30) to adulthood at P151 (see Suppl Mat for experimental details). Over this maturation period, the visual system of ferrets adapts to the statistics of the environment [22, 23]. For each animal, neural activity was recorded and collected in 10 ms bins for 15 sessions of 100 seconds each (for a total of 25 minutes), during which the animal was shown scenes from a movie. We find that all three measures of sparseness decrease significantly with age¹. 
Thus, during a period when the cortex actively adapts to the visual environment, the representation in primary visual cortex becomes less sparse, suggesting that the optimization of sparseness is not a primary objective for learning in the visual system. The decrease in population sparseness seems to be due to an increase in the dependencies between neurons: Fig. 3C shows the Kullback-Leibler divergence between the joint distribution P of neural activity in 2 ms bins and the same distribution, factorized to eliminate neural dependencies, i.e., \tilde{P}(r_1, \ldots, r_N) := \prod_{i=1}^{N} P(r_i). The KL divergence increases with age (Spearman's ρ = 0.73, P < 0.01), indicating an increase in neural dependencies.

¹Lifetime sparseness, TR: effect of age is significant, Spearman's ρ = −0.65, P < 0.01; differences in mean between the four age groups in Fig. 3 are significant, ANOVA, P = 0.02; multiple comparison tests with Tukey-Kramer correction show the mean of group P29-30 is different from that of groups P83-92 and P129-151 with P < 0.05. Population sparseness, TR: Spearman's ρ = −0.75, P < 0.01; ANOVA P < 0.01; multiple comparison shows the mean of group P29-30 is different from that of group P129-151 with P < 0.05. Activity sparseness, AS: Spearman's ρ = −0.79, P < 0.01; ANOVA P < 0.01; multiple comparison shows the mean of group P29-30 is different from that of groups P83-92 and P129-151 with P < 0.05.

6 Active sparsification and anesthesia

The sparse coding neural network architecture of Fig. 2 makes explicit that an optimal sparse coding representation requires a process of active sparsification: In general, because of input noise and the overcompleteness of the representation, there are multiple possible combinations of visual elements that could account for a given image. To select among these combinations the most sparse solution, a competition between possible alternative interpretations must occur.

Consider for example a simple system with one input variable and two hidden units, such that y = x_1 + 1.3 · x_2 + ε, with Gaussian noise ε. Given an observed value, y, there are infinitely many solutions to this equality, as shown by the dotted line in Fig. 4B for y = 2. These stimulus-induced correlations in the posterior are known as explaining away. Among all the solutions, the ones compatible with the sparse prior over x_1 and x_2 are given higher probability, giving rise to a bimodal distribution centered around the two sparse solutions x_1 = 0, x_2 = 1.54, and x_1 = 2, x_2 = 0. From Eq. 9, it is clear that the recurrent connections are necessary in order to keep the activity of the neurons on the solution line, while the stochastic activation function makes sparse neural responses more likely.

Figure 4: Active sparsification. Contour lines correspond to the 5, 25, 50, 75, 90, and 95 percentiles of the distributions. A) Prior probability. B) Posterior probability given observed value y = 2. The dotted line indicates all solutions to 2 = x_1 + 1.3 · x_2. C) Posterior probability with weakened recurrent weights (α = 0.5).

Figure 5: Active sparsification and anesthesia. A) Percent change in sparseness as the recurrent connections are weakened for various values of α. Error bars are one standard deviation over samples. Colored text: absolute values of sparseness at the end of learning. B) Average sparseness measures for V1 responses at various levels of anesthesia. Error bars are SEM.

This active sparsification process is stronger for overcomplete representations, for when the generative weights are non-orthogonal (in which case |R_ij| ≫ 0), and for when the input noise is large, which makes the contribution from the prior more important.

In a system that optimizes sparseness, disrupting the active sparsification process will lead to lower lifetime and population sparseness. For example, if we reduce the strength of the recurrent connections in the neural network architecture (Eq. 9) by a factor α,

p(x_k | x_{i \neq k}, y) \propto \exp\left( \frac{1}{\sigma_y^2} \Big( \sum_i G_{ik} y_i \Big) x_k + \alpha \frac{1}{\sigma_y^2} \Big( \sum_{j \neq k} R_{jk} x_j \Big) x_k - \frac{1}{2\sigma_y^2} x_k^2 + f(x_k) \right) ,   (10)

the neurons become more decoupled, and try to separately account for the input, as illustrated in Fig. 4C. The decoupling will result in a reduction of population sparseness, as multiple neurons become active to explain the same input. Also, lifetime sparseness will decrease, as the lack of competition between units means that individual units will be active more often.

Fig. 5 shows the effect of reducing the strength of recurrent connections in the model of natural images. We analyzed the parameters of the sparse coding model at the end of learning, and substituted the Gibbs sampling posterior distribution of Eq. 9 with the one in Eq. 10 for various values of α. As predicted, decreasing α leads to a decrease in all sparseness measures.

We argue that a similar disruption of the active sparsification process can be obtained in electrophysiological experiments by comparing neural responses at different levels of isoflurane anesthesia. In general, the evoked, feed-forward responses of V1 neurons under anesthesia are thought to remain largely intact: Despite a decrease in average firing rate, the selectivity of neurons to orientation, frequency, and direction of motion has been shown to be very similar in awake and anesthetized animals [24, 25, 26]. On the other hand, anesthesia disrupts contextual effects like figure-ground modulation [26] and pattern motion [27], which are known to be mediated by top-down and recurrent connections. Other studies have shown that, at low concentrations, isoflurane anesthesia leaves the visual input to the cortex mostly intact, while the intracortical recurrent and top-down signals are disrupted [28, 29]. Thus, if the representation in the visual cortex is optimally sparse, disrupting the active sparsification by anesthesia should decrease sparseness.

We analyzed multi-unit neural activity from bundles of 16 electrodes implanted in primary visual cortex of 3 adult Long-Evans rats (5-11 units per recording session, for a total of 39 units). Recordings were made in the awake state and under four levels of anesthesia, from very light to deep (corresponding to concentrations of isoflurane between 0.6 and 2.0%) (see Suppl Mat for experimental details). In order to confirm that the effect of the anesthetic does not prevent visual information from reaching the cortex, we presented the animals with a full-field periodic stimulus (flashing) at 3.75 Hz for 2 min in the awake state, and 3 min under anesthesia. The Fourier spectrum of the spike trains on individual channels shows sharp peaks at the stimulation frequency in all states.

Figure 6: Neuronal response to a 3.75 Hz full-field stimulation under different levels of anesthesia. Error bars are SEM. A) Signal and noise amplitudes. B) Signal-to-noise ratio. 
We measured the response to the signal by the average amplitude of the Fourier spectrum between 3.7 and 3.8 Hz, and defined the amplitude of the noise, due to spontaneous activity and neural variability, as the average amplitude between 1 and 3.65 Hz (the amplitudes in this band are found to be noisy but uniform). The amplitude of the evoked signal decreases with increasing isoflurane concentration, due to a decrease in overall firing rate; however, the background noise is also suppressed with anesthesia, so that overall the signal-to-noise ratio does not decrease significantly with anesthesia (Fig. 6, ANOVA, P = 0.46).

We recorded neural responses while the rats were shown a two-minute movie recorded from a camera mounted on the head of a person walking in the woods. Neural activity was collected in 25 ms bins. All three sparseness measures increase significantly with increasing concentration of isoflurane² (Fig. 5B). Contrary to what is expected in a sparse-coding system, the data suggests that the contribution of lateral and top-down connections in the awake state leads to a less sparse code.

7 Discussion

We examined multi-electrode recordings from primary visual cortex of ferrets over development, and of rats at different levels of anesthesia. We found that, contrary to predictions based on theoretical considerations regarding optimal sparse coding systems, sparseness decreases with visual experience, and increases with increasing concentration of anesthetic. 
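The band-based signal and noise estimates described above can be illustrated on a synthetic Poisson spike train modulated at the stimulation frequency (all parameters below are illustrative choices, not taken from the recordings):

```python
import numpy as np

fs = 100.0                       # assumed sampling rate of binned spike counts (Hz)
dur = 120.0                      # 2 minutes, as in the awake condition
t = np.arange(0.0, dur, 1.0 / fs)
rng = np.random.default_rng(0)

# toy spike counts: Poisson rate modulated at the 3.75 Hz stimulation frequency
rate = 5.0 + 4.0 * np.cos(2 * np.pi * 3.75 * t)      # spikes/s
counts = rng.poisson(rate / fs)

amp = np.abs(np.fft.rfft(counts - counts.mean()))
freqs = np.fft.rfftfreq(counts.size, d=1.0 / fs)

signal = amp[(freqs >= 3.7) & (freqs <= 3.8)].mean()   # stimulation band
noise = amp[(freqs >= 1.0) & (freqs <= 3.65)].mean()   # background band
snr = signal / noise
```

With a modulated input, the spectrum peaks sharply at 3.75 Hz and the band-averaged signal exceeds the background estimate, mirroring the measurement procedure in the text.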
These data suggest that the high sparseness levels that have been reported in previous accounts of sparseness in the visual cortex [7, 8, 9, 10, 11, 12], and which are otherwise consistent with our measurements (Fig. 3B, 5), are most likely a side effect of the high selectivity of neurons, or an overestimation due to the effect of anesthesia (Fig. 5; with the exception of [8], where sparseness was measured on awake animals), but do not indicate an active optimization of sparse responses (cf. [10]).

²Lifetime sparseness, TR: ANOVA with different anesthesia groups, P < 0.01; multiple comparison tests with Tukey-Kramer correction show the mean of the awake group is different from the mean of all other groups with P < 0.05. Population sparseness, TR: ANOVA, P < 0.01; multiple comparison shows the mean of the awake group is different from that of the light, medium, and deep anesthesia groups, P < 0.05. Activity sparseness, AS: ANOVA P < 0.01; multiple comparison shows the mean of the awake group is different from that of the light, medium, and deep anesthesia groups, P < 0.05.

Our measurements of sparseness from neural data are based on multi-unit recordings. By collecting spikes from multiple cells, we are in fact reporting a lower bound of the true sparseness values. While a precise measurement of the absolute value of these quantities would require single-unit measurements, our conclusions are based on relative comparisons of sparseness under different conditions, and are thus not affected.

Our theoretical predictions were verified with a common sparse coding model [3]. The model assumes linear summation in the generative process, and a particular sparse prior over the hidden units. Despite these specific choices, we expect the model results to be general to the entire class of sparse coding models. 
In particular, the choice of comparing neural responses with Monte Carlo samples from the model's posterior distribution was made in agreement with experimental results that report high neural variability. Alternatively, one could assume a deterministic neural architecture, with a network dynamics that would drive the activity of the units to values that maximize the image probability [3, 30, 31]. In this scenario, neural activity would converge to one of the modes of the distributions in Fig. 4, leading us to the same conclusions regarding the evolution of sparseness.

Although our analysis found no evidence for active sparsification in the primary visual cortex, ideas derived from and closely related to the sparse coding principle are likely to remain important for our understanding of visual processing. Efficient coding remains a most plausible functional account of coding in more peripheral parts of the sensory pathway, and particularly in the retina, from where raw visual input has to be sent through the bottleneck formed by the optic nerve without significant loss of information [32, 33]. Moreover, computational models of natural images are being extended from being strictly related to energy constraints and information transmission, to the more general view of density estimation in probabilistic, generative models [34, 35]. This view is compatible with our finding that the representation in the visual cortex becomes more dependent with age, and is less sparse in the awake condition than under anesthesia: We speculate that such dependencies reflect inference in a hierarchical generative model, where signals from lateral, recurrent connections in V1 and from feedback projections from higher areas are integrated with incoming evidence, in order to solve ambiguities at the level of basic image features using information from a global interpretation of the image [26, 19, 27, 20].

References

[1] D.J. Field. 
What is the goal of sensory coding? Neural Computation, 6(4):559–601, 1994.
[2] B.A. Olshausen and D.J. Field. Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14(4):481–487, 2004.
[3] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996.
[4] A.J. Bell and T.J. Sejnowski. The 'independent components' of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.
[5] J.H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B, 265:359–366, 1998.
[6] A.S. Hsu and P. Dayan. An unsupervised learning model of neural plasticity: Orientation selectivity in goggle-reared kittens. Vision Research, 47(22):2868–2877, 2007.
[7] R. Baddeley, L.F. Abbott, M.C.A. Booth, F. Sengpiel, T. Freeman, E. Wakeman, and E.T. Rolls. Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society B: Biological Sciences, 264(1389):1775–1783, 1997.
[8] W.E. Vinje and J.L. Gallant. Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287(5456):1273–1276, 2000.
[9] M. Weliky, J. Fiser, R.H. Hunt, and D.N. Wagner. Coding of natural scenes in primary visual cortex. Neuron, 37(4):703–718, 2003.
[10] S.R. Lehky, T.J. Sejnowski, and R. Desimone. Selectivity and sparseness in the responses of striate complex cells. Vision Research, 45(1):57–73, 2005.
[11] S.C. Yen, J. Baker, and C.M. Gray. Heterogeneity in the responses of adjacent neurons to natural stimuli in cat striate cortex. Journal of Neurophysiology, 97(2):1326–1341, 2007.
[12] D.J. Tolhurst, D. Smyth, and I.D. Thompson. 
The sparseness of neuronal responses in ferret primary visual cortex. Journal of Neuroscience, 29(9):2355–2370, 2009.

[13] A. Treves and E.T. Rolls. What determines the capacity of autoassociative memories in the brain? Network: Computation in Neural Systems, 2(4):371–397, 1991.

[14] P. Földiák and D. Endres. Sparse coding. Scholarpedia, 3(1):2984, 2008.

[15] B. Willmore and D.J. Tolhurst. Characterizing the sparseness of neural codes. Network: Computation in Neural Systems, 12:255–270, 2001.

[16] S. Osindero, M. Welling, and G.E. Hinton. Topographic product models applied to natural scene statistics. Neural Computation, 18(2):381–414, 2006.

[17] P. Berkes, R. Turner, and M. Sahani. On sparsity and overcompleteness in image models. In Advances in Neural Information Processing Systems, volume 20. MIT Press, Cambridge, MA, 2008.

[18] P.O. Hoyer and A. Hyvärinen. Interpreting neural response variability as Monte Carlo sampling of the posterior. In Advances in Neural Information Processing Systems, volume 15. MIT Press, Cambridge, MA, 2003.

[19] T.S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20(7):1434–1448, 2003.

[20] P. Berkes, G. Orbán, M. Lengyel, and J. Fiser. Matching spontaneous and evoked activity in V1: a hallmark of probabilistic inference. Frontiers in Systems Neuroscience, 2009. Conference abstract: Computational and Systems Neuroscience.

[21] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220:671–680, 1983.

[22] B. Chapman and M.P. Stryker. Development of orientation selectivity in ferret visual cortex and effects of deprivation. Journal of Neuroscience, 13:5251–5262, 1993.

[23] L.E. White, D.M. Coppola, and D. Fitzpatrick. The contribution of sensory experience to the maturation of orientation selectivity in ferret visual cortex.
Nature, 411:1049–1052, 2001.

[24] P.H. Schiller, B.L. Finlay, and S.F. Volman. Quantitative studies of single-cell properties in monkey striate cortex. I. Spatiotemporal organization of receptive fields. Journal of Neurophysiology, 39(6):1288–1319, 1976.

[25] D.M. Snodderly and M. Gur. Organization of striate cortex of alert, trained monkeys (Macaca fascicularis): ongoing activity, stimulus selectivity, and widths of receptive field activating regions. Journal of Neurophysiology, 74(5):2100–2125, 1995.

[26] V.A.F. Lamme, K. Zipser, and H. Spekreijse. Figure-ground activity in primary visual cortex is suppressed by anesthesia. PNAS, 95:3263–3268, 1998.

[27] K. Guo, P.J. Benson, and C. Blakemore. Pattern motion is present in V1 of awake but not anaesthetized monkeys. European Journal of Neuroscience, 19:1055–1066, 2004.

[28] O. Detsch, C. Vahle-Hinz, E. Kochs, M. Siemers, and B. Bromm. Isoflurane induces dose-dependent changes of thalamic somatosensory information transfer. Brain Research, 829:77–89, 1999.

[29] H. Hentschke, C. Schwarz, and B. Antkowiak. Neocortex is the major target of sedative concentrations of volatile anaesthetics: strong depression of firing rates and increase of GABA-A receptor-mediated inhibition. European Journal of Neuroscience, 21(1):93–102, 2005.

[30] P. Dayan and L.F. Abbott. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press, 2001.

[31] C.J. Rozell, D.H. Johnson, R.G. Baraniuk, and B.A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 20:2526–2563, 2008.

[32] J.J. Atick. Could information theory provide an ecological theory of sensory processing? Network: Computation in Neural Systems, 3(2):213–251, 1992.

[33] V. Balasubramanian and M.J. Berry. Evidence for metabolically efficient codes in the retina.
Network: Computation in Neural Systems, 13(4):531–553, 2002.

[34] Y. Karklin and M.S. Lewicki. A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals. Neural Computation, 17(2):397–423, 2005.

[35] M.J. Wainwright and E.P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, 2000.