{"title": "A mechanistic model of early sensory processing based on subtracting sparse representations", "book": "Advances in Neural Information Processing Systems", "page_first": 1979, "page_last": 1987, "abstract": "Early stages of sensory systems face the challenge of compressing information from numerous receptors onto a much smaller number of projection neurons, a so called communication bottleneck. To make more efficient use of limited bandwidth, compression may be achieved using predictive coding, whereby predictable, or redundant, components of the stimulus are removed. In the case of the retina, Srinivasan et al. (1982) suggested that feedforward inhibitory connections subtracting a linear prediction generated from nearby receptors implement such compression, resulting in biphasic center-surround receptive fields. However, feedback inhibitory circuits are common in early sensory circuits and furthermore their dynamics may be nonlinear. Can such circuits implement predictive coding as well? Here, solving the transient dynamics of nonlinear reciprocal feedback circuits through analogy to a signal-processing algorithm called linearized Bregman iteration we show that nonlinear predictive coding can be implemented in an inhibitory feedback circuit. In response to a step stimulus, interneuron activity in time constructs progressively less sparse but more accurate representations of the stimulus, a temporally evolving prediction. This analysis provides a powerful theoretical framework to interpret and understand the dynamics of early sensory processing in a variety of physiological experiments and yields novel predictions regarding the relation between activity and stimulus statistics.", "full_text": "A mechanistic model of early sensory \nprocessing based on subtracting sparse \n\nrepresentations \n\n * - Equal contribution \n\n Shaul Druckmann* Tao Hu* Dmitri B. 
Chklovskii \n \n Janelia Farm Research Campus \n {druckmanns, hut, mitya}@janelia.hhmi.org \n\nAbstract \n\nEarly stages of sensory systems face the challenge of compressing \ninformation from numerous receptors onto a much smaller number of \nprojection neurons, a so called communication bottleneck. To make more \nefficient use of limited bandwidth, compression may be achieved using \npredictive coding, whereby predictable, or redundant, components of the \nstimulus are removed. In the case of the retina, Srinivasan et al. (1982) \nsuggested that feedforward inhibitory connections subtracting a linear \nprediction generated from nearby receptors implement such compression, \nresulting in biphasic center-surround receptive fields. However, feedback \ninhibitory circuits are common in early sensory circuits and furthermore \ntheir dynamics may be nonlinear. Can such circuits implement predictive \ncoding as well? Here, solving the transient dynamics of nonlinear reciprocal \nfeedback circuits through analogy to a signal-processing algorithm called \nlinearized Bregman iteration we show that nonlinear predictive coding can \nbe implemented in an inhibitory feedback circuit. In response to a step \nstimulus, interneuron activity in time constructs progressively less sparse \nbut more accurate representations of the stimulus, a temporally evolving \nprediction. This analysis provides a powerful theoretical framework to \ninterpret and understand the dynamics of early sensory processing in a \nvariety of physiological experiments and yields novel predictions regarding \nthe relation between activity and stimulus statistics. \n \n\n1 Introduction \n\nReceptor neurons in early sensory systems are more numerous than the projection \nneurons that transmit sensory information to higher brain areas, implying that sensory \nsignals must be compressed to pass through a limited bandwidth channel known as \n\u201cBarlow\u2019s bottleneck\u201d [1]. 
Since natural signals arise from physical objects, which are contiguous in space and time, they are highly spatially and temporally correlated [2-4]. Such signals are ideally suited for predictive coding, a compression strategy borrowed from engineering whereby redundant, or predictable, components of the signal are subtracted and only the residual is transmitted [5].

Consider, for example, the processing of natural images in the retina. Instead of transmitting photoreceptor signals, which are highly correlated in space and time, ganglion cells can transmit differences in signal between nearby pixels or consecutive time points. The seminal work of Srinivasan et al. introduced predictive coding to neuroscience, proposing that feedforward inhibition could implement predictive coding by subtracting a prediction for the activity of a given photoreceptor generated from the activity of nearby receptors [6]. Indeed, the well-known center-surround spatial receptive fields or biphasic temporal receptive fields of ganglion cells [7] may be viewed as evidence of predictive coding because they effectively code such differences [6, 8-10]. Although the Srinivasan et al. model captured the essence of predictive coding, it does not reflect two important biological facts. First, in the retina, and other early sensory systems, inhibition has a significant feedback component [11-13]. Second, interneuron transfer functions are often non-linear [14-16].

Here, we demonstrate that feedback circuits can be viewed as implementing predictive coding. Surprisingly, by taking advantage of recent developments in applied mathematics and signal processing, we are able to solve the non-linear recurrent dynamics of such a circuit, for an arbitrary number of sensory channels and interneurons, allowing us to address in detail the circuit dynamics and consequently the temporal and stimulus dependencies. 
Moreover, introducing nonlinear feedback dramatically changes the nature of predictions. Instead of a static relation between stimulus and prediction, we find that the prediction becomes both stimulus and time dependent.

2 Model

2.1 Dynamics of the linear single-channel feedback circuit

We start by considering predictive coding in feedback circuits, where principal neurons are reciprocally connected with inhibitory interneurons, forming a negative feedback loop. Much of the intuition can be developed from linear circuits, and we start from this point. Consider a negative feedback circuit composed of a single principal neuron, p, and a single interneuron, n (Fig. 1a). Assuming that both types of neurons are linear first-order elements, their dynamics are given by:

C_{m,p} dp/dt = −g_{m,p} p(t) + g_{s,p} (s(t) − w n(t)),
C_{m,n} dn/dt = −g_{m,n} n(t) + g_{s,n} w p(t),   (1)

where g_m is the membrane conductance (inverse of membrane resistance), C_m the membrane capacitance, g_s the synaptic conductance, the subscript designates the neuron class (principal or interneuron), and w in the second equation is the weight of the synapse from the principal neuron to the interneuron. For simplicity, we assumed that the weight of the synapse from the interneuron to the principal neuron is the same in magnitude but with negative sign, −w. Although we do not necessarily expect the brain to fully reconstruct the stimulus on the receiving side, we must still ensure that the transmitted signal is decodable. To guarantee that this is the case, the prediction made by the interneuron must be strictly causal. In other words, there must be a delay between the input to the interneuron, wp(t), and the output of the interneuron, n(t+δ). Given that feedback requires signals passing through a synapse, such a delay is biologically plausible. When discussing analytical solutions below, we assume that δ→0 to avoid clutter and do not explicitly indicate the time dependence of the vectors p, s, and n. By rearranging the terms in Eq. 1 we obtain:

τ_p dp/dt = −p + (g_{s,p}/g_{m,p}) (s − wn),
τ_n dn/dt = −n + (g_{s,n}/g_{m,n}) wp,   (2)

where τ = RC is the membrane time constant. Since principal neurons should be able to transmit fast changes in the stimuli, we assume that the time constant of the principal cells is small compared to that of the interneurons. Therefore, we can assume that the first equation reaches equilibrium instantaneously:

p = α(s − wn),
τ_n dn/dt = −n + (g_{s,n}/g_{m,n}) wp,   (3)

where we defined α = g_{s,p}/g_{m,p}. As the purpose of interneuron integration will be to construct a stimulus representation, the integration time should be on the order of the auto-correlation time of the stimulus. Since here we study the simplified case of the semi-infinite step stimulus, the time constant of the interneuron should approach infinity. We assume this occurs by the interneurons having a very large membrane resistance (or, correspondingly, a very small conductance) and moderate capacitance. Therefore, the leakage term, −n, which is the only term in the second line of Eq. 3 that doesn't grow with the membrane resistance, can be neglected in the dynamics of interneurons. By this assumption, and substituting the first equation into the second, we find:

p = α(s − wn),
τ_n dn/dt = (g_{s,n}/g_{m,n}) α w (s − wn).   (4)

Defining the effective time constant τ = τ_n g_{m,n}/(α g_{s,n}), and absorbing the gain α into the units of the transmitted signal p, we have:

p = s − wn,
τ dn/dt = wp.   (5)

In response to a step stimulus, s(t) = θ(t) s₀, where θ(t) is the Heaviside function, the dynamics of Equation 5 are straightforward to solve, yielding:

n(t) = (s₀/w) θ(t) (1 − exp(−w²t/τ)),
p(t) = s₀ θ(t) exp(−w²t/τ).   (6)

The interneuron's activity, n(t), grows with time as it integrates the output of the principal neuron, p(t), Fig. 1a. In turn, the principal neuron's output, p(t), is the difference between the incoming stimulus and the interneuron's activity, n(t), i.e. a residual, which decays with time from the onset of the stimulus. In the limit considered here (infinite interneuron time constant), the interneuron's feedback will approach the incoming stimulus and the residual will decay to zero. To summarize, one can view the interneuron's activity as a series of progressively more accurate predictions of the stimulus. The principal neuron subtracts these predictions and sends the series of residuals to higher brain areas, a more efficient approach than direct transmission (Fig. 1a).

[Figure 1 schematic: a. negative feedback circuit (sensory input s → principal neuron p, reciprocally coupled to interneuron n through weights w and −w; transmission channel; decoding); b. direct transmission.]

Figure 1. Schematic view of early processing in a single sensory channel in response to a step stimulus. a. A predictive coding model consists of a coding circuit, a transmission channel and, for theoretical analysis only, a virtual decoding circuit. Coding is performed in a negative feedback circuit containing a principal neuron, p, and an inhibitory interneuron, n. 
In response \nto a step-stimulus (top left) the interneuron charges up with time (top right) till it reaches the \nvalue of the stimulus. Principal neuron (middle left) transmits the difference between the \ninterneuron activity and the stimulus, resulting in a transient signal. b. Direct transmission. \n\nThe transient response to a step stimulus (Fig. 1a left) is consistent with electrophysiological \nmeasurements from principal neurons in invertebrate and vertebrate retina [10, 17]. For example, \nin flies, cells post-synaptic to photoreceptors (the LMCs) have graded potential response \nconsistent with Equation 5. In the vertebrate retina, most recordings are performed on ganglion \ncells, which read out signals from bipolar cells. In response to a step-stimulus the firing rate of \nganglion cells is consistent with Equation 6 [17]. \n2.2 Dynamics of the linear multi-channel feedback circuit \n\nIn most sensory systems, stimuli are transmitted along multiple parallel sensory channels, \nsuch as mitral cells in the olfactory bulb, or bipolar cells in the retina. Although a circuit could \nimplement predictive coding by replicating the negative feedback loop in each channel, this \n\nCoding\nw\n\nInterneuron, n \n\n\u03b4\n\u03b1\n\nPrincipal neuron, p \n\np\n\nnnn\n\nw\nTransmission channel\nw\n\np\np\n\nnnnn\nn\n\nw\n\nDecoding\n\nTime\n\nOutput\n\nTime\n\np\n\np\np\n\nPrincipal neuron, p \n\nTransmission channel\n\nOutput\n\nTime\n\n \n\n\fsolution is likely suboptimal due to the contiguous nature of objects in space, which often results \nin stimuli correlated across different channels. Therefore, interneurons that combine inputs across \nchannels may generate an accurate prediction more rapidly. 
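The single-channel dynamics of Eqs. 5-6 can be checked with a few lines of numerical integration; the parameter values below are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

# Forward-Euler sketch of the single-channel feedback loop (Eqs. 5-6):
#   p = s - w*n          (fast principal neuron, taken at equilibrium)
#   tau * dn/dt = w * p  (interneuron integrates the residual)
# The values of w, tau, s0, dt are arbitrary illustrative choices.
w, tau, s0 = 0.8, 10.0, 1.0
dt, T = 0.01, 100.0

n = 0.0
for _ in range(int(T / dt)):
    p = s0 - w * n           # residual transmitted to higher areas
    n += dt * w * p / tau    # interneuron charges up toward s0/w

# Analytic solution (Eq. 6): n(t) = (s0/w) * (1 - exp(-w**2 * t / tau)),
# so the residual p(t) = s0 * exp(-w**2 * t / tau) decays to zero.
n_exact = (s0 / w) * (1.0 - np.exp(-(w ** 2) * T / tau))
assert abs(n - n_exact) < 1e-3
assert abs(s0 - w * n) < 1e-2   # residual has nearly vanished
```

As in the text, the interneuron's value converges to s₀/w, at which point its feedback exactly cancels the stimulus.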
The dynamics of a multichannel linear negative feedback circuit are given by:

p = s − Wn,
τ dn/dt = Wᵀp,   (7)

where boldface lowercase letters are column vectors representing the stimulus, s = (s₁, s₂, s₃, …)ᵀ, and the activity of principal neurons, p, and interneurons, n, Fig. 2a. Boldface uppercase letters designate synaptic weight matrices. Synaptic weights from principal neurons to interneurons are Wᵀ, and synaptic weights from interneurons to principal neurons are, for simplicity, symmetric but with the negative sign, −W. Such symmetry was suggested for the olfactory bulb, considering dendro-dendritic synapses [18]. Each column of W contains the weights of synapses from correlated principal neurons to a given interneuron, thus defining that interneuron's feature vector (Fig. 2b).

Linear dynamics of the feedback circuit in response to a multi-dimensional step stimulus can be solved in the standard manner, similarly to Equation 6:

n(t) = (WᵀW)⁻¹ (I − exp(−WᵀW t/τ)) Wᵀ s,
p(t) = [I − W (WᵀW)⁻¹ (I − exp(−WᵀW t/τ)) Wᵀ] s,   (8)

provided WᵀW is invertible. When the matrix WᵀW is not full rank, for instance if the number of interneurons exceeds the number of sensory channels, the solution of Equation 7 is given by:

n(t) = Wᵀ (WWᵀ)⁻¹ (I − exp(−WWᵀ t/τ)) s,
p(t) = exp(−WWᵀ t/τ) s.   (9)

Recapitulating the equations in words, as above one can view the interneurons' activity as a series of progressively more accurate stimulus predictions, ŝ = Wn. The principal neurons send the series of residuals of these predictions, p = s − ŝ, to higher brain areas, and the dynamics result in the transmitted residual decreasing in time [19-22] (Fig. 2c,d).

2.3 Dynamics of the non-linear multi-channel feedback circuit

Our solution of the circuit dynamics in the previous sub-section relied on the assumption that neurons act as linear elements, which, in view of non-linearities in real neurons, represents a drastic simplification. We now extend this analysis to the non-linear circuit. A typical neural response non-linearity is the existence of a non-zero input threshold below which neurons do not respond. A pair of such on- and off-neurons is described by a threshold function (Fig. 2e) that has a "gap" or "deadzone" around zero activity and is not equivalent to a linear neuron:

Thresh_λ(n) = { n − λ, n > λ;  0, |n| ≤ λ;  n + λ, n < −λ }.   (10)

Accordingly, the dynamics are given by:

p = s − Wa,
τ dn/dt = Wᵀp,
a = Thresh_λ(n),   (11)

where n is the internal (sub-threshold) activity of the interneurons and a their external activity.

The central contribution of this paper is an analysis of predictive coding in a feedback circuit with threshold-linear interneurons, inspired by the equivalence of the network dynamics to a signal-processing algorithm called linearized Bregman iteration [23, 24]. Before showing the equivalence, we first describe linearized Bregman iteration. This algorithm constructs a faithful representation of an input as a linear sum over dictionary elements while minimizing the L1-L2 norm of the representation [25]. 
Formally, the problem is defined as follows:

for J(a) ≡ λ|a|₁ + (1/2δ)|a|₂²,  min_a J(a)  s.t.  Wa = s.   (12)

Remarkably, this high-dimensional non-linear optimization problem can be solved by a simple iterative scheme (see Appendix):

n^{k+1} = n^k + δ Wᵀ(s − W a^k),
a^{k+1} = Thresh_{λδ}(n^{k+1}),   (13)

combining a linear step, which looks like gradient descent on the representation error, and a component-wise threshold-linear step.

Eq. 11, the network dynamics, is the continuous version of linearized Bregman iteration, Eq. 13. Intuitively speaking, the dynamics of the network play the role of the iterations in the algorithm. Having identified this equivalence, we are able to both solve and interpret the transient non-linear dynamics (see supplementary materials for further details). The analytical solution allows us a deeper understanding, for instance of the convergence of the algorithm. We note that if the interneuron feature vectors span the stimulus space, the steady-state activity will be zero for any stimulus and thus non-informative. Therefore, solving the transient dynamics, as opposed to just the steady-state activity [18, 19, 21, 26], was particularly crucial in this case.

Next, we describe in words the mathematical expressions for the response of the feedback circuit to a step stimulus (see Supplement for dynamics equations), Fig. 2f-g. Unlike in the linear circuit, interneurons do not inhibit principal neurons until their internal activity crosses threshold, Fig. 2f. Therefore, their internal activity initially grows with a rate proportional to the projection of the sensory stimulus on their feature vectors, Wᵀs. With time, interneurons cross threshold and contribute to the stimulus representation, thereby constructing a more accurate representation of the stimulus, Fig. 2f,g. The first interneuron to cross threshold is the one for which the projection of the sensory stimulus on its feature vector is highest. As its contribution is subtracted from the activity of the principal neurons, the driving force on the other interneurons, Wᵀ(s − Wa), changes. Therefore, the order by which interneurons cross threshold depends also on the correlation between the feature vectors, Fig. 2b,f.

[Figure 2 panels: a. circuit diagram of the multichannel feedback circuit (principal neurons p, interneurons n, weights Wᵀ and −W); b. stimulus and feature vectors; c-d. linear-circuit responses; e. threshold-linear transfer function with thresholds ±λ; f-h. threshold-linear-circuit responses.]

Figure 2. Predictive coding in a feedback circuit in response to a step stimulus at time zero. a. Circuit diagram for the feedback circuit. b. Stimulus (grayscale in black box, left) and a subset of interneurons' feature vectors (grayscale in boxes). c-d. 
Response of the linear feedback circuit to a step stimulus at time zero in interneurons (c) and principal neurons (d). e. Threshold-linear transfer function relating internal, n, and external, a, activity of interneurons. Dashed line shows the diagonal. Firing rates cannot be negative and therefore the threshold-linear function should be thought of as combining a pair of on- and off-cells. f-h. Response of interneurons (f-g) and principal neurons (h) to a step stimulus at time zero. f. Expanded view of the internal activity of the interneurons (only some are shown; see grayscale in boxes, color coded to match b) at early times. g. External activity of a larger subset of interneurons over a longer time period. Grayscale boxes show the stimulus represented by the interneuron layer at various times marked by arrows. h. Principal neuron activity as a function of time. As interneurons cross threshold they more closely represent the stimulus and cancel out more of the principal cell activity. Eventually, the interneuron representation (right box in g) is nearly identical to the stimulus and the principal neurons' activity drops almost to zero.

Collectively, the representation progresses from sparse to dense, but individual interneurons may first be active and then become silent. Eventually, the interneurons will accurately represent the input with their activity, Wa = s, and will fully subtract it from the principal cells' activity, resulting in no further excitation to the interneurons, Fig. 2g,h.

However, this description leads to an immediate puzzle. Namely, the algorithm builds a representation of the stimulus by the activity of interneurons. Yet, interneurons are local circuit elements whose activity is not transmitted outside the circuit. Why would a representation be built if it is available only locally within the neural circuit? 
The answer to this conundrum is found by considering the notion of predictive coding in early sensory circuits presented in the introduction. The interneurons serve as the predictor and the principal neurons transmit a prediction residual. As expected in the framework of predictive coding, at each point in time the circuit subtracts the prediction, ŝ = Wa, which was constructed in the interneurons from previous incoming sensory signals, from the current sensory stimulus, and the principal neurons transmit the residual, p = s − ŝ, to higher brain areas. We note that initially the interneurons are silent and the principal neurons transmit the stimulus directly. If there were no bandwidth limitation, the stimulus could be decoded just from this initial transmission. However, the bandwidth limitation results in coarse, or noisy, principal neuron transmission, an issue we will return to later.

3 Results

In neuroscience, the predictive coding strategy was originally suggested to allow efficient transmission through a limited bandwidth channel (Srinivasan et al., 1982). Our main result is the solution of the transient dynamics given in the section above. Understanding circuit dynamics in the predictive coding framework allows us to make a prediction regarding the length of transient activity for different types of stimuli. Recall that the time from stimulus onset to cancellation of the stimulus depends on the rate of the interneurons' activation, which in turn is proportional to the projection of the stimulus on the interneurons' feature vectors. Presumably, interneuron feature vectors are adapted to the most common stimuli, e.g. natural images in the case of the retina; therefore this type of stimulus should be relatively quickly cancelled out. 
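The discrete-time scheme of Eq. 13 is compact enough to sketch directly; the dictionary W, the stimulus, and the parameter values below are our own illustrative choices, not values from the paper:

```python
import numpy as np

# Discrete-time sketch of the threshold-linear feedback circuit, i.e. the
# linearized Bregman iteration of Eq. 13. The dictionary W, the stimulus,
# and the parameters lam, delta are illustrative choices.
rng = np.random.default_rng(0)

def thresh(x, lam):
    # Threshold-linear function of Eq. 10: dead zone of half-width lam.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

m, k = 20, 50                        # sensory channels, interneurons
W = rng.normal(size=(m, k))
W /= np.linalg.norm(W, axis=0)       # columns = unit-norm feature vectors
a_true = np.zeros(k)
a_true[:3] = [2.0, -1.5, 1.0]
s = W @ a_true                       # stimulus: sparse sum of features

lam, delta = 0.1, 0.1
n = np.zeros(k)                      # internal (sub-threshold) activity
a = np.zeros(k)                      # external activity of interneurons
for _ in range(2000):
    p = s - W @ a                    # principal neurons carry the residual
    n += delta * (W.T @ p)           # integrate the projected residual
    a = thresh(n, lam * delta)

# The transmitted residual decays as the representation Wa approaches s.
assert np.linalg.norm(s - W @ a) < 0.05 * np.linalg.norm(s)
```

Tracking the residual norm over the loop reproduces the qualitative behavior of Fig. 2h: it stays at the stimulus level until the first interneuron crosses threshold and then decays toward zero.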
In contrast, non-natural stimuli, like white-noise patterns, will be less well captured by interneuron receptive fields and their activation will occur after a longer delay. Accordingly, it will take longer to cancel out non-natural stimuli, leading to longer principal neuron transients.

Below, we show that the feedback circuit with threshold-linear neurons is indeed more efficient than the existing alternatives. We first consider a scenario in which an effective bandwidth limitation is imposed through the addition of noise. Second, we consider a more biologically relevant model, where transmission bandwidth is set by the discreteness of Poisson neural activity. We find that threshold-linear interneurons achieve more accurate predictions when faced with a stimulus corrupted by i.i.d. Gaussian noise. The intuition behind this result is that of sparse denoising [23]. Namely, if the signal can be expressed as a sparse sum of strong activations of dictionary elements, whereas the noise requires a large number of weakly activated elements, then thresholding the elements will suppress the noise more than the signal, yielding denoising. We note that this fact alone does not in itself argue for the biological plausibility of this network, but threshold-linear dynamics are a common approximation in neural networks.

Figure 3. Predictions by the negative feedback circuit. Left: Relative prediction error, ‖s − ŝ‖/‖s‖, where ŝ = Wa, as a function of time for a stimulus consisting of an image patch corrupted by i.i.d. Gaussian noise at every time point. Right: An image is sent through principal neurons that transmit Poisson spike counts. 
The reconstruction error as a function of time following the \npresentation of stimulus is shown for the full non-linear negative feedback circuit (black), for a \nlinear negative feedback circuit (red), for a direct transmission circuit (blue), and for a circuit \nwhere the sparse approximation itself is transmitted instead of the residual (green). Time on the \nx-axis is measured in units of the time length in which a single noisy transmission occurs. Inset \nshows log-log plot. \n\n \n\nIn addition to considering transmission of stimuli corrupted by Gaussian noise, we also \nstudied a different model where bandwidth limitation is set by the discreteness of spiking, \nmodeled by a Poisson process. Although the discreteness of transmission can be overcome by \naveraging over time, this comes at the cost of longer perceptual delays, or lower transmission \nrates, as longer integration takes place. Therefore, we characterize transmission efficiency by \nreconstruction error as a function of time, Fig. 3. We find that, for Poisson transmission, predictive \ncoding provides more accurate stimulus reconstruction than direct transmission for all times but \nthe brief interval until the first interneuron has crossed threshold (Fig. 3). \n4 Discussion \n\nBy solving the dynamics of the negative feedback circuit through equivalence to \nlinearized Bregman iteration we have shown that the development of activity in a simplified \nearly sensory circuit can be viewed as implementing an efficient, non-linear, intrinsically \nparallel algorithm for predictive coding. Our study maps the steps of the algorithm onto \nspecific neuronal substrates, providing a solid theoretical framework for understanding \nphysiological experiments on early sensory processing as well as experimentally testing \npredictive coding ideas on a finer, more quantitative level. 
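The Poisson-channel comparison of Fig. 3 (right) can be sketched in a toy simulation; the channel model, rate, idealized decoder, and all parameter values below are our own simplifying assumptions rather than the paper's exact protocol:

```python
import numpy as np

# Toy version of the Fig. 3 (right) comparison: a stimulus is transmitted
# through a channel whose bandwidth limit is the discreteness of Poisson
# counts, either directly or as the residual of a feedback prediction.
rng = np.random.default_rng(2)

def poisson_channel(x, rate=50.0):
    # Encode a non-negative signal as Poisson counts; decode by rescaling.
    return rng.poisson(rate * x) / rate

s = rng.uniform(0.5, 1.0, size=30)   # positive step stimulus, 30 channels
w, tau, dt = 0.9, 5.0, 0.5

n = np.zeros_like(s)                 # interneuron activity (one per channel)
err_pred, err_direct = [], []
for _ in range(200):
    p = s - w * n                    # residual computed inside the circuit
    p_rx = poisson_channel(p)        # residual after the noisy channel
    s_hat = w * n + p_rx             # idealized decoder adds back the prediction
    n += dt * w * p / tau            # interneurons integrate the clean residual
    err_pred.append(np.linalg.norm(s - s_hat))
    err_direct.append(np.linalg.norm(s - poisson_channel(s)))

# After the transient, the predictive code's error is set by the tiny residual,
# while direct transmission keeps paying the full Poisson noise.
assert np.mean(err_pred[-50:]) < np.mean(err_direct[-50:])
```

The decoder here is idealized: it is assumed to track the circuit's prediction wn exactly, standing in for the virtual decoding circuit of Fig. 1a, which in the paper's setting would reconstruct the prediction from the received residuals.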
\n\nRecently, sparse representations were studied in a single-layer circuit with lateral \ninhibitory connections proposed as a model of a different brain area, namely primary cortical \nareas. The circuit constructs the stimulus representation in the projection neurons themselves \nand directly transmits it downstream [27, 28]. We believe it does not model early sensory \nsystems as well as the negative feedback circuit for a number of reasons. First, anatomical \ndata is more consistent with the reciprocally connected interneuron layer than lateral \nconnections between principal neurons [11, 13]. Second, direct transmission of the \nrepresentation would result in greater perceptual delays after stimulus onset since no \ninformation is transmitted while the representation is being built up in the sub-threshold \nrange. In contrast, in the predictive coding model the projection neurons pass forth (a coarse \nand possibly noisy version of) the input stimulus from the very beginning. We note that \nadding a nonlinearity on the principal neurons would result in a delay in transmission in both \nmodels. Although there is no biological justification for introducing a threshold to \ninterneurons only, the availability of an analytically solvable model justifies this abstraction. \nDynamics of a circuit with threshold on principal neurons will be explored elsewhere. \n\nFrom a computational point of view \n\nto \novercompleteness in the negative feedback circuit. First, the delay until subtraction of \n\nthere are \n\nthree main advantages \n\n\fprediction, which occurs when the first interneuron crosses threshold, will be briefer as the \nnumber of feature vectors grows since the maximal projection of the stimulus on the \ninterneurons\u2019 feature vectors will be higher. Second, the larger the number of feature vectors \nthe fewer the number of interneurons with supra-threshold activity, which may be \nenergetically more efficient. 
Third, if stimuli come from different statistical ensembles, it could be advantageous to have feature vectors tailored to the different stimulus ensembles, which may result in more feature vectors, i.e., interneurons, than principal neurons.

Our study considered responses to step-like stimuli. If the sensory environment changes on slow time scales, a series of step-like responses may be taken as an approximation to the true signal. Naturally, the extension of our framework to fully time-varying stimuli is an important research direction.

Acknowledgements

We thank S. Baccus, A. Genkin, V. Goyal, A. Koulakov, M. Meister, G. Murphy, D. Rinberg, and R. Wilson for valuable discussions and their input.

Appendix: Derivation of linearized Bregman iteration

Here, inspired by [22, 23], we solve the following basis pursuit-like optimization problem:

for J(a) ≡ λ|a|₁ + (1/2δ)|a|₂²,  min_a J(a)  s.t.  Wa = s.   (A1)

The idea behind linearized Bregman iteration is to start with a⁰ = 0 and, at each iteration, to update a so as to minimize the linearized square error plus the Bregman distance from the previous value of a. Thus, we perform the following update:

a^{k+1} = argmin_a [ D_J(a, a^k) + ⟨a − a^k, Wᵀ(W a^k − s)⟩ ],   (A2)

where we used the notation D_J(a, a^k) for the Bregman divergence [29] between the two points a and a^k induced by the convex function J. The Bregman divergence is an appropriate measure for such problems because it can handle the non-differentiable nature of the cost. It is defined by the following expression: D_J(a, b) = J(a) − J(b) − ⟨p, a − b⟩, where p ∈ ∂J(b) is an element of the subgradient of J at the point b.

The Bregman divergence for the elastic net cost function J defined in Eq. A1 is:

D_J(a, a^k) = λ|a|₁ − λ|a^k|₁ + (1/2δ)|a|₂² − (1/2δ)|a^k|₂² − ⟨p^k, a − a^k⟩,   (A3)

where p^k is a subgradient of J at a^k. The condition for the minimum in Eq. A2 is:

∂[ λ|a^{k+1}|₁ + (1/2δ)|a^{k+1}|₂² ] ∋ p^k + Wᵀ(s − W a^k),   (A4)

where ∂[·] designates a subdifferential. Consistency of the iteration scheme requires that the update p^{k+1} be a subgradient of J as well:

∂[ λ|a^{k+1}|₁ + (1/2δ)|a^{k+1}|₂² ] ∋ p^{k+1}.   (A5)

By combining Eqs. A4, A5 we set:

p^{k+1} = p^k + Wᵀ(s − W a^k).   (A6)

By substituting Eq. A6 into Eq. A4 and simplifying we get:

a^{k+1} = argmin_a [ λ|a|₁ + (1/2δ)|a − δ p^{k+1}|₂² ],   (A7)

which has the explicit solution:

a^{k+1} = Thresh_{λδ}(δ p^{k+1}).   (A8)

By defining n^k = δ p^k and rewriting Eqs. A6, A8 in terms of n we get:

n^{k+1} = n^k + δ Wᵀ(s − W a^k),
a^{k+1} = Thresh_{λδ}(n^{k+1}).   (A9)

Eq. A9 is the linearized Bregman iteration algorithm (main text Eq. 13), thereby showing that the iterative scheme indeed finds a minimum of Eq. A2 at every time point. The convergence proof for the sequence [23, 24] is beyond the scope of this paper.

References

1. Barlow, H.B. and W.R. Levick, Threshold setting by the surround of cat retinal ganglion cells. The Journal of Physiology, 1976. 259(3): p. 737-57.
2. Dong, D.W. and J.J. Atick, Statistics of natural time-varying images. Network: Computation in Neural Systems, 1995. 6(3): p. 345-358.
3. Field, D.J., Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 1987. 4(12): p. 2379-94.
4. Ruderman, D.L. and W. Bialek, Statistics of natural images: Scaling in the woods. Physical Review Letters, 1994. 73(6): p. 814-817. 
5. Elias, P., Predictive coding. IRE Transactions on Information Theory, 1955. 1(1): p. 16-24.
6. Srinivasan, M.V., S.B. Laughlin, and A. Dubs, Predictive coding: a fresh view of inhibition in the retina. Proc R Soc Lond B Biol Sci, 1982: p. 427-59.
7. Victor, J.D., Temporal aspects of neural coding in the retina and lateral geniculate. Network: Computation in Neural Systems, 1999. 10(4): p. R1-R66.
8. Hosoya, T., S.A. Baccus, and M. Meister, Dynamic predictive coding by the retina. Nature, 2005. 436(7047): p. 71-7.
9. Huang, Y. and R.P.N. Rao, Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science, 2011. 2(5): p. 580-593.
10. Laughlin, S., A simple coding procedure enhances a neuron's information capacity. Zeitschrift für Naturforschung C: Biosciences, 1981. 36(9-10): p. 910-2.
11. Masland, R.H., The fundamental plan of the retina. Nature Neuroscience, 2001. 4(9): p. 877-86.
12. Olsen, S.R., V. Bhandawat, and R.I. Wilson, Divisive normalization in olfactory population codes. Neuron, 2010. 66(2): p. 287-99.
13. Shepherd, G.M., et al., The olfactory granule cell: from classical enigma to central role in olfactory processing. Brain Research Reviews, 2007. 55(2): p. 373-82.
14. Arevian, A.C., V. Kapoor, and N.N. Urban, Activity-dependent gating of lateral inhibition in the mouse olfactory bulb. Nature Neuroscience, 2008. 11(1): p. 80-7.
15. Baccus, S.A., Timing and computation in inner retinal circuitry. Annual Review of Physiology, 2007. 69: p. 271-90.
16. Rieke, F. and G. Schwartz, Nonlinear spatial encoding by retinal ganglion cells: when 1 + 1 ≠ 2. Journal of General Physiology, 2011. 138(3): p. 283-290.
17. Shapley, R.M. and J.D. Victor, The effect of contrast on the transfer properties of cat retinal ganglion cells. The Journal of Physiology, 1978. 285: p. 275-98.
18. Koulakov, A.A. and D. Rinberg, Sparse incomplete representations: a potential role of olfactory granule cells. Neuron, 2011. 72(1): p. 124-136.
19. Lee, D.D. and H.S. Seung, Unsupervised learning by convex and conic coding. Advances in Neural Information Processing Systems, 1997: p. 515-521.
20. Lochmann, T. and S. Deneve, Neural processing as causal inference. Current Opinion in Neurobiology, 2011.
21. Olshausen, B.A. and D.J. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 1997. 37(23): p. 3311-25.
22. Rao, R.P.N. and D.H. Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 1999. 2: p. 79-87.
23. Osher, S., et al., Fast linearized Bregman iteration for compressive sensing and sparse denoising. Communications in Mathematical Sciences, 2009.
24. Yin, W., et al., Bregman iterative algorithms for l1-minimization with applications to compressed sensing. SIAM Journal on Imaging Sciences, 2008. 1(1): p. 143-168.
25. Zou, H. and T. Hastie, Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005. 67(2): p. 301-320.
26. Dayan, P., Recurrent sampling models for the Helmholtz machine. Neural Computation, 1999. 11(3): p. 653-78.
27. Rehn, M. and F.T. Sommer, A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. Journal of Computational Neuroscience, 2007. 22(2): p. 135-46.
28. Rozell, C.J., et al., Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 2008. 20(10): p. 2526-63.
29. Bregman, L.M., The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 1967. 7(3): p. 200-217.
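To make the iteration of Eq. A9 concrete, the following is a minimal numerical sketch in NumPy. It is an illustration of the linearized Bregman update as derived in the Appendix, not the authors' code; the function names, the toy parameter values, and the choice of stimulus are our own.

```python
import numpy as np

def soft_threshold(v, theta):
    # Thresh_theta(v): shrink each component of v toward zero by theta (Eq. A8)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def linearized_bregman(C, x, lam, mu, n_iter):
    """Iterate Eq. A9: u <- u + mu * C^T (x - C a); a <- Thresh_{lam*mu}(u).

    C maps the sparse representation a (interneuron activities) to the
    stimulus estimate C a; x is the stimulus. Returns the final a and the
    number of active (nonzero) units after each iteration.
    """
    u = np.zeros(C.shape[1])  # u^0 = mu * p^0 = 0
    a = np.zeros(C.shape[1])  # a^0 = 0
    n_active = []
    for _ in range(n_iter):
        u = u + mu * C.T @ (x - C @ a)   # accumulate the unexplained residual
        a = soft_threshold(u, lam * mu)  # current sparse representation
        n_active.append(int(np.count_nonzero(a)))
    return a, n_active
```

On a toy problem with C equal to the identity, lam = mu = 1 and x = (2, -0.5, 0), the iteration first activates only the unit driven hardest by the stimulus and later recruits a second one as the residual shrinks, reproducing the progressively less sparse but more accurate representations described in the main text.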