{"title": "The Neural Costs of Optimal Control", "book": "Advances in Neural Information Processing Systems", "page_first": 712, "page_last": 720, "abstract": "Optimal control entails combining probabilities and utilities. However, for most practical problems probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields.", "full_text": "The Neural Costs of Optimal Control\n\nSamuel J. Gershman and Robert C. Wilson\n\nPsychology Department and Neuroscience Institute\n\nPrinceton University\nPrinceton, NJ 08540\n\n{sjgershm,rcw2}@princeton.edu\n\nAbstract\n\nOptimal control entails combining probabilities and utilities. However, for most\npractical problems, probability densities can be represented only approximately.\nChoosing an approximation requires balancing the bene\ufb01ts of an accurate approx-\nimation against the costs of computing it. We propose a variational framework for\nachieving this balance and apply it to the problem of how a neural population code\nshould optimally represent a distribution under resource constraints. The essence\nof our analysis is the conjecture that population codes are organized to maximize\na lower bound on the log expected utility. 
This theory can account for a plethora\nof experimental data, including the reward-modulation of sensory receptive \ufb01elds,\nGABAergic effects on saccadic movements, and risk aversion in decisions under\nuncertainty.\n\n1 Introduction\n\nActing optimally under uncertainty requires comparing the expected utility of each possible action,\nbut in most situations of practical interest this expectation is impossible to calculate exactly: the\nhidden states that must be integrated over may be high-dimensional and the probability density may\nnot take on any simple form. As a consequence, approximations must inevitably be used. Typically\none has a choice of approximation, with more exact approximations demanding more computational\nresources, a penalty that can be naturally incorporated into the utility function. The question we\naddress in this paper is: given a family of approximations and their associated resource demands,\nwhat approximation will lead as close as possible to the optimal control policy?\nThis is a poignant problem for the brain, which expends a colossal amount of metabolic energy in\nbuilding an internal model of the world. Previous theoretical work has studied how \u201cenergy-ef\ufb01cient\ncodes\u201d might be constructed by the brain to maximize information transfer with the least possible\nenergy consumption [10]. However, maximizing information transfer is only one component of\nadaptive behavior; the utility of information must be taken into account when choosing a code [15],\nand this may interact in complicated ways with the computational costs of approximate inference.\nOur contribution is to place this problem within a decision-theoretic framework by representing the\nchoice of approximation as a \u201cmeta-decision\u201d with its own expected utility. 
Central to our analysis\nis the observation that while this expected utility cannot be maximized directly, it is possible to\nmaximize a variational lower bound on log expected utility (see also [17, 5] for related approaches).\nWe study the properties of this lower bound and show how it accounts for some intriguing empirical\nproperties of neural codes.\n\n2 Optimal control with approximate densities\n\nLet a denote an action and s denote a hidden state variable drawn from some probability density\np(s).1 Given a utility function U (a; s), the optimal action ap is the one that maximizes expected\nutility Vp(a):\n\n$$a_p = \\operatorname*{argmax}_a V_p(a), \\tag{1}$$\n\nwhere\n\n$$V_p(a) = \\mathbb{E}_p[U(a; s)] = \\int_s p(s)\\, U(a; s)\\, ds. \\tag{2}$$\n\nComputing the expected utility for each action requires solving a possibly intractable integral. An\napproximation of expected utility can be obtained by substituting an alternative density q(s) for\nwhich the expected utility is tractable. For example, one might choose q(s) to be a Gaussian with\nsome mean and variance, or a Monte Carlo approximation, or even a delta function at some point.\nUsing an approximate density presents the \u201cmeta-decision\u201d of which density to use. If one chooses\nthe action aq that is optimal under q(s), then the expected utility is given by Ep[U (aq; s)] = Vp(aq); therefore the\noptimal density q\u2217 should be chosen according to\n\n$$q^* = \\operatorname*{argmax}_{q \\in Q} V_p(a_q), \\tag{3}$$\n\nwhere Q is some family of densities. To understand Eq. 3, consider the optimization as consisting\nof two parts: \ufb01rst, select an approximate density q(s) and choose the optimal action with respect to\nthis density; then evaluate the true value of that action under the target density. Clearly, if p \u2208 Q,\nthen q = p is the optimal solution. In general, we cannot optimize this function directly because it\nrequires solving precisely the integral we are trying to avoid: the expected utility under p(s). 
We\ncan, however, use the approximate density to lower-bound the log expected utility under p(s) by\nappealing to Jensen\u2019s inequality:\n\n$$\\log V_p(a) \\geq \\int_s q(s) \\log \\frac{p(s)\\, U(a; s)}{q(s)}\\, ds = \\mathbb{E}_q[\\log U(a; s)] + \\mathbb{E}_q[\\log p(s)] - \\mathbb{E}_q[\\log q(s)]. \\tag{4}$$\n\nNotice the similarity to the evidence lower bound used in variational Bayesian inference [9]: whereas\nin variational inference we attempt to lower-bound the log marginal likelihood (evidence), in varia-\ntional decision theory we attempt to lower-bound the log expected utility.\nExamining the utility lower bound, we see that the terms exert conceptually distinct in\ufb02uences:\n\n1. A utility component, Eq[log U (a; s)], the expected log utility under the approximate density.\n2. A cross-entropy component, \u2212Eq[log p(s)], re\ufb02ecting the mismatch between the approxi-\nmate density and the target density. This can be thought of as a form of \u201csensory prediction\nerror.\u201d\n3. An entropy component, \u2212Eq[log q(s)], embodying a maximum entropy principle [8]: for a\n\ufb01xed utility and cross-entropy, choose the distribution with maximal entropy.\n\nIntuitively, a more accurate approximate density q(s) should incur a larger computational cost. One\nway to express this notion of cost is to incorporate it directly into the utility function. That is,\nwe consider an augmented utility function U (a, q; s) that depends on the approximate density. If\nwe assume that the utility function takes the form log U (a, q; s) = log R(a; s) \u2212 log C(q), where\nR(a; s) represents a reward function and C(q) represents a computational cost function, we arrive\nat the following modi\ufb01cation to the utility lower bound:\n\n$$\\mathcal{L}(q, a) = \\mathbb{E}_q[\\log R(a; s)] + \\mathbb{E}_q[\\log p(s)] - \\mathbb{E}_q[\\log q(s)] - \\log C(q). \\tag{5}$$\n\n1 For the sake of notational simplicity, we implicitly condition on any observed variables. 
We also refer\nthroughout this paper to probability densities over a multidimensional, continuous state variable, but our results\nstill apply to one-dimensional and discrete variables (in which case the probability densities are replaced with\nprobability mass functions).\n\nThe assumption that the log utility decomposes into additive reward and cost components is intu-\nitive: it implies that reward is measured relative to the computational cost of earning it. In summary,\nthe utility lower bound L(q, a) provides an objective function for simultaneously choosing an action\nand choosing an approximate density over hidden states. Whereas in classical decision theory, opti-\nmization is performed over the action space, in variational decision theory optimization is performed\nover the joint space of actions and approximate densities. Perception and action are thereby treated\nas a single optimization problem.\n\n3 Choosing a probabilistic population code\n\nWhile the theory developed in the previous section applies to any representation scheme, in this\nsection, for illustrative purposes, we focus on one speci\ufb01c family of approximate densities de\ufb01ned\nby the \ufb01ring rate of neurons in a network. Speci\ufb01cally, we consider a population of N neurons tasked\nwith encoding a probability density over s. One way to do this, known as a kernel density estimate\n(KDE) code [1, 28], is to associate with each neuron a kernel density fn(s) and then approximate\nthe target density with a convex combination of the kernel densities:\n\n$$q(s) = \\frac{1}{Z} \\sum_{n=1}^{N} e^{x_n} f_n(s), \\tag{6}$$\n\nwhere xn denotes the \ufb01ring rate of neuron n and $Z = \\sum_{n=1}^{N} e^{x_n}$. 
We assume that the kernel density\nfunctions are Gaussian, parameterized by a preferred stimulus (mean) sn and a standard deviation\n\u03c3n:\n\n$$f_n(s) = \\frac{1}{\\sqrt{2\\pi}\\, \\sigma_n} \\exp\\left( -\\frac{(s - s_n)^2}{2 \\sigma_n^2} \\right). \\tag{7}$$\n\nFor simplicity, in this paper we will focus on the limiting case in which \u03c3 \u2192 0 (see footnote 2). In this case q(s)\ndegenerates onto a collection of delta functions:\n\n$$q(s) = \\frac{1}{Z} \\sum_{n=1}^{N} e^{x_n} \\delta(s - s_n), \\tag{8}$$\n\nwhere \u03b4(\u00b7) is the Dirac delta function. This density corresponds to a collection of sharply tuned\nneurons; provided that the preferred values {s1, . . . , sN} densely cover the state space, q(s) can\nrepresent arbitrarily complicated densities by varying the \ufb01ring rates x.\n\n3.1 Optimizing the bound\n\nAssuming for the moment that there is only a single action, we can state the optimization problem\nas follows: given the family of approximate densities parameterized by x, choose the density that\nmaximizes the utility lower bound\n\n$$\\mathcal{L}(q, a) = \\frac{1}{Z} \\sum_{n=1}^{N} e^{x_n} \\left[ \\log U(a; s_n) + \\log \\tilde{p}(s_n) - x_n \\right] + \\log Z - \\log B - \\log C(q), \\tag{9}$$\n\nwhere $p(s) = \\tilde{p}(s)/B$ (i.e., $\\tilde{p}(s)$ is the un-normalized target density). Note also that $B = \\int_s \\tilde{p}(s)\\, ds$\ndoes not depend on xn, and hence can be ignored for the purposes of optimization. Techni-\ncally, the lower bound is not well de\ufb01ned in the limit because the target density is non-atomic\n(i.e., has zero mass at any given value). However, approximating the expectations in Eq. 
5 by\n$\\mathbb{E}_q[g(s)] \\approx Z^{-1} \\sum_{n=1}^{N} e^{x_n} g(s_n)$, as we do above, can be justi\ufb01ed in terms of \ufb01rst-order Taylor\nseries expansions around the preferred stimuli, which will be arbitrarily accurate as \u03c3 \u2192 0.\nIn the rest of this paper, we shall assume that the cost function takes the following form:\n\n$$C(q) = \\beta N + \\gamma \\sum_{n=1}^{N} x_n, \\tag{10}$$\n\nwhere \u03b2 is the \ufb01xed cost of maintaining a neuron, and \u03b3 is the cost of a spike (cf. [10]).\nWe next seek a neuronal update rule that performs gradient ascent on the utility lower bound. Hold-\ning the \ufb01ring rate of all neurons except n \ufb01xed, taking the partial derivative of L(q, a) with respect\nto xn and setting it to 0, we arrive at the following update rule:\n\n$$x_n \\leftarrow \\left[ \\log U(a; s_n) + \\log \\tilde{p}(s_n) + \\frac{1}{Z} \\sum_{j=1}^{N} e^{x_j} \\left[ x_j - \\log U(a; s_j) - \\log \\tilde{p}(s_j) \\right] - \\frac{Z \\gamma}{e^{x_n} C(q)} \\right]_+, \\tag{11}$$\n\nwhere [\u00b7]+ denotes linear recti\ufb01cation.3 This update rule de\ufb01nes an attractor network whose Lya-\npunov function is the (negative) utility lower bound. When multiple actions are involved, the bound\ncan be jointly optimized over a and q by coordinate ascent. While somewhat untraditional, we\nnote that this update rule is biologically plausible in the sense that it only involves local pairwise\ninteractions between neurons.\n\n2 The case of small, \ufb01nite \u03c3 can be addressed by using a Laplace approximation to the integrals and leads to\nsmall correction terms in the following equations.\n\nFigure 1: Comparison between coding schemes. The leftmost panel shows a collection of prob-\nability distributions with different variances, and the other panels show different neural representa-\ntions of these distributions.\n\n4 Relation to other probability coding schemes\n\n4.1 Exponential, convolutional and gain coding\n\nThe probability coding scheme proposed in Eq. 
8 is closely related to the exponential coding de-\nscribed in [16]. That scheme also encodes probabilities using exponentiated activities, although\nit uses the representation in a very different way and in a network with very different dynamics,\nfocusing on sequential inference problems instead of the arbitrary decision problems we consider\nhere. Other related schemes include convolutional coding [28], in which a distribution is encoded\nby convolving it with a neural tuning function, and gain coding [11, 27], in which the variance of\nthe distribution is inversely proportional to the gain of the neural response.\nIn Figure 1, we show how these three different ways of encoding probability distributions represent\nthree different Gaussians with variance 2 (black line in Figure 1a), 4 (red) and 10 (blue) units.\nConvolutional coding (Figure 1b) is characterized by a neural response pattern that gets broader as\nthe distribution gets broader. This has been one of the major criticisms of this type of encoding\nscheme as this result does not seem to be borne out experimentally (e.g., [19, 2]). In contrast, gain\ncoding schemes (Figure 1c) posit that changes in uncertainty only change the overall gain, and not\nthe shape, of the neural response. This leads to predictions that are consistent with experiments, but\nlimits the type of distributions that can be represented to the exponential family [11].\nFinally, Figure 1d shows how the exponential coding scheme we propose represents the distributions\nin a manner that can be thought of as in between convolutional coding and gain coding, with\na population response that gets broader as the encoded distribution broadens, but in a much less\npronounced way than pure convolutional coding. 
This point is crucial for the biological plausibility\nof this scheme, as it seems unlikely that these minute differences in population response width would\nbe easily measured experimentally.\nIt is also important to note that both the convolutional and gain coding schemes ignore the utility\nfunction in constructing probabilistic representations. As we explore in later sections, rewards and\ncosts place strong constraints on the types of codes that are learned by the variational objective, and\nthe available experimental data is congruent with this view. \u201cPure\u201d probabilistic representations may\nnot exist in the brain.\n\n3 This update is equivalent to performing gradient ascent on L with a variable learning rate parameter given\nby $Z / e^{x_n}$. We chose this rule as it converges faster and seems more neurally plausible than the pure gradient\nascent.\n\n[Figure 1 panels: (a) probability distributions (probability density vs. s); (b) convolutional coding, (c) gain coding, (d) exponential coding (firing rate vs. neuron number).]\n\n4.2 Connection to Monte Carlo approximation\n\nSubstantial interest has been generated recently in the idea that the brain might use some form of\nsampling (i.e., Monte Carlo algorithm) to approximate complicated probability densities. Psycho-\nlogical phenomena like perceptual multistability [6] and speech perception [21] are parsimoniously\nexplained by a model in which a density over the complete hypothesis space is replaced by a small\nset of discrete samples. Thus, it is reasonable to ask whether our theory of population coding\nrelates to these phenomena at the neural level.\nWhen each neuron\u2019s tuning curve is sharply peaked, the resulting population code resembles impor-\ntance sampling, a common Monte Carlo method for approximating probability densities, wherein\nthe approximation consists of a weighted set of samples:\n\n$$p(s) \\approx \\sum_{n=1}^{N} w^{(n)} \\delta(s - s^{(n)}), \\tag{12}$$\n\nwhere $s^{(n)}$ is drawn from a proposal density \u03c0(s) and $w^{(n)} \\propto p(s^{(n)}) / \\pi(s^{(n)})$. 
In fact, we can make\nthis correspondence precise: for any population code of the form in Eq. 8, there exists an equivalent\nimportance sampling approximation. The corresponding proposal density takes the form:\n\n$$\\pi(s) \\propto \\sum_{n} \\frac{p(s_n)}{e^{x_n}} \\delta(s - s_n). \\tag{13}$$\n\nThis means that optimizing the bound with respect to x is equivalent to selecting a proposal density\nso as to maximize utility under resource constraints. A related analysis was made by Vul et al. [26],\nthough in a more restricted setting, showing that maximal utility is achieved with very few samples\nwhen sampling is costly. Similarly, \u03c0(s) will be sensitive to the computational costs inherent in the\nutility lower bound, favoring a small number of samples.\nInterestingly, importance sampling has been proposed as a neurally-plausible mechanism for\nBayesian inference [22]. In that treatment, the proposal density was assumed to be the prior, leading\nto the prediction that neurons with preferred stimulus s\u2217 should occur with frequency proportional\nto the prior probability of s\u2217. One source of evidence for this prediction comes from the oblique\neffect: the observation that more V1 neurons are tuned to cardinal orientations than to oblique ori-\nentations [3], consistent with the statistics of the natural visual environment. In contrast, our model\npredicts that the proposal density will be sensitive to rewards in addition to the prior; as we argue in\nSection 5.1, a considerable amount of evidence favors this view.\n\n5 Results\n\nIn the following sections, we examine some of the neurophysiological and psychological implica-\ntions of the variational objective. 
Tying these diverse topics together is the central idea that utilities,\ncosts and probabilistic beliefs exert a synergistic effect on neural codes and their behavioral outputs.\nOne consequence of the variational objective is that a clear separation of these components in the\nbrain may not exist: rewards and costs in\ufb01ltrate very early sensory areas. These in\ufb02uences result in\ndistortions of probabilistic belief that appear robustly in experiments with humans and animals.\n\n5.1 Why are sensory receptive \ufb01elds reward-modulated?\n\nFigure 2: Grasshopper auditory coding. Probability density of natural sounds and the optimized\napproximate density, with black lines demarcating the region of behaviorally relevant sounds.\n\nAccumulating evidence indicates that perceptual representations in the brain are modulated by re-\nward expectation. For example, Shuler and Bear [23] paired retinal stimulation of the left and right\neyes with reward after different delays and recorded neurons in primary visual cortex that switched\nfrom representing purely physical attributes of the stimulation (e.g., eye of origin) to coding reward\ntiming. Similarly, Serences [20] showed that spatially selective regions of visual cortex are biased\nby the prior reward associated with different spatial locations. These studies raise the possibility that\nthe brain does not encode probabilistic beliefs separately from reward; indeed, this idea has been\nenshrined by a recent theoretical account [4]. One important rami\ufb01cation of this con\ufb02ation is that it\nwould appear to violate one of the axioms of statistical decision theory: probabilistic sophistication\n[18]. On the other hand, the variational framework we have described accounts for these \ufb01ndings by\nshowing that decision-making using approximate densities leads automatically to reward-modulated\nprobabilistic beliefs. 
Thus, the apparent inconsistency with statistical decision theory may be an\nartifact of rational responses to the information-processing constraints of the brain.\nTo drive this point home, we now analyze one example in more detail. Machens et al. [12] recorded\nthe responses of grasshopper auditory neurons to different stimulus ensembles and found that the en-\nsembles that elicited the optimal response differed systematically from the natural auditory statistics\nof the grasshopper\u2019s environment. In particular, the optimal ensembles were restricted to a region of\nstimulus space in which behaviorally important sounds live, namely species-speci\ufb01c mating signals.\nIn the words of Machens et al., \u201can organism may seek to distribute its sensory resources according\nto the behavioral relevance of the natural stimuli, rather than according to purely statistical prin-\nciples.\u201d We modeled this phenomenon by constructing a relatively wide density of natural sounds\nwith a narrow region of behaviorally relevant sounds (in which states are twice as rewarding). Fig-\nure 2 shows the results, con\ufb01rming that maximizing the utility lower bound selects a kernel density\nestimate that is narrower than the target density of natural sounds.\n\n5.2 Changing the cost of a spike\n\nExperimentally, there are at least two ways to manipulate the cost of a spike. One is by changing\nthe amount of inhibition in the network (e.g., using injections of muscimol, a GABA agonist) and\nhence increasing the metabolic requirements for action potential generation. A second method is\nby manipulating the availability of glucose [7], either by making the subject hypoglycemic or by\nadministering local infusions of glucose directly into the brain. We predict that increasing spik-\ning costs (either by reducing glucose levels or increasing GABAergic transmission) will result in a\ndiminished ability to detect weak signals embedded in noise. 
Consistent with this prediction, con-\ntrolled hypoglycemia reduces the speed with which visual changes are detected amidst distractors\n[13].\n\n[Figure 2 axes: probability vs. sound; legend: Natural sounds, Neural code.]\n\nFigure 3: Spiking cost in the superior colliculus. Top panels illustrate distractor condition. Bottom\npanels illustrate no-distractor condition. (Left column) Target density, with larger bump in the top\npanel representing the target; (Center column) neural code under different settings of cost-\ufb01eld \u03b3(n);\n(Right column) \ufb01ring rates under different cost-\ufb01elds.\n\nThese predictions have received a more direct test in a recent visual search experiment by McPeek\nand Keller [14], in which muscimol was injected into local regions of the superior colliculus, a\nbrain area known to control saccadic target selection. In the absence of distractors, response laten-\ncies to the target were increased when it appeared in the receptive \ufb01elds of the inhibited neurons.\nIn the presence of distractors, response latencies increased and choice accuracy decreased when\nthe target appeared in the receptive \ufb01elds of the inhibited neurons. We simulated these \ufb01ndings\nby constructing a cost-\ufb01eld \u03b3(n) to represent the amount of GABAergic transmission at different\nneurons induced by muscimol injections. In the distractor condition (Figure 3, top panel), accuracy\ndecreases because the increased cost of spiking in the neurons representing the target location damp-\nens the probability density in that location. Increasing spiking cost also reduces the overall \ufb01ring\nrate in the target-representing neurons relative to the distractor-representing neurons. This predicts\nincreased response latencies if we assume a monotonic relationship with the relative \ufb01ring rate in\nthe target-representing neurons. 
Similarly, in the no-distractor condition (Figure 3, bottom panel),\nresponse latencies increase due to decreased \ufb01ring rate in the target-representing neurons.\n\n5.3 Non-linear probability weighting\n\nIn this section, we show that the variational objective provides a new perspective on some well-\nknown peculiarities of human probabilistic judgment. In particular, the ostensibly irrational non-\nlinear weighting of probabilities in risky choice emerges naturally from optimization of the varia-\ntional objective under a natural assumption about the ecological distribution of rewards.\nTversky and Kahneman [25] observed that people tend to be risk-seeking (over-weighting probabil-\nities) for low-probability gains and risk-averse (under-weighting probabilities) for high-probability\ngains. This pattern reverses for losses. The variational objective explains these phenomena by virtue\nof the fact that under neural resource constraints, the approximate density will be biased towards\nhigh reward regions of the state space. It is also necessary to assume that the magnitude of gains or\nlosses scales inversely with probability (i.e., large gains or losses are rare). With this assumption,\nthe optimized neural code produces the four-fold pattern of risk attitudes observed by Tversky and\nKahneman (Figure 4).\n\n6 Discussion\n\nWe have presented a variational objective function for neural codes that balances motivational, sta-\ntistical and metabolic demands in the service of optimal behavior. The essential idea is that the\nintractable problem of computing expected utilities can be \ufb01nessed by instead computing expected\nutilities under an approximate density that optimizes a variational lower bound on log expected\nutility. This lower bound captures the neural costs of optimal control: more accurate approxima-\ntions will require more metabolic resources, whereas less accurate approximations will diminish the\namount of earned reward. 
This principle can explain, among other things, why receptive \ufb01elds of\nsensory neurons have repeatedly been found to be sensitive to reward contingencies. Intuitively,\nexpending more resources on accurately approximating the complete density of natural sensory\nstatistics is inef\ufb01cient (from an optimal control perspective) if the behaviorally relevant signals live\nin a compact subspace. We showed that the approximation that maximizes the utility lower bound\nconcentrates its density within this subspace.\n\n[Figure 3 panels: p(s) vs. s; q(s) vs. s (legend: Control, Muscimol); firing rate vs. neuron number.]\n\nFigure 4: Probability weighting. Simulated calibration curve for gains and losses. Perfect calibra-\ntion (i.e., linear weighting) is indicated by the dashed line.\n\nOur variational framework differs in important ways from the one recently proposed by Friston\n[4]. In his treatment, utilities are not represented explicitly at all; rather, they are implicit in the\nprobabilistic structure of the environment. Based on an evolutionary argument, Friston suggests\nthat high utility states are precisely those that have high probability, since otherwise organisms who\n\ufb01nd themselves frequently in low utility states are unlikely to survive. Thus, adopting a control\npolicy that minimizes a variational upper bound on surprise will lead to optimal behavior. However,\nadopting this control policy may lead to pathological behaviors, such as attraction to malign states\nthat have been experienced frequently (e.g., a person who has been poor her whole life should reject\na winning lottery ticket).\nIn contrast, our variational framework is motivated by quite different\nconsiderations arising from the computational constraints of the brain\u2019s architecture. 
Nonetheless,\nthese approaches have in common the idea that probabilistic beliefs will be shaped by the utility\nstructure of the environment.\nThe psychological concept of \u201cbounded rationality\u201d is an old one [24], classically associated with\nthe observation that humans sometimes adopt strategies for identifying adequate solutions rather\nthan optimal ones (\u201csatis\ufb01cing\u201d). The variational framework offers a rather different perspective on\nbounded rationality; it asserts that humans are indeed trying to \ufb01nd optimal solutions, but subject\nto certain computational resource constraints. By making explicit what these constraints are, and\nhow they interact at a neural level, our work provides a foundation upon which to develop a more\ncomplete neurobiological theory of optimal control under resource constraints.\n\n[Figure 4 axes: approximate probability vs. target probability; legend: Gains, Losses.]\n\nAcknowledgments\n\nWe thank Matt Botvinick, Matt Hoffman, Chong Wang, Nathaniel Daw and Yael Niv for helpful dis-\ncussions. SJG was supported by a Quantitative Computational Neuroscience grant from the National\nInstitutes of Health.\n\nReferences\n\n[1] C.H. Anderson and D.C. Van Essen. Neurobiological computational systems. Computational\nintelligence imitating life, pages 213\u2013222, 1994.\n\n[2] J.S. Anderson, I. Lampl, D.C. Gillespie, and D. Ferster. The contribution of noise to contrast\ninvariance of orientation tuning in cat visual cortex. Science, 290(5498):1968, 2000.\n\n[3] R.L. De Valois, E. William Yund, and N. Hepler. The orientation and direction selectivity of\ncells in macaque visual cortex. Vision Research, 22(5):531\u2013544, 1982.\n\n[4] K. Friston. The free-energy principle: a uni\ufb01ed brain theory? Nature Reviews Neuroscience,\n11(2):127\u2013138, 2010.\n\n[5] T. Furmston and D. Barber. Variational methods for reinforcement learning. 
Proceedings of\n\nthe Thirteenth Conference on Arti\ufb01cial Intelligence and Statistics (AISTATS), 2010.\n\n[6] S.J. Gershman, E. Vul, and J.B. Tenenbaum. Perceptual multistability as Markov Chain Monte\nCarlo inference. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta,\neditors, Advances in Neural Information Processing Systems 22, pages 611\u2013619. 2009.\n\n[7] P.E. Gold. Role of glucose in regulating the brain and cognition. American Journal of Clinical\n\nNutrition, 61:987S\u2013995S, 1995.\n\n[8] E.T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE,\n\n70(9):939\u2013952, 1982.\n\n[9] M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, and L.K. Saul. An introduction to variational\n\nmethods for graphical models. Machine learning, 37(2):183\u2013233, 1999.\n\n[10] W.B. Levy and R.A. Baxter. Energy ef\ufb01cient neural codes. Neural Computation, 8(3):531\u2013543,\n\n1996.\n\n[11] W.J. Ma, J.M. Beck, P.E. Latham, and A. Pouget. Bayesian inference with probabilistic popu-\n\nlation codes. Nature Neuroscience, 9(11):1432\u20131438, 2006.\n\n[12] C.K. Machens, T. Gollisch, O. Kolesnikova, and A.V.M. Herz. Testing the ef\ufb01ciency of sensory\n\ncoding with optimal stimulus ensembles. Neuron, 47(3):447\u2013456, 2005.\n\n[13] RJ McCrimmon, IJ Deary, BJP Huntly, KJ MacLeod, and BM Frier. Visual information pro-\n\ncessing during controlled hypoglycaemia in humans. Brain, 119(4):1277, 1996.\n\n[14] R.M. McPeek and E.L. Keller. De\ufb01cits in saccade target selection after inactivation of superior\n\ncolliculus. Nature neuroscience, 7(7):757\u2013763, 2004.\n\n[15] P.R. Montague and B. King-Casas. Ef\ufb01cient statistics, common currencies and the problem of\n\nreward-harvesting. Trends in cognitive sciences, 11(12):514\u2013519, 2007.\n\n[16] R.P.N. Rao. Bayesian computation in recurrent neural circuits. Neural Computation, 16(1):1\u2013\n\n38, 2004.\n\n[17] M. Sahani. 
A biologically plausible algorithm for reinforcement-shaped representational learn-\ning. Advances in Neural Information Processing, 16, 2004.\n\n[18] L.J. Savage. The Foundations of Statistics. Dover, 1972.\n\n[19] G. Sclar and RD Freeman. Orientation selectivity in the cat\u2019s striate cortex is invariant with\nstimulus contrast. Experimental Brain Research, 46(3):457\u2013461, 1982.\n\n[20] J.T. Serences. Value-based modulations in human visual cortex. Neuron, 60(6):1169\u20131181,\n2008.\n\n[21] L. Shi, N.H. Feldman, and T.L. Grif\ufb01ths. Performing Bayesian inference with exemplar mod-\nels. In Proceedings of the 30th annual conference of the cognitive science society, pages\n745\u2013750, 2008.\n\n[22] Lei Shi and Thomas Grif\ufb01ths. Neural implementation of hierarchical Bayesian inference by im-\nportance sampling. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta,\neditors, Advances in Neural Information Processing Systems 22, pages 1669\u20131677. 2009.\n\n[23] M.G. Shuler and M.F. Bear. Reward timing in the primary visual cortex. Science,\n311(5767):1606, 2006.\n\n[24] H.A. Simon. Models of Bounded Rationality. MIT Press, 1982.\n\n[25] A. Tversky and D. Kahneman. Advances in prospect theory: cumulative representation of\nuncertainty. Journal of Risk and uncertainty, 5(4):297\u2013323, 1992.\n\n[26] E. Vul, N.D. Goodman, T.L. Grif\ufb01ths, and J.B. Tenenbaum. One and done? Optimal decisions\nfrom very few samples. In Proceedings of the 31st Annual Meeting of the Cognitive Science\nSociety, Amsterdam, the Netherlands, 2009.\n\n[27] R.C. Wilson and L.H. Finkel. A neural implementation of the Kalman \ufb01lter. In Y. Bengio,\nD. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural\nInformation Processing Systems 22, pages 2062\u20132070. 2009.\n\n[28] R.S. Zemel, P. Dayan, and A. Pouget. Probabilistic interpretation of population codes. 
Neural\n\nComputation, 10(2):403\u2013430, 1998.\n\n9\n\n\f", "award": [], "sourceid": 697, "authors": [{"given_name": "Samuel", "family_name": "Gershman", "institution": null}, {"given_name": "Robert", "family_name": "Wilson", "institution": null}]}