{"title": "The rat as particle filter", "book": "Advances in Neural Information Processing Systems", "page_first": 369, "page_last": 376, "abstract": null, "full_text": "The pigeon as particle \ufb01lter\n\nNathaniel D. Daw\n\nCenter for Neural Science\n\nand Department of Psychology\n\nNew York University\ndaw@cns.nyu.edu\n\nAaron C. Courville\n\nD\u00e9partement d\u2019Informatique\net de recherche op\u00e9rationnelle\n\nUniversit\u00e9 de Montr\u00e9al\n\naaron.courville@gmail.com\n\nAbstract\n\nAlthough theorists have interpreted classical conditioning as a laboratory model\nof Bayesian belief updating, a recent reanalysis showed that the key features that\ntheoretical models capture about learning are artifacts of averaging over subjects.\nRather than learning smoothly to asymptote (re\ufb02ecting, according to Bayesian\nmodels, the gradual tradeoff from prior to posterior as data accumulate), subjects\nlearn suddenly and their predictions \ufb02uctuate perpetually. We suggest that abrupt\nand unstable learning can be modeled by assuming subjects are conducting in-\nference using sequential Monte Carlo sampling with a small number of samples\n\u2014 one, in our simulations. Ensemble behavior resembles exact Bayesian models\nsince, as in particle \ufb01lters, it averages over many samples. Further, the model is\ncapable of exhibiting sophisticated behaviors like retrospective revaluation at the\nensemble level, even given minimally sophisticated individuals that do not track\nuncertainty in their beliefs over trials.\n\n1 Introduction\n\nA central tenet of the Bayesian program is the representation of beliefs by distributions, which as-\nsign probability to each of a set of hypotheses. The prominent theoretical status accorded to such\nambiguity seems rather puzzlingly at odds with the all-or-nothing nature of our everyday perceptual\nlives. 
For instance, subjects observing ambiguous or rivalrous visual displays famously report ex-\nperiencing either percept alternately and exclusively; for even the most fervent Bayesian, it seems\nimpossible simultaneously to interpret the Necker cube as potentially facing either direction [1].\nA longstanding laboratory model for the formation of beliefs and their update in light of experience\nis Pavlovian conditioning in animals, and analogously structured prediction tasks in humans. There\nis a rich program of reinterpreting data from such experiments in terms of statistical inference [2,\n3, 4, 5, 6]. The data do appear in a number of respects to re\ufb02ect key features of the Bayesian ideal\n\u2014 speci\ufb01cally, that subjects represent beliefs as distributions with uncertainty and appropriately\nemploy it in updating them in light of new evidence. Most notable in this respect are retrospective\nrevaluation phenomena (e.g., [7]), which demonstrate that subjects are able to revise previously\nfavored beliefs in a way suggesting that they had entertained alternative hypotheses all along [6].\nHowever, the data addressed by such models are, in almost all cases, averages over large numbers of\nsubjects. This raises the question whether individuals really exhibit the sophistication attributed to\nthem, or if it instead somehow emerges from the ensemble. Recent work by Gallistel and colleagues\n[8] frames the problem particularly sharply. Whereas subject-averaged responses exhibit smooth\nlearning curves approaching asymptote (interpreted by Bayesian modelers as re\ufb02ecting the gradual\ntradeoff from prior to posterior as data accumulate), individual records exhibit neither smooth learn-\ning nor steady asymptote. Instead responding emerges abruptly and \ufb02uctuates perpetually. 
These analyses soundly refute all previous quantitative theories of learning in these tasks: both Bayesian and traditional associative learning.\n\nHere we suggest that individuals\u2019 behavior in conditioning might be understood in terms of Monte Carlo methods for sequentially sampling different hypotheses (e.g., [9]). Such a model preserves the insights of a statistical framing while accounting for the characteristics of individual records. Through the metaphor of particle filtering, it also explains why exact Bayesian reasoning is a good account of the ensemble. Finally, it addresses another common criticism of Bayesian models: that they attribute wildly intractable computations to the individual. A similar framework has also recently been used to characterize human categorization learning [10].\nTo make our point in the most extreme way, and to explore the most novel corner of the model space, we here develop as proof of concept the idea that (as with percepts in the Necker cube) subjects sample only a single hypothesis at a time. That is, we treat them as particle filters employing only one particle. We show that even given individuals of such minimal capacity, sophisticated effects like retrospective revaluation can emerge in the ensemble. Clearly intermediate models are possible, either employing more samples or mixtures of sampling and exact methods within the individual, and the insights developed here will extend to those cases. 
We therefore do not mean to defend the extreme claim that subjects never track or employ uncertainty \u2014 we think this would be highly maladaptive \u2014 but instead intend to explore the role of sampling and also point out how poor is the evidentiary record supporting more sophisticated accounts, and how great is the need for better experimental and analytical methods to test them.\n\n2 Model\n\n2.1 Conditioning as exact filtering\n\nIn conditioning experiments, a subject (say, a dog) experiences outcomes (\u201creinforcers,\u201d say, food) paired with stimuli (say, a bell). That subjects learn thereby to predict outcomes on the basis of antecedent stimuli is demonstrated by the finding that they emit anticipatory behaviors (such as salivation to the bell) which are taken directly to reflect the expectation of the outcome. Human experiments are analogously structured, but using various cover stories (such as disease diagnosis) and with subjects typically simply asked to state their beliefs about how much they expect the outcome.\nA standard statistical framing for such a problem [5], which we will adopt here, is to assume that subjects are trying to learn the conditional probability P(r | x) of (real-valued) outcomes r given (vector-valued) stimuli x. One simple generative model is to assume that each stimulus x_i (bells, lights, tones) produces reinforcement according to some unknown parameter w_i; that the contributions of multiple stimuli sum; and that the actual reward is Gaussian in the aggregate. That is, P(r | x) = N(x \u00b7 w, \u03c3_o^2), where we take the variance parameter as known. The goal of the subject is then to infer the unknown weights in order to predict reinforcement. 
If we further assume the weights w can change with time, and take that change as Gaussian diffusion,\n\nP(w_{t+1} | w_t) = N(w_t, \u03c3_d^2 I)   (1)\n\nthen we complete the well known generative model for which Bayesian inference about the weights can be accomplished using the Kalman filter algorithm [5]. Given a Gaussian prior on w_0, the posterior distribution P(w_t | x_{1..t}, r_{1..t}) also takes a Gaussian form, N(\u02c6w_t, \u03a3_t), with the mean and covariance given by the recursive Kalman filter update equations.\nReturning to conditioning, a subject\u2019s anticipatory responding to test stimulus x_t is taken to be proportional to her expectation about r_t conditional on x_t, marginalizing out uncertainty over the weights: E(r_t | x_t, \u02c6w_t, \u03a3_t) = x_t \u00b7 \u02c6w_t.\n\n2.2 Conditioning as particle filtering\n\nHere we assume instead that subjects do not maintain uncertainty in their posterior beliefs, via covariance \u03a3_t, but instead that subject L maintains a point estimate \u02dcw^L_t and treats it as true with certainty. Even given such certainty, because of diffusion intervening between t and t + 1, w^L_{t+1} will be uncertain; let us assume that she recursively samples her new point estimate \u02dcw^L_{t+1} from the posterior given this diffusion and the new observation x_{t+1}, r_{t+1}:\n\n\u02dcw^L_{t+1} \u223c P(w^L_{t+1} | w_t = \u02dcw^L_t, x_{t+1}, r_{t+1})   (2)\n\nThis is simply a Gaussian given by the standard Kalman filter equations. In particular, the mean of the sampling distribution is \u02dcw^L_t + x_{t+1} \u03ba (r_{t+1} \u2212 x_{t+1} \u00b7 \u02dcw^L_t). Here the Kalman gain \u03ba = \u03c3_d^2/(\u03c3_d^2 + \u03c3_o^2) is constant; the expected update in \u02dcw, then, is just that given by the Rescorla-Wagner [11] model.\nSuch seemingly peculiar behavior may be motivated by the observation that, assuming that the initial \u02dcw^L_0 is sampled according to the prior, this process also describes the evolution of a single sample in particle filtering by sequential importance sampling, with Equation 2 as the optimal proposal distribution [9]. (In this algorithm, particles evolve independently by sequential sampling, and do not interact except for resampling.)\nOf course, the idea of such sampling algorithms is that one can estimate the true posterior over w_t by averaging over particles. In importance sampling, the average must be weighted according to importance weights. These (here, the product of P(r_{t+1} | x_{t+1}, w_t = \u02dcw^L_t) over each t) serve to squelch the contribution of particles whose trajectories turn out to be conditionally more unlikely given subsequent observations. If subjects were to behave in accord with this model, then this would give us some insight into the ensemble average behavior, though if computed without importance reweighting, the ensemble average will appear to learn more slowly than the true posterior.\n\n2.3 Resampling and jumps\n\nOne reason why subjects might employ sampling is that, in generative models more interesting than the toy linear, Gaussian one used here, Bayesian reasoning is notoriously intractable. However, the approximation from a small number of samples (or in the extreme case considered here, one sample) would be noisy and poor. 
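The no-jump single-sample update just described can be sketched as follows. This is our illustrative reconstruction, not code from the paper; it assumes the simplest case of a single stimulus with x = 1, and the parameter values (and the name `sample_update`) are our own, chosen to match those reported for Figure 2.

```python
# A minimal sketch of the one-sample update of Equation 2 (our
# illustration, not the authors' code), for a single stimulus x = 1.
import random

SIGMA_D = 0.1   # diffusion standard deviation, sigma_d
SIGMA_O = 0.5   # observation noise standard deviation, sigma_o

def sample_update(w, r):
    """One trial: sample the new point estimate from the Kalman posterior,
    treating the current estimate w as certain."""
    var_prior = SIGMA_D ** 2                        # uncertainty from one step of diffusion only
    kappa = var_prior / (var_prior + SIGMA_O ** 2)  # constant Kalman gain
    mean = w + kappa * (r - w)                      # Rescorla-Wagner-like expected update
    var_post = var_prior * SIGMA_O ** 2 / (var_prior + SIGMA_O ** 2)
    return random.gauss(mean, var_post ** 0.5)

# One subject experiencing repeated rewarded trials (r = 1):
random.seed(0)
w = 0.0
trajectory = []
for _ in range(200):
    w = sample_update(w, 1.0)
    trajectory.append(w)
```

Because kappa is small here (about 0.04), a lone sample updates sluggishly; this is the overconfidence discussed in Section 2.3.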
As we can see by comparing the particle filter update rule of Equation 2 to the Kalman filter, because the subject-as-single-sample does not carry uncertainty from trial to trial, she is systematically overconfident in her beliefs and therefore tends to be more reluctant than optimal in updating them in light of new evidence (that is, the Kalman gain is low). This is the individual counterpart to the slowness at the ensemble level, and at the ensemble level, it can be compensated for by importance reweighting and also by resampling (for instance, standard sequential importance resampling; [12, 9]). Resampling kills off conditionally unlikely particles and keeps most samples in conditionally likely parts of the space, with similar and high importance weights. Since optimal reweighting and resampling both involve normalizing importance weights over the ensemble, they are not available to our subject-as-sample.\nHowever, there are some generative models that are more forgiving of these problems. In particular, consider Yu and Dayan\u2019s [13] diffusion-jump model, which replaces Equation 1 with\n\nP(w_{t+1} | w_t) = (1 \u2212 \u03c0) N(w_t, \u03c3_d^2 I) + \u03c0 N(0, \u03c3_j^2 I)   (3)\n\nwith \u03c3_j \u226b \u03c3_d. Here, the weights usually diffuse as before, but occasionally (with probability \u03c0) are regenerated anew. 
(We refer to these events as \u201cjumps\u201d and the previous model of Equation 1 as a \u201cno-jump\u201d model, even though, strictly speaking, diffusion is accomplished by smaller jumps.) Since optimal inference in this model is intractable (the number of modes in the posterior grows exponentially), Yu and Dayan [13] propose maintaining a simplified posterior: they make a sort of maximum likelihood determination whether a jump occurred or not; conditional on this the posterior is again Gaussian and inference proceeds as in the Kalman filter.\nIf we use Equation 3 together with the one-sample particle filtering scheme of Equation 2, then we simplify the posterior still further by not carrying over uncertainty from trial to trial, but instead only a point estimate. As before, at each step, we sample from the posterior P(w^L_{t+1} | w_t = \u02dcw^L_t, x_{t+1}, r_{t+1}) given total confidence in our previous estimate. This distribution now has two modes, one representing the posterior given that a jump occurred, the other representing the posterior given no jump.\nImportantly, we are more likely to infer a jump, and resample from scratch, if the observation r_{t+1} is far from that expected under the hypothesis of no jump, x_{t+1} \u00b7 \u02dcw^L_t. Specifically, the probability that no jump occurred (and that we therefore resample according to the posterior distribution given drift \u2014 effectively, the chance that the sample \u201csurvives\u201d as it would have in the no-jump Kalman filter) is proportional to P(r_{t+1} | x_{t+1}, w_t = \u02dcw^L_t, no jump). This is also the factor that the trial would contribute to the importance weight in the no-jump Kalman filter model of the previous section. The importance weight, in turn, is also the factor that would determine the chance that a particle would be selected during an exact resampling step [12, 9].\n\nFigure 1: Aggregate versus individual behavior in conditioning, figures adapted with permission from [8], copyright 2004 by The National Academy of Sciences of the USA. (a) Mean over subjects reveals smooth, slow acquisition curve (timebase is in sessions). (b) Individual records are noisier and with more abrupt changes (timebase is in trials). (c) Examples of fits to individual records assuming the behavior is piecewise Poisson with abrupt rate shifts.\n\nFigure 2: Simple acquisition in conditioning, simulations using particle filter models. (a) Mean behavior over samples for jump (\u03c0 = 0.075; \u03c3_j = 1; \u03c3_d = 0.1; \u03c3_o = 0.5) and no-jump (\u03c0 = 0) particle filter models of conditioning, plotted against exact Kalman filter for same parameters (and \u03c0 = 0). (b) Two examples of individual subject traces for the no-jump particle filter model. (c) Two examples of individual subject traces for the particle filter model incorporating jumps. (d) Distribution over individuals using the jump model of the \u201cdynamic interval\u201d of acquisition, that is the number of trials over which responding grows from negligible to near-asymptotic levels.\n\nThere is therefore an analogy between sampling in this model and sampling with resampling in the simpler generative model of Equation 1. Of course, this cannot exactly accomplish optimal resampling, both because the chance that a particle survives should be normalized with respect to the population, and because the distribution from which a non-surviving particle resamples should also depend on the ensemble distribution. 
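One trial of the jump-model sampler can be sketched as below, again for the single-stimulus case with x = 1. This is our illustrative reconstruction rather than code from the paper; the function name is hypothetical and the parameter values are those reported for Figure 2. The two branches are the two posterior modes described above.

```python
# One trial of the single-sample filter under the jump model of
# Equation 3 (our illustrative sketch, not the authors' code), x = 1.
import math
import random

PI_JUMP = 0.075   # jump probability, pi
SIGMA_J = 1.0     # jump (regeneration) standard deviation, sigma_j
SIGMA_D = 0.1     # diffusion standard deviation, sigma_d
SIGMA_O = 0.5     # observation noise standard deviation, sigma_o

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def jump_sample_update(w, r):
    """Sample whether a jump occurred, then sample the new point estimate
    from the corresponding mode of the two-mode posterior."""
    lik_jump = PI_JUMP * gauss_pdf(r, 0.0, SIGMA_J ** 2 + SIGMA_O ** 2)
    lik_stay = (1 - PI_JUMP) * gauss_pdf(r, w, SIGMA_D ** 2 + SIGMA_O ** 2)
    if random.random() < lik_jump / (lik_jump + lik_stay):
        # Jump mode: the weight was regenerated from N(0, sigma_j^2)
        mean = r * SIGMA_J ** 2 / (SIGMA_J ** 2 + SIGMA_O ** 2)
        var = SIGMA_J ** 2 * SIGMA_O ** 2 / (SIGMA_J ** 2 + SIGMA_O ** 2)
    else:
        # No-jump mode: the usual reluctant constant-gain update
        mean = w + (r - w) * SIGMA_D ** 2 / (SIGMA_D ** 2 + SIGMA_O ** 2)
        var = SIGMA_D ** 2 * SIGMA_O ** 2 / (SIGMA_D ** 2 + SIGMA_O ** 2)
    return random.gauss(mean, math.sqrt(var))
```

A surprising outcome makes lik_stay collapse, so the estimate is usually resampled near r (abrupt acquisition); an unsurprising one leaves the estimate almost unchanged.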
However, it has a similar qualitative effect of suppressing conditionally unlikely samples and replacing them ultimately with conditionally more likely ones.\nWe can therefore view the jumps of Equation 3 in two ways. First, they could correctly model a jumpy world; by periodically resetting itself, such a world would be relatively forgiving of the tendency for particles in sequential importance sampling to turn out conditionally unlikely. Alternatively, the jumps can be viewed as a fiction effectively encouraging a sort of resampling to improve the performance of low-sample particle filtering in the non-jumpy world of Equation 1. Whatever their interpretation, as we will show, they are critical to explaining subject behavior in conditioning.\n\n3 Acquisition\n\nIn this and the following section, we illustrate the behavior of individuals and of the ensemble in some simple conditioning tasks, comparing particle filter models with and without jumps (Equations 1 and 3).\nFigure 1 reproduces some data reanalyzed by Gallistel and colleagues [8], who quantify across a number of experiments what had long been anecdotally known about conditioning: that individual records look nothing like the averages over subjects that have been the focus of much theorizing.\nConsider the simplest possible experiment, in which a stimulus A is paired repeatedly with food. (We write this as A+.) 
Averaged learning curves slowly and smoothly climb toward asymptote\n(Figure 1a, here the anticipatory behavior measured is pigeons pecking), just as does the estimate of\nthe mean, \u02c6wA, in the Kalman \ufb01lter models.\nViewed in individual records (Figure 1b), the onset of responding is much more abrupt (often it\noccurred in a single trial), and the subsequent behavior much more variable. The apparently slow\nlearning results from the average over abrupt transitions occurring at a range of latencies. Gallistel et\nal. [8] characterized the behavior as piecewise Poisson with instantaneous rate changes (Figure 1c).\nThese results present a challenge to the bulk of models of conditioning \u2014 not just Bayesian ones, but\nalso associative learning theories like the seminal model of Rescorla & Wagner [11] ubiquitously\nproduce smooth, asymptoting learning curves of a sort that these data reveal to be essentially an\nartifact of averaging.\nOne further anomaly with Bayesian models even as accounts for the average curves is that acquisi-\ntion is absurdly slow from a normative perspective \u2014 it emerges long after subjects using reasonable\npriors would be highly certain to expect reward. This was pointed out by Kakade and Dayan [5],\nwho also suggested an account for why the slow acquisition might actually be normative due to\nunaccounted priors caused by pretraining procedures known as hopper training. However, Balsam\nand colleagues later found that manipulating the hopper pretraining did not speed learning [14].\nFigure 2 illustrates individual and group behavior for the two particle \ufb01lter models. As expected,\nat the ensemble level (Figure 2a), particle \ufb01ltering without jumps learns slowly, when averaged\nwithout importance weighting or resampling and compared to the optimal Kalman \ufb01lter for the\nsame parameters. 
As shown, the inclusion of jumps can speed this up.\nIn individual traces using the jumps model (Figure 2c) frequent sampled jumps both at and after\nacquisition of responding capture the key qualitative features of the individual records: the abrupt\nonset and ongoing instability. The inclusion of jumps in the generative model is key to this account:\nas shown in Figure 2b, without these, behavior changes more smoothly. In the jump model, when\na jump is sampled, the posterior distribution conditional on the jump having occurred is centered\nnear the observed rt, meaning that the sampled weight will most likely arrive immediately near its\nasymptotic level. Figure 2d shows that such an abrupt onset of responding is the modal behavior of\nindividuals. Here (after [8]), we have \ufb01t each individual run from the jump-model simulations with\na sigmoidal Weibull function, and de\ufb01ned the \u201cdynamic interval\u201d over which acquisition occurs\nas the number of trials during which this \ufb01t function rises from 10% to 90% of its asymptotic\nlevel. Of course, the monotonic Weibull curve is not a great characterization of the individual\u2019s\nnoisy predictions, and this mismatch accounts for the long tail of the distribution. Nevertheless, the\ncumulative distribution from our simulations closely matches the proportions of animals reported as\nachieving various dynamic intervals when the same analysis was performed on the pigeon data [8].\nThese simulations demonstrate, \ufb01rst, how sequential sampling using a very low number of samples\nis a good model of the puzzling features of individual behavior in acquisition, and at the same\ntime clarify why subject-averaged records resemble the results of exact inference. Depending on\nthe presumed frequency of jumps (which help to compensate for this problem) the fact that these\naverages are of course computed without importance weighting may also help to explain the apparent\nslowness of acquisition. 
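The acquisition simulations can be approximated with a short ensemble simulation. This is our sketch (not the authors' code), using the single-stimulus simplification x = 1 and the parameter values reported for Figure 2; individuals jump abruptly to near-asymptotic responding at varying latencies, while the ensemble average climbs smoothly.

```python
# Our illustrative ensemble simulation (not the authors' code): many
# one-particle "subjects" experience repeated A+ trials under the jump model.
import math
import random

PI_JUMP, SIGMA_J, SIGMA_D, SIGMA_O = 0.075, 1.0, 0.1, 0.5

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def step(w, r):
    """One trial: infer jump vs. no jump, then resample the point estimate."""
    lik_jump = PI_JUMP * gauss_pdf(r, 0.0, SIGMA_J ** 2 + SIGMA_O ** 2)
    lik_stay = (1 - PI_JUMP) * gauss_pdf(r, w, SIGMA_D ** 2 + SIGMA_O ** 2)
    if random.random() < lik_jump / (lik_jump + lik_stay):
        mean = r * SIGMA_J ** 2 / (SIGMA_J ** 2 + SIGMA_O ** 2)       # jump mode
        var = SIGMA_J ** 2 * SIGMA_O ** 2 / (SIGMA_J ** 2 + SIGMA_O ** 2)
    else:
        mean = w + (r - w) * SIGMA_D ** 2 / (SIGMA_D ** 2 + SIGMA_O ** 2)  # drift mode
        var = SIGMA_D ** 2 * SIGMA_O ** 2 / (SIGMA_D ** 2 + SIGMA_O ** 2)
    return random.gauss(mean, math.sqrt(var))

def acquisition_curves(n_subjects=500, n_trials=100):
    curves = []
    for _ in range(n_subjects):
        w, traj = 0.0, []
        for _ in range(n_trials):
            w = step(w, 1.0)   # every trial is A+ (reward r = 1)
            traj.append(w)
        curves.append(traj)
    return curves

random.seed(0)
curves = acquisition_curves()
avg = [sum(c[t] for c in curves) / len(curves) for t in range(100)]
```

Plotting avg against a few rows of curves reproduces the qualitative contrast between the panels of Figure 2: a smooth ensemble curve built out of abrupt, unstable individual transitions.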
This could be true regardless of whether other factors, such as those posited by Kakade and Dayan [5], also contribute.\n\n4 Retrospective revaluation\n\nSo far, we have shown that sequential sampling provides a good qualitative characterization of individual behavior in the simplest conditioning experiments. But the best support for sophisticated Bayesian models of learning comes from more demanding tasks such as retrospective revaluation. These tasks give the best indication that subjects maintain something more than a point estimate of the weights, and instead strongly suggest that they maintain a full joint distribution over them. However, as we will show here, this effect can actually emerge due to covariance information being implicitly represented in the ensemble of beliefs over subjects, even if all the individuals are one-particle samplers.\n\nFigure 3: Simulations of backward blocking effect, using exact Kalman filter (a) and particle filter model with jumps (b). Left, middle: Joint distributions over wA and wB following first-phase AB+ training (left) and second phase B+ training (middle). For the particle filter, these are derived from the histogram of individual particles\u2019 joint point beliefs about the weights. Right: Mean beliefs about wA and wB, showing development of backward blocking. Parameters as in Figure 2.\n\nRetrospective revaluation refers to how the interpretation of previous experience can be changed by subsequent experience. A typical task, called backward blocking [7], has two phases. First, two stimuli, A and B, are paired with each other and reward (AB+), so that both develop a moderate level of responding. In the second phase, B alone is paired with reward (B+), and then the prediction to A alone is probed. 
The typical finding is that responding to A is attenuated; the intuition is that the B+ trials suggested that B alone was responsible for the reward received in the AB+ trials, so the association of A with reward is retrospectively discounted. Such retrospective revaluation phenomena are hard to demonstrate in animals (though see [15]) but robust in humans [7].\nKakade and Dayan [6] gave a more formal analysis of the task in terms of the Kalman filter model. In particular they point out that conditional on the initial AB+ trials, the model will infer an anticorrelated joint distribution over wA and wB \u2014 i.e., that they together add up to about one. This is represented in the covariance \u03a3; the joint distribution is illustrated in Figure 3a (left). Subsequent B+ training indicates that wB is high, which means, given its posterior anticorrelation with wA, that the latter is likely low. Note that this explanation seems to turn crucially on the representation of the full joint distribution over the weights, rather than just a point estimate.\nContrary to this intuition, Figure 3b demonstrates the same thing in the particle filter model with jumps. At the end of AB+ training, the subjects as an ensemble represent the anticorrelated joint distribution over the weights, even though each individual maintains only a particular point belief. Moreover, B+ training causes an aggregate backward blocking effect. This is because individuals who believe that wA is high tend also to believe that wB is low, which makes them most likely to sample that a jump has occurred during subsequent B+ training. The samples most likely to stay in place already have \u02dcwA low and \u02dcwB high; beliefs about wA are, on average, thereby reduced, producing the backward blocking effect in the ensemble.\nNote that this effect depends on the subjects sampling using a generative model that admits of jumps (Equation 3). 
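The ensemble backward-blocking account can be checked with a short simulation. The following is our sketch under stated assumptions, not the authors' code: two-dimensional weights, AB+ encoded as x = [1, 1] and B+ as x = [0, 1], and parameter values as reported for Figure 2.

```python
# Our illustrative sketch (not the authors' code) of the ensemble
# backward-blocking simulation with two-dimensional weights.
import math
import random

PI_JUMP, SIGMA_J, SIGMA_D, SIGMA_O = 0.075, 1.0, 0.1, 0.5

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(prior_mean, prior_var, x, r):
    """Gaussian posterior over w given r = x . w + noise and an isotropic
    prior N(prior_mean, prior_var * I). Returns (mean, covariance)."""
    s = prior_var * (x[0] ** 2 + x[1] ** 2) + SIGMA_O ** 2
    err = r - (x[0] * prior_mean[0] + x[1] * prior_mean[1])
    mean = [prior_mean[i] + prior_var * x[i] * err / s for i in range(2)]
    cov = [[(prior_var if i == j else 0.0) - prior_var ** 2 * x[i] * x[j] / s
            for j in range(2)] for i in range(2)]
    return mean, cov

def mvn_sample(mean, cov):
    # 2-D Gaussian sample via a hand-rolled Cholesky factor
    l11 = math.sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = math.sqrt(max(cov[1][1] - l21 ** 2, 0.0))
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return [mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2]

def step(w, x, r):
    norm2 = x[0] ** 2 + x[1] ** 2
    pred = x[0] * w[0] + x[1] * w[1]
    lik_jump = PI_JUMP * gauss_pdf(r, 0.0, SIGMA_J ** 2 * norm2 + SIGMA_O ** 2)
    lik_stay = (1 - PI_JUMP) * gauss_pdf(r, pred, SIGMA_D ** 2 * norm2 + SIGMA_O ** 2)
    if random.random() < lik_jump / (lik_jump + lik_stay):
        mean, cov = posterior([0.0, 0.0], SIGMA_J ** 2, x, r)   # jump: start over
    else:
        mean, cov = posterior(w, SIGMA_D ** 2, x, r)            # drift: small update
    return mvn_sample(mean, cov)

random.seed(0)
subjects = [[0.0, 0.0] for _ in range(1000)]
for _ in range(20):                                   # phase 1: AB+
    subjects = [step(w, [1.0, 1.0], 1.0) for w in subjects]
phase1 = [list(w) for w in subjects]                  # snapshot after AB+
for _ in range(20):                                   # phase 2: B+
    subjects = [step(w, [0.0, 1.0], 1.0) for w in subjects]

wA_before = sum(w[0] for w in phase1) / len(phase1)
wA_after = sum(w[0] for w in subjects) / len(subjects)
```

On such runs the ensemble mean belief about wA should drop during B+ training even though no individual ever represents covariance, and the phase-1 snapshot should show the anticorrelated cloud of Figure 3b.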
Although the population implicitly represents the posterior covariance between wA and wB even using the diffusion model with no jumps (Equation 1; simulations not illustrated), subsequent B+ training has no tendency to suppress the relevant part of the posterior, and no backward blocking effect is seen. Again, this traces to the lack of a mechanism for downweighting samples that turn out to be conditionally unlikely.\n\n5 Discussion\n\nWe have suggested that individual subjects in conditioning experiments behave as though they are sequentially sampling hypotheses about the underlying weights: like particle filters using a single sample. This model reproduces key and hitherto theoretically troubling features of individual records, and also, rather more surprisingly, has the ability to reproduce more sophisticated behaviors that had previously been thought to demonstrate that subjects represented distributions in a fully Bayesian fashion. One practical problem with particle filtering using a single sample is the lack of distributional information to allow resampling or reweighting; we have shown that use of a particular generative model previously proposed by Yu and Dayan [13] (involving sudden shocks that effectively accomplish resampling) helps to compensate qualitatively if not quantitatively for this failing. This mechanism is key to all of our results.\nThe present work echoes and formalizes a long history of ideas in psychology about hypothesis testing and sudden insight in learning, going back to Thorndike\u2019s puzzle boxes. 
It also complements\na recent model of human categorization learning [10], which used particle \ufb01lters to sample (sparsely\nor even with a single sample) over possible clusterings of stimuli. That work concentrated on trial\nordering effects arising from the sparsely represented posterior (see also [16]); here we concentrate\non a different set of phenomena related to individual versus ensemble behavior.\nGallistel and colleagues\u2019 [8] demonstration that individual learning curves exhibit none of the fea-\ntures of the ensemble average curves that had previously been modeled poses rather a serious chal-\nlenge for theorists: After all, what does it mean to model only the ensemble? Surely the individual\nsubject is the appropriate focus of theory \u2014 particularly given the evolutionary rationale often ad-\nvanced for Bayesian modeling, that individuals who behave rationally will have higher \ufb01tness. The\npresent work aims to refocus theorizing on the individual, while at the same time clarifying why the\nensemble may be of interest. (At the group level, there may also be a \ufb01tness advantage to spreading\ndifferent beliefs \u2014 say, about productive foraging locations \u2014 across subjects rather than having\nthe entire population gravitate toward the \u201cbest\u201d belief. This is similar to the phenomenon of mixed\nstrategy equilibrium in multiplayer games, and may provide an additional motivation for sampling.)\nPrevious models fail to predict any intersubject variability because they incorporate no variation\nin either the subjects\u2019 beliefs or in their responses given their beliefs. We have suggested that the\nstructure in response timeseries suggests a prominent role for intersubject variability in the beliefs,\ndue to sampling. There is surely also noise in the responding, which we do not model, but for\nthis alone to rescue previous models, one would have to devise some other explanation for the\nnoise\u2019s structure. 
(For instance, if learning is monotonic, simple IID output noise would not predict sustained excursions away from asymptote as in Fig 1c.) Similarly, nonlinearity in the performance function relating beliefs to response rates might help to account for the sudden onset of responding even if learning is smooth, but would not address the other features of the data.\nIn addition to addressing the empirical problem of fit to the individual, sampling also answers an additional problem with Bayesian models: that they attribute to subjects the capacity for radically intractable calculations. While the simple Kalman filter used here is tractable, there has been a trend in modeling human and animal learning toward assuming subjects perform inference about model structure (e.g., recovering structural variables describing how different latent causes interact to produce observations; [4, 3, 2]). Such inference cannot be accomplished exactly using simple recursive filtering like the Kalman filter. Indeed, it is hard to imagine any approach other than sequentially sampling one or a small number of hypothetical model structures, since even with the structure known, there typically remains a difficult parametric inference problem. The present modeling is therefore motivated, in part, toward this setting.\nWhile in our model, subjects do not explicitly carry uncertainty about their beliefs from trial to trial, they do maintain hyperparameters (controlling the speed of diffusion, the noise of observations, and the probability of jumps) that serve as a sort of constant proxy for uncertainty. We might expect them to adjust these so as to achieve the best performance; because the inference is anyway approximate, the veridical, generative settings of these parameters will not necessarily perform the best.\nOf course, the present model is only the simplest possible sketch, and there is much work to do in developing it. 
In particular, it would be useful to develop less extreme models in which subjects either rely on sampling with more particles, or on some combination of sampling and exact inference. We posit that many of the insights developed here will extend to such models, which seem more realistic since exclusive use of low-sample particle filtering would be extremely brittle and unreliable. (The example of the Necker cube also invites consideration of Markov Chain Monte Carlo sampling for exploration of multimodal posteriors even in nonsequential inference [1] \u2014 such methods are clearly complementary.) However, there is very little information available about individual-level behavior to constrain the details of approximate inference. The present results on backward blocking stress again the perils of averaging and suggest that data must be analyzed much more delicately if they are ever to bear on issues of distributions and uncertainty. In the case of backward blocking, if our account is correct, there should be a correlation, over individuals, between the degree to which they initially exhibited a low \u02dcwB and the degree to which they subsequently exhibited a backward blocking effect. This would be straightforward to test. More generally, there has been a recent trend [17] toward comparing models against raw trial-by-trial data sets according to the cumulative log-likelihood of the data. Although this measure aggregates over trials and subjects, it measures the average goodness of fit, not the goodness of fit to the average, making it much more sensitive for purposes of studying the issues discussed in this article.\n\nReferences\n\n[1] P Schrater and R Sundareswara. Theory and dynamics of perceptual bistability. In NIPS 19, 2006.\n\n[2] TL Griffiths and JB Tenenbaum. Structure and strength in causal induction. Cognit Psychol, 51:334\u2013384, 2005.\n\n[3] AC Courville, ND Daw, and DS Touretzky. 
Similarity and discrimination in classical conditioning: A latent variable account. In NIPS 17, 2004.\n\n[4] AC Courville, ND Daw, GJ Gordon, and DS Touretzky. Model uncertainty in classical conditioning. In NIPS 16, 2003.\n\n[5] S Kakade and P Dayan. Acquisition and extinction in autoshaping. Psychol Rev, 109:533\u2013544, 2002.\n\n[6] S Kakade and P Dayan. Explaining away in weight space. In NIPS 13, 2001.\n\n[7] DR Shanks. Forward and backward blocking in human contingency judgement. Q J Exp Psychol B, 37:1\u201321, 1985.\n\n[8] CR Gallistel, S Fairhurst, and P Balsam. The learning curve: Implications of a quantitative analysis. Proc Natl Acad Sci USA, 101:13124\u201313131, 2004.\n\n[9] A Doucet, S Godsill, and C Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Stat Comput, 10:197\u2013208, 2000.\n\n[10] AN Sanborn, TL Griffiths, and DJ Navarro. A more rational model of categorization. In CogSci 28, 2006.\n\n[11] RA Rescorla and AR Wagner. A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement. In AH Black and WF Prokasy, editors, Classical Conditioning, 2: Current Research and Theory, pages 64\u201369. 1972.\n\n[12] DB Rubin. Using the SIR algorithm to simulate posterior distributions. In JM Bernardo, MH DeGroot, DV Lindley, and AFM Smith, editors, Bayesian Statistics, Vol. 3, pages 395\u2013402. 1988.\n\n[13] AJ Yu and P Dayan. Expected and unexpected uncertainty: ACh and NE in the neocortex. In NIPS 15, 2003.\n\n[14] PD Balsam, S Fairhurst, and CR Gallistel. Pavlovian contingencies and temporal information. J Exp Psychol Anim Behav Process, 32:284\u2013295, 2006.\n\n[15] RR Miller and H Matute. Biological significance in forward and backward blocking: Resolution of a discrepancy between animal conditioning and human causal judgment. J Exp Psychol Gen, 125:370\u2013386, 1996.\n\n[16] ND Daw, AC Courville, and P Dayan. 
Semi-rational models of cognition: The case of trial order. In N Chater and M Oaksford, editors, The Probabilistic Mind. 2008. (in press).\n\n[17] ND Daw and K Doya. The computational neurobiology of learning and reward. Curr Opin Neurobiol, 16:199\u2013204, 2006.", "award": [], "sourceid": 1002, "authors": [{"given_name": "Aaron", "family_name": "Courville", "institution": null}, {"given_name": "Nathaniel", "family_name": "Daw", "institution": null}]}