{"title": "Model Uncertainty in Classical Conditioning", "book": "Advances in Neural Information Processing Systems", "page_first": 977, "page_last": 984, "abstract": "", "full_text": "Model Uncertainty in Classical Conditioning\n\nA. C. Courville*1,3, N. D. Daw2,3, G. J. Gordon4, and D. S. Touretzky2,3\n\n1Robotics Institute, 2Computer Science Department,\n\n3Center for the Neural Basis of Cognition,\n\n4Center for Automated Learning and Discovery\nCarnegie Mellon University, Pittsburgh, PA 15213\n{aaronc,daw,ggordon,dst}@cs.cmu.edu\n\nAbstract\n\nWe develop a framework based on Bayesian model averaging to explain how animals cope with uncertainty about contingencies in classical conditioning experiments. Traditional accounts of conditioning fit parameters within a fixed generative model of reinforcer delivery; uncertainty over the model structure is not considered. We apply the theory to explain the puzzling relationship between second-order conditioning and conditioned inhibition, two similar conditioning regimes that nonetheless result in strongly divergent behavioral outcomes. According to the theory, second-order conditioning results when limited experience leads animals to prefer a simpler world model that produces spurious correlations; conditioned inhibition results when a more complex model is justified by additional experience.\n\n1 Introduction\n\nMost theories of classical conditioning, exemplified by the classic model of Rescorla and Wagner [7], are wholly concerned with parameter learning. They assume a fixed (often implicit) generative model m of reinforcer delivery and treat conditioning as a process of estimating values for the parameters wm of that model. Typically, these parameters represent the rates of reinforcers delivered in the presence of various stimuli.
Using the model and the parameters, the probability of reinforcer delivery can be estimated; such estimates are assumed to give rise to conditioned responses in behavioral experiments. More overtly statistical theories have treated uncertainty in the parameter estimates, which can influence predictions and learning [4].\n\nIn realistic situations, the underlying contingencies of the environment are complex and unobservable, and it can thus make sense to view the model m as itself uncertain and subject to learning, though (to our knowledge) no explicitly statistical theories of conditioning have yet done so. Under the standard Bayesian approach, such uncertainty can be treated analogously to parameter uncertainty, by representing knowledge about m as a distribution over a set of possible models, conditioned on evidence. Here we advance this idea as a high-level computational framework for the role of model learning in classical conditioning. We do not concentrate on how the brain might implement these processes, but rather explore the behavior that a system approximating Bayesian reasoning should exhibit. This work establishes a relationship between theories of animal learning and a recent line of theory by Tenenbaum and collaborators, which uses similar ideas about Bayesian model learning to explain human causal reasoning [9].\n\nWe have applied our theory to a variety of standard results in animal conditioning, including acquisition, negative and positive patterning, and forward and backward blocking. Here we present one of the most interesting and novel applications, an explanation of a rather mysterious classical conditioning phenomenon in which opposite predictions about the likelihood of reinforcement can arise from different amounts of otherwise identical experience [11]. The opposing effects, both well known, are called second-order conditioning and conditioned inhibition.
The theory explains the phenomenon as resulting from a tradeoff between evidence and model complexity.\n\n2 A Model of Classical Conditioning\n\nIn a conditioning trial, a set of conditioned stimuli CS ≡ {A, B, ...} is presented, potentially accompanied by an unconditioned stimulus or reinforcement signal, US. We represent the jth stimulus with a binary random variable yj such that yj = 1 when the stimulus is present. Here the index j, 1 ≤ j ≤ s, ranges over both the (s − 1) conditioned stimuli and the unconditioned stimulus. The collection of trials within an experimental protocol constitutes a training data set, D = {yjt}, indexed by stimulus j and trial t, 1 ≤ t ≤ T.\n\nWe take the perspective that animals are attempting to recover the generative process underlying the observed stimuli. We claim they assert the existence of latent causes, represented by the binary variables xi ∈ {0, 1}, responsible for evoking the observed stimuli. The relationship between the latent causes and observed stimuli is encoded with a sigmoid belief network. This particular class of models is not essential to our conclusions; many model classes should result in similar behavior.\n\nSigmoid Belief Networks  In sigmoid belief networks, local conditional probabilities are defined as functions of weighted sums of parent nodes. Using our notation,\n\nP(yj = 1 | x1, ..., xc, wm, m) = (1 + exp(−Σ_i wij xi − wyj))^{−1},   (1)\n\nand P(yj = 0 | x1, ..., xc, wm, m) = 1 − P(yj = 1 | x1, ..., xc, wm, m). The weight, wij, represents the influence of the parent node xi on the child node yj. The bias term wyj encodes the probability of yj in the absence of all parent nodes.
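Equation 1 is simply a logistic function of the weighted input from the active latent causes. A minimal sketch in Python; the weight and bias values in the usage lines are illustrative choices of ours, not fitted values:

```python
import math

def p_stimulus(x, w, w_bias):
    """Equation 1: P(y_j = 1 | x, w_m, m) for one stimulus node of a
    sigmoid belief network, a logistic function of the weighted sum of
    active latent causes plus the stimulus bias term."""
    activation = sum(w_i * x_i for w_i, x_i in zip(w, x)) + w_bias
    return 1.0 / (1.0 + math.exp(-activation))

# One latent cause with a strong positive weight to the stimulus and a
# strongly negative stimulus bias: the stimulus is likely only when the
# cause is active.
print(round(p_stimulus([1], [15.0], -13.0), 3))  # 0.881
print(p_stimulus([0], [15.0], -13.0) < 1e-5)     # True
```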
The parameter vector wm contains all model parameters for model structure m.\n\nThe form of the sigmoid belief networks we consider is represented as a directed graphical model in Figure 1a, with the latent causes as parents of the observed stimuli. The latent causes encode the intratrial correlations between stimuli; we do not model the temporal structure of events within a trial. Conditioned on the latent causes, the stimuli are mutually independent. We can express the conditional joint probability of the observed stimuli as Π_{j=1}^{s} P(yj | x1, ..., xc, wm, m).\n\nSimilarly, we assume that trials are drawn from a stationary process. We do not consider trial order effects, and we assume all trials are mutually independent. (Because of these simplifying assumptions, the present model cannot address a number of phenomena such as the difference between latent inhibition, partial reinforcement, and extinction.) The resulting likelihood function of the training data, with latent causes marginalized, is:\n\nP(D | wm, m) = Π_{t=1}^{T} Σ_x Π_{j=1}^{s} P(yjt | x, wm, m) P(x | wm, m),   (2)\n\n[Figure 1 appears here: (a) a directed graphical model with latent causes x1, x2 (biases wx1, wx2) as parents of the stimuli A, B, US (biases wy1, wy2, wys), connected by link weights wij; (b) a sketch of the marginal likelihood p(D | m) for a simple and a complicated model, showing the regions of data space where each wins.]\n\nFigure 1: (a) An example from the proposed set of models. Conditional dependencies are depicted as links between the latent causes (x1, x2) and the observed stimuli (A, B, US) during a trial.
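Because the model set allows at most a handful of latent causes, the marginalization in Equation 2 can be carried out exactly by enumerating all 2^c joint settings of the causes. A sketch, with all weights passed in explicitly; the numbers in the usage line are toy values of ours, not fitted parameters:

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def trial_likelihood(y, W, w_y, w_x):
    """P(y_t | w_m, m): the inner part of Equation 2, marginalizing the
    binary latent causes x out of a single trial's stimulus vector y.
    W[i][j] is the weight from cause i to stimulus j, w_y the stimulus
    biases, w_x the latent cause biases."""
    c, s = len(w_x), len(w_y)
    total = 0.0
    for x in itertools.product([0, 1], repeat=c):
        # P(x | w_m, m): causes are independent Bernoulli(sigmoid(w_x[i]))
        p_x = math.prod(sigmoid(w_x[i]) if x[i] else 1.0 - sigmoid(w_x[i])
                        for i in range(c))
        # Conditioned on x, the stimuli are mutually independent (Equation 1)
        p_y = 1.0
        for j in range(s):
            p1 = sigmoid(sum(W[i][j] * x[i] for i in range(c)) + w_y[j])
            p_y *= p1 if y[j] else 1.0 - p1
        total += p_x * p_y
    return total

# One cause, one stimulus: with an unbiased cause, the stimulus appears
# roughly half the time (whenever its cause happens to be active).
print(round(trial_likelihood([1], [[15.0]], [-13.0], [0.0]), 2))  # 0.44
```

The full likelihood of a data set is then the product of `trial_likelihood` over the T independent trials, as in the outer product of Equation 2.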
(b) Marginal likelihood of the data, D, for a simple model and a more complicated model (after MacKay [5]).\n\nwhere the sum is over all combinations of values of x = [x1, ..., xc] and P(x | wm, m) = Π_{i=1}^{c} (1 + exp((−1)^{xi} wxi))^{−1}.\n\nSigmoid belief networks have a number of appealing properties for modeling conditioning. First, the sigmoid belief network is capable of compactly representing correlations between groups of observable stimuli. Without a latent cause, the number of parameters required to represent these correlations would scale exponentially with the number of stimuli. Second, the parent nodes, interacting additively, constitute a factored representation of state. This is advantageous as it permits generalization to novel combinations of factors. Such additivity has frequently been observed in conditioning experiments [7].\n\n2.1 Prediction under Parameter Uncertainty\n\nConsider a particular network structure, m, with parameters wm. Given m and a set of trials, D, the uncertainty associated with the choice of parameters is represented in a posterior distribution over wm. This posterior is given by Bayes' rule, p(wm | D, m) ∝ P(D | wm, m) p(wm | m), where P(D | wm, m) is from Equation 2 and p(wm | m) is the prior distribution over the parameters of m. We assume the model parameters are a priori independent: p(wm | m) = Π_{ij} p(wij) Π_i p(wxi) Π_j p(wyj), with Gaussian priors for weights p(wij) = N(0, 3), latent cause biases p(wxi) = N(0, 3), and stimulus biases p(wyj) = N(−15, 1), the latter reflecting an assumption that stimuli are rare in the absence of causes.\n\nIn conditioning, the test trial measures the conditioned response (CR). This is taken to be a measure of the animal's estimate of the probability of reinforcement conditioned on the present conditioned stimuli CS.
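A single draw from these priors already encodes the "stimuli are rare without causes" assumption. A small illustration; we read N(μ, v) as mean and variance, which the notation above leaves ambiguous:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
# One draw from the independent Gaussian priors of Section 2.1
# (interpreting the second argument of N as a variance - an assumption).
w_link = random.gauss(0.0, math.sqrt(3.0))   # cause-to-stimulus weight
w_cause = random.gauss(0.0, math.sqrt(3.0))  # latent cause bias
w_stim = random.gauss(-15.0, 1.0)            # stimulus bias

# With a bias near -15, a stimulus essentially never fires on its own:
# sigmoid(-15) is about 3e-7, so some latent cause must drive it.
print(sigmoid(w_stim) < 1e-4)  # True
```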
This probability is also conditioned on the absence of the remaining stimuli; however, in the interest of clarity, our notation suppresses these absent stimuli. In the Bayesian framework, given m, this probability, P(US | CS, m, D), is determined by integrating over all values of the parameters weighted by their posterior probability density,\n\nP(US | CS, m, D) = ∫ P(US | CS, wm, m, D) p(wm | m, D) dwm   (3)\n\n2.2 Prediction under Model Uncertainty\n\nIn the face of uncertainty about which is the correct model of contingencies in the world (for instance, whether a reinforcer is independent of a tone stimulus) a standard Bayesian approach is to marginalize out the influence of the model choice,\n\nP(US | CS, D) = Σ_m P(US | CS, m, D) P(m | D)   (4)\n              = Σ_m ∫ P(US | CS, wm, m, D) p(wm | m, D) P(m | D) dwm\n\nThe posterior over models, P(m | D), is given by:\n\nP(m | D) = P(D | m) P(m) / Σ_{m'} P(D | m') P(m'),   where P(D | m) = ∫ P(D | wm, m) p(wm | m) dwm\n\nThe marginal likelihood P(D | m) is the probability of the data under model m, marginalizing out the model parameters. The marginal likelihood famously confers an automatic Occam's razor effect on the average of Equation 4. Under complex models, parameters can be found to boost the probability of particular data sets that would be unlikely under simpler models, but any particular parameter choice is also less likely in more complex models. Thus there is a tradeoff between model fidelity and complexity (Figure 1b).\n\nWe also encode a further preference for simpler models through the prior over model structure, which we factor as P(m) = P(c) Π_{i=1}^{c} P(li), where c is the number of latent causes and li is the number of directed links emanating from xi. The priors over c and li are in turn given by\n\nP(c) = 10^{−3c} / Σ_{c'=0}^{5} 10^{−3c'} if 0 ≤ c ≤ 5, and 0 otherwise;   P(li) = 10^{−3li} / Σ_{li'=0}^{s} 10^{−3li'} if 0 ≤ li ≤ 4, and 0 otherwise.\n\nIn the Bayesian model average, we consider the set of sigmoid belief networks with a maximum of 4 stimuli and 5 latent causes.\n\nThis strong prior over model structures is required in addition to the automatic Occam's razor effect in order to explain the animal behaviors we consider. This is probably due to the extreme abstraction of our setting. With generative models that included, e.g., temporal ordering effects and multiple perceptual dimensions, model shifts equivalent to the addition of a single latent variable in our setting would introduce a great deal of additional model complexity and require proportionally more evidential justification.\n\n2.3 Monte Carlo Integration\n\nIn order to determine the predictive probability of reinforcement, Bayesian model averaging requires that we evaluate Equation 4. Unfortunately, the integral is not amenable to analytic solution. Hence we approximate the integral with a sum over samples from the posterior p(wm, m | D). Acquiring samples is complicated by the need to sample over parameter spaces of different dimensions. In the simulations reported here, we solved this problem and obtained samples using a reversible jump Markov chain Monte Carlo (MCMC) method [2]. A new sample in the chain is obtained by proposing perturbations to the current sample's model structure or parameters.1 Jumps include the addition or removal of links or latent causes, or updates to the stimulus biases or weights.
To improve mixing over the different modes of the target distribution, we used exchange MCMC, which enables fast mixing between modes through the coupling of parallel Markov chains [3].\n\n1The proposal acceptance probability satisfies detailed balance for each type of jump.\n\nGroup    A-US  A-X  B-US  Test → Result  Test → Result\nNo-X       96    0     8  X → −          XB → CR\nFew-X      96    4     8  X → CR         XB → CR\nMany-X     96   48     8  X → −          XB → −\n\nTable 1: A summary of some of the experiments of Yin et al. [11]. The US was a footshock; A = white noise or buzzer sound; X = tone; B = click train.\n\n3 Second-Order Conditioning and Conditioned Inhibition\n\nWe use the model to shed light on the relationship between two classical conditioning phenomena, second-order conditioning and conditioned inhibition. The procedures for establishing a second-order excitor and a conditioned inhibitor are similar, yet the results are drastically different. Both procedures involve two kinds of trials: a conditioned stimulus A is presented with the US (A-US); and A is also presented with a target conditioned stimulus X in unreinforced trials (A-X). In second-order conditioning, X becomes an excitor: it is associated with increased probability of reinforcement, demonstrated by conditioned responding. But in conditioned inhibition, X becomes an inhibitor, i.e., associated with decreased probability of reinforcement. Inhibition is probed with two tests: a transfer test, in which the inhibitor is paired with a second excitor B and shown to reduce conditioned responding, and a retardation test, in which the time course of response development under subsequent excitatory X-US training is retarded relative to naive animals.\n\nYin et al. [11] explored the dimensions of these two procedures in an effort to distill the essential requirements for each.
Under previous theories [8], it might have seemed that the crucial distinction between second-order conditioning and conditioned inhibition had to do with either blocked versus interspersed trials, or with sequential versus simultaneous presentation of the CSes. However, they found that using only interspersed trials and simultaneous presentation of the conditioned stimuli, they were able to shift from second-order conditioning to conditioned inhibition simply by increasing the number of A-X pairings.2 Table 1 summarizes the relevant details of the experiment.\n\nFrom a theoretical perspective, these results present a challenge for models of conditioning. Why do animals so drastically change their behavior regarding X given only more of the same kind of A-X experience? Bayesian model averaging offers some insight.\n\nWe simulated the experiments of Yin et al., matching their numbers for each type of trial, as shown in Table 1. Results of the MCMC approximation of the Bayesian model average integration are shown in Figure 2. All MCMC runs were at least 5 × 10^6 iterations long, excluding a burn-in of 1 × 10^6 iterations. The sequences were subsampled to 2.5 × 10^4 samples.\n\nIn Figure 2a, we see that P(US | X, D) reveals significant second-order conditioning with few A-X trials. With more trials the predicted probability of reinforcement quickly decreases. These results are consistent with the findings of Yin et al., as shown in Table 1. With few A-X trials there are insufficient data to justify a complicated model that accurately fits the data. Due to the automatic Occam's razor and the prior preference for simple models, high posterior density is inferred for the simple model of Figure 3a.
This model combines the stimuli from all trial types and attributes them to a single latent cause. When X is tested alone, its connection to the US through the latent cause results in a large P(US | X, D).\n\n2In other conditions, trial ordering was shown to have an additional effect; this is outside the scope of the present theory due to our stationarity assumptions.\n\n[Figure 2 appears here: three panels plotting predicted reinforcement probabilities against the number of A-X trials.]\n\nFigure 2: A summary of the simulation results. Error bars indicate the 3σ margin in the standard error of the estimate (we omit very small error bars). (a) P(US | X, D) and P(US | A, D) as a function of A-X trials. For few trials (2 to 8), P(US | X, D) is high, indicative of second-order conditioning. (b) P(US | X, B, D) and P(US | B, D) as a function of the number of A-X trials. After 10 trials, X is able to significantly reduce the predicted probability of reinforcement generated by the presentation of B. (c) Results of a retardation test. With many A-X trials, acquisition of an excitatory association to X is retarded.\n\nWith more training trials, the preference for simpler models is more successfully offset, and more complicated models, capable of describing the data more accurately, are given greater posterior density (Figure 3c). An example of such a model is shown in Figure 3b. In the model, X is made a conditioned inhibitor by a negative-valued weight between x2 and X.
In testing X with a transfer excitor B, as shown in Figure 2, this weight acts to cancel a positive correlation between B and the US. Note that the shift from excitation to inhibition is due to inclusion of uncertainty over models; inferring the parameters with the more complex model fixed would result in immediate inhibition. In their experiment, Yin et al. also conducted a retardation test of conditioned inhibition for X. We follow their procedure and include in D three X-US trials. Our retardation test results are shown in Figure 2 and are in agreement with the findings of Yin et al.\n\nA further mystery about conditioned inhibitors, from the perspective of the benchmark theory of Rescorla and Wagner [7], is the nonextinction effect: repeated presentations of a conditioned inhibitor X alone and unreinforced do not extinguish its inhibitory properties. An experiment by Williams and Overmier [10] demonstrated that unpaired presentations of a conditioned inhibitor can actually enhance its ability to suppress responding in a transfer test. Our model shows the same effect, as illustrated with a dramatic test in Figure 4. Here we used the previous dataset with only 8 A-X pairings and added a number of unpaired presentations of X. The additional unpaired presentations shift the model from a second-order conditioning regime to a conditioned inhibition regime. The extinction trials suppress posterior density over simple models that exhibit a positive correlation between X and US, shifting density to more complex models and unmasking the inhibitor.\n\n4 Discussion\n\nWe have demonstrated our ideas in the context of a very abstract set of candidate models, ignoring the temporal arrangement of trials and of the events within them.
Obviously, both of these issues have important effects, and the present framework can be straightforwardly generalized to account for them, with the addition of temporal dependencies to the latent variables [1] and the removal of the stationarity assumption [4].\n\n[Figure 3 appears here: two example networks, (a) with one latent cause and (b) with two latent causes, one carrying a negative weight to X, and (c) a plot of the average number of latent causes against the number of A-X trials.]\n\nFigure 3: Sigmoid belief networks with high probability density under the posterior. (a) After a few A-X pairings: this model exhibits second-order conditioning. (b) After many A-X pairings: this model exhibits conditioned inhibition. (c) The average number of latent causes as a function of A-X pairings.\n\nAn odd but key concept in early models of classical conditioning is the \u201cconfigural unit,\u201d a detector for a conjunction of co-active stimuli. \u201cConfigural learning\u201d theories (e.g. [6]) rely on heuristics for creating such units in response to observations, a rough-and-ready sort of model structure learning. With a stimulus configuration represented through a latent cause, our theory provides a clearer prescription for how to reason about model structure. Our framework can be applied to a reservoir of configural learning experiments, including negative and positive patterning and a host of others. Another body of data on which our work may shed light is acquisition of a conditioned response. Recent theories of acquisition (e.g. 
[4]) propose that animals respond to a conditioned stimulus (CS) when the difference in the reinforcement rate between the presence and absence of the CS satisfies some test of significance. From the perspective of our model, this test looks like a heuristic for choosing between generative models of stimulus delivery that differ as to whether the CS and US are correlated through a shared hidden cause.\n\nTo our knowledge, the relationship between second-order conditioning and conditioned inhibition has never been explicitly studied using previous theories. This is in part because the majority of classical conditioning theories do not account for second-order conditioning at all, since they typically consider learning only about CS-US but not CS-CS correlations. Models based on temporal difference learning [8] predict second-order conditioning, but only if the two CSes are presented sequentially (not true of the experiment considered here). Second-order conditioning can also be predicted if the A-X pairings cause some sort of representational change so that A's excitatory associations generalize to X. Yin et al. [11] suggest that if this representational learning is fast (as in [6], though that theory would need to be modified to include any second-order effects) and if conditioned inhibition accrues only gradually by error-driven learning [7], then second-order conditioning will dominate initially. The details of such an account seem never to have been worked out, and even if they were, such a mechanistic theory would be considerably less illuminating than our theory as to the normative reasons why the animals should predict as they do.\n\nAcknowledgments\n\nThis work was supported by National Science Foundation grants IIS-9978403 and DGE-9987588, and by AFRL contract F30602\u201301\u2013C\u20130219, DARPA's MICA program.
We thank Peter Dayan and Maneesh Sahani for helpful discussions.\n\n[Figure 4 appears here: (a) posterior densities p(wm, m | D) of models predicting different values of P(US | X, B, wm, m, D) after 1, 2, or 3 unpaired X trials; (b) summation-test probabilities plotted against the number of unpaired X trials.]\n\nFigure 4: Effect of adding unpaired presentations of X on the strength of X as an inhibitor. (a) Posterior probability of models which predict different values of P(US | X, B). With only 1 unpaired presentation of X, most models predict a high probability of US (second-order conditioning). With 2 or 3 unpaired presentations of X, models which predict a low P(US | X, B) get more posterior weight (conditioned inhibition). (b) A plot contrasting P(US | B, D) and P(US | X, B, D) as a function of unpaired X trials. The reduction in the probability of reinforcement indicates an enhancement of the inhibitory strength of X. Error bars indicate the 3σ margin in the standard error in the estimate (omitting small error bars).\n\nReferences\n\n[1] A. C. Courville and D. S. Touretzky. Modeling temporal structure in classical conditioning. In Advances in Neural Information Processing Systems 14, pages 3\u201310, Cambridge, MA, 2002. MIT Press.\n\n[2] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711\u2013732, 1995.\n\n[3] Y. Iba. Extended ensemble Monte Carlo. International Journal of Modern Physics C, 12(5):623\u2013656, 2001.\n\n[4] S. Kakade and P. Dayan. Acquisition and extinction in autoshaping. Psychological Review, 109:533\u2013544, 2002.\n\n[5] D. J. C. MacKay. Bayesian model comparison and backprop nets. In Advances in Neural Information Processing Systems 4, Cambridge, MA, 1991. MIT Press.\n\n[6] J. M. Pearce. 
Similarity and discrimination: A selective review and a connectionist model. Psychological Review, 101:587\u2013607, 1994.\n\n[7] R. A. Rescorla and A. R. Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, editors, Classical Conditioning II. Appleton-Century-Crofts, 1972.\n\n[8] R. S. Sutton and A. G. Barto. Time-derivative models of Pavlovian reinforcement. In M. Gabriel and J. Moore, editors, Learning and Computational Neuroscience: Foundations of Adaptive Networks, chapter 12, pages 497\u2013537. MIT Press, 1990.\n\n[9] J. Tenenbaum and T. Griffiths. Structure learning in human causal induction. In Advances in Neural Information Processing Systems 13, pages 59\u201365, Cambridge, MA, 2001. MIT Press.\n\n[10] D. A. Williams and J. B. Overmier. Some types of conditioned inhibitors carry collateral excitatory associations. Learning and Motivation, 19:345\u2013368, 1988.\n\n[11] H. Yin, R. C. Barnet, and R. R. Miller. Second-order conditioning and Pavlovian conditioned inhibition: Operational similarities and differences. Journal of Experimental Psychology: Animal Behavior Processes, 20(4):419\u2013428, 1994.\n", "award": [], "sourceid": 2530, "authors": [{"given_name": "Aaron", "family_name": "Courville", "institution": null}, {"given_name": "Geoffrey", "family_name": "Gordon", "institution": null}, {"given_name": "David", "family_name": "Touretzky", "institution": null}, {"given_name": "Nathaniel", "family_name": "Daw", "institution": null}]}