{"title": "Similarity and Discrimination in Classical Conditioning: A Latent Variable Account", "book": "Advances in Neural Information Processing Systems", "page_first": 313, "page_last": 320, "abstract": null, "full_text": " Similarity and discrimination in classical\n conditioning: A latent variable account\n\n\n\n Aaron C. Courville*1,3, Nathaniel D. Daw4 and David S. Touretzky2,3\n 1Robotics Institute, 2Computer Science Department,\n 3Center for the Neural Basis of Cognition,\n Carnegie Mellon University, Pittsburgh, PA 15213\n 4Gatsby Computational Neuroscience Unit, University College London\n {aaronc,dst}@cs.cmu.edu; daw@gatsby.ucl.ac.uk\n\n\n\n Abstract\n\n We propose a probabilistic, generative account of configural learning\n phenomena in classical conditioning. Configural learning experiments\n probe how animals discriminate and generalize between patterns of si-\n multaneously presented stimuli (such as tones and lights) that are dif-\n ferentially predictive of reinforcement. Previous models of these issues\n have been successful more on a phenomenological than an explanatory\n level: they reproduce experimental findings but, lacking formal founda-\n tions, provide scant basis for understanding why animals behave as they\n do. We present a theory that clarifies seemingly arbitrary aspects of pre-\n vious models while also capturing a broader set of data. Key patterns\n of data, e.g. concerning animals' readiness to distinguish patterns with\n varying degrees of overlap, are shown to follow from statistical inference.\n\n\n1 Introduction\n\nClassical conditioning experiments probe how organisms learn to predict significant events\nsuch as the receipt of food or shock. While there is a history of detailed quantitative theo-\nries about these experiments, only recently has there been a sustained attempt to understand\nthem in terms of sound statistical prediction [1]. 
A statistical foundation helps to identify key theoretical issues (such as uncertainty) underlying these experiments, to explain otherwise puzzling results, and to connect these behavioral theories with theories of neural computation, which are also increasingly framed in statistical terms.

A cluster of issues that has received great experimental and theoretical attention in conditioning -- but not yet from a statistically grounded perspective -- concerns discrimination and generalization between patterns of sensory input. Historically, these issues arose in the context of nonlinear discriminations, such as the XOR problem (in which, e.g., a light and a tone each predict shock when presented alone, but not together). While animals can learn such a discrimination, the seminal model of Rescorla and Wagner [2] cannot, since it assumes that the prediction is linear in the stimuli. Traditionally, this problem was solved by introducing extra discriminative features to the model's input (known as "configural units," since they detect conjunctions of stimuli such as tone plus light), rendering the augmented problem linearly solvable [3]. On this foundation rests a wealth of work probing how animals learn and predict given compounds of stimuli. Here, we reinterpret these issues from a Bayesian perspective.

Previous work posits an informal division (or perhaps a spectrum) between "elemental" and "configural" approaches to stimulus patterns, distinguished by whether a compound's association with reinforcement is derived from its individual stimuli (lights, tones), or rests collectively in the full compound (light and tone together). The prototypical elemental model is the original Rescorla-Wagner model, without configural units, in which the aggregate prediction is linear in the elements. 
The standard configural model is that of Pearce\n[4], in which responding to a compound is determined by previous experience with that\nand other similar compounds, through a process of generalization and weighted averaging.\nBoth theories match an impressive range of experimental data, but each is refuted by some\nexperiments that the other captures. It is not clear how to move beyond this stalemate.\nBecause the theories lack formal foundations, their details -- particularly those on which\nthey differ -- are ad-hoc and poorly understood. For instance, what circumstances justify\nthe introduction of a new configural unit, and what should be the form of generalization\nbetween compounds?\n\nHere we leverage our Bayesian theory of conditioning [5] to shed new light on these issues.\nOur model differs from traditional ones in a number of ways. Notably, analogizing condi-\ntioning to classification, we take a generative rather than a discriminative approach. That\nis, we assume animals are modeling their complete sensory experience (lights, tones, and\nshocks) rather than only the chance of shock conditioned on lights and tones. We assume\nthat stimuli are correlated with each other, and with reinforcement, through shared latent\nvariables. Because a latent variable can trigger multiple events, these causes play a role\nakin to configural units in previous theories, but offer stronger normative guidance. Ques-\ntions about generalization (what is the probability that a latent variable is active given a\nparticular constellation of inputs) are seen as standard statistical inference; questions about\nmodel structure (how many \"configural units\" should there be and with what constellations\nof stimuli are they associated) are answerable using Bayesian model averaging, which we\nhave suggested animals can approximate [5]. 
Such inferences also determine whether an animal's experience on a trial is best explained by multiple causes interacting additively, in the style of Rescorla-Wagner, or by a single cause triggering multiple events, like one of Pearce's configural units. This allows our theory to capture patterns of data that seem to favor each of its predecessors.

Our theory is meant to shed light on the normative reasons why animals behave as they do, rather than on how they might carry out computations like those we describe. In practice, the inferences we discuss can be computed only approximately, and we intend no claim that animals are using the same approximations as we are. More mechanistic models, such as Pearce's, can broadly be viewed as plausible implementations for approximating some aspects of our more general framework.


2 Theories of Learning with Compound Stimuli

Classical conditioning experiments probe animals' anticipation of a reinforcer R, such as food or footshock, given the presentation of initially neutral stimuli such as lights and tones. Expectation is assessed via reflexive conditioned responses such as salivation or freezing, which are thought to reveal animals' predictions of reinforcement. By studying responding as a function of the pattern of previous reinforcer/stimulus pairings, the experiments assess learning. To describe a conditioning task abstractly, we use capital letters for the stimuli and + and - to indicate whether they are reinforced. For instance, the XOR task can be written as A+, B+, AB-, where AB- denotes simultaneous presentation of both stimuli unreinforced. Typically, each type of trial is delivered repeatedly, and the development of responding is assessed.

We now describe the treatment of compound stimuli in the models of Rescorla and Wagner [2] and Pearce [4]. In both models, the set of stimuli present on a trial is converted into an input vector x. 
The strength of the conditioned response is modeled as proportional to a prediction of reinforcement v = x . w, the dot product between the input and a weight vector. Finally, one or more weights are updated proportionally to the mismatch r - v between observed and predicted reinforcement.

For both theories, x includes an element (or "unit") corresponding to each individual stimulus.1 In Pearce's model, and in augmented "added elements" versions of the Rescorla-Wagner model [3], additional "configural" units are also included, corresponding to conjunctions of stimuli. In particular, it is assumed that a unique configural unit is added for each stimulus compound observed, such as ABC. Note that this assumption is both arbitrary (e.g. we might very well include elements for subcompounds such as AB) and unrealistic (given the profusion of uncontrolled stimuli simultaneously present in a real experiment).

The theories differ as to how they apportion activation over x and learning over w. In the Rescorla-Wagner model, the input vector is binary: x_i = 1 if the ith stimulus (or an exactly matching compound) is present, 0 otherwise. For learning, the weight corresponding to each active input is updated. The Pearce model instead spreads graded activation over x, based on a measure of similarity between the observed stimulus compound (or element) and the compounds represented by the model's configural units. In particular, if we denote the number of stimulus elements present in an observed stimulus pattern a as size(a), and in the pattern represented by the ith configural unit as size(i), then the activation of unit i by pattern a is given by x_i = size(overlap(a, i))^2 / (size(a) x size(i)). 
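The two schemes just described can be summarized in a few lines of code. The sketch below is illustrative only (the function names and the learning rate are our own choices, not the papers'); it implements the Rescorla-Wagner error-driven update and Pearce's similarity-based activation rule exactly as given above.

```python
def rw_update(w, x, r, lr=0.1):
    """Rescorla-Wagner: form the prediction v = x . w, then nudge the
    weights of active units toward the observed reinforcement r."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * (r - v) * xi for wi, xi in zip(w, x)]

def pearce_activation(pattern, unit):
    """Pearce: activation x_i = overlap^2 / (size(pattern) * size(unit)),
    where patterns are sets of stimulus elements, e.g. {'A', 'B'}."""
    pattern, unit = set(pattern), set(unit)
    return len(pattern & unit) ** 2 / (len(pattern) * len(unit))
```

For example, pearce_activation("BC", "ABC") = 4/6 exceeds pearce_activation("A", "ABC") = 1/3, which is the overlap effect reviewed in Section 3.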
The learning phase updates only the weight corresponding to the configural unit that exactly matches the observed stimulus configuration.

As neither scheme has much formal basis, there seems to be no theoretical reason to prefer one over the other, nor over any other ad-hoc recipe for apportioning representation and learning. Empirical considerations also provide ambivalent guidance, as we discuss next.


3 Data on Learning with Compound Stimuli

Both the elemental and configural models reproduce a number of well known experimental phenomena. Here we review several basic patterns of results. Notably, each theory has a set of experiments that seems to support it over the other. Later, we will show that our normative theory accounts for all of these results.

Overshadowing   When a pair of stimuli AB+ is reinforced together, then tested separately, responding to either individual stimulus is often attenuated compared to a control in which the stimulus is trained alone (A+). Both models reproduce overshadowing, though Rescorla-Wagner incorrectly predicts that it takes at least two AB+ pairings to materialize.

Summation   The converse of overshadowing is summation: when two stimuli are individually reinforced, then tested together, there is often a greater response to the pair than to either element alone. In a recent variation by Rescorla [6], animals were trained on a pair of compounds AB+ and CD+, then responses were measured to the trained compounds, the individual elements A, B, etc., and the novel transfer compounds AD and BC. The strongest response was elicited by the trained compounds. The transfer compounds elicited a moderate response, and the individual stimuli produced the weakest responding.

 1 In Pearce's presentation of his model, these units are added only after elements are observed alone. 
We include them initially, which does not affect the model's behavior, to stress similarity with the Rescorla-Wagner model.

The added elements Rescorla-Wagner model predicts this result due to the linear summation of the influences of all the units (A through D, AB, and CD -- note that the added configural units are crucial). However, because of the normalization term in the generalization rule, Pearce's model often predicts no summation. Here it predicts equal responding to the individual stimuli and to the transfer compounds. There is controversy as to whether the model can realistically be reconciled with summation effects [4, 7], but on the whole, these phenomena seem more parsimoniously explained with an elemental account.

Overlap   A large number of experiments (see [4] for a review) demonstrate that the more elements shared by two compounds, the longer it takes animals to learn to discriminate between them. Though this may seem intuitive, elemental theories predict the opposite. In one example, Redhead and Pearce [8] presented subjects with the patterns A+, BC+ reinforced and ABC- unreinforced. Differential responding between A and ABC was achieved in fewer trials than that between BC and ABC.

Pearce's configural theory predicts this result because the extra overlap between BC and ABC (compared to A vs. ABC) causes each compound to activate the other's configural unit more strongly. Thus, larger weights are required to produce a differentiated prediction. Rescorla-Wagner predicts the opposite result, because compounds with more elements, e.g. BC, accumulate more learning on each trial.


4 A latent variable model of stimulus generalization

In this section we present a generative model of how stimuli and reinforcers are jointly delivered. 
We will show how the model may be used to estimate the conditional probability of reinforcement (the quantity we assume drives animals' responding) given some pattern of observed stimuli. The theory is based on the one we presented in [5], and casts conditioning as inference over a set of sigmoid belief networks. Our goal here is to use this formalism to explain configural learning phenomena.

4.1 A Sigmoid Belief Network Model of Conditioning

Consider a vector of random variables S representing stimuli on a trial, with the jth stimulus present when S_j = 1 and absent when S_j = 0. One element of S is distinguished as the reinforcer R; the remainder (lights and tones) is denoted as Stim. We encode the correlations between all stimuli (including the reinforcer) through common connections to a vector of latent variables, or causes, x, where x_i ∈ {0, 1}. According to the generative process, on each trial the state of the latent variables is determined by independent Bernoulli draws (each latent variable has a weight determining its chance of activation [5]). The probability of stimulus j being present is then determined by its relationship to the latent variables:

    P(S_j | m, w_m, x) = (1 + exp(-(w_m^(j))^T x - w_bias))^-1 ,    (1)

where the weight vector w_m^(j) encodes the connection strengths between x and S_j for the model structure m. The bias weight w_bias is fixed at -6, ensuring that spontaneous events are rare. Some examples of the type of network structure under consideration are shown as graphical models in Figure 1(c)-(d) and Figure 2(c)-(e).

We assume animals learn about the model structure itself, analogous to the experience-dependent introduction of configural units in previous theories. In our theory, animals use experience to infer which network structures (from a set of candidates) and weights likely produced the observed stimuli and reinforcers. These in turn determine predictions of future reinforcement. 
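A minimal sketch of this generative process may make it concrete. The names and weight values below are our own illustrative choices; the sampler draws the Bernoulli latents and then each stimulus via Equation 1.

```python
import math
import random

W_BIAS = -6.0  # fixed bias weight: spontaneous events are rare

def p_stimulus(w_j, x, w_bias=W_BIAS):
    """Equation 1: P(S_j = 1 | x) = 1 / (1 + exp(-(w_j . x + w_bias)))."""
    a = sum(wi * xi for wi, xi in zip(w_j, x)) + w_bias
    return 1.0 / (1.0 + math.exp(-a))

def sample_trial(latent_priors, weights, rng=random.Random(0)):
    """Draw each latent cause independently (Bernoulli), then draw each
    stimulus conditionally independently given the latent state x."""
    x = [1 if rng.random() < p else 0 for p in latent_priors]
    s = [1 if rng.random() < p_stimulus(w_j, x) else 0 for w_j in weights]
    return x, s
```

With a single latent cause connected to a stimulus by a weight of 10, the cause being active makes the event nearly certain (sigmoid(10 - 6) ≈ 0.98), while an inactive cause leaves only the small spontaneous rate sigmoid(-6) ≈ 0.0025.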
Details of this inference are laid out below.

4.2 Generalization: inference over latent variables

Generalization between observed stimulus patterns is a key aspect of previous models. We now describe how generalization arises in our theory.

Given a particular belief net structure m, weights w_m, and previous conditioning experience D, the probability of reinforcement R given observed stimuli Stim can be computed by integrating over the possible settings x of the latent variables:

    P(R | Stim, m, w_m, D) = Σ_x P(R | m, w_m, x) P(x | Stim, m, w_m, D)    (2)

The first term is given by Equation 1. By Bayes' rule, the second term weighs particular settings of the hidden causes proportionally to the likelihood that they would give rise to the observed stimuli. This process is a counterpart to Pearce's generalization rule for configural units. Unlike Pearce's rule, inference over x considers settings of the individual causes x_i jointly (allowing for explaining-away effects) and incorporates prior probabilities over each cause's activation. Nevertheless, the new rule broadly resembles its predecessor in that a cause is judged likely to be active (and contributes to predicting R) if the constellation of stimuli it predicts is similar to what is observed.

4.3 Learning to discriminate: inference over models

We treat the model weights w_m and the model structure m as uncertain quantities subject to standard Bayesian inference. We assume that, given a model structure, the weights are mutually independent a priori and each distributed according to a Laplace distribution.2 Conditioning on the data D produces a posterior distribution over the weights, over which we integrate to predict R:

    P(R | Stim, m, D) = ∫ P(R | Stim, m, w_m, D) P(w_m | m, D) dw_m    (3)

Uncertainty over model structure is handled analogously. 
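Before turning to structure uncertainty, note that for a fixed structure m with point-estimated weights (i.e., without the integration of Equation 3), Equation 2 can be evaluated exactly by enumerating the latent configurations. The sketch below uses our own naming and illustrative weights:

```python
import itertools
import math

W_BIAS = -6.0

def p_event(w, x):
    """Equation 1: probability an event occurs given latent state x."""
    a = sum(wi * xi for wi, xi in zip(w, x)) + W_BIAS
    return 1.0 / (1.0 + math.exp(-a))

def p_reinforcement(stim, stim_weights, r_weights, latent_priors):
    """Equation 2: sum over all latent configurations x, weighting each
    by its posterior given the observed 0/1 stimulus vector `stim`."""
    num = den = 0.0
    for x in itertools.product([0, 1], repeat=len(latent_priors)):
        prior = math.prod(p if xi else 1 - p
                          for p, xi in zip(latent_priors, x))
        lik = math.prod(p_event(w, x) if s else 1 - p_event(w, x)
                        for w, s in zip(stim_weights, stim))
        num += prior * lik * p_event(r_weights, x)  # P(R | x) weighted
        den += prior * lik                          # normalizer P(Stim)
    return num / den
```

For instance, with one latent cause tied to stimuli A, B, and the reinforcer (all weights 10, activation prior 0.3), presenting A alone yields a noticeably lower reinforcement probability than presenting AB together, which is the overshadowing pattern discussed in the Results section.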
Integrating over posterior model uncertainty we arrive at the prediction of reinforcement:

    P(R | Stim, D) = Σ_m P(R | Stim, m, D) P(m | D) ,    (4)

where P(m | D) ∝ P(D | m) P(m) and the marginal likelihood P(D | m) is computed similarly to Equation 3, by integration over the weights. The prior over models, P(m), is expressed as a distribution over n_x, the number of latent variables, and over l_i, the number of links between the stimuli and each latent variable: P(m) = P(n_x) Π_{i=1..n_x} P(l_i). We assume that P(n_x) and each P(l_i) are given by geometric distributions (parameter 0.1), renormalized to sum to unity over the maximum of 5 latents and 5 stimuli. This prior reflects a bias against complex models. The marginal likelihood term also favors simplicity, due to the automatic Occam's razor (see [5]). For our simulations, we approximately evaluated Equation 4 using reversible-jump Markov chain Monte Carlo (see [5] for details).

Progressively conditioning on experience to resolve prior uncertainty in the weights and model structure produces a gradual change in predictions akin to the incremental learning rules of previous models. The extent to which a particular model structure m participates in predicting R in Equation 4 is, by Bayes' rule, proportional to its prior probability, P(m), and to the extent that it explains the data, P(D | m). Thus a prior preference for simpler models competes against better data fidelity for more complex models. As data accumulate, the balance shifts toward the latter, and predictions become more accurate. Analogously, weights are small a priori but can grow with experience.

 2 The Laplace distribution is given by f(y) = (1/2b) e^(-|y - μ|/b). In our simulations μ = 0 and b = 2. As a prior, it encodes a bias for sparsity consistent with a preference for simpler model structures.

[Figure 1 appears here: bar plots (a)-(b) and graphical model diagrams (c)-(d); see caption.]

Figure 1: Results of MCMC simulation. (a) Overshadowing (AB+): the predicted probability of reinforcement in response to presentations of the element A, the compound AB, and an individually trained control element (A+). (b) Summation experiment (AB+, CD+): the predicted probability of reinforcement in response to separate presentations of the trained compounds (AB, CD), the transfer compounds (AD, BC), and the elements (A, B, etc.). (c) Depiction of the MAP model structure after overshadowing training. (d) The MAP model structure after AB+, CD+ training.

Together with the generalization effects discussed above, these inference effects explain why animals can learn more readily to discriminate stimulus compounds that have less overlap. Key to the discrimination is inferring that different compounds are produced by separate latent variables; the more the compounds overlap, the more accurately will the data be approximated by a model with a single latent variable (preferred a priori), which biases the complexity-fidelity tradeoff toward simplicity and retards acquisition.


5 Results

Overshadowing   Overshadowing exemplifies our account of between-compound generalization; our model's performance is illustrated in Figure 1(a). 
After 5 AB+ pairings, the network with highest posterior probability, depicted in (c), contains one latent variable correlated with both stimuli and the reinforcer. Consistent with experimental results, testing on A produces attenuated responding. This is because predicting whether A is reinforced requires balancing the relative probabilities of two unlikely events: that the stimulus occurred spontaneously (with x_1 inactive), versus that it was caused by x_1 being active, but that B uncharacteristically failed to occur (this probability measures generalization between the patterns A and AB). Overall, this tradeoff decreases the chance that x_1 is active, suppressing the prediction of reinforcement relative to the control treatment, where A is reinforced in isolation (A+). Unlike the Rescorla-Wagner model, ours correctly predicts that overshadowing can occur after even a single AB+ presentation.

Summation   Figure 1(b) shows our model's performance on Rescorla's AB+, CD+ summation and transfer experiment [6], which is one of several summation experiments our model explains. Compounds were reinforced 10 times. Consistent with experimental findings, the model predicts greatest responding to the trained compounds (AB, CD), moderate responding to transfer compounds (AD, BC), and least responding to the elements (A, B, etc.). The maximum a posteriori (MAP) model structure (Figure 1(d)) mimics the training compounds, with one latent variable connected to A, B, and R and another connected to C, D, and R. 
The results follow from a combination of generalization and additivity. The training compounds activate one latent variable strongly; the transfer compounds activate both latents weakly (together additively influencing the probability of reinforcement); the elements weakly activate only a single latent variable.

[Figure 2 appears here: learning curves (a), model growth (b), and graphical model diagrams (c)-(e); see caption.]

Figure 2: Summary of MCMC simulation results on the A+, BC+, ABC- experiment. The estimated error due to MCMC sampling is small and not shown. (a) Learning curves showing the predicted probability of reinforcement in response to separate presentations of A, BC, and ABC as a function of number of trial blocks. (b) The average number of latent variables over the 10000 MCMC sample models. (c)-(e) Representations of MAP model structures after training with 4, 10, and 20 trial blocks (edge widths represent mean weight strength).

Overlap   Figure 2(a) shows the model's learning curves from the overlapping compound experiment, A+, BC+, ABC-. Each trial block contains one trial of each type. The model correctly predicts faster discrimination between A and ABC than between BC and ABC. This pattern results from a progressive increase in the number of inferred latent variables (b). Early in training, probability density concentrates on small models with a single latent variable correlating all stimuli and the reinforcer (c). 
After more trials, models with two latent variables become more probable, one correlating A and R and the other correlating B and C with both A and R, attempting to capture both the BC+ and ABC- trial types (d). With further training, the most likely models are those with three latents, each encoding one trial type (e). Our theory captures many similar experiments demonstrating the difficulty of discriminating overlapping compounds.


6 Discussion

The configural unit is an ad-hoc device that nonetheless plays a key role in previous experimental and theoretical work in conditioning. Its inclusion in models like that of Rescorla-Wagner invites a number of questions. Which configurations should be represented? How should activation and learning be apportioned between them? These issues are contentious, admitting no clear answer, precisely because of the arbitrary nature of the device. We have shown how a latent variable correlated with a constellation of stimuli provides a well-founded counterpart to the configural unit, and how a range of experimental phenomena concerning similarity and discrimination can be accounted for with the assumption that animals are carrying out inference about these variables. While data exist that tend to favor each of the two major previous models of configural learning over the other, the new model accounts for the full pattern, balancing the strengths of both theories. Our theory also improves on its predecessors in other ways; for instance, because it includes learning about stimulus interrelationships it can explain second-order conditioning [5], which is not addressed by either the Pearce or the Rescorla-Wagner accounts.

Of course, many issues remain. A full account of summation phenomena, in particular, is beyond the scope of the present model. We treat reinforcer delivery as binary and model a limited, saturating summation in probabilities. 
However, realistic summation almost certainly concerns reinforcement magnitudes as well (see, for example, [9]), and our model would need to be augmented to address them. Because we have assumed that trials are IID, the model cannot yet account for effects of trial ordering (e.g. the difference between partial reinforcement and extinction). These could be addressed by incorporating dynamics into the generative model, so that inference requires tracking the changing model parameters. Also for future work is exploring how different priors might give rise to different behavior. An advantage of Bayesian modeling is that because the free parameters are formulated as priors, they represent concrete assertions about the world (e.g. how often particular kinds of events occur), and can thus be constrained and even experimentally manipulated.

We have focused only on two previous models and only on animal behavioral experiments. Issues of similarity and discrimination are also studied in the rather different setting of human category judgments, where Bayesian generative approaches have also proved useful [10]. There is also a tradition of more neurophysiological models of the hippocampal substrates of configural learning [11, 12]. Given the large body of theory and experiment on these issues, this seems a promising direction for future work connecting our behavioral theory with neurophysiological ones. In one of the hippocampal theories, Gluck and Myers [12] augment the Rescorla-Wagner model with an input representation learned by an autoencoder. Since autoencoders perform probabilistic density modeling, this is probably the most statistically minded of prior approaches to configural representation and has clear parallels with our work.


Acknowledgments

This work was supported by National Science Foundation grants IIS-9978403 and DGE-9987588. ND is funded by a Royal Society USA Research Fellowship and the Gatsby Foundation. 
We thank Peter Dayan, Yael Niv, and Geoff Gordon for helpful discussions.


References

 [1] P. Dayan, T. Long, Advances in Neural Information Processing Systems 10 (1998), pp. 117-123.

 [2] R. A. Rescorla, A. R. Wagner, Classical Conditioning II, A. H. Black, W. F. Prokasy, eds. (Appleton-Century-Crofts, 1972), pp. 64-99.

 [3] R. A. Rescorla, Journal of Comparative and Physiological Psychology 79, 307 (1972).

 [4] J. M. Pearce, Psychological Review 101, 587 (1994).

 [5] A. C. Courville, N. D. Daw, G. J. Gordon, D. S. Touretzky, Advances in Neural Information Processing Systems 16 (2004).

 [6] R. A. Rescorla, Quarterly Journal of Experimental Psychology 56B, 161 (2003).

 [7] R. A. Rescorla, Animal Learning and Behavior 25, 200 (1997).

 [8] E. S. Redhead, J. M. Pearce, Quarterly Journal of Experimental Psychology 48B, 46 (1995).

 [9] E. F. Kremer, Journal of Experimental Psychology: Animal Behavior Processes 4, 22 (1978).

[10] J. B. Tenenbaum, T. L. Griffiths, Behavioral and Brain Sciences 24, 629 (2001).

[11] R. C. O'Reilly, J. W. Rudy, Psychological Review 108, 311 (2001).

[12] M. A. Gluck, C. Myers, Hippocampus 3, 491 (1993).
", "award": [], "sourceid": 2711, "authors": [{"given_name": "Aaron", "family_name": "Courville", "institution": null}, {"given_name": "Nathaniel", "family_name": "Daw", "institution": null}, {"given_name": "David", "family_name": "Touretzky", "institution": null}]}