{"title": "A Theory of Decision Making Under Dynamic Context", "book": "Advances in Neural Information Processing Systems", "page_first": 2485, "page_last": 2493, "abstract": "The dynamics of simple decisions are well understood and modeled as a class of random walk models (e.g. Laming, 1968; Ratcliff, 1978; Busemeyer and Townsend, 1993; Usher and McClelland, 2001; Bogacz et al., 2006). However, most real-life decisions include a rich and dynamically-changing influence of additional information we call context. In this work, we describe a computational theory of decision making under dynamically shifting context. We show how the model generalizes the dominant existing model of fixed-context decision making (Ratcliff, 1978) and can be built up from a weighted combination of fixed-context decisions evolving simultaneously. We also show how the model generalizes recent work on the control of attention in the Flanker task (Yu et al., 2009). Finally, we show how the model recovers qualitative data patterns in another task of longstanding psychological interest, the AX Continuous Performance Test (Servan-Schreiber et al., 1996), using the same model parameters.", "full_text": "A Theory of Decision Making Under Dynamic Context\n\nMichael Shvartsman\nPrinceton Neuroscience Institute\nPrinceton University\nPrinceton, NJ, 08544\nms44@princeton.edu\n\nVaibhav Srivastava\nDepartment of Mechanical and Aerospace Engineering\nPrinceton University\nPrinceton, NJ, 08544\nvaibhavs@princeton.edu\n\nJonathan D. Cohen\nPrinceton Neuroscience Institute\nPrinceton University\nPrinceton, NJ, 08544\njdc@princeton.edu\n\nAbstract\n\nThe dynamics of simple decisions are well understood and modeled as a class\nof random walk models [e.g. 1\u20134]. However, most real-life decisions include a\ndynamically-changing influence of additional information we call context. 
In this\nwork, we describe a computational theory of decision making under dynamically\nshifting context. We show how the model generalizes the dominant existing model\nof fixed-context decision making [2] and can be built up from a weighted combination\nof fixed-context decisions evolving simultaneously. We also show how the\nmodel generalizes recent work on the control of attention in the Flanker task [5].\nFinally, we show how the model recovers qualitative data patterns in another task\nof longstanding psychological interest, the AX Continuous Performance Test [6],\nusing the same model parameters.\n\n1 Introduction\n\nIn the late 1940s, Wald and colleagues developed a sequential test called the sequential probability\nratio test (SPRT; [7]). This test accumulates evidence in favor of one of two simple hypotheses until\na log likelihood threshold is crossed and one hypothesis is selected, forming a random walk to a\ndecision bound. This test was quickly applied as a model of human decision making behavior both\nin its discrete form [e.g. 1] and in a continuous realization as a biased Wiener process (the Diffusion\nDecision Model or DDM; [2]). This work has seen a recent revival due to evidence of neurons that\nappear to reflect ramping behavior consistent with evidence accumulation [e.g. 8], cortical circuits\nimplementing a decision process similar to the SPRT in the basal ganglia in rats [9], and the finding\nof correlations between DDM parameters and activity in EEG [10] and fMRI [11].\nBolstered by this revival, a number of groups investigated extension models. Some of these models\ntackle complex hypothesis spaces [e.g. 12], or greater biological realism [e.g. 13]. 
Others focus\non relaxing stationarity assumptions about the task setting, whether by investigating multi-stimulus\nintegration [5], deadlines [14], or different evidence distributions by trial [15].\nWe engage with the latter literature by providing a theory of multi-alternative decision making under\ndynamically changing context. We define context simply as additional information that may bear\nupon a decision, whether from perception or memory. Such a theory is important because even\nsimple tasks that use apparently-fixed contexts such as prior biases may require inference on the\ncontext itself before it can bear on the decision. The focus on dynamics is what distinguishes our\nwork from efforts on context-dependent changes in preferences [e.g. 16] and internal context updating\n[e.g. 17]. The admission of evidence from memory distinguishes it from work on multisensory\nintegration [e.g. 18].\nWe illustrate such decisions with an example: consider seeing someone that looks like a friend (a\ntarget stimulus), and a decision: to greet or not greet this person. A context can be external (e.g.\na concert hall) or internal (e.g. the memory that the friend went on vacation, and therefore this\nperson is likely a lookalike). The context can strictly constrain the decision (e.g. greeting a friend\nin the street vs. the middle of a film), or only bias it (guessing whether this is a friend or lookalike\nafter retrieving the memory of them on vacation). Regardless, context affects the decision, and we\nassume it needs to be inferred, either before or alongside the greeting decision itself. We aim to\nbuild a normative theory of this context processing component of decision making. 
We show that\nour theory generalizes the discrete-time context-free SPRT (and therefore a Wiener process DDM in\ncontinuous time) and how context-dependent decisions can be optimally built up from a dynamically\nweighted combination of context-free decisions.\nOur theory is general enough to consider a range of existing empirical paradigms in the literature,\nincluding the Stroop, Flanker, Simon, and the AX-CPT [6, 19\u201321]. We choose to mention these in\nparticular because they reside on the bounds of the task space our theory considers on two different\ndimensions, and describe a discretization of task space on those dimensions that accommodates\nthose existing paradigms. We show that in spite of the framework\u2019s generality, it can provide well-behaved\nzero-parameter predictions across qualitatively different tasks. We do this by using our\nframework to derive a notational variant of an existing Flanker model [5], and using parameter values\nfrom this previous model to simultaneously generate qualitatively accurate predictions in both the\nflanker and AX-CPT paradigms. That is, our theory generates plausible behavior in qualitatively\ndifferent tasks, using the same parameters.\n\n2 The theoretical framework\n\nWe assume that dynamic context decision making, like fixed context decision making, can be understood\nas a sequential Bayesian inference process. Our theory therefore uses sequentially drawn\nsamples from external input and/or internal memory to compute the joint posterior probability over\nthe identity of the true context and decision target over time. It maps from this joint probability to\na response probability using a fixed response mapping, and uses a fixed threshold rule defined over\nthe response probability to stop sampling and respond. 
We make a distinction between our theory\nof decision making and individual task models that can be derived from the theory by picking points\nin task space that the theory accommodates.\nFormally, we assume the decider conditions a decision based on its best estimate of two pieces of\ninformation: some unknown true context taking on one of the values $\\{c_i\\}_{i=0}^{n}$, and some unknown\ntrue target taking on one of the values $\\{g_j\\}_{j=0}^{m}$. This intentionally abstracts from richer views of\ncontext (e.g. ones which assume that the context is qualitatively different from the target, or that the\nrelevant context to sample from is unknown). We denote by $C, G$ random variables representing the\npossible draws of context and target, and $r(\\cdot)$ a deterministic function from the distribution $P(C, G)$\nto a distribution over responses. We define an abstract context sensor and target sensor selectively\ntuned to context or target information, such that $e_C$ is a discrete piece of evidence drawn from the\ncontext sensor, and $e_G$ one drawn from the target sensor. The goal of the decider is to average over\nthe noise in the sensors to estimate the pair $(C, G)$ sufficiently to determine the correct response,\nand we assume that this inference is done optimally using Bayes\u2019 rule.\nWe denote by $t_c^{on}$ the time at which the context appears and $t_c^{off} \\ge t_c^{on}$ the time at which it\ndisappears, and likewise $t_g^{on} \\le t_g^{off}$ the times at which the target appears and disappears. We also restrict\nthese times such that $t_c^{on} \\le t_g^{on}$; this is the primary distinction between context and target, which\ncan otherwise be two arbitrary stimuli. The onsets and offsets define one dimension in a continuous\nspace of tasks over which our theory can make predictions.\nThe form of $r(\\cdot)$ defines a second dimension in the space of possible tasks where our theory makes\npredictions. 
We use a suboptimal but simple threshold heuristic for the decision rule: when the a\nposteriori probability of any response crosses some adaptively set threshold, sampling ends and the\nresponse is made in favor of that response.\nFor the purposes of this paper, we restrict ourselves to two extremes on both of these dimensions.\nFor stimulus onset and offset times, we consider one setting where the context and target appear\nand disappear together (perfect overlap, i.e. $t_c^{on} = t_g^{on}$ and $t_c^{off} = t_g^{off}$), and one where the target\nappears some time after the context disappears (no overlap, i.e. $t_c^{off} \\le t_g^{on}$). We label the former the\nexternal context model, because the contextual information is immediately available, and the latter\nthe internal context model, because the information must be previously encoded and maintained.\nThe external context model is like the ongoing film context from the introduction, and the internal\ncontext is like knowing that the friend is on vacation.\nFor the response mapping function $r(\\cdot)$ we consider one setting where the response is solely conditioned\non the perceptual target (context-independent response) and one where the response is\nconditioned jointly on the context-target pair (context-dependent response). The context-dependent\nresponse is like choosing to greet or not greet the friend at the movie theater, and the context-independent\none is like choosing to greet or not greet the friend on the street.\nIn the lab, classic tasks like the Stroop, Flanker, and Simon [19\u201321] fall into the taxonomy as\nexternal-context tasks with a context-independent response, because the response is solely conditioned\non the perceptual target. On the other side of both dimensions are tasks like the N-back task\nand the AX Continuous Performance Test [6]. 
In our consideration of these tasks, we restrict our\nattention to the case where there are only two possible context and target hypotheses. The sequential\ninference procedure we use can be performed for other numbers of potentially-dependent hypotheses\nand responses, though the analysis we show later in the paper relies on the $n = m = 2$ assumption\nand on independence between the two sensors.\n\n3 External context update\n\nFirst we describe the inference procedure in the case of perfect overlap of context and target. At the\ncurrent timestep $\\tau$, the decider has available evidence samples from both the context and the target\n($e_C$ and $e_G$) and uses Bayes\u2019 rule to compute the posterior probability $P(C, G \\mid e_C, e_G)$:\n\n$$P_\\tau(C = c, G = g \\mid e_C, e_G) \\propto P(e_C, e_G \\mid C = c, G = g)\\, P_{\\tau-1}(C = c, G = g) \\tag{1}$$\n\nThe first term is the likelihood of the evidence given the joint context-target hypothesis, and the\nsecond term is the prior, which we take to be the posterior from the previous time step. We use\nthe flanker task as a concrete example. In this task, participants are shown a central target (e.g. an\nS or an H) surrounded on both sides by distractors (\u2018flankers\u2019, more S or H stimuli) that are either\ncongruent or incongruent with it. Participants are told to respond to the target only but show a\nnumber of indications of influence of the distractor, most notably an early period of below-chance\nperformance and a slowdown or reduced accuracy with incongruent relative to congruent flankers\n[20]. We label the two possible target identities $\\{g_0 = \\text{S},\\ g_1 = \\text{H}\\}$ and the possible flanker identities\n$\\{c_0 = \\text{S\\_S},\\ c_1 = \\text{H\\_H}\\}$ with the underscore representing the position of the target. This gives us\nthe two congruent possibilities $\\{[C = c_0, G = g_0], [C = c_1, G = g_1]\\}$ or [SSS,HHH] and the\ntwo incongruent possibilities $\\{[C = c_0, G = g_1], [C = c_1, G = g_0]\\}$ or [SHS,HSH]. 
The response\nmapping function marginalizes over context identities at each timestep:\n\n$$r(P(C, G)) = \\begin{cases} r_0 & \\text{with probability } \\sum_c P(C = c, G = g_0) \\\\ r_1 & \\text{with probability } \\sum_c P(C = c, G = g_1) \\end{cases} \\tag{2}$$\n\nThe higher of the two response probabilities is compared to a threshold $\\theta$ and when this threshold\nis crossed, the model responds. What remains is to define the prior $P_0(C, G)$ and the likelihood\nfunction $P(e_C, e_G \\mid C, G)$ or its inverse, the sample generation function. For sample generation, we\nassume that the context and target are represented as two Gaussian distributions:\n\n$$e_C \\sim \\mathcal{N}(\\mu_c + \\alpha_\\mu \\mu_g,\\ \\sigma_c^2 + \\alpha_\\sigma \\sigma_g^2) \\tag{3}$$\n\n$$e_G \\sim \\mathcal{N}(\\mu_g + \\alpha_\\mu \\mu_c,\\ \\sigma_g^2 + \\alpha_\\sigma \\sigma_c^2) \\tag{4}$$\n\nHere $\\mu_c$ and $\\mu_g$ are baseline means for the distributions of context and target, $\\sigma_c^2$ and $\\sigma_g^2$ are their\nvariances, and the $\\alpha$ scaling factors mix them, potentially reflecting perceptual overlap in the sensors.\nThis formulation is a notational variant of an earlier flanker model [5], but we are able to derive it by\ndescribing the task in our formalism (we describe the exact mapping in the supplementary material).\nMoreover, we later show how this notational equivalence lets us reproduce both Yu and colleagues\u2019\nresults and data patterns in another task, using the same parameter settings.\n\n4 Comparison to a constant-drift model\n\nWe now write the model in terms of a likelihood ratio test to facilitate comparison to Wald\u2019s SPRT\nand Wiener diffusion models. This is complementary to an earlier approach performing dynamical\nanalysis on the problem in probability space [22]. First we write the likelihood ratio $Z$ of the full\nresponse posteriors for the two responses. 
Since the likelihood ratio and the max a posteriori probability\nare monotonically related, thresholding on $Z$ maps onto the threshold over the probability of\nthe most probable response we described above.\n\n$$Z = \\frac{p(r(P(C, G)) = r_0 \\mid e_C, e_G)}{p(r(P(C, G)) = r_1 \\mid e_C, e_G)} \\tag{5}$$\n\n$$= \\frac{P(e_C, e_G \\mid C = c_0, G = g_0) P_{\\tau-1}(C = c_0, G = g_0) + P(e_C, e_G \\mid C = c_1, G = g_0) P_{\\tau-1}(C = c_1, G = g_0)}{P(e_C, e_G \\mid C = c_0, G = g_1) P_{\\tau-1}(C = c_0, G = g_1) + P(e_C, e_G \\mid C = c_1, G = g_1) P_{\\tau-1}(C = c_1, G = g_1)} \\tag{6}$$\n\nFor this analysis we assume that context and target samples are drawn independently from each\nother, i.e. that $\\alpha_\\mu = \\alpha_\\sigma = 0$ and therefore that $P(e_C, e_G \\mid C, G) = P(e_C \\mid C) P(e_G \\mid G)$. We also\nindex the evidence samples by time to remove the prior terms $P_{\\tau-1}(\\cdot)$, and introduce the notation\n$l_t(g_x) = P(e_t^G \\mid G = g_x)$ and $l_t(c_x) = P(e_t^C \\mid C = c_x)$ for the likelihoods, with $x \\in \\{0, 1\\}$\nindexing stimuli and $t \\in \\{t_c^{on} = t_g^{on}, \\ldots, \\tau\\}$ indexing evidence samples over time. 
Now we can\nrewrite:\n\n$$Z_\\tau = \\frac{P_0(C = c_0, G = g_0) \\prod_t l_t(c_0) l_t(g_0) + P_0(C = c_1, G = g_0) \\prod_t l_t(c_1) l_t(g_0)}{P_0(C = c_0, G = g_1) \\prod_t l_t(c_0) l_t(g_1) + P_0(C = c_1, G = g_1) \\prod_t l_t(c_1) l_t(g_1)} \\tag{7}$$\n\n$$= \\frac{P_0(C = c_0) P(G = g_0 \\mid C = c_0) \\prod_t l_t(c_0) l_t(g_0) + P_0(C = c_1) P(G = g_0 \\mid C = c_1) \\prod_t l_t(c_1) l_t(g_0)}{P_0(C = c_0) P(G = g_1 \\mid C = c_0) \\prod_t l_t(c_0) l_t(g_1) + P_0(C = c_1) P(G = g_1 \\mid C = c_1) \\prod_t l_t(c_1) l_t(g_1)} \\tag{8}$$\n\nDivide both the numerator and the denominator by $\\prod_t l_t(c_1)$:\n\n$$Z_\\tau = \\frac{P_0(C = c_0) P(G = g_0 \\mid C = c_0) \\prod_t \\frac{l_t(c_0)}{l_t(c_1)} l_t(g_0) + P_0(C = c_1) P(G = g_0 \\mid C = c_1) \\prod_t l_t(g_0)}{P_0(C = c_0) P(G = g_1 \\mid C = c_0) \\prod_t \\frac{l_t(c_0)}{l_t(c_1)} l_t(g_1) + P_0(C = c_1) P(G = g_1 \\mid C = c_1) \\prod_t l_t(g_1)} \\tag{9}$$\n\nSeparate out the target likelihood product and take logs:\n\n$$\\log Z_\\tau = \\sum_{t=1}^{\\tau} \\log \\frac{l_t(g_0)}{l_t(g_1)} + \\log \\frac{P(G = g_0 \\mid C = c_0) \\frac{P_0(C = c_0)}{P_0(C = c_1)} \\prod_t \\frac{l_t(c_0)}{l_t(c_1)} + P(G = g_0 \\mid C = c_1)}{P(G = g_1 \\mid C = c_0) \\frac{P_0(C = c_0)}{P_0(C = c_1)} \\prod_t \\frac{l_t(c_0)}{l_t(c_1)} + P(G = g_1 \\mid C = c_1)} \\tag{10}$$\n\nNow, the first term is Wald\u2019s sequential probability ratio test [7] with $z_g^\\tau = \\sum_t \\log \\frac{l_t(g_0)}{l_t(g_1)}$. In the\ncontinuum limit, it is equal to a Wiener diffusion process $dz_g = a_g\\,dt + b_g\\,dW$ with $a_g = E\\left[\\log \\frac{l(g_0)}{l(g_1)}\\right]$\nand $b_g^2 = \\mathrm{Var}\\left[\\log \\frac{l(g_0)}{l(g_1)}\\right]$ [1, 4]. We relabel the SPRT statistic for the target as $z_g^\\tau$ and do\nthe same for the context drift that appears on both numerator and denominator of the final term:\n$z_c^\\tau = \\sum_t \\log \\frac{l_t(c_0)}{l_t(c_1)}$ and $z_c^0 = \\log \\frac{P_0(C = c_0)}{P_0(C = c_1)}$. 
Then the expression is as follows:\n\n$$\\log Z_\\tau = z_g^\\tau + \\log \\frac{P(G = g_0 \\mid C = c_0) e^{z_c^0 + z_c^\\tau} + P(G = g_0 \\mid C = c_1)}{P(G = g_1 \\mid C = c_0) e^{z_c^0 + z_c^\\tau} + P(G = g_1 \\mid C = c_1)} \\tag{11}$$\n\n$\\log Z_\\tau$ in equation (11) comprises two terms. The first is the unbiased SPRT statistic, while the\nsecond is a nonlinear function of the SPRT statistic for the decision on the context. The nonlinear\nterm plays the role of bias in the SPRT for decision on target. This rational dynamic prior bias is an\nadvance over previous heuristic approaches to dynamic biases [e.g. 23].\nSeveral limits of (11) are of interest: if the context and the target are independent, then the second\nterm reduces to $\\log \\frac{P(G = g_0)}{P(G = g_1)}$, and (11) reduces to the biased SPRT for the target. If each target\nis equally likely given a context, then the nonlinear term in (11) reduces to zero and (11) reduces\nto the SPRT for the target. If each context deterministically determines a different target, then any\npiece of evidence on the context is equally informative about the target. Accordingly, (11) reduces\nto the sum of the statistics for context and target, i.e., $z_g^\\tau \\pm (z_c^\\tau + z_c^0)$. If the magnitude of drift rate for\nthe context is much higher than the magnitude of drift rate for the target, or the magnitude of the\nbias $z_c^0$ is high, then the nonlinear term saturates at a faster timescale compared to the decision time.\nIn this limit, the approximate contribution of the nonlinear term is either $\\log \\frac{P(G = g_0 \\mid C = c_0)}{P(G = g_1 \\mid C = c_0)}$, or\n$\\log \\frac{P(G = g_0 \\mid C = c_1)}{P(G = g_1 \\mid C = c_1)}$. Finally, in the limit of large thresholds, or equivalently, large decision times,\n$|z_c^\\tau + z_c^0|$ will be large, $e^{-|z_c^\\tau + z_c^0|}$ will be small, and the nonlinear term in (11) can be approximated\nby a linear function of $z_c^\\tau + z_c^0$ obtained using the first order Taylor series expansion. In all these\ncases, (11) can be approximated by a sum of two SPRTs. 
However, this approximation may not hold\nin general and we suspect many interesting cases will require us to consider the nonlinear model\nin (11). In those cases, the signal and noise characteristics of context and target will have different \u2013\nand we think distinguishable \u2013 effects on the RT distributions we measure.\n\n5 The internal-context update and application to a new task\n\nRecall our promise to explore two extremes on the dimension of context and onset timing, and\ntwo extremes on the dimension of context-response dependence. The flanker task is an external\ncontext task with a context-independent response, so we now turn to an internal context task with\ncontext-dependent response. This task is the AX Continuous Performance Test (AX-CPT), a task\nwith origins in the psychiatry literature now applied to cognitive control [6]. In this task, subjects\nare asked to make a response to a probe (target) stimulus, by convention labeled \u2018X\u2019 or \u2018Y\u2019, where\nthe response mapping is determined by a previously seen cue (context) stimulus, \u2018A\u2019 or \u2018B\u2019. In our\nnotation: $g_0 = \\text{X}$, $g_1 = \\text{Y}$, $c_0 = \\text{A}$, $c_1 = \\text{B}$. Unlike the flanker, where all stimuli pairs are equally\nlikely, in the AX-CPT AX trials are usually the most common (appearing 50% of the time or more),\nand BY trials least common. AY and BX trials appear with equal frequency, but have dramatically\ndifferent conditional probabilities due to the preponderance of AX trials.\nTwo response mappings are used in the literature: an asymmetric one where one response is made\non AX trials and the other response otherwise; and a symmetric variant where one response is made\nto AX and BY trials, and the other to AY and BX trials. 
We focus on the symmetric variant, since\nin this case the response is always context-dependent (in the asymmetric variant the response is\ncontext-independent on Y trials). We can use the definition of the task to write a new form for $r(\\cdot)$:\n\n$$r(P(C, G)) = \\begin{cases} r_0 = \\text{`left'} & \\text{with probability } P(G = g_0, C = c_0) + P(G = g_1, C = c_1) \\\\ r_1 = \\text{`right'} & \\text{with probability } P(G = g_0, C = c_1) + P(G = g_1, C = c_0) \\end{cases} \\tag{12}$$\n\nWe assume for simplicity that the inference process on the context models the maintenance of context\ninformation and retrieval of the response rule (though the model could be extended to perceptual\nencoding of the context as well). That is, we start the inference machine at $t_c^{off}$, using the following\nupdate when $t_c^{off} \\le \\tau \\le t_g^{on}$:\n\n$$P_\\tau(C, G \\mid e_C) \\propto P(e_C \\mid C, G)\\, P_{\\tau-1}(C, G) \\tag{13}$$\n\nThen, once the target appears the update becomes:\n\n$$P_\\tau(C, G \\mid e_C, e_G) \\propto P(e_C, e_G \\mid C, G)\\, P_{\\tau-1}(C, G) \\tag{14}$$\n\nFor samples after the context disappears, we introduce a simple decay mechanism wherein the probability\nwith which the context sensor provides a sample from the true context decays exponentially.\nA sample is drawn from the true context with probability $e^{-d(\\tau - t_c^{off})}$, and drawn uniformly otherwise.\nThe update takes this into account, such that as $\\tau$ grows the ratio $\\frac{P(e_C \\mid C = c_0)}{P(e_C \\mid C = c_1)}$ approaches\n1 and the context sensor stops being informative (notation omitted for space). This means that the\nunconditional posterior of context can saturate at values other than 1. The remainder of the model\nis exactly as described above. This provides an opportunity to generate predictions of both tasks in\na shared model, something we take up in the final portion of the paper. 
But first, as in the flanker\nmodel, we reduce this model to a combination of multiple instances of the well-understood DDM.\n\n6 Relating the internal context model to the fixed-context drift model\n\nWe sketch an intuition for how our internal context model can be built up from a combination of\nfixed-context drift models (again assuming sensor independence). The derivation uses the same trick\nof dividing numerator and denominator by the likelihood as the flanker expressions, and is included\nin the supplementary material, as is the asymmetric variant. We state the final expression for the\nsymmetric case here:\n\n$$\\log Z = \\log \\frac{P_0(C = c_0, G = g_0) e^{z_c^\\tau} e^{z_g^\\tau} + P_0(C = c_1, G = g_1)}{P_0(C = c_0, G = g_1) e^{z_c^\\tau} + P_0(C = c_1, G = g_0) e^{z_g^\\tau}} \\tag{15}$$\n\nEquation (15) combines the SPRT statistic associated with the context and the target in a nonlinear\nfashion which is more complicated than in (11), further complicated by the fact that the memory\ndecay turns the context random walk into an Ornstein-Uhlenbeck process in expectation (notation\nomitted for space, but follows from the relationship between continuous O-U and discrete AR(1)\nprocesses). The reduction of these equations to a SPRT or the sum of two SPRTs is subtle, and is\nvalid only in rather contrived settings. For example, if the drift rate for the target is much higher\nthan the drift rate for the context, then in the limit of large thresholds (15) can be approximated by\neither $\\log \\frac{P_0(C = c_0, G = g_0)}{P_0(C = c_1, G = g_0)} + z_c^\\tau$, or $\\log \\frac{P_0(C = c_1, G = g_1)}{P_0(C = c_0, G = g_1)} - z_c^\\tau$. As with (11), we think it will be highly\ninstructive to further investigate the cases where the reductions cannot apply.\n\n7 Simulation results for both tasks using the same model and parameters\n\nWith the relationship between both tasks established via our theory, we can now simulate behavior in\nboth tasks under nearly the same model parameters. 
The one difference is in the memory component,\ngoverned by the memory decay parameter $d$ and the target onset time $t_g^{on}$. Longer intervals between\ncontext disappearance and target appearance have the same effect as higher values of $d$: they degrade\ncontext retrieval. We use $d = 0.0001$ for the decay and a 2000-timestep interval, which\nresults in approximately 82% probability of drawing a correct sample by the time the target comes\non. The two parameters have equivalent effects in the results we show, since we do not explore\nvariable context-target delays; their separate effects could be explored by varying this duration.\nFor simplicity we assume the sampling distribution for $e_C$ and $e_G$ is identical for both tasks, though\nthis need not hold except for identical stimuli sampled from perception. For flanker simulations we\nuse the model with no spatial uncertainty, i.e. $\\alpha_\\mu = \\alpha_\\sigma = 0$, to best match the AX-CPT model and\nour analytical connections to the SPRT. We assume the model has a high congruence prior for the\nflanker model, and the correct prior for the AX-CPT, as detailed in Table 1.\n\nContext            Target             Prior\nFlanker  AX-CPT    Flanker  AX-CPT    Flanker  AX-CPT\nS_S      A         S        X         0.45     0.5\nS_S      A         H        Y         0.05     0.2\nH_H      B         S        X         0.05     0.2\nH_H      B         H        Y         0.45     0.1\n\nTable 1: Priors for the inference process for the Flanker and AX-CPT instantiation of our theory.\n\nThe remainder of parameters are identical across both task simulations: $\\sigma_c = \\sigma_g = 9$, $\\theta = 0.9$,\n$\\mu_c = \\mu_g = 0$ for $c_0$ and $g_0$, and $\\mu_c = \\mu_g = 1$ for $c_1$ and $g_1$. To replicate the flanker results,\nwe followed [5] by introducing a non-decision error parameter $\\gamma = 0.03$: this is the probability of\nmaking a random response immediately at the first timestep. We simulated 100,000 trials for each\nmodel. 
Figure 1 shows results from the simulation of the flanker task, recovering the characteristic\nearly below-chance performance in incongruent trials. This simulation supports the assertion that\nour theory generalizes the flanker model of [5], though we are not sure why our scale on timesteps\nappears different by about 5x in spite of using what we think are equivalent parameters. A library\nfor simulating tasks that fit in our framework and code for generating all simulation figures in this\npaper can be found at https://github.com/mshvartsman/cddm.\nFor the AX-CPT behavior, we compare qualitative patterns from our model to a heterogeneous\ndataset of humans performing this task (n=59) across 4 different manipulations with 200 trials per\nsubject [24]. The manipulations were different variants of \u201cproactive\u201d-behavior inducing manipulations\nin the sense of [25]. This is the most apt comparison to our model: proactive strategies are\nargued to involve response preparation of the sort that our model reflects in its accumulation over\nthe context before the target appears.\nFigure 2 shows mean RTs and accuracies produced by our model for the AX-CPT, under the same\nparameters that we used for the flanker model. This model recovers the qualitative pattern of behavior\nseen in human subjects in this task, both RT and error proportion by condition showing the same\npattern. Moreover, if we examine the conditional accuracy plot (Figure 3) we see that the model predicts\na region of below-chance performance early in AY trials but not other trials. This effect appears\nisomorphic to the early congruence effect in the flanker task, in the sense that both are caused by a\nstrong prior biased away from the correct response: on incongruent trials given a high congruence\nprior in the flanker, and on AY trials given a high AX prior in AX-CPT. 
More generally, the model\nrecovers conditional accuracy curves that look very similar to those in the human data.\n\nFigure 1: Model recovers characteristic flanker pattern. Left: accuracy computed by 50-timestep\nRT bin for congruent and incongruent trials, showing early below-chance performance.\nRight: response time distributions for congruent and incongruent trials, showing the same mode but\nfatter tail for incongruent relative to congruent trials. Both are signature phenomena in the flanker\ntask previously recovered by the model of Yu and colleagues, consistent with our theory being a\ngeneralization of their model.\n\nFigure 2: Model recovers gross RT patterns in human behavior. Left: RT and error rates by trial\ntype in the model, using the same parameters as the flanker model. Right: RT and error rates by trial\ntype in 59 human participants. Error bars are standard errors (where not visible, they are smaller\nthan the dots). Both RT and error patterns are quite similar (note that the timestep-to-ms mapping\nneed not be one-to-one).\n\n8 Discussion\n\nIn this paper, we have provided a theoretical framework for understanding decision making under\ndynamically shifting context. We used this framework to derive models of two distinct tasks from\nthe cognitive control literature, one a notational equivalent of a previous model and the other a novel\nmodel of a well-established task. We showed how we can write these models in terms of combinations\nof constant-drift random walks. Most importantly, we showed how two models derived\nfrom our theoretical framing can recover RT, error, and RT-conditional accuracy patterns seen in\nhuman data without a change of parameters between tasks and task models. Our results are quantitatively\nrobust to small changes in the prior because equations (11) and (15) are smooth functions of\nthe prior. 
The early incongruent errors in \ufb02anker are also robust to larger changes, as long as the\ncongruence prior is above 0.5. The ordering of RTs and error rates for AX-CPT rely on assuming\nthat participants at least learn the correct ordering of trial frequencies \u2013 we think an uncontroversial\nassumption.\nOne natural next step should be to generate direct quantitative predictions of behavior in one task\nbased on a model trained on another task \u2013 ideally on an individual subject level, and in a task\n\n7\n\nllllllllllllllllllllllllllllllllllllllllll0.000.250.500.751.0002505007501000TimestepsAccuracyllCongruentIncongruent0.0000.0020.0040.00602505007501000TimestepsdensityCongruentIncongruentllll350400450500550A:XA:YB:XB:YTrial TypeRT (timesteps)RT by condition (model)llll0.050.100.150.20A:XA:YB:XB:YTrial TypeError ProportionErrors by condition (model)llll420430440450460470AXAYBXBYTrial TypeRT (ms)RT by condition (humans)llll0.050.100.150.20AXAYBXBYTrial TypeError ProportionErrors by condition (humans)\fFigure 3: Model recovers conditional accuracy pattern in human behavior. Left: response\ntime computed by 50-timestep bin for the four trial types, using same parameters as the \ufb02anker\nmodel. Right: same plot from 59 human participants (see text for details). Bins with fewer than\n50 observations omitted. Error bars are standard errors (where not visible, they are smaller than\nthe dots). Both plots show qualitatively similar patterns. Two discrepancies are of note: \ufb01rst, the\nmodel predicts very early AY responses to be more accurate than slightly later responses, and early\nB responses to be close to chance. We think at least part of this is due to the non-decision error \u03b3,\nbut we retained it for consistency with the \ufb02anker model. Second, the humans show slightly better\nBY than BX performance early on, something the model does not recover. 
We think this may have to do with a global left-response bias that the model is not capturing. Note: the abscissae are in different units (though they correspond surprisingly well).\n\nthat fits in our framework that has not been extensively explored (for example, an internal-context Flanker variant, or a context-dependent response congruence judgment task). The main challenge in pursuing this kind of analysis is efficiently estimating and exploring these models, which, unlike the fixed-context models, have no closed-form analytic expressions or fast approximations. We believe that approximations such as those provided for the flanker model [22] can and should be applied within our framework, both as a way to generate more efficient data fits, and as a way to apply the tools of dynamical systems analysis to the overall behavior of a system. Of particular interest is whether some points in the task space defined in our framework map onto existing descriptive decision models [e.g. 3].\n\nAnother natural next step is to seek evidence of our proposed form of integrator in neural data, or investigate plausible neural implementations or approximations to it. One way of doing so is to compute time-varying tuning curves of neural populations in different regions to the individual components of the accumulators we propose in equations (11) and (15). Another is to find connectivity patterns that perform the log-sum computation we hypothesize happens in the integrator. A third is to look for components correlated with them in EEG data. All of these methods have some promise, as they have been successfully applied to the fixed-context model [9, 10, 26]. 
Such neural data would not only test a prediction of our theory, but also, via the brain locations found to be correlated, address questions we do not presently address, such as whether the dynamic weighting happens at the sampler or further upstream (i.e. whether unreliable evidence is gated at the sampler or discounted at the integrator).\n\nA second key challenge, given our focus on optimal inference, is that the fixed-threshold decision rule we use is suboptimal for non-identically distributed observations. While the likelihoods of context and target are independent in our simulations, the likelihoods of the two responses are not identically distributed. The optimal threshold is generally time-varying for this case [27], though the specific form is not known.\n\nFinally, while our model recovers RT-conditional accuracies and stimulus-conditional RT and accuracy patterns, it fails to recover the correct pattern of accuracy-conditional RTs. That is, it predicts errors that are much faster than correct responses on average. Future work will need to investigate whether this is caused by qualitative or quantitative aspects of the theoretical framework and model.\n\nReferences\n\n[1] D. R. J. Laming, Information theory of choice-reaction times. London: Academic Press, 1968.\n\n[2] R. Ratcliff, \u201cA theory of memory retrieval,\u201d Psychological Review, vol. 85, no. 2, pp. 59\u2013108, 1978.\n\n[Figure panels: Conditional Accuracy (model), Accuracy vs. Timesteps; Conditional Accuracy (humans), Accuracy vs. RT (ms); trial types AX, AY, BX, BY.]\n\n[3] M. Usher and J. L. McClelland, \u201cThe time course of perceptual choice: The leaky, competing accumulator model,\u201d Psychological Review, vol. 108, no. 3, pp. 550\u2013592, 2001.\n\n[4] R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J. D. 
Cohen, \u201cThe physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks,\u201d Psychological Review, vol. 113, pp. 700\u201365, Oct. 2006.\n\n[5] A. J. Yu, P. Dayan, and J. D. Cohen, \u201cDynamics of attentional selection under conflict: toward a rational Bayesian account,\u201d Journal of Experimental Psychology: Human Perception and Performance, vol. 35, pp. 700\u201317, June 2009.\n\n[6] D. Servan-Schreiber, J. D. Cohen, and S. Steingard, \u201cSchizophrenic Deficits in the Processing of Context,\u201d Archives of General Psychiatry, vol. 53, p. 1105, Dec. 1996.\n\n[7] A. Wald and J. Wolfowitz, \u201cOptimum Character of the Sequential Probability Ratio Test,\u201d The Annals of Mathematical Statistics, vol. 19, pp. 326\u2013339, Sept. 1948.\n\n[8] S. Kira, T. Yang, and M. N. Shadlen, \u201cA Neural Implementation of Wald\u2019s Sequential Probability Ratio Test,\u201d Neuron, vol. 85, pp. 861\u2013873, Feb. 2015.\n\n[9] R. Bogacz and K. N. Gurney, \u201cThe basal ganglia and cortex implement optimal decision making between alternative actions,\u201d Neural Computation, vol. 19, pp. 442\u201377, Feb. 2007.\n\n[10] M. K. van Vugt, P. Simen, L. E. Nystrom, P. Holmes, and J. D. Cohen, \u201cEEG oscillations reveal neural correlates of evidence accumulation,\u201d Frontiers in Neuroscience, vol. 6, p. 106, Jan. 2012.\n\n[11] B. M. Turner, L. van Maanen, and B. U. Forstmann, \u201cInforming cognitive abstractions through neuroimaging: The neural drift diffusion model,\u201d Psychological Review, vol. 122, no. 2, pp. 312\u2013336, 2015.\n\n[12] D. Norris, \u201cThe Bayesian reader: explaining word recognition as an optimal Bayesian decision process,\u201d Psychological Review, vol. 113, pp. 327\u2013357, Apr. 2006.\n\n[13] K.-F. Wong and X.-J. 
Wang, \u201cA recurrent network mechanism of time integration in perceptual decisions,\u201d The Journal of Neuroscience, vol. 26, no. 4, pp. 1314\u20131328, 2006.\n\n[14] P. I. Frazier and A. J. Yu, \u201cSequential hypothesis testing under stochastic deadlines,\u201d Advances in Neural Information Processing Systems, pp. 1\u20138, 2008.\n\n[15] J. Drugowitsch, R. Moreno-Bote, A. K. Churchland, M. N. Shadlen, and A. Pouget, \u201cThe cost of accumulating evidence in perceptual decision making,\u201d The Journal of Neuroscience, vol. 32, pp. 3612\u201328, Mar. 2012.\n\n[16] N. Srivastava and P. Schrater, \u201cRational inference of relative preferences,\u201d in Advances in Neural Information Processing Systems 25, pp. 2312\u20132320, 2012.\n\n[17] R. C. O\u2019Reilly and M. J. Frank, \u201cMaking Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia,\u201d Neural Computation, vol. 18, pp. 283\u2013328, Feb. 2006.\n\n[18] J. P. Sheppard, D. Raposo, and A. K. Churchland, \u201cDynamic weighting of multisensory stimuli shapes decision-making in rats and humans,\u201d Journal of Vision, vol. 13, no. 6, pp. 1\u201319, 2013.\n\n[19] J. R. Stroop, \u201cStudies of interference in serial verbal reactions,\u201d Journal of Experimental Psychology, vol. 18, no. 6, pp. 643\u2013662, 1935.\n\n[20] G. Gratton, M. G. Coles, E. J. Sirevaag, C. W. Eriksen, and E. Donchin, \u201cPre- and poststimulus activation of response channels: a psychophysiological analysis,\u201d Journal of Experimental Psychology: Human Perception and Performance, vol. 14, no. 3, pp. 331\u2013344, 1988.\n\n[21] J. R. Simon and J. D. Wolf, \u201cChoice reaction time as a function of angular stimulus-response correspondence and age,\u201d Ergonomics, vol. 6, pp. 99\u2013105, Jan. 1963.\n\n[22] Y. S. Liu, A. Yu, and P. 
Holmes, \u201cDynamical analysis of Bayesian inference models for the Eriksen task,\u201d Neural Computation, vol. 21, pp. 1520\u201353, June 2009.\n\n[23] T. D. Hanks, M. E. Mazurek, R. Kiani, E. Hopp, and M. N. Shadlen, \u201cElapsed decision time affects the weighting of prior probability in a perceptual decision task,\u201d The Journal of Neuroscience, vol. 31, pp. 6339\u201352, Apr. 2011.\n\n[24] O. Lositsky, R. C. Wilson, M. Shvartsman, and J. D. Cohen, \u201cA Drift Diffusion Model of Proactive and Reactive Control in a Context-Dependent Two-Alternative Forced Choice Task,\u201d in The Multi-disciplinary Conference on Reinforcement Learning and Decision Making, pp. 103\u2013107, 2015.\n\n[25] T. S. Braver, \u201cThe variable nature of cognitive control: a dual mechanisms framework,\u201d Trends in Cognitive Sciences, vol. 16, pp. 106\u201313, Feb. 2012.\n\n[26] T. D. Hanks, C. D. Kopec, B. W. Brunton, C. A. Duan, J. C. Erlich, and C. D. Brody, \u201cDistinct relationships of parietal and prefrontal cortices to evidence accumulation,\u201d Nature, 2015.\n\n[27] Y. Liu and S. Blostein, \u201cOptimality of the sequential probability ratio test for nonstationary observations,\u201d IEEE Transactions on Information Theory, vol. 38, no. 1, pp. 177\u2013182, 1992.\n", "award": [], "sourceid": 1481, "authors": [{"given_name": "Michael", "family_name": "Shvartsman", "institution": "Princeton Neuroscience Inst."}, {"given_name": "Vaibhav", "family_name": "Srivastava", "institution": "Princeton Neuroscience Institute"}, {"given_name": "Jonathan", "family_name": "Cohen", "institution": "Princeton University"}]}