{"title": "Sequential Hypothesis Testing under Stochastic Deadlines", "book": "Advances in Neural Information Processing Systems", "page_first": 465, "page_last": 472, "abstract": null, "full_text": "Sequential Hypothesis Testing under Stochastic\n\nDeadlines\n\nPeter I. Frazier\n\nORFE\n\nPrinceton University\nPrinceton, NJ 08544\n\nAngela J. Yu\n\nCSBMB\n\nPrinceton University\nPrinceton, NJ 08544\n\npfrazier@princeton.edu\n\najyu@princeton.edu\n\nAbstract\n\nMost models of decision-making in neuroscience assume an in\ufb01nite horizon,\nwhich yields an optimal solution that integrates evidence up to a \ufb01xed decision\nthreshold; however, under most experimental as well as naturalistic behavioral\nsettings, the decision has to be made before some \ufb01nite deadline, which is often\nexperienced as a stochastic quantity, either due to variable external constraints or\ninternal timing uncertainty. In this work, we formulate this problem as sequential\nhypothesis testing under a stochastic horizon. We use dynamic programming tools\nto show that, for a large class of deadline distributions, the Bayes-optimal solution\nrequires integrating evidence up to a threshold that declines monotonically over\ntime. We use numerical simulations to illustrate the optimal policy in the special\ncases of a \ufb01xed deadline and one that is drawn from a gamma distribution.\n\n1 Introduction\nMajor strides have been made in understanding the detailed dynamics of decision making in sim-\nple two-alternative forced choice (2AFC) tasks, at both the behavioral and neural levels. Using a\ncombination of probabilistic and dynamic programming tools, it has been shown that when the de-\ncision horizon is in\ufb01nite (i.e. 
no deadline), the optimal policy is to accumulate sensory evidence for one alternative versus the other until a fixed threshold is reached, and report the corresponding hypothesis [1]. Under similar experimental conditions, it appears that humans and animals accumulate information and make perceptual decisions in a manner close to this optimal strategy [2\u20134], and that neurons in the posterior parietal cortex exhibit response dynamics similar to those prescribed by the optimal algorithm [6]. However, in most 2AFC experiments, as well as in more natural behavior, the decision has to be made before some finite deadline. This corresponds to a finite-horizon sequential decision problem. Moreover, there is variability associated with that deadline, either due to external variability associated with the deadline imposition itself, or due to internal timing uncertainty about how much total time is allowed and how much time has already elapsed. In either case, with respect to the observer\u2019s internal timer, the deadline can be viewed as a stochastic quantity.

In this work, we analyze the optimal strategy and its dynamics for decision-making under the pressure of a stochastic deadline. We show through analytical and numerical analysis that the optimal policy is a monotonically declining decision threshold over time. A similar result for deterministic deadlines was shown in [5]. Declining decision thresholds have been used in [7] to model the speed vs. accuracy tradeoff, and also in the context of sequential hypothesis testing ([8]). We first present a formal model of the problem, as well as the main theoretical results (Sec. 2). We then use numerical simulations to examine the optimal policy in some specific examples (Sec. 3).

2 Decision-making under a Stochastic Deadline
We assume that on each trial, a sequence of i.i.d. inputs is observed: x1, x2, x3, . . .. 
With probability p0, all the inputs for the trial are generated from a probability density f1, and, with probability 1 \u2212 p0, they are generated from an alternative probability density f0. Let \u03b8 be the index of the generating distribution. The objective is to decide whether \u03b8 is 0 or 1 quickly and accurately, while also under the pressure of a stochastic decision deadline.

We define xt \u225c (x1, x2, . . . , xt) to be the vector of observations made by time t. This vector of observations gives information about the generating density \u03b8. Defining pt \u225c P{\u03b8 = 1 | xt}, we observe that pt+1 may be obtained iteratively from pt via Bayes\u2019 rule,

pt+1 = P{\u03b8 = 1 | xt+1} = pt f1(xt+1) / [pt f1(xt+1) + (1 \u2212 pt) f0(xt+1)]. (1)

Let D be a deadline drawn from a known distribution that is independent of the observations xt. We will assume that the deadline D is observed immediately and effectively terminates the trial. Let c > 0 be the cost associated with each unit time of decision delay, and d \u2265 .5 be the cost associated with exceeding the deadline, where both c and d are normalized against the (unit) cost of making an incorrect decision. We choose d \u2265 .5 so that d is never smaller than the expected penalty for guessing at \u03b8; this avoids situations in which we would prefer to exceed the deadline.

A decision-policy \u03c0 is a sequence of mappings, one for each time t, from the observations so far to the set of possible actions: stop and choose \u03b8 = 0; stop and choose \u03b8 = 1; or continue sampling. We define \u03c4\u03c0 to be the time when the decision is made to stop sampling under decision-policy \u03c0, and \u03b4\u03c0 to be the hypothesis chosen at this time; both are random variables dependent on the sequence of observations. More formally, \u03c0 \u225c \u03c00, \u03c01, . . 
., where \u03c0t(xt) \u2208 {0, 1, continue}, and \u03c4\u03c0 \u225c min(D, inf{t \u2208 N : \u03c0t(xt) \u2208 {0, 1}}), \u03b4\u03c0 \u225c \u03c0\u03c4\u03c0(x\u03c4\u03c0). We may also define \u03c3\u03c0 \u225c inf{t \u2208 N : \u03c0t(xt) \u2208 {0, 1}} to be the time when the policy would choose to stop sampling if the deadline were to fail to occur. Then \u03c4\u03c0 = min(D, \u03c3\u03c0).

Our loss function is defined to be

l(\u03c4, \u03b4; \u03b8, D) = 1{\u03b4\u2260\u03b8}1{\u03c4<D} + c\u03c4 + d1{\u03c4\u2265D}, (2)

so that the (unit) error cost is paid only if a decision is reported before the deadline, the delay cost c is paid per unit time, and the deadline cost d is paid whenever the deadline arrives before a decision. The value function V (t, pt) denotes the expected loss, under the best possible policy, given that the deadline has not yet arrived by time t and given the current belief pt:

V (t, pt) \u225c inf \u03c4\u2265t,\u03b4 \u27e8l(\u03c4, \u03b4; \u03b8, D) | D > t, pt\u27e9\u03b8,D,x.

The cost associated with continuing at time t, known as the Q-factor for continuing and denoted by Q, takes the form

Q(t, pt) \u225c inf \u03c4\u2265t+1,\u03b4 \u27e8l(\u03c4, \u03b4; \u03b8, D) | D > t, pt\u27e9\u03b8,D,x. (3)

[Figure 1 here: plot of \u00afQ(t, p), Q(t, p), and Q(t + 1, p) \u2212 c as functions of p \u2208 [0, 1], titled \u201cContinuing vs. Stopping\u201d.]

Figure 1: Comparison of the cost \u00afQ(t, p) of stopping at time t (red); the cost Q(t, p) of continuing at time t (blue solid line); and Q(t + 1, p) \u2212 c (black solid line), which is the cost of continuing at time t + 1 minus the adjustment \u00afQ(t + 1, p) \u2212 \u00afQ(t, p) = c. The continuation region Ct is the interval between the intersections of the solid blue and red lines, marked by the blue dotted lines, and the continuation region Ct+1 is the interval between the intersections of the solid black and red lines, marked by the black dotted lines. Note that Q(t + 1, p) \u2212 c \u2265 Q(t, p), so Ct contains Ct+1.

Note that, in general, both V (t, pt) and Q(t, pt) may be difficult to compute due to the need to optimize over infinitely many decision policies. 
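Before turning to the stopping costs, it may help to see the belief recursion of Eq. 1 in executable form. The sketch below assumes the Bernoulli observation model used in the simulations of Sec. 3 (q0 = .45, q1 = .55); the function name is ours, not the paper's.

```python
# Belief update of Eq. 1, specialized to Bernoulli observations with
# P{x = 1 | theta} = q_theta. The values q0 = .45, q1 = .55 follow Sec. 3;
# the function name is an illustrative choice.

def update_belief(p, x, q0=0.45, q1=0.55):
    """One Bayes step: returns p_{t+1} given p_t = p and observation x."""
    f1 = q1 if x == 1 else 1.0 - q1   # f1(x)
    f0 = q0 if x == 1 else 1.0 - q0   # f0(x)
    return p * f1 / (p * f1 + (1.0 - p) * f0)

# Starting from the prior p0 = 0.5, a run of 1s pushes the belief toward theta = 1.
p = 0.5
for x in [1, 1, 1, 0, 1]:
    p = update_belief(p, x)
```

Because the update acts multiplicatively on the posterior odds, a long run of identical observations drives pt toward 0 or 1, while mixed observations keep it near .5.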
Conversely, the cost associated with stopping at time t, known as the Q-factor for stopping and denoted by \u00afQ, is easily computed as

\u00afQ(t, pt) = inf \u03b4=0,1 \u27e8l(t, \u03b4; \u03b8, D) | D > t, pt\u27e9\u03b8,D,x = min{pt, 1 \u2212 pt} + ct, (4)

where the infimum is attained by choosing \u03b4 = 0 if pt \u2264 .5 and choosing \u03b4 = 1 otherwise.

An optimal stopping rule is to stop the first time the expected cost of continuing exceeds that of stopping, and to choose \u03b4 = 0 or \u03b4 = 1 to minimize the probability of error given the accumulated evidence (see [10]). That is, \u03c4\u2217 = inf{t \u2265 0 : \u00afQ(t, pt) \u2264 Q(t, pt)} and \u03b4\u2217 = 1{p\u03c4\u2217 \u2265 1/2}. We define the continuation region at time t by Ct \u225c {pt \u2208 [0, 1] : \u00afQ(t, pt) > Q(t, pt)}, so that \u03c4\u2217 = inf{t \u2265 0 : pt /\u2208 Ct}. Although we have obtained an expression for the optimal policy in terms of Q(t, p) and \u00afQ(t, p), computing Q(t, p) is difficult in general.

Lemma 1. The function pt \u21a6 Q(t, pt) is concave with respect to pt for each t \u2208 N.

Proof. We may restrict the infimum in Eq. 3 to be over only those \u03c4 and \u03b4 depending on D and the future observations xt+1 \u225c {xt+1, xt+2, . . .}. This is due to two facts. First, the expectation is conditioned on pt, which contains all the information about \u03b8 available in the past observations xt, and makes it unnecessary for the optimal policy to depend on xt except through pt. Second, dependence on pt in the optimal policy may be made implicit by allowing the infimum to be attained by different \u03c4 and \u03b4 for different values of pt, while removing explicit dependence on pt from the individual policies over which the infimum is taken. 
With \u03c4 and \u03b4 chosen from this restricted set of policies, we note that the distribution of the future observations xt+1 is entirely determined by \u03b8, and so we have \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8, pt\u27e9D,xt+1 = \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8\u27e9D,xt+1. Summing over the possible values of \u03b8, we may then write:

\u27e8l(\u03c4, \u03b4; \u03b8, D) | pt\u27e9\u03b8,D,xt+1 = \u2211k\u2208{0,1} \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = k\u27e9D,xt+1 P{\u03b8 = k | pt}
= \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 0\u27e9D,xt+1 (1 \u2212 pt) + \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 1\u27e9D,xt+1 pt.

Eq. (3) can then be rewritten as:

Q(t, pt) = inf \u03c4\u2265t+1,\u03b4 \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 0\u27e9D,xt+1 (1 \u2212 pt) + \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 1\u27e9D,xt+1 pt,

where this infimum is again understood to be taken over the set of policies depending only upon observations after time t. Since neither \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 0\u27e9 nor \u27e8l(\u03c4, \u03b4; \u03b8, D) | \u03b8 = 1\u27e9 depends on pt, this is the infimum of a collection of linear functions of pt, and hence is concave in pt ([9]).

We now need a lemma describing how expected cost depends on the distribution of the deadline. Let D\u2032 be a deadline whose distribution is different from that of D. Let \u03c0\u2217 be the policy that is optimal given that the deadline has distribution D, and denote \u03c3\u03c0\u2217 by \u03c3\u2217. Then define

V \u2032(t, pt) \u225c \u27e8min(p\u03c3\u2217, 1 \u2212 p\u03c3\u2217)1{\u03c3\u2217<D\u2032} + c min(\u03c3\u2217, D\u2032) + d1{\u03c3\u2217\u2265D\u2032} | D\u2032 > t, pt\u27e9\u03b8,D\u2032,x,

so that V \u2032 gives the expected cost of taking the stopping time \u03c3\u2217, which is optimal for deadline D, and applying it to the situation with deadline D\u2032. Similarly, let Q\u2032(t, pt) and \u00afQ\u2032(t, pt) denote the corresponding expected costs under \u03c3\u2217 and D\u2032 given that we continue or stop, respectively, at time t, given pt and D\u2032 > t. 
Note that \u00afQ\u2032(t, pt) = \u00afQ(t, pt) = min(pt, 1 \u2212 pt) + ct. These definitions are the basis for the following lemma, which essentially shows that replacing the deadline D with a less urgent deadline D\u2032 lowers the expected cost. This lemma is needed for Lemma 3 below.

Lemma 2. If D\u2032 is such that P{D\u2032 > t + 1 | D\u2032 > t} \u2265 P{D > t + 1 | D > t} for all t, then V \u2032(t, p) \u2264 V (t, p) and Q\u2032(t, p) \u2264 Q(t, p) for all t and p.

Proof. First let us show that if we have V \u2032(t + 1, p\u2032) \u2264 V (t + 1, p\u2032) for some fixed t and all p\u2032, then we also have Q\u2032(t, p) \u2264 Q(t, p) for that same t and all p. This is the case because, if we fix t, then

Q(t, pt) = (d + c(t + 1)) P{D = t+1 | D > t} + \u27e8V (t + 1, pt+1) | pt\u27e9xt+1 P{D > t+1 | D > t}
= d + c(t + 1) + \u27e8V (t + 1, pt+1) \u2212 (d + c(t + 1)) | pt\u27e9xt+1 P{D > t+1 | D > t}
\u2265 d + c(t + 1) + \u27e8V (t + 1, pt+1) \u2212 (d + c(t + 1)) | pt\u27e9xt+1 P{D\u2032 > t+1 | D\u2032 > t}
\u2265 d + c(t + 1) + \u27e8V \u2032(t + 1, pt+1) \u2212 (d + c(t + 1)) | pt\u27e9xt+1 P{D\u2032 > t+1 | D\u2032 > t} = Q\u2032(t, p).

In the first inequality we have used two facts: that V (t + 1, pt+1) \u2264 \u00afQ(t + 1, pt+1) = min(pt+1, 1 \u2212 pt+1) + c(t + 1) \u2264 d + c(t + 1) (which is true because d \u2265 .5); and that P{D > t + 1 | D > t} \u2264 P{D\u2032 > t + 1 | D\u2032 > t}. In the second inequality we have used our assumption that V \u2032(t + 1, p\u2032) \u2264 V (t + 1, p\u2032) for all p\u2032.

Now consider a finite-horizon version of the problem in which \u03c3\u2217 is only optimal among stopping times bounded above by a finite integer T . We will show the lemma for this case; the lemma for the infinite-horizon version of the problem follows by taking the limit as T \u2192 \u221e.

We induct backwards on t. Since \u03c3\u2217 is required to stop at T , we have V (T, pT ) = \u00afQ(T, pT ) = \u00afQ\u2032(T, pT ) = V \u2032(T, pT ). 
Now for the induction step. Fix p and t < T . If \u03c3\u2217 chooses to stop at t when pt = p, then V (t, p) = \u00afQ(t, p) = \u00afQ\u2032(t, p) = V \u2032(t, p). If \u03c3\u2217 continues instead, then V (t, p) = Q(t, p) \u2265 Q\u2032(t, p) = V \u2032(t, p) by the induction hypothesis.

Note the requirement that d \u2265 1/2 in the previous lemma. If this requirement is not met, and pt is such that d < min(pt, 1 \u2212 pt), then we may prefer to get timed out rather than choose \u03b4 = 0 or \u03b4 = 1 and suffer the expected penalty of min(pt, 1 \u2212 pt) for choosing incorrectly. In this situation, since the conditional probability P{D = t + 1 | D > t} that we will time out in the next time period grows as time moves forward, the continuation region may expand with time rather than contract. Under most circumstances, however, it seems reasonable to assume the deadline cost to be at least as large as that of making an error.

We now state Lemma 3, which shows that the cost of delaying by one time period is at least as large as the continuation cost c, but may be larger because the delay causes the deadline to approach more rapidly.

Lemma 3. For each t \u2208 N and p \u2208 (0, 1), Q(t \u2212 1, pt\u22121 = p) \u2264 Q(t, pt = p) \u2212 c.

Proof. Fix t. Let \u03c3\u2217 \u225c inf{s \u2265 t + 1 : ps /\u2208 Cs}, so that min(\u03c3\u2217, D) attains the infimum for Q(t, pt). Also define \u03c3\u2032 \u225c inf{s \u2265 t : ps /\u2208 Cs+1} and \u03c4\u2032 \u225c min(D, \u03c3\u2032). 
Since \u03c4\u2032 is within the set over which the infimum defining Q(t \u2212 1, p) is taken,

Q(t \u2212 1, p) \u2264 \u27e8min(p\u03c4\u2032, 1 \u2212 p\u03c4\u2032)1{\u03c4\u2032<D} + c\u03c4\u2032 + d1{\u03c4\u2032\u2265D} | D > t \u2212 1, pt\u22121 = p\u27e9D,xt
= \u27e8min(p\u03c3\u2032, 1 \u2212 p\u03c3\u2032)1{\u03c3\u2032<D} + c min(\u03c3\u2032, D) + d1{\u03c3\u2032\u2265D} | D > t \u2212 1, pt\u22121 = p\u27e9D,xt
= \u27e8min(p\u03c3\u2217, 1\u2212p\u03c3\u2217)1{\u03c3\u2217\u22121<D} + c min(\u03c3\u2217\u22121, D) + d1{\u03c3\u2217\u22121\u2265D} | D > t\u22121, pt = p\u27e9D,xt+1,

where the last step is justified by the stationarity of the observation process, which implies that the joint distribution of (ps)s\u2265t, p\u03c3\u2217, and \u03c3\u2217 conditioned on pt = p is the same as the joint distribution of (ps\u22121)s\u2265t, p\u03c3\u2032, and \u03c3\u2032 + 1 conditioned on pt\u22121 = p. Let D\u2032 = D + 1, and we have

Q\u2032(t, p) = \u27e8min(p\u03c3\u2217, 1\u2212p\u03c3\u2217)1{\u03c3\u2217<D\u2032} + c min(\u03c3\u2217, D\u2032) + d1{\u03c3\u2217\u2265D\u2032} | D\u2032 > t, pt = p\u27e9D\u2032,xt+1,

so Q(t \u2212 1, p) \u2264 Q\u2032(t, p) \u2212 c. Finally, as D\u2032 satisfies the requirements of Lemma 2, Q\u2032(t, p) \u2264 Q(t, p).

Lemma 4. For t \u2208 N, Q(t, 0) = Q(t, 1) = c(t + 1) + dP{D = t + 1 | D > t}.

Proof. On the event pt = 0, we have that P{\u03b8 = 0} = 1 and the policy attaining the infimum in (3) is \u03c4\u2217 = t+1, \u03b4\u2217 = 0. Thus, Q(t, 0) becomes

Q(t, 0) = \u27e8l(\u03c4\u2217, \u03b4\u2217; \u03b8, D) | D > t, pt = 0\u27e9D,xt+1 = \u27e8l(\u03c4\u2217, \u03b4\u2217; \u03b8, D) | D > t, \u03b8 = 0\u27e9D,xt+1
= \u27e8d1{t+1\u2265D} + c(t + 1) | D > t, \u03b8 = 0\u27e9D,xt+1 = c(t+1) + dP{D = t+1 | D > t}.

Similarly, on the event pt = 1, we have that P{\u03b8 = 1} = 1 and the policy attaining the infimum in (3) is \u03c4\u2217 = t+1, \u03b4\u2217 = 1. Thus, Q(t, 1) = c(t+1) + dP{D = t+1 | D > t}.

We are now ready for the main theorem, which shows that Ct is either empty or an interval, and that Ct+1 \u2286 Ct. 
To illustrate our proof technique, we plot \u00afQ(t, p), Q(t, p), and Q(t + 1, p) \u2212 c as functions of p in Figure 1. As noted, the continuation region Ct is the set of p such that Q(t, p) < \u00afQ(t, p). To show that Ct is either empty or an interval, we note that Q(t, p) is a concave function of p (Lemma 1) whose values at the endpoints p = 0, 1 are greater than the corresponding values of \u00afQ(t, p) (Lemma 4). Such a concave function may only intersect \u00afQ(t, p), which is a constant plus min(p, 1 \u2212 p), either twice or not at all. When it intersects twice, we have the situation pictured in Figure 1, in which Ct is a non-empty interval, and when it does not intersect, Ct is empty.

To show that Ct+1 \u2286 Ct, we note that the difference between \u00afQ(t + 1, p) and \u00afQ(t, p) is the constant c. Thus, to show that Ct, the set where \u00afQ(t, p) exceeds Q(t, p), contains Ct+1, the set where \u00afQ(t + 1, p) exceeds Q(t + 1, p), it is enough to show that the difference between Q(t + 1, p) and Q(t, p) is at least as large as the adjustment c, which we have done in Lemma 3.

Theorem. At each time t \u2208 N, the optimal continuation region Ct is either empty or a closed interval, and Ct+1 \u2286 Ct.

Proof. Fix t \u2208 N. We begin by showing that Ct+1 \u2286 Ct. If Ct+1 is empty then the statement follows trivially, so consider the case Ct+1 \u2260 \u2205. Choose p \u2208 Ct+1. Then

Q(t, p) \u2264 Q(t + 1, p) \u2212 c \u2264 \u00afQ(t + 1, p) \u2212 c = min{p, 1 \u2212 p} + ct = \u00afQ(t, p).

Thus, p \u2208 Ct, implying Ct+1 \u2286 Ct.

Now suppose that Ct is non-empty; we will show it must be a closed interval. Let at \u225c inf Ct and bt \u225c sup Ct. Since Ct is a non-empty subset of [0, 1], we have at, bt \u2208 [0, 1]. Furthermore, at > 0 because Q(t, p) \u2265 c(t + 1) + dP{D = t + 1 | D > t} > ct = \u00afQ(t, 0) for all p, and the continuity of \u00afQ(t, \u00b7) implies that Q(t, p) > \u00afQ(t, p) > 0 for p in some open interval around 0. Similarly, bt < 1. 
Thus, at, bt \u2208 (0, 1).

We will show first that [at, 1/2] \u2286 Ct. If at > 1/2 then this is trivially true, so consider the case that at \u2264 1/2. Since Q(t, \u00b7) is concave on the open interval (0, 1), it is also continuous there. This and the continuity of \u00afQ imply that Q(t, at) = \u00afQ(t, at). Also, Q(t, 0) > \u00afQ(t, 0) by Lemma 4. Thus at > 0 and we may take a left-derivative at at. For any \u03b5 \u2208 (0, at), at \u2212 \u03b5 /\u2208 Ct, so Q(t, at \u2212 \u03b5) > \u00afQ(t, at \u2212 \u03b5). Together with Q(t, at) = \u00afQ(t, at), this implies that

\u2202\u2212/\u2202p Q(t, at) = lim \u03b5\u21920+ [Q(t, at) \u2212 Q(t, at \u2212 \u03b5)]/\u03b5 \u2264 lim \u03b5\u21920+ [\u00afQ(t, at) \u2212 \u00afQ(t, at \u2212 \u03b5)]/\u03b5 = \u2202\u2212/\u2202p \u00afQ(t, at).

Since Q(t, \u00b7) is concave by Lemma 1 and \u00afQ(t, \u00b7) is linear on [0, 1/2], we have for any p\u2032 \u2208 [at, 1/2],

\u2202\u2212/\u2202p Q(t, p\u2032) \u2264 \u2202\u2212/\u2202p Q(t, at) \u2264 \u2202\u2212/\u2202p \u00afQ(t, at) = \u2202\u2212/\u2202p \u00afQ(t, p\u2032).

Since Q(t, \u00b7) is concave, it is differentiable except at countably many points, so for any p \u2208 [at, 1/2],

Q(t, p) = Q(t, at) + \u222b[at,p] \u2202\u2212/\u2202p Q(t, p\u2032) dp\u2032 \u2264 \u00afQ(t, at) + \u222b[at,p] \u2202\u2212/\u2202p \u00afQ(t, p\u2032) dp\u2032 = \u00afQ(t, p).

Therefore p \u2208 Ct, and, more generally, [at, 1/2] \u2286 Ct. By a similar argument, [1/2, bt] \u2286 Ct. Finally, Ct \u2286 [at, bt] \u2286 [at, 1/2] \u222a [1/2, bt] \u2286 Ct, and we must have Ct = [at, bt].

We also include the following proposition, which shows that if D is finite with probability 1, then the continuation region must eventually narrow to nothing.

Proposition. If P{D < \u221e} = 1, then there exists a T < \u221e such that CT = \u2205.

Proof. 
First consider the case in which D is bounded, so that P{D \u2264 T + 1} = 1 for some time T < \u221e. Then Q(T, pT ) = d + c(T + 1), while \u00afQ(T, pT ) = cT + min(pT , 1 \u2212 pT ) \u2264 cT + 1/2. Thus Q(T, pT ) \u2212 \u00afQ(T, pT ) \u2265 d + c \u2212 1/2 > 0, and CT = \u2205.

Now consider the case in which P{D > t} > 0 for every t. By neglecting the error probability and including only continuation and deadline costs, we obtain Q(t, pt) \u2265 d P{D = t+1 | D > t} + c(t+1). Bounding the error probability by 1/2, we obtain \u00afQ(t, pt) \u2264 ct + 1/2. Thus, Q(t, pt) \u2212 \u00afQ(t, pt) \u2265 c + d P{D = t + 1 | D > t} \u2212 1/2. Since P{D < \u221e} = 1, limt\u2192\u221e c + d P{D = t+1 | D > t} \u2212 1/2 = c + d \u2212 1/2 > 0, and there exists a T such that c + d P{D = t+1 | D > t} \u2212 1/2 > 0 for every t \u2265 T . This implies that, for t \u2265 T and pt \u2208 [0, 1], Q(t, pt) \u2212 \u00afQ(t, pt) > 0 and Ct = \u2205.

3 Computational simulations
We conducted a series of simulations in which we computed the continuation region and the distributions of response time and accuracy for the optimal policy, for several choices of the parameters c and d and of the distribution of the deadline D. We chose the observation xt to be a Bernoulli random variable under both f0 and f1 for every t = 1, 2, . . ., with different values for q\u03b8 \u225c P{xt = 1 | \u03b8}. In our simulations we chose q0 = .45 and q1 = .55.

We computed optimal policies for two different forms of deadline distribution: first for a deterministic deadline fixed to some known constant, and second for a gamma-distributed deadline. The gamma distribution with parameters k > 0 and \u03b2 > 0 has density (\u03b2^k/\u0393(k)) x^{k\u22121} e^{\u2212\u03b2x} for x > 0, where \u0393(\u00b7) is the gamma function. 
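For the dynamic program below, what the deadline contributes is the discrete-time hazard P{D \u2264 t+1 | D > t}. As a minimal sketch of our own (not from the paper), for integer shape k the gamma survival function has the closed Erlang form, from which the hazard follows; the parameter values are illustrative.

```python
import math

def gamma_survival(t, k, beta):
    """P{D > t} for a gamma(shape=k, rate=beta) deadline with integer k (Erlang case)."""
    return math.exp(-beta * t) * sum((beta * t) ** i / math.factorial(i) for i in range(k))

def deadline_hazard(t, k, beta):
    """Discrete-time hazard P{D <= t+1 | D > t} entering the Bellman recursion."""
    s_t = gamma_survival(t, k, beta)
    return (s_t - gamma_survival(t + 1, k, beta)) / s_t

# Illustrative parameters: k = 9, beta = 0.3 gives mean k/beta = 30 and
# standard deviation sqrt(k)/beta = 10. For k >= 1 the hazard is non-decreasing
# in t, consistent with the monotonicity assumption discussed in Sec. 4.
```

As t grows, the hazard rises toward 1, which is what eventually closes the continuation region in the Proposition above.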
The parameters k and \u03b2, called the shape and rate parameters respectively, are completely determined by choosing the mean and the standard deviation of the distribution, since the gamma distribution has mean k/\u03b2 and variance k/\u03b2^2. A fixed deadline T may be seen as a limiting case of a gamma-distributed deadline, obtained by taking both k and \u03b2 to infinity with k/\u03b2 = T fixed.

We used the table-look-up form of the backward dynamic programming algorithm (see, e.g., [11]) to compute the optimal Q-factors. We obtained approximations of the value function and Q-factors at a finite set of equally spaced discrete points {0, 1/N, . . . , (N \u2212 1)/N, 1} in the interval [0, 1]. In our simulations we chose N = 999. We establish a final time T that is large enough that P{D \u2264 T} is nearly 1, and thus P{\u03c4\u2217 \u2264 T} is also nearly 1. In our simulations we chose T = 60. We approximated the value function V (T, pT ) at this final time by \u00afQ(T, pT ). Then we calculated value functions and Q-factors for earlier times recursively according to Bellman\u2019s equation:

Q(t, p) = \u27e8V (t + 1, pt+1) | pt = p\u27e9pt+1 ; V (t, p) = min(Q(t, p), \u00afQ(t, p)).

This expectation relating Q(t, \u00b7) to V (t + 1, \u00b7) may be written explicitly by using our hypotheses and Eq. 1 to define a function g so that pt+1 = g(pt, xt+1). In our case this function is given by g(pt, 1) \u225c ptq1/(ptq1 + (1 \u2212 pt)q0) and g(pt, 0) \u225c pt(1 \u2212 q1)/(pt(1 \u2212 q1) + (1 \u2212 pt)(1 \u2212 q0)). We then note that P{xt+1 = 1 | pt} = P{xt+1 = 1 | \u03b8 = 1}pt + P{xt+1 = 1 | \u03b8 = 0}(1 \u2212 pt) = ptq1 + (1 \u2212 pt)q0, and similarly P{xt+1 = 0 | pt} = pt(1 \u2212 q1) + (1 \u2212 pt)(1 \u2212 q0). 
Then

Q(t, pt) = (c(t+1) + d) P{D \u2264 t+1 | D > t} + P{D > t+1 | D > t} [V (t+1, g(pt, 1))(ptq1 + (1 \u2212 pt)q0) + V (t+1, g(pt, 0))(pt(1 \u2212 q1) + (1 \u2212 pt)(1 \u2212 q0))].

We computed continuation regions Ct from these Q-factors, and then used Monte Carlo simulation with 10^6 samples for each problem setting to estimate P{\u03b4 = \u03b8 | \u03c4 = t} and P{\u03c4 = t} as functions of t. The results of these computational simulations are shown in Figure 2. We see in Fig. 2A that the decision boundaries for a fixed deadline (solid blue) narrow smoothly toward the midline. Clearly, at the last opportunity for responding before the deadline, the optimal policy would always generate a response (and therefore the thresholds merge), since we assumed that the cost of penalty is greater than the expected cost of making an error: d \u2265 .5 (since the optimal policy is to choose the hypothesis with probability \u2265 .5, the expected probability of error is always \u2264 .5). At the time step before, the optimal policy would only continue if one more data point is going to improve the belief state enough to offset the extra time cost c. Therefore, the optimal policy only continues within a small \u201cwindow\u201d around .5 even though it has the opportunity to observe one more data point. At earlier times, the window \u201cwidens\u201d following similar logic. When uncertainty about the deadline increases (larger std(D); shown in dashed and dash-dotted blue lines), the optimal thresholds are squeezed toward each other and to the left, the intuition being that the threat of encountering the deadline spreads earlier and earlier into the trial. The red lines denote the average accuracy for different stopping times obtained from a million Monte Carlo simulations of the observation-decision process. They closely follow the decision thresholds (since the threshold is on the posterior probability p\u03c4), but are slightly larger, because p\u03c4 must exceed the threshold, and pt moves in discrete increments due to the discrete Bernoulli process.

[Figure 2 here: four panels plotting probability against time. (A) Varying std(D); (B) varying mean(D) \u2208 {40, 30, 25}; (C) varying c \u2208 {0.001, 0.002, 0.004}; (D) varying d \u2208 {0.5, 2, 1000}.]

Figure 2: Plots of the continuation region Ct (blue) and the probability of a correct response P{\u03b4 = \u03b8 | \u03c4 = t} (red). The default settings were c = .001, d = 2, mean(D) = 40, std(D) = 1, and q0 = 1 \u2212 q1 = .45. In each plot we varied one of these parameters while keeping the others fixed: in (A) the standard deviation of D, in (B) the mean of D, in (C) the value of c, and in (D) the value of d.

The effect of decreasing the mean deadline is to shift the decision boundaries leftward, as shown in Fig. 2B. The effect of increasing the cost of time c is to squeeze the boundaries toward the midline (Fig. 2C); this result is analogous to that seen in the classical sequential probability ratio test for the infinite-horizon case. The effect of increasing d is to squeeze the thresholds to the left (Fig. 2D), and the rate of this shift is on the order of log(d), because the tail of the gamma distribution falls off nearly exponentially.

4 Discussion

In this work, we formalized the problem of sequential hypothesis testing (of two alternatives) under the pressure of a stochastically sampled deadline, and characterized the optimal policy. For a large class of deadline distributions (including the gamma, normal, exponential, and delta distributions), we showed that the optimal policy is to report a hypothesis as soon as the posterior belief hits one of a pair of monotonically declining thresholds (declining toward the midline). This generalizes the classical infinite-horizon case: in the limit as the deadline goes to infinity, the optimal policy reverts to a pair of fixed thresholds, as in the sequential probability ratio test [1]. We showed that the decision policy becomes more conservative (thresholds pushed outward and to the right) when there is less uncertainty about the deadline, when the mean of the deadline is larger, when the linear temporal cost is smaller, and when the deadline cost is smaller.

In the theoretical analysis, we assumed that D has the property that P{D > t+u | D > t} is non-increasing in t for each u \u2265 0, over the set of t such that P{D > t} > 0. This assumption implies that, if the deadline has not occurred already, then the likelihood that it will occur soon grows larger and larger as time passes. The assumption is violated by multi-modal distributions, for which there is a large probability that the deadline will occur at some early point in time, but if it does not occur by that point, then it will not occur until some much later time. The assumption is met by a fixed deadline (std(D) \u2192 0), and also includes the classical infinite-horizon case (D \u2192 \u221e) as a special case (in which the optimal policy reverts to the sequential probability ratio test). This assumption 
This assumption\nis also met by any distribution with a log-concave density because log P{D > t + u | D > t} =\nlog P{D > t+u} \u2212 log P{D > t} = F (t+u) \u2212 F (t), where F (t) , log P{D > t}. If the density of\nD is log-concave, then F is concave ( [9]), and the increment F (t+u)\u2212F (t) is non-increasing in t.\nMany common distributions have log-concave densities, including the exponential distribution, the\ngamma distribution, the normal distribution, and the uniform distribution on an interval.\n\nWe used gamma distributions for the deadline in the numerical stimulations. There are several em-\npirical properties about timing uncertainty in humans and animals that make the gamma distribution\nparticularly suitable. First, realizations from the gamma distribution are always non-negative, which\nis consistent with the assumption that a subject never thinks a deadline has passed before the ex-\nperiment has started. Second, if we \ufb01x the rate parameter \u03b2 and vary the shape k, then we obtain a\ncollection of deadline distributions with different means whose variance and mean are in a \ufb01xed ra-\ntio, which is consistent with experimental observations [12]. Third, for large values of k the gamma\ndistribution is approximately normal, which is also consistent with experimental observations [12].\nFinally, a gamma distributed random variable with mean \u00b5 may be written as the sum of k = \u00b5\u03b2\nindependent exponential random variables with mean 1/\u03b2, so if the brain were able to construct\nan exponential-distributed timer whose mean 1/\u03b2 were on the order of milliseconds, then it could\nconstruct a very accurate gamma-distributed timer for intervals of several seconds by resetting this\nexponential timer k times and responding after the kth alarm. 
This has interesting ramifications for how sophisticated timers for relatively long intervals can be constructed from neurons that exhibit dynamics on the order of milliseconds.

This work makes several interesting empirical predictions. Subjects who have more internal uncertainty, and therefore larger variance in their perceived deadline, should respond to stimuli earlier and with lower accuracy. Similarly, the model makes quantitative predictions about a subject\u2019s performance when the experimenter explicitly manipulates the mean deadline, or the relative costs of error, time, and deadline.

Acknowledgments

We thank Jonathan Cohen, Savas Dayanik, Philip Holmes, and Warren Powell for helpful discussions. The first author was supported in part by the Air Force Office of Scientific Research under grant AFOSR-FA9550-05-1-0121.

References
[1] Wald, A & Wolfowitz, J (1948). Ann. Math. Statist. 19: 326-39.
[2] Luce, R D (1986). Response Times: Their Role in Inferring Elementary Mental Organization. Oxford Univ. Press.
[3] Ratcliff, R & Rouder, J N (1998). Psychol. Sci. 9: 347-56.
[4] Bogacz, R et al (2006). Psychol. Rev. 113: 700-65.
[5] Bertsekas, D P (1995). Dynamic Programming and Optimal Control. Athena Scientific.
[6] Gold, J I & Shadlen, M N (2002). Neuron 36: 299-308.
[7] Mozer et al (2004). Proc. Twenty-Sixth Annual Conference of the Cognitive Science Society. 981-86.
[8] Siegmund, D (1985). Sequential Analysis. Springer.
[9] Boyd, S & Vandenberghe, L (2004). Convex Optimization. Cambridge Univ. Press.
[10] Poor, H V (1994). An Introduction to Signal Detection and Estimation. Springer-Verlag.
[11] Powell, W B (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley.
[12] Rakitin, et al (1998). J. Exp. Psychol. Anim. Behav. Process. 
24: 15-33.
", "award": [], "sourceid": 706, "authors": [{"given_name": "Peter", "family_name": "Frazier", "institution": null}, {"given_name": "Angela", "family_name": "Yu", "institution": null}]}