{"title": "Context-sensitive active sensing in humans", "book": "Advances in Neural Information Processing Systems", "page_first": 2958, "page_last": 2966, "abstract": "Humans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. Recently, a goal-directed, context-sensitive, Bayesian control strategy for active sensing, termed C-DAC (Context-Dependent Active Controller), was proposed (Ahmad & Yu, 2013). In contrast to previously proposed algorithms for human active vision, which tend to optimize abstract statistical objectives and therefore cannot adapt to changing behavioral context or task goals, C-DAC directly minimizes behavioral costs and thus, automatically adapts itself to different task conditions. However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. 
In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings.", "full_text": "Context-sensitive active sensing in humans\n\nSheeraz Ahmad\nDepartment of Computer Science and Engineering\nUniversity of California San Diego\n9500 Gilman Drive, La Jolla, CA 92093\nsahmad@cs.ucsd.edu\n\nHe Huang\nDepartment of Cognitive Science\nUniversity of California San Diego\n9500 Gilman Drive, La Jolla, CA 92093\nheh001@ucsd.edu\n\nAngela J. Yu\nDepartment of Cognitive Science\nUniversity of California San Diego\n9500 Gilman Drive, La Jolla, CA 92093\najyu@ucsd.edu\n\nAbstract\n\nHumans and animals readily utilize active sensing, or the use of self-motion, to focus sensory and cognitive resources on the behaviorally most relevant stimuli and events in the environment. Understanding the computational basis of natural active sensing is important both for advancing brain sciences and for developing more powerful artificial systems. Recently, we proposed a goal-directed, context-sensitive, Bayesian control strategy for active sensing, C-DAC (Context-Dependent Active Controller) (Ahmad & Yu, 2013). In contrast to previously proposed algorithms for human active vision, which tend to optimize abstract statistical objectives and therefore cannot adapt to changing behavioral context or task goals, C-DAC directly minimizes behavioral costs and thus automatically adapts itself to different task conditions. However, C-DAC is limited as a model of human active sensing, given its computational/representational requirements, especially for more complex, real-world situations. 
Here, we propose a myopic approximation to C-DAC, which also takes behavioral costs into account, but achieves a significant reduction in complexity by looking only one step ahead. We also present data from a human active visual search experiment, and compare the performance of the various models against human behavior. We find that C-DAC and its myopic variant both achieve better fit to human data than Infomax (Butko & Movellan, 2010), which maximizes expected cumulative future information gain. In summary, this work provides novel experimental results that differentiate theoretical models for human active sensing, as well as a novel active sensing algorithm that retains the context-sensitivity of the optimal controller while achieving significant computational savings.\n\n1 Introduction\n\nBoth artificial and natural sensing systems face the challenge of making sense out of a continuous stream of noisy sensory inputs. One critical tool the brain has at its disposal is active sensing, a goal-directed, context-sensitive control strategy that prioritizes sensing and processing resources toward the most rewarding or informative aspects of the environment (Yarbus, 1967). A formal understanding of active sensing is important not only for advancing neuroscientific progress but also for developing context-sensitive, interactive artificial agents.\n\nThe most well-studied aspect of human active sensing is saccadic eye movements. Early work suggested that saccades are attracted to salient targets that differ from their surround in one or more feature dimensions (Koch & Ullman, 1985; Itti & Koch, 2000); however, saliency has been found to account for only a small fraction of human saccadic eye movements (Itti, 2005). 
More recently, models of human active vision have incorporated top-down objectives, such as maximizing the expected future cumulative information gain (Infomax) (Lee & Yu, 2000; Itti & Baldi, 2006; Butko & Movellan, 2010), and maximizing the one-step look-ahead probability of finding the target (greedy MAP) (Najemnik & Geisler, 2005). However, these are generic statistical objectives that do not naturally adapt to behavioral context, such as changes in the relative cost of speed versus error, or the energetic or temporal cost associated with switching from one sensing location/configuration to another. We recently proposed the C-DAC (Context-Dependent Active Controller) algorithm (Ahmad & Yu, 2013), which maps Bayesian posterior beliefs about the environment into the action space while optimizing directly with respect to context-sensitive behavioral goals; C-DAC was shown to result in better accuracy and lower search time than Infomax and greedy MAP in various simulated task environments.\n\nIn this paper, we investigate whether human behavior is better explained by taking into account task-specific considerations, as in C-DAC, or whether it is sufficient to optimize a generic goal, like that of Infomax. We compare C-DAC and Infomax performance to human data, in terms of fixation choice and duration, from a visual search experiment. We exclude greedy MAP from this comparison, based on results from our recent work showing that it is an almost random, and thus highly suboptimal, strategy for the well-structured visual search task presented here.\n\nAt a theoretical level, both Infomax and C-DAC are offline algorithms involving iterative computation until convergence; they compute a global policy that specifies the optimal action (relative to their respective objectives) for every possible setting of previous actions and observations, most of which may not be used often or at all. 
Both of these algorithms suffer from the well-known curse of dimensionality, and are thus difficult, if not impossible, to generalize to more complex, real-world problems. Humans, by contrast, seem capable of planning and decision-making in very high-dimensional settings, while readily adapting to different behavioral contexts. It therefore behooves us to find a computationally inexpensive strategy that is nevertheless context-sensitive. Here, we consider an approximate algorithm that chooses actions online and myopically, by considering the behavioral cost of looking only one step ahead (instead of an infinite horizon, as in the optimal C-DAC policy).\n\nIn Sec. 2, we briefly summarize C-DAC and Infomax, and introduce the myopic approximation to C-DAC. In Sec. 3, we describe the experiment, present the human behavioral data, and compare the performance of the different models to the human data. In Sec. 4, we simulate scenarios where C-DAC and myopic C-DAC achieve a flexible trade-off between speed, accuracy and effort depending on the task demands, whereas Infomax falls short \u2013 this forms experimentally testable predictions for future investigations. We conclude in Sec. 5 with a discussion of the insights gained from both the experiment and the models, as well as directions for future work.\n\n2 The Models\n\nIn the following, we assume a basic active sensing scenario, which formally translates to a sequential decision-making process based on noisy inputs, where the observer can control both the sampling location and duration. 
For example, in a visual search task, the observer controls where to look, when to switch to a different sensing location, and when to stop searching and report the answer. Although the framework discussed below applies to a broad range of active sensing problems, we will use language specific to visual search for concreteness.\n\n2.1 C-DAC\n\nThis model consists of both an inference strategy and a control/decision strategy. For inference, we assume the observer starts with a prior belief over the latent variable (the true target location), and then updates her beliefs via Bayes' rule upon receiving each new observation. The observer maintains a probability distribution over the k possible target locations, representing the corresponding belief about the presence of the target in each location (the belief state). Thus, if s is the target location (latent), \u03bb^t := {\u03bb_1, ..., \u03bb_t} is the sequence of fixation locations up to time t (known), and x^t := {x_1, ..., x_t} is the sequence of observations up to time t (observed), the belief state and the belief update rule are:\n\n  p_t := (P(s=1 | x^t; \u03bb^t), ..., P(s=k | x^t; \u03bb^t))\n  p^i_t = P(s=i | x^t; \u03bb^t) \u221d p(x_t | s=i; \u03bb_t) P(s=i | x^{t-1}; \u03bb^{t-1}) = f_{s,\u03bb_t}(x_t) p^i_{t-1}    (1)\n\nwhere f_{s,\u03bb}(x_t) is the likelihood function, and p_0 the prior belief distribution over target location. For the decision component, C-DAC optimizes the mapping from the belief state to the action space (continue, switch to one of the other sensing locations, stop and report the target location) with respect to a behavioral cost function. 
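The belief update in eq. (1) reduces to a few lines of code. A minimal sketch, assuming the Bernoulli likelihood introduced later in eq. (7); the function name and NumPy vectorization are our own illustration, not the authors' implementation:

```python
import numpy as np

def belief_update(p, x, lam, beta):
    """One step of the Bayes update in eq. (1): p_t^i is proportional to
    f_{s,lambda_t}(x_t) * p_{t-1}^i, renormalized.

    p    : length-k prior belief over target locations
    x    : binary observation (more likely 1 when fixating the target)
    lam  : index of the currently fixated location
    beta : P(x=1 | fixating the target); 1-beta = P(x=1 | fixating a distractor)
    """
    k = len(p)
    # Likelihood of x under each hypothesis "target is at location i",
    # given fixation at location lam (the Bernoulli model of eq. 7).
    lik = np.where(np.arange(k) == lam,
                   beta**x * (1 - beta)**(1 - x),
                   (1 - beta)**x * beta**(1 - x))
    post = lik * np.asarray(p, dtype=float)
    return post / post.sum()
```

For example, starting from a uniform belief over three locations and observing x = 1 while fixating location 0 (with beta = 0.68), the belief on location 0 rises to 0.68/1.32 while the other two locations share the rest.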
If the target is at location s, and the observer declares it to be at location \u03b4 after spending \u03c4 units of time and making n_\u03c4 switches between potential target locations, then the total cost incurred is:\n\n  l(\u03c4, \u03b4; \u03bb^\u03c4, s) = c\u03c4 + c_s n_\u03c4 + 1_{\u03b4\u2260s}    (2)\n\nwhere c is the cost per unit time, c_s is the cost per switch, and the cost of making a wrong response is 1 (one of the costs can always be set to unity via normalization). For any given policy \u03c0 (a mapping from belief states to actions), the expected cost is L_\u03c0 := c E[\u03c4] + c_s E[n_\u03c4] + P(\u03b4 \u2260 s). At any time t, the observer can either choose to stop and declare one of the locations to be the target, or choose to continue and look at location \u03bb_{t+1}. Thus, the expected cost associated with stopping and declaring location i to be the target is:\n\n  \u00afQ^i_t(p_t, \u03bb_t) := E[l(t, i) | p_t, \u03bb_t] = ct + c_s n_t + (1 \u2212 p^i_t)    (3)\n\nand the minimum expected cost for continuing by sensing at location j is:\n\n  Q^j_t(p_t = p, \u03bb_t) := c(t+1) + c_s(n_t + 1_{j\u2260\u03bb_t}) + min_{\u03c4', \u03b4, \u03bb^{\u03c4'}} E[l(\u03c4', \u03b4) | p_0 = p, \u03bb_1 = j]    (4)\n\nThe value function V(p, i), the expected cost incurred by following the optimal policy \u03c0\u2217 starting with prior belief p_0 = p and initial observation location \u03bb_1 = i, is:\n\n  V(p, i) := min_{\u03c4, \u03b4, \u03bb^\u03c4} E[l(\u03c4, \u03b4) | p_0 = p, \u03bb_1 = i]    (5)\n\nThe value function then satisfies the following recursive relation (Bellman, 1952), and the action that minimizes the right-hand side is the optimal action \u03c0\u2217(p, k):\n\n  V(p, k) = min( min_i \u00afQ^i_1(p, k), min_j ( c + c_s 1_{j\u2260k} + E[V(p', j)] ) )    (6)\n\nThis can be solved using dynamic programming, or more specifically, value 
iteration, whereby we guess an initial value of the value function and iterate eq. 6 until convergence.\n\n2.2 Infomax policy\n\nInfomax (Butko & Movellan, 2010) uses a similar formulation in terms of belief state representation and Bayesian inference; for the control part, however, the goal is to maximize long-term information gain (or, equivalently, to minimize the cumulative future entropy of the posterior belief state). Thus, the action-values, value function, and resultant policy are:\n\n  Q^im(p_t, j) = \u2211_{t'=t+1}^{T} E[H(p_{t'}) | \u03bb_{t+1} = j];  V^im(p_t) = min_j Q^im(p_t, j);  \u03bb^im_{t+1} = argmin_j Q^im(p_t, j)\n\nInfomax does not directly prescribe when to stop, since there are only continuation actions and no stopping action. A general heuristic for such strategies is to stop when the confidence in one of the locations being the target (the belief about that location) exceeds a certain threshold, which is a free parameter that is challenging to set for any specific problem. In our recent work, we used an optimistic strategy for comparing Infomax with C-DAC, by giving Infomax a stopping boundary fit to the one computed by C-DAC. Here we present a novel theoretical result that gives an inner bound of the stopping region, obviating the need for a manual fit. The bound is sensitive to the sampling cost c and the signal-to-noise ratio of the sensory input, and underestimates the size of the stopping region.\n\nAssuming that the observations are binary and Bernoulli distributed (i.i.d. conditioned on target and fixation locations), i.e.:\n\n  f_{s,\u03bb}(x) = p(x | s=i; \u03bb=j) = 1_{i=j} \u03b2^x (1\u2212\u03b2)^{1\u2212x} + 1_{i\u2260j} (1\u2212\u03b2)^x \u03b2^{1\u2212x}    (7)\n\nwe can state the following result:\n\nTheorem 1. 
If p\u2217 is the solution of the equation\n\n  (2\u03b2 \u2212 1) p (1 \u2212 p) / (\u03b2p + (1 \u2212 \u03b2)(1 \u2212 p)) = c\n\nwhere c is the cost per unit time as defined in Sec. 2.1, then for all p^i > p\u2217, the optimal action is to stop and declare location i, under the cost formulation of C-DAC.\n\nProof. The cost incurred for collecting each new sample is c. Therefore, stopping is optimal when the improvement in belief from collecting another sample is less than the cost incurred to collect that sample. Formally, stopping and choosing i is optimal for the corresponding belief p^i = p when:\n\n  max_{p' \u2208 P} p' \u2212 p \u2264 c\n\nwhere P is the set of beliefs achievable from p with one more sample. Furthermore, if we solve the above relation at equality to find p\u2217, then by problem construction it is always optimal to stop for p > p\u2217 (stopping cost (1 \u2212 p) < (1 \u2212 p\u2217)). Given the likelihood function f_{s,\u03bb}(x) (eq. 7), we can use eq. 1 to simplify the above relation to:\n\n  (2\u03b2 \u2212 1) p (1 \u2212 p) / (\u03b2p + (1 \u2212 \u03b2)(1 \u2212 p)) = c\n\n2.3 Myopic C-DAC\n\nThis approximation attempts to optimize the contextual cost proposed in C-DAC, but only for one step into the future. In other words, planning is based on the inherent assumption that the next action is the last action permissible, so the goal is to minimize the cost incurred in this single step. The available actions are thus: stop and declare the current location to be the target, or choose another sensing location before stopping. Similar to eq. 
6, we can write the value function as:\n\n  V(p, k) = min( (1 \u2212 p^k), min_j ( c + c_s 1_{j\u2260k} + min_{l_j} (1 \u2212 E[p^{l_j}]) ) )    (8)\n\nwhere j indexes the possible sensing locations, and l_j indexes the possible stopping actions for sensing location j. Note that the value function computation does not involve any recursion, just a comparison between simple-to-compute action values for the different actions. For the visual search problem considered below, because the stopping action is restricted to the current sensing location, l_j = j, and the right-hand side simplifies to:\n\n  V(p, k) = min( (1 \u2212 p^k), min_j ( c + c_s 1_{j\u2260k} + (1 \u2212 E[p^j]) ) ) = min( (1 \u2212 p^k), min_j ( c + c_s 1_{j\u2260k} + (1 \u2212 p^j) ) )    (9)\n\nthe last equality holding because p is a martingale. It can be seen, therefore, that this myopic policy overestimates the size of the stopping region: if there is only one step left, it is never optimal to continue looking at the same location, since such an action would not improve expected accuracy, but would incur a unit time cost c. Therefore, in the simulations below, just as for Infomax, we set the stopping boundary for myopic C-DAC using the bound presented in Theorem 1.\n\n3 Case Study: Visual Search\n\nIn this section, we apply the different active sensing models discussed above to a simple visual search task, and compare their performance with the observed human behavior in terms of accuracy and fixation duration.\n\n3.1 Visual search experiment\n\nThe task involves finding a target (the patch with dots moving to the left) amongst two distractors (the patches with dots moving to the right), where a patch is a stimulus location possibly containing the target. 
The definition of target versus distractor is counterbalanced across subjects. Fig. 1 shows a schematic illustration of the task at three time points in a trial. The display is gaze-contingent, such that only the location currently fixated is visible on the screen, allowing exact measurement of where a subject obtains sensory input. At any time, the subject can declare the current fixation location to be the target by pressing the space bar. The target location for each trial is drawn independently from a fixed underlying distribution (1/13, 3/13, 9/13), with the spatial configuration fixed during a block and counterbalanced across blocks. As search behavior differed systematically only with the probability of a patch containing the target, and not with its actual location, we average data across all spatial configurations and differentiate the patches only by their prior likelihood of containing the target; we call them patch 1, patch 3, and patch 9, respectively. The study had 11 participants, each presented with 6 blocks (counterbalanced for the different likelihood assignments: 3! = 6), with each block consisting of 90 trials, for a total of 5940 trials. Subjects were rewarded points based on their performance: more if they got the answer correct (fewer if they got it wrong), with penalties for total search time and for the number of switches in sensing location.\n\nFigure 1: Simple visual search task, with gaze-contingent display.\n\n3.2 Comparison of Model Predictions and Behavioral Data\n\nIn the model, we assume binary observations (eq. 7), which are more likely to be 1 if the fixated location contains the target, and more likely to be 0 if it contains a distractor (the two probabilities sum to 1, since the left- and right-moving stimuli are statistically/perceptually symmetric). 
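The ingredients of the model simulations described in this section are compact enough to sketch in code: the Bernoulli observation model of eq. (7), the Theorem 1 stopping bound, and the one-step myopic rule of eq. (9). All names and the bisection solver below are our own illustrative choices, not the authors' implementation; following the text, stopping is governed by the Theorem 1 bound rather than the raw one-step stop rule, which overestimates the stopping region:

```python
import numpy as np

def sample_observation(s, lam, beta, rng):
    """Binary observation under eq. (7): Bernoulli(beta) while fixating the
    target (lam == s), Bernoulli(1 - beta) while fixating a distractor."""
    p_one = beta if lam == s else 1.0 - beta
    return int(rng.random() < p_one)

def theorem1_threshold(beta, c, tol=1e-12):
    """Solve (2b-1) p (1-p) / (b p + (1-b)(1-p)) = c for p* by bisection.
    The left-hand side is decreasing on (1/2, 1) and vanishes at p = 1, so a
    unique root exists whenever c is below its value at p = 1/2."""
    f = lambda p: (2*beta - 1) * p * (1 - p) / (beta*p + (1 - beta)*(1 - p))
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > c else (lo, mid)
    return 0.5 * (lo + hi)

def myopic_cdac_action(p, k, c, c_s, p_star):
    """One-step-lookahead action (eq. 9) from belief p and current fixation k.
    Returns 'stop' or the index of the next fixation location."""
    p = np.asarray(p, dtype=float)
    if p[k] > p_star:            # Theorem 1: inside the stopping region
        return 'stop'
    # one-step continuation cost for each candidate location j (eq. 9)
    cont = c + c_s * (np.arange(len(p)) != k) + (1 - p)
    return int(np.argmin(cont))
```

With (c, c_s, beta) = (0.005, 0.1, 0.68), for instance, the Theorem 1 bound comes out near p* = 0.99, and from belief (0.2, 0.5, 0.3) while fixating location 1 the myopic rule continues fixating location 1.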
We assume that, within a block of trials, subjects learn about the spatial distribution of target locations by inverting a Bayesian hidden Markov model, related to the Dynamic Belief Model (DBM) (Yu & Cohen, 2009). This implies that the target location on each trial is generated from a categorical distribution whose underlying rates at the three locations are, with probability \u03b1, the same as on the last trial, and, with probability 1 \u2212 \u03b1, redrawn from a prior Dirichlet distribution. Even though the target distribution is fixed within a block, we use DBM with \u03b1 = 0.8 to capture the general tendency of human subjects to rely more on recent observations than on distant ones in anticipating upcoming stimuli. We assume that subjects choose the first fixation location on each trial to be the option with the highest prior probability of containing the target. Subsequent fixation decisions are made by following a given control policy (C-DAC, Infomax or myopic C-DAC).\n\nWe investigate how well these policies explain the emergence of a certain confirmation bias in humans \u2013 the tendency to favor the more likely (privileged) location when making a decision about target location. We focus on this particular aspect of the behavioral data for two reasons: (1) the more obvious aspects (e.g. where each policy would choose to fixate first) are also the more trivial ones that all reasonable policies would display (e.g. fixating the most probable location first); (2) confirmation bias is a well-studied, psychologically important phenomenon exhibited by humans in a variety of choice and decision behaviors (see (Nickerson, 1998) for a review), and is therefore important to capture in its own right.\n\nFigure 2: Confirmation bias in human data and model simulations. The parameters used for the C-DAC policy are (c, c_s, \u03b2) = (0.005, 0.1, 0.68). 
The stopping thresholds for both Infomax and myopic C-DAC are set using the bound developed in Theorem 1. The spatial prior for each trial, used by all three algorithms, is produced by running DBM on the actual experimental stimulus sequences experienced by subjects. Units for fixation duration: milliseconds (experiment), number of time steps (simulations).\n\nBased on the experimental data (Fig. 2), we observe this bias in both fixation choice and duration. Subjects are more likely to identify the 9 patch as containing the target, whether it really is there (\u201chits\u201d, left column) or not (\u201cfalse alarms\u201d, middle column). This is not due to a potential motor bias (a tendency to assume the first fixation location contains the target, combined with first fixating the 9 patch most often), as we only consider trials where the subject first fixates the relevant patch. The confirmation bias is also apparent in fixation duration (right column): subjects fixate the 9 patch for a shorter time than the 1 and 3 patches when it is the target (as though faster to confirm), and longer when it is not the target (as though slower to be dissuaded). Again, only those trials where the first fixation landed on the relevant patch are included. As shown in Figure 2, these confirmation bias phenomena are captured by both C-DAC and myopic C-DAC, but not by Infomax.\n\nOur results show that human behavior is best modeled by a control strategy (C-DAC or myopic C-DAC) that takes into account behavioral costs, e.g. those related to time and switching. However, C-DAC in its original formulation is arguably not very psychologically plausible. This is because C-DAC requires using dynamic programming (recursing Bellman's optimality equation) offline to compute a globally optimal policy over the continuous belief state space, so that the discretized state space scales exponentially in the number of hypotheses. 
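To make this offline computation concrete, value iteration for the k = 3 case of eq. (6) can be sketched on a discretized belief simplex. The grid resolution, the nearest-grid-point lookup (standing in for interpolation), and all names below are our own illustrative choices under the Bernoulli model of eq. (7), not the authors' implementation:

```python
import numpy as np

def cdac_value_iteration(beta, c, c_s, m=21, max_iters=500, tol=1e-8):
    """Value-iterate eq. (6) for k = 3 on the belief simplex discretized at
    resolution 1/m. Returns (grid, V) with V[state, current_fixation].
    The grid has (m+1)(m+2)/2 points; for general k it grows as m^(k-1),
    i.e. exponentially in the number of hypotheses."""
    k = 3
    pts = [(a, b, m - a - b) for a in range(m + 1) for b in range(m + 1 - a)]
    grid = np.array(pts, dtype=float) / m
    idx = {q: i for i, q in enumerate(pts)}
    n = len(pts)

    def lookup(p):
        # nearest grid point: round, then repair the sum at the largest entry
        q = np.round(p * m).astype(int)
        q[np.argmax(q)] += m - q.sum()
        return idx[tuple(q)]

    # Precompute, for each (state i, fixation j, observation x in {0,1}):
    # the observation probability P[i,j,x] and successor state NXT[i,j,x].
    P = np.zeros((n, k, 2))
    NXT = np.zeros((n, k, 2), dtype=int)
    for i, p in enumerate(grid):
        for j in range(k):
            for x in (0, 1):
                lik = np.where(np.arange(k) == j,
                               beta**x * (1 - beta)**(1 - x),
                               (1 - beta)**x * beta**(1 - x))
                px = lik @ p
                P[i, j, x] = px
                NXT[i, j, x] = lookup(lik * p / px) if px > 0 else i

    stop = 1 - grid.max(axis=1)          # cost of stopping now: min_i (1 - p^i)
    V = np.tile(stop[:, None], (1, k))   # initial guess: stop everywhere
    for _ in range(max_iters):
        # E[V(p', j)] for each state and each continuation location j
        EV = (P * V[NXT, np.arange(k)[None, :, None]]).sum(axis=2)
        V_new = np.empty_like(V)
        for kk in range(k):
            cont = c + c_s * (np.arange(k) != kk) + EV   # shape (n, k)
            V_new[:, kk] = np.minimum(stop, cont.min(axis=1))
        if np.abs(V_new - V).max() < tol:
            return grid, V_new
        V = V_new
    return grid, V
```

Even at this coarse resolution the precomputation touches every grid point for every fixation and observation, which is what makes the exact offline solution impractical as the number of hypotheses grows.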
We have previously proposed families of parametric and non-parametric approximations, but these still involve large representations and recursive solutions. Myopic C-DAC, on the other hand, incurs just a constant cost to compute the policy online for only the current belief state; it is consequently more psychologically plausible, and provides a qualitative fit to the data with a simple threshold bound. We believe its performance can be improved by using a tighter bound to approximate the stopping region. Infomax, in contrast, is not context-sensitive, and our experiments suggest that even manually setting its threshold to match that of C-DAC does not lead to substantial improvement in performance (not shown).\n\n4 Model Predictions\n\nWith the addition of the parametric threshold to Infomax and myopic C-DAC, we find that the wide disparity we earlier observed between C-DAC and Infomax disappears for a large class of parameter settings, since the stopping boundary for Infomax is now also context-sensitive. A similar claim holds for myopic C-DAC. However, one scenario where Infomax does not catch up to the full context sensitivity of C-DAC is when the cost of switching from one sensing location to another comes into play. This is due to the rigid switching boundaries of Infomax. In contrast, myopic C-DAC can adjust its switching boundary depending on context. We illustrate this for the case (c, c_s, \u03b2) = (0.1, 0.1, 0.9) in Fig. 3.\n\nFigure 3: Different policies for the environment (c, c_s, \u03b2) = (0.1, 0.1, 0.9), as defined on the belief state (p^1, p^2), under an affine transform to preserve rotational symmetry. Blue: stop & declare. Green: fixate location 1. Orange: fixate location 2. Brown: fixate location 3.\n\nWe show in Fig. 4 how the differences in policy space translate to behavioral differences in terms of accuracy, search time, number of switches, and total behavioral cost (eq. 2). 
As with the previous results, we set the threshold using the bound developed in Theorem 1. Note that, as expected, the performance of Infomax and myopic C-DAC is closely matched on all measures for the case c_s = 0. The accuracy of C-DAC is poorer than that of the other two, because the threshold used for the other policies is more conservative (stopping and declaration thus happen at higher confidence, leading to higher accuracy), but C-DAC takes less time to reach a decision. Looking at the overall behavioral cost, we see that although C-DAC loses on accuracy, it makes up for it on the other measures, leading to a comparable net cost. For the case c_s = 0.1, we notice that accuracy and search time are relatively unchanged for all policies. However, C-DAC has a notable advantage in terms of the number of switches, while the number of switches remains unchanged for Infomax. This case exemplifies the context sensitivity of C-DAC and myopic C-DAC, as both reduce the number of switches when switching becomes costly. When all these costs are combined, we see that C-DAC incurs the minimum overall cost, followed by myopic C-DAC, with Infomax incurring the highest cost due to its lack of flexibility in a changed context. Thus myopic C-DAC, a very simple approximation to the computationally complex C-DAC policy, still retains context sensitivity, whereas Infomax, with complexity comparable to C-DAC, falls short.\n\nFigure 4: Comparison between C-DAC, Infomax and myopic C-DAC (MC-DAC) for two environments, (c, c_s, \u03b2) = (0.005, 0, 0.68) and (0.005, 0.1, 0.68). For c_s > 0, the performance of C-DAC is better than that of MC-DAC, which in turn is better than that of Infomax.\n\n5 Discussion\n\nIn this paper, we presented a novel visual search experiment that involves finding a target amongst a set of distractors differentiated only by their stimulus characteristics. 
We found that the fixation and choice behavior of subjects is modulated by top-down factors, specifically the likelihood of a particular location containing the target. This suggests that any purely bottom-up, saliency-based model would be unable to fully explain human behavior. Subjects were found to exhibit a certain confirmation bias \u2013 the tendency to systematically favor a location that is a priori judged more likely to contain the target over a location judged less likely, even in the face of identical sensory input and motor state. We showed that C-DAC, a context-sensitive policy we recently introduced, can reproduce this bias. In contrast, a policy that aims to optimize generic statistical objectives while ignoring behavioral constraints (e.g. the costs of time and switching), such as Infomax (Lee & Yu, 2000; Itti & Baldi, 2006; Butko & Movellan, 2010), falls short. We proposed a bound on the stopping threshold that allows us to set the decision boundary for Infomax by taking into account the time or sampling cost c, but this still does not sufficiently alleviate the context insensitivity of Infomax. This is most likely due to both a suboptimal incorporation of the sampling cost and an intrinsic lack of sensitivity to the switching cost, because there is no natural way to compare a unit of switching cost with a unit of information gain. To set the stage for future experimental research, we also presented a set of predictions for scenarios where we expect the various models to differ the most.\n\nWhile C-DAC does a good job of matching human behavior, at least on the behavioral metrics considered here, we note that this does not necessarily imply that the brain implements C-DAC exactly. In particular, solving C-DAC exactly using dynamic programming requires a representational complexity that scales exponentially with the dimensionality of the search problem (i.e. 
the number of possible target locations), making it an impractical solution for the more natural and complex problems faced daily by humans and animals. For this reason, we proposed a myopic approximation to C-DAC that scales linearly with search dimensionality, by eschewing a globally optimal solution that must be computed and maintained offline in favor of an online, approximately and locally optimal solution. This myopic C-DAC algorithm, by retaining context sensitivity, was nevertheless found to reproduce critical fixation choice and duration patterns seen in human behavior, such as the confirmation bias. However, exact C-DAC was still better than myopic C-DAC at reproducing human data, leaving room for finding other approximations that explain brain computations even better. One possibility is to find better approximations to the switching and stopping boundaries, since these together completely characterize any decision policy, and we previously showed that there might be a systematic, monotonic relationship between the decision boundaries and the different cost parameters (Ahmad & Yu, 2013). We proposed one such bound on the stopping boundary here, and other approximate bounds have been proposed for similar problems (Naghshvar & Javidi, 2012). Further investigations are needed to find more inexpensive, yet context-sensitive, active sensing policies, which would not only provide a better explanation of brain computations but also yield better practical algorithms for active sensing in engineering applications.\n\nReferences\n\nAhmad, S., & Yu, A. (2013). Active sensing as Bayes-optimal sequential decision-making. Uncertainty in Artificial Intelligence.\n\nBellman, R. (1952). On the theory of dynamic programming. PNAS, 38(8), 716-719.\n\nButko, N. J., & Movellan, J. R. (2010). Infomax control of eye movements. IEEE Transactions on Autonomous Mental Development, 2(2), 91-107.\n\nItti, L. (2005). 
Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Visual Cognition, 12(6), 1093-1123.\n\nItti, L., & Baldi, P. (2006). Bayesian surprise attracts human attention. In Advances in Neural Information Processing Systems (Vol. 19, pp. 1-8). Cambridge, MA: MIT Press.\n\nItti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10-12), 1489-1506.\n\nKoch, C., & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology.\n\nLee, T. S., & Yu, S. (2000). An information-theoretic framework for understanding saccadic behaviors. In Advances in Neural Information Processing Systems (Vol. 12). Cambridge, MA: MIT Press.\n\nNaghshvar, M., & Javidi, T. (2012). Active sequential hypothesis testing. arXiv preprint arXiv:1203.4626.\n\nNajemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434(7031), 387-391.\n\nNickerson, R. S. (1998). Confirmation bias: a ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175.\n\nYarbus, A. F. (1967). Eye Movements and Vision. New York: Plenum Press.\n\nYu, A. J., & Cohen, J. D. (2009). Sequential effects: Superstition or rational behavior? Advances in Neural Information Processing Systems, 21, 1873-1880.", "award": [], "sourceid": 1349, "authors": [{"given_name": "Sheeraz", "family_name": "Ahmad", "institution": "UC San Diego"}, {"given_name": "He", "family_name": "Huang", "institution": "UC San Diego"}, {"given_name": "Angela", "family_name": "Yu", "institution": "UC San Diego"}]}