{"title": "Optimal cue selection strategy", "book": "Advances in Neural Information Processing Systems", "page_first": 987, "page_last": 994, "abstract": null, "full_text": "Optimal cue selection strategy\n\nVidhya Navalpakkam Department of Computer Science USC, Los Angeles navalpak@usc.edu\n\nLaurent Itti Department of Computer Science USC, Los Angeles itti@usc.edu\n\nAbstract\nSurvival in the natural world demands the selection of relevant visual cues to rapidly and reliably guide attention towards prey and predators in cluttered environments. We investigate whether our visual system selects cues that guide search in an optimal manner. We formally obtain the optimal cue selection strategy by maximizing the signal-to-noise ratio (SNR) between a search target and surrounding distractors. This optimal strategy successfully accounts for several phenomena in visual search behavior, including the effect of target-distractor discriminability, uncertainty in the target's features, distractor heterogeneity, and linear separability. Furthermore, the theory generates a new prediction, which we verify through psychophysical experiments with human subjects. Our results provide direct experimental evidence that humans select visual cues so as to maximize the SNR between targets and surrounding clutter.\n\n1 Introduction\nDetecting a yellow tiger among distracting foliage in different shades of yellow and brown requires efficient top-down strategies that select relevant visual cues to enable rapid and reliable detection of the target among several distractors. For simple scenarios such as searching for a red target, the Guided Search theory [17] predicts that search efficiency can be improved by boosting the red feature in a top-down manner.
But for more complex and natural scenarios, such as detecting a tiger in the jungle or looking for a face in a crowd, finding the optimal amount of top-down enhancement to apply to each low-level feature dimension encoded by the early visual system is non-trivial. Such a strategy must consider not only the features present in the target, but also those present in the distractors. In this paper, we formally obtain the optimal cue selection strategy and investigate whether our visual system has evolved to deploy it. In section 2, we formulate cue selection as an optimization problem whose goal is to maximize the signal-to-noise ratio (SNR) of the saliency map, so that the target becomes most salient and quickly draws attention, thereby minimizing search time. In section 3, we show through simulations that this optimal top-down guided search theory successfully accounts for several observed phenomena in visual search behavior, such as the effect of target-distractor discriminability, uncertainty in the target's features, distractor heterogeneity, linear separability, and more. In section 4, we describe the design and analysis of psychophysics experiments that test a new, counter-intuitive prediction of the theory. The results of our study suggest that humans deploy optimal cue selection strategies to detect targets in cluttered and distracting environments.\n\n2 Formalizing visual search as an optimization problem\nTo quickly find a target among distractors, we wish to maximize the salience of the target relative to the distractors. Thus we define the signal-to-noise ratio (SNR) as the ratio of the salience of the target to that of the distractors. Assuming that visual cues or features are encoded by populations of neurons in early visual areas, we define the optimal cue selection strategy as the choice of neural response gains that maximizes the SNR.
In the rest of this section, we formally obtain the choice of gains on neural responses that maximizes the SNR.\n\nSNR in a visual search paradigm: In a typical visual search paradigm, the salience of the target and of the distractors is a random variable that depends on their locations in the search array, their features, and the spatial configuration of target and distractors, and that varies between identical repeated trials due to internal noise in the neural response to the visual input. Hence, we express the SNR as the ratio of the expected salience of the target to the expected salience of the distractors, with the expectation taken over all possible target and distractor locations, their features and spatial configurations, and over several repeated trials:\n$$\\mathrm{SNR} = \\frac{\\text{mean salience of the target}}{\\text{mean salience of the distractors}}$$\n\nSearch array and its stimuli: Let search array $A$ be a two-dimensional display that consists of one target $T$ and several distractors $D_j$ ($j = 1, \\ldots, N^2-1$). Let the display be divided into an invisible $N \\times N$ grid, with one item occurring at each cell $(x, y)$ in the grid. Let the color, contrast, orientation and other target parameters $\\theta_T$ be chosen from a distribution $P(\\theta|T)$. Similarly, for each distractor $D_j$, let its parameters $\\theta_{D_j}$ be sampled independently from a distribution $P(\\theta|D)$. Thus, search array $A$ has a fixed choice of target and distractor parameters. Next, the spatial configuration $C$ is decided by a random permutation of some assignment of the target and distractors to the $N^2$ cells in $A$ (such that there is exactly one item in each cell). Thus, for a given search array $A$, both the spatial configuration and the stimulus parameters are fixed. Finally, given a choice of parameter $\\theta$ and its spatial location $(x, y)$, we generate an image pattern $R(\\theta)$ (a set of pixels and their values) and embed it at location $(x, y)$ in search array $A$. Thus, we generate search array $A$.
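The generative procedure above can be sketched in Python. This is a hypothetical illustration rather than the authors' stimulus code: the function make_search_array, the use of a single scalar parameter per item (for example a tilt in degrees), and the particular samplers standing in for P(θ|T) and P(θ|D) are assumptions, and the rendering of the image patterns R(θ) is omitted.

```python
import random


def make_search_array(n, sample_target, sample_distractor, rng):
    # Hypothetical helper: lay out one target and n*n - 1 distractors on an
    # n x n grid, one item per cell. sample_target and sample_distractor stand
    # in for draws from P(theta|T) and P(theta|D) and return a scalar parameter.
    items = [('T', sample_target(rng))]
    items += [('D', sample_distractor(rng)) for _ in range(n * n - 1)]
    # Spatial configuration C: a random permutation of the items over the cells.
    rng.shuffle(items)
    array_a = {}
    for k, item in enumerate(items):
        cell = divmod(k, n)   # (x, y) grid coordinates of the k-th cell
        array_a[cell] = item  # rendering of the pattern R(theta) is omitted
    return array_a


if __name__ == '__main__':
    rng = random.Random(0)
    array_a = make_search_array(
        4,
        sample_target=lambda r: r.gauss(55.0, 2.0),  # uncertain target tilt (assumed)
        sample_distractor=lambda r: 50.0,            # fixed distractor tilt (assumed)
        rng=rng,
    )
    print(sum(1 for kind, _ in array_a.values() if kind == 'D'))  # prints: 15
```

Fixing the generator seed fixes both the sampled parameters and the permutation C, which mirrors the statement that a given array A has a fixed configuration and fixed stimulus parameters.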
\n\nSaliency computation: Let the input search array $A$ be processed by a population of neurons with Gaussian tuning curves tuned to different stimulus parameters $\\theta_1, \\theta_2, \\ldots, \\theta_n$. The output of this early visual processing stage is used to compute saliency maps $s_i(x, y, A)$ of search array $A$, which give the visual salience at every location $(x, y)$ for feature value $\\theta_i$ ($i = 1, \\ldots, n$). Let the $s_i(x, y, A)$ be combined linearly to form $S(x, y, A)$, the overall salience at location $(x, y)$. Further, assuming a multiplicative gain $g_i$ on the $i$th saliency map, we obtain:\n$$S(x, y, A) = \\sum_{i=1}^{n} g_i\\, s_i(x, y, A) \\quad (1)$$\n\nSalience of the target and distractors: Let $S_T(A)$ be a random variable representing the salience of the target $T$ in search array $A$. To factor out the variability due to internal noise $\\eta$, we consider $E_\\eta[S_T(A)]$, the mean salience of the target over repeated identical presentations of $A$. Further, let $E_C[S_T(A)]$ be the mean salience of the target averaged over all spatial configurations of a given set of target and distractor parameters. Similarly, $E_{\\theta|T}[S_T(A)]$ is the mean salience of the target over all target parameters. The mean salience of the target combined over several repeated presentations of the search array $A$ (to factor out internal noise), over all spatial configurations $C$, and over all choices of target parameters $\\theta|T$ is given below. Since $\\theta$, $C$ and $\\eta$ are independent random variables, we can rewrite the joint expectation as nested expectations:\n$$E[S_T(A)] = E_{\\theta|T}[E_C[E_\\eta[S_T(A)]]] \\quad (2)$$\nLet $S_D(A)$ represent the mean salience of the distractors $D_j$ ($j = 1, \\ldots, N^2-1$) in search array $A$. As for the target, we average the salience of the distractors over all $\\eta$, $C$ and $\\theta|D$:\n$$S_D(A) = E_{D_j}[S_{D_j}(A)] \\quad (3)$$\n$$E[S_D(A)] = E_{\\theta|D}[E_C[E_\\eta[S_D(A)]]] \\quad (4)$$\n\nSNR and its optimization: The additive salience and multiplicative gain hypothesis in eqn.
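1 yields the following:\n$$E[S_T(A)] = \\sum_{i=1}^{n} g_i\\, E_{\\theta|T}[E_C[E_\\eta[s_{iT}(A)]]] \\quad (5)$$\n$$E[S_D(A)] = \\sum_{i=1}^{n} g_i\\, E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]] \\quad (6)$$\nThe SNR can then be expressed in terms of salience as:\n$$\\mathrm{SNR} = \\frac{\\sum_{i=1}^{n} g_i\\, E_{\\theta|T}[E_C[E_\\eta[s_{iT}(A)]]]}{\\sum_{i=1}^{n} g_i\\, E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]]} \\quad (7)$$\nWe wish to find the choice of $g_i$ that maximizes the SNR. Hence, we differentiate the SNR with respect to $g_i$ to get:\n$$\\frac{\\partial\\, \\mathrm{SNR}}{\\partial g_i} = \\left( \\frac{E_{\\theta|T}[E_C[E_\\eta[s_{iT}(A)]]]}{E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]]} - \\frac{\\sum_{j=1}^{n} g_j\\, E_{\\theta|T}[E_C[E_\\eta[s_{jT}(A)]]]}{\\sum_{j=1}^{n} g_j\\, E_{\\theta|D}[E_C[E_\\eta[s_{jD}(A)]]]} \\right) \\frac{E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]]}{\\sum_{j=1}^{n} g_j\\, E_{\\theta|D}[E_C[E_\\eta[s_{jD}(A)]]]} \\quad (8)$$\n$$= \\alpha_i \\left( \\frac{\\mathrm{SNR}_i}{\\mathrm{SNR}} - 1 \\right) \\quad (9)$$\nwhere $\\alpha_i = \\mathrm{SNR} \\cdot E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]] / \\sum_{j=1}^{n} g_j\\, E_{\\theta|D}[E_C[E_\\eta[s_{jD}(A)]]]$ is a positive normalization term and $\\mathrm{SNR}_i$ is the signal-to-noise ratio of the $i$th saliency map:\n$$\\mathrm{SNR}_i = \\frac{E_{\\theta|T}[E_C[E_\\eta[s_{iT}(A)]]]}{E_{\\theta|D}[E_C[E_\\eta[s_{iD}(A)]]]} \\quad (10)$$\nSince $\\alpha_i > 0$, the sign of the derivative $\\partial\\, \\mathrm{SNR} / \\partial g_i$ tells us whether $g_i$ should be increased, decreased, or maintained at the baseline activation 1 in order to maximize the SNR:\n$$\\frac{\\mathrm{SNR}_i}{\\mathrm{SNR}} < 1 \\;\\Rightarrow\\; \\frac{\\partial\\, \\mathrm{SNR}}{\\partial g_i} < 0 \\;\\Rightarrow\\; \\text{SNR increases as } g_i \\text{ decreases} \\;\\Rightarrow\\; g_i < 1 \\quad (11)$$\n$$\\frac{\\mathrm{SNR}_i}{\\mathrm{SNR}} = 1 \\;\\Rightarrow\\; \\frac{\\partial\\, \\mathrm{SNR}}{\\partial g_i} = 0 \\;\\Rightarrow\\; \\text{SNR does not change with } g_i \\;\\Rightarrow\\; g_i = 1 \\quad (12)$$\n$$\\frac{\\mathrm{SNR}_i}{\\mathrm{SNR}} > 1 \\;\\Rightarrow\\; \\frac{\\partial\\, \\mathrm{SNR}}{\\partial g_i} > 0 \\;\\Rightarrow\\; \\text{SNR increases as } g_i \\text{ increases} \\;\\Rightarrow\\; g_i > 1 \\quad (13)$$\nThus, we obtain the intuitive result that $g_i$ increases as $\\mathrm{SNR}_i / \\mathrm{SNR}$ increases. We simplify this monotonic relationship by assuming proportionality.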
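The sign rule in eqs. 9 to 13 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the lists target_mean and distractor_mean are hypothetical stand-ins for the per-map mean saliences that enter eq. 10, the helper names snr and derivative_signs are ours, and the derivative is estimated by central finite differences rather than taken from eq. 8 directly.

```python
def snr(gains, target_mean, distractor_mean):
    # Eq. (7): ratio of gain-weighted mean target and distractor salience.
    num = sum(g * t for g, t in zip(gains, target_mean))
    den = sum(g * d for g, d in zip(gains, distractor_mean))
    return num / den


def derivative_signs(gains, target_mean, distractor_mean, eps=1e-6):
    # For each map i, pair the sign of a finite-difference estimate of
    # dSNR/dg_i with the sign predicted by eq. (9), i.e. sign(SNR_i/SNR - 1).
    base = snr(gains, target_mean, distractor_mean)
    rows = []
    for i in range(len(gains)):
        up = list(gains)
        up[i] += eps
        down = list(gains)
        down[i] -= eps
        numeric = (snr(up, target_mean, distractor_mean)
                   - snr(down, target_mean, distractor_mean)) / (2 * eps)
        predicted = (target_mean[i] / distractor_mean[i]) / base - 1.0
        rows.append((numeric > 0, predicted > 0))
    return rows


if __name__ == '__main__':
    # Hypothetical mean saliences for n = 3 maps (not data from the paper).
    target_mean = [2.0, 1.0, 0.5]
    distractor_mean = [1.0, 1.0, 1.0]
    print(derivative_signs([1.0, 1.0, 1.0], target_mean, distractor_mean))
    # prints: [(True, True), (False, False), (False, False)]
```

For these toy values the SNR with unit gains is 3.5/3, so only the first map (SNR_1 = 2) has SNR_i/SNR > 1; the finite-difference and predicted signs agree for every map, so only that map would be boosted above baseline.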
Further, if we impose the restriction that the gains cannot be increased indiscriminately but must sum to some constant, say the total number of saliency maps $n$, we have:\n$$\\text{let } g_i \\propto \\frac{\\mathrm{SNR}_i}{\\mathrm{SNR}} \\quad (14)$$\n$$\\text{if } \\sum_{i=1}^{n} g_i = n, \\text{ then } g_i = \\frac{\\mathrm{SNR}_i}{\\frac{1}{n}\\sum_{j=1}^{n} \\mathrm{SNR}_j} \\quad (15)$$\nThus the gain of a saliency map tuned to a band of feature values depends on the signal-to-noise ratio in that band compared to the mean signal-to-noise ratio across all bands in that feature dimension.\n\n3 Predictions of the optimal cue selection strategy\nTo understand the implications of biasing features according to the optimal cue selection strategy, we simulate a simple model of early visual cortex. We assume that each feature dimension is encoded by a population of neurons with overlapping Gaussian tuning curves that are broadly tuned to different features in that dimension. Let $f_i(\\theta)$ represent the tuning curve of the $i$th neuron in this population. Let the tuning width $\\sigma$ and amplitude $a$ be equal for all neurons, and let $\\theta_i$ represent the preferred stimulus parameter (or feature) of the $i$th neuron:\n$$f_i(\\theta) = a \\exp\\left( -\\frac{(\\theta - \\theta_i)^2}{2\\sigma^2} \\right) \\quad (16)$$\nLet $\\mathbf{r}(\\theta(x, y, A)) = \\{r_1(\\theta(x, y, A)), \\ldots, r_n(\\theta(x, y, A))\\}$ be the population response to a stimulus parameter $\\theta(x, y, A)$ at a location $(x, y)$ in search array $A$, where $r_i$ refers to the response of the $i$th neuron and $n$ is the total number of neurons in the population. Let the neural response $r_i(\\theta(x, y, A))$ be a Poisson random variable with mean $f_i(\\theta(x, y, A))$:\n$$P(r_i(\\theta(x, y, A)) = z) = \\mathcal{P}_{f_i(\\theta(x, y, A))}(z) \\quad (17)$$\nwhere $\\mathcal{P}_\\lambda$ denotes the Poisson distribution with mean $\\lambda$. For simplicity, we assume that the local neural response $r_i(\\theta(x, y, A))$ is a measure of salience $s_i(x, y, A)$. Using eqns. 2, 4, 10, 16 and 17, we can derive the mean salience of the target and distractors, and use it to compute $\\mathrm{SNR}_i$.
$$s_i(x, y, A) = r_i(\\theta(x, y, A)) \\quad (18)$$\n$$E[s_{iT}(A)] = E_{\\theta|T}[f_i(\\theta)] \\quad (19)$$\n$$E[s_{iD}(A)] = E_{\\theta|D}[f_i(\\theta)] \\quad (20)$$\n$$\\mathrm{SNR}_i = \\frac{E_{\\theta|T}[f_i(\\theta)]}{E_{\\theta|D}[f_i(\\theta)]} \\quad (21)$$\n\nFinally, the gains $g_i$ on each saliency map can be found using eqn. 15. Thus, for a given distribution of stimulus parameters for the target, $P(\\theta|T)$, and for the distractors, $P(\\theta|D)$, we simulate the above model of early visual cortex, compute the salience of the target and distractors, compute $\\mathrm{SNR}_i$ and obtain $g_i$. In the rest of this section, we plot the optimal gains $g_i$ for an exhaustive list of conditions in which knowledge of the target and distractors varies from complete certainty to uncertainty.\n\nUnknown target and distractors: In the trivial case where there is no knowledge of the target and distractors, all cues are equally relevant and the optimal choice of gains is the same as the baseline activation (unity). The SNR is then minimal, leading to a slow search. This prediction is consistent with visual search experiments that observe slow search when the target and distractors are unknown due to their reversal between trials [1, 2].\n\nSearch for a known target: During search for a known target, the optimal strategy predicts that the SNR can be maximized by boosting neurons according to how strongly they respond to the target feature (as shown in figure 1, the predicted SNR is 12.2 dB). Thus, a neuron that is optimally tuned to the target feature receives maximal gain. This prediction is consistent with single-unit recordings on feature-based attention, which show that the gain in neural response depends on the similarity between the neuron's preferred feature and the target feature [3, 4].
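The simulation recipe above (eqs. 16, 21 and 15) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' simulation: the discretized orientation range of 0 to 90 degrees, the tuning width of 10, the unit amplitude, and the helper names tuning, expected_response and optimal_gains are all assumptions. Because the mean of a Poisson response equals its rate, the expectations in eqs. 19 and 20 reduce to averaging f_i over P(θ|T) and P(θ|D).

```python
import math


def tuning(theta, pref, amp=1.0, width=10.0):
    # Gaussian tuning curve f_i(theta) of eq. (16); amp and width are assumed values.
    return amp * math.exp(-((theta - pref) ** 2) / (2.0 * width ** 2))


def expected_response(pref, thetas, probs):
    # E_{theta|X}[f_i(theta)] for a discrete P(theta|X) given as parallel lists.
    # Under the Poisson model of eq. (17) the mean response equals f_i(theta).
    return sum(p * tuning(t, pref) for t, p in zip(thetas, probs))


def optimal_gains(prefs, target_dist, distractor_dist):
    # SNR_i as in eq. (21), then gains normalized to sum to n as in eq. (15).
    snr = [expected_response(p, *target_dist) / expected_response(p, *distractor_dist)
           for p in prefs]
    mean_snr = sum(snr) / len(snr)
    return [s / mean_snr for s in snr]


if __name__ == '__main__':
    prefs = list(range(0, 91))                    # assumed preferred tilts, 0..90 deg
    grid = [float(t) for t in range(0, 91)]
    flat = (grid, [1.0 / len(grid)] * len(grid))  # unknown distractor: flat P(theta|D)
    known_target = ([45.0], [1.0])                # known target at 45 deg (assumed)
    gains = optimal_gains(prefs, known_target, flat)
    best = prefs[max(range(len(gains)), key=gains.__getitem__)]
    print(best)                                   # prints: 45
    print(optimal_gains(prefs, flat, flat)[:3])   # prints: [1.0, 1.0, 1.0]
```

With a known target at 45 degrees among distractors of unknown (uniformly distributed) orientation, the largest gain falls on the neuron preferring 45 degrees, as in the known-target case; setting P(θ|T) equal to P(θ|D) returns unit gains, as in the case of an unknown target and distractors.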
\n\nRole of uncertainty in target features: When there is uncertainty in the target's features, i.e., when the target's parameter assumes multiple values according to some probability distribution $P(\\theta|T)$, the optimal strategy predicts that the SNR decreases, leading to a slower search (as shown in figure 1, the SNR decreases from 12.2 dB to 9 dB). This result is consistent with psychophysics experiments suggesting that better knowledge of the target leads to faster search [5, 6].\n\nDistractor heterogeneity: While searching for an unknown target among known distractors, the optimal strategy predicts that the SNR can be maximized by suppressing the neurons tuned to the distractors (see figure 1). But as we increase distractor heterogeneity, or the number of distractor types, it predicts a decrease in SNR (from 36 dB to 17 dB, figure 1). This result is consistent with experimental data [10].\n\nDiscriminability between target and distractors: Several experiments and theories have studied the effect of target-distractor discriminability [10]-[17]. The optimal cue selection strategy also shows that if the target and distractors are very different, or highly discriminable, the SNR is high and the search is efficient (SNR = 51.4 dB, see figure 1). Otherwise, if they are similar and not well separated in feature space, the SNR is low and the search is hard (SNR = 16.3 dB, see figure 1). Moreover, during search for a target that is less discriminable from the distractors, the optimal strategy predicts that the neuron optimally tuned to the target may not be boosted maximally. Instead, a neuron that is sub-optimally tuned to the target and farther away from the distractors receives maximal gain. This new and counter-intuitive prediction is tested by the visual search experiments described in the next section.
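The same kind of sketch can illustrate the counter-intuitive prediction for the 55° target among 50° distractors used in the next section. Two ingredients are assumptions of this illustration, not taken from the paper: a tuning width of 5° and a small hypothetical baseline rate added to eq. 16. Without such a baseline, the ratio of two equal-width Gaussians, f_i(55)/f_i(50), grows monotonically with θ_i, so the peak gain would simply sit at the edge of any finite simulated population.

```python
import math


def tuning(theta, pref, width=5.0, baseline=0.1):
    # Gaussian tuning curve of eq. (16) with unit amplitude, plus a hypothetical
    # baseline firing rate (an assumption added for this illustration only).
    return baseline + math.exp(-((theta - pref) ** 2) / (2.0 * width ** 2))


def optimal_gains(prefs, target, distractor):
    # Known target and known homogeneous distractor: SNR_i = f_i(T) / f_i(D),
    # as in eq. (21); gains are normalized to sum to n as in eq. (15).
    snr = [tuning(target, p) / tuning(distractor, p) for p in prefs]
    mean_snr = sum(snr) / len(snr)
    return [s / mean_snr for s in snr]


if __name__ == '__main__':
    prefs = list(range(0, 91))  # hypothetical population, preferred 0..90 deg
    gains = optimal_gains(prefs, target=55.0, distractor=50.0)
    best = prefs[max(range(len(gains)), key=gains.__getitem__)]
    # The peak lies beyond 55 deg, on the side away from the 50 deg distractor.
    print(best, gains[best] > gains[55])  # prints: 61 True
```

With these assumed parameters the largest gain falls on the neuron preferring 61°, not on the neuron tuned exactly to the 55° target; the exact location of the peak depends on the assumed width and baseline.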
\n\nLinear separability effect: The optimal strategy also predicts the linear separability effect [18, 19], which suggests that when the target and distractors are less discriminable, search is easier if the target and distractors can be separated by a line in feature space (see figure 1). This effect has been demonstrated in size (e.g., search for the smallest or largest item is faster than search for a medium-sized item in the display) [20], chromaticity and luminance [21, 19], and orientation [22, 23].\n\n4 Testing new predictions of the optimal cue selection strategy\nIn this section, we describe the design and analysis of psychophysics experiments to verify the counter-intuitive prediction mentioned in the previous section, i.e., that during search for a target that is less discriminable from the distractors, a neuron that is sub-optimally tuned to the target's feature will be boosted more than a neuron that is optimally tuned to the target's feature.\n\n4.1 Design of psychophysics experiments\nOur experiments are designed in two phases: phase 1 sets up the top-down bias and phase 2 measures it.\n\nPhase 1 - Set up the top-down bias: Subjects perform the primary task T1, which is a visual search for the target among distractors. This task sets the top-down bias on cues so that the target becomes the most salient item in the display, thus accelerating target detection. Subjects are trained on T1 trials until their performance stabilizes with at least 80% accuracy. They are instructed to find the target (55° tilt) among several distractors (50° tilt). The target and distractors are the same for all T1 trials. To avoid false reports (which may occur due to boredom or lack of attention) and to verify that subjects indeed find the target, we introduce a novel no-cheat scheme as follows: after finding the target among distractors, subjects press any key.
Following the key press, we briefly (120 ms) flash a grid of fine-print random numbers and ask subjects to report the number at the target's location. Online feedback on the accuracy of the report is provided. Thus, the top-down bias is set up by performing T1 trials.\n\n[Figure 1 panels a) to h), each in three columns. Left: prior probability P(θ|T) and P(θ|D) versus stimulus parameter θ. Middle: mean response (firing rate) of the population to T and D versus neuron's preferred θ. Right: optimal response gain versus neuron's preferred θ.]\nFigure 1: a) Search for a known target. Left: the prior knowledge P(θ|T) has a peak at the known target feature, and P(θ|D) is flat because the distractor is unknown. Middle: the expected responses of a population of neurons to the target are highest for neurons tuned around the target's θ, while the expected response to the distractors is flat. Right: the optimal response gain in this situation is to boost the gain of the neurons tuned around the target's θ. b) Search for an uncertain target; c) unknown target among a known distractor; d) presence of heterogeneous distractors; e) high discriminability between target and distractors; f) low discriminability; g) search for an extreme feature (linearly separable) among others; h) search for a mid feature (nonlinearly separable) among others.\n\n[Figure 2: bar plots of the number of reports per cue presented (80°, 60°, 55°, 50°) for Subjects 1 to 4; asterisks mark significant differences.]\nFigure 2: The results of the T2 trials described in section 4.1 (phase 2). For each of the four subjects, the number of reports on the steepest (80°), relevant (60°), target (55°) and distractor (50°) cues is shown in these bar plots.
As predicted by the theory, a paired t-test reveals that the number of reports on the relevant cue is significantly higher (p < 0.05) than the number of reports on the target, distractor and steepest cues, as indicated by the blue star.\n\nPhase 2 - Measure the top-down bias: To measure the top-down bias generated by the above task, we randomly insert T2 trials in between T1 trials. Our theory predicts that during search for the target (55°) among distractors (50°), the most relevant cue will be around 60° and not 55°. To test this, we briefly (200 ms) flash four cues: steepest (S, 80°), relevant as predicted by our theory (R, 60°), target (T, 55°) and distractor (D, 50°). A cue that is biased more appears more salient, attracts a saccade, and gets reported. In other words, the greater the top-down bias on a cue, the higher the number of its reports. According to our theory, there should be a higher number of reports on R than on T.\n\nExperimental details: We ran 4 naïve subjects. All were aged 22-30, had normal or corrected-to-normal vision, and volunteered or participated for course credit. As mentioned earlier, each subject received training on T1 trials for a few days until their performance (search speed) stabilized with at least 80% accuracy. To become familiar with the secondary task, they were trained on 50 T2 trials. Finally, each subject performed 10 blocks of 50 trials each, with T2 trials randomly inserted in between T1 trials.\n\n4.2 Results\nFor each of the four subjects, we extracted the number of reports on the steepest ($N_S$), relevant ($N_R$), target ($N_T$) and distractor ($N_D$) cues, for each block. We used a paired t-test to check for statistically significant differences between $N_R$ and each of $N_T$, $N_D$ and $N_S$. Results are shown in figure 2.
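The per-block comparison described above can be sketched as a paired t statistic. This is a minimal illustration, not the analysis code used in the study: the function name paired_t is hypothetical, the p-value lookup is left out because the Python standard library has no Student t distribution, and the toy inputs in the usage lines are invented for illustration only (they are not the experimental counts).

```python
import math


def paired_t(a, b):
    # Paired t statistic for matched samples, e.g. per-block report counts
    # N_R and N_T over the same blocks; returns (t, degrees of freedom).
    if len(a) != len(b) or len(a) < 2:
        raise ValueError('need two equal-length samples with at least 2 pairs')
    diffs = [x - y for x, y in zip(a, b)]
    m = len(diffs)
    mean_d = sum(diffs) / m
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (m - 1)
    if var_d == 0:
        raise ValueError('differences have zero variance')
    return mean_d / math.sqrt(var_d / m), m - 1


if __name__ == '__main__':
    # Toy numbers only, not data from the experiment.
    t, df = paired_t([3, 4, 5], [1, 2, 2])
    print(round(t, 6), df)
```

With 10 blocks per subject there are 9 degrees of freedom, and a two-sided test at p < 0.05 then requires |t| > 2.262.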
As predicted by the theory, we found a significantly higher number of reports on the relevant cue than on the target cue.\n\n5 Discussion\nIn this paper, we have investigated whether our visual system has evolved to use optimal top-down strategies that select relevant cues to quickly and reliably detect the target in distracting environments. We formally obtained the optimal cue selection strategy, where cues are chosen such that the signal-to-noise ratio (SNR) of the saliency map is maximized, thus maximizing the target's salience relative to the distractors. The resulting optimal strategy is to boost a cue or feature if it provides a higher signal-to-noise ratio than average. Through simulations, we showed that the predictions of the optimal strategy agree with existing experimental data on visual search behavior, including the effect of distractor heterogeneity [10], uncertainty in the target's features [5, 6], target-distractor discriminability [10], and the linear separability effect [18, 19]. Our study complements the recent work on optimal eye movement strategies [24]. While we focus on an early stage of visual processing (optimal cue selection to create a saliency map with maximum SNR), their study focuses on a later stage (optimal saccade generation, such that for a given saliency map the probability of subsequent target detection is maximized). Thus, both optimal cue selection and optimal saccade generation are necessary for optimal visual search.\n\nAcknowledgements\nThis work was supported by the National Science Foundation, National Eye Institute, National Imagery and Mapping Agency, Zumberge Innovation Fund, and Charles Lee Powell Foundation.\n\nReferences\n[1] V. Maljkovic and K. Nakayama. Mem Cognit, 22(6):657-672, Nov 1994.\n[2] J. M. Wolfe, S. J. Butcher, and M. Hyle. J Exp Psychol Hum Percept Perform, 29(2):483-502, 2003.\n[3] S. Treue and J. C. Martinez-Trujillo. Nature, 399(6736):575-579, Jun 1999.\n[4] J. C. Martinez-Trujillo and S.
Treue. Curr Biol, 14(9):744-751, May 2004.\n[5] J. M. Wolfe, T. S. Horowitz, N. Kenner, M. Hyle, and N. Vasan. Vision Res, 44(12):1411-1426, Jun 2004.\n[6] T. J. Vickery, L.-W. King, and Y. Jiang. J Vis, 5(1):81-92, Feb 2005.\n[7] A. Treisman and J. Souther. Journal of Experimental Psychology: Human Perception and Performance, 14:107-141, 1986.\n[8] A. Treisman and S. Gormican. Psychological Review, 95(1):15-48, 1988.\n[9] R. Rosenholtz. Percept Psychophys, 63(3):476-489, Apr 2001.\n[10] J. Duncan and G. W. Humphreys. Psychological Review, 96:433-458, 1989.\n[11] A. L. Nagy and R. R. Sanchez. Journal of the Optical Society of America A, 7(7):1209-1217, 1990.\n[12] H. Pashler. Percept Psychophys, 41(4):385-392, Apr 1987.\n[13] K. Rayner and D. L. Fisher. Percept Psychophys, 42(1):87-100, Jul 1987.\n[14] A. Treisman. J Exp Psychol Hum Percept Perform, 17(3):652-676, Aug 1991.\n[15] J. Palmer, P. Verghese, and M. Pavel. Vision Res, 40(10-12):1227-1268, 2000.\n[16] J. M. Wolfe, K. R. Cave, and S. L. Franzel. J Exp Psychol, 15:419-433, 1989.\n[17] J. M. Wolfe. Psychonomic Bulletin and Review, 1(2):202-238, 1994.\n[18] M. D'Zmura. Vision Research, 31(6):951-966, 1991.\n[19] B. Bauer, P. Jolicoeur, and W. B. Cowan. Vision Research, 36(10):1439-1465, 1996.\n[20] A. Treisman and G. Gelade. Cognitive Psychology, 12:97-136, 1980.\n[21] B. Bauer, P. Jolicoeur, and W. B. Cowan. Vision Res, 36(10):1439-1465, May 1996.\n[22] J. M. Wolfe, S. R. Friedman-Hill, M. I. Stewart, and K. M. O'Connell. J Exp Psychol Hum Percept Perform, 18(1):34-49, Feb 1992.\n[23] W. F. Alkhateeb, R. J. Morris, and K. H. Ruddock. Spat Vis, 5(2):129-141, 1990.\n[24] J. Najemnik and W. S. Geisler. Nature, 434(7031):387-391, Mar 2005.\n", "award": [], "sourceid": 2802, "authors": [{"given_name": "Vidhya", "family_name": "Navalpakkam", "institution": null}, {"given_name": "Laurent", "family_name": "Itti", "institution": null}]}