Robert Peters, Laurent Itti
Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence. For example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model’s predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings.
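As a rough illustration of the pipeline outlined above, the following Python sketch scores how well a prediction map accounts for the observed gaze position on each frame, averages that score in windows around known events to obtain a temporal signature, and then slides the signature over a new trace as a simple detector. Everything here is an assumption made for illustration: the abstract does not specify the predictive-strength metric or the detector, so the sketch substitutes a z-score of the map value at the gaze location and a matched filter, and all function names (`predictive_strength`, `event_locked_average`, `detector_score`, `fused_score`) are hypothetical.

```python
import numpy as np

def predictive_strength(pred_map, gaze_xy):
    """Z-score of the map value at the gaze location.

    Assumed metric for illustration only; not taken from the source.
    """
    std = pred_map.std()
    if std == 0:
        return 0.0  # a flat map carries no information about gaze
    x, y = gaze_xy
    return float((pred_map[y, x] - pred_map.mean()) / std)

def event_locked_average(strengths, event_frames, half_window):
    """Average a per-frame strength trace in windows centered on events."""
    strengths = np.asarray(strengths, dtype=float)
    windows = [
        strengths[f - half_window : f + half_window + 1]
        for f in event_frames
        # skip events whose window would run off either end of the trace
        if f - half_window >= 0 and f + half_window < len(strengths)
    ]
    if not windows:
        return np.full(2 * half_window + 1, np.nan)
    return np.mean(windows, axis=0)

def detector_score(trace, signature):
    """Matched-filter score: slide a mean-subtracted signature over the trace."""
    trace = np.asarray(trace, dtype=float)
    signature = np.asarray(signature, dtype=float)
    return np.correlate(trace - trace.mean(),
                        signature - signature.mean(),
                        mode="same")

def fused_score(sal_trace, rel_trace, sal_signature, rel_signature):
    """Sum the salience and relevance detector scores (assumed fusion rule)."""
    return (detector_score(sal_trace, sal_signature)
            + detector_score(rel_trace, rel_signature))
```

In this sketch, frames where `fused_score` exceeds a chosen threshold would be flagged as candidate events. The threshold, the window length, and the additive fusion rule are free choices here, not values from the source.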