Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Alan A. Stocker, Eero Simoncelli
It has been demonstrated that basic aspects of human visual motion perception are qualitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, and that the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, and approximately follows a power-law function.
Humans do not perceive visual motion veridically. Various psychophysical experiments have shown that the perceived speed of visual stimuli is affected by stimulus contrast, with low contrast stimuli being perceived to move slower than high contrast ones [1, 2]. Computational models have been suggested that can qualitatively explain these perceptual effects. Commonly, they assume the perception of visual motion to be optimal either within a deterministic framework with a regularization constraint that biases the solution toward zero motion [3, 4], or within a probabilistic framework of Bayesian estimation with a prior that favors slow velocities [5, 6].
The solutions resulting from these two frameworks are similar (and in some cases identical), but the probabilistic framework provides a more principled formulation of the problem in terms of meaningful probabilistic components. Specifically, Bayesian approaches rely on a likelihood function that expresses the relationship between the noisy measurements and the quantity to be estimated, and a prior distribution that expresses the probability of encountering any particular value of that quantity. A probabilistic model can also provide a richer description, by defining a full probability density over the set of possible "percepts", rather than just a single value. Numerous analyses of psychophysical experiments have made use of such distributions within the framework of signal detection theory in order to model perceptual behavior.
Figure 1: Bayesian model of visual speed perception (panels: a, high contrast; b, low contrast; axes: probability density versus visual speed; curves: prior, likelihood, posterior). a) For a high contrast stimulus, the likelihood has a narrow width (a high signal-to-noise ratio) and the prior induces only a small shift of the mean v̂ of the posterior. b) For a low contrast stimulus, the measurement is noisy, leading to a wider likelihood. The shift is much larger and the perceived speed lower than under condition (a).
Previous work has shown that an ideal Bayesian observer model based on Gaussian forms for both likelihood and prior is sufficient to capture the basic qualitative features of global translational motion perception [5, 6]. But the behavior of the resulting model deviates systematically from human perceptual data, most importantly with regard to trial-to-trial variability and the precise form of interaction between contrast and perceived speed. A recent article achieved better fits for the model under the assumption that human contrast perception saturates. In order to advance the theory of Bayesian perception and provide significant constraints on models of neural implementation, it seems essential to quantitatively constrain both the likelihood function and the prior probability distribution. In previous work, the proposed likelihood functions were derived from the brightness constancy constraint [5, 6] or other generative principles. Also, previous approaches defined the prior distribution based on general assumptions and computational convenience, typically choosing a Gaussian with zero mean, although a Laplacian prior has also been suggested. In this paper, we develop a more general form of Bayesian model for speed perception that can account for trial-to-trial variability. We use psychophysical speed discrimination data in order to constrain both the likelihood and the prior function.
1 Probabilistic Model of Visual Speed Perception
1.1 Ideal Bayesian Observer
Assume that an observer wants to obtain an estimate of a variable v based on a measurement m that she/he performs. A Bayesian observer "knows" that the measurement device is not ideal and that the measurement m is therefore affected by noise. Hence, this observer combines the information gained from the measurement m with a priori knowledge about v. Doing so (and assuming that the prior knowledge is valid), the observer will on average estimate v more accurately than by trusting the measurement m alone. According to Bayes' rule,
$$p(v \mid m) = \frac{1}{\alpha}\, p(m \mid v)\, p(v) \qquad (1)$$
the probability of perceiving v given m (posterior) is the product of the likelihood of v for a particular measurement m and the a priori knowledge about the estimated variable v (prior). Here α is a normalization constant independent of v that ensures that the posterior is a proper probability distribution.
Figure 2: 2AFC speed discrimination experiment (panel b axes: P(v̂2 > v̂1) from 0 to 1 versus v2, with vmatch marked at Pcum = 0.5 and vthresh at Pcum = 0.875). a) Two patches of drifting gratings were displayed simultaneously (motion without movement). The subject was asked to fixate the center cross and decide after the presentation which of the two gratings was moving faster. b) A typical psychometric curve obtained under such a paradigm. The dots represent the empirical probability that the subject perceived stimulus2 moving faster than stimulus1. The speed of stimulus1 was fixed while v2 was varied. The point of subjective equality, vmatch, is the value of v2 for which Pcum = 0.5. The threshold velocity vthresh is the velocity for which Pcum = 0.875.
It is important to note that the measurement m is an internal variable of the observer and is not necessarily represented in the same space as v. The likelihood embodies both the mapping from v to m and the noise in this mapping. For now, we assume that there is a monotonic function f(v): v → vm that maps v into the same space as m (m-space). Doing so allows us to treat m and vm analytically in the same space. We will later propose a suitable form of the mapping function f(v).
An ideal Bayesian observer selects the estimate that minimizes the expected loss, given the posterior and a loss function. We assume a least-squares loss function. Then, the optimal estimate v̂ is the mean of the posterior in Equation (1). It is easy to see why this model of a Bayesian observer is consistent with the fact that perceived speed decreases with decreasing contrast. The width of the likelihood varies inversely with the accuracy of the measurements performed by the observer, which presumably decreases with decreasing contrast due to a decreasing signal-to-noise ratio. As illustrated in Figure 1, the shift in perceived speed towards slow velocities grows with the width of the likelihood, and thus a Bayesian model can qualitatively explain the psychophysical results.
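The dependence of the posterior mean on likelihood width can be checked numerically. The following is a minimal sketch, assuming Gaussian forms for both likelihood and prior (the simple case illustrated in Figure 1, not the refined model developed in this paper). The function names, the measurement value, and the widths are illustrative assumptions. Under these assumptions the posterior mean also has the closed form m σp² / (σp² + σl²), which the grid result should reproduce.

```python
import math


def gaussian(x, mean, sd):
    """Gaussian density evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_mean(m, likelihood_sd, prior_sd, lo=-60.0, hi=60.0, n=24001):
    """Mean of p(v|m) from Equation (1), integrated on a regular grid.

    Assumptions (not from the paper): Gaussian likelihood centered on v,
    zero-mean Gaussian prior, integration range [lo, hi].
    """
    step = (hi - lo) / (n - 1)
    weighted_sum = 0.0
    normalizer = 0.0  # plays the role of the normalization constant
    for i in range(n):
        v = lo + i * step
        w = gaussian(m, v, likelihood_sd) * gaussian(v, 0.0, prior_sd)
        weighted_sum += v * w
        normalizer += w
    return weighted_sum / normalizer


m = 8.0          # hypothetical measurement (deg/sec)
prior_sd = 6.0   # hypothetical prior width
high = posterior_mean(m, likelihood_sd=1.0, prior_sd=prior_sd)  # narrow likelihood
low = posterior_mean(m, likelihood_sd=4.0, prior_sd=prior_sd)   # wide likelihood
print(high, low)
```

With the wider likelihood, the estimate is pulled further toward zero, which is the qualitative contrast effect described above.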
1.2 Two Alternative Forced Choice Experiment
We would like to examine perceived speeds under a wide range of conditions in order to constrain a Bayesian model. Unfortunately, perceived speed is an internal variable, and it is not obvious how to design an experiment that would allow subjects to express it directly (1). Perceived speed can only be accessed indirectly by asking the subject to compare the speed of two stimuli. For a given trial, an ideal Bayesian observer in such a two-alternative forced choice (2AFC) experimental paradigm simply decides on the basis of the two trial estimates v̂1 (stimulus1) and v̂2 (stimulus2) which stimulus moves faster. Each estimate v̂ is based on a particular measurement m. For a given stimulus with speed v, an ideal Bayesian observer will produce a distribution of estimates p(v̂|v) because m is noisy. Over trials, the observer's behavior can be described by classical signal detection theory based on the distributions of the estimates. For example, the probability of perceiving stimulus2 moving faster than stimulus1 is given as the cumulative probability
$$P_{cum}(\hat{v}_2 > \hat{v}_1) = \int_0^{\infty} p(\hat{v}_2 \mid v_2) \int_0^{\hat{v}_2} p(\hat{v}_1 \mid v_1)\, d\hat{v}_1\, d\hat{v}_2 \qquad (2)$$
Pcum describes the full psychometric curve. Figure 2b illustrates the measured psychometric curve and its fit from such an experimental situation.
(1) Although see  for an example of determining and even changing the prior of a Bayesian model for a sensorimotor task, where the estimates are more directly accessible.
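Equation (2) can be evaluated numerically once the two estimate distributions are specified. The sketch below assumes, purely for illustration, that both p(v̂1|v1) and p(v̂2|v2) are Gaussian with means well above zero, so that truncation at zero is negligible. The function names, grid range, and parameter values are hypothetical and are not taken from the paper. Under this assumption the result should agree with the standard signal detection expression Φ((μ2 − μ1) / sqrt(σ1² + σ2²)).

```python
import math


def normal_pdf(x, mean, sd):
    """Gaussian density evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def p_cum(mean1, sd1, mean2, sd2, hi=40.0, n=4001):
    """Probability that estimate 2 exceeds estimate 1, as in Equation (2).

    Assumption (illustrative only): both estimate distributions are Gaussian.
    The integrals run over a grid on [0, hi] using the trapezoidal rule.
    """
    step = hi / (n - 1)
    grid = [i * step for i in range(n)]
    p1 = [normal_pdf(x, mean1, sd1) for x in grid]
    p2 = [normal_pdf(x, mean2, sd2) for x in grid]

    # Inner integral: running integral of p1 from 0 up to each grid point.
    inner = [0.0]
    for i in range(1, n):
        inner.append(inner[-1] + 0.5 * (p1[i - 1] + p1[i]) * step)

    # Outer integral: p2 weighted by the inner integral.
    integrand = [p2[i] * inner[i] for i in range(n)]
    return sum(0.5 * (integrand[i - 1] + integrand[i]) * step for i in range(1, n))


# Sweeping the mean of the second estimate traces out a psychometric curve.
for mean2 in (6.0, 7.0, 8.0, 9.0, 10.0):
    print(mean2, p_cum(8.0, 1.0, mean2, 1.0))
```

Computing the inner integral once as a running sum keeps the cost linear in the grid size rather than quadratic.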
2 Experimental Methods
We measured matching speeds (Pcum = 0.5) and thresholds (Pcum = 0.875) in a 2AFC speed discrimination task. Subjects were presented simultaneously with two circular patches of horizontally drifting sine-wave gratings for the duration of one second (Figure 2a). Patches were 3 deg in diameter, and were displayed at 6 deg eccentricity to either side of a fixation cross. The stimuli had an identical spatial frequency of 1.5 cycles/deg. One stimulus was considered to be the reference stimulus, having one of two different contrast values (c1=[0.075 0.5]) and one of five different speed values (u1=[1 2 4 8 12] deg/sec), while the second stimulus (test) had one of five different contrast values (c2=[0.05 0.1 0.2 0.4 0.8]) and a varying speed that was determined by an interleaved staircase procedure. For each condition there were 96 trials. Conditions were randomly interleaved, including a random choice of stimulus identity (test vs. reference) and motion direction (right vs. left). Subjects were asked to fixate during stimulus presentation and select the faster moving stimulus. The threshold experiment differed only in that auditory feedback was given to indicate the correctness of the subject's decision. This did not change the outcome of the experiment but significantly increased the quality of the data and thus reduced the number of trials needed.
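The two measured quantities correspond to the test speeds at which the psychometric curve crosses Pcum = 0.5 and Pcum = 0.875. As a rough sketch of how these could be read off a monotone curve, the code below uses linear interpolation on synthetic data. The cumulative Gaussian shape, its parameters, and the helper name `crossing` are assumptions made for illustration; this is not the fitting procedure used in the experiments.

```python
import math


def crossing(speeds, probs, level):
    """Speed at which a monotonically increasing curve reaches `level`,
    found by linear interpolation between neighboring samples."""
    for i in range(1, len(speeds)):
        p_a, p_b = probs[i - 1], probs[i]
        if p_a <= level <= p_b and p_b > p_a:
            v_a, v_b = speeds[i - 1], speeds[i]
            return v_a + (level - p_a) * (v_b - v_a) / (p_b - p_a)
    raise ValueError("curve does not reach the requested level")


# Synthetic psychometric curve (assumption: cumulative Gaussian with
# point of subjective equality 4.0 deg/sec and spread 0.5 deg/sec).
speeds = [2.0 + 0.05 * i for i in range(81)]
probs = [0.5 * (1.0 + math.erf((v - 4.0) / (0.5 * math.sqrt(2.0)))) for v in speeds]

v_match = crossing(speeds, probs, 0.5)     # point of subjective equality
v_thresh = crossing(speeds, probs, 0.875)  # threshold speed
print(v_match, v_thresh)
```

With real data the empirical proportions are noisy and not guaranteed to be monotone, so a parametric fit to the psychometric curve would normally precede this step.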