{"title": "Characterizing response behavior in multisensory perception with conflicting cues", "book": "Advances in Neural Information Processing Systems", "page_first": 1153, "page_last": 1160, "abstract": null, "full_text": "Characterizing response behavior in multi-sensory perception with conflicting cues\n\nRama Natarajan1\n\nIain Murray1\n\nLadan Shams2\n\nRichard S. Zemel1\n\n1Department of Computer Science, University of Toronto, Canada\n\n{rama,murray,zemel}@cs.toronto.edu\n\n2Department of Psychology, University of California Los Angeles, USA\n\nladan@psych.ucla.edu\n\nAbstract\n\nWe explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects\u2019 response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models.\n\n1 Introduction\n\nA natural scene contains several multi-modal sensory cues to the true underlying values of its physical properties. There is substantial evidence that the brain deals with the sensory information from multiple modalities simultaneously, to form a coherent and unified percept of the world and to guide action. A major focus of multi-sensory perceptual studies has been in exploring the synergistic as well as modulatory interactions between individual sensory cues. 
The perceptual consequences of these interactions can be effectively explored in cases where the cues are in conflict with each other, resulting in potentially illusory percepts such as the \u201cventriloquism effect\u201d [1].\n\nA well-tested hypothesis with regard to multi-sensory cue interaction is that the individual sensory estimates are combined in a linear fashion, weighted by their relative reliabilities. Most studies that expound this linear approach assume that the sensory noise is independent across modalities, and that the sensory likelihoods can be well approximated by Gaussian distributions. Under these assumptions, the maximum-likelihood estimator of the underlying physical variable is an affine combination of the sensory estimates weighted in proportion to their precisions. This linear model predicts that the variance of the posterior distribution is always lower than that of individual cues. However, data from several psychophysical studies contradict this prediction, necessitating non-linear computational strategies to deal with the inputs.\n\nRecent studies [2; 3; 4; 5] have proposed a particular form of mixture model to address response behavior in situations with a large conflict between sensory stimuli. Conflicts arise when corresponding cues suggest very different estimates of an underlying variable. The basic intuition behind these models is that large stimulus disparities might be a consequence of the stimuli having resulted from multiple underlying causal factors. We evaluate the different formulations in their ability to model experimental data [6] that exhibit very interesting non-linear response behavior under conflicting stimulus conditions. The formulations differ in how perceptual estimates are derived from sensory data. We demonstrate some inadequacies of the current models and propose an alternative formulation that employs heavy-tailed sensory likelihoods. 
The proposed model not only achieves better fits to non-linear response behavior in the experimental data but also makes several quantitatively testable predictions.\n\n2 A Mixture Model for Evaluating Cue Interactions\n\nIn this section, we present an overview of a recently proposed mixture model approach [3] to dealing with conflicting sensory inputs. We describe two approaches to inference under this model, causal averaging and causal selection, and analyze the model predictions on our simulation of an auditory localization task [6].\n\nThe environmental variables of interest are the spatial locations of an auditory and visual stimulus, denoted by sa and sv respectively. Information about the stimuli is provided by noisy sensory cues xa and xv. The model evaluates sensory cues under two discrete hypotheses (C \u2208 {1, 2}) regarding the causal structure underlying the generation of the stimuli. The hypotheses are that the two stimuli could arise from the same (C = 1) or different (C = 2) causal events. This mixture model instantiates a simple idea: if there is a common cause, cues are combined; otherwise they are segregated. The model is characterized by (i) the sensory likelihoods P(xv|sv) and P(xa|sa), (ii) the prior distributions P(sv, sa) over true stimulus positions and (iii) the prior over hypotheses P(C).\n\n2.1 Generating sensory data\n\nThe standard model assumes Gaussian sensory likelihoods and prior distributions. The true auditory and visual stimulus positions are assumed to be the same for C = 1, i.e., sa = sv = s drawn from a zero-mean Gaussian prior distribution: $s \\sim \\mathcal{N}(0, \\sigma_p^2)$, where \u03c3p is the standard deviation of the distribution. 
The noisy sensory evidence xa is a sample from a Gaussian distribution with mean sa = s and standard deviation \u03c3a: $x_a \\sim \\mathcal{N}(x_a;\\, s_a = s,\\, \\sigma_a^2)$. Similarly for the visual evidence: $x_v \\sim \\mathcal{N}(x_v;\\, s_v = s,\\, \\sigma_v^2)$.\n\nWhen there are C = 2 underlying causes, they are drawn independently from the zero-mean Gaussian prior distribution: $s_v \\sim \\mathcal{N}(0, \\sigma_p^2)$; $s_a \\sim \\mathcal{N}(0, \\sigma_p^2)$. Then $x_v \\sim \\mathcal{N}(x_v;\\, s_v,\\, \\sigma_v^2)$ and $x_a \\sim \\mathcal{N}(x_a;\\, s_a,\\, \\sigma_a^2)$. The belief in each hypothesis given the cues xa and xv is defined by the posterior distribution:\n\n$$P(C \\mid x_v, x_a) = \\frac{P(x_v, x_a \\mid C)\\,P(C)}{P(x_v, x_a)} \\tag{1}$$\n\nWhen the hypotheses are discrete, C \u2208 {1, 2}, the normalization constant is $P(x_v, x_a) = P(x_v, x_a \\mid C = 1)\\,P(C = 1) + P(x_v, x_a \\mid C = 2)\\,(1 - P(C = 1))$.\n\nGiven this particular causal generative model, the conditional likelihoods in Equation 1 are defined as $P(x_v, x_a \\mid C = 1) = \\int P(x_v \\mid s_v = s)\\,P(x_a \\mid s_a = s)\\,P(s)\\,ds$ and $P(x_v, x_a \\mid C = 2) = \\int P(x_v \\mid s_v)\\,P(s_v)\\,ds_v \\int P(x_a \\mid s_a)\\,P(s_a)\\,ds_a$. The conditional sensory likelihoods are specified as: $P(x_v, x_a \\mid s_v, s_a, C) = P(x_v \\mid s_v)\\,P(x_a \\mid s_a)$.\n\n2.2 Inference methods\n\n2.2.1 Causal averaging\n\nThe conditional posterior over stimulus variables is calculated for each hypothesis as P(sv, sa|xv, xa, C = 1) and P(sv, sa|xv, xa, C = 2). The standard approach to computing the full posterior distribution of interest P(sa, sv|xa, xv) is by integrating the evidence over both hypotheses weighted by the posterior distribution over C (Equation 1). 
Such a model averaging approach to causal inference is specified by the following identity:\n\n$$P_{\\mathrm{avg}}(s_v, s_a \\mid x_v, x_a) = \\sum_{C} P(s_v, s_a \\mid x_v, x_a, C)\\,P(C \\mid x_v, x_a) \\tag{2}$$\n\n$$= \\sum_{C} \\frac{P(x_v, x_a \\mid s_v, s_a, C)\\,P(s_v, s_a \\mid C)\\,P(C \\mid x_v, x_a)}{P(x_v, x_a \\mid C)} \\tag{3}$$\n\nHere, $P(C = 1 \\mid x_v, x_a) = \\pi_c$ is the posterior mixing proportion and $(1 - \\pi_c) = P(C = 2 \\mid x_v, x_a)$.\n\n2.2.2 Causal selection\n\nAn alternative approach is to calculate an approximate posterior distribution by first selecting the hypothesis $C^{*}$ that maximizes the posterior distribution P(C|xv, xa). Under this model selection approach, subsequent inference is based on the selected hypothesis alone.\n\n$$C^{*} = \\arg\\max_{C \\in \\{1, 2\\}} P(C \\mid x_v, x_a) \\tag{4}$$\n\nThen the posterior distribution over stimulus location is approximated as follows:\n\n$$P_{\\mathrm{sel}}(s_v, s_a \\mid x_v, x_a) \\approx P(s_v, s_a \\mid x_v, x_a, C = C^{*}) \\tag{5}$$\n\n$$= \\frac{P(x_v, x_a \\mid s_v, s_a, C = C^{*})\\,P(s_v, s_a \\mid C = C^{*})}{P(x_v, x_a \\mid C = C^{*})} \\tag{6}$$\n\n2.3 Evaluating the models on experimental data\n\nHere, we evaluate the causal averaging and selection models on an auditory localization task [6] where visual and auditory stimuli were presented at varying spatial and temporal disparities. In addition to reporting the location of the auditory target, subjects were also asked to report on whether they perceived the two stimuli to be perceptually unified. The variables examined were the bias and variance of the subjects\u2019 estimates for each stimulus condition. The data exhibit very interesting non-linear response behavior (solid lines in Figures 1A and 1D).\n\nIn our simulation of the task, the auditory target was presented at locations {0\u00b0, 5\u00b0, 10\u00b0} left or right of fixation. Although the real experiment varied the fixation location from trial to trial, it was found to have no effect on subsequent analyses and data were collapsed across all fixation locations. 
Hence, we assume the fixation point to be at the center of space (0\u00b0). The visual stimuli were assumed to be temporally coincident with the auditory stimuli and presented at varying spatial disparities {0\u00b0, 5\u00b0, 10\u00b0, 15\u00b0, 20\u00b0, 25\u00b0} left or right of sound. Sensory evidence xa and xv were corrupted by Gaussian noise as described earlier.\n\nEach stimulus combination {sa, sv} was presented with equal probability 2000 times. The spatial axis ranged from \u221225\u00b0 to 25\u00b0 and was divided into 1\u00b0 width bins. On each trial, the model computes a posterior probability distribution over stimulus locations conditioned on the noisy cues xa and xv according to one of Equations 3 or 6. It then estimates auditory and visual locations $\\hat{s}_a$ and $\\hat{s}_v$ as the peak of the posterior distribution (maximum a posteriori estimate): $\\hat{s}_a = \\arg\\max_{s_a} P(s_a, s_v \\mid x_a, x_v)$.\n\nWe have simulated estimators using other criteria, such as minimizing the squared error of the estimates (i.e., the expected value of the posterior distribution). The results were very similar using the different estimators. Percent bias is given by: $\\frac{\\hat{s}_a - s_a}{s_v - s_a} \\times 100$. Goodness of fit was computed using squared error loss to quantify the amount by which model estimates differed from the behavioral data. For analysis, the trials were dichotomized into unity and non-unity trials based on the perception of spatial unity. A trial was classified as unity if the posterior probability P(C = 1|xv, xa) was greater than some threshold \u03c1 and non-unity otherwise.\n\nThe simulation results (i.e., the estimates $\\hat{s}_a$ and $\\hat{s}_v$) were averaged across trials in each category. The parameters of the model are: 1) the stimulus location variance $\\sigma_p^2$, 2\u20133) the observation variances $\\sigma_a^2$ and $\\sigma_v^2$, 4) the prior mixture proportion \u03c9 = P(C = 1), and 5) the unity perception threshold \u03c1. The parameter values were estimated to fit the experimental data and are provided in the figure captions.\n\n2.4 Simulation results for the Gaussian model\n\nFigure 1 presents predictions made by both theoretical models. The behavioral data [6] (solid lines in all plots) range from spatial disparities \u221215\u00b0 to 15\u00b0; error bars represent standard errors across 5 subjects. Model predictions (dashed lines) extend to a wider range of \u221225\u00b0 to 25\u00b0. Some of the predicted trends are similar to the behavioral data. Regardless of stimulus disparity, whenever visual and auditory stimuli were perceived as unity, the predicted response bias was very high (dashed gray; Figure 1A). This means that the auditory location was perceived to be very near to the visual stimulus. When the stimuli appeared to not be unified, the auditory location was biased away from the visual stimulus, increasingly so as disparity decreased (dashed black; Figure 1A).\n\n[Figure 1 plots not reproduced. Panels: (A) Localisation biases, percent bias against spatial disparity sv\u2212sa (deg.), for unity and non-unity trials; (B) Causal averaging model and (C) Causal selection model, std. dev. (+/\u2212 deg.) against spatial disparity sv\u2212sa (deg.); (E) Causal averaging model and (F) Causal selection model, percent of trials against localisation error (deg.), for unity and non-unity trials.]\n\nFigure 1: Simulation results - Gaussian sensory likelihoods: In this, and all subsequent figures, solid lines plot the actual behavioral data reported in [6] and dashed lines are the model predictions. (A) Localization biases in the data, plotted alongside predictions from both models. (B) Causal averaging model, response variability: \u03c3a = 8, \u03c3v = 0.05, \u03c9 = 0.15. (C) Causal selection model: \u03c3a = 6, \u03c3v = 2.5, \u03c9 = 0.2. For both models: \u03c3p = 100, \u03c1 = 0.5. (D) Distribution of localization errors in data, for sv \u2212 sa = 0; re-printed with permission from [6]. (E,F) Localization errors predicted by the causal averaging and causal selection models respectively.\n\nHowever, both models exhibit one or more significant differences from the experimental observations. The predicted curves for unity trials (dashed gray; Figures 1B,C) are all concave, whereas they were actually observed to be convex (solid gray lines). On non-unity trials too, the predicted response variabilities (dashed black lines) are an inadequate fit to the real data (solid black lines).\n\nAn additional test for the appropriateness of the models is the predictions they make with regard to the distribution of localisation errors. An analysis of the behavioral data derived from the spatially coincident stimulus conditions (sv \u2212 sa = 0) revealed a distinct pattern (Figure 1D). 
On unity trials, localization error was 0\u00b0 implying that the responses were clustered around the auditory target. On non-unity trials, the errors were bi-modally distributed and failed the test for normality [6]. Causal selection predicts a qualitatively similar distribution of errors (Figure 1F), suggesting that it may be the most appropriate inference strategy under the given task and model assumptions.\n\n3 An Alternative Model for Sensory Likelihoods\n\n3.1 Heavy-tailed likelihood formulation\n\nIn this section, we re-formulate the sensory likelihoods P(xa|sa) and P(xv|sv) as a mixture of Gaussian and uniform distributions. This mixture creates a likelihood function with heavy tails.\n\n$$x_v \\sim \\pi\\,\\mathcal{N}(x_v;\\, s_v,\\, \\sigma_v^2) + \\frac{1 - \\pi}{r_l}; \\qquad x_a \\sim \\pi\\,\\mathcal{N}(x_a;\\, s_a,\\, \\sigma_a^2) + \\frac{1 - \\pi}{r_l} \\tag{7}$$\n\n3.2 Simulation results with heavy-tailed sensory likelihoods\n\nFigure 2 presents predictions made by the theoretical models based on heavy-tailed likelihoods. 
Both models now provide a much better fit to bias and variance, compared to their Gaussian counterparts. The heavy-tailed causal averaging model (Figure 2B) makes reasonable predictions with regard to variability. However, both the amount and the trend of predicted biases for non-unity trials (dotted line; Figure 2A) do not match observations.\n\n[Figure 2 plots not reproduced. Panels: (A) Localisation biases, percent bias against spatial disparity sv\u2212sa (deg.); (B) Causal averaging model and (C) Causal selection model, std. dev. (+/\u2212 deg.) against spatial disparity sv\u2212sa (deg.); (E) Causal averaging model and (F) Causal selection model, percent of trials against localisation error (deg.), for unity and non-unity trials.]\n\nFigure 2: Simulation results - heavy-tailed likelihoods: (A) Localization biases in the data, plotted alongside model predictions. (B) Causal averaging model, response variability: \u03c3a = 3.5, \u03c3v = 2. (C) Causal selection model: \u03c3a = 5, \u03c3v = 2.5. In both models, \u03c3p = 100, \u03c9 = 0.2, \u03c1 = 0.5, rl = 180\u00b0. (D) Distribution of localization errors in data, for sv \u2212 sa = 0. (E,F) Localization errors predicted by the heavy-tailed causal averaging and causal selection models.\n\nHere too, the best-fitting model is causal selection (dashed line; Figures 2A,C). The localization error distribution (Figure 2F) very closely matches the true observations (Figure 2D) in how the unity responses are uni-modally distributed about the target location sa, and non-unity responses are bi-modally distributed either side of the target. Visually, this is a better prediction of the true distribution of errors, compared to the prediction made by the Gaussian causal selection model (Figure 1F); we are unable to make a quantitative comparison for want of access to the raw data.\n\nCompared with the results in Figure 1, our models make very different bias and variance predictions for spatial disparities not tested. This is discussed in detail in Section 4. The heavy-tailed likelihood model has two more free parameters (rl and mixing proportion \u03c0; Equation 7) than the Gaussian, which is essentially a special case of the heavy-tailed mixture with \u03c0 = 1. Although the Gaussian model may be preferred for its computational simplicity, it is a demonstrably poor fit to the data and the heavy-tailed model is a worthwhile improvement.\n\n3.3 Analyzing the likelihood models\n\nExistence of the heavy tails in the likelihood function seems to be a critical feature that supports the non-linear behavior in the data. We substantiate this suggestion using Figure 3, and attempt to give some intuition behind the qualitative differences in variability and bias between Figures 1 and 2. The discussion below focuses on 3 disparity conditions. 
The congruent case |sv \u2212 sa| = 0 is chosen for reference; |sv \u2212 sa| = 10 and |sv \u2212 sa| = 25 are chosen since the Gaussian and heavy-tailed models tend to differ most in their predictions at these disparities.\n\nLet us first consider the unity case. In general, most of the samples on unity trials are from the region of space where both the auditory and visual likelihoods overlap. When the true disparity |sv \u2212 sa| = 0, the two likelihoods overlap maximally (Figures 3Aii and 3Cii). Hence regardless of the form of the likelihood, variability on unity trials at |sv \u2212 sa| = 0 should be roughly between \u03c3v and \u03c3a. This can be verified in Figures 1C, 2C.\n\n[Figure 3 plots not reproduced. Panels: (A) Gaussian likelihoods, unity; (B) Gaussian likelihoods, non-unity; (C) Heavy-tailed likelihoods, unity; (D) Heavy-tailed likelihoods, non-unity. Each panel has three sub-plots, (i) sv\u2212sa = 10, (ii) sv\u2212sa = 0 and (iii) sv\u2212sa = \u221225, showing number of samples against xa\u2212sa (deg.).]\n\nFigure 3: Analyzing the likelihood models: Results from the causal selection models. In all plots, light-gray histograms are samples xv from visual likelihood distribution; dark-gray histograms plot xa. Black histograms are built only from samples xa on which either unity (A,C) or non-unity (B,D) judgment was made. Each panel corresponds to one of three chosen disparities; histograms in the panel plot samples from all stimulus conditions that correspond to that particular disparity.\n\nNow one of the biggest differences between the likelihood models is what happens to this variability as |sv \u2212 sa| increases. In the case of the Gaussian, the amount of overlap between the two likelihoods decreases (Figures 3Ai,3Aiii). Consequently, the samples are from a somewhat smaller region in space and hence the variability also decreases. This corresponds to the concave curves predicted by the Gaussian model (Figure 1C; dashed gray). 
For the heavy-tailed likelihood, in contrast, the overlapping regions roughly increase with increasing disparity, due to the long tails (Figures 3Ci,3Ciii). This is reflected in the gradually increasing variability on unity trials corresponding to the better matching convex curves predicted by the heavy-tailed model (Figure 2C).\n\nOn the non-unity trials, most of the samples are from non-overlapping regions of space. Here, the biggest difference between the likelihood models is that in the Gaussian case, after a certain spatial limit, the variability tends to increase with increasing |sv \u2212 sa|. We also see this trend in simulation results presented in [2; 4]. This is because as disparity increases, the degree of overlap between two likelihoods decreases and variability approaches \u03c3a (Figures 3Bi,3Biii). However, the behavior in the real data suggests that variability continues to be a constant. With heavy-tailed likelihoods, the tails of the two likelihoods continue to overlap even as disparity increases; hence the variability is roughly constant (Figures 3Di,3Diii).\n\n4 Model Predictions\n\nQuantitative predictions \u2014 variance and bias: Our heavy-tailed causal selection model makes two predictions with regard to variability and bias for stimulus conditions not yet tested. One prediction is that on non-unity trials, as spatial disparity sv \u2212 sa increases, the localisation variability continues to remain constant at roughly a value equivalent to the standard deviation of the auditory likelihood (Figure 2C; black dashed plot). However, response percent bias approaches zero (Figure 2A; black dashed plot), indicating that when spatial disparity is very high and the stimuli are perceived as being independent, auditory localisation response is consistent with auditory dominance.\n\nA second prediction is that percent bias gradually decreases with increasing disparity on unity trials as well. 
This suggests that even when highly disparate stimuli are perceived as being unified, perception may be dominated by the auditory cues. Our results also predict that the variability in this case continues to increase very gradually with increasing disparity up to some spatial limits (|sv \u2212 sa| = 20\u00b0 in our simulations) after which it begins to decrease. This accords with intuition, since for very large disparities, the number of trials in which the stimuli are perceived as being unified will be very small.\n\nQualitative prediction \u2014 distribution of localization errors: Our model also makes a qualitative prediction concerning the distribution of localisation errors for incongruent (sv \u2212 sa \u2260 0) stimulus conditions. In both Figures 4A and B, localization error on unity trials is equivalent to the stimulus disparity sv \u2212 sa = 10\u00b0, indicating that even at this high disparity, responses are clustered closer to the visual stimulus location. On non-unity trials, the error is about 5\u00b0 here; responses are more broadly distributed and the bias is highly reduced. The Gaussian and heavy-tailed predictions differ in how quickly the error distributions go to zero.\n\n[Figure 4 plots not reproduced. Panels: (A) Gaussian predictions (Psel) and (B) Heavy-tailed predictions (Psel), percent of trials against localisation error (deg.) at sv\u2212sa = 10, for unity and non-unity trials; (C) Heavy-tailed predictions: variability, std. dev. (+/\u2212 deg.) against spatial disparity sv\u2212sa (deg.); (D) Heavy-tailed predictions: biases, percent bias against spatial disparity sv\u2212sa (deg.). Curves in (C,D): data, causal averaging, causal selection.]\n\nFigure 4: Model predictions: (A,B) Localization error distributions predicted by the Gaussian and heavy-tailed causal selection models. Plots correspond to stimulus condition sv = 20; sa = 10. (C,D) Response variability and bias predicted by the heavy-tailed causal averaging and selection models on simulation of an audio-visual localization task [3].\n\nSpecificity to experimental task: In the experimental task we have examined here [6], subjects were asked to first indicate the perceived location of sound on each trial and then to report their judgement of unity. The requirement to explicitly make a unity judgement may incur an experimental bias towards the causal selection model.\n\nTo explore the potential influence of task instructions on subjects\u2019 inference strategy, we tested our models on a simulation of a different audio-visual spatial localisation task [3]. Here, subjects were asked to report on both visual and auditory stimulus locations and were not explicitly instructed to make unity judgements. The authors employed model averaging to explain the results [3] and the data were found to have a very high likelihood under their model. However, they do not analyse variability in the subjects\u2019 responses and this aspect of behavior as a function of spatial disparity is not readily obvious in their published data.\n\nWe evaluated both our heavy-tailed causal averaging as well as causal selection models on a simulation of this experiment. The two models make very different predictions. 
Causal averaging predicts that response variability will monotonically increase with increasing disparity, while selection predicts a less straightforward trend (Figure 4C). Both models predict a similar amount of response bias and that it will decrease with increasing disparity (Figure 4D). This particular prediction is confirmed by the response bias in their behavioral data plot made available in [3]. Considering the paradigmatic differences between the two studies ([6] and [3]) and the wide range in bias, applying both inference methods and likelihood models to these data could be very informative.\n\nAdaptation of the prior: One interesting aspect of inference under this generative model is that as the value of \u03c9 = P(C = 1) increases, the variability also increases for both unity and non-unity trials across all disparities. However, the response bias remains unchanged. Given this correlation between response variability and the prior over hypotheses, our approach may be used to understand whether and how subjects\u2019 priors change during the course of an experimental session. Considering that the best value across all trials for this prior is quite small (\u03c9 \u223c 0.2), we hypothesize that this value will be quite high at the start of an experiment, and gradually reduce. This hypothesis leads to a prediction that variability decreases during an experimental session.\n\n5 Discussion\n\nIn this paper, we ventured to understand the computational mechanisms underlying sensory cue interactions that give rise to a particular pattern of non-linear response behavior [6], using a mixture of two different models that could have generated the sensory data. 
We proposed that the form of the sensory likelihood is a critical feature that drives non-linear behavior, especially at large stimulus disparities. In particular, a heavy-tailed likelihood function more accurately fits subjects\u2019 bias and variance in a cue combination task.\n\nHeavy-tailed distributions have been used previously in modeling cue interactions [7; 8]. In this paper, we went further by comparing the ability of heavy-tailed and Gaussian likelihood models to describe behavior. Qualitative fits of summarised statistics such as bias and variance are insufficient to make any strong claims about human perceptual processes; nevertheless, this work provides some insight into the potential functional role of sensory noise.\n\nAnother significant contribution in this paper is the critical evaluation of model selection versus averaging approaches to inference. These two inference methods may predict different variances in their estimates, as a function of stimulus conflict. As suggested in Section 4, having these different models at hand allows one to examine how task instructions affect subject behavior.\n\nWe noted in Section 3.2 that the heavy-tailed model is more complex than the Gaussian model. Although we have not included any complexity penalty, this formulation was supported by two aspects: (i) it was relatively insensitive to parameter settings, providing a better fit to the data than the Gaussian model for a wide range of parameter values; (ii) optimizing the fit of the Gaussian model required implausible values for parameters \u03c3a, \u03c3v (Figure 1B), whereas parameters for the heavy-tailed model accorded well with published data.\n\nOne downside of our results is that even though the model bias for unity trials captures the slightly increasing trend as disparity decreases, it is not as large as in the behavioral data (close to 100%) or as that predicted by the Gaussian models. 
This does not seem to be a\nconsequence of the parameter values chosen. One interpretation provided by [6] of the large\nbias in the data is that a perceptual decision (unity or non-unity) determines a sensorimotor\naction (localization response). Then one response strategy might be to ignore the posterior\nprobability P(sa|xv, xa) once unity is judged and then set \u015da = \u015dv; although this strategy\npredicts a higher bias, it is not Bayes-optimal. Yet another potential limitation\nof our approach is that the only form of noise we consider is sensory; we do not yet take\ninto account any motor component that may drive target localization.\n\nCurrently, we have access to only an estimate of the average variance in subjects\u2019 auditory\ntarget location estimates. On the computational side, one interesting avenue for future work\nwould be to evaluate the model averaging and selection hypotheses based on a likelihood\nmodel derived directly from the raw data. On the experimental side, one of the major in-\nadequacies of most experimental paradigms is that the only (approximate) measure of a\nsubject\u2019s perceptual uncertainty involves measuring the response variability across a large\nnumber of trials. An alternative paradigm that allows measurement of the perceptual un-\ncertainty on a single trial could provide important constraints on computational models of\nthe perceptual phenomena. At the neural level, a key step entails exploring biologically\nplausible neural implementations of the mixture model approach.\n\nAcknowledgments\n\nThe authors would like to thank the Natural Sciences and Engineering Research Council of\nCanada and the Canadian Institute for Advanced Research (RN and RZ), the government of\nCanada (IM), UCLA Faculty Grants Program and UCLA Faculty Career Development (LS).\n\nReferences\n\n[1] I P Howard and W B Templeton. Human spatial orientation. Wiley, New York, 1966.\n[2] Konrad P K\u00f6rding and Joshua B Tenenbaum. 
Causal inference in sensorimotor integration. In NIPS, pages 737\u2013744. MIT Press, 2006.\n\n[3] Konrad P K\u00f6rding, Ulrik Beierholm, Wei Ji Ma, Steven Quartz, Joshua B Tenenbaum, and Ladan Shams. Causal inference in multisensory perception. PLoS ONE, 2(9), 2007.\n\n[4] Y Sato, T Toyoizumi, and K Aihara. Bayesian inference explains perception of unity and ventriloquism aftereffect. Neural Comp., 19:3335\u201355, 2007.\n\n[5] Alan Stocker and Eero Simoncelli. A Bayesian model of conditioned perception. In NIPS 20, pages 1409\u20131416. MIT Press, Cambridge, MA, 2008.\n\n[6] MT Wallace, GE Roberson, WE Hairston, BE Stein, JW Vaughan, and JA Schirillo. Unifying multisensory signals across time and space. Exp Brain Res., 158(2):252\u20138, 2004.\n\n[7] David C Knill. Robust cue integration: A Bayesian model and evidence from cue-con\ufb02ict studies with stereoscopic and \ufb01gure cues to slant. Journal of Vision, 7(7):1\u201324, 2007.\n\n[8] Alan A Stocker and Eero P Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci., 9:578\u2013585, 2006.", "award": [], "sourceid": 3468, "authors": [{"given_name": "Rama", "family_name": "Natarajan", "institution": null}, {"given_name": "Iain", "family_name": "Murray", "institution": null}, {"given_name": "Ladan", "family_name": "Shams", "institution": null}, {"given_name": "Richard", "family_name": "Zemel", "institution": null}]}