{"title": "A Maximum-Likelihood Approach to Modeling Multisensory Enhancement", "book": "Advances in Neural Information Processing Systems", "page_first": 181, "page_last": 187, "abstract": null, "full_text": "A Maximum-Likelihood Approach to \nModeling Multisensory Enhancement \n\nHans Colonius* \n\nInstitut fUr Kognitionsforschung \nCarl von Ossietzky Universitat \n\nOldenburg, D-26111 \n\nAdele Diederich \n\nSchool of Social Sciences \n\nInternational University Bremen \n\nBremen, D-28725 \n\nhans. colonius@uni-oldenburg.de \n\na. diederich @iu-bremen.de \n\nAbstract \n\nMultisensory response enhancement (MRE) is the augmentation of \nthe response of a neuron to sensory input of one modality by si(cid:173)\nmultaneous input from another modality. The maximum likelihood \n(ML) model presented here modifies the Bayesian model for MRE \n(Anastasio et al.) by incorporating a decision strategy to maximize \nthe number of correct decisions. Thus the ML model can also deal \nwith the important tasks of stimulus discrimination and identifi(cid:173)\ncation in the presence of incongruent visual and auditory cues. It \naccounts for the inverse effectiveness observed in neurophysiolog(cid:173)\nical recording data, and it predicts a functional relation between \nuni- and bimodal levels of discriminability that is testable both in \nneurophysiological and behavioral experiments. \n\n1 \n\nIntroduction \n\nIn a typical environment stimuli occur at various positions in space and time. In \norder to produce a coherent assessment of the external world an individual must \nconstantly discriminate between signals relevant for action planning (targets) and \nsignals that need no immediate response (distractors). Separate sensory channels \nprocess stimuli by modality, but an individual must determine which stimuli are \nrelated to one another, i.e., it is must construct a perceptual event by integrating \ninformation from several modalities. 
For example, stimuli that occur at the same time and place are likely to be interrelated by a common cause. However, if the visual and auditory cues are incongruent, e.g., when dubbing one syllable onto a movie showing a person mouthing a different syllable, listeners typically report hearing a third syllable that represents a combination of what was seen and heard (McGurk effect, cf. [1]). This indicates that cross-modal synthesis is particularly important for stimulus identification and discrimination, not only for detection. \n\nEvidence for multisensory integration at the neural level has been well documented in a series of studies in the mammalian midbrain by Stein, Meredith and Wallace (e.g., [2]; for a review, see [3]). The deep layers of the superior colliculus (DSC) integrate multisensory input and trigger orienting responses toward salient targets. Individual DSC neurons can receive inputs from multiple sensory modalities (visual, auditory, and somatosensory), there is considerable overlap between the receptive fields of these individual multisensory neurons, and the number of neural impulses evoked depends on the spatial and temporal relationships of the multisensory stimuli. \n\n* www.uni-oldenburg.de/psychologie/hans.colonius/index.html \n\nMultisensory response enhancement refers to the augmentation of the response of a DSC neuron to a multisensory stimulus compared to the response elicited by the most effective single-modality stimulus. A quantitative measure of the percent enhancement is \n\nMRE = [(CM - SMmax) / SMmax] × 100,   (1) \n\nwhere CM is the mean number of impulses evoked by the combined-modality stimulus in a given time interval, and SMmax refers to the response of the most effective single-modality stimulus (cf. [4]). Response enhancement in the DSC neurons can be quite impressive, with MRE sometimes reaching values above 1000. 
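Eq. (1) is simple to compute; the following Python sketch (with hypothetical impulse counts, not recordings from [4]) shows how a weak best-unimodal response paired with a modest combined response already yields enhancement of the magnitude just described:

```python
# Percent multisensory response enhancement, Eq. (1).
def mre(cm, sm_visual, sm_auditory):
    sm_max = max(sm_visual, sm_auditory)  # most effective single-modality response
    return (cm - sm_max) / sm_max * 100.0

# Hypothetical mean impulse counts per unit time interval:
# a weak unimodal response dwarfed by the combined-modality response.
print(mre(cm=12.0, sm_visual=1.0, sm_auditory=0.8))  # 1100.0, i.e., above 1000
```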
\nTypically, this enhancement is most dramatic when the unimodal stimuli are weak and/or ambiguous, a principle referred to in [4] as \"inverse effectiveness\". \n\nSince DSC neurons play an important role in orienting responses (like eye and head movements) to exogenous target stimuli, it is not surprising that multisensory enhancement is also observed at the behavioral level in terms of, for example, a lowering of detection thresholds or a speed-up of (saccadic) reaction time (e.g., [5], [6], [7]; see [8] for a review). Inverse effectiveness makes intuitive sense in the behavioral situation: the detection probability for a weak or ambiguous stimulus gains more from response enhancement by multisensory integration than a high-intensity stimulus that is easily detected by a single modality alone. \n\nA model of the functional significance of multisensory enhancement has recently been proposed by Anastasio, Patton, and Belkacem-Boussaid [9]. They suggested that the responses of individual DSC neurons are proportional to the Bayesian probability that a target is present given their sensory inputs. Here, this Bayesian model is extended to yield a more complete account of the decision situation an organism is faced with. As noted above, in a natural environment an individual is confronted with the task of discriminating between stimuli important for survival (\"targets\") and stimuli that are irrelevant (\"distractors\"). Thus, an organism must not only keep up a high rate of detecting targets but, at the same time, must strive to minimize \"false alarms\" to irrelevant stimuli. An optimally adapted system will be one that maximizes the number of correct decisions. It will be shown here that this can be achieved already at the level of individual DSC neurons by appealing to a maximum-likelihood principle, without requiring any more information than is assumed in the Bayesian model. 
\n\nThe next section sketches the Bayesian model by Anastasio, Patton, and Belkacem-Boussaid (Bayesian model, for short), after which a maximum-likelihood model of multisensory response enhancement will be introduced. \n\n2 The Bayesian Model of Multisensory Enhancement \n\nDSC neurons receive input from the visual and auditory systems elicited by stimuli occurring within their receptive fields.¹ According to the Bayesian model, these visual and auditory inputs are represented by random variables V and A, respectively. A binary random variable T indicates whether a signal is present (T = 1) or not (T = 0). The central assumption of the model is that a DSC neuron computes the Bayesian (posterior) probability that a target is present in its receptive field given its sensory input: \n\nP(T = 1 | V = v, A = a) = P(V = v, A = a | T = 1) P(T = 1) / P(V = v, A = a),   (2) \n\nwhere v and a denote specific values of the sensory input variables. Analogous expressions hold for the two unimodal situations. The response of the DSC neuron (number of spikes in a unit time interval) is postulated to be proportional to these probabilities. \n\n¹ An extension to the trimodal situation, including somatosensory input, could be easily attained in the models discussed here. \n\nIn order to arrive at quantitative predictions two more specific assumptions are made: \n\n(1) the distributions of V and A, given T = 1 or T = 0, are conditionally independent, i.e., \n\nP(V = v, A = a | T) = P(V = v | T) P(A = a | T)   for any v, a; \n\n(2) the distribution of V, given T = 1 or T = 0, is Poisson with mean λ1 or λ0, resp., and the distribution of A, given T = 1 or T = 0, is Poisson with mean μ1 or μ0, resp. \n\nThe conditional independence assumption means that the visibility of a target indicates nothing about its audibility, and vice-versa. 
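Under assumptions (1) and (2), the posterior in Eq. (2) can be evaluated directly. The Python sketch below (the parameter values are illustrative, not taken from [9]) computes P(T = 1 | V = v, A = a) from the two Poisson likelihoods and the prior:

```python
import math

# P(X = k) for X ~ Poisson(lam)
def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# P(T = 1 | V = v, A = a), Eq. (2), under conditional independence
def posterior_target(v, a, lam0, lam1, mu0, mu1, p):
    like1 = poisson_pmf(v, lam1) * poisson_pmf(a, mu1)  # P(V = v, A = a | T = 1)
    like0 = poisson_pmf(v, lam0) * poisson_pmf(a, mu0)  # P(V = v, A = a | T = 0)
    return like1 * p / (like1 * p + like0 * (1.0 - p))  # Bayes' rule

# Spontaneous means 5, driven means 8, a-priori target probability 0.1 (illustrative):
# stronger joint input makes a target more credible.
print(posterior_target(v=9, a=9, lam0=5, lam1=8, mu0=5, mu1=8, p=0.1))
```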
The choice of the Poisson distribution is seen as a reasonable first approximation that requires only one single parameter per distribution. Finally, the computation of the posterior probability that a target is present requires specification of the a-priori probability of a target, P(T = 1). \n\nThe parameters λ0 and μ0 denote the mean intensity of the visual and auditory input, resp., when no target is present (spontaneous input), while λ1 and μ1 are the corresponding mean intensities when a target is present (driven input). By an appropriate choice of parameter values, Anastasio et al. [9] show that the Bayesian model reproduces values of multisensory response enhancement in the order of magnitude observed in neurophysiological experiments [10]. In particular, the property of inverse effectiveness, by which the enhancement is largest for combined stimuli that evoke only small unimodal responses, is reflected by the model. \n\n3 The Maximum Likelihood Model of Multisensory Enhancement \n\n3.1 The decision rule \n\nThe maximum likelihood model (ML model, for short) incorporates the basic decision problem an organism is faced with in a typical environment: to discriminate between relevant stimuli (targets), i.e., signals that require immediate reaction, and irrelevant stimuli (distractors), i.e., signals that can be ignored in a given situation. In the signal-detection theory framework (cf. [11]), P(Yes | T = 1) denotes the probability that the organism (correctly) decides that a target is present (hit), while P(Yes | T = 0) denotes the probability of deciding that a target is present when in fact only a distractor is present (false alarm). In order to maximize the probability of a correct response, \n\nP(C) = P(Yes | T = 1) P(T = 1) + [1 - P(Yes | T = 0)] P(T = 0),   (3) \n\nthe following maximum likelihood decision rule must be adopted (cf. [12]) for, e.g., the unimodal visual case: \n\nIf P(T = 1 | V = v) > P(T = 0 | V = v), then decide \"Yes\", otherwise decide \"No\". \n\nThe above inequality is equivalent to \n\nP(T = 1 | V = v) / P(T = 0 | V = v) = [P(T = 1) / P(T = 0)] · [P(V = v | T = 1) / P(V = v | T = 0)] > 1, \n\nwhere the right-most ratio, viewed as a function of V, is the likelihood ratio L(V). Thus, the above rule is equivalent to: \n\nIf L(v) > (1 - p)/p, then decide \"Yes\", otherwise decide \"No\", \n\nwith p = P(T = 1). Since L(V) is a random variable, the probability to decide \"Yes\", given a target is present, is \n\nP(Yes | T = 1) = P(L(V) > (1 - p)/p | T = 1). \n\nAssuming Poisson distributions, this equals \n\nP(exp(λ0 - λ1) (λ1/λ0)^V > (1 - p)/p | T = 1) = P(V > c | T = 1), \n\nwith \n\nc = [ln((1 - p)/p) + λ1 - λ0] / ln(λ1/λ0). \n\nIn analogy to the Bayesian model, the ML model postulates that the response of a DSC neuron (number of spikes in a unit time interval) to a given target is proportional to the probability to decide that a target is present computed under the optimal (maximum likelihood) strategy defined above. \n\n3.2 Predictions for Hit Probabilities \n\nIn order to compare the predictions of the ML model for unimodal vs. 
bimodal inputs, consider the likelihood ratio for bimodal Poisson input under conditional independence: \n\nL(V, A) = P(V = v, A = a | T = 1) / P(V = v, A = a | T = 0) = exp(λ0 - λ1) (λ1/λ0)^V · exp(μ0 - μ1) (μ1/μ0)^A. \n\nThe probability to decide \"Yes\" given bimodal input amounts to, after taking logarithms, \n\nP(ln(λ1/λ0) V + ln(μ1/μ0) A > ln((1 - p)/p) + λ1 - λ0 + μ1 - μ0 | T = 1). \n\nFor λ1/λ0 = μ1/μ0 this probability is computed directly from the Poisson distribution with mean λ1 + μ1. Otherwise, hit probabilities follow the distribution of a linear combination of two Poisson distributed variables. Table 1 presents² hit probabilities and multisensory response enhancement values for different levels of mean driven input. Obviously, the ML model imitates the inverse effectiveness relation: combining weak intensity unimodal stimuli leads to a much larger response enhancement than medium or high intensity stimuli. \n\nTable 1: Hit probabilities and MRE for different bimodal inputs \n\nLevel    λ1   μ1   V Driven   A Driven   VA Driven   MRE \nLow       6    7     .000       .027       .046      704 \n          7    7     .027       .027       .117      335 \n          8    8     .112       .112       .341      204 \n          8    9     .112       .294       .528       79 \n          8   10     .112       .430       .562       31 \nMedium   12   12     .652       .652       .872       33 \n         12   13     .652       .748       .895       20 \nHigh     16   16     .873       .873       .984       13 \n         16   20     .873       .961       .990        3 \n\nNote: A-priori target probability is set at p = 0.1. Visual and auditory inputs have spontaneous means of 5 impulses per unit time. V Driven (A Driven, VA Driven) columns refer to the hit probabilities given a unimodal visual (resp. auditory, bimodal) target. Multisensory response enhancement (last column) is computed using Eq. (1). 
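For the rows of Table 1 with λ1 = μ1 (so that λ1/λ0 = μ1/μ0 and V + A is again Poisson), the entries can be reproduced exactly from Poisson tail probabilities. The Python sketch below is our own reimplementation of the decision rule, using the parameter values from the table's note (p = 0.1, spontaneous means of 5); it recovers the λ1 = μ1 = 8 row and its MRE of about 204:

```python
import math

P_TARGET = 0.1  # a-priori target probability p
SPONT = 5.0     # spontaneous mean input (lambda0 = mu0 = 5 impulses per unit time)

# P(X >= k_min) for X ~ Poisson(lam)
def poisson_tail(k_min, lam):
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k)
                     for k in range(k_min))

# P(Yes | T = 1) for one Poisson channel: P(V > c | T = 1)
def hit_prob_unimodal(lam1, p=P_TARGET):
    c = (math.log((1 - p) / p) + lam1 - SPONT) / math.log(lam1 / SPONT)
    return poisson_tail(math.floor(c) + 1, lam1)

# Bimodal hit probability; valid only for lam1 = mu1, where V + A is Poisson
def hit_prob_bimodal(lam1, mu1, p=P_TARGET):
    c = (math.log((1 - p) / p) + lam1 - SPONT + mu1 - SPONT) / math.log(lam1 / SPONT)
    return poisson_tail(math.floor(c) + 1, lam1 + mu1)

hit_v = hit_prob_unimodal(8.0)               # Table 1, lambda1 = 8: about .112
hit_va = hit_prob_bimodal(8.0, 8.0)          # lambda1 = mu1 = 8: about .341
mre_hits = (hit_va - hit_v) / hit_v * 100.0  # Eq. (1): about 204
print(round(hit_v, 3), round(hit_va, 3), round(mre_hits))
```

Rows with λ1 ≠ μ1 instead require the distribution of the weighted sum ln(λ1/λ0)V + ln(μ1/μ0)A, which the paper estimates by Monte Carlo sampling (footnote 2).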
\n\n3.3 Predictions for discriminability measures \n\nThe ML model also allows one to assess the sensitivity of an individual DSC neuron in discriminating between target and distractor signals. Intuitively, this sensitivity should be a (decreasing) function of the amount of overlap between the driven and the spontaneous likelihood (e.g., P(V = v | T = 1) and P(V = v | T = 0)). One possible appropriate measure of sensitivity for the Poisson observer is (cf. [12]) \n\nDV = (λ1 - λ0) / (λ1 λ0)^(1/4)   and   DA = (μ1 - μ0) / (μ1 μ0)^(1/4)   (4) \n\nfor the visual and auditory unimodal inputs, resp. A natural choice for the bimodal measure of sensitivity then is \n\nDVA = [(λ1 + μ1) - (λ0 + μ0)] / [(λ1 + μ1)(λ0 + μ0)]^(1/4).   (5) \n\nNote that, unlike the hit probabilities, the relative increase in discriminability by combining two unimodal inputs does not decrease with the intensity of the driven input (see Table 2). Rather, the relation between bimodal and unimodal discriminability measures for the input values in Table 2 is approximately of Euclidean distance form: \n\nDVA ≈ (DV² + DA²)^(1/2).   (6) \n\n² For input combinations with λ1 ≠ μ1, hit probabilities are estimated from samples of 1,000 pseudo-random numbers. \n\nTable 2: Discriminability measure values and % increase for different bimodal inputs \n\nλ1   μ1    DV     DA     DVA   % Increase \n 7    7    .82    .82   1.16       41 \n 8    8   1.19   1.19   1.69       41 \n 8   10   1.19   1.88   2.18       16 \n12   12   2.52   2.52   3.57       41 \n16   16   3.68   3.68   5.20       41 \n16   20   3.68   4.74   5.97       26 \n\nNote: Visual and auditory inputs have spontaneous means of 5 impulses per unit time. % Increase of DVA over DV and DA (last column) is computed in analogy to Eq. (1). \n\nFor λ1 = μ1 this amounts to DVA = √2 DV, yielding the 41 % increase in discriminability. 
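The measures (4) and (5) and the Euclidean-distance relation (6) are easy to check numerically. The short Python sketch below (our own variable names) reproduces the λ1 = 8, μ1 = 10 row of Table 2:

```python
# Poisson-observer sensitivity, Eq. (4)
def d_unimodal(driven, spont=5.0):
    return (driven - spont) / (driven * spont) ** 0.25

# Bimodal sensitivity, Eq. (5)
def d_bimodal(lam1, mu1, lam0=5.0, mu0=5.0):
    return (lam1 + mu1 - lam0 - mu0) / ((lam1 + mu1) * (lam0 + mu0)) ** 0.25

dv, da = d_unimodal(8.0), d_unimodal(10.0)  # Table 2: 1.19 and 1.88
dva = d_bimodal(8.0, 10.0)                  # Table 2: 2.18
euclid = (dv ** 2 + da ** 2) ** 0.5         # Euclidean approximation, Eq. (6)
print(round(dv, 2), round(da, 2), round(dva, 2), round(euclid, 2))
```

For λ1 = μ1 the relation is exact: d_bimodal(8, 8) equals √2 · d_unimodal(8), which is the source of the recurring 41 % increase in the table.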
The fact that the discriminability measures do not follow the inverse \neffectiveness rule should not be not surprising: whether two stimuli are easy or \nhard to discriminate depends on their signal-to-noise ratio, but not on the level of \nintensity. \n\n4 Discussion and Conclusion \n\nThe maximum likelihood model of multisensory enhancement developed here as(cid:173)\nsumes that the response of a DSC neuron to a target stimulus is proportional to \nthe hit probability under a maximum likelihood decision strategy. Obviously, no \nclaim is made here that the neuron actually performs these computations, only that \nits behavior can be described approximately in this way. Similar to the Bayesian \nmodel suggested by Anastasio et al. [9], the neuron's behavior is solely based on \nthe a-priori probability of a target and the likelihood function for the different \nsensory inputs. The ML model predicts the inverse effectiveness observed in neu(cid:173)\nrophysiological experiments. Moreover, the model allows to derive a measure of \nthe neuron's ability to discriminate between targets and non-targets. It makes spe(cid:173)\ncific predictions how un i- and bimodal discriminability measures are related and, \nthereby, opens up further avenues for testing the model assumptions . \n\nThe ML model, like the Bayesian model, operates at the level of a single DSC \nneuron. However, an extension of the model to describe multisensory population \nresponses is desirable: First, this would allow to relate the model predictions to \nnumerous behavioral studies about multisensory effects (e.g., [13], [14]), and, second, \nas a recent study by Kadunce et al. \n[15) suggests, the effects of multisensory \nspatial coincidence observed in behavioral experiments may only be reconcilable \nwith the degree of spatial resolution achievable by a population of DSC neurons \nwith overlapping receptive fields. 
Moreover, this extension might also be useful to relate behavioral and single-unit recording results to recent findings on multisensory brain areas using functional imaging techniques (e.g., King and Calvert [16]). \n\nAcknowledgments \n\nThis research was partially supported by a grant from Deutsche Forschungsgemeinschaft-SFB 517 Neurokognition to the first author. \n\nReferences \n\n[1] McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748. \n\n[2] Wallace, M. T., Meredith, M. A., & Stein, B. E. (1993). Converging influences from visual, auditory, and somatosensory cortices onto output neurons of the superior colliculus. Journal of Neurophysiology, 69, 1797-1809. \n\n[3] Stein, B. E., & Meredith, M. A. (1996). The merging of the senses. Cambridge, MA: MIT Press. \n\n[4] Meredith, M. A. & Stein, B. E. (1986a). Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Research, 365(2), 350-354. \n\n[5] Frens, Van Opstal, & Van der Willigen (1995). Spatial and temporal factors determine auditory-visual interactions in human saccadic eye movements. Perception & Psychophysics, 57, 802-816. \n\n[6] Colonius, H. & Arndt, P. A. (2001). A two-stage model for visual-auditory interaction in saccadic latencies. Perception & Psychophysics, 63, 126-147. \n\n[7] Stein, B. E., Meredith, M. A., Huneycutt, W. S., & McDade, L. (1989). Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1, 12-24. \n\n[8] Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Boff, L. Kaufman, & J. P. Thomas (eds.), Handbook of perception and human performance, Volume I: Sensory processes and perception (pp. 25-1-25-36). New York: Wiley. \n\n[9] Anastasio, T. J., Patton, P. E., & Belkacem-Boussaid, K. (2000). Using Bayes' rule to model multisensory enhancement in the superior colliculus. Neural Computation, 12, 1165-1187. \n\n[10] Meredith, M. A. & Stein, B. E. (1986b). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology, 56(3), 640-662. \n\n[11] Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. New York: Krieger Publ. Co. \n\n[12] Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press. \n\n[13] Craig, A., & Colquhoun, W. P. (1976). Combining evidence presented simultaneously to the eye and the ear: A comparison of some predictive models. Perception & Psychophysics, 19, 473-484. \n\n[14] Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506. \n\n[15] Kadunce, D. C., Vaughan, J. W., Wallace, M. T., & Stein, B. E. (2001). The influence of visual and auditory receptive field organization on multisensory integration in the superior colliculus. Experimental Brain Research, 139, 303-310. \n\n[16] King, A. J., & Calvert, G. A. (2001). Multisensory integration: Perceptual grouping by eye and ear. Current Biology, 11, 322-325. \n", "award": [], "sourceid": 1982, "authors": [{"given_name": "H.", "family_name": "Colonius", "institution": null}, {"given_name": "A.", "family_name": "Diederich", "institution": null}]}