{"title": "A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits", "book": "Advances in Neural Information Processing Systems", "page_first": 3804, "page_last": 3813, "abstract": "This study provides a normative theory for how Bayesian causal inference can be implemented in neural circuits. In both cognitive processes such as causal reasoning and perceptual inference such as cue integration, the nervous systems need to choose different models representing the underlying causal structures when making inferences on external stimuli. In multisensory processing, for example, the nervous system has to choose whether to integrate or segregate inputs from different sensory modalities to infer the sensory stimuli, based on whether the inputs are from the same or different sources. Making this choice is a model selection problem requiring the computation of Bayes factor, the ratio of likelihoods between the integration and the segregation models. In this paper, we consider the causal inference in multisensory processing and propose a novel generative model based on neural population code that takes into account both stimulus feature and stimulus reliability in the inference. In the case of circular variables such as heading direction, our normative theory yields an analytical solution for computing the Bayes factor, with a clear geometric interpretation, which can be implemented by simple additive mechanisms with neural population code. Numerical simulation shows that the tunings of the neurons computing Bayes factor are consistent with the \"opposite neurons\" discovered in dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas for visual-vestibular processing. 
This study illuminates a potential neural mechanism for causal inference in the brain.", "full_text": "A Normative Theory for Causal Inference and Bayes\n\nFactor Computation in Neural Circuits\n\nwenhao.zhang@pitt.edu; siwu@pku.edu.cn; bdoiron@pitt.edu; tai@cnbc.cmu.edu\n\nWen-Hao Zhang1,2, Si Wu3, Brent Doiron2, Tai Sing Lee1\n\n1Center for the Neural Basis of Cognition, Carnegie Mellon University.\n\n2Department of Mathematics, University of Pittsburgh.\n\n3School of Electronics Engineering & Computer Science, IDG/McGovern\n\nInstitute for Brain Research, Peking-Tsinghua Center for Life Sciences, Peking University.\n\nAbstract\n\nThis study provides a normative theory for how Bayesian causal inference can\nbe implemented in neural circuits. In both cognitive processes such as causal\nreasoning and perceptual inference such as cue integration, the nervous systems\nneed to choose different models representing the underlying causal structures\nwhen making inferences on external stimuli.\nIn multisensory processing, for\nexample, the nervous system has to choose whether to integrate or segregate inputs\nfrom different sensory modalities to infer the sensory stimuli, based on whether\nthe inputs are from the same or different sources. Making this choice is a model\nselection problem requiring the computation of Bayes factor, the ratio of likelihoods\nbetween the integration and the segregation models. In this paper, we consider\nthe causal inference in multisensory processing and propose a novel generative\nmodel based on neural population code that takes into account both stimulus feature\nand stimulus reliability in the inference. In the case of circular variables such as\nheading direction, our normative theory yields an analytical solution for computing\nthe Bayes factor, with a clear geometric interpretation, which can be implemented\nby simple additive mechanisms with neural population code. 
Numerical simulation\nshows that the tunings of the neurons computing Bayes factor are consistent with\nthe \"opposite neurons\" discovered in dorsal medial superior temporal (MSTd) and\nthe ventral intraparietal (VIP) areas for visual-vestibular processing. This study\nilluminates a potential neural mechanism for causal inference in the brain.\n\n1\n\nIntroduction\n\nNumerous psychological studies have demonstrated that perception can be formulated as Bayesian\ninference of the underlying causes in the world that give rise to our sensations [1\u20136]. These causes\ncould be the sensory variables such as heading direction and orientation of edge, but often are causal\nstructures from which the observations are generated. In multisensory integration, as an example,\nwhen we move around the world, the optical \ufb02ows we see and the vestibular signals we experience are\nconcordant. In this case, an integration model will be selected so that multiple cues can be weighed\nand combined together to form a uni\ufb01ed estimate of head direction of self-motion [7, 8]. However,\nwhen we wear a goggle to navigate in a virtual reality world while sitting on a spinning chair, the\nvisual and the vestibular signals would be quite discordant and it would be wrong to integrate them\nduring inference [9]. In this case, a segregation model should be selected so that each cue will remain\nseparated and their sources can be inferred independently. The selection of these models or latent\ncausal structures during inference is called causal inference [10, 11].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fFigure 1: The generative model of causal inference. (A) The generative model. The two sensory\ncues are generated by the same stimulus in the integration model, while they are independently\ngenerated by two different stimuli in the segregation model. Dashed circle: latent variables; solid\ncircle: cues (observations). 
(B) The likelihood function derived from neural population code. Each set\nof stimulus parameters include the stimulus feature s (e.g., motion direction) and the stimulus strength\nR (e.g., motion coherence). Each cue consists of observed direction x and observed spike count \u039b\nrepresenting the input reliability. (C) A neural encoding model where the stimulus parameters w and\ncues d are represented by the population \ufb01ring rate \u03bb and observed spiking activities u respectively.\n\nA number of psychological studies have suggested that our brains indeed perform causal inference as\nan ideal observer (e.g., [10, 12\u201314]). However, it has been challenging to come up with a simple and\nbiologically plausible neural implementation for causal inference. This is because the computation\nof Bayes factor, which is the ratio of likelihoods between models, requires nonlinear operations\nincluding multiplication and division, while how these nonlinear operations could be implemented\nby neural circuits remain a mystery [13\u201315]. Thus, while cue integration assuming the integration\nmodel can be accomplished by an additive mechanism through linearly summing feedforward spiking\ninputs in the framework of probabilistic population codes [16], a neural model with similar additive\nmechanisms for model selection has not been attained [15].\nHere, we show that by incorporating stimulus strength or reliability R (e.g. motion coherence of the\nvisual cue, Fig. 1B) as a latent stimulus parameter to be inferred simultaneously with the stimulus\nfeature (e.g. heading direction) in a generative model, the Bayes factor can be computed by using\nadditive mechanism in a biologically plausible implementation. In this implementation, the neural\npopulation activities representing the Bayes factor can be computed by simply summing the inputs\nof one direction from one sensory modality with the inputs of the opposite direction from another\nmodality. 
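The additive mechanism sketched here operates on the Poisson population responses that the paper formalizes later (Eqs. 6-7). As a preview, the following is a minimal, self-contained sketch of that encoding model; it is our illustration, not the authors' code, and the parameter values (N = 180 neurons, tuning width a = 3, peak rate R = 20 Hz) are made up for demonstration:

```python
import math
import random

random.seed(1)

N = 180  # number of uni-sensory neurons (illustrative value)
thetas = [2 * math.pi * j / N for j in range(N)]  # preferred directions

def firing_rate(s, R, theta, a=3.0):
    """Eq. (7): von Mises-shaped tuning curve; the peak rate R
    encodes the stimulus reliability (e.g., motion coherence)."""
    return R * math.exp(a * math.cos(s - theta) - a)

def poisson_sample(lam):
    """Draw one Poisson(lam) sample (Knuth's method; fine for small rates)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

def encode(s, R):
    """Eq. (6): independent Poisson spike counts of one modality."""
    return [poisson_sample(firing_rate(s, R, th)) for th in thetas]

u = encode(s=math.pi / 2, R=20.0)  # one noisy population response to a cue
```

In this scheme a single cue is just a vector of N spike counts whose bump location carries the direction and whose overall magnitude carries the reliability, which is all the later computations need.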
We found that the tunings of these neurons representing the Bayes factor in the form of a neural population code are similar to the "opposite" neurons observed in MSTd and VIP, whose preferred heading directions for the visual and vestibular modalities are indeed opposite, shifted by 180 degrees (Fig. 3B-C, [8, 17-19]). This work provides the first theoretical justification that the opposite cells are computing and encoding the Bayes factor, which is an essential step in causal inference. We provide numerical simulations in support of this claim.

2 A Generative Model for Multisensory Processing

2.1 A probabilistic generative model

We study causal inference in the case of multisensory processing, an example of which is inferring heading direction using visual and vestibular cues [8, 19, 20]. The two cues are denoted by D = \{d_l\}_{l=1}^{2}, with l = 1, 2 representing the visual and vestibular modality respectively. Each cue can be regarded as the responses of uni-sensory neurons in visual or vestibular areas, which provide the feedforward inputs to multisensory neurons in MSTd and VIP, respectively [8, 19]. In practice, the two cues can be generated by two different models M = \{m_{int}, m_{seg}\}, with each of them specifying an underlying causal structure (Fig. 1A, [10, 13, 14]). Each model m_h (h \in \{int, seg\}) has its parameters W_h = \{w_{lh}\}_{l=1}^{2}, with w_{lh} denoting the stimulus parameters of sensory modality l. Given a model (causal structure), the two sensory cues are generated independently (since they are generated and conveyed via different sensory pathways in the brain, Fig. 1A), i.e.,

p(D|W_h, m_h) = \prod_{l=1}^{2} p(d_l|w_{lh}).   (1)

In the integration model m_{int}, there is only one source in the world, so the stimulus features in the two modalities are the same (Fig. 1A, [10, 13, 14, 21]), which we denote as w_{int} \triangleq w_{1,int} = w_{2,int}. The prior of the parameter w_{int} is assumed to be a uniform distribution for simplicity,

p(w_{int}|m_{int}) = U(w_{int}).   (2)

In the segregation model m_{seg}, there are two independent sources (Fig. 1A). Thus, the stimulus parameters of the two modalities are independent of each other, and also follow uniform distributions,

p(w_{1,seg}, w_{2,seg}|m_{seg}) = p(w_{1,seg}|m_{seg}) p(w_{2,seg}|m_{seg}) = U(w_{1,seg}) U(w_{2,seg}).   (3)

Notably, the two causal models are mutually exclusive, in that only one of them holds at any single moment. The priors of the two models are assumed to be the same,

p(m_{int} = 1) = p(m_{seg} = 1) = 1/2.   (4)

Combining the likelihood and priors above, the whole generative process is summarized as,

p(D, W_h, m_h) = p(D|W_h, m_h) p(W_h|m_h) p(m_h) \propto \begin{cases} p(d_1|w_{int}) p(d_2|w_{int}), & m_h = m_{int}, \\ p(d_1|w_{1,seg}) p(d_2|w_{2,seg}), & m_h = m_{seg}. \end{cases}   (5)

2.2 Neural population code

In the framework of neural population codes [16, 22], the above generative model (Eq. 5) can be described more specifically, which is a key step in linking abstract causal inference with neural circuits. Consider that w_l = \{s_l, R_l\} are the stimulus parameters of modality l, namely the heading direction (s_l) and its reliability (R_l). The stimulus information is conveyed by the responses of N uni-sensory neurons in modality l, denoted as u_l = \{u_{lj}\}_{j=1}^{N}, which satisfy Poisson statistics (Fig.
1C, [16]),\n\np (ul|\u03bblj(sl, Rl)) =(cid:81)N\n\nj=1 Poisson(ulj|\u03bblj) =(cid:81)N\n\nj=1\n\n\u03bb\n\nulj\nlj\n\nulj ! e\u2212\u03bblj ,\n\n(6)\n\nwhere \u03bblj is the \ufb01ring rate of neuron ulj and is a function of stimulus parameters sl and Rl,\n\n\u03bblj(sl, Rl) = Rl exp [a cos(sl \u2212 \u03b8j) \u2212 a] ,\n\n(7)\nwhere \u03b8j is the preferred direction of neuron ulj, and a is the width of the tuning function. Here, we\nassume that the stimulus reliability is encoded by the peak \ufb01ring rate of neurons [16, 22].\nAlthough the neuronal responses ul are high-dimensional, the likelihood function of the stimulus\nparameters wl given ul can be fully speci\ufb01ed by two one-dimensional variables (suf\ufb01cient statistics),\nwhich correspond to the readout (via population vector) of the direction (xl) from ul [23] and the\ntotal spike count (\u039bl),\n\n(cid:16)(cid:80)\n\n(cid:17)\n\n= tan\u22121(cid:16)(cid:80)\n(cid:80)\n\n(cid:17)\n\n, \u039bl =(cid:80)\n\nj ulj sin \u03b8j\nj ulj cos \u03b8j\n\nxl = arg\n\nj uljei\u03b8j\n\n(8)\nThe suf\ufb01cient statistics dl = {xl, \u039bl} correspond to the sensory cues in Eq. (1). The likelihood\nfunction of stimulus parameter wl derived from neural population code is calculated to be (see details\nin Supplementary Information (SI) 4),\n\nj ulj.\n\np (dl = {xl, \u039bl}|sl, Rl) = M (xl|sl, a\u03c1\u039bl) Poisson(\u039bl|\u03b2Rl),\n\u221d M (sl|xl, a\u03c1\u039bl) \u0393(Rl|\u039bl + 1, \u03b2),\n\n(9)\nwhere M(x), Poisson(x) and \u0393(x) denote a von Mises, a Poisson, and a Gamma distributions,\nrespectively. \u03c1 and \u03b2 represent the width of ul and the sum of normalized \ufb01ring rates, respectively\n(see SI. 4). The priors of slh and Rlh are assumed to be independent with each other (Eq. 2), i.e.,\n\nU(wlh) = U(slh) U(Rlh) = (LsLR)\u22121,\n\n(10)\nwhere Ls and LR are the lengths of the spaces of s and R, respectively. For heading direction\ns, Ls = 2\u03c0. Combining Eqs. 
(5, 9 and 10) together, the generative model in the form of neural\npopulation code is expressed as,\n\n(cid:26) (cid:81)2\n(cid:81)2\nl=1 M (xl|sint, a\u03c1\u039bl) Poisson(\u039bl|\u03b2Rint),\nmh = mint,\nl=1 M (xl|sl,seg, a\u03c1\u039bl) Poisson(\u039bl|\u03b2Rl,seg), mh = mseg.\n\np(D,Wh, mh) \u221d\n\n(11)\n\n3\n\n\fFigure 2: The geometric representation of Bayes factors. (A) The geometric representation of a von\nMises distribution where its mean and concentration can be represented by the angle and length of a\nvector in a 2d plane respectively. (B) The geometric representation of the posterior of direction under\ntwo models (Eq. 20) and the best-\ufb01t likelihood ratios (Eq. 25) in Bayes factor. Bottom: the likelihood\nratios depends on the disparity of direction as well as strength. Dashed line: the radius of half blue\nvector. (C) Evidence of integration and segregation models and Bayes factor with cue directions. (D)\nThe decision boundary with input spike count, where the spike count of two cues are always the same,\ni.e., \u039b1 = \u039b2. Parameters: Ls = 2\u03c0, LR = 100 Hz, a = 3 and N = 180. (C) \u039b1 = \u039b2 = 30.\n\nNotably, the generative model considered in the present study (Eq. 9) includes explicitly the stimulus\nstrength R, which was treated as a \u201cnuisance\u201d parameter in previous studies (e.g., [13\u201315]). We claim\nthat it is important for the neural system to exploit the disparity of the strength R of two stimuli to\nperform causal inference. 
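The sufficient-statistics readout above (Eq. 8) compresses an N-dimensional spike pattern into a direction (via the population vector) and a total spike count. A minimal numerical sketch of this readout, using a hand-made noise-free response profile in place of Poisson spikes (our illustration, with made-up values N = 360 and a = 3):

```python
import cmath
import math

N = 360
thetas = [2 * math.pi * j / N for j in range(N)]  # preferred directions

def readout(u, thetas):
    """Eq. (8): sufficient statistics of a population response u.
    x   = population-vector direction, arg(sum_j u_j e^{i theta_j})
    Lam = total spike count, sum_j u_j"""
    z = sum(uj * cmath.exp(1j * th) for uj, th in zip(u, thetas))
    return cmath.phase(z), sum(u)

# a noise-free tuning bump centred on s = 1.0 rad stands in for spike counts
s_true = 1.0
u = [10.0 * math.exp(3.0 * (math.cos(s_true - th) - 1.0)) for th in thetas]

x, Lam = readout(u, thetas)
assert abs(x - s_true) < 1e-6  # the population vector recovers the direction
```

With Poisson noise the recovered x fluctuates around the true direction, and the pair {x, Λ} is exactly the cue d_l = {x_l, Λ_l} that enters the likelihood of Eq. (9): the angle carries the direction estimate and the count carries the reliability.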
For example, suppose you are watching a shaky video (low motion coherence) in virtual reality while walking straight ahead in the real world (high reliability). Even if the direction and speed of the optic flow in virtual reality match your actual walking, you will probably feel that the optic flow is not generated by your walking, and may even experience motion sickness, because of the difference in motion coherence between the visual and vestibular stimuli.

3 Bayesian Causal Inference

In order to interpret the world, the neural circuit needs to infer the underlying causal structure m_h based on the sensory cues D (Fig. 1), which can be achieved by estimating the posterior of each model m_h. Although the spike count \Lambda_l is observed, the neural circuit is assumed to be interested only in the heading directions x = \{x_1, x_2\}, and evaluates each model's feasibility by its performance in explaining x. According to Bayes' theorem, the posterior of the integration model m_{int} is,

p(m_{int}|x) = \frac{p(x|m_{int}) p(m_{int})}{\sum_h p(x|m_h) p(m_h)} = \left[ 1 + \frac{p(x|m_{seg})}{p(x|m_{int})} \right]^{-1},   (12)

where the condition that the two models have the same prior is used (Eq. 4), i.e., p(m_{seg})/p(m_{int}) = 1. Since there are only two models, it always holds that \sum_h p(m_h|x) = 1, and knowing the posterior of one model fully determines the posterior of the other. From Eq. (12), we see that the key to causal inference is to calculate the likelihood ratio between the two models, which is called the Bayes factor [24, 25],

B(x) = \frac{p(x|m_{seg})}{p(x|m_{int})}.   (13)

If the Bayes factor is less than 1, p(m_{int}|x) > p(m_{seg}|x) and the integration model is favoured; otherwise the segregation model is chosen. The core of computing the Bayes factor is to evaluate the evidence of each model, p(x|m_h), which requires marginalizing over the parameters W_h and the spike counts \Lambda = \{\Lambda_1, \Lambda_2\},

p(x|m_h) = \int\!\!\int p(x, \Lambda|W_h) p(W_h|m_h) \, dW_h \, d\Lambda \simeq \underbrace{p(x|\hat{W}_h)}_{\text{Best-fit likelihood}} \times \underbrace{p(\hat{W}_h|m_h) \det(H_h/2\pi)^{-1/2}}_{\text{Occam factor, OF}(m_h)},   (14)

where Laplace's method is used to approximate the double integral (see SI. 2, [25, 26]), which works well when the spike counts \Lambda are sufficiently large (Fig. S1, see details in SI. 5). Computing the evidence of each model requires fitting the model to explain the sensory cues. Denote by \hat{W}_h = \{\hat{s}_{lh}, \hat{R}_{lh}\}_{l=1}^{2} the best-fit parameters (the maximum a posteriori estimate) of model m_h, i.e.,

\hat{W}_h = \arg\max_{W_h} p(W_h|D, m_h).   (15)

The best-fit likelihood of the observed directions x is given by (see details in SI. 5.3),

p(x|\hat{W}_h) = \prod_{l=1}^{2} p(x_l|\hat{w}_{lh}) \simeq \prod_{l=1}^{2} M(x_l|\hat{s}_{lh}, a\rho\beta\hat{R}_{lh}).   (16)

In Eq.
(14), Hh = \u2212\u2207\u2207 ln p( \u02c6Wh|D, mh) is the negative Hessian matrix of the logarithm of the\nposterior p(Wh|D, mh), re\ufb02ecting the uncertainty of the inferred parameter Wh.\nNote that causal inference (model selection) is not simply choosing a causal structure (model) which\nbest explains the observed direction x, since a complex model can always \ufb01t the data well. An\nover-parameterized model or a model requiring too much \ufb01ne-tuning will be rejected, and this is\ncaptured by the Occam factor OF(mh) in Eq. (14). The Occam factor for a complex model is small,\nsince the probability of choosing a particular parameter value p( \u02c6Wh|mh) is low due to the large\nparameter space; and a \ufb01ne-tuned model has a large Hh, which also reduces the Occam factor [26].\nIn summary, Bayesian causal inference undergoes two levels of inference: the \ufb01rst level is inferring\nthe best-\ufb01t parameters \u02c6s and \u02c6R given each model (Eq. 15); and the second level is inferring the\nmodels M by using the best-\ufb01t parameters to evaluate each model\u2019s performance, with the model\ncomplexity penalized by the Occam factor (Eq. 14). In the section below, we presented how the two\nlevels of inference are performed.\n\n3.1 Maximum posterior estimate of stimulus parameters\n\nIn the segregation model mseg, each cue dl is exclusively used to \ufb01t the parameters wlh (Eq. 11),\n\np(Wseg|D, mseg) \u221d(cid:81)2\n\nl=1 M(cid:0)sl,seg|xl, \u03bal (cid:44) a\u03c1\u039bl\n\n(cid:1)\u0393(Rl,seg|\u039bl + 1, \u03b2),\n\nand the maximum-posterior estimates of the parameters are (see details in SI. 5.1),\n\n(17)\n\n(18)\nOn the other hand, the integration model mint only has one set of parameters wint = {sint, Rint}\n(Eq. 2), whose estimate involves combining two cues together (Eq. 
9),\n\n\u02c6Rl,seg = \u039bl/\u03b2.\n\n\u02c6sl,seg = xl,\n\np(wint|D, mint) \u221d(cid:81)2\n\nl=1 M(sint|xl, \u03bal)\u0393(Rint|\u039bl + 1, \u03b2),\n\n\u221d M(sint|\u02c6sint, \u02c6\u03baint)\u0393(Rint|\u039b1 + \u039b2 + 1, 2\u03b2).\n\n(19)\n\n(20)\n\nThe parameters \u02c6sint and \u02c6\u03baint of the posterior of direction satisfy [27] (see details in SI. 5.2),\n\n\u02c6\u03baintej \u02c6sint = \u03ba1ejx1 + \u03ba2ejx2 .\n\n\u02c6sint = tan\u22121(cid:16) \u03ba1 sin x1+\u03ba2 sin x2\n\n(cid:17)\n\nCombining the above results (Eqs. 19-20), the parameter estimates in the integration model are,\n\n.\n\n,\n\n2\u03b2\n\n\u02c6Rint = \u039b1+\u039b2\n\n\u03ba1 cos x1+\u03ba2 cos x2\n\n(21)\nIt is worthy to note that there is a clear geometric interpretation of the parameters in the posterior of\ndirection s (Eqs. 17 and 19). The parameters of a von Mises distribution M(s|x, \u03ba) can be represented\nby the vector \u03baejx in a two-dimensional parameter plane with its mean x and concentration \u03ba\nrepresented by the angle and length of the vector, respectively (Fig. 2A). Thus, the posterior of\ndirection in the segregation model (M(sl,seg|xl, \u03bal) in Eq. 17) can be represented by two green\nvectors \u03balejxl in Fig. 2B. In comparison, since the integration model combines the two cues together,\nthe posterior of direction in the integration model can be represented by the blue vector in Fig. 2B,\nwhich is the sum of the two green vectors (Eq. 20). The geometry in the parameter space shows that\nthe integration model accumulates the common information of two cues to estimate stimulus, and the\nestimate of the integration model is always the consensus (reliability based average) of cues.\n\n5\n\n\f3.2 Occam factors of two models\n\nOF(mseg) = 4 \u00d7 OF(mint)2, OF(mint) = \u03c0[LsLR\n\nThe Occam factors of two models are (substituting Eqs. (18, 21) into Eq. (14), see SI. 
5 for details),

OF(m_{seg}) = 4 \times OF(m_{int})^2, \quad OF(m_{int}) = \pi \left[ L_s L_R \sqrt{a\rho\beta} \right]^{-1}.   (22)

OF(m_{seg}) is smaller than OF(m_{int}) by an order, because the number of parameters in the segregation model is double that in the integration model. Moreover, the Occam factors of the two models are constants, invariant to the input spikes \Lambda_l and directions x_l, because the dependencies of the uncertainties of s_{lh} and R_{lh} on \Lambda_l cancel, which greatly simplifies the neural implementation.

3.3 The Bayes factor

Once the best-fit stimulus parameters (Eqs. 18 and 21) and the Occam factors (Eq. 22) are obtained, the Bayes factor deciding between the two models can be calculated as a function of heading direction (Eq. 13),

B(x) \simeq \prod_{l=1}^{2} \frac{M(x_l|\hat{s}_{l,seg}, \kappa_l)}{M(x_l|\hat{s}_{int}, \hat{\kappa}_{int}/2)} \cdot \frac{OF(m_{seg})}{OF(m_{int})} = \prod_{l=1}^{2} LR(x_l) \times OFR,   (23)

where LR(x_l) is the ratio of the best-fit likelihoods of the two models, and OFR = OF(m_{seg})/OF(m_{int}) is the Occam factor ratio, which is a constant invariant to the input (Eq. 22). In Eq. (23), \kappa_l = a\rho\beta\hat{R}_{l,seg} (Eq. 17) and \hat{\kappa}_{int}/2 \approx (\kappa_1 + \kappa_2)/2 = a\rho\beta\hat{R}_{int} due to |x_1 - x_2| \ll \min(\kappa_1, \kappa_2) (Eq. 20). Note that the concentration of the best-fit likelihood of the integration model, i.e., \hat{\kappa}_{int}/2 in the denominator of Eq. (23), is half the concentration of the posterior, i.e., \hat{\kappa}_{int} in Eqs. (19-20). Intuitively, this is because the integration model uses the two cues' consensus (average) to explain each cue; when the cues are from the same source, their consensus is statistically similar to the cues themselves.

Since the Occam factor ratio OFR is a constant invariant to the inputs, the input dependence of the Bayes factor lies entirely in the computation of the likelihood ratios LR(x_l). Notably, the ratio between two circular distributions is still a circular distribution.
Dividing by a circular distribution is\nproportional to rotating the distribution to opposite direction and multiplying it (comparing the below\nequation with Eq. 23), i.e.,\n\nLR(xl) \u221d M(xl|\u02c6sl,seg, \u03bal)M(xl|\u02c6sint + \u03c0, \u02c6\u03baint/2) = A \u00d7 M(xl|xlp, \u03balp),\n\n(24)\nwhere A is the product of normalizing constants1. Using Eq. (20), the parameters xlp and \u03balp of\nLR(xl) are calculated as,\n\nl(cid:48) = 3 \u2212 l.\n\n\u03balpejxlp = (\u03balejxl \u2212 \u03bal(cid:48)ejxl(cid:48) )/2 = [\u03balejxl + \u03bal(cid:48)ej(xl(cid:48) +\u03c0)]/2,\n\n(25)\nGeometrically, the likelihood ratio parameters (xlp and \u03balp in Eqs. 24-25) correspond to the difference\nbetween green vectors (the best-\ufb01t likelihood of the segregation model) and half of the blue vector (the\nbest-\ufb01t likelihood of the integration model), and they are represented by two red vectors in Fig. 2B.\nThis geometrical relationship suggests that the likelihood ratio takes into account the disparities of\nboth direction |x1\u2212 x2| and strength |\u039b1\u2212 \u039b2| (Fig. 2B bottom), and re\ufb02ects how well the integration\nmodel can explain the two cues, as the lengths of the red vectors increase with the cue disparity.\nFrom the property of parallelogram, the two red vectors are always of the same length but point to\nthe opposite direction with each other, implying the parameters of the two likelihood ratios have the\nsame concentration, i.e., \u03ba1p = \u03ba2p, but opposite means, i.e., x2p = x1p + \u03c0.\nFig. 2 presents the results of model evidence and Bayes factor. The evidence of the segregation model,\np(x|mseg), is a constant irrelevant of |x1 \u2212 x2| (Fig. 2C, blue line), since each cue is independently \ufb01t\nby a parameter and hence the cues can always be perfectly \ufb01t regardless of their disparity. 
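The vector identities above (Eqs. 20 and 25) are easy to check numerically. Below is a small sketch with made-up cue parameters, representing each von Mises distribution by its vector \kappa e^{jx} in the 2d parameter plane of Fig. 2A-B:

```python
import cmath
import math

# made-up cue parameters: directions x_l (rad) and concentrations kappa_l
x1, k1 = 0.2, 30.0
x2, k2 = -0.4, 20.0

v1 = k1 * cmath.exp(1j * x1)  # green vector of cue 1
v2 = k2 * cmath.exp(1j * x2)  # green vector of cue 2

# Eq. (20): the integration posterior is the vector sum of the two cues
v_int = v1 + v2
s_int, k_int = cmath.phase(v_int), abs(v_int)

# Eq. (25): the likelihood-ratio vectors are half the vector differences
v1p = (v1 - v2) / 2.0  # red vector for LR(x1)
v2p = (v2 - v1) / 2.0  # red vector for LR(x2)

# parallelogram property: same concentration, means opposite by pi
assert abs(abs(v1p) - abs(v2p)) < 1e-12
assert abs(abs(cmath.phase(v1p) - cmath.phase(v2p)) - math.pi) < 1e-9

# the integrated estimate is a reliability-weighted consensus of the cues
assert min(x1, x2) < s_int < max(x1, x2)
```

The length of the red vectors |v1p| grows with both the direction disparity |x1 - x2| and the concentration (strength) disparity |k1 - k2|, which is exactly the decomposition illustrated at the bottom of Fig. 2B.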
However,\nthe segregation model is penalized by the Occam factor much more compared with the integration\nmodel since it has more parameters (Eq. 22). In contrast, the integration model parsimoniously uses\nthe two cues\u2019 consensus to explain cues, and hence its explanatory power, p(x|mint), decreases\nwith the cue disparity (Fig. 2C, red line). In summary, the integration model will be favoured when\ntwo cues are similar (Fig. 2C), consistent with the intuition that cues from the same object will be\nstatistically more similar than cues from different objects (Fig. 1A) [12, 13].\nThe spike counts \u039b affects the integration probability indirectly through the estimate of R (Eqs. 18\nand 21). When the spike counts of both cues are low, i.e., noisy cues due to low motion coherence,\n\n1A = 2\u03c0I0(\u02c6\u03baint/2)I0(\u03balp)/I0(\u03bal). I0(x) is the modi\ufb01ed Bessel function of the \ufb01rst kind and zero order.\n\n6\n\n\fFigure 3: Congruent and opposite neurons implement the integration and the Bayes factor respectively.\n(A) The schematic of the network structure. Congruent (opposite) neurons receive the feedforward\ninputs from two cues in a congruent (opposite) manner. Each circle represents a neuron where the two\narrows inside denotes its preferred directions under two sensory modalities, with the color specifying\nthe modality. (B) The tunings of an example congruent and opposite neuron in the network given\ntwo sensory cues. For illustration, the strength of two cues is set to be different. (C) Tuning curves\nof a congruent and an opposite neuron in both MSTd and VIP (adapted from [8]). (D) The number\nof congruent and opposite neurons in MSTd and VIP (adapted from [17]). (E) The comparison\nbetween the mean \u02c6sint and concentration \u02c6\u03baint decoded from congruent neurons with the theoretical\npredictions (Eq. 19). (F) The Bayes factors decoded from opposite neurons are compared with\ntheoretical prediction (Eq. 23). 
Parameters: (E-F) s1 = 0\u25e6, s2 \u2208 [0\u25e6, 20\u25e6], Rl \u2208 [5, 50]Hz.\n\nthe system tends to integrate two cues together to increase the con\ufb01dence of the stimulus estimate, so\nthe range of integration is large (Fig. 2D). In contrast, in the case of large spike counts, the estimate\nof each cue is reliable enough even without integration, and the system can discriminate the disparity\nbetween two cues clearly and the range of integration shrinks.\n\n4 Neural Implementation of Causal Inference\n\nWe further explore how causal inference can be implemented in neural circuits. As described above,\ncausal inference involves two operations, estimating the best-\ufb01t stimulus parameters of each model\n(Eq. 15) and calculating the Bayes factor (Eq. 23). The neural system needs at least two populations\nof neurons to implement each of them.\n\n4.1 Congruent neurons responsible for cue integration\n\nSince the estimate of the stimulus parameters in the segregation model is the same as the likeli-\nhood (Eq. 18), the feedforward inputs ul represents the estimate of the segregation model already.\nThe integration model combines two cues together. Following the idea of [16], cue integration can\nbe achieved by a population of neurons which sum the feedforward inputs of two cues together (see\nderivations in SI. 6.1). Denote the responses of these neurons by rc, we have,\n\nrc(j) = u1(\u03b8j) + u2(\u03b8j),\n\n(26)\nwhere ul(\u03b8j) denotes the input from modality l with preferred direction \u03b8j given cue l (Eq. 6). The\npreferred direction of rc(j) under two cues are the same (Fig. 3B), consistent with the tuning of\ncongruent neurons found in MSTd and VIP (Fig. 3C, [8, 19]), which are known to be responsible for\ncue integration [8, 16, 28].\n\n4.2 Opposite neurons representing the Bayes factor\n\nThe core of computing Bayes factor is the likelihood ratio (Eq. 23), because the Occam factor of two\nmodels are both constants invariant with inputs (Eq. 
22). Thus we consider another population of neurons computing the likelihood ratio LR(xl) in the Bayes factor. Since the two likelihood ratios LR(xl) are always opposite to each other, they can be parsimoniously represented by the same population of neurons. Eqs. (24-25) reveal that the likelihood ratio is proportional to the product of the two best-fit likelihoods, but in an opposite manner. Analogous to the neural implementation of cue integration (Eq. 19), the ratio LR(x1) can be represented by another population of neurons averaging the two feedforward inputs in an opposite manner (see details in SI. 6.2), whose responses ro are given by,

ro(j) = [u1(θj) + u2(θj + π)]/2.  (27)

The preferred direction of ro(j) under modality 1 is θj, but becomes θj + π under modality 2 (Fig. 3B). Experiments have also found such "opposite" neurons in MSTd and VIP, whose preferred directions are opposite in response to visual and vestibular cues (Fig. 3C, [8, 19]). Note that the two populations of neurons explicitly represent the distributions of stimulus direction, while the estimate of the stimulus strength R̂int implicitly affects the total responses: opposite neurons average the two inputs (Eq. 27), in contrast to congruent neurons, which sum the two inputs (Eq. 26).

[Figure 3: (A) Network structure and signal flow: feedforward inputs ul drive congruent neurons (integration) and opposite neurons (likelihood ratio). (B) Tuning curves of an example congruent neuron and an example opposite neuron under the two cues. (C) Experimental evidence: distribution of the difference in preferred direction (visual vs. vestibular) for congruent and opposite neurons in MSTd/VIP. (D) Numbers of congruent and opposite neurons. (E) Integrated estimate decoded from congruent neurons vs. the integration model. (F) Bayes factor decoded from opposite neurons vs. theory.]

4.3 Simulation results

We simulate a population of congruent neurons and a population of opposite neurons of equal size, as found in the experiments (Fig. 3D, [17, 18]). The congruent neurons' responses rc sum the two feedforward inputs (two cues) together (Eq. 26), while the opposite neurons' responses ro average the two feedforward inputs in an opposite manner (Eq. 27, Fig. 3A; see details in SI. 7). We decode the mean and concentration of the heading direction from the congruent neurons rc via the population vector (Eq. 8, [23]) and compare the results with the posterior of the direction derived from the theory (Eq. 20; see details in SI. 7). Meanwhile, we decode the mean and concentration from the opposite neurons ro as the neurons' estimate of x1p and κ1p, the parameters of LR(x1). The parameters of LR(x2) can then be obtained by using the relations x2p = −x1p and κ2p = κ1p, because the two likelihood ratios have the same length but opposite directions (Fig. 2B). The Bayes factor is obtained by multiplying the decoded likelihood ratios from the opposite neurons with the constant Occam factor ratio (Eqs. 23-24). We then compare the decoded posteriors represented by the congruent neurons, and the Bayes factor decoded from the opposite neurons, with the theoretical predictions (Fig. 3E-F).
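As a concrete illustration, the additive operations of Eqs. (26-27) and the population-vector readout can be sketched in a few lines of Python. The von Mises-shaped input model, all parameter values (kappa, rate), and the helper names feedforward_input and pop_vector are illustrative assumptions, not the paper's exact parameterization (Eqs. 6 and 8 of the full text):

```python
import numpy as np

def feedforward_input(theta, x, kappa=3.0, rate=20.0):
    """Assumed von Mises-shaped input to neurons with preferred directions
    `theta` (radians), driven by a cue at direction `x`."""
    return rate * np.exp(kappa * (np.cos(theta - x) - 1.0))

def pop_vector(r, theta):
    """Population-vector readout: the angle estimates the mean direction,
    the length tracks the concentration of the represented distribution."""
    z = np.mean(r * np.exp(1j * theta))
    return np.angle(z), np.abs(z)

theta = np.linspace(-np.pi, np.pi, 180, endpoint=False)  # preferred directions
x1, x2 = np.deg2rad(0.0), np.deg2rad(20.0)               # visual, vestibular cues
u1, u2 = feedforward_input(theta, x1), feedforward_input(theta, x2)

# Eq. (26): congruent neurons sum the two feedforward inputs.
rc = u1 + u2
s_int, _ = pop_vector(rc, theta)     # integrated direction estimate

# Eq. (27): opposite neurons average cue 1 with a pi-shifted copy of cue 2.
ro = 0.5 * (u1 + feedforward_input(theta, x2 + np.pi))
x1p, L1p = pop_vector(ro, theta)     # mean and vector length for LR(x1)
x2p, L2p = -x1p, L1p                 # LR(x2): opposite mean, same length

print(np.rad2deg(s_int))             # midway between the two equally reliable cues
```

In this sketch, the congruent population vector points midway between two equally reliable cues, while the opposite population's vector length shrinks to zero as the cue disparity vanishes (the π-shifted bumps cancel), qualitatively matching the geometry in Fig. 2B.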
The results confirm that the congruent neurons achieve cue integration, and that the opposite neurons compute and represent the likelihood ratio in the Bayes factor.

5 Conclusions and Discussions

This study develops a normative theory addressing how causal inference can be implemented by simple additive mechanisms in neural circuits, and demonstrates that the opposite neurons found in MSTd and VIP could compute and represent the likelihood ratios in the Bayes factor, within a generative model framework based on probabilistic population codes. Our theory also provides a geometric interpretation of causal inference, which illuminates clearly how the Bayes factor and cue integration depend on the input direction and strength. Compared to the complex neural circuits previously proposed for causal inference, our model is rather simple, relying only on an additive operation, and is hence biologically more plausible. Notably, opposite neurons have been known for more than a decade, yet their precise computational and functional roles remain unclear [17, 19]. Here, our study suggests that opposite neurons are responsible for implementing causal inference in neural systems.

Previous works exploring the implementation of causal inference in neural systems (e.g., [15]) have not associated their models with the neuronal properties found in the cortex. An important insight from our study is that in computing the Bayes factor, opposite neurons need to take into account not only the difference in the heading directions, but also the difference in the stimulus strength, or the reliability of the signals, from the two sensory modalities (Fig. 2B), an issue missed in previous works (e.g., [13, 15]). Previous theoretical works also suggested that opposite neurons compute the ratio between distributions [27, 29], but they considered the difference of the inferred common stimulus direction sint from the two cues, i.e., the posterior ratio of the stimulus direction [29].
Here, we consider that the opposite neurons compute the difference between the reconstructions of the input x from the two models, i.e., the ratio of best-fit likelihoods (Eq. 14).

Nevertheless, we would like to point out that our theory on the neural computation and representation of the Bayes factor in its current form only holds for circular variables, such as direction or orientation. How the Bayes factor of a non-periodic variable, e.g., depth or spatial location, is computed by neurons remains unclear. Further experimental evidence for cue integration with non-periodic variables is needed to address this issue. Furthermore, the present study mainly focuses on the computation of the Bayes factor; how the neural system carries out the subsequent computations based on the inferred causal structure has not yet been explored, which forms our future research.

Acknowledgments

This work is supported by the National Science Foundation (1816568), the Intelligence Advanced Research Projects Activity (D16PC00007), the National Institutes of Health (Grants 1U19NS107613-01 and R01EB026953), the Vannevar Bush Faculty Fellowship (N00014-18-1-2002), and the Simons Foundation Collaboration on the Global Brain. We also thank Rob Kass and Xaq Pitkow for their useful suggestions.

References

[1] Marc O Ernst and Heinrich H Bülthoff. Merging the senses into a robust percept. Trends in Cognitive Sciences, 8(4):162–169, 2004.

[2] James J Clark and Alan L Yuille. Data Fusion for Sensory Information Processing Systems, volume 105. Springer Science & Business Media, 2013.

[3] Marc O Ernst and Martin S Banks. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429–433, 2002.

[4] Konrad P Körding and Daniel M Wolpert. Bayesian integration in sensorimotor learning. Nature, 427(6971):244, 2004.

[5] Robert A Jacobs. Optimal integration of texture and motion cues to depth.
Vision Research, 39(21):3621–3629, 1999.

[6] David Alais and David Burr. The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3):257–262, 2004.

[7] RJV Bertin and A Berthoz. Visuo-vestibular interaction in the reconstruction of travelled trajectories. Experimental Brain Research, 154(1):11–21, 2004.

[8] Yong Gu, Dora E Angelaki, and Gregory C DeAngelis. Neural correlates of multisensory cue integration in macaque MSTd. Nature Neuroscience, 11(10):1201–1210, 2008.

[9] Kalpana Dokka, Hyeshin Park, Michael Jansen, Gregory C DeAngelis, and Dora E Angelaki. Causal inference accounts for heading perception in the presence of object motion. Proceedings of the National Academy of Sciences, 116(18):9060–9065, 2019.

[10] Ladan Shams and Ulrik R Beierholm. Causal inference in perception. Trends in Cognitive Sciences, 14(9):425–432, 2010.

[11] Judea Pearl et al. Causal inference in statistics: an overview. Statistics Surveys, 3:96–146, 2009.

[12] Mark T Wallace, GE Roberson, W David Hairston, Barry E Stein, J William Vaughan, and Jim A Schirillo. Unifying multisensory signals across time and space. Experimental Brain Research, 158(2):252–258, 2004.

[13] Konrad P Körding, Ulrik Beierholm, Wei Ji Ma, Steven Quartz, Joshua B Tenenbaum, and Ladan Shams. Causal inference in multisensory perception. PLoS One, 2(9):e943, 2007.

[14] Yoshiyuki Sato, Taro Toyoizumi, and Kazuyuki Aihara. Bayesian inference explains perception of unity and ventriloquism aftereffect: identification of common sources of audiovisual stimuli. Neural Computation, 19(12):3335–3355, 2007.

[15] Wei Ji Ma and Masih Rahmati. Towards a neural implementation of causal inference in cue combination. Multisensory Research, 26(1-2):159–176, 2013.

[16] Wei Ji Ma, Jeffrey M Beck, Peter E Latham, and Alexandre Pouget.
Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11):1432–1438, 2006.

[17] Yong Gu, Paul V Watkins, Dora E Angelaki, and Gregory C DeAngelis. Visual and nonvisual contributions to three-dimensional heading selectivity in the medial superior temporal area. The Journal of Neuroscience, 26(1):73–85, 2006.

[18] Aihua Chen, Gregory C DeAngelis, and Dora E Angelaki. Representation of vestibular and visual cues to self-motion in ventral intraparietal cortex. The Journal of Neuroscience, 31(33):12036–12052, 2011.

[19] Aihua Chen, Gregory C DeAngelis, and Dora E Angelaki. Functional specializations of the ventral intraparietal area for multisensory heading discrimination. The Journal of Neuroscience, 33(8):3567–3581, 2013.

[20] Christopher R Fetsch, Gregory C DeAngelis, and Dora E Angelaki. Bridging the gap between theories of sensory cue integration and the physiology of multisensory neurons. Nature Reviews Neuroscience, 14(6):429–442, 2013.

[21] David R Wozny, Ulrik R Beierholm, and Ladan Shams. Probability matching as a computational strategy used in perception. PLoS Computational Biology, 6(8):e1000871, 2010.

[22] Peter Dayan and Laurence F Abbott. Theoretical Neuroscience. Cambridge, MA: MIT Press, 2001.

[23] Apostolos P Georgopoulos, Andrew B Schwartz, and Ronald E Kettner. Neuronal population coding of movement direction. Science, 233(4771):1416–1419, 1986.

[24] Robert E Kass and Adrian E Raftery. Bayes factors. Journal of the American Statistical Association, 90(430):773–795, 1995.

[25] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[26] David JC MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.

[27] Wen-Hao Zhang, He Wang, KY Michael Wong, and Si Wu.
\u201ccongruent\u201d and \u201copposite\u201d\nneurons: Sisters for multisensory integration and segregation. In Advances in Neural Information\nProcessing Systems, pages 3180\u20133188, 2016.\n\n[28] Wen-Hao Zhang, Aihua Chen, Malte J Rasch, and Si Wu. Decentralized multisensory informa-\n\ntion integration in neural systems. The Journal of Neuroscience, 36(2):532\u2013547, 2016.\n\n[29] Wen-Hao Zhang, He Wang, Aihua Chen, Yong Gu, Tai Sing Lee, KY Michael Wong, and Si Wu.\nComplementary congruent and opposite neurons achieve concurrent multisensory integration\nand segregation. eLife, 8:e43753, 2019.\n\n10\n\n\f", "award": [], "sourceid": 2072, "authors": [{"given_name": "Wenhao", "family_name": "Zhang", "institution": "Carnegie Mellon & U. of Pittsburgh"}, {"given_name": "Si", "family_name": "Wu", "institution": "Peking University"}, {"given_name": "Brent", "family_name": "Doiron", "institution": "University of Pittsburgh"}, {"given_name": "Tai Sing", "family_name": "Lee", "institution": "Carnegie Mellon University"}]}