{"title": "A Framework for Testing Identifiability of Bayesian Models of Perception", "book": "Advances in Neural Information Processing Systems", "page_first": 1026, "page_last": 1034, "abstract": "Bayesian observer models are very effective in describing human performance in perceptual tasks, so much so that they are trusted to faithfully recover hidden mental representations of priors, likelihoods, or loss functions from the data. However, the intrinsic degeneracy of the Bayesian framework, as multiple combinations of elements can yield empirically indistinguishable results, prompts the question of model identifiability. We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. We examine the theoretical identifiability of the inferred internal representations in two case studies. First, we show which experimental designs work better to remove the underlying degeneracy in a time interval estimation task. Second, we find that the reconstructed representations in a speed perception task under a slow-speed prior are fairly robust.", "full_text": "A Framework for Testing Identi\ufb01ability\n\nof Bayesian Models of Perception\n\nLuigi Acerbi1,2\n\nWei Ji Ma2\n\nSethu Vijayakumar1\n\n1 School of Informatics, University of Edinburgh, UK\n\n2 Center for Neural Science & Department of Psychology, New York University, USA\n{luigi.acerbi,weijima}@nyu.edu\nsethu.vijayakumar@ed.ac.uk\n\nAbstract\n\nBayesian observer models are very effective in describing human performance in\nperceptual tasks, so much so that they are trusted to faithfully recover hidden men-\ntal representations of priors, likelihoods, or loss functions from the data. However,\nthe intrinsic degeneracy of the Bayesian framework, as multiple combinations of\nelements can yield empirically indistinguishable results, prompts the question of\nmodel identi\ufb01ability. 
We propose a novel framework for a systematic testing of the identifiability of a significant class of Bayesian observer models, with practical applications for improving experimental design. We examine the theoretical identifiability of the inferred internal representations in two case studies. First, we show which experimental designs work better to remove the underlying degeneracy in a time interval estimation task. Second, we find that the reconstructed representations in a speed perception task under a slow-speed prior are fairly robust.

1 Motivation

Bayesian Decision Theory (BDT) has been traditionally used as a benchmark of ideal perceptual performance [1], and a large body of work has established that humans behave close to Bayesian observers in a variety of psychophysical tasks (see e.g. [2, 3, 4]). The efficacy of the Bayesian framework in explaining a huge set of diverse behavioral data suggests a stronger interpretation of BDT as a process model of perception, according to which the formal elements of the decision process (priors, likelihoods, loss functions) are independently represented in the brain and shared across tasks [5, 6]. Importantly, such mental representations, albeit not directly accessible to the experimenter, can be tentatively recovered from the behavioral data by 'inverting' a model of the decision process (e.g., priors [7, 8, 9, 10, 11, 12, 13, 14], likelihood [9], and loss functions [12, 15]). The ability to faithfully reconstruct the observer's internal representations is key to the understanding of several outstanding issues, such as the complexity of statistical learning [11, 12, 16], the nature of mental categories [10, 13], and linking behavioral to neural representations of uncertainty [4, 6].

In spite of these successes, the validity of the conclusions reached by fitting Bayesian observer models to the data can be questioned [17, 18].
A major issue is that the inverse mapping from observed behavior to elements of the decision process is not unique [19]. To see this degeneracy, consider a simple perceptual task in which the observer is exposed to a stimulus s that induces a noisy sensory measurement x. The Bayesian observer reports the optimal estimate s^* that minimizes his or her expected loss, where the loss function L(s, ŝ) encodes the loss (or cost) for choosing ŝ when the real stimulus is s. The optimal estimate for a given measurement x is computed as follows [20]:

s^*(x) = arg min_ŝ ∫ q_meas(x|s) q_prior(s) L(s, ŝ) ds,    (1)

where q_prior(s) is the observer's prior density over stimuli and q_meas(x|s) the observer's sensory likelihood (as a function of s). Crucially, for a given x, the solution of Eq. 1 is the same for any triplet of prior q_prior(s)·φ_1(s), likelihood q_meas(x|s)·φ_2(s), and loss function L(ŝ, s)·φ_3(s), where the φ_i(s) are three generic functions such that ∏_{i=1}^{3} φ_i(s) = c, for a constant c > 0. This analysis shows that the 'inverse problem' is ill-posed, as multiple combinations of priors, likelihoods and loss functions yield identical behavior [19], even before considering other confounding issues, such as latent states. If uncontrolled, this redundancy of solutions may condemn the Bayesian models of perception to a severe form of model non-identifiability that prevents the reliable recovery of model components, and in particular the sought-after internal representations, from the data.

In practice, the degeneracy of Eq. 1 can be prevented by enforcing constraints on the shape that the internal representations are allowed to take.
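The degeneracy of Eq. 1 is easy to verify numerically. Below is a minimal sketch (not part of the paper's implementation; the Gaussian prior and likelihood, the quadratic loss, the grid, and the φ_i are arbitrary illustrative choices): rescaling prior, likelihood and loss by functions whose product is a positive constant leaves the optimal estimate unchanged.

```python
import math

S = [0.02 * k for k in range(1, 501)]      # stimulus grid on (0, 10]
x = 4.0                                    # one fixed sensory measurement

def gauss(u, mu, sd):
    return math.exp(-0.5 * ((u - mu) / sd) ** 2)

prior = [gauss(s, 5.0, 2.0) for s in S]    # unnormalized Gaussian prior
lik = [gauss(x, s, 1.0) for s in S]        # likelihood as a function of s

def optimal_estimate(pr, lk, loss):
    """Grid version of Eq. 1: minimize the expected loss over candidate estimates."""
    def exp_loss(s_hat):
        return sum(p * l * loss(s, s_hat) for s, p, l in zip(S, pr, lk))
    return min(S, key=exp_loss)

quad = lambda s, s_hat: (s_hat - s) ** 2
s_star = optimal_estimate(prior, lik, quad)

# Rescale the triplet by phi_1, phi_2, phi_3 whose product is a constant c > 0:
phi1 = lambda s: 1.0 + 0.5 * math.sin(s)   # arbitrary positive function
phi2 = lambda s: s + 0.1                   # arbitrary positive function
phi3 = lambda s: 2.0 / (phi1(s) * phi2(s)) # makes the product equal c = 2

s_scaled = optimal_estimate([p * phi1(s) for s, p in zip(S, prior)],
                            [l * phi2(s) for s, l in zip(S, lik)],
                            lambda s, s_hat: quad(s, s_hat) * phi3(s))
# s_star and s_scaled coincide (up to grid resolution): the two triplets are
# empirically indistinguishable, as the expected loss is only rescaled by c.
```

The integrand of Eq. 1 is multiplied pointwise by the constant c, so the minimizer cannot change; only external constraints can break this symmetry.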
Such constraints include: (a) theoretical considerations (e.g., that the likelihood emerges from a specific noise model [21]); (b) assumptions related to the experimental layout (e.g., that the observer will adopt the loss function imposed by the reward system of the task [3]); (c) additional measurements obtained either in independent experiments or in distinct conditions of the same experiment (e.g., through Bayesian transfer [5]). Crucially, both (b) and (c) are under partial control of the experimenter, as they depend on the experimental design (e.g., choice of reward system, number of conditions, separate control experiments). Although several approaches have been used or proposed to suppress the degeneracy of Bayesian models of perception [12, 19], there has been no systematic analysis – neither empirical nor theoretical – of their effectiveness, nor a framework to perform such a study a priori, before running an experiment.

This paper aims to fill this gap for a large class of psychophysical tasks. Similar issues of model non-identifiability are not new to psychology [22], and generic techniques of analysis have been proposed (e.g., [23]). Here we present an efficient method that exploits the common structure shared by many Bayesian models of sensory estimation. First, we provide a general framework that allows a modeller to perform a systematic, a priori investigation of identifiability, that is, the ability to reliably recover the parameters of interest, for a chosen Bayesian observer model. Second, we show how, by comparing identifiability within distinct ideal experimental setups, our framework can be used to improve experimental design. In Section 2 we introduce a novel class of observer models that is both flexible and efficient, key requirements for the subsequent analysis.
In Section 3 we describe a method to efficiently explore identifiability of a given observer model within our framework. In Section 4 we show an application of our technique to two well-known scenarios in time perception [24] and speed perception [9]. We conclude with a few remarks in Section 5.

2 Bayesian observer model

Here we introduce a continuous class of Bayesian observer models parametrized by vector θ. Each value of θ corresponds to a specific observer that can be used to model the psychophysical task of interest. The current model (class) extends previous work [12, 14] by encompassing any sensorimotor estimation task in which a one-dimensional stimulus magnitude variable s, such as duration, distance, speed, etc., is directly estimated by the observer. This is a fundamental experimental condition representative of several studies in the field (e.g., [7, 9, 12, 24, 14]). With minor modifications, the model can also cover angular variables such as orientation (for small errors) [8, 11] and multidimensional variables when symmetries make the actual inference space one-dimensional [25].
The main novel feature of the presented model is that it covers a large representational basis with a single parametrization, while still allowing fast computation of the observer's behavior, both necessary requirements to permit an exploration of the complex model space, as described in Section 3.

The generic observer model is constructed in four steps (Figure 1 a & b): 1) the sensation stage describes how the physical stimulus s determines the internal measurement x; 2) the perception stage describes how the internal measurement x is combined with the prior to yield a posterior distribution; 3) the decision-making stage describes how the posterior distribution and loss function guide the choice of an 'optimal' estimate s^* (possibly corrupted by lapses); and finally 4) the response stage describes how the optimal estimate leads to the observed response r.

Figure 1: Observer model. Graphical model of a sensorimotor estimation task, as seen from the outside (a), and from the subjective point of view of the observer (b). a: Objective generative model of the task. Stimulus s induces a noisy sensory measurement x in the observer, who decides for estimate s^* (see b). The recorded response r is further perturbed by reporting noise. Shaded nodes denote experimentally accessible variables. b: Observer's internal model of the task. The observer performs inference in an internal measurement space in which the unknown stimulus is denoted by t (with t = f(s)). The observer either chooses the subjectively optimal value of t, given internal measurement x, by minimizing the expected loss, or simply lapses with probability λ. The observer's chosen estimate t^* is converted to task space through the inverse mapping s^* = f^{-1}(t^*). The whole process in this panel is encoded in (a) by the estimate distribution p_est(s^*|x).

2.1 Sensation stage

For computational convenience, we assume that the stimulus s ∈ R^+ (the task space) comes from a discrete experimental distribution of stimuli s_i with frequencies P_i, with P_i > 0, Σ_i P_i = 1 for 1 ≤ i ≤ N_exp. Discrete distributions of stimuli are common in psychophysics, and continuous distributions can be 'binned' and approximated up to the desired precision by increasing N_exp. Due to noise in the sensory systems, stimulus s induces an internal measurement x ∈ R according to measurement distribution p_meas(x|s) [20]. In general, the magnitude of sensory noise may be stimulus-dependent in task space, in which case the shape of the likelihood would change from point to point – which is unwieldy for subsequent computations. We want instead to find a transformed space in which the scale of the noise is stimulus-independent and the likelihood translationally invariant [9] (see Supplementary Material). We assume that such change of variables is performed by a function f(s) : s → t that monotonically maps stimulus s from task space into t = f(s), which lives with x in an internal measurement space. We assume for f(s) the following parametric form:

f(s) = A ln[1 + (s/s_0)^d] + B,   with inverse   f^{-1}(t) = s_0 (e^{(t-B)/A} - 1)^{1/d},    (2)

where A and B are chosen, without loss of generality, such that the discrete distribution of stimuli mapped in internal space, {f(s_i)} for 1 ≤ i ≤ N_exp, has range [-1, 1]. The parametric form of the sensory map in Eq.
2 can approximate both the Weber-Fechner law and Stevens' law, for different values of base noise magnitude s_0 and power exponent d (see Supplementary Material).

We determine the shape of p_meas(x|s) with a maximum-entropy approach by fixing the first four moments of the distribution, and under the rather general assumptions that the sensory measurement is unimodal and centered on the stimulus in internal measurement space. For computational convenience, we express p_meas(x|s) as a mixture of (two) Gaussians in internal measurement space:

p_meas(x|s) = π N(x | f(s) + μ_1, σ_1^2) + (1 - π) N(x | f(s) + μ_2, σ_2^2),    (3)

where N(x | μ, σ^2) is a normal distribution with mean μ and variance σ^2 (in this paper we consider a two-component mixture, but the derivations easily generalize to more components). The parameters in Eq. 3 are partially determined by specifying the first four central moments: E[x] = f(s), Var[x] = σ^2, Skew[x] = γ, Kurt[x] = κ; where σ, γ, κ are free parameters. The remaining degrees of freedom (one, for two Gaussians) are fixed by picking a distribution that satisfies unimodality and locally maximizes the differential entropy (see Supplementary Material). The sensation model represented by Eqs. 2 and 3 allows us to express a large class of sensory models in the psychophysics literature, including for instance stimulus-dependent noise [9, 12, 24] and 'robust' mixture models [21, 26].

2.2 Perceptual stage

Without loss of generality, we represent the observer's prior distribution q_prior(t) as a mixture of M dense, regularly spaced Gaussian distributions in internal measurement space:

q_prior(t) = Σ_{m=1}^{M} w_m N(t | μ_min + (m-1)a, a^2),   a ≡ (μ_max - μ_min)/(M - 1),    (4)

where w_m are the mixing weights, a the lattice spacing and [μ_min, μ_max] the range in internal space over which the prior is defined (chosen 50% wider than the true stimulus range). Eq. 4 allows the modeller to approximate any observer's prior, where M regulates the fine-grainedness of the representation and is determined by computational constraints (for all our analyses we fix M = 15). For simplicity, we assume that the observer's internal representation of the likelihood, q_meas(x|t), is expressed in the same measurement space and again takes the form of a unimodal mixture of two Gaussians, Eq. 3, although with possibly different variance, skewness and kurtosis (respectively, σ̃^2, γ̃ and κ̃) than the true likelihood.
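The building blocks introduced so far are straightforward to prototype. Below is a minimal sketch of the sensory map of Eq. 2 and the lattice prior of Eq. 4 (all parameter values are illustrative; in the paper A and B are additionally chosen so that the mapped stimuli span [-1, 1], and the weights w_m are inferred rather than uniform):

```python
import math

def f(s, s0=0.35, d=1.0, A=1.0, B=0.0):
    """Sensory map of Eq. 2: task space -> internal measurement space."""
    return A * math.log(1.0 + (s / s0) ** d) + B

def f_inv(t, s0=0.35, d=1.0, A=1.0, B=0.0):
    """Inverse map of Eq. 2: internal measurement space -> task space."""
    return s0 * (math.exp((t - B) / A) - 1.0) ** (1.0 / d)

# Round trip: the two maps of Eq. 2 are exact inverses
for s in [0.5, 1.0, 4.0]:
    assert abs(f_inv(f(s)) - s) < 1e-12

# Lattice prior of Eq. 4: M Gaussians with spacing a on [mu_min, mu_max]
M, mu_min, mu_max = 15, -1.5, 1.5
a = (mu_max - mu_min) / (M - 1)
w = [1.0 / M] * M                  # uniform mixing weights, purely for illustration

def q_prior(t):
    # component means mu_min + m*a for m = 0..M-1, i.e. mu_min + (m-1)a for m = 1..M
    return sum(w[m] * math.exp(-0.5 * ((t - (mu_min + m * a)) / a) ** 2)
               / (a * math.sqrt(2.0 * math.pi)) for m in range(M))
```

Since the weights sum to one and each component is a normalized Gaussian, q_prior integrates to (approximately) one over the internal space.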
We write the observer's posterior distribution as q_post(t|x) = (1/Z) q_prior(t) q_meas(x|t), with Z the normalization constant.

2.3 Decision-making stage

According to Bayesian Decision Theory (BDT), the observer's 'optimal' estimate corresponds to the value of the stimulus that minimizes the expected loss, with respect to loss function L(t, t̂), where t is the true value of the stimulus and t̂ its estimate. In general the loss could depend on t and t̂ in different ways, but for now we assume a functional dependence only on the stimulus difference in internal measurement space, t̂ - t. The (subjectively) optimal estimate is:

t^*(x) = arg min_t̂ ∫ q_post(t|x) L(t̂ - t) dt,    (5)

where the integral on the r.h.s. represents the expected loss. We make the further assumption that the loss function is well-behaved, that is, smooth, with a unique minimum at zero (i.e., the loss is minimal when the estimate matches the true stimulus), and with no other local minima. As before, we adopt a maximum-entropy approach and we restrict ourselves to the class of loss functions that can be described as mixtures of two (inverted) Gaussians:

L(t̂ - t) = -π_ℓ N(t̂ - t | μ^ℓ_1, (σ^ℓ_1)^2) - (1 - π_ℓ) N(t̂ - t | μ^ℓ_2, (σ^ℓ_2)^2).    (6)

Although the loss function is not a distribution, we find it convenient to parametrize it in terms of statistics of a corresponding unimodal distribution obtained by flipping Eq. 6 upside down: Mode[t'] = 0, Var[t'] = σ_ℓ^2, Skew[t'] = γ_ℓ, Kurt[t'] = κ_ℓ; with t' ≡ t̂ - t. Note that we fix the location of the mode of the mixture of Gaussians so that the global minimum of the loss is at zero.
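Because both the posterior (via Eqs. 3 and 4) and the loss (Eq. 6) are Gaussian mixtures, each posterior-loss component pair integrates as a Gaussian convolution, so the expected loss of Eq. 5 is itself an analytic (negative) mixture of Gaussians in t̂. A toy sketch with made-up component values (not the paper's fitted parameters):

```python
import math

def normpdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Toy posterior and loss, each a two-component Gaussian mixture (illustrative numbers)
post = [(0.7, -0.2, 0.05), (0.3, 0.4, 0.10)]   # (weight, mean, variance), internal space
loss = [(0.6, 0.0, 0.25), (0.4, 0.1, 1.00)]    # inverted-Gaussian components of Eq. 6

def expected_loss(t_hat):
    """Eq. 5 in closed form: integrating N(t|mp,vp) against N(t_hat - t|ml,vl) over t
    is a Gaussian convolution, contributing -wp*wl*N(t_hat | mp + ml, vp + vl)."""
    return -sum(wp * wl * normpdf(t_hat, mp + ml, vp + vl)
                for wp, mp, vp in post for wl, ml, vl in loss)

# The subjectively optimal estimate t* minimizes the analytic expected loss;
# a coarse grid search suffices for this illustration.
grid = [i / 1000.0 - 1.0 for i in range(2001)]
t_star = min(grid, key=expected_loss)
```

This closed form is what makes repeated evaluation of the decision rule cheap enough for the model-space exploration of Section 3.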
As before, the remaining free parameter is fixed by taking a local maximum-entropy solution. A single inverted Gaussian already allows us to express a large variety of losses, from a delta function (MAP strategy) for σ_ℓ → 0 to a quadratic loss for σ_ℓ → ∞ (in practice, for σ_ℓ ≳ 1), and it has been shown to capture human sensorimotor behavior quite well [15]. Eq. 6 further extends the range of describable losses to asymmetric and more or less peaked functions. Crucially, Eqs. 3, 4, 5 and 6 combined yield an analytical expression for the expected loss that is a mixture of Gaussians (see Supplementary Material), which allows for a fast numerical solution [14, 27].

We allow the possibility that the observer may occasionally deviate from BDT due to lapses with probability λ ≥ 0. In the case of a lapse, the observer's estimate t^* is drawn randomly from the prior [11, 14]. The combined stochastic estimator with lapse in task space has distribution:

p_est(s^*|x) = (1 - λ) · δ[s^* - f^{-1}(t^*(x))] + λ · q_prior(s^*) |f'(s^*)|,    (7)

where f'(s^*) is the derivative of the mapping in Eq. 2 (see Supplementary Material).

2.4 Response stage

We assume that the observer's response r is equal to the observer's estimate corrupted by independent normal noise in task space, due to motor error and other residual sources of variability:

p_report(r|s^*) = N(r | s^*, σ_report^2(s^*)),    (8)

where we choose a simple parametric form for the variance: σ_report^2(s) = ρ_0^2 + ρ_1^2 s^2, that is, the sum of two independent noise terms (constant noise plus some noise that grows with the magnitude of the stimulus).
In our current analysis we are interested in observer models of perception, so we do not explicitly model details of the motor aspect of the task and we do not include the consequences of response error in the decision-making part of the model (Eq. 5).

Finally, the main observable that the experimenter can measure is the response probability density, p_resp(r|s; θ), of a response r for a given stimulus s and observer's parameter vector θ [12]:

p_resp(r|s; θ) = ∫∫ N(r | s^*, σ_report^2(s^*)) p_est(s^*|x) p_meas(x|s) ds^* dx,    (9)

obtained by marginalizing over unobserved variables (see Figure 1 a), and which we can compute through Eqs. 3-8. An observer model is fully characterized by parameter vector θ:

θ = (σ, γ, κ, s_0, d, σ̃, γ̃, κ̃, σ_ℓ, γ_ℓ, κ_ℓ, {w_m}_{m=1}^{M}, ρ_0, ρ_1, λ).    (10)

An experimental design is specified by a reference observer model θ^*, an experimental distribution of stimuli (a discrete set of N_exp stimuli s_i, each with relative frequency P_i), and possibly a subset of parameters that are assumed to be equal to some a priori or experimentally measured values during the inference. For experiments with multiple conditions, an observer model typically shares several parameters across conditions. The reference observer θ^* represents a 'typical' observer for the idealized task under examination; its parameters are determined from pilot experiments, the literature, or educated guesses.
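The marginalization of Eq. 9 can be approximated by Monte Carlo: sample the latent variables x and s^*, then average the reporting-noise density. The sketch below uses deliberately simplified stand-ins not taken from the paper (identity sensory map, Gaussian prior and measurement noise, quadratic loss, so that the BDT estimate is the posterior mean in closed form):

```python
import math
import random

random.seed(0)

# Illustrative parameters (not the paper's fitted values)
sigma, rho0, rho1, lam = 0.2, 0.05, 0.02, 0.03
prior_mu, prior_var = 1.0, 0.25

def p_resp(r, s, n_samples=20000):
    """Monte Carlo approximation of Eq. 9: sample x ~ p_meas(x|s) and s* ~ p_est(s*|x),
    then average the reporting-noise density N(r | s*, sigma_report^2(s*)) of Eq. 8."""
    total = 0.0
    for _ in range(n_samples):
        x = random.gauss(s, sigma)                       # sensation stage
        if random.random() < lam:                        # lapse: estimate drawn from prior
            s_star = random.gauss(prior_mu, math.sqrt(prior_var))
        else:                                            # BDT: quadratic loss -> posterior mean
            post_var = 1.0 / (1.0 / prior_var + 1.0 / sigma ** 2)
            s_star = post_var * (prior_mu / prior_var + x / sigma ** 2)
        var_rep = rho0 ** 2 + (rho1 * s_star) ** 2       # reporting-noise variance of Eq. 8
        total += math.exp(-0.5 * (r - s_star) ** 2 / var_rep) / math.sqrt(2.0 * math.pi * var_rep)
    return total / n_samples

density = p_resp(r=0.85, s=0.8)
```

Averaging the reporting-noise density over samples of s^* (rather than also sampling r) keeps the estimator of the density smooth; the paper itself relies on the analytic mixture-of-Gaussians expressions instead.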
We are now ready to tackle the problem of identifiability of the parameters of θ^* within our framework for a given experimental design.

3 Mapping a priori identifiability

Two observer models θ and θ^* are a priori practically non-identifiable if they produce similar response probability densities p_resp(r|s_i; θ) and p_resp(r|s_i; θ^*) for all stimuli s_i in the experiment. Specifically, we assume that data are generated by the reference observer θ^* and we ask what is the chance that a randomly generated dataset D of a fixed size N_tr will instead provide support for observer θ. For one specific dataset D, a natural way to quantify support would be the posterior probability of a model given the data, Pr(θ|D). However, randomly generating a large number of datasets so as to approximate the expected value of Pr(θ|D) over all datasets, in the spirit of previous work on model identifiability [23], becomes intractable for complex models such as ours. Instead, we define the support for observer model θ, given dataset D, as its log likelihood, log Pr(D|θ). The log (marginal) likelihood is a widespread measure of evidence in model comparison, from sampling algorithms to metrics such as AIC, BIC and DIC [28].
Since we know the generative model of the data, Pr(D|θ^*), we can compute the expected support for model θ as:

⟨log Pr(D|θ)⟩ = ∫_{|D|=N_tr} log Pr(D|θ) Pr(D|θ^*) dD.    (11)

The formal integration over all possible datasets with fixed number of trials N_tr yields:

⟨log Pr(D|θ)⟩ = -N_tr Σ_{i=1}^{N_exp} P_i · D_KL(p_resp(r|s_i; θ^*) || p_resp(r|s_i; θ)) + const,    (12)

where D_KL(·||·) is the Kullback-Leibler (KL) divergence between two distributions, and the constant is an entropy term that does not affect our subsequent analysis, as it does not depend on θ (see Supplementary Material for the derivation). Crucially, D_KL is non-negative, and zero only when the two distributions are identical. The asymmetry of the KL divergence captures the different status of θ^* and θ (that is, we measure differences only on datasets generated by θ^*). Eq. 12 quantifies the average support for model θ given true model θ^*, which we use as a proxy to assess model identifiability. As an empirical tool to explore the identifiability landscape, we define the approximate expected posterior density as:

E(θ|θ^*) ∝ e^{⟨log Pr(D|θ)⟩},    (13)

and we sample from Eq. 13 via MCMC. Clearly, E(θ|θ^*) is maximal for θ = θ^* and generally high for regions of the parameter space empirically close to the predictions of θ^*.
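For a toy observer whose response densities are Gaussian, the KL divergences in Eq. 12 are available in closed form, and Eq. 13 can be sampled with a basic Metropolis algorithm. The two-parameter model below (a response bias b and a noise SD) is an illustrative stand-in, not the paper's observer model:

```python
import math
import random

random.seed(1)

def kl_gauss(mu0, var0, mu1, var1):
    """Closed-form KL divergence D_KL(N(mu0, var0) || N(mu1, var1))."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

# Toy observer: responses r ~ N(s + b, sd^2); theta = (b, sd).
# Reference observer theta* = (0, 0.1); stimuli, weights and N_tr are illustrative.
stimuli = [(0.5, 0.25), (1.0, 0.5), (1.5, 0.25)]   # (s_i, P_i), with sum of P_i = 1
N_tr, b_star, sd_star = 500, 0.0, 0.1

def expected_support(b, sd):
    """Eq. 12 up to its constant: -N_tr * sum_i P_i * D_KL(p_resp^* || p_resp_theta)."""
    return -N_tr * sum(P * kl_gauss(s + b_star, sd_star ** 2, s + b, sd ** 2)
                       for s, P in stimuli)

# Metropolis sampling from the approximate expected posterior E(theta|theta*) of Eq. 13
samples, theta = [], (0.05, 0.15)
for _ in range(5000):
    prop = (theta[0] + random.gauss(0.0, 0.01), abs(theta[1] + random.gauss(0.0, 0.01)))
    delta = expected_support(*prop) - expected_support(*theta)
    if delta >= 0 or random.random() < math.exp(delta):
        theta = prop
    samples.append(theta)

b_mean = sum(b for b, _ in samples[1000:]) / len(samples[1000:])  # concentrates near b* = 0
```

In this toy landscape the chain concentrates around θ^*; increasing N_tr sharpens E(θ|θ^*), mirroring the trial-number effect discussed below.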
Moreover, the peakedness of E(θ|θ^*) is modulated by the number of trials N_tr (the more trials, the more information to discriminate between models).

4 Results

We apply our framework to two case studies: the inference of priors in a time interval estimation task (see [24]) and the reconstruction of prior and noise characteristics in speed perception [9].

Figure 2: Internal representations in interval timing (Short condition). Accuracy of the reconstructed priors in the Short range; each row corresponds to a different experimental design. a: The first column shows the reference prior (thick red line) and the recovered mean prior ± 1 SD (black line and shaded area). The other columns display the distributions of the recovered central moments of the prior. Each panel shows the median (black line), the interquartile range (dark-shaded area) and the 95% interval (light-shaded area). The green dashed line marks the true value. b: Box plots of the symmetric KL-divergence between the reconstructed priors and the prior of the reference observer. At top, the primacy probability P^* of each setup having less reconstruction error than all the others (computed by bootstrap). c: Joint posterior density of sensory noise σ and motor noise ρ_1 in setup BSL (gray contour plot; colored plots are marginal distributions). The parameters are anti-correlated, and discordant with the true value (star and dashed lines). d: Marginal posterior density for loss width parameter σ_ℓ, suitably rescaled.

4.1 Temporal context and interval timing

We consider a time interval estimation and reproduction task very similar to [24].
In each trial, the stimulus s is a time interval (e.g., the interval between two flashes), drawn from a fixed experimental distribution, and the response r is the reproduced duration (e.g., the interval between the second flash and a mouse click). Subjects perform in one or two conditions, corresponding to two different discrete uniform distributions of durations, either on a Short (494-847 ms) or a Long (847-1200 ms) range. Subjects are trained separately on each condition until they (roughly) learn the underlying distribution, at which point their performance is measured in a test session; here we only simulate the test sessions. We assume that the experimenter's goal is to faithfully recover the observer's priors, and we analyze the effect of different experimental designs on the reconstruction error.

To cast the problem within our framework, we first need to define the reference observer θ^*. We make the following assumptions: (a) the observer's priors (or prior, in only one condition) are smoothed versions of the experimental uniform distributions; (b) the sensory noise is affected by the scalar property of interval timing, so that the sensory mapping is logarithmic (s_0 ≈ 0, d = 1); (c) we take average sensorimotor noise parameters from [24]: σ = 0.10, γ = 0, κ = 0, and ρ_0 ≈ 0, ρ_1 = 0.07; (d) for simplicity, the internal likelihood coincides with the measurement distribution; (e) the loss function in internal measurement space is almost-quadratic, with σ_ℓ = 0.5, γ_ℓ = 0, κ_ℓ = 0; (f) we assume a small lapse probability λ = 0.03; (g) in case the observer performs in two conditions, all observer parameters are shared across conditions (except for the priors). For the inferred observer θ we allow all model parameters to change freely, keeping only assumptions (d) and (g).
We compare the following variations of the experimental setup:

1. BSL: The baseline version of the experiment; the observer performs in both the Short and Long conditions (N_tr = 500 each);

2. SRT or LNG: The observer performs more trials (N_tr = 1000), but only in either the Short (SRT) or the Long (LNG) condition;

3. MAP: As BSL, but we assume a difference in the performance feedback of the task such that the reference observer adopts a narrower loss function, closer to MAP (σ_ℓ = 0.1);

4. MTR: As BSL, but the observer's motor noise parameters ρ_0, ρ_1 are assumed to be known (e.g., measured in a separate experiment), and therefore fixed during the inference.

We sample from the approximate posterior density (Eq. 13), obtaining a set of sampled priors for each distinct experimental setup (see Supplementary Material for details). Figure 2 a shows the reconstructed priors and their central moments for the Short condition (results are analogous for the Long condition; see Supplementary Material). We summarize the reconstruction error of the recovered priors in terms of the symmetric KL-divergence from the reference prior (Figure 2 b). Our analysis suggests that the baseline setup BSL does a relatively poor job at inferring the observers' priors. The mean and skewness of the inferred prior are generally acceptable, but, for example, the SD tends to be considerably lower than the true value.
Examining the posterior density across various dimensions, we find that this mismatch emerges from a partial non-identifiability of the sensory noise, σ, and the motor noise, ρ_1 (Figure 2 c).^1 Limiting the task to a single condition with double the number of trials (SRT) only slightly improves the quality of the inference. Surprisingly, we find that a design that encourages the observer to adopt a loss function closer to MAP considerably worsens the quality of the reconstruction in our model. In fact, the loss width parameter σ_ℓ is only weakly identifiable (Figure 2 d), with severe consequences for the recovery of the priors in the MAP case. Finally, we find that if we can independently measure the motor parameters of the observer (MTR), the degeneracy is mostly removed and the priors can be recovered quite reliably.

Our analysis suggests that the reconstruction of internal representations in interval timing requires strong experimental constraints and validations [12]. This worked example also shows how our framework can be used to rank experimental designs by the quality of the inferred features of interest (here, the recovered priors), and to identify parameters that may critically affect the inference. Some findings align with our intuitions (e.g., measuring the motor parameters) but others may be non-obvious, such as the bad impact that a narrow loss function may have on the inferred priors within our model. Incidentally, the low identifiability of σ_ℓ that we found in this task suggests that claims about the loss function adopted by observers in interval timing (see [24]), without independent validation, might deserve additional investigation. Finally, note that the analysis we performed is theoretical, as the effects of each experimental design are formulated in terms of changes in the parameters of the ideal reference observer.
Nevertheless, the framework allows us to test the robustness of our conclusions as we modify our assumptions about the reference observer.

4.2 Slow-speed prior in speed perception

As a further demonstration, we use our framework to re-examine a well-known finding in visual speed perception, namely that observers have a heavy-tailed prior expectation for slow speeds [9, 29]. The original study uses a 2AFC paradigm [9], which we convert for our analysis into an equivalent estimation task (see e.g. [30]). In each trial, the stimulus magnitude s is the speed of motion (e.g., the speed of a moving dot in deg/s), and the response r is the perceived speed (e.g., measured by interception timing). Subjects perform in two conditions, with different contrast levels of the stimulus, either High (c_High = 0.5) or Low (c_Low = 0.075), corresponding to different levels of estimation noise. Note that in a real speed estimation experiment subjects quickly develop a prior that depends on the experimental distribution of speeds [30] – but here we assume no learning of that kind, in agreement with the underlying 2AFC task. Instead, we assume that observers use their 'natural' prior over speeds.
Our goal is to probe the reliability of the inference of the slow-speed prior and of the noise characteristics of the reference observer (see [9]).
We define the reference observer θ* as follows: (a) the observer's prior is defined in task space by a parametric formula: p_prior(s) = (s^2 + s_prior^2)^(−k_prior), with s_prior = 1 deg/s and k_prior = 2.4 [29]; (b) the sensory mapping has parameters s0 = 0.35 deg/s, d = 1 [29]; (c) the amount of sensory noise depends on the contrast level, as per [9]: σ_High = 0.2, σ_Low = 0.4, and γ = 0, κ = 0; (d) the internal likelihood coincides with the measurement distribution; (e) the loss function in internal measurement space is almost-quadratic, with σ_ℓ = 0.5, γ_ℓ = 0, κ_ℓ = 0; (f) we assume a considerable amount of reporting noise, with ρ_0 = 0.3 deg/s, ρ_1 = 0.21; (g) we assume a contrast-dependent lapse probability (λ_High = 0.01, λ_Low = 0.05); (h) all parameters that are not contrast-dependent are shared across the two conditions. For the inferred observer θ we allow all model parameters to change freely, keeping only assumptions (d) and (h). We consider the standard experimental setup described above (STD), and an 'uncoupled' variant (UNC) in which we drop the usual assumption that the internal representation of the likelihoods is coupled to the experimental one (so σ̃_High, σ̃_Low, γ̃ and κ̃ are free parameters). As a sanity check, we also consider an observer with a uniformly flat speed prior (FLA), to show that in this case the algorithm can correctly infer back the absence of a prior for slow speeds (see Supplementary Material).

¹This degeneracy is not surprising, as both sensory and motor noise of the reference observer θ* are approximately Gaussian in internal measurement space (∼ log task space). This lack of identifiability also affects the prior, since the relative weight between prior and likelihood needs to remain roughly the same.

Figure 3: Internal representations in speed perception. Accuracy of the reconstructed internal representations (priors and likelihoods). Each row corresponds to different assumptions during the inference. a: The first column shows the reference log prior (thick red line) and the recovered mean log prior ± 1 SD (black line and shaded area). The other two columns display the approximate posteriors of k_prior and s_prior, obtained by fitting the reconstructed 'non-parametric' priors with a parametric formula (see text). Each panel shows the median (black line), the interquartile range (dark-shaded area) and the 95% interval (light-shaded area). The green dashed line marks the true value. b: Box plots of the symmetric KL-divergence between the reconstructed and reference prior. c: Approximate posterior distributions for sensory mapping and sensory noise parameters. In experimental design STD, the internal likelihood parameters (σ̃_High, σ̃_Low) are equal to their objective counterparts (σ_High, σ_Low).

Unlike the previous example, our analysis shows that here the reconstruction of both the prior and the characteristics of sensory noise is relatively reliable (Figure 3 and Supplementary Material), without major biases, even when we decouple the internal representation of the noise from its objective counterpart (except for an underestimation of the noise lower bound s0 and of the internal noise σ̃_High, Figure 3 c). In particular, in all cases the exponent k_prior of the prior over speeds can be recovered with good accuracy.
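For concreteness, the parametric slow-speed prior p_prior(s) = (s^2 + s_prior^2)^(−k_prior) and the symmetric KL-divergence used to score reconstructions (as in Figure 3 b) can be sketched numerically as follows. This is a simplified illustration, not the inference code of the paper; the speed grid and the perturbed value of k_prior are our own choices for the example.

```python
import numpy as np

def slow_speed_prior(s, s_prior=1.0, k_prior=2.4):
    """Heavy-tailed prior over speed, p(s) proportional to
    (s^2 + s_prior^2)^(-k_prior), with s in deg/s."""
    return (s**2 + s_prior**2) ** (-k_prior)

def symmetric_kl(p, q, ds):
    """Symmetrized KL divergence between two densities on a common grid."""
    p = p / np.sum(p * ds)   # normalize on the grid
    q = q / np.sum(q * ds)
    kl_pq = np.sum(p * np.log(p / q) * ds)
    kl_qp = np.sum(q * np.log(q / p) * ds)
    return kl_pq + kl_qp

# Grid over speeds (deg/s); bounds chosen only for illustration.
s = np.linspace(0.01, 10.0, 2000)
ds = s[1] - s[0]

reference = slow_speed_prior(s)                   # k_prior = 2.4, s_prior = 1
reconstructed = slow_speed_prior(s, k_prior=2.6)  # slightly-off reconstruction

print(symmetric_kl(reference, reconstructed, ds))  # small positive value
```

A perfectly recovered prior gives a symmetric KL of zero, so the statistic directly measures reconstruction error on the grid.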
Our results provide theoretical validation, in addition to existing empirical support, for previous work that inferred internal representations in speed perception [9, 29].

5 Conclusions

We have proposed a framework for studying a priori identifiability of Bayesian models of perception. We have built a fairly general class of observer models and presented an efficient technique to explore their vast identifiability landscape. In one case study, a time interval estimation task, we have demonstrated how our framework could be used to rank candidate experimental designs depending on their ability to resolve the underlying degeneracy of parameters of interest. The obtained ranking is non-trivial: for example, it suggests that experimentally imposing a narrow loss function may be detrimental, under certain assumptions. In a second case study, we have shown instead that the inference of internal representations in speed perception, at least when cast as an estimation task in the presence of a slow-speed prior, is generally robust and in theory not prone to major degeneracies.
Several modifications can be implemented to increase the scope of the psychophysical tasks covered by the framework. For example, the observer model could include a generalization to arbitrary loss spaces (see Supplementary Material), the generative model could be extended to allow multiple cues (to analyze cue-integration studies), and a variant of the model could be developed for discrete-choice paradigms, such as 2AFC, whose identifiability properties are largely unknown.

References
[1] Geisler, W. S. (2011) Contributions of ideal observer theory to vision research. Vision Res 51, 771–781.
[2] Knill, D. C.
& Richards, W. (1996) Perception as Bayesian inference. (Cambridge University Press).
[3] Trommershäuser, J., Maloney, L., & Landy, M. (2008) Decision making, movement planning and statistical decision theory. Trends Cogn Sci 12, 291–297.
[4] Pouget, A., Beck, J. M., Ma, W. J., & Latham, P. E. (2013) Probabilistic brains: knowns and unknowns. Nat Neurosci 16, 1170–1178.
[5] Maloney, L., Mamassian, P., et al. (2009) Bayesian decision theory as a model of human visual perception: testing Bayesian transfer. Vis Neurosci 26, 147–155.
[6] Vilares, I., Howard, J. D., Fernandes, H. L., Gottfried, J. A., & Körding, K. P. (2012) Differential representations of prior and likelihood uncertainty in the human brain. Curr Biol 22, 1641–1648.
[7] Körding, K. P. & Wolpert, D. M. (2004) Bayesian integration in sensorimotor learning. Nature 427, 244–247.
[8] Girshick, A., Landy, M., & Simoncelli, E. (2011) Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci 14, 926–932.
[9] Stocker, A. A. & Simoncelli, E. P. (2006) Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci 9, 578–585.
[10] Sanborn, A. & Griffiths, T. L. (2008) Markov chain Monte Carlo with people. Adv Neural Inf Process Syst 20, 1265–1272.
[11] Chalk, M., Seitz, A., & Seriès, P. (2010) Rapidly learned stimulus expectations alter perception of motion. J Vis 10, 1–18.
[12] Acerbi, L., Wolpert, D. M., & Vijayakumar, S. (2012) Internal representations of temporal statistics and feedback calibrate motor-sensory interval timing. PLoS Comput Biol 8, e1002771.
[13] Houlsby, N. M., Huszár, F., Ghassemi, M. M., Orbán, G., Wolpert, D. M., & Lengyel, M. (2013) Cognitive tomography reveals complex, task-independent mental representations.
Curr Biol 23, 2169–2175.
[14] Acerbi, L., Vijayakumar, S., & Wolpert, D. M. (2014) On the origins of suboptimality in human probabilistic inference. PLoS Comput Biol 10, e1003661.
[15] Körding, K. P. & Wolpert, D. M. (2004) The loss function of sensorimotor learning. Proc Natl Acad Sci U S A 101, 9839–9842.
[16] Gekas, N., Chalk, M., Seitz, A. R., & Seriès, P. (2013) Complexity and specificity of experimentally-induced expectations in motion perception. J Vis 13, 1–18.
[17] Jones, M. & Love, B. (2011) Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav Brain Sci 34, 169–188.
[18] Bowers, J. S. & Davis, C. J. (2012) Bayesian just-so stories in psychology and neuroscience. Psychol Bull 138, 389.
[19] Mamassian, P. & Landy, M. S. (2010) It's that time again. Nat Neurosci 13, 914–916.
[20] Simoncelli, E. P. (2009) in The Cognitive Neurosciences, ed. Gazzaniga, M. (MIT Press), pp. 525–535.
[21] Knill, D. C. (2003) Mixture models and the probabilistic structure of depth cues. Vision Res 43, 831–854.
[22] Anderson, J. R. (1978) Arguments concerning representations for mental imagery. Psychol Rev 85, 249.
[23] Navarro, D. J., Pitt, M. A., & Myung, I. J. (2004) Assessing the distinguishability of models and the informativeness of data. Cognitive Psychol 49, 47–84.
[24] Jazayeri, M. & Shadlen, M. N. (2010) Temporal context calibrates interval timing. Nat Neurosci 13, 1020–1026.
[25] Tassinari, H., Hudson, T., & Landy, M. (2006) Combining priors and noisy visual cues in a rapid pointing task. J Neurosci 26, 10154–10163.
[26] Natarajan, R., Murray, I., Shams, L., & Zemel, R. S. (2009) Characterizing response behavior in multisensory perception with conflicting cues. Adv Neural Inf Process Syst 21, 1153–1160.
[27] Carreira-Perpiñán, M. A.
(2000) Mode-finding for mixtures of Gaussian distributions. IEEE T Pattern Anal 22, 1318–1323.
[28] Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002) Bayesian measures of model complexity and fit. J R Stat Soc B 64, 583–639.
[29] Hedges, J. H., Stocker, A. A., & Simoncelli, E. P. (2011) Optimal inference explains the perceptual coherence of visual motion stimuli. J Vis 11, 14, 1–16.
[30] Kwon, O. S. & Knill, D. C. (2013) The brain uses adaptive internal models of scene statistics for sensorimotor estimation and planning. Proc Natl Acad Sci U S A 110, E1064–E1073.