{"title": "Efficient inference for time-varying behavior during learning", "book": "Advances in Neural Information Processing Systems", "page_first": 5695, "page_last": 5705, "abstract": "The process of learning new behaviors over time is a problem of great interest in both neuroscience and artificial intelligence. However, most standard analyses of animal training data either treat behavior as fixed or track only coarse performance statistics (e.g., accuracy, bias), providing limited insight into the evolution of the policies governing behavior. To overcome these limitations, we propose a dynamic psychophysical model that efficiently tracks trial-to-trial changes in behavior over the course of training. Our model consists of a dynamic logistic regression model, parametrized by a set of time-varying weights that express dependence on sensory stimuli as well as task-irrelevant covariates, such as stimulus, choice, and answer history. Our implementation scales to large behavioral datasets, allowing us to infer 500K parameters (e.g. 10 weights over 50K trials) in minutes on a desktop computer. We optimize hyperparameters governing how rapidly each weight evolves over time using the decoupled Laplace approximation, an efficient method for maximizing marginal likelihood in non-conjugate models. To illustrate performance, we apply our method to psychophysical data from both rats and human subjects learning a delayed sensory discrimination task. The model successfully tracks the psychophysical weights of rats over the course of training, capturing day-to-day and trial-to-trial fluctuations that underlie changes in performance, choice bias, and dependencies on task history. Finally, we investigate why rats frequently make mistakes on easy trials, and suggest that apparent lapses can be explained by sub-optimal weighting of known task covariates.", "full_text": "Ef\ufb01cient inference for time-varying behavior\n\nduring learning\n\nNicholas A. 
Roy1 Ji Hyun Bak2 Athena Akrami1,3,\u2217\n\nCarlos D. Brody1,3,4 Jonathan W. Pillow1,5\n\n1Princeton Neuroscience Institute, Princeton University\n\n2Korea Institute for Advanced Study 3Howard Hughes Medical Institute\n4Dept. of Molecular Biology, 5Dept. of Psychology, Princeton University\n\n\u2217current address at Sainsbury Wellcome Centre, UCL\n\n{nroy,brody,pillow}@princeton.edu,\n\njhbak@kias.re.kr, athena.akrami@ucl.ac.uk\n\nAbstract\n\nThe process of learning new behaviors over time is a problem of great interest in\nboth neuroscience and arti\ufb01cial intelligence. However, most standard analyses of\nanimal training data either treat behavior as \ufb01xed or track only coarse performance\nstatistics (e.g., accuracy, bias), providing limited insight into the evolution of the\npolicies governing behavior. To overcome these limitations, we propose a dynamic\npsychophysical model that ef\ufb01ciently tracks trial-to-trial changes in behavior over\nthe course of training. Our model consists of a dynamic logistic regression model,\nparametrized by a set of time-varying weights that express dependence on sensory\nstimuli as well as task-irrelevant covariates, such as stimulus, choice, and answer\nhistory. Our implementation scales to large behavioral datasets, allowing us to infer\n500K parameters (e.g., 10 weights over 50K trials) in minutes on a desktop com-\nputer. We optimize hyperparameters governing how rapidly each weight evolves\nover time using the decoupled Laplace approximation, an ef\ufb01cient method for max-\nimizing marginal likelihood in non-conjugate models. To illustrate performance,\nwe apply our method to psychophysical data from both rats and human subjects\nlearning a delayed sensory discrimination task. The model successfully tracks the\npsychophysical weights of rats over the course of training, capturing day-to-day\nand trial-to-trial \ufb02uctuations that underlie changes in performance, choice bias,\nand dependencies on task history. 
Finally, we investigate why rats frequently make mistakes on easy trials, and suggest that apparent lapses can be explained by sub-optimal weighting of known task covariates.

1 Introduction

A vast swath of modern neuroscience research requires training animals to perform specific tasks. This training is expensive and time-consuming, yet the data collected during the training period are often discarded from analysis. Moreover, animals can learn at vastly different rates, and may learn different strategies to achieve a criterion level of performance in a given task. Most neuroscience studies ignore such variability, and commonly track only coarse statistics like accuracy and bias during training. These statistics are not sufficient to reveal subtle differences in strategy, such as unequal weighting of task variables or reliance on particular aspects of trial history. However, behavior collected during training may provide valuable insights into an animal's mental arsenal of problem-solving strategies, and uncover how those strategies evolve with experience.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Model schematic. (a) On each trial, a variety of both task-related and task-irrelevant variables may affect an animal's choice behavior. We call the carrier vector of all K input variables on a particular trial gt. (b) As the animal trains on the task, psychophysical weights wt evolve with independent Gaussian noise, altering how strongly each variable affects behavior. (c) The probability of "right" given the input is described by a logistic function of gt · wt.
Understanding detailed differences in behavior may shed light on differences in neural activity across animals or task conditions, reveal general aspects of behavior, or inspire the development of new learning algorithms [1].

One reason training data are frequently ignored is a lack of good methods for tracking behavior during training, or for tracking continued learning after the dedicated training phase has ended. Of the few approaches to characterizing time-varying psychophysical behavior, perhaps the simplest is to apply standard logistic regression to separate blocks of trials. While useful in certain specific situations, there are numerous drawbacks to such a blocking approach, including: the need to choose a block size, the removal of dependencies between adjacent blocks, and the inability to track finer time-scale changes within a block. Such an approach also assumes that there is a single timescale at which all psychophysical weights vary. Smith and Brown introduced an assumed density filtering method for tracking psychophysical performance on a trial-by-trial basis [2]. This approach, which explicitly tracks a parameter to determine the earliest time at which statistically significant learning occurred during training, has been extended in various contexts [3, 4]. Here we propose an alternate approach based on exact MAP estimation of time-varying psychophysical weights, with efficient and scalable methods for inferring hyperparameters governing the timescale of changes for different weights.

In this paper, we present a dynamic logistic regression model for time-varying psychophysical behavior. Our model quantifies animal behavior at single-trial resolution, allowing for intuitive visualization of learning dynamics and direct analysis of psychophysical weight trajectories.
We develop efficient inference methods that exploit sparse structure in order to scale to large datasets with high-dimensional, time-varying psychophysical weights. Moreover, we use the decoupled Laplace approximation method [5] to perform highly efficient approximate maximum marginal likelihood inference for a set of hyperparameters governing the rates of change for different psychophysical weights. We apply our method to a large behavioral dataset of rats demonstrating a variety of constantly evolving complex behaviors over tens of thousands of trials, as well as human subjects with significantly more stable behavior. We compare the predictions of our model to conventional measures of behavior, and conclude with an analysis of lapses on perceptually easy trials to demonstrate the model's explanatory power. We expect that our method will provide immediate practical benefit to trainers, in addition to giving unprecedented insight into the development of new behaviors. An implementation of all methods is available as the Python package PsyTrack [6].

2 Dynamic logistic regression model

Here we describe our dynamic model for time-varying psychophysical behavior. We consider a general two-alternative forced choice (2AFC) sensory discrimination task in which the animal is presented with a stimulus xt ∈ Rd, and makes a choice yt ∈ {0, 1} between two options that we will refer to as "left" and "right" (although the method can be extended to multi-choice tasks [7]).

We model the animal's behavior as depending on an internal model parametrized by a set of weights wt ∈ RK that govern how the animal's choice depends on an input "carrier" vector gt ∈ RK for the current trial t (Fig. 1a,b).
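To make the carrier vector concrete, the sketch below assembles gt from a bias term, two current stimuli, and one-trial-back history covariates, loosely following the features used later for the tone-discrimination task. The feature set, names, and the ±1 encoding of history regressors are illustrative assumptions, not the exact PsyTrack design.

```python
import numpy as np

def make_carriers(stim_A, stim_B, choices, answers):
    """Build illustrative K=6 carrier vectors g_t: a bias term, the two
    current stimuli, and three one-trial-back history covariates.
    `choices` and `answers` are 0/1 arrays (left/right)."""
    N = len(stim_A)
    g = np.zeros((N, 6))
    g[:, 0] = 1.0                                  # constant "1" capturing choice bias
    g[:, 1] = stim_A                               # Tone A on the current trial
    g[:, 2] = stim_B                               # Tone B on the current trial
    # History covariates are undefined on trial 0; leave them at zero there.
    g[1:, 3] = (stim_A[:-1] + stim_B[:-1]) / 2     # avg. of previous trial's tones
    g[1:, 4] = 2 * choices[:-1] - 1                # previous choice, encoded in {-1, +1}
    g[1:, 5] = 2 * answers[:-1] - 1                # previous correct side, in {-1, +1}
    return g
```

With carriers built this way, the model's per-trial input is just the dot product gt · wt.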
This carrier gt contains the task stimuli xt for the current trial, as well as a variety of additional covariates (e.g., stimulus, choice, or answer history over the preceding one or more trials), and a constant "1" to capture bias towards one choice or the other (see Sec. S1 for more detail). Empirically, animal behavior in early training often exhibits dependencies on both stimulus and choice history [8-10]; including these features is therefore critical to building an accurate model of the animal's evolving psychophysical strategy (we return to this with a study of lapses in Sec. 6).

Given the weight and carrier vectors, the animal's choice behavior on a given trial is described by a Bernoulli generalized linear model (GLM), also known as the logistic regression model (Fig. 1c):

    p(yt | gt, wt) = exp(yt (gt · wt)) / (1 + exp(gt · wt)).    (1)

Unlike standard psychophysical models, which assume weights are constant across trials and that behavior is therefore constant, we instead assume that the weights evolve gradually through time. We model this evolution with independent Gaussian innovations noise added to the weights after each trial [11, 12]:

    wt = wt−1 + ηt,    ηt ∼ N(0, diag(σ1², …, σK²)),    (2)

where wt denotes the weight vector on trial t, ηt is the noise added to the weights after the previous trial, and σk² denotes the variance of the noise for weight k, also known as the volatility hyperparameter. Here diag(σ1², …, σK²) denotes a diagonal matrix with the volatility hyperparameters for each weight along the main diagonal. We note that this choice of prior on w is largely agnostic, though more structured priors could be considered.

3 Inference

Inference involves fitting the entire trajectory of the weights from the noisy response data collected over the course of the experiment. This amounts to a very high-dimensional optimization problem when we consider models with several weights and datasets with tens of thousands of trials. Moreover, we wish to learn the volatility hyperparameters σk in order to determine how quickly each weight evolves across trials.

3.1 Efficient global optimization for wMAP

Let w denote the massive weight vector formed by concatenating all of the length-N trajectory vectors for each weight k = 1, …, K, where N is the total number of trials. We can then express the prior over the weights by noting that η = Dw, where D is a block-diagonal matrix of K identical N × N difference matrices (i.e., 1 on the diagonal and −1 on the lower off-diagonal). Because the prior on η is simply N(0, Σ), where Σ has each of the σk² stacked N times along the diagonal, the prior for w is N(0, C) with C⁻¹ = DᵀΣ⁻¹D. The log-posterior is then given by

    log p(w|D) = ½ (log |C⁻¹| − wᵀC⁻¹w) + Σₜ₌₁ᴺ log p(yt|gt, wt) + const,    (3)

where D = {(gt, yt)} for t = 1, …, N is the set of input carriers and responses, and const is independent of w. Our goal is to find the w that maximizes this log-posterior. With NK total parameters (potentially hundreds of thousands) in w, however, most procedures that perform a global optimization of all parameters at once are not feasible; for example, related work has calculated trajectories by maximizing the likelihood using local approximations [2]. Whereas the use of the Hessian matrix for second-order methods often provides dramatic speed-ups, a Hessian with (NK)² entries is usually too large to fit in memory (let alone invert) for N > 1000 trials.
On the other hand, we observe that the Hessian of our log-posterior is sparse:

    H = ∂²/∂w² log p(w|D) = C⁻¹ + ∂²L/∂w²,    (4)

where C⁻¹ is a sparse (banded) matrix, and ∂²L/∂w² is a block-diagonal matrix. The block-diagonal structure arises because the log-likelihood is additive over trials, and weights at one trial t do not affect the log-likelihood component from another trial t′. We take advantage of this sparsity, using a variant of conjugate gradient optimization that only requires a function for computing the product of the Hessian matrix with an arbitrary vector [13]. Since we can compute such a product using only sparse terms and sparse operations, we can utilize quasi-Newton optimization methods in SciPy to find a global optimum for our weights, even for very large N [14].

Algorithm 1 Optimizing hyperparameters with the decoupled Laplace approximation
Require: input carriers g, choices y
Require: initial hyperparameters θ0, subset of hyperparameters to be optimized θOPT
1: repeat
2:   Optimize for w given current θ → wMAP, Hessian of log-posterior Hθ, log-evidence E
3:   Determine Gaussian prior N(0, Cθ) and Laplace approx. posterior N(wMAP, −Hθ⁻¹)
4:   Calculate Gaussian approximation to likelihood N(wL, Γ) using the product identity, where Γ⁻¹ = −(Hθ + Cθ⁻¹) and wL = −Γ Hθ wMAP
5:   Optimize E w.r.t. θOPT using a closed-form update (with sparse operations), where wMAP = −Hθ⁻¹ Γ⁻¹ wL
6:   Update best θ and corresponding best E
7: until θ converges
8: return wMAP and θ with best E

3.2 Hyperparameter fitting with the decoupled Laplace approximation

So far we have addressed the problem of finding a global optimum for w given a specific hyperparameter setting θ = {σk}; now we must also find the optimal hyperparameters. Cross-validation is not easily applied given the number of different volatility parameters, and so we turn instead to approximate marginal likelihood. To select between models with different θ, we use a Laplace approximation to the posterior, p(w|D, θ) ≈ N(w|wMAP, −H⁻¹), to estimate the marginal likelihood (or evidence) as [15]:

    p(y|g, θ) = p(y|g, w) p(w|θ) / p(w|D, θ) ≈ exp(L) · N(w|0, C) / N(w|wMAP, −H⁻¹).    (5)

Naive optimization of θ requires a re-optimization of wMAP for every change in θ, strongly restricting the dimensionality of tractable θ to whatever could be explored with grid search; the simplest approach is to reduce all σk to a single σ, as assumed in [16].

Here we use the decoupled Laplace method [5] to avoid the need to re-optimize our weight parameters after every update to the hyperparameters, by making a Gaussian approximation to the likelihood of our model. The optimization is explained in Algorithm 1. By circumventing nested optimization of θ and w, we can consider larger sets of hyperparameters and more complex priors over our weights, while still fitting in minutes on a laptop (Fig. 2c). For example, letting each weight evolve with its own distinct σk often allows for both a more accurate fit to data and additional insight into the dynamics (as in Fig. 3b).
In practice, we also parametrize θ by fixing σk,t=0 = 16, an arbitrary large value that allows the likelihood to determine w0 rather than forcing the weights to initialize near some predetermined value.

3.3 Overnight dynamics

Another specific parametrization of θ made possible by the decoupled Laplace method is the inclusion of an additional type of hyperparameter, σday, to modulate the change in weights occurring between training sessions. Intuitively, one might expect the change in behavior between the last trial of a session and the first trial of the next session to be greater than the change between consecutive trials within the same session. By indexing the first trial of each session, we can introduce a new set of hyperparameters {σk,day} which we can then optimize to account for the between-session changes within each weight.

Whereas all 2·K hyperparameters in θ = {σ1, …, σK, σ1,day, …, σK,day} can have distinct values in the most flexible version of the model, there are certain optional constraints that may be more relevant to animal behavior. For example, when both the {σk} and {σk,day} are fixed to be very small, the weights effectively do not change, replicating the standard logistic regression model with constant weights. On the other hand, when fixing {σk} to be very small and {σk,day} to be very large, we would recover a different set of constant weights for each session, replicating a particular blocked approach to logistic regression discussed earlier. By only fixing the {σk,day} to be large while optimizing freely over each of {σk}, we essentially find the best weight trajectory within each session, while allowing the weights to "reset" at the start of each new session. The decoupled Laplace method makes it feasible to optimize over any subset of these hyperparameters at once, allowing exploration of many types of models and the localization of behavioral dynamics to specific weights or periods of training.

Figure 2: Recovering weights and hyperparameters from simulated data. (a) Generating 20 behavioral realizations (y's) from one simulated set of K = 4 weights (in bold), we recover 20 sets of weights (faded). Observe that the recovered weights closely track the real weights in all realizations. (b) The hyperparameters σk recovered for each weight over 20 distinct simulations, as a function of the number of trials. Note that with more trials, the recovered σk converge to the true σk (dotted black line). (c) Average computation time for full optimization of weights and hyperparameters for a single realization. Even with tens of thousands of trials, this model can be fit in minutes on a laptop.

4 Simulation results

We first demonstrate our method using simulated data. We generate K = 4 weight trajectories over N = 64,000 trials, simulating each as a Gaussian random walk with variance σk² and a reflecting boundary at ±4. For each trial, we then drew the carrier vector gt from a standard normal, calculated P(Right), and used this probability to sample a choice yt. Since our model is probabilistic (Eq. 1), we can draw many behavioral realizations (y's) from the same "true" weight trajectories. Our method not only accurately estimates the weight trajectories across realizations (Fig. 2a), but also recovers the hyperparameter σk for each weight across many different simulations (Fig. 2b). We also tested the scalability of the method over an increasing number of trials (Fig. 2c).
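The generative procedure above (random-walk weights with a reflecting boundary at ±4, standard-normal carriers, and Bernoulli choices from Eq. 1) can be sketched as follows; the particular volatility values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(N=5000, K=4, sigma=(0.05, 0.1, 0.2, 0.05), bound=4.0):
    """Simulate the generative model of Sec. 4: Gaussian random-walk
    weights reflected at +/- bound, standard-normal carriers g_t, and
    Bernoulli choices with P(Right) from the logistic model (Eq. 1)."""
    sigma = np.asarray(sigma)
    w = np.zeros((N, K))
    for t in range(1, N):
        step = w[t - 1] + rng.normal(0.0, sigma)
        # reflect any excursion beyond the boundary back inside
        step = np.where(step > bound, 2 * bound - step, step)
        step = np.where(step < -bound, -2 * bound - step, step)
        w[t] = step
    g = rng.standard_normal((N, K))
    p_right = 1 / (1 + np.exp(-np.sum(g * w, axis=1)))
    y = (rng.random(N) < p_right).astype(int)
    return g, w, y
```

Because the choices are sampled, repeated calls with the same weight trajectories yield distinct behavioral realizations, which is exactly the setup behind Fig. 2a.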
We note that having more than 64K trials for a single animal is highly unusual, and so fifteen minutes of computation time on a laptop is a rough upper bound for most practical use; behavioral datasets commonly have only a few thousand trials and can be fit in seconds. To confirm the efficacy of our decoupled Laplace method in recovering the best setting of hyperparameters, we verify with grid search that the algorithm converges on the hyperparameters with the highest evidence and highest cross-validated log-likelihood on simulated data (see Fig. S1).

5 Behavioral dynamics in rats & humans

To further explore the advantages and insights provided by our model, we apply our method to behavioral data from both rats and humans performing a 2AFC delayed response task, as reported in [17]. The task involves the presentation of two auditory stimuli of different amplitude, separated by a delay. If the first stimulus (Tone A) is louder than the second (Tone B), then the subject must go right to receive a reward, and vice-versa (Fig. 3a; for more detail see [17]). In our model, the "correct" set of weights for performing this task with high accuracy are a large, positive weight for Tone A, an equal and opposite weight for Tone B (the two sensitivities to stimuli), and zeros for all other (task-irrelevant) weights. We applied our method to early training data from 20 rats and 9 human subjects to uncover how behavior evolved in this particular task. Here we show examples from one rat and one human subject (Figs. 3b,c); see Figs. S2 & S3 for analysis of additional rats and human subjects.

Figure 3: Application to rat and human data. (a) For this data from [17], a 2AFC delayed response task was used in which the subject experiences an auditory stimulus (Tone A) of a particular amplitude, a delay period, a second auditory stimulus (Tone B) of a different amplitude, and finally the choice to go either left or right. If Tone A was louder than Tone B, then a rightward choice receives a reward, and vice-versa. (b) The psychometric weights recovered from the first 20,000 trials of a rat. Weights in the legend labeled with a "-1" superscript indicate that the weight carries information from the previous trial. The faded vertical gray lines indicate session boundaries. In addition to being fit with its own trial-to-trial volatility hyperparameter σk, each weight is also fit with an additional hyperparameter σk,day for volatility between sessions. This results in "steps" at the session boundaries for some weights (see Sec. 3.3). Each weight also has a 95% posterior credible interval, indicated by the shaded region of matching color (for derivation refer to Sec. S2). (c) The psychometric weights recovered from a human subject.

5.1 Rat data

Behavior is highly dynamic in the case of a rat (Fig. 3b), reflective of the animal's initial uncertainty about the task structure and gradual honing of its behavioral strategy. First, we notice that the animal starts naive: the initial strategy does not depend upon the two auditory stimuli at all, as both the Tone A & B weights (red & yellow) begin near 0. Instead, behavior is clearly influenced by the previous trial: the weights on answer history (purple; a preference to choose the side that was correct on the previous trial, or "win-stay/lose-switch") and on choice history (green; a preference to choose the same side as on the previous trial, or "perseverance") both dominate initially. There is also an overall tendency to choose left, as indicated by the negative bias weight (blue).
As training progresses, both the bias and dependencies on task history steadily decrease, suggesting that the rat is learning the task structure.

Second, we can compare the evolution of the weights on Tone A vs. Tone B. The sensitivity to the value of Tone B develops very early in training, and quickly grows to a large negative value (a preference to go left when Tone B is loud). In contrast, the sensitivity to Tone A stays close to zero for many thousands of trials before growing to a large positive value (a preference to go right when Tone A is loud). Again, this observation is consistent with the intuition that associative learning is stronger for the most recent stimulus.

Figure 4: Comparing to empirical metrics. (a) The empirical accuracy of the rat in red, with a 95% confidence interval indicated by the shaded region. We overlay the predicted accuracy from model weights in maroon, using P(Correct) for each trial instead of the empirical {0, 1}. (b) The empirical bias of the rat, represented as the correct side minus the animal's choice for each trial, where {Left, Right} = {0, 1}. We plot a 95% confidence interval indicated by the shaded region, as well as the predicted bias from model weights, substituting P(Right) for the animal's choice. All lines are smoothed with a Gaussian kernel of σ = 50. Predicted performance and bias are calculated using cross-validated weights (calculations and cross-validation procedure detailed in Secs. S3 & S4).
The temporal separation of Tone A from the choice not only makes it more difficult to learn the association, but also makes leveraging knowledge of that association more difficult, since the rat must work to maintain information about Tone A in working memory [18, 17]. Despite this, we see that the animal ultimately develops weights of equal magnitude and opposite signs for the two stimuli, again demonstrating successful learning.

Finally, we observe a small but significant sensitivity to the previous trial's stimuli (pink); the positive value indicates a preference to go right when the average of Tones A & B on the previous trial was higher. This reconfirms the dependence of choice behavior on sensory history found in [17].

5.2 Human subjects

In contrast to the rat, the weight trajectories for the human subject are largely stable and reflect accurate behavioral performance (Fig. 3c); not much learning is happening. This is expected, as a human subject can understand the task structure and execute the correct behavioral strategy from the very first trial. We emphasize that the strength of our model is not only its flexibility to fit the dynamic behavior of the rat, but also its ability to automatically detect and confirm the stable behavior of the human. While the human dataset is stable enough to be fitted using standard logistic regression, doing so would require starting from the assumption that behavior was indeed stable.

Our method also allows several interesting observations regarding the types of decision-making biases a human subject might possess. For example, there is a non-zero choice bias (blue) with a slow fluctuation, which tends leftward over most of the session. Also, while the weights for Tones A & B are clearly the two largest, the magnitude of the Tone B weight is consistently larger, indicating a higher sensitivity to the more recent stimulus.
Furthermore, the weight on sensory history (pink) is non-vanishing, once again corroborating the findings of [17], whereas behavior was even better explained without the weights on answer and choice history (see Sec. S1 for more detail).

5.3 Comparison to conventional measures

Finally, we ask how well our model actually describes the animal's choice behavior. To this end, we relate our model back to more conventional measures of behavior, considering two important measures most commonly used by a trainer: the empirical accuracy (Fig. 4a) and the empirical bias (Fig. 4b). The empirical accuracy tracks the local fraction of trials with a correct response, whereas the empirical bias captures the tendency to prefer one side on error trials (see Sec. S3 for details). The two measures were then compared with the corresponding quantities predicted by our model.

Figure 5: Psychometric curve and model predictions. (a) A conventional psychometric curve for a rat, generated from a subset of trials at the end of training where Tone B is held constant. (b) The same rat during early training, with two models: the basic model with stimuli and bias weights only (red), and the history-aware model (black). The histograms (with right-side axis) show the number of trials within each g · w bin. The dots (with left-side axis) plot the fraction of trials, within each bin, in which the rat went right (yt = 1). (c) For all 20 rats in the population, we plot the predicted accuracy vs. empirical accuracy for the top 1, 5, & 10% of most strongly predicted trials in the basic model (red). (d) Same as (c) but for the history-aware model (black).
The close match between empirical and predicted performance validates the model's ability to capture the animal's true dynamic strategy, in addition to the already-demonstrated success of our inference method in finding the best weights and hyperparameters given the model. It also emphasizes that our analysis provides highly interpretable measures that could successfully replace (and extend) conventional training evaluators.

6 Exploring lapse

Looking at the psychometric curve of a single rat from its end-of-training data (Fig. 5a), it is clear that the rat does not achieve perfect performance even on the easiest trials (left/right ends of the stimulus axis). This gap in performance is particularly common in rodent data, and there is much speculation as to its cause [19]. One hypothesis is that the trials are not easy enough, and that perfect performance would be achieved on sufficiently easy trials; in other words, the gap is explained as a result of insufficient sensitivity to task stimuli [20-22]. An alternative hypothesis is that there exists a so-called "lapse rate" inherent in the animal's behavior, for example as an effect of an ε-greedy strategy in which the animal makes a completely random choice on a certain fraction of trials, perhaps for exploratory purposes. Our analysis of the rat data can provide an answer to the debate, as it captures the behavior of the animal precisely enough to predict, not just describe.

To explore the predictive power of our method further, we look at two distinct models in Fig. 5b: the basic model (in red) has dynamic weights only on the task stimuli (Tones A & B) and choice bias, while the history-aware model (in black) has additional weights for various history dependencies. On the x-axis of Fig. 5b, we have binned all trials of our rat according to their gt · wt values.
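The Fig. 5b-style binning can be sketched as below: bin trials by gt · wt and compare the empirical fraction of rightward choices in each bin to the logistic prediction at the bin center. This is an illustrative sketch on whatever weights are supplied (in the paper these are cross-validated estimates), and the bin count is an arbitrary choice.

```python
import numpy as np

def confidence_bins(g, w, y, n_bins=20):
    """Bin trials by g_t . w_t; return bin centers, per-bin trial counts,
    the empirical fraction of rightward choices per bin, and the logistic
    prediction (Eq. 1) evaluated at each bin center."""
    z = np.sum(g * w, axis=1)
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    counts = np.bincount(idx, minlength=n_bins)
    emp = np.full(n_bins, np.nan)                  # NaN for empty bins
    nonzero = counts > 0
    emp[nonzero] = (np.bincount(idx, weights=y, minlength=n_bins)[nonzero]
                    / counts[nonzero])
    pred = 1 / (1 + np.exp(-centers))
    return centers, counts, emp, pred
```

Plotting `emp` against `pred` across bins reproduces the dots-versus-logistic-curve comparison, and `counts` gives the histogram tails that distinguish the basic from the history-aware model.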
Recall that in our model, larger magnitudes of g · w result in more confident predictions, with predicted choice probabilities closer to 0 or 1. We see that the empirical probability of choosing right within each bin of g · w (plotted as dots) matches the predicted probability according to the logistic function of Eq. 1 (plotted as a faded gray curve). We then plot the number of trials in each g · w bin in the histograms (with right-side axis). We see that for our basic model, the trial predictions are never more confident than 90% (no tails on the red histogram), whereas our history-aware model has a substantial portion of trials predicted with almost 100% confidence (longer tails on the black histogram). In terms of the cross-validated log-likelihood, the history-aware model provides a 20% boost over the basic model. All model predictions are calculated on held-out data; see Sec. S4 for details.

Finally, we directly compare the model-predicted accuracy to the empirical accuracy, across all 20 rats, for both the basic model (Fig. 5c) and the history-aware model (Fig. 5d). Sorting trials by their predicted accuracy, we plot the top 1, 5, & 10% of trials in each rat and find that almost all rats have a significant proportion of their trials predicted with >95% accuracy in the history-aware model (Fig. 5d).

We thus demonstrated that our method can predict the rats' choice behavior with near-perfect accuracy on a significant subset of trials. This finding contradicts the hypothesis postulating an inherent lapse rate, in which the animal is making random choices on a subset of trials (where such randomness would prevent prediction above a certain accuracy).
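The Fig. 5c/d-style comparison can be sketched as follows: sort trials by the model's per-trial P(Correct), keep the most strongly predicted fraction, and compare predicted with empirical accuracy there. Variable names and inputs are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def topfrac_accuracy(p_right, y, answers, frac=0.05):
    """Sort trials by model-predicted accuracy P(correct), keep the top
    `frac`, and return (predicted accuracy, empirical accuracy) on that
    subset. `y` and `answers` are 0/1 choices and correct sides."""
    p_correct = np.where(answers == 1, p_right, 1.0 - p_right)
    top = np.argsort(-p_correct)[: max(1, int(frac * len(y)))]
    predicted = p_correct[top].mean()              # model-predicted accuracy
    empirical = np.mean(y[top] == answers[top])    # observed accuracy
    return predicted, empirical
```

If the model has captured the strategy, the two numbers agree; an irreducible lapse rate would instead cap the empirical accuracy below the predicted one on the most confident trials.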
While random choice is a well-established behavioral strategy seen in many experimental settings, our method allows for a critical disambiguation between true randomness and deterministic strategies that may merely appear random [23]. Our method, with the full history-aware model, is able to quantify and explain gaps in performance that are typically left unexplained by conventional analyses.

7 Discussion

We presented a method for efficiently and flexibly characterizing the dynamics of psychophysical behavior, allowing for unprecedented insight into how animals learn new tasks. We have made key advances in both efficiency and scalability, allowing us to quickly fit a complex, trial-to-trial description of behavior for even the largest datasets. We demonstrated, on a real dataset of unusually large size, the explanatory as well as predictive power of our method compared to two conventional measures of behavior. In particular, the flexibility of the model allowed us to address an important open question in behavioral psychology, the problem of lapses.

Our approach is built on a simple, generic model of psychometric behavior, which worked well for the datasets analyzed in this paper. Here we briefly discuss two aspects of the model that could be extended in future work, potentially to address specific features of different tasks. First, while the weight trajectories are allowed to evolve over time, the volatility hyperparameter σ is a single value optimized over the entire dataset. When analyzing a long trajectory, it may be necessary to also allow σ to vary slowly over time, so that the dynamics of early training and the stability of late training can be explained separately. Including more complex parameterizations of the prior, such as the overnight σ_day described in Sec. 3.3, may also provide a practical way to model sudden, step-like changes in behavior.
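To make the prior parameterization discussed above concrete, the following sketch simulates a single weight evolving under a Gaussian random-walk prior whose per-trial volatility σ is boosted to a larger overnight σ_day on the first trial of each session. All variable names and numerical values here are hypothetical choices for illustration, not values from our fits.

```python
import numpy as np

rng = np.random.default_rng(1)

n_trials = 5000
trials_per_day = 500   # hypothetical session length
sigma = 0.05           # ordinary trial-to-trial volatility (hypothetical)
sigma_day = 0.5        # boosted overnight volatility (cf. sigma_day, Sec. 3.3)

# Innovation scale per trial: the ordinary sigma within a session, raised to
# sigma_day on the first trial of each new day, permitting sudden step-like
# overnight changes while keeping within-day drift smooth.
step_sd = np.full(n_trials, sigma)
step_sd[trials_per_day::trials_per_day] = sigma_day

# Gaussian random-walk weight trajectory: w_t = w_{t-1} + N(0, step_sd_t^2).
w = np.cumsum(rng.normal(0.0, step_sd))
```

A slowly time-varying σ, as suggested for long trajectories, could be sketched analogously by letting step_sd itself drift gradually across trials rather than holding it fixed within sessions.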
Second, the success of our method depends fundamentally on the ability of the psychometric model to correctly describe the animal's behavior. Different tasks may require more careful modeling of certain aspects of choice behavior. In particular, our model in its current form applies only to 2-alternative forced choice tasks, though there is a clear extension to multi-alternative choice [7]. Despite these limitations, we expect the agnostic flexibility, explanatory power, and computational efficiency of our method to make it a useful tool for exploring behavioral dynamics. Our Python package PsyTrack should make the analysis easily accessible [6].

Our method can be readily applied to the vast troves of largely unanalyzed animal training data to provide both scientific insight and practical utility. Its immediate applicability as an everyday tool for scientist-trainers makes it a significant step forward from previous work that offered theoretical paths [16]. At the most basic level, the method allows trainers to stay aware of the behavioral strategies developed by their animals, which is useful for identifying common pitfalls and disentangling distinct strategies that may appear similar on the surface. Furthermore, while many trainers already use various automated heuristics during training, the output of our method can serve as a more specific and accurate input to such heuristics.
By enabling a quantitative feedback loop in which the trainer can (i) diagnose a problem, (ii) prescribe an adaptively optimized training program to correct it, and (iii) monitor the consequences of that correction, we believe our method will set a new standard for systematic animal training.

Acknowledgements

This work was supported by grants from the Simons Foundation (SCGB AWD1004351 and AWD543027), the NIH (R01EY017366, R01NS104899) and a U19 NIH-NINDS BRAIN Initiative Award (NS104648-01).

References

[1] John W Krakauer, Asif A Ghazanfar, Alex Gomez-Marin, Malcolm A MacIver, and David Poeppel. Neuroscience needs behavior: correcting a reductionist bias. Neuron, 93(3):480–490, 2017.

[2] Anne C Smith, Loren M Frank, Sylvia Wirth, Marianna Yanike, Dan Hu, Yasuo Kubota, Ann M Graybiel, Wendy A Suzuki, and Emery N Brown. Dynamic analysis of learning in behavioral experiments. Journal of Neuroscience, 24(2):447–461, 2004.

[3] Wendy A Suzuki and Emery N Brown. Behavioral and neurophysiological analyses of dynamic learning processes. Behavioral and Cognitive Neuroscience Reviews, 4(2):67–95, 2005.

[4] Michael J Prerau, Anne C Smith, Uri T Eden, Yasuo Kubota, Marianna Yanike, Wendy Suzuki, Ann M Graybiel, and Emery N Brown. Characterizing learning by simultaneous analysis of continuous and binary measures of performance. Journal of Neurophysiology, 102(5):3060–3072, 2009.

[5] Anqi Wu, Nicholas A Roy, Stephen Keeley, and Jonathan W Pillow. Gaussian process based nonlinear latent structure discovery in multivariate spike train data. In Advances in Neural Information Processing Systems, pages 3499–3508, 2017.

[6] Nicholas A Roy, Ji Hyun Bak, and Jonathan W Pillow. PsyTrack: Open source dynamic behavioral fitting tool for Python, 2018–.

[7] Ji Hyun Bak and Jonathan W. Pillow. Adaptive stimulus selection for multi-alternative psychometric functions with lapses.
Journal of Vision, 18(12):4, 2018.

[8] Arman Abrahamyan, Laura Luz Silva, Steven C. Dakin, Matteo Carandini, and Justin L. Gardner. Adaptable history biases in human perceptual decisions. Proceedings of the National Academy of Sciences, 113(25):E3548–E3557, 2016.

[9] Laura Busse, Asli Ayaz, Neel T. Dhruv, Steffen Katzner, Aman B. Saleem, Marieke L. Schölvinck, Andrew D. Zaharia, and Matteo Carandini. The detection of visual contrast in the behaving mouse. Journal of Neuroscience, 31(31):11351–11361, 2011.

[10] Eun Jung Hwang, Jeffrey E Dahlen, Madan Mukundan, and Takaki Komiyama. History-based action selection bias in posterior parietal cortex. Nature Communications, 8(1):1242, 2017.

[11] Yashar Ahmadian, Jonathan W Pillow, and Liam Paninski. Efficient Markov chain Monte Carlo methods for decoding neural spike trains. Neural Computation, 23(1):46–96, 2011.

[12] Liam Paninski, Yashar Ahmadian, Daniel Gil Ferreira, Shinsuke Koyama, Kamiar Rahnama Rad, Michael Vidne, Joshua Vogelstein, and Wei Wu. A new look at state-space models for neural data. Journal of Computational Neuroscience, 29(1-2):107–126, 2010.

[13] Jorge Nocedal and Stephen J Wright. Quasi-Newton methods. Numerical Optimization, pages 135–163, 2006.

[14] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–.

[15] Maneesh Sahani and Jennifer F Linden. Evidence optimization techniques for estimating stimulus-response functions. In Advances in Neural Information Processing Systems, pages 317–324, 2003.

[16] Ji Hyun Bak, Jung Yoon Choi, Athena Akrami, Ilana Witten, and Jonathan W Pillow. Adaptive optimal training of animal behavior. In Advances in Neural Information Processing Systems, pages 1947–1955, 2016.

[17] Athena Akrami, Charles D Kopec, Mathew E Diamond, and Carlos D Brody.
Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature, 554(7692):368, 2018.

[18] Ranulfo Romo and Emilio Salinas. Cognitive neuroscience: flutter discrimination: neural codes, perception, memory and decision making. Nature Reviews Neuroscience, 4(3):203, 2003.

[19] Joshua I Gold and Long Ding. How mechanisms of perceptual decision-making affect the psychometric function. Progress in Neurobiology, 103:98–114, 2013.

[20] Nicolaas Prins. The psychometric function: The lapse rate revisited. Journal of Vision, 12(6):25–25, 2012.

[21] Jeffrey C Erlich, Bingni W Brunton, Chunyu A Duan, Timothy D Hanks, and Carlos D Brody. Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. eLife, 4:e05457, 2015.

[22] Benjamin B Scott, Christine M Constantinople, Jeffrey C Erlich, David W Tank, and Carlos D Brody. Sources of noise during accumulation of evidence in unrestrained and voluntarily head-restrained rats. eLife, 4:e11308, 2015.

[23] Samuel J Gershman. Uncertainty and exploration. bioRxiv, page 265504, 2018.