{"title": "Prediction and Change Detection", "book": "Advances in Neural Information Processing Systems", "page_first": 1281, "page_last": 1288, "abstract": null, "full_text": "Prediction and Change Detection\n\nMark Steyvers\nmsteyver@uci.edu\nUniversity of California, Irvine\nIrvine, CA 92697\n\nScott Brown\nscottb@uci.edu\nUniversity of California, Irvine\nIrvine, CA 92697\n\nAbstract\n\nWe measure the ability of human observers to predict the next datum in a sequence that is generated by a simple statistical process undergoing change at random points in time. Accurate performance in this task requires the identification of changepoints. We assess individual differences between observers both empirically and using two kinds of models: a Bayesian approach for change detection and a family of cognitively plausible fast and frugal models. Some individuals detect too many changes and hence perform sub-optimally due to excess variability. Other individuals do not detect enough changes, and perform sub-optimally because they fail to notice short-term temporal trends.\n\n1 Introduction\n\nDecision-making often requires a rapid response to change. For example, stock analysts need to quickly detect changes in the market in order to adjust investment strategies. Coaches need to track changes in a player\u2019s performance in order to adjust strategy. When tracking changes, there are costs involved when either more or fewer changes are detected than actually occurred. For example, an analyst using an overly conservative change detection criterion might miss important short-term trends, interpreting them as random fluctuations. On the other hand, a change may also be detected too readily. 
For example, in basketball, a player who makes a series of consecutive baskets is often identified as a \u201chot hand\u201d player whose underlying ability is perceived to have suddenly increased [1,2]. This perception might lead to sub-optimal passing strategies based on random fluctuations.\n\nWe are interested in explaining individual differences in a sequential prediction task. Observers are shown stimuli generated from a simple statistical process with the task of predicting the next datum in the sequence. The latent parameters of the statistical process change discretely at random points in time. Performance in this task depends on the accurate detection of those changepoints, as well as inference about future outcomes based on the outcomes that followed the most recent inferred changepoint. There is much prior research in statistics on the problem of identifying changepoints [3,4,5]. In this paper, we adopt a Bayesian approach to the changepoint identification problem and develop a simple inference procedure to predict the next datum in a sequence. The Bayesian model serves as an ideal observer model and is useful for characterizing the ways in which individuals deviate from optimality.\n\nThe plan of the paper is as follows. We first introduce the sequential prediction task and discuss a Bayesian analysis of this prediction problem. We then discuss the results from a few individuals in this prediction task and show how the Bayesian approach can capture individual differences with a single \u201ctwitchiness\u201d parameter that describes how readily changes are perceived in random sequences. We will show that some individuals are too twitchy: their performance is too variable because they base their predictions on too little of the recent data. Other individuals are not twitchy enough, and they fail to capture fast changes in the data. We also show how behavior can be explained with a set of fast and frugal models [6]. 
These are cognitively realistic models that operate under plausible computational constraints.\n\n2 A prediction task with multiple changepoints\n\nIn the prediction task, stimuli are presented sequentially and the task is to predict the next stimulus in the sequence. After t trials, the observer has been presented with stimuli y1, y2, \u2026, yt and the task is to make a prediction about yt+1. After the prediction is made, the actual outcome yt+1 is revealed and the next trial proceeds to the prediction of yt+2. This procedure starts with y1 and is repeated for T trials.\n\nThe observations yt are D-dimensional vectors with elements sampled from binomial distributions. The parameters of those distributions change discretely at random points in time such that the mean increases or decreases after a changepoint. This generates a sequence of observation vectors, y1, y2, \u2026, yT, where each yt = {yt,1 \u2026 yt,D}. Each of the yt,d is sampled from a binomial distribution Bin(\u03b8t,d,K), so 0 \u2264 yt,d \u2264 K. The parameter vector \u03b8t = {\u03b8t,1 \u2026 \u03b8t,D} changes depending on the locations of the changepoints. At each time step t, xt is a binary indicator for the occurrence of a changepoint at time t+1. The parameter \u03b1 determines the probability of a change occurring in the sequence. The generative model is specified by the following algorithm:\n\n1. For d=1..D, sample \u03b81,d from a Uniform(0,1) distribution\n\n2. For t=2..T,\n\n(a) Sample xt-1 from a Bernoulli(\u03b1) distribution\n\n(b) If xt-1=0, then \u03b8t=\u03b8t-1; else for d=1..D sample \u03b8t,d from a Uniform(0,1) distribution\n\n(c) For d=1..D, sample yt,d from a Bin(\u03b8t,d,K) distribution\n\nTable 1 shows some data generated from the changepoint model with T=20, \u03b1=.1, and D=1. In the prediction task, y will be observed, but x and \u03b8 are not. 
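The generative algorithm above can be sketched in a few lines of Python; this is our own illustrative implementation (the function name and the use of numpy are not from the paper), following the notation T, D, K, and \u03b1 defined above:

```python
import numpy as np

def generate_sequence(T=20, D=1, K=10, alpha=0.1, rng=None):
    """Sample (x, theta, y) from the changepoint model sketched above."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T, dtype=int)             # x[t-1]=1 marks a changepoint at time t
    theta = np.zeros((T, D))               # latent binomial rates
    y = np.zeros((T, D), dtype=int)        # observed counts, 0..K
    theta[0] = rng.uniform(0, 1, size=D)   # step 1: initial rates
    y[0] = rng.binomial(K, theta[0])
    for t in range(1, T):                  # step 2: trials t=2..T
        x[t - 1] = rng.binomial(1, alpha)  # step 2a: changepoint indicator
        theta[t] = rng.uniform(0, 1, size=D) if x[t - 1] else theta[t - 1]  # step 2b
        y[t] = rng.binomial(K, theta[t])   # step 2c: sample each y[t,d]
    return x, theta, y
```

Only y would be shown to an observer; x and theta remain latent, as in the task.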
\n\n \n\nt\n\nx\n\n\u03b8\n\ny\n\nTable 1: Example data \n\n1\n\n0\n\n2\n\n0\n\n3\n\n0\n\n4\n\n1\n\n5\n\n0\n\n6\n\n0\n\n7\n\n1\n\n8\n\n0\n\n9 10 11 12 13 14 15 16 17 18 19 20\n\n0\n\n0\n\n0\n\n0\n\n1\n\n0\n\n1\n\n0\n\n0\n\n0\n\n0\n\n0\n\n.68 .68 .68 .68 .48 .48 .48 .74 .74 .74 .74 .74 .74 .19 .19 .87 .87 .87 .87 .87\n8 \n\n7\n\n9\n\n1\n\n8\n\n3\n\n6\n\n7\n\n4\n\n9\n\n7\n\n8\n\n2\n\n9\n\n8\n\n8\n\n8\n\n4\n\n4\n\n9\n\n \n\n\f \n\n3 A B a y e s i a n p r e d i c t i o n m o d e l \n\nIn both our Bayesian and fast-and-frugal analyses, the prediction task is decomposed \ninto two inference procedures. First, the changepoint locations are identified. This is \nfollowed by predictive inference for the next outcome based on the most recent \nchangepoint locations. Several Bayesian approaches have been developed for \nchangepoint problems involving single or multiple changepoints [3,5]. We apply a \nMarkov Chain Monte Carlo (MCMC) analysis to approximate the joint posterior \ndistribution over changepoint assignments x while integrating out \u03b8. Gibbs sampling \nwill be used to sample from this posterior marginal distribution. The samples can then \nbe used to predict the next outcome in the sequence. \n\n3 . 1 \n\nI n f e r e n c e f o r c h a n g e p o i n t a s s i g n m e n t s . \n\nTo apply Gibbs sampling, we evaluate the conditional probability of assigning a \nchangepoint at time i, given all other changepoint assignments and the current \u03b1 value. \nBy integrating out \u03b8, the conditional probability is \n\n \n\nP x x\n\n|\n\n(\n\ni\n\n\u2212\n\n,\n\n,\ny\n\n\u03b1\n\ni\n\n)\n\n= \u222b\n\n(\nP x\n\ni\n\n,\n\n\u03b8 \u03b1\n\n,\n\n|\n\nx\n\n\u2212\n\n,\n\ny\n\n)\n\n \n\ni\n\n\u03b8\n\n(1) \n\n \n\nwhere \n\nx\n\n\u2212\n\ni\n\n represents all switch point assignments except xi. This can be simplified by \n\nconsidering the location of the most recent changepoint preceding and following time \ni and the outcomes occurring between these locations. 
Let nLi be the number of time steps from the last changepoint up to and including the current time step i, so that xi\u2212nLi = 1 and xi\u2212nLi+j = 0 for 0 < j < nLi.\n\nThe parameter C governs the twitchiness of the model predictions. If C is large, only very dramatic changepoints will be detected, and the model will be too conservative. If C is small, the model will be too twitchy, and will detect changepoints on the basis of small random fluctuations.\n\nPredictions are based on the most recent M observations, which are kept in memory, unless a changepoint has been detected, in which case only those observations occurring after the changepoint are used for prediction. The prediction for time step t+1 is simply the mean of these observations, say p. Human observers were reluctant to make predictions very close to the boundaries. This was modeled by allowing the FF model to shift its prediction for the next time step, yt+1, towards the mean prediction (0.5). This shift reflects a two-way bet: if the probability of a change occurring is \u03b1, the best guess will be 0.5 if that change occurs, or the mean p if the change does not occur. Thus, the prediction made is actually yt+1 = \u03b1/2 + (1\u2212\u03b1)p. Note that we do not allow perfect knowledge of the probability of a changepoint, \u03b1. Instead, an estimated value of \u03b1 is used, based on the number of changepoints detected in the data series up to time t.\n\nThe FF model nests two simpler FF models that are psychologically interesting. If the twitchiness threshold parameter C becomes arbitrarily large, the model never detects a change and instead becomes a continuous running average model. Predictions from this model are simply a boxcar smooth of the data. Alternatively, if we assume no memory, the model must base each prediction on only the previous stimulus (i.e., M=1). 
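The FF prediction rule described above can be sketched in Python. This is our own illustrative function (names are not from the paper), and it assumes the changepoint-detection step has already been applied, so the input is just the observations kept since the last detected change; p and the prediction are on a 0-1 scale, with counts rescaled by K:

```python
def ff_predict(recent, alpha_hat, K=10, M=5):
    """Fast-and-frugal prediction for the next time step.

    recent    -- observations since the last detected changepoint (most recent last)
    alpha_hat -- estimated changepoint probability (from changes detected so far)
    K         -- maximum outcome value, used to rescale counts to 0-1
    M         -- memory limit: at most M observations are retained
    """
    window = recent[-M:]                      # memory holds at most M observations
    p = sum(window) / (len(window) * K)       # mean of remembered data, 0-1 scale
    # Two-way bet: 0.5 is the best guess if a change occurs (prob. alpha_hat),
    # the running mean p if it does not.
    return 0.5 * alpha_hat + (1 - alpha_hat) * p
```

Setting M=1 recovers the memoryless variant, and never resetting `recent` (as when C is arbitrarily large, so no change is ever detected) recovers the running-average variant.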
Above, in Figure 3, we labeled the complete FF model as FF1, the boxcar model as FF2, and the memoryless model as FF3.\n\nFigure 3 showed that the complete FF model (FF1) fit the data from all observers significantly better than either the boxcar model (FF2) or the memoryless model (FF3). Exceptions were observers PH, DN and ML, for whom all three FF models fit equally well. This result suggests that our observers were (mostly) doing more than just keeping a running average of the data, or using only the most recent observation. The FF1 model fit the data about as well as the Bayesian models for all observers except MY and MS. Note that, in general, the FF1 and Bayesian model fits are very good: the average city-block distance between the human data and the model prediction is around 0.75 (out of 10) buttons on both the x- and y-axes.\n\n6 Conclusion\n\nWe used an online prediction task to study changepoint detection. Human observers had to predict the next observation in stochastic sequences containing random changepoints. We showed that some observers are too \u201ctwitchy\u201d: they perform poorly on the prediction task because they see changes where only random fluctuation exists. Other observers are not twitchy enough, and they perform poorly because they fail to see small changes. We developed a Bayesian changepoint detection model that performed the task optimally, and also provided a good fit to human data when sub-optimal parameter settings were used. Finally, we developed a fast-and-frugal model that showed how participants may be able to perform well at the task using minimal information and simple decision heuristics.\n\nAcknowledgments\n\nWe thank Eric-Jan Wagenmakers and Mike Yi for useful discussions related to this work. This work was supported in part by a grant from the US Air Force Office of Scientific Research (AFOSR grant number FA9550-04-1-0317). 
\n\nReferences\n\n[1] Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17, 295-314.\n\n[2] Albright, S.C. (1993a). A statistical analysis of hitting streaks in baseball. Journal of the American Statistical Association, 88, 1175-1183.\n\n[3] Stephens, D.A. (1994). Bayesian retrospective multiple changepoint identification. Applied Statistics, 43(1), 159-178.\n\n[4] Carlin, B.P., Gelfand, A.E., & Smith, A.F.M. (1992). Hierarchical Bayesian analysis of changepoint problems. Applied Statistics, 41(2), 389-405.\n\n[5] Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711-732.\n\n[6] Gigerenzer, G., & Goldstein, D.G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650-669.\n", "award": [], "sourceid": 2946, "authors": [{"given_name": "Mark", "family_name": "Steyvers", "institution": null}, {"given_name": "Scott", "family_name": "Brown", "institution": null}]}