{"title": "Algorithm selection by rational metareasoning as a model of human strategy selection", "book": "Advances in Neural Information Processing Systems", "page_first": 2870, "page_last": 2878, "abstract": "Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment. We find that people quickly learn to adaptively choose between cognitive strategies. People's choices in our experiment are consistent with our model but inconsistent with previous theories of human strategy selection. Rational metareasoning appears to be a promising framework for reverse-engineering how people choose among cognitive strategies and translating the results into better solutions to the algorithm selection problem.", "full_text": "Algorithm selection by rational metareasoning as a model of human strategy selection

Falk Lieder
Helen Wills Neuroscience Institute, UC Berkeley
falk.lieder@berkeley.edu

Jessica B. Hamrick
Department of Psychology, UC Berkeley
jhamrick@berkeley.edu

Dillon Plunkett
Department of Psychology, UC Berkeley
dillonplunkett@berkeley.edu

Stuart J. Russell
EECS Department, UC Berkeley
russell@cs.berkeley.edu

Nicholas J. Hay
EECS Department, UC Berkeley
nickjhay@berkeley.edu

Thomas L. Griffiths
Department of Psychology, UC Berkeley
tom_griffiths@berkeley.edu

Abstract

Selecting the right algorithm is an important problem in computer science, because the algorithm often has to exploit the structure of the input to be efficient. The human mind faces the same challenge. Therefore, solutions to the algorithm selection problem can inspire models of human strategy selection and vice versa. Here, we view the algorithm selection problem as a special case of metareasoning and derive a solution that outperforms existing methods in sorting algorithm selection. We apply our theory to model how people choose between cognitive strategies and test its prediction in a behavioral experiment. We find that people quickly learn to adaptively choose between cognitive strategies. People’s choices in our experiment are consistent with our model but inconsistent with previous theories of human strategy selection. Rational metareasoning appears to be a promising framework for reverse-engineering how people choose among cognitive strategies and translating the results into better solutions to the algorithm selection problem.

1 Introduction

To solve complex problems in real time, intelligent agents have to make efficient use of their finite computational resources. Although there are general-purpose algorithms, particular problems can often be solved more efficiently by specialized algorithms. The human mind can take advantage of this fact: people appear to have a toolbox of cognitive strategies [1] from which they choose adaptively [2, 3].
How these choices are made is an important, open question in cognitive science [4]. At an abstract level, choosing a cognitive strategy is equivalent to the algorithm selection problem in computer science [5]: given a set of possible inputs $I$, a set of possible algorithms $A$, and a performance metric, find the selection mapping from $I$ to $A$ that maximizes the expected performance. Here, we draw on a theoretical framework from artificial intelligence, rational metareasoning [6], and on Bayesian machine learning to develop a mathematical theory of how people should choose between cognitive strategies, and we test its predictions in a behavioral experiment.

In Section 2, we apply rational metareasoning to the algorithm selection problem and derive how the optimal algorithm selection mapping can be efficiently approximated by model-based learning when a small number of features is predictive of the algorithm’s runtime and accuracy. In Section 3, we evaluate the performance of our solution against state-of-the-art methods for sorting algorithm selection. In Sections 4 and 5, we apply our theory to cognitive modeling and report a behavioral experiment demonstrating that people quickly learn to adaptively choose between cognitive strategies in a manner predicted by our model but inconsistent with previous theories. We conclude with future directions at the interface of psychology and artificial intelligence.

2 Algorithm selection by rational metareasoning

Metareasoning is the problem of deciding which computations to perform given a problem and a computational architecture [6]. Algorithm selection is a special case of metareasoning in which the choice is limited to a few sequences of computations that each generate a complete result.
According to rational metareasoning [6], the optimal solution maximizes the value of computation (VOC). The VOC is the expected utility of acting after having performed the computation (and additional computations) minus the expected utility of acting immediately. In the general case, determining the VOC requires solving a Markov decision problem [7]. Yet, in the special case of algorithm selection, the hard problem of planning which computations to perform, how often, and in which order reduces to the simpler one-shot choice between a small number of algorithms. We can therefore use the following approximation to the VOC from [6] as the performance metric to be maximized:

$$\mathrm{VOC}(a; i) \approx \mathbb{E}_{P(S \mid a, i)}[S] - \mathbb{E}_{P(T \mid a, i)}[\mathrm{TC}(T)] \tag{1}$$

$$m(i) = \arg\max_{a \in A} \mathrm{VOC}(a; i), \tag{2}$$

where $a \in A$ is one of the available algorithms, $i \in I$ is the input, $S$ and $T$ are the score and the runtime, and $\mathrm{TC}(T)$ is the opportunity cost of running the algorithm for $T$ units of time. The score $S$ can be binary (correct vs. incorrect output) or numeric (e.g., an error penalty). The selection mapping $m$ defined in Equation 2 depends on the conditional distributions of score and runtime, $P(S \mid a, i)$ and $P(T \mid a, i)$. These distributions are generally unknown, but they can be learned. Learning an approximation to the VOC from experience, i.e. meta-level learning [6], is a hard technical challenge in the general case [8], but it is tractable in the special case of algorithm selection.

Learning the conditional distributions of score and runtime separately for every possible input is generally intractable. However, in many domains the inputs are structured and can be approximately represented by a small number of features. The effect of the input on score and runtime is mediated by its features $\mathbf{f} = (f_1(i), \dots, f_N(i))$:

$$P(S \mid a, i) = P(S \mid \mathbf{f}, a) = P(S \mid f_1(i), \dots, f_N(i), a) \tag{3}$$

$$P(T \mid a, i) = P(T \mid \mathbf{f}, a) = P(T \mid f_1(i), \dots, f_N(i), a). \tag{4}$$

If the distributions $P(S \mid f_1(i), \dots, f_N(i), a)$ and $P(T \mid f_1(i), \dots, f_N(i), a)$ have been learned, then one can very efficiently compute an estimate of the expected value of applying the algorithm to a novel input. To learn these distributions from examples, we assume simple parametric forms for them and estimate their parameters from the scores and runtimes of the algorithms on previous problem instances.

As a first approximation, we assume that the features are observable and that the runtime of an algorithm on problems with features $\mathbf{f}$ is normally distributed with mean $\mu(\mathbf{f}; a)$ and standard deviation $\sigma(\mathbf{f}; a)$. We further assume that the mean is a second-order polynomial in the extended features $\tilde{\mathbf{f}} = (f_1(i), \dots, f_N(i), \log(f_1(i)), \dots, \log(f_N(i)))$ and that the variance is independent of the mean:

$$P(T \mid \mathbf{f}; a, \alpha) = \mathcal{N}\big(\mu_T(\mathbf{f}; a, \alpha), \sigma_T(a)\big) \tag{5}$$

$$\mu_T(\mathbf{f}; a, \alpha) = \sum_{k_1=0}^{2} \cdots \sum_{k_N=0}^{2 - \sum_{i=1}^{N-1} k_i} \alpha_{k_1, \dots, k_N; a} \cdot \tilde{f}_1^{k_1} \cdots \tilde{f}_N^{k_N} \tag{6}$$

$$P(\sigma_T(a)) = \mathrm{Gamma}(\sigma_T^{-2}; 0.01, 0.01), \tag{7}$$

where $\alpha$ are the regression coefficients. Similarly, we model the probability that the algorithm returns the correct answer by a logistic function of a second-order polynomial of the extended features:

$$P(S = 1 \mid a, \mathbf{f}, \beta) = \frac{1}{1 + \exp\Big(\sum_{k_1=0}^{2} \cdots \sum_{k_N=0}^{2 - \sum_{i=1}^{N-1} k_i} \beta_{k_1, \dots, k_N; a} \cdot \tilde{f}_1^{k_1} \cdots \tilde{f}_N^{k_N}\Big)}, \tag{8}$$

with regression coefficients $\beta$. The conditional distribution of a continuous score can be modeled analogously to Equation 5, and we use $\gamma$ to denote its regression coefficients.

If the time cost is linear in the runtime, i.e. $\mathrm{TC}(t) = c \cdot t$ for some constant $c$, then the value of applying the algorithm depends only on the expected values of runtime and score. For linear scores

$$\mathbb{E}_{P(S, T \mid a, i)}[S - \mathrm{TC}(T)] = \mu_S(\mathbf{f}(i); a, \gamma) - c \cdot \mu_T(\mathbf{f}(i); a, \alpha), \tag{9}$$

and for binary scores

$$\mathbb{E}_{P(S, T \mid a, i)}[S - \mathrm{TC}(T)] = \mathbb{E}_{P(\beta \mid s, a, i)}[P(S = 1; i, \beta)] - c \cdot \mu_T(\mathbf{f}(i); a, \alpha). \tag{10}$$

We approximated $\mathbb{E}_{P(\beta \mid s, a, i)}[P(S = 1; i, \beta)]$ according to Equation 10 in [9]. Thus, the algorithm selection mapping $m$ can be learned by estimating the parameters $\alpha$ and either $\beta$ or $\gamma$. Our method estimates $\alpha$ by Bayesian linear regression [10, 11]. When the score is binary, $\beta$ is estimated by variational Bayesian logistic regression [9], and when the score is continuous, $\gamma$ is estimated by Bayesian linear regression. For Bayesian linear regression, we use conjugate Gaussian priors with mean zero and variance one, so that the posterior distributions can be computed very efficiently by analytic update equations. Given the posterior distributions on the parameters, we compute the expected VOC by marginalization.
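The runtime half of this learning procedure can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors’ implementation: it uses NumPy, fixes the noise variance instead of placing the Gamma prior of Equation 7 on it, treats every algorithm as always correct (so $\mathbb{E}[S] = 1$, as for the conventional sorting algorithms in Section 3), and uses a hypothetical time-cost constant $c$. It expands the features into all monomials of total degree at most two in the extended features (Equation 6), computes the posterior mean of $\alpha$ under the $\mathcal{N}(0, I)$ prior, and picks the algorithm with the highest estimated VOC from the posterior means (as in Equations 9 and 12). The helper names and the synthetic runtimes are illustrative only; the second feature is offset by one in the example so that its logarithm stays finite.

```python
import itertools

import numpy as np


def design_row(f):
    """Monomials of total degree <= 2 in the extended features (cf. Equation 6).

    Extended features: (f_1, ..., f_N, log f_1, ..., log f_N); every f_k must be > 0.
    """
    f = np.asarray(f, dtype=float)
    x = np.concatenate([f, np.log(f)])
    terms = [1.0]                                    # degree 0: intercept
    terms.extend(x)                                  # degree 1
    for j, k in itertools.combinations_with_replacement(range(len(x)), 2):
        terms.append(x[j] * x[k])                    # degree 2: squares and interactions
    return np.array(terms)


def posterior_mean(X, y, noise_var=1.0, prior_var=1.0):
    """Posterior mean of linear-model weights under a N(0, prior_var * I) prior.

    Simplification: the noise variance is fixed rather than given the Gamma prior of Eq. 7.
    """
    precision = X.T @ X / noise_var + np.eye(X.shape[1]) / prior_var
    return np.linalg.solve(precision, X.T @ y / noise_var)


class RuntimeModel:
    """Bayesian linear regression of one algorithm's runtime on its features."""

    def fit(self, feature_rows, runtimes):
        X = np.vstack([design_row(f) for f in feature_rows])
        self.alpha_mean = posterior_mean(X, np.asarray(runtimes, dtype=float))
        return self

    def expected_runtime(self, f):
        return float(design_row(f) @ self.alpha_mean)


def select_algorithm(models, f, c=1.0, expected_score=None):
    """argmax_a of E[S] - c * E[T], using posterior-mean predictions (cf. Eqs. 9 and 12)."""
    if expected_score is None:
        expected_score = {a: 1.0 for a in models}    # assumption: every algorithm is always correct
    voc = {a: expected_score[a] - c * m.expected_runtime(f) for a, m in models.items()}
    return max(voc, key=voc.get)


# Hypothetical synthetic runtimes (arbitrary units), not data from the paper.
# Features: (list length n, number of inversions + 1); the +1 keeps the log finite.
train_f = [(n, d + 1) for n in (10, 20, 40, 80) for d in (0, 5, 20, 60)]
insertion_t = [0.005 * n + 0.02 * (f2 - 1) for n, f2 in train_f]   # grows with disorder
merge_t = [0.003 * n * np.log(n) for n, _ in train_f]              # ignores disorder
models = {
    "insertion": RuntimeModel().fit(train_f, insertion_t),
    "merge": RuntimeModel().fit(train_f, merge_t),
}
print(select_algorithm(models, (80, 1)), select_algorithm(models, (80, 61)))
```

Because the time cost is linear, only the posterior-mean runtime enters the comparison, which is why no sampling or numerical integration is needed for the runtime term.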
When the score is continuous, $\mu_S(\mathbf{f}(i); a, \gamma)$ is linear in $\gamma$ and $\mu_T(\mathbf{f}(i); a, \alpha)$ is linear in $\alpha$. Thus, integrating out $\alpha$ and $\gamma$ with respect to the posterior yields

$$\mathrm{VOC}(a; i) = \mu_S\big(\mathbf{f}(i); a, \mu_{\gamma \mid i, s}\big) - c \cdot \mu_T\big(\mathbf{f}(i); a, \mu_{\alpha \mid i, t}\big), \tag{11}$$

where $\mu_{\alpha \mid i, t}$ and $\mu_{\gamma \mid i, s}$ are the posterior means of $\alpha$ and $\gamma$, respectively. This implies the following simple solution to the algorithm selection problem:

$$a(i; c) = \arg\max_{a \in A} \Big( \mu_S\big(\mathbf{f}(i); a, \mu_{\gamma \mid i_{\mathrm{train}}, s_{\mathrm{train}}}\big) - c \cdot \mu_T\big(\mathbf{f}(i); a, \mu_{\alpha \mid i_{\mathrm{train}}, t_{\mathrm{train}}}\big) \Big). \tag{12}$$

For binary scores, the runtime component is predicted in exactly the same way, and a variational approximation to the posterior predictive density can be used for the score component [9].

To discover the best model of an algorithm’s runtime and score, our method performs feature selection by Bayesian model choice [12]. We consider all possible combinations of the regressors defined above. To efficiently find the optimal set of features in this exponentially large model space, we exploit the fact that all models are nested within the full model. This allows us to efficiently compute Bayes factors using Savage-Dickey ratios [13].

3 Performance evaluation against methods for selecting sorting algorithms

Our goal was to evaluate rational metareasoning not only against existing methods but also against human performance. To facilitate the comparison with how people choose between cognitive strategies, we chose to evaluate our method in the domain of sorting. Algorithm selection is relevant to sorting because there are many sorting algorithms with very different characteristics. In sorting, the input $i$ is the list to be sorted. Conventional sorting algorithms are guaranteed to return the elements in the correct order.
Thus, the critical difference between them is in their runtimes, and runtime depends primarily on the number of elements to be sorted and their presortedness. The number of elements determines the relative importance of the coefficients of low-order terms (e.g., constant and linear) versus high-order terms (e.g., $n^2$ or $n \cdot \log(n)$), whose weights differ between algorithms. Presortedness is important because it determines the relative performance of algorithms that exploit pre-existing order, e.g., insertion sort, versus algorithms that do not, e.g., quicksort.

According to recent reviews [14, 15], there are two key methods for sorting algorithm selection: Guo’s decision-tree method [16] and Lagoudakis et al.’s recursive algorithm selection method [17]. We thus evaluated the performance of rational metareasoning against these two approaches.

3.1 Evaluation against Guo’s method

Guo’s method learns a decision tree, i.e. a sequence of logical rules that are applied to the list’s features to determine the sorting algorithm [16]. Guo’s method and our method represent inputs by the same pair of features: the length of the list to be sorted ($f_1 = |i|$) and a measure of the list’s lack of presortedness ($f_2$). The second feature efficiently estimates the number of inversions from the number of runs in the list: $f_2 = \frac{f_1}{2} \cdot \mathrm{RUNS}(i)$, where $\mathrm{RUNS}(i) = |\{m : i_m > i_{m+1}\}|$. If $f_2 = 0$, the list is already sorted, and the higher $f_2$, the less sorted the list is.

Our method learns the conditional distributions of runtime and score given these two features and uses them to approximate the conditional distributions given the input (Equations 3–4). We verified that our method can learn how runtime depends on list length and presortedness (data not shown).

Next, we subjected our method to Guo’s performance evaluation [16]. We thus evaluated rational metareasoning on the problem of choosing between insertion sort, shell sort, heapsort, merge sort, and quicksort. We matched our training sets to Guo’s Dsort4 in the number of lists (i.e. 1875) and in the distributions of length and presortedness. We provided the runtimes of all algorithms rather than the index of the fastest algorithm; otherwise, the training sets were equivalent. For each of Guo’s four test sets, we trained and evaluated rational metareasoning on 100 randomly generated pairs of training and test sets. The first test set mimicked Guo’s Dsort5 problem set [16]. It comprised 1000 permutations of the numbers 1 to 1000. Of the 1000 lists, 950 were random permutations and 50 were nearly sorted. The lists contained between 1 and 520 runs (mean = 260, SD = 110). The second test set comprised 1000 nearly-sorted lists of length 1000. The third test set comprised 100 lists in reverse order, and the fourth test set comprised 1000 random permutations. Nearly-sorted lists were created by swapping 10 random pairs of the numbers 1–1000; both elements of each pair were sampled uniformly at random from the numbers 1–1000 with the constraint that they be different.

Table 1 compares how frequently rational metareasoning chose the best algorithm on each test set with the results reported by Guo [16]. We estimated our method’s expected performance $\theta$ by its average performance and 95% credible intervals.

Test set               | Rational metareasoning | 95% CI          | Guo’s method | p-value
Dsort5                 | 99.78%                 | [99.7%, 99.9%]  | 98.5%        | $p < 10^{-15}$
Nearly-sorted lists    | 99.99%                 | [99.3%, 100%]   | 99.4%        | $p < 10^{-15}$
Inversely sorted lists | 83.37%                 | [82.7%, 84.1%]  | 77.0%        | $p < 10^{-15}$
Random permutations    | 99.99%                 | [99.2%, 100%]   | 85.3%        | $p < 10^{-15}$

Table 1: Evaluation of rational metareasoning against Guo’s method. Performance was measured by the percentage of problems for which the method chose the fastest algorithm.
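The two list features and the construction of the nearly-sorted test lists described in this subsection can be sketched as follows. The function names are illustrative, and the sketch is not Guo’s or the authors’ code; it simply transcribes the formulas for $\mathrm{RUNS}(i)$ and $f_2$ and the "swap 10 random pairs of distinct values" procedure.

```python
import random


def runs(lst):
    """RUNS(i) = |{m : i_m > i_{m+1}}|: the number of descents between neighbours."""
    return sum(1 for a, b in zip(lst, lst[1:]) if a > b)


def features(lst):
    """(f1, f2) with f1 = |i| and f2 = (f1 / 2) * RUNS(i); f2 == 0 iff the list is sorted."""
    f1 = len(lst)
    return f1, f1 / 2 * runs(lst)


def nearly_sorted(n=1000, swaps=10, rng=random):
    """The numbers 1..n in order, with `swaps` random pairs of distinct values exchanged."""
    lst = list(range(1, n + 1))
    for _ in range(swaps):
        x, y = rng.sample(range(1, n + 1), 2)   # two different values, uniformly at random
        i, j = lst.index(x), lst.index(y)
        lst[i], lst[j] = y, x
    return lst


lst = nearly_sorted(rng=random.Random(0))
print(features(lst))
```

For a fully reversed list such as `[4, 3, 2, 1]`, $f_2 = 2 \cdot 3 = 6$, which equals the true number of inversions; for other lists $f_2$ is only an estimate.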
Credible intervals (CI) were computed by Bayesian inference with a uniform prior, and they comprise the values with the highest posterior density whose total probability is 0.95. In brief, rational metareasoning significantly outperformed Guo’s decision-tree method on all four test sets. The performance gain was highest on random permutations: rational metareasoning chose the best algorithm 99.99% of the time, rather than only 85.3% of the time.

3.2 Evaluation against Lagoudakis et al.’s method

Depending on a list’s length, Lagoudakis et al.’s method chooses either insertion sort, merge sort, or quicksort [17]. If merge sort or quicksort is chosen, the same decision rule is applied to each of the two sublists it creates. The selection mapping from lengths to algorithms is determined by minimizing the expected runtime [17]. We evaluated rational metareasoning against Lagoudakis et al.’s recursive method on 21 versions of Guo’s Dsort5 test set [16] with 0%, 5%, …, 100% nearly-sorted lists. To account for differences in implementation and architecture, we recomputed Lagoudakis et al.’s solution for the runtimes measured on our system. Rational metareasoning chose between the five algorithms used by Guo and was trained on Guo’s Dsort4 [16]. We compare the performance of the two methods in terms of their runtime, because the recursive method makes many choices per list (one per sublist), none of which corresponds directly to our method’s single algorithm choice.

On average, our implementation of Lagoudakis et al.’s method took 102.5 ± 0.83 seconds to sort the 21 test sets, whereas rational metareasoning finished in only 27.96 ± 0.02 seconds. Rational metareasoning was thus significantly faster ($p < 10^{-15}$). Next, we restricted the sorting algorithms available to rational metareasoning to those used by Lagoudakis et al.’s method.
The runtime increased to 47.90 ± 0.02 seconds, but rational metareasoning remained significantly faster than Lagoudakis et al.’s method ($p < 10^{-15}$). These comparisons highlight two advantages of our method: i) it can exploit presortedness, and ii) it can be used with arbitrarily many algorithms of any kind.

3.3 Discussion

Rational metareasoning outperformed two state-of-the-art methods for sorting algorithm selection. Our results in the domain of sorting should be interpreted as a lower bound on the performance gain that rational metareasoning can achieve on harder problems such as combinatorial optimization, planning, and search, where the runtimes of different algorithms are more variable [14]. Future research might explore the application of our theory to these harder problems, take into account heavy-tailed runtime distributions, use better representations, and incorporate active learning.

Our results show that rational metareasoning is not just theoretically sound but also competitive. We can therefore use it as a normative model of human strategy selection learning.

4 Rational metareasoning as a model of human strategy selection

Most previous theories of how humans learn when to use which cognitive strategy assume basic model-free reinforcement learning [18–20]. The REinforcement Learning among Cognitive Strategies model (RELACS [19]) and the Strategy Selection Learning model (SSL [20]) each postulate that people learn just one number for each cognitive strategy: the expected reward of applying it to an unknown problem and the sum of past rewards, respectively. These theories therefore predict that people cannot learn to instantly adapt their strategy to the characteristics of a new problem.
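This prediction follows from the representation alone. As an illustration, the following deliberately simplified, hypothetical learner keeps one running value per strategy and updates it with a delta rule; the learning rate of 0.1 and the rewards in the usage lines are assumptions. It is not the actual RELACS or SSL update rule. It only shows that a value table without a feature dimension must make the same choice for every new problem.

```python
class OneValuePerStrategy:
    """Hypothetical model-free learner with one scalar value per strategy (not RELACS/SSL verbatim)."""

    def __init__(self, strategies, learning_rate=0.1):
        self.values = {s: 0.0 for s in strategies}
        self.learning_rate = learning_rate

    def update(self, strategy, reward):
        # Delta rule: move the strategy's single value toward the observed reward.
        v = self.values[strategy]
        self.values[strategy] = v + self.learning_rate * (reward - v)

    def choose(self, features):
        # `features` is accepted but ignored: the value table has no feature dimension.
        return max(self.values, key=self.values.get)


learner = OneValuePerStrategy(["cocktail sort", "merge sort"])
for _ in range(20):   # hypothetical rewards: cocktail sort does better on most training lists
    learner.update("cocktail sort", 1.0)
    learner.update("merge sort", 0.5)
print(learner.choose({"length": 8}), learner.choose({"length": 32}))
```

Whatever the new list looks like, the learner returns the strategy with the highest stored value, so it cannot switch to merge sort for a long, nearly inversely sorted list it has never seen.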
By contrast, the Strategy Choice And Discovery Simulation (SCADS [18]) postulates that people separately learn about a strategy’s performance on particular types of problems and its overall performance, and integrate the resulting predictions by multiplication.

Our theory makes critically different assumptions about the mental representation of problems and of each strategy’s performance than the three previous psychological theories. First, rational metareasoning assumes that problems are represented by multiple features that can be continuous or binary. Second, rational metareasoning postulates that people maintain separate representations of a strategy’s execution time and the quality of its solution. Third, rational metareasoning can discover non-additive interactions between features. Furthermore, rational metareasoning postulates that learning, prediction, and strategy choice are more rational than previously modeled. Since our model formalizes substantially different assumptions about mental representation and information processing, determining which theory best explains human behavior will teach us more about how the human brain represents and solves strategy selection problems.

To understand when and how the predictions of our theory differ from those of the three existing psychological theories, we performed computer simulations. To apply the three reinforcement-learning-based psychological theories to the selection among sorting strategies, we had to define the reward $r$. We considered three notions of reward: i) correctness ($r \in \{-0.1, +0.1\}$; these numbers are based on the SCADS model [18]), ii) correctness minus time cost ($r - c \cdot t$, where $t$ is the execution time and $c$ is a constant), and iii) reward rate ($r / t$). We evaluated all nine combinations of the three theories with the three notions of reward.
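The three notions of reward can be written as interchangeable functions, and crossing them with the three theories yields the nine simulated models. The shared `(correct, t, c)` signature is an assumption made here so that any learner can call any reward; the sketch does not implement the learners themselves.

```python
import itertools


def correctness(correct, t, c):
    """i) r in {-0.1, +0.1} (values from SCADS); t and c are ignored."""
    return 0.1 if correct else -0.1


def correctness_minus_time_cost(correct, t, c):
    """ii) r - c * t, where t is the execution time and c a constant."""
    return correctness(correct, t, c) - c * t


def reward_rate(correct, t, c):
    """iii) r / t; c is ignored."""
    return correctness(correct, t, c) / t


REWARDS = [correctness, correctness_minus_time_cost, reward_rate]
THEORIES = ["RELACS", "SSL", "SCADS"]
MODELS = list(itertools.product(THEORIES, REWARDS))   # the nine theory/reward combinations
print(len(MODELS))
```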
We provided the SCADS model with reasonable problem types: short lists (length ≤ 16), long lists (length ≥ 32), nearly-sorted lists (less than 10% inversions), and random lists (more than 25% inversions). We evaluated the performance of these nine models against rational metareasoning in the selection between seven sorting algorithms: insertion sort, selection sort, bubble sort, shell sort, heapsort, merge sort, and quicksort. To do so, we trained each model on 1000 randomly generated lists, fixed the learned parameters, and evaluated how many lists each model could sort per second. Training and test lists were generated by sampling: list lengths were sampled from a $\mathrm{Uniform}(\{2, \dots, u\})$ distribution, where $u$ was 10, 100, 1000, or 10000 with equal probability, and the fraction of inversions between subsequent numbers was drawn from a $\mathrm{Beta}(2, 1)$ distribution. We performed 100 train-and-test episodes. Sorting time was measured as selection time plus execution time, and we estimated the expected sorting speed of each model by averaging. While rational metareasoning achieved 88.1 ± 0.7% of the highest possible sorting speed, none of the nine alternative models achieved more than 30% of the maximal sorting speed. Thus, the time invested in metareasoning was more than offset by the time saved with the chosen strategy.

5 How do people choose cognitive strategies?

Given that rational metareasoning outperformed the nine psychological models in strategy selection, we asked whether the mind is more adaptive than those theories assume. To answer this question, we designed an experiment for which rational metareasoning predicts distinctly different choices.

5.1 Pilot studies and simulations

To design an experiment that can distinguish between our competing hypotheses, we ran two pilot studies measuring the execution time characteristics of cocktail sort (CS) and merge sort (MS), respectively.
For each pilot study, we recruited 100 participants on Amazon Mechanical Turk. In the first pilot study, the interface shown in Figure 1(a) required participants to follow the step-by-step instructions of the cocktail sort algorithm. In the second pilot study, participants had to execute merge sort with the computer interface shown in Figure 1(b). We measured their sorting times for lists of varying length and presortedness. Based on these data, we estimated how long comparisons and moves take for each strategy. This led to the following sorting time models:

$$T_{CS} = \hat{t}_{CS} + \varepsilon_{CS}, \quad \hat{t}_{CS} = 19.59 + 0.19 \cdot n_{\mathrm{comparisons}} + 0.31 \cdot n_{\mathrm{moves}}, \quad \varepsilon_{CS} \sim \mathcal{N}(0, 0.21 \cdot \hat{t}_{CS}^2) \tag{13}$$

$$T_{MS} = \hat{t}_{MS} + \varepsilon_{MS}, \quad \hat{t}_{MS} = 13.98 + 1.10 \cdot n_{\mathrm{comparisons}} + 0.52 \cdot n_{\mathrm{moves}}, \quad \varepsilon_{MS} \sim \mathcal{N}(0, 0.15 \cdot \hat{t}_{MS}^2) \tag{14}$$

We then used these sorting time models to simulate $10^4$ candidate experiments according to each of the 10 models. We found several potential experiments for which rational metareasoning makes qualitatively different predictions from those of all the alternative psychological theories, and we chose the one that achieved the best compromise between discriminability and duration.

According to the two runtime models (Equations 13–14) and the number of comparisons and moves each algorithm would perform, people should choose merge sort for long and nearly inversely sorted lists, and cocktail sort for lists that are either nearly sorted or short. For the chosen experimental design, the three existing psychological theories predicted that people would fail to learn this contingency; see Figure 2. By contrast, rational metareasoning predicted that adaptive strategy selection would be evident from the choices of more than 70% of our participants. Therefore, the chosen experimental design was well suited to discriminate rational metareasoning from previous theories.
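Equations 13–14 can be turned into predicted sorting times once comparisons and moves are counted. The text does not specify what counts as a comparison or a move in the experiment’s interfaces, so this sketch assumes that one swap (cocktail sort) or one element copied into a merged list (merge sort) counts as one move, and it reads the second argument of $\mathcal{N}$ as a variance. The function names are illustrative; the time units are those of the pilot data.

```python
import math
import random


def cocktail_sort_counts(lst):
    """Cocktail (bidirectional bubble) sort; returns (n_comparisons, n_moves).
    Assumption: one swap counts as one move."""
    a = list(lst)
    comparisons = moves = 0
    lo, hi = 0, len(a) - 1
    swapped = True
    while swapped and lo < hi:
        swapped = False
        for k in range(lo, hi):                   # forward pass
            comparisons += 1
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                moves += 1
                swapped = True
        hi -= 1
        if not swapped:
            break
        swapped = False
        for k in range(hi, lo, -1):               # backward pass
            comparisons += 1
            if a[k - 1] > a[k]:
                a[k - 1], a[k] = a[k], a[k - 1]
                moves += 1
                swapped = True
        lo += 1
    return comparisons, moves


def merge_sort_counts(lst):
    """Top-down merge sort; returns (n_comparisons, n_moves).
    Assumption: copying one element into a merged list counts as one move."""
    def sort(a):
        if len(a) <= 1:
            return a, 0, 0
        mid = len(a) // 2
        left, cl, ml = sort(a[:mid])
        right, cr, mr = sort(a[mid:])
        merged, comps, i, j = [], 0, 0, 0
        while i < len(left) and j < len(right):
            comps += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged += left[i:] + right[j:]
        return merged, cl + cr + comps, ml + mr + len(merged)

    _, comparisons, moves = sort(list(lst))
    return comparisons, moves


def predicted_time(strategy, lst):
    """Mean sorting time t-hat from Equation 13 ("CS") or Equation 14 ("MS")."""
    if strategy == "CS":
        n_c, n_m = cocktail_sort_counts(lst)
        return 19.59 + 0.19 * n_c + 0.31 * n_m
    n_c, n_m = merge_sort_counts(lst)
    return 13.98 + 1.10 * n_c + 0.52 * n_m


def sample_time(strategy, lst, rng=random):
    """Draw T = t-hat + eps with Var(eps) = 0.21 * t-hat^2 (CS) or 0.15 * t-hat^2 (MS)."""
    t_hat = predicted_time(strategy, lst)
    var_factor = 0.21 if strategy == "CS" else 0.15
    return rng.gauss(t_hat, math.sqrt(var_factor) * t_hat)


reversed_32 = list(range(32, 0, -1))
print(predicted_time("CS", reversed_32), predicted_time("MS", reversed_32))
```

Under these counting assumptions, cocktail sort is predicted to be faster on a short sorted list (20.16 vs. 22.54 for `[1, 2, 3, 4]`), whereas merge sort is predicted to be faster on a long reversed list, which is the contingency the experiment was designed around.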
The next section describes the chosen experiment in detail.

5.2 Methods

The experiment was run online¹ with 100 participants recruited on Amazon Mechanical Turk, who were paid 1.25 USD. The experiment comprised three stages: training, choice, and execution. In the training stage, each participant was taught to sort lists of numbers by executing cocktail sort and merge sort. On each of the 11 training trials, the participant was instructed which strategy to use. The interface enforced that he or she correctly performed each step of that strategy. The interfaces were the same as in the pilot studies (see Figure 1). For both strategies, the chosen lists comprised nearly inversely sorted lists of length 4, 8, and 16 and nearly-sorted lists of length 16 and 32. For the cocktail sort strategy, each participant was also trained on a nearly inversely sorted list with 32 elements. Participants first practiced cocktail sort for five trials and then practiced merge sort. The last two trials contrasted the two strategies on long, nearly-sorted lists of identical length. Nearly-sorted lists were created by inserting a randomly selected element at a different random location of an ascending list. Nearly inversely sorted lists were created by applying the same procedure to a descending list. In the choice phase, participants were shown 18 test lists. For each list, they were asked to choose which sorting strategy they would use if they had to sort it. Participants were told that they would have to sort one randomly selected list with the strategy they chose for it. The test lists comprised six instances of each of three kinds of lists: long and nearly inversely sorted, long and nearly sorted, and short and nearly sorted. The order of these lists was randomized across participants.
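The list-construction procedure above can be sketched as follows. The text does not state the lengths of the long and short test lists, so `long_n` and `short_n` are hypothetical parameters, and all helper names are illustrative rather than taken from the experiment’s code.

```python
import random


def displace_one(base, rng=random):
    """Move one randomly selected element of `base` to a different random position."""
    lst = list(base)
    src = rng.randrange(len(lst))
    x = lst.pop(src)
    dst = rng.choice([k for k in range(len(lst) + 1) if k != src])
    lst.insert(dst, x)            # x ends up at index dst, which differs from src
    return lst


def nearly_sorted(n, rng=random):
    """The procedure applied to an ascending list 1..n (requires n >= 2)."""
    return displace_one(range(1, n + 1), rng)


def nearly_inversely_sorted(n, rng=random):
    """The same procedure applied to a descending list n..1."""
    return displace_one(range(n, 0, -1), rng)


def choice_phase_lists(long_n, short_n, rng=random):
    """18 test lists, six of each kind, shuffled for one participant.

    long_n and short_n are hypothetical: the text does not give the lengths.
    """
    lists = (
        [("long, nearly inversely sorted", nearly_inversely_sorted(long_n, rng)) for _ in range(6)]
        + [("long, nearly sorted", nearly_sorted(long_n, rng)) for _ in range(6)]
        + [("short, nearly sorted", nearly_sorted(short_n, rng)) for _ in range(6)]
    )
    rng.shuffle(lists)
    return lists


for kind, lst in choice_phase_lists(32, 8, random.Random(0))[:3]:
    print(kind, lst)
```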
In the execution phase, one of the 12 short lists was randomly selected, and the participant had to sort it using the strategy he or she had previously chosen for that list. To derive theoretical predictions, we gave each model the same information as our participants.

¹ http://cocosci.berkeley.edu/mturk/falk/StrategyChoice/consent.html

Figure 1: Interfaces used to train participants to perform (a) cocktail sort and (b) merge sort in the behavioral experiment.

5.3 Results

Our participants took 24.7 ± 6.7 minutes to complete the experiment (mean ± standard deviation). In the training phase, the median number of errors per list was 2.45, and 95% of our participants made between 0.73 and 12.55 errors per list. In the choice phase, 83% of our participants chose merge sort more often when it was the superior strategy than when it was not. We can thus be 95% confident that the population frequency of this adaptive strategy choice pattern lies between 74.9% and 89.4%; see Figure 2(b). This adaptive choice pattern was significantly more frequent than would be expected if strategy choice were independent of the lists’ features ($p < 10^{-11}$). This is consistent with our model’s predictions but inconsistent with the predictions of the RELACS, SSL, and SCADS models. Only rational metareasoning correctly predicted that the frequency of the adaptive strategy choice pattern would be above chance ($p < 10^{-5}$ for our model and $p \geq 0.46$ for all other models). Figure 2(b) compares the proportion of participants exhibiting this pattern with the models’ predictions. The non-overlapping credible intervals suggest that we can be 95% confident that the choices of people and of rational metareasoning are more adaptive than those predicted by the three previous theories (all $p < 0.001$).
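The 95% credible interval for the proportion of adaptive choosers can be approximated with the procedure described in Section 3.1: a uniform prior, which gives a $\mathrm{Beta}(s + 1, n - s + 1)$ posterior for $s$ successes in $n$ trials, and the highest-posterior-density region with total probability 0.95. The sketch below finds that region on a grid in pure Python; the grid resolution is an arbitrary choice, and the function name is illustrative. Its result can be compared with the interval reported above.

```python
import math


def beta_hpd(successes, trials, mass=0.95, grid=20000):
    """Highest-posterior-density interval for a proportion under a uniform prior.

    The posterior is Beta(successes + 1, failures + 1). Grid cells are added in
    order of decreasing density until they hold `mass`; because the Beta density
    is unimodal here, the selected cells form a single interval.
    """
    a, b = successes + 1, trials - successes + 1
    xs = [(k + 0.5) / grid for k in range(grid)]                 # cell midpoints in (0, 1)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    dens = [math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm) for x in xs]
    order = sorted(range(grid), key=lambda k: dens[k], reverse=True)
    total, chosen = 0.0, []
    for k in order:
        chosen.append(k)
        total += dens[k] / grid                                  # probability mass of the cell
        if total >= mass:
            break
    return xs[min(chosen)], xs[max(chosen)]


print(beta_hpd(83, 100))   # 83 of 100 participants showed the adaptive pattern
```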
We can also be 95% confident, however, that at least in our experiment people choose their strategy even more adaptively than rational metareasoning does ($p \leq 0.02$).

On average, our participants chose merge sort for 4.9 of the 6 long and nearly inversely sorted lists (81.67% of the time, 95% credible interval: [77.8%, 93.0%]), but for only 1.79 of the 6 nearly-sorted long lists (29.83% of the time, 95% credible interval: [12.9%, 32.4%]) and for only 1.62 of the 6 nearly-sorted short lists (27.00% of the time, 95% credible interval: [16.7%, 40.4%]); see Figure 2(a). Thus, when merge sort was superior, our participants chose it significantly more often than cocktail sort ($p < 10^{-10}$). But when merge sort was inferior, they chose cocktail sort more often than merge sort ($p < 10^{-7}$).

5.4 Discussion

We evaluated our rational metareasoning model of human strategy selection against nine models instantiating three psychological theories. While those nine models completely failed to predict our participants’ adaptive strategy choices, the predictions of rational metareasoning were qualitatively correct, and its choices came close to human performance. The RELACS and SSL models failed because they do not represent problem features and do not learn how those features affect each strategy’s performance. The model-free learning assumed by SSL and RELACS was maladaptive because cocktail sort was faster for most training lists but substantially slower for the long, nearly inversely sorted test lists. The SCADS model failed mainly because its suboptimal learning mechanism was fooled by the slight imbalance between the training examples for cocktail sort and merge sort, but also because it can neither extrapolate nor capture the non-additive interaction between length and presortedness. Instead, human-like adaptive strategy selection can be achieved by learning to predict each strategy’s execution time and accuracy given features of the problem. To further elucidate the human mind’s strategy selection learning algorithm, future research will evaluate our theory against an instance-based learning model [21].

Figure 2: Pattern of strategy choices: (a) Relative frequency with which humans and models chose merge sort by list type. (b) Percentage of participants who chose merge sort more often when it was superior than when it was not. Error bars indicate 95% credible intervals.

Our participants outperformed the RELACS, SSL, and SCADS models, as well as rational metareasoning, in our strategy selection task. This suggests that neither psychology nor AI can yet fully account for people’s adaptive strategy selection. People’s superior performance could be enabled by a more powerful representation of the lists, perhaps one that includes reverse-sortedness, or by the ability to choose strategies based on mental simulations of their execution on the presented list. These are just two of many possibilities, and more experiments are needed to unravel people’s superior performance. In contrast to the sorting strategies in our experiment, most cognitive strategies operate on internal representations. However, there are two reasons to expect our conclusions to transfer. First, the metacognitive principles of strategy selection might be domain general.
Second, the strategies people use to order things mentally might be based on their sorting strategies, in the same way that mental arithmetic is based on calculating with fingers or on paper.\n\n6 Conclusions\n\nSince neither psychology nor AI can yet fully account for people\u2019s adaptive strategy selection, further research into how people learn to select cognitive strategies may yield not only a better understanding of human intelligence but also better solutions to the algorithm selection problem in computer science and artificial intelligence. Our results suggest that reasoning about which strategy to use can be resource-rational [22]: it can save more time than it takes and thereby contribute to our adaptive intelligence. Since our framework is very general, it can be applied to strategy selection in all areas of human cognition, including judgment and decision-making [1, 3], as well as to the discovery of novel strategies [2]. Future research will investigate human strategy selection learning in more ecological domains, such as mental arithmetic, decision-making, and problem solving, where people have to trade off speed against accuracy. In conclusion, rational metareasoning is a promising theoretical framework for reverse-engineering people\u2019s capacity for adaptive strategy selection.\nAcknowledgments. This work was supported by ONR MURI N00014-13-1-0341.\n\nReferences\n[1] G. Gigerenzer and R. Selten, Bounded rationality: The adaptive toolbox. MIT Press, 2002.\n[2] R. S. Siegler, \u201cStrategic development,\u201d Trends in Cognitive Sciences, vol. 3, pp. 430\u2013435, Nov. 1999.\n[3] J. W. Payne, J. R. Bettman, and E. J. Johnson, \u201cAdaptive strategy selection in decision making,\u201d Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 14, no. 3, p. 534, 1988.\n[4] J. N. Marewski and D.
Link, \u201cStrategy selection: An introduction to the modeling challenge,\u201d Wiley Interdisciplinary Reviews: Cognitive Science, vol. 5, no. 1, pp. 39\u201359, 2014.\n[5] J. R. Rice, \u201cThe algorithm selection problem,\u201d Advances in Computers, vol. 15, pp. 65\u2013118, 1976.\n[6] S. Russell and E. Wefald, \u201cPrinciples of metareasoning,\u201d Artificial Intelligence, vol. 49, no. 1\u20133, pp. 361\u2013395, 1991.\n[7] N. Hay, S. Russell, D. Tolpin, and S. Shimony, \u201cSelecting computations: Theory and applications,\u201d in Uncertainty in Artificial Intelligence: Proceedings of the Twenty-Eighth Conference (N. de Freitas and K. Murphy, eds.), Corvallis, Oregon: AUAI Press, 2012.\n[8] D. Harada and S. Russell, \u201cMeta-level reinforcement learning,\u201d in NIPS\u201998 Workshop on Abstraction and Hierarchy in Reinforcement Learning, 1998.\n[9] T. Jaakkola and M. Jordan, \u201cA variational approach to Bayesian logistic regression models and their extensions,\u201d in Sixth International Workshop on Artificial Intelligence and Statistics, 1997.\n[10] D. V. Lindley and A. F. M. Smith, \u201cBayes estimates for the linear model,\u201d Journal of the Royal Statistical Society. Series B (Methodological), vol. 34, no. 1, 1972.\n[11] S. Kunz, \u201cThe Bayesian linear model with unknown variance,\u201d tech. rep., Seminar for Statistics, ETH Zurich, Switzerland, 2009.\n[12] R. E. Kass and A. E. Raftery, \u201cBayes factors,\u201d Journal of the American Statistical Association, vol. 90, pp. 773\u2013795, June 1995.\n[13] W. D. Penny and G. R. Ridgway, \u201cEfficient posterior probability mapping using Savage-Dickey ratios,\u201d PLoS ONE, vol. 8, no. 3, p. e59655, 2013.\n[14] L. Kotthoff, \u201cAlgorithm selection for combinatorial search problems: A survey,\u201d AI Magazine, 2014.\n[15] K. A.
Smith-Miles, \u201cCross-disciplinary perspectives on meta-learning for algorithm selection,\u201d ACM Computing Surveys, vol. 41, Jan. 2009.\n[16] H. Guo, Algorithm selection for sorting and probabilistic inference: a machine learning-based approach. PhD thesis, Kansas State University, 2003.\n[17] M. G. Lagoudakis, M. L. Littman, and R. Parr, \u201cSelecting the right algorithm,\u201d in Proceedings of the 2001 AAAI Fall Symposium Series: Using Uncertainty within Computation, Cape Cod, MA, 2001.\n[18] J. Shrager and R. S. Siegler, \u201cSCADS: A model of children\u2019s strategy choices and strategy discoveries,\u201d Psychological Science, vol. 9, pp. 405\u2013410, Sept. 1998.\n[19] I. Erev and G. Barron, \u201cOn adaptation, maximization, and reinforcement learning among cognitive strategies,\u201d Psychological Review, vol. 112, pp. 912\u2013931, Oct. 2005.\n[20] J. Rieskamp and P. E. Otto, \u201cSSL: A theory of how people learn to select strategies,\u201d Journal of Experimental Psychology: General, vol. 135, pp. 207\u2013236, May 2006.\n[21] C. Gonzalez and V. Dutt, \u201cInstance-based learning: Integrating sampling and repeated decisions from experience,\u201d Psychological Review, vol. 118, no. 4, pp. 523\u2013551, 2011.\n[22] T. L. Griffiths, F. Lieder, and N. D.
Goodman, \u201cRational use of cognitive resources: Levels of analysis between the computational and the algorithmic,\u201d Topics in Cognitive Science, in press.\n", "award": [], "sourceid": 1485, "authors": [{"given_name": "Falk", "family_name": "Lieder", "institution": "UC Berkeley"}, {"given_name": "Dillon", "family_name": "Plunkett", "institution": "UC Berkeley"}, {"given_name": "Jessica", "family_name": "Hamrick", "institution": "University of California, Berkeley"}, {"given_name": "Stuart", "family_name": "Russell", "institution": "UC Berkeley"}, {"given_name": "Nicholas", "family_name": "Hay", "institution": "UC Berkeley"}, {"given_name": "Tom", "family_name": "Griffiths", "institution": "UC Berkeley"}]}