{"title": "Human Decision-Making under Limited Time", "book": "Advances in Neural Information Processing Systems", "page_first": 100, "page_last": 108, "abstract": "Abstract Subjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints---i.e. decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statistical mechanics and information theory. We systematically tested human subjects in their ability to solve combinatorial puzzles under different time limitations. We found that our bounded-rational model accounts well for the data. The decomposition of the fitted model parameters into the subjects' expected utility function and resource parameter provides interesting insight into the subjects' information capacity limits. Our results confirm that humans gradually fall back on their learned prior choice patterns when confronted with increasing resource limitations.", "full_text": "Human Decision-Making under Limited Time\n\nPedro A. Ortega\n\nDepartment of Psychology\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19104\n\nope@seas.upenn.edu\n\nAlan A. Stocker\n\nDepartment of Psychology\nUniversity of Pennsylvania\n\nPhiladelphia, PA 19104\n\nastocker@sas.upenn.edu\n\nAbstract\n\nSubjective expected utility theory assumes that decision-makers possess unlimited computational resources to reason about their choices; however, virtually all decisions in everyday life are made under resource constraints\u2014i.e. decision-makers are bounded in their rationality. Here we experimentally tested the predictions made by a formalization of bounded rationality based on ideas from statistical mechanics and information theory.
We systematically tested human subjects in their ability to solve combinatorial puzzles under different time limitations. We found that our bounded-rational model accounts well for the data. The decomposition of the fitted model parameters into the subjects\u2019 expected utility function and resource parameter provides interesting insight into the subjects\u2019 information capacity limits. Our results confirm that humans gradually fall back on their learned prior choice patterns when confronted with increasing resource limitations.\n\n1 Introduction\n\nHuman decision-making is not perfectly rational. Most of our choices are constrained by many factors such as perceptual ambiguity, time, lack of knowledge, or computational effort [6]. Classical theories of rational choice do not apply in such cases because they ignore information-processing resources, assuming that decision-makers always pick the optimal choice [10]. However, it is well known that human choice patterns deviate qualitatively from the perfectly rational ideal with increasing resource limitations.\n\nIt has been suggested that such limitations in decision-making can be formalized using ideas from statistical mechanics [9] and information theory [16]. These frameworks propose that decision-makers act as if their choice probabilities were an optimal compromise between maximizing the expected utility and minimizing the KL-divergence from a set of prior choice probabilities, where the trade-off is determined by the amount of available resources. This optimization scheme reduces the decision-making problem to the inference of the optimal choice from a stimulus, where the likelihood function results from a combination of the decision-maker\u2019s subjective preferences and the resource limitations.\n\nThe aim of this paper is to systematically validate the model of bounded-rational decision-making on human choice data.
We conducted an experiment in which subjects had to solve a sequence of combinatorial puzzles under time pressure. By manipulating the allotted time for solving each puzzle, we were able to record choice data under different resource conditions. We then fit the bounded-rational choice model to the dataset, obtaining a decomposition of the choice probabilities in terms of a resource parameter and a set of stimulus-dependent utility functions. Our results show that the model captures very well the gradual shifts due to increasing time constraints that are present in the subjects\u2019 empirical choice patterns.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n2 A Probabilistic Model of Bounded-Rational Choices\n\nWe model a bounded-rational decision-maker as an expected utility maximizer that is subject to information constraints. Formally, let X and Y be two finite sets, the former corresponding to a set of stimuli and the latter to a set of choices; and let P (y) be a prior distribution over optimal choices y \u2208 Y that the decision-maker may have learned from experience. When presented with a stimulus x \u2208 X, a bounded-rational decision-maker transforms the prior choice probabilities P (y) into posterior choice probabilities P (y|x) and then generates a choice according to P (y|x). This transformation is modeled as the optimization of a regularized expected utility known as the free energy functional:\n\nF[Q(y|x)] := \u03a3_y Q(y|x) Ux(y) \u2212 (1/\u03b2) \u03a3_y Q(y|x) log [Q(y|x)/P (y)],   (1)\n\nwhere the first sum is the expected utility and the second is the regularization, and where the posterior is defined as the maximizer P (y|x) := arg max_{Q(y|x)} F[Q(y|x)]. Crucially, the optimization is determined by two factors.
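As a quick numerical illustration of the trade-off in (1) (a sketch with made-up numbers, not the paper\u2019s fitted values): the maximizing posterior reweights the prior by exp{\u03b2Ux(y)}, so a small \u03b2 leaves the choice distribution near the prior while a large \u03b2 concentrates it on the highest-utility choice.

```python
import numpy as np

def gibbs_posterior(prior, utility, beta):
    # Maximizer of the free energy (1): Q(y|x) proportional to P(y) * exp(beta * U_x(y))
    w = prior * np.exp(beta * utility)
    return w / w.sum()

# Hypothetical prior and utilities over 8 candidate assignments; choice 7 is optimal
prior = np.full(8, 1 / 8)
utility = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

low = gibbs_posterior(prior, utility, beta=0.01)   # resource-starved: stays near the prior
high = gibbs_posterior(prior, utility, beta=10.0)  # resource-rich: concentrates on the optimum
```

Varying beta between these extremes traces the gradual shift from prior-driven to utility-driven choices that the experiment probes.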
The first is the decision-maker\u2019s subjective utility function Ux : Y \u2192 R encoding the desirability of a choice y given a stimulus x. The second is the inverse temperature \u03b2, which determines the resources of deliberation available for the decision task1, but which is neither known to, nor controllable by, the decision-maker. The resulting posterior has an analytical expression given by the Gibbs distribution\n\nP (y|x) = (1/Z\u03b2(x)) P (y) exp{\u03b2Ux(y)},   (2)\n\nwhere Z\u03b2(x) is a normalizing constant [9]. The expression (2) highlights a connection to inference: bounded-rational decisions can also be computed via Bayes\u2019 rule in which the likelihood is determined by \u03b2 and Ux as follows:\n\nP (y|x) = P (y)P (x|y) / \u03a3_y\u2032 P (y\u2032)P (x|y\u2032),   hence   P (x|y) \u221d exp{\u03b2Ux(y)}.   (3)\n\nThe objective function (1) can be motivated as a trade-off between maximizing expected utility and minimizing information cost [9, 16]. Near-zero values of \u03b2, which correspond to heavily-regularized decisions, yield posterior choice probabilities that are similar to the prior. Conversely, with growing values of \u03b2, the posterior choice probabilities approach the perfectly-rational limit.\n\nConnection to regret. Bounded-rational decision-making is related to regret theory [2, 4, 8]. To see this, define the certainty-equivalent as the maximum attainable value for (1):\n\nU\u2217x := max_{Q(y|x)} F[Q(y|x)] = (1/\u03b2) log Z\u03b2(x).   (4)\n\nThe certainty-equivalent quantifies the net worth of the stimulus x prior to making a choice. The decision process treats (4) as a reference utility used in the assessment of the alternatives. Specifically, the modulation of any choice is obtained by measuring the utility against the certainty-equivalent:\n\nlog [P (y|x)/P (y)] = \u2212\u03b2 [U\u2217x \u2212 Ux(y)],   (5)\n\nwhere the left-hand side is the change of y and the bracketed difference is the regret of y. Accordingly, the difference in log-probability is proportional to the negative regret [3]. The decision-maker\u2019s utility function specifies a direction of change relative to the certainty-equivalent, whereas the strength of the modulation is determined by the inverse temperature.\n\n1For simplicity, here we consider only strictly positive values for the inverse temperature \u03b2, but its domain can be extended to negative values to model other effects, e.g. risk-sensitive estimation [9].\n\n3 Experimental Methods\n\nWe conducted a choice experiment where subjects had to solve puzzles under time pressure. Each puzzle consisted of a Boolean formula in conjunctive normal form (CNF) that was disguised as an arrangement of circular patterns (see Fig. 1). The task was to find a truth assignment that satisfied the formula. Subjects could pick an assignment by setting the colors of a central pattern highlighted in gray. Formally, the puzzles and the assignments corresponded to the stimuli x \u2208 X and the choices y \u2208 Y respectively, and the duration of the puzzle was the resource parameter that we controlled (see equation 1).\n\nFigure 1: Example puzzle. a) Each puzzle is a set of six circularly arranged patches containing patterns of black (\u2022) and white circles (\u25e6). In each trial, the positions of the patches were randomly assigned to one of the six possible locations. Subjects had to choose the three center colors such that there was at least one (color and position) match for each patch.
For instance, the choice in (b) matches only four out of six patches (in red), while (c) solves the puzzle. The puzzle is a visualization of the Boolean formula in (d).\n\nWe restricted our puzzles to a set of five CNF formulas having 6 clauses, 2 literals per clause, and 3 variables. Subjects were trained only on the first four puzzles, whereas the last one was used as a control puzzle during the test phase. All the chosen puzzles had a single solution out of the 2\u00b3 = 8 possible assignments.\n\nWe chose CNF formulas because they provide a general2 and flexible platform for testing decision-making behavior. Crucially, unlike in an estimation task, finding the relation between a stimulus and a choice is non-trivial and requires solving a computational problem.\n\n3.1 Data Collection\n\nTwo symmetric versions of the experiment were conducted on Amazon Mechanical Turk. For each, we collected choice data from 15 anonymized participants living in the United States, totaling 30 subjects. Subjects were paid 10 dollars for completing the experiment. The typical runtime of the experiment ranged between 50 and 130 minutes.\n\nFor each subject, we recorded a sequence of 90 training and 285 test trials. The puzzles were displayed throughout the whole trial, during which the subjects could modify their choice at will. The training trials allowed subjects to familiarize themselves with the task and the stimuli, whereas the test trials measured their adapted choice behavior as a function of the stimulus and the task duration. Training trials were presented in blocks of 18 for a long, fixed duration; the test trials, which were of variable duration, were presented in blocks of 19 (18 regular + 1 control trial). To avoid the collection of poor-quality data, subjects had to repeat a block if they failed more than 6 trials within the same block, thereby setting a performance threshold that was well above chance level.
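To make the task concrete, the puzzle class can be sketched as a satisfiability problem: 6 clauses of 2 literals each over 3 Boolean variables, solved by brute force over the 8 assignments. The clause set below is a hypothetical example, not one of the five formulas used in the experiment.

```python
from itertools import product

# Hypothetical 2-CNF instance over 3 Boolean variables with 6 clauses,
# mirroring the puzzle class (NOT one of the experiment's actual puzzles).
# A literal (i, v) is satisfied when variable i takes the value v.
clauses = [
    ((0, True), (2, True)),
    ((1, False), (2, True)),
    ((0, True), (1, False)),
    ((0, True), (1, True)),
    ((1, True), (2, True)),
    ((0, False), (1, False)),
]

def satisfies(assignment, clauses):
    # A CNF formula holds if every clause contains at least one satisfied literal
    return all(any(assignment[i] == v for i, v in clause) for clause in clauses)

# Brute force over the 2**3 = 8 truth assignments, as a subject implicitly does
solutions = [a for a in product([False, True], repeat=3) if satisfies(a, clauses)]
```

This instance has the unique satisfying assignment (True, False, True); the experiment\u2019s puzzles were likewise constructed to have a single solution.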
Participants could initiate a block whenever they felt ready to proceed. Within a block, the inter-trial durations were drawn uniformly between 0.5 and 1.5s.\n\nEach trial consisted of one puzzle that had to be solved within a limited time. Training trials lasted 10s each, while test trials had durations of 1.25, 2.5, and 5s. Apart from a visual cue shown 1s before the end of each trial, there was no explicit feedback communicating the trial length. Therefore, subjects did not know the duration of individual test trials beforehand and thus could not use this information in their solution strategy. A trial was considered successful only if all the clauses of the puzzle were satisfied.\n\n2More precisely, the 2-SAT and SAT problems are NL- and NP-complete respectively. This means that every other decision problem within the same complexity class can be reduced (i.e. rephrased) to a SAT problem.\n\n4 Analysis\n\nThe recorded data D consists of a set of tuples (x, r, y), where x \u2208 X is a stimulus, r \u2208 R is a resource parameter (i.e. duration), and y \u2208 Y a choice. In order to analyze the data, we made the following assumptions:\n\n1. Transient regime: During the training trials, the subjects converged to a set of subjective preferences over the choices which depended only on the stimuli.\n\n2. Permanent regime: During the test trials, subjects did not significantly change the preferences that they learned during the training trials. Specifically, choices in the same stimulus-duration group were i.i.d. throughout the test phase.\n\n3. Negligible noise: We assumed that the operation of the input device and the cue signaling the imminent end of the trial did not have a significant impact on the distribution over choices.\n\nOur analysis focused only on the test trials. Let P (x, r, y) denote the empirical probabilities3 of the tuples (x, r, y) estimated from the data.
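A sketch of how such empirical probabilities can be tabulated (hypothetical counts and dimensions; the add-one smoothing follows the convention stated in footnote 3):

```python
import numpy as np

# Hypothetical dimensions: 5 stimuli, 3 durations, 8 choices
nx, nr, ny = 5, 3, 8
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(nx, nr, ny))  # N(x, r, y): tuple counts from the trials

joint = (counts + 1) / (counts + 1).sum()        # P(x, r, y) with add-one smoothing
p_xr = joint.sum(axis=2)                         # P(x, r): stimulus-resource context
p_y = joint.sum(axis=(0, 1))                     # P(y): prior over choices
post = joint / p_xr[:, :, None]                  # P(y | x, r): posterior over choices
```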
From these, we derived the probability distribution P (x, r) over the stimulus-resource context, the prior P (y) over choices, and the posterior P (y|x, r) over choices given the context through marginalization and conditioning.\n\n4.1 Inferring Preferences\n\nBy fitting the model, we decomposed the choice probabilities into: (a) an inverse temperature function \u03b2 : R \u2192 R; and (b) a set of subjective utility functions Ux : Y \u2192 R, one for each stimulus x. We assumed that the sets X, R, and Y were finite, and we used vector representations for \u03b2 and the Ux. To perform the decomposition, we minimized the average Kullback-Leibler divergence\n\nJ = \u03a3_{x,r} P (x, r) [ \u03a3_y P (y|x, r) log [P (y|x, r)/Q(y|x, r)] ],   (6)\n\nw.r.t. the inverse temperatures \u03b2(r) and the utilities Ux(y) through the probabilities Q(y|x, r) of the choice y given the context (x, r) as derived from the Gibbs distribution\n\nQ(y|x, r) = (1/Z\u03b2) P (y) exp{\u03b2(r)Ux(y)},   (7)\n\nwhere Z\u03b2 is the normalizing constant. We used the objective function (6) because it is the Bregman divergence over the simplex of choice probabilities [1]. Thus, by minimizing the objective function (6) we were seeking a decomposition such that the Shannon information contents of P (y|x, r) and Q(y|x, r) were matched against each other in expectation.\n\nWe minimized (6) using gradient descent. For this, we first rewrote (6) as\n\nJ = \u03a3_{x,r,y} P (x, r, y) { log [P (y|x, r)/P (y)] \u2212 \u03b2(r)Ux(y) + log Z\u03b2 }   (8)\n\nto expose the coordinates of the exponential manifold and then calculated the gradient. The partial derivatives of J w.r.t. \u03b2(r) and Ux(y) are equal to\n\n\u2202J/\u2202\u03b2(r) = \u03a3_{x,y} P (x, r) [Q(y|x, r) \u2212 P (y|x, r)] Ux(y)   and   \u2202J/\u2202Ux(y) = \u03a3_r P (x, r) [Q(y|x, r) \u2212 P (y|x, r)] \u03b2(r)   (9)\n\nrespectively. The Gibbs distribution (7) admits an infinite number of decompositions, and therefore we had to fix the scaling factor and the offset to obtain a unique solution. The scale was set by clamping the value of \u03b2(r0) = \u03b20 for an arbitrarily chosen resource parameter r0 \u2208 R; we used \u03b2(r0) = 1 for r0 = 1s. The offset was fixed by normalizing the utilities. A simple way to achieve this is by subtracting the certainty-equivalent from the utilities, i.e. for all (x, y),\n\nUx(y) \u2190 Ux(y) \u2212 (1/\u03b2(r0)) log \u03a3_y P (y) exp{\u03b2(r0)Ux(y)}.   (10)\n\nUtilities normalized in this way are proportional to the negative regret (see Section 2) and thus have an intuitive interpretation as modulators of change of the choice distribution.\n\nThe resulting decomposition algorithm repeats the following two steps until convergence: first it updates the inverse temperature and utility functions using gradient descent, i.e.\n\n\u03b2(r) \u2190 \u03b2(r) \u2212 \u03b7t \u2202J/\u2202\u03b2(r)   and   Ux(y) \u2190 Ux(y) \u2212 \u03b7t \u2202J/\u2202Ux(y)   (11)\n\nfor all (r, x, y) \u2208 R \u00d7 X \u00d7 Y; and second it projects the parameters back onto a standard submanifold by setting r = r0 and normalizing the utilities in each iteration using (10).\n\n3More precisely, P (x, r, y) \u221d N (x, r, y) + 1, where N (x, r, y) is the count of occurrences of (x, r, y).
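The alternating gradient/projection scheme can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors\u2019 code: array shapes, the constant step size (the paper uses a Robbins-Monro schedule), and the iteration count are all illustrative assumptions.

```python
import numpy as np

def decompose(post, p_xr, p_y, n_iter=2000, eta=0.5, r0=0, beta0=1.0):
    # Fit beta(r) and U_x(y) so that the Gibbs model (7) matches the posterior P(y|x,r)
    nx, nr, ny = post.shape
    beta = np.ones(nr)
    U = np.zeros((nx, ny))
    for _ in range(n_iter):
        # Model probabilities Q(y|x,r) from the Gibbs distribution (7)
        w = p_y[None, None, :] * np.exp(beta[None, :, None] * U[:, None, :])
        Q = w / w.sum(axis=2, keepdims=True)
        diff = Q - post                                    # appears in both gradients (9)
        g_beta = np.einsum('xr,xry,xy->r', p_xr, diff, U)  # dJ/dbeta(r)
        g_U = np.einsum('xr,xry,r->xy', p_xr, diff, beta)  # dJ/dU_x(y)
        beta -= eta * g_beta                               # gradient step (11)
        U -= eta * g_U
        # Projection: clamp beta(r0) and subtract the certainty-equivalent as in (10)
        beta[r0] = beta0
        ce = np.log((p_y[None, :] * np.exp(beta0 * U)).sum(axis=1)) / beta0
        U -= ce[:, None]
    return beta, U
```

Because Q is invariant to a per-stimulus offset in U and to a joint rescaling of beta and U, the clamping and normalization pin down a unique representative of each equivalence class, as described above.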
For the learning rate \u03b7t > 0, we chose a simple schedule that satisfied the Robbins-Monro conditions \u03a3_t \u03b7t = \u221e and \u03a3_t \u03b7t\u00b2 < \u221e.\n\n4.2 Expected Utility and Decision Bandwidth\n\nThe inferred model is useful for investigating the decision-maker\u2019s performance under different settings of the resource parameter\u2014in particular, to determine the asymptotic performance limits. Two quantities are of special interest: the expected utility averaged over the stimuli and the mutual information between the stimulus and the choice, both as functions of the inverse temperature \u03b2. Given \u03b2, we define these quantities as\n\nEU\u03b2 := \u03a3_{x,y} P (x)Q\u03b2(y|x)Ux(y)   and   I\u03b2 := \u03a3_{x,y} P (x)Q\u03b2(y|x) log [Q\u03b2(y|x)/Q\u03b2(y)]   (12)\n\nrespectively. Both definitions are based on the joint distribution P (x)Q\u03b2(y|x) in which Q\u03b2(y|x) \u221d P (y) exp{\u03b2Ux(y)} is the Gibbs distribution derived from the prior P (y) and the utility functions Ux(y). The marginal over choices is given by Q\u03b2(y) = \u03a3_x P (x)Q\u03b2(y|x). The mutual information I\u03b2 is a measure of the decision bandwidth, because it quantifies the average amount of information that the subject has to extract from the stimulus in order to produce the choice.\n\n5 Results\n\n5.1 Decomposition into prior, utility, and inverse temperature\n\nFor each one of the 30 subjects, we first calculated the empirical choice probabilities and then estimated their decomposition into an inverse temperature \u03b2 and utility functions Ux using the procedure detailed in the previous section. The mean error of the fit was very low (0.0347 \u00b1 0.0024 bits), implying that the choice probabilities are well explained by the model. As an example, Fig. 2 shows the decomposition for subject 1 (error 0.0469 bits, 83rd percentile rank) along with a comparison between the empirical posterior and the model posterior calculated from the inferred components using equation (7). As durations become longer and \u03b2 increases, the model captures the gradual shift from the prior towards the optimal choice distribution.\n\nAs seen in Fig. 3, the resulting decomposition is stable and shows little variability across subjects. The stimuli of version B of the experiment differed from version A only in that they were color-inverted, leading to mirror-symmetric decompositions of the prior and the utility functions. The results suggest the following trends:\n\n\u2022 Prior: Compared to the true distribution over solutions, subjects tended to concentrate their choices slightly more on the most frequent optimal solution (i.e. either y = 2 or y = 7 for version A or B respectively) and on the all-black or all-white solution (either y = 1 or y = 8).\n\nFigure 2: Decomposition of subject 1\u2019s posterior choice probabilities. Each row corresponds to a different puzzle. The left column shows each puzzle\u2019s stimulus and optimal choice. The posterior distributions P (y|x, \u03b2) were decomposed into a prior P (y); a set of time-dependent inverse temperatures \u03b2r; and a set of stimulus-dependent utility functions Ux over choices, normalized relative to the certainty-equivalent (10). The plots compare the subject\u2019s empirical frequencies against the model fit (in the posterior plots) or against the true optimal choice probabilities (in the prior plot). The stimuli are shown on the left (more specifically, one out of the 6! arrangements of patches) along with their probability.
Note that the untrained stimulus x = 7 is the color-inverse of x = 2.\n\n\u2022 Inverse temperature: The inverse temperature increases monotonically with longer durations, and the dependency is approximately linear in log-time (Fig. 2 and 3).\n\n\u2022 Utility functions: In the case of the stimuli that subjects were trained in (namely, x \u2208 {1, 2, 4, 6}), the maximum subjective utility coincides with the solution of the puzzle. Notice that some choices are enhanced while others are suppressed according to their subjective utility function. In particular, the choice for the most frequent stimulus (x = 2) is suppressed when it is suboptimal. In the case of the untrained stimulus (x = 7), the utility function is comparatively flat and variable across subjects.\n\nFinally, as a comparison, we also computed the decomposition assuming a Softmax function (or Boltzmann distribution):\n\nQ(y|x, r) = (1/Z\u03b2) exp{\u03b2(r)Ux(y)}.   (13)\n\nThe mean error of the resulting fit was significantly worse (error 0.0498 \u00b1 0.0032 bits) than the one based on (7), implying that the inclusion of the prior choice probabilities P (y) improves the explanation of the choice data.\n\nFigure 3: Summary of inferred preferences across all subjects. The two rows depict the results for the two versions of the experiment, each one averaged over 15 subjects. The stimuli of both versions are the same but with their colors inverted, resulting in a mirror symmetry along the vertical axis. The figure shows the inferred utility functions (normalized to the certainty-equivalent); the inverse temperatures; and the prior over choices. Optimal choices are highlighted in gray. Error bars denote one standard deviation.\n\nFigure 4: Extrapolation of the performance measures.
The panels show the expected utility EU\u03b2, the mutual information I\u03b2, and the expected percentage of correct choices as a function of the inverse temperature \u03b2. The top and bottom rows correspond to subject 1 and the averaged subjects respectively. Each plot shows the performance measure obtained from the empirical choice probabilities (blue markers) and the choice probabilities derived from the model (red curve) together with the maximum attainable value (dotted red).\n\n5.2 Extrapolation of performance measures\n\nWe calculated the expected utility and the mutual information as a function of the inverse temperature using (12). The resulting curves for subject 1 and the average subject are shown in Fig. 4 together with the predicted percentage of correct choices. All the curves are monotonically increasing and upper bounded. The expected utility and the percentage of correct choices are concave in the inverse temperature, indicating marginally diminishing returns with longer durations. Similarly, the mutual information approaches asymptotically the upper bound set by the stimulus entropy H(X) \u2248 1.792 bits (excluding the untrained stimulus).\n\n6 Discussion and Conclusion\n\nIt has long been recognized that the model of perfect rationality does not adequately capture human decision-making because it neglects the numerous resource limitations that prevent the selection of the optimal choice [13]. In this work, we considered a model of bounded-rational decision-making inspired by ideas from statistical mechanics and information theory. A distinctive feature of this model is the interplay between the decision-maker\u2019s preferences, a prior distribution over choices, and a resource parameter.
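The performance measures defined in (12) can be computed directly from a prior, a utility table, and an inverse temperature. The sketch below uses hypothetical numbers (two stimuli, two choices) rather than the fitted values behind Fig. 4, and reports the mutual information in bits.

```python
import numpy as np

def performance(p_x, p_y, U, beta):
    # Joint P(x) Q_beta(y|x), with Q_beta(y|x) proportional to P(y) exp(beta U_x(y)) as in (12)
    w = p_y[None, :] * np.exp(beta * U)
    Q = w / w.sum(axis=1, keepdims=True)
    joint = p_x[:, None] * Q
    q_y = joint.sum(axis=0)                        # marginal over choices Q_beta(y)
    EU = (joint * U).sum()                         # expected utility EU_beta
    I = (joint * np.log2(Q / q_y[None, :])).sum()  # mutual information I_beta in bits
    return EU, I

# Hypothetical setup: each stimulus has one high-utility choice
p_x = np.array([0.5, 0.5])
p_y = np.array([0.5, 0.5])
U = np.array([[1.0, 0.0], [0.0, 1.0]])

eu_low, i_low = performance(p_x, p_y, U, beta=0.0)
eu_high, i_high = performance(p_x, p_y, U, beta=10.0)
```

Sweeping beta traces the curves of Fig. 4 in outline: both quantities grow monotonically, with I_beta saturating at the stimulus entropy (here 1 bit).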
To test the model, we conducted an experiment in which participants had to solve puzzles under time pressure. The experimental results are very well predicted by the model, which allows us to draw the following conclusions:\n\n1. Prior: When the decision-making resources decrease, people\u2019s choices fall back on a prior distribution. This conclusion is supported by two observations. First, the bounded-rational model explains the gradual shift of the subjects\u2019 choice probabilities towards the prior as the duration of the trial is reduced (e.g. Fig. 2). Second, the model fit obtained by the Softmax rule (13), which differs from the bounded-rational model (7) only by the lack of a prior distribution, has a significantly larger error. Thus, our results conflict with the predictions made by models that lack a prior choice distribution\u2014most notably with expected utility theory [11, 17] and the choice models based on the Softmax function (typical in reinforcement learning, but also in e.g. the logit rule of quantal response equilibria [5] or in maximum entropy inverse reinforcement learning [18]).\n\n2. Utility and Inverse Temperature: Posterior choice probabilities can be meaningfully parameterized in terms of utilities (which capture the decision-maker\u2019s preferences) and inverse temperatures (which encode resource constraints). This is evidenced by the quality of the fit and the cogent operational role of the parameters. Utilities are stimulus-contingent enhancers/inhibitors that act upon the prior choice probabilities, consistent with the role of utility as a measure of relative desirability in regret theory [3] and also related to the cognitive functions attributed to the dorsal anterior cingulate cortex [12].
On the other hand, the inverse temperature captures a determinant factor of choice behavior that is independent of the preferences\u2014mathematically embodied in the low-rank assumption of the log-likelihood function that we used for the decomposition in the analysis. This assumption does not comply with the necessary conditions for rational meta-reasoning, wherein decision-makers can utilize the knowledge about their own resources in their strategy [7].\n\n3. Preference Learning: Utilities are learned from experience. As is seen in the utility functions of Fig. 3, subjects did not learn the optimal choice of the untrained stimulus (i.e. x = 7), even though it is just a simple color-inversion of the most frequent stimulus (i.e. x = 2). Our experiment did not address the mechanisms that underlie the acquisition of preferences. However, given that the information necessary to establish a link between the stimulus and the optimal choice is below two bits (that is, far below the (3 choose 2) \u00b7 2\u00b2 \u00b7 6 = 72 bits necessary to represent an arbitrary member of the considered class of puzzles), it is likely that the training phase had subjects synthesize perceptual features that allowed them to efficiently identify the optimal solution. Other avenues are explored in [14, 15] and references therein.\n\n4. Diminishing returns: The decision-maker\u2019s performance is marginally diminishing in the amount of resources. This is seen in the concavity of the expected utility curve (Fig. 4; similarly in the percentage of correct choices) combined with the sub-linear growth of the inverse temperature as a function of the duration (Fig. 3). For most subjects, the model predicts perfectly-rational choice behavior in the limit of unbounded trial duration.\n\nIn summary, in this work we have shown empirically that the model of bounded rationality provides an adequate explanatory framework for resource-constrained decision-making in humans. Using a challenging cognitive task in which we could control the time available to arrive at a choice, we have shown that human decision-making can be explained in terms of a trade-off between the gains of maximizing subjective utilities and the losses due to the deviation from a prior choice distribution.\n\nAcknowledgements\n\nThis work was supported by the Office of Naval Research (Grant N000141110744) and the University of Pennsylvania.\n\nReferences\n\n[1] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman Divergences. Journal of Machine Learning Research, 6:1705\u20131749, 2005.\n\n[2] D. E. Bell. Regret in decision making under uncertainty. Operations Research, 33:961\u2013981, 1982.\n\n[3] H. Bleichrodt and P. P. Wakker. Regret theory: A bold alternative to the alternatives. The Economic Journal, 125(583):493\u2013532, 2015.\n\n[4] P. C. Fishburn. The Foundations of Expected Utility. D. Reidel Publishing, Dordrecht, 1982.\n\n[5] J. W. Friedman and C. Mezzetti. Random belief equilibrium in normal form games. Games and Economic Behavior, 51(2):296\u2013323, 2005.\n\n[6] G. Gigerenzer and R. Selten. Bounded Rationality: The Adaptive Toolbox. MIT Press, Cambridge, MA, 2001.\n\n[7] F. Lieder, D. Plunkett, J. B. Hamrick, S. J. Russell, N. Hay, and T. Griffiths. Algorithm selection by rational metareasoning as a model of human strategy selection. Advances in Neural Information Processing Systems, pages 2870\u20132878, 2014.\n\n[8] G. Loomes and R. Sugden.
Regret theory: An alternative approach to rational choice under uncertainty. Economic Journal, 92:805\u2013824, 1982.\n\n[9] P. A. Ortega and D. A. Braun. Thermodynamics as a theory of decision-making with information-processing costs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 469(2153), 2013.\n\n[10] A. Rubinstein. Modeling Bounded Rationality. MIT Press, 1998.\n\n[11] L. J. Savage. The Foundations of Statistics. John Wiley and Sons, New York, 1954.\n\n[12] A. Shenhav, M. M. Botvinick, and J. D. Cohen. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79:217\u2013240, 2013.\n\n[13] H. Simon. Models of Bounded Rationality. MIT Press, Cambridge, MA, 1984.\n\n[14] N. Srivastava and P. R. Schrater. Rational inference of relative preferences. Advances in Neural Information Processing Systems, 2012.\n\n[15] N. Srivastava, E. Vul, and P. R. Schrater. Magnitude-sensitive preference formation. Advances in Neural Information Processing Systems, 2014.\n\n[16] N. Tishby and D. Polani. Information Theory of Decisions and Actions. In Hussain Taylor Vassilis, editor, Perception-reason-action cycle: Models, algorithms and systems. Springer, Berlin, 2011.\n\n[17] J. Von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944.\n\n[18] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey. Maximum Entropy Inverse Reinforcement Learning. In AAAI, pages 1433\u20131438, 2008.\n", "award": [], "sourceid": 65, "authors": [{"given_name": "Pedro", "family_name": "Ortega", "institution": "DeepMind"}, {"given_name": "Alan", "family_name": "Stocker", "institution": "University of Pennsylvania"}]}