Paper ID: | 1860 |
---|---|
Title: | Threshold Learning for Optimal Decision Making |

Drift-diffusion models are a well-validated and neuroscience-supported approach for modeling the results of two-alternative forced choice experiments in psychophysics and neuroscience. Decision making under uncertainty is commonly modeled as a process of competitive evidence accumulation (a drift-diffusion process) that reaches a decision when a threshold is hit. What is the optimal threshold to set for a given problem? By recasting the problem as one of minimising the stopping time and decision error, the authors obtain an optimal threshold. This is then computed using two different algorithms that treat the optimisation as a reinforcement learning problem.
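To make the accumulate-to-threshold process concrete, here is a minimal Python sketch of a single drift-diffusion trial. It is not the authors' code: the Euler-Maruyama discretization, the step size `dt`, unit noise by default, and the convention that the lower bound sits at `-theta0` and the upper bound at `theta1` are all assumptions made for illustration.

```python
import math
import random


def simulate_ddm(drift, theta0, theta1, noise=1.0, dt=0.01, rng=None,
                 max_steps=1_000_000):
    """Simulate one drift-diffusion trial (Euler-Maruyama discretization).

    Evidence x starts at 0 and accumulates noisy increments until it
    reaches the upper bound theta1 (choice 1) or the lower bound -theta0
    (choice 0). Returns (choice, decision_time).
    """
    if rng is None:
        rng = random.Random()
    step_sd = noise * math.sqrt(dt)  # std of the diffusion increment per step
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if x >= theta1:
            return 1, t
        if x <= -theta0:
            return 0, t
    raise RuntimeError("no threshold reached within max_steps")


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_ddm(2.0, 1.0, 1.0, rng=rng) for _ in range(500)]
    p_upper = sum(choice for choice, _ in trials) / len(trials)
    mean_time = sum(t for _, t in trials) / len(trials)
    print(f"P(choice 1) = {p_upper:.3f}, mean decision time = {mean_time:.2f}")
```

Raising both thresholds trades slower decisions for fewer errors, which is the trade-off that any threshold-setting rule has to resolve.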

This is a nice piece of work, but I would like to see it applied to human or animal data (which are available online) and not just to synthetic examples. How optimal are humans, if at all? Providing a sense of this, and so passing this threshold, would enable me to raise my rating of this paper's impact.

2-Confident (read it all; understood it all reasonably well)

The drift-diffusion model (DDM) is a standard approach for modeling the results of two-alternative forced choice experiments in psychophysics. In its simplest version, it consists of a random walk representing the evidence accumulation process, which evolves until it hits either an upper or a lower threshold, each associated with one of the two choices. This paper discusses the problem of finding optimal thresholds for the DDM. Prior studies suggest that thresholds are set in order to maximize the reward rate; however, the mechanics for doing so are not well understood. In a nutshell, this paper proposes that thresholds are determined by solving a meta-optimality problem that attempts to minimize the "Bayes risk" (a convex objective function which is assumed to arise from a loss function that is linear in the expected decision time and the type I and II errors). The resulting meta-optimization is then treated as a continuous-armed bandit problem. Two bandit strategies are discussed: one based on REINFORCE and another based on Bayesian optimization (with a custom acquisition function). These are compared against each other in empirical tests.
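As a worked form of that objective, one plausible reading (an assumption, since the review does not reproduce the paper's exact equation) is a Bayes risk that charges a cost c per unit of expected decision time and adds the two error rates weighted by error costs. Here W_0 and W_1 are taken as those error costs, as the third review also reads them, e_0 = P(choose 1 | H_0) and e_1 = P(choose 0 | H_1) are the type I and type II error rates, and the hypothesis priors are assumed to be folded into the weights:

$$
R(\theta_0, \theta_1) = c\,\mathbb{E}[T \mid \theta_0, \theta_1] + W_0\, e_0(\theta_0, \theta_1) + W_1\, e_1(\theta_0, \theta_1)
$$

Raising the thresholds lowers e_0 and e_1 but lengthens E[T], so under this form the optimal thresholds sit where the marginal time cost balances the marginal reduction in weighted errors.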

Pros:
- The question addressed is very interesting. Understanding the mechanisms by which DDM thresholds are chosen might have far-reaching implications for psychophysics.
- The paper is well written: the mathematics is sound and the empirical results are well presented.

Cons:
- There are potential conceptual pitfalls in addressing the question of "optimally setting the very parameters of optimal choice". Why is it obvious that this can be solved using meta-reasoning? This problem is not addressed at all.
- There are quite a few typos in the math. For instance, equation (4) uses both t and T.
- Some design choices are unjustified. What loss function was used to formulate the Bayes risk? Why is the bandit strategy a linear combination of binary units? Isn't the exponential spread of coefficient values s_j just a-ary coding?
- IMHO, however, the most important shortcoming of the paper is that its findings are neither sufficient nor surprising. From an ML point of view, the models are straightforward and thus less interesting; from a psychophysics point of view, it is hard to evaluate the relevance of the proposed models without a careful comparison to monkey or human choice data.

I believe that this paper has potential, but at this stage it seems premature.

2-Confident (read it all; understood it all reasonably well)

This is a clearly written paper which presents and evaluates (through simulation) practical algorithms for decision threshold optimization.

Major comments:
- I think this paper makes a valuable contribution to the theoretical literature on DDMs. It seems that this paper is written primarily for a neuroscience/psychology audience; these have been the main consumers of these algorithms. However, I was disappointed to see almost no application to neuroscience or psychology. For example, are these algorithms cognitively or biologically plausible? Is there empirical evidence to support any of these algorithms? What kinds of experiments could be done to test these hypotheses?
- It would be really useful to see learning curves for the two methods as a function of computation time rather than trials. This would help determine the optimal choice of algorithm for a particular computational budget.
- It's never explained why Bayesian optimization has a larger variance.
- Simen, Holmes & Cohen (2006) describe a neural network that optimizes decision thresholds. They claim that it is neurally plausible. Frank (2006) also presents a biologically plausible mechanism for threshold adjustment (though not in the DDM formalism). Both of these models are put forth as theories of how the brain actually does threshold optimization, so they are important to discuss (and compare against with respect to empirical data) if we are to take seriously the new model as a psychological/neural theory.

I have read the other reviews as well as the author feedback, and I have decided to keep my current scores.

Minor comments:
- p. 3: W0 and W1 appear to be error costs, but this isn't stated explicitly.
- p. 3: "sum collected" -> "sum of collected"
- Footnote 1: W0/W1 are referred to as thresholds, but I think theta0/theta1 are meant here.
- p. 4: "dependance" -> "dependence"

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The authors consider learning the thresholds of a drift-diffusion model and apply two well-known algorithms, REINFORCE and Bayesian optimization, to do so.

Why do the asymptotic values in Fig. 2D and 3D differ, in particular if the Bayes risk has a unique minimum? Optimal thresholds have been considered before and have been found to be time-dependent, e.g. J. Drugowitsch et al., J Neurosci 2012; S. Deneve, Frontiers 2012; and Huang et al., NIPS 2012. Links to actual neural and behavioral data are missing. REINFORCE could obviously be improved by using reinforcement comparison for the reinforcement baseline, instead of a zero baseline, in order to reduce the variance of the gradient estimate.
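The reinforcement-comparison suggestion can be sketched in Python. This is a hedged illustration, not the paper's implementation: it swaps the paper's network of binary units for a single Gaussian policy over one symmetric threshold, and the drift magnitude `mu`, unit noise, time cost, error cost, learning rates, and starting threshold of 0.3 are all hypothetical values. The baseline is a running average of past rewards, and it is updated only after it has been used, so it does not depend on the current sample and does not bias the gradient estimate.

```python
import math
import random


def ddm_trial(threshold, drift, rng, dt=0.01):
    """One DDM trial with symmetric bounds at +/-threshold and unit noise.

    Returns (correct, decision_time), where 'correct' means the bound on the
    same side as the drift was reached first.
    """
    step_sd = math.sqrt(dt)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (x > 0) == (drift > 0), t


def reinforce_threshold(steps=4000, mu=1.0, cost_time=1.0, cost_error=10.0,
                        lr=0.0005, sigma=0.2, baseline_rate=0.05,
                        init_mean=0.3, seed=0):
    """REINFORCE for a Gaussian policy over the threshold, using a
    reinforcement-comparison baseline (running average of past rewards)."""
    rng = random.Random(seed)
    mean, baseline = init_mean, 0.0
    for _ in range(steps):
        theta = rng.gauss(mean, sigma)                # explore a threshold
        drift = mu if rng.random() < 0.5 else -mu     # hidden correct answer
        correct, t = ddm_trial(max(theta, 0.05), drift, rng)
        reward = -(cost_time * t + (0.0 if correct else cost_error))
        advantage = reward - baseline                 # baseline uses past rewards only
        # d/d(mean) of log N(theta; mean, sigma^2) is (theta - mean) / sigma^2
        mean += lr * advantage * (theta - mean) / sigma ** 2
        mean = max(mean, 0.05)                        # keep the threshold positive
        baseline += baseline_rate * (reward - baseline)
    return mean, baseline


if __name__ == "__main__":
    learned, avg_reward = reinforce_threshold()
    print(f"learned threshold ~ {learned:.2f}, running mean reward ~ {avg_reward:.2f}")
```

With a zero baseline the same update is scaled by the raw reward, which here is always negative, so every step carries extra variance; subtracting the running average centers the reward signal, which is the variance reduction the comment has in mind.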

2-Confident (read it all; understood it all reasonably well)

The paper addresses the problem of finding the optimal time to act when making decisions under uncertainty. This is equivalent to finding the proper threshold in the drift-diffusion model commonly used for two-alternative forced choice tasks. The paper proposes two methods, REINFORCE and Bayesian optimization, to learn this threshold.

The paper raises an important problem and presents it well. However, as the authors mention, no biologically plausible implementation is suggested in the paper. Also, to my understanding, the current models only handle the Bayes risk. There is also a biological cost of observing and accumulating evidence (see Drugowitsch et al. 2012, J Neurosci). How could this cost of observing be added to the methods for learning the threshold?

2-Confident (read it all; understood it all reasonably well)

This paper presents an interesting application of two existing learning algorithms, Williams' REINFORCE algorithm and Bayesian optimization, to estimating the decision threshold in a 2-AFC task. The threshold is estimated by minimizing the difference between the reward sum of the optimal decision policy and the sum of collected reward. The authors also compare the performance of the two algorithms and map the SPRT decision performance over the main parameter space. This paper looks technically sound. I like the main idea of this paper because estimating decision thresholds is an important problem in decision science.

Technical quality: This paper focuses on estimating a decision threshold, a problem that has not been sufficiently studied in this community. Two existing algorithms are used to estimate the threshold, and their performances are compared against each other. The paper is interesting and technically sound.

Novelty: The idea of estimating the threshold makes some novel contribution to the computational psychology community. However, the algorithms used in this paper are not novel, because the authors use them without much modification from previous work [5], [6] & [7]. Moreover, in this paper, the other parameters need to be fixed while the optimal threshold is being estimated. Since the other parameters are fixed, a naive approach would be to search the parameter space of the threshold for the minimum of the cost function, which can be computed by averaging trials simulated with the SPRT method. The complexity of this problem is similar to that of other problems that estimate parameters with a fixed threshold. I think the main novel contribution of this paper is the attempt to estimate the decision threshold, which has not been sufficiently studied.

Potential impact: This paper estimates a constant decision threshold in a 2-AFC task with an infinite horizon. I think this paper will attract attention from researchers in the psychology and cognitive science communities. However, in most psychology experiments the 2-AFC task must be completed within a certain deadline, which makes the threshold change over time within each trial. I would like to see a more sophisticated algorithm that estimates the time-dependent threshold under a finite horizon. Moreover, this paper would have more impact if the algorithm could jointly learn the threshold with other parameters.

Clarity and presentation: The paper is not very clearly written. The method section misses a lot of important information, which would make it hard for readers trying to replicate the work from this paper alone. For example, in Section 3.2 (REINFORCE method), the authors should provide an illustrative network figure so that the reader has a clearer idea of what the inputs are and how the network is trained. The information in Sec. 3.3 is also vague. I understand that the authors list corresponding references with more details; however, the main framework of those algorithms should be clearly explained in this paper. I would recommend that the authors include more information about the algorithms and condense the long discussion section (cut it by half). Also, please check and correct a few typos, e.g.: (1) in equation (6), I think you mean "-NC_risk/c"; (2) line 114, "work work well".

---------------

After discussion, I realized that the optimal threshold has already been studied by other researchers with different approaches, so I lowered the impact score, as the advantage of the methodology used in this paper is not clear when compared to the others. I also increased the clarity score after I read the paper again.
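The naive baseline raised under Novelty (fix the other parameters, simulate trials, and search the threshold directly) can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's method: it uses a discretized DDM as the continuum stand-in for SPRT trials, symmetric bounds, unit noise, equal priors on the two answers, and hypothetical cost values (`cost_time=1.0`, `cost_error=10.0`, `mu=1.0`).

```python
import math
import random


def trial_cost(threshold, mu, cost_time, cost_error, rng, dt=0.01):
    """Cost of one simulated trial: time cost plus a penalty if wrong.

    The sign of the drift (the correct answer) is drawn with equal priors;
    bounds are symmetric at +/-threshold with unit noise.
    """
    drift = mu if rng.random() < 0.5 else -mu
    step_sd = math.sqrt(dt)
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    wrong = (x > 0) != (drift > 0)
    return cost_time * t + (cost_error if wrong else 0.0)


def grid_search_threshold(grid, n_trials=2000, mu=1.0, cost_time=1.0,
                          cost_error=10.0, seed=0):
    """Estimate the mean cost at every grid point; return (best, risks)."""
    rng = random.Random(seed)
    risks = {}
    for theta in grid:
        total = sum(trial_cost(theta, mu, cost_time, cost_error, rng)
                    for _ in range(n_trials))
        risks[theta] = total / n_trials
    best = min(risks, key=risks.get)
    return best, risks


if __name__ == "__main__":
    grid = [0.25 * k for k in range(2, 11)]   # 0.5, 0.75, ..., 2.5
    best, risks = grid_search_threshold(grid)
    for theta, risk in risks.items():
        print(f"threshold {theta:.2f}: estimated risk {risk:.3f}")
    print("best threshold on the grid:", best)
```

For this symmetric, unit-noise case the standard closed-form DDM expressions, error rate 1/(1 + e^{2 mu theta}) and mean decision time (theta/mu) tanh(mu theta), give the same risk curve analytically and are a convenient check on the simulation. Because only one parameter is searched, the cost grows with the grid size times the number of simulated trials per grid point.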

2-Confident (read it all; understood it all reasonably well)