
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Major issues:
1. Too many technical details for simulations are missinge.g., sure rewards, cost of sampling, discount factor.
2. The assumption that the decision maker knows the possible states (coherences) seems entirely unrealistic for the usual experimental scenario, where that information is not provided. As noted on line 256, coherence is observable to the experimenter, but not to the subject.
3. The paper stakes the claim that the POMDP model is better than a DDM, but no comparisons are actually made.
4. The logic of using two POMDPs (with and without the environment model) as models for accuracy and confidence was not clear to me.
5. The virtues of normativity for modeling human behaviour are not adequately defended here. Why would normativity be good in and of itself?
6. Does the model make any clear, testable predictions? I don't see any adduced hereonly fits to the existing data.
Smaller bits.
The figures were very hard to read. The lines were thin and dots tiny. The modeling plots should strive to be like the data plots (e.g., 1A, 3A, 6A). Also, the legend should appear with each plot.  What temporal dynamics cannot be captured by a DDM (implied on line 55). line 79, trial (not trail). also in caption to Fig 4. line 140. goal is maximizing expected discounted reward (not total) line 208. actions ... are Fig 5 was entirely unreadable. The data in Fig 6 (sacc end point) do not match the supposed dependent variable (confidence).
Q2: Please summarize your review in 12 sentences
This paper presents a POMDP model of confidence judgments in decision making. The simulations, however, are vaguely described and the case for how this model compares with existing approaches is not strong.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors' model confidence data from two experiments (conducted by others and previously published in the scientific literature) using a POMDP. In both experiments, subjects saw a randomdot kinematogram on each trial and made a binary choice about the dominant motion direction. The first experiment used monkeys as subjects and stimuli had a fixed duration. The second experiment used people as subjects and stimuli continued until a subject made a response. The paper reports that the POMDP model does a good job of fitting the experimental data, both the accuracy data and the confidence data.
I should start by stating that I know a bit about perceptual decision making and a (small) bit about POMDPs but I am unfamiliar with the application of POMDPs to perceptual decision making. The paper reports that this is the first time that a POMDP has been used to model confidence data in a perceptual decision making task. I'll take the authors' word on this.
Taken as a whole, I have a positive opinion of this manuscript. I very much like the problem studied here (modeling confidence data in perceptual decision making) and I like the idea of applying a POMDP to this data. The fit to the experimental data is qualitatively good (though quantitatively mediocre).
A significant shortcoming of this paper is the lack of a literature review section providing context for the work provided here. Other researchers have previously applied POMDPs to perceptual decision making. Did they only model accuracy data? Did they model reaction time data too? How does the POMDP studied here relate to the POMDPs presented in these earlier papers? Is it largely the same or is it substantially different? If the latter , how (and why) is it different? If the current paper is the first to apply POMDPs to confidence data in a perceptual decision making task, how did researchers model confidence data previously? Without answers to these questions, there is no way to evaluate the novelty and significance of the contribution made by the current paper.
Q2: Please summarize your review in 12 sentences
I like the problem studied here, and I like the application of POMDPs to confidence data in the context of perceptual decision tasks. However, the paper does not include an adequate literature review section and, thus, it is hard to know the extent to which the reported research is a novel and significant contribution above and beyond what already exists in the literature.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 5000 characters. Note
however, that reviewers and area chairs are busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We wish to thank all the reviewers for their
insightful comments. Any new explanations presented here will be added to
the paper as well. We will also include a supplementary doc containing
more details of the model and simulations. These details will be explained
here in summary. Also, the plots will be improved.
Model (Reviewer
1,3) There are two POMDPs: one for the experimenter and one for the
subject. At time t, b_t of the experimenter's POMDP can be related to
accuracy and b_t of the subject's to confidence. The experimenter has the
perfect model of the environment. The subject has the same model except
for awareness of the 0% coherence state. The main difference however, is
not the model but the initial belief state (b_0). When the data is binned
based on the coherence, the experimenter knows the coherence in each
single trial, but not the subject. As a result, b_0 of the experimenter is
.5 in the two states with corresponding coherence (two directions), and 0
for the rest. b_0 of the subject is always a uniform distribution among
all states.
Simulation (Reviewer 1,3,4) Fixed duration: To find
free parameters we fit the experimenter's POMDP to fig 1a. We start from a
guessed value for each parameter and then gradually change the values to
get a closer fit. There are 5 different coherences, and 7 data points for
each coherence (for each 100ms). For each data point at time t and
coherence c, the average belief of experimenter's POMDP at that time (b_t)
should be equal to accuracy at that time. This average belief is
calculated by running 1000 simulations of POMDP for each t and c. After
finding parameter values that give the closest fit, we predict confidence
and wagering by generating b_t for the subject's POMDP (fig 2 & 3).
Again the average b_t is from 1000 simulations. Parameters found: r_sure
= 0.68, r_sample = 10^5. Reaction time: To find free parameters, we
start from guessed values and start to search by fitting: 1) The average
final b_t (when action(a_t)=up/down) of experimenter's POMDP to average
accuracy, for each coherence (fig 5a); 2) The average time to choosing a
direction (t when a_t = up/down) in the subject's POMDP to average
reaction time in the experiment, for each coherence (fig 5b). Each average
value is obtained from 1000 simulations. After finding the parameters, the
confidence v.s reactiontime plot is predicted by simulating subject's
POMDP and using the final b_t (when a_t=up/down) (average of 1000
simulations for each coherence). Parameters found: gamma = 0.97, r_sample
= 10^5.
Novelty in POMDP (Reviewer 2) This is the first time
that confidence has been modelled with POMDPs. Other POMDP papers only
considered accuracy. The focus of [18] (cited in submission) is on network
implementation of the POMDP rather than prediction in experiments. [5] and
[6] test their predictions on experiments but only on the reaction time
task (not fixed duration) and only on accuracy vs. coherence and reaction
time vs. coherence (not accuracy vs. reaction time). There is no discount
factor in their model and cost of actions remains constant throughout the
trial. Our confidence model and our discount factor explains for the first
time certain important attributes of task such the absence of reaction
time vs. accuracy analysis in previous work and why confidence decreases
with time (lines 366370 & fig 6). The material on mathematical
soundness of the POMDP approach (sections 2 & 3.1) is new as
well.
DDM (Reviewer 1) As mentioned in the introduction, a
separate optimizer is needed for the DDM to include costs. Also, not all
kinds of uncertainties can be incorporated in the DDM. For example, in
modeling confidence, there is an uncertainty over coherence, which is the
drift rate in the DDM. In the classical DDM, there is no uncertainty over
the drift rate. One could add functions to DDM to cover these situations,
but this confirms that DDM is not a unifying and universal model for
decisionmaking and needs to be modified in an adhoc manner for different
tasks.
Normativity (Reviewer 1) Normative models are important
because they allow us to compare biological strategies to how an optimal
agent 'should' behave under some explicit notion of what the agent is
trying to optimize.
Qualitative results (Reviewer 7,1) While
exact quantitative fits can provide interesting insights, we have opted
for qualitative fits for the following reasons. First, using extra
parameters, e.g., mapping between saccade endpoint and probability (fig 6,
lines 407409) or assuming an imperfect model for the subject (lines
411417) can lead to overfitting and hurt the generality of the framework.
Also, the main contribution of this paper is presenting a new normative
framework for confidence based on POMDPs and demonstrating its
mathematical soundness. The data presented has been previously published
and not new  the emphasis is on confidence prediction within the context
of the POMDP framework. 
