Reviews: Deliberative Explanations: visualizing network insecurities

Reviewer 1

The task is novel, and the methodology builds on existing work on computing attribution maps via Taylor series approximation to compute ambiguity maps. The use and analysis of the effects of hardness scores are interesting. The work has a clear motivation (I like Fig 1), uses sound methodology, and presents relevant empirical experiments, so it meets the requirements for a NeurIPS-like conference. Overall, I find the paper clear enough to understand and the figures useful, especially the landing figure and the output figures. However, I think there are many easy fixes that would make the paper much more readable. Here are some suggestions (in chronological order):

p. Choose a different visual illusion or remove it: I like the idea of using a visual illusion because it explains the task. But the current illusion is not very convincing because, to me, the individual image regions are not ambiguous, but the entire image is. The bird example is actually pretty convincing and real.

q. Line 72: "This observation is quantified by the introduction of a procedure to measure the alignment between network insecurities and attribute ambiguity, for datasets annotated with attributes." This sentence is not clear at all.

r. From the introduction, it is not clear why we care about difficulty scores for generating deliberative explanations.

s. Make the definition of deliberative explanation more visible: The most important part of the beginning of Section 3 is the definition of deliberative explanations, and in your current writeup, the definition is a tiny sentence at the end of the second paragraph. Also, the first paragraph can easily be made more concise to make room for it.

t. It is not clear what the green dots in the figures are and why they are only available for the bird classification dataset.

u. Line 70: "For example, in Figure 1, insecurity 4 is caused by the presence of “black legs” ..." How do you know it is not caused by the presence of the concept “two pointy sticks”?

Reviewer 2

I think that this is a good and well-written paper. Section 2 has a relevant review of related approaches. The arguments are fairly easy to follow in most places and the notation is neat. However, Sections 3.1, 3.2, and 3.3 are quite compressed, and I could not understand the details in depth even though the notation in those sections is very good. The examples in Fig. 1 (right) are very useful and encouraging, but the authors could clarify whether those particular examples were generated by their methods. In Table 1, third section, the improvement of the second-order methods is rather small (small effect size). It "always outperforms", as the authors say, but I am not sure that this is a clear improvement. Do other results also confirm the improvements due to the second-order method?

Reviewer 3

Strengths (beyond the points mentioned in the contribution):

1. This paper establishes a new framework for interpretability evaluation. The proposed method extends interpretability mining to the granularity level of attribution and enhances it with ambiguity class pairs.

2. This paper is well-organized. The authors provide motivation with detailed examples and mathematical formulations to quantify the concept of insecurities.

3. This paper has a rigorous experimental design and a convincing evaluation standard. The comparison of different formulations of the difficulty score is particularly relevant to this field of research, since they reflect different rationales for quantifying ambiguity.

Weakness: A limitation of this method is that it needs to be trained on datasets that have strong feature annotations. Most classification datasets lack such annotations. A reasonable next step of research could focus on the automatic discovery and proposal of features that are relevant for the downstream tasks.

Paper ID:	789
Title:	Deliberative Explanations: visualizing network insecurities
