Reviews: On the (In)fidelity and Sensitivity of Explanations

Please see above for my summary of the paper's main contributions.
Originality: Incremental overall.
- Infidelity: The main innovation in Definition 2.1 appears to be taking the mean square error over a distribution of perturbations. While it is nice that the optimal explanation can be found as in Proposition 2.1, it is the straightforward solution of a quadratic minimization.
- Sensitivity: Definition 3.1 is closely related to local Lipschitz continuity (4) and is difficult to claim as original since it is a rather basic notion.
In Section 4, the idea of smoothing explanations is not novel but the analysis leading to Theorems 4.1 and 4.2 is novel.
Quality:
- Below Proposition 2.1 and also in the abstract and introduction, it is claimed that the "smoothing operation [is] reminiscent of SmoothGrad". I think this claim is vague and needs justification.
- Since the explanation function Phi(x) is vector-valued, \nabla_x Phi should be a matrix of partial derivatives. However before (3), the text states "norm of the sensitivity vector: || \nabla_x Phi(f(x)) ||". I think a matrix norm is required and the wording needs to change.
- Section 5: It would be good for completeness to also consider the classes of perturbations in Section 2.3 to verify the claims of optimality (or at least near-optimality) in that section.
- Appendix B.3: Figure 6 suggests that a good value for the SmoothGrad radius is around 1 and that the value of 0.2 mentioned in Appendix B.1 is too small. Increasing the radius may yield a fairer comparison.
Significance:
- I think that Section 5 presents compelling evidence, particularly of the benefit of optimizing explanations for the application, and also of smoothing explanations. In Table 1, the infidelity numbers for the optimal methods are much lower. This appears to translate into qualitatively better visualizations in Figures 1 and 2 and higher human accuracy in Table 3. Figure 4 is particularly striking.
- Proposition 2.1: I think more attention should be paid to the non-invertible case to be more useful. Is it true for example that if I is deterministic, the integral of I I^T is rank-one and cannot be inverted?
- The motivation for Definition 3.1 and its advantages over e.g. local Lipschitz continuity (4) are unclear. Lines 211-212 state that "the main attraction ... is that it can be robustly estimated via Monte-Carlo sampling" but I don't see why this is not true for other sensitivity measures.
- While I think Theorem 4.2 is a sound result, I would have liked to see it taken further to have more impact. Can the constants C_1 and C_2 be estimated in practice? Since smoothing with a Gaussian kernel has already been proposed in the form of SmoothGrad, what is the benefit of introducing a more general kernel? For example, how does the choice of kernel affect sensitivity and fidelity?
Clarity:
- The writing is a little repetitive overall. As one example, I think the point about simultaneously reducing sensitivity and infidelity is repeated and emphasized in too many places. As another example, lines 65-72 in the introduction could be more focused.
- In the introduction, "significant" versus "insignificant" perturbations are never really defined. I think these terms are confusing since the perturbations are conceptually different: explanations should capture changes in the prediction under perturbations of interest to the test point, while the explanations themselves should not be too sensitive to perturbations of the test point.
- Line 105: Should "0" be a more general baseline value?
- Line 116: A different symbol is needed for the domain of I.
- Line 148: Does the circle with a dot inside mean element-wise product?
- Proposition 2.3: Should all basis vectors e_i be considered to get all components of the gradient, not just individual partial derivatives?
- Line 191: I do not understand the modification.
- Section 3: The notation for the explanation function is not consistent. Phi(f, x), Phi(f(x)), Phi(x) are all used.
- Theorem 4.1: I believe it is assumed that the kernel k(x,z) is non-negative. While this might be implied by the term "kernel", I think it's clearer to make such assumptions explicit.
- Section 5: Pointers to specific sections in the appendix are needed as it is hard to navigate on its own. It would be better if some of the details in the Setup paragraph could be brought into the main text, e.g. number of images, baseline value and magnitude of noise.
- Appendix B.1: Why does the optimal solution require so many (20,000) Monte Carlo samples?
- Appendix B.1: Does "square size from 1x1 to 10x10", etc. mean that the perturbation square size is drawn uniformly from 1 to 10?
*** Response to Author Feedback ***
I thank the authors in particular for their responses regarding 1) additional "sanity check" experiments, 2) tuning the SmoothGrad radius, 3) the relationship between SmoothGrad and Proposition 2.1, and 4) the non-invertible case in Proposition 2.1. I encourage the authors to add as many of these results and clarifications to the main paper as possible. Regarding 4), my quibble is that Proposition 2.2 as written states that the completeness axiom is a necessary condition for optimal infidelity but not necessarily sufficient. It would be good to check that the two are indeed equivalent.
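
The question raised above about the non-invertible case in Proposition 2.1 can be illustrated with a minimal sketch. It assumes that infidelity is the expected squared error between I^T Phi and f(x) - f(x - I), so that the minimizer solves a least-squares problem involving E[I I^T]. The toy linear model `f`, the Gaussian perturbations, the zero baseline, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def output_differences(f, x, perturbations):
    """f(x) - f(x - I) for each sampled perturbation I (one per row)."""
    return np.array([f(x) - f(x - I) for I in perturbations])

def infidelity(phi, f, x, perturbations):
    """Monte Carlo estimate of E_I[(I^T phi - (f(x) - f(x - I)))^2]."""
    residual = perturbations @ phi - output_differences(f, x, perturbations)
    return float(np.mean(residual ** 2))

def least_squares_explanation(f, x, perturbations):
    """Minimizer of the sampled infidelity.

    lstsq returns the minimum-norm solution, so it stays defined even
    when the sample estimate of E[I I^T] is singular.
    """
    diffs = output_differences(f, x, perturbations)
    phi, *_ = np.linalg.lstsq(perturbations, diffs, rcond=None)
    return phi

def second_moment(perturbations):
    """Sample estimate of E[I I^T]."""
    return perturbations.T @ perturbations / len(perturbations)

rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=d)
f = lambda z: float(w @ z)      # toy linear model (assumption)
x = rng.normal(size=d)

# Random perturbations: the second-moment matrix is full rank.
I_rand = rng.normal(size=(2000, d))
phi_rand = least_squares_explanation(f, x, I_rand)
rank_rand = np.linalg.matrix_rank(second_moment(I_rand), tol=1e-8)

# Deterministic perturbation I = x - baseline with baseline 0:
# a point mass, so E[I I^T] = x x^T, which has rank one.
I_det = x[None, :]
phi_det = least_squares_explanation(f, x, I_det)
rank_det = np.linalg.matrix_rank(second_moment(I_det), tol=1e-8)

print("rank (random I):", rank_rand, "| rank (deterministic I):", rank_det)
print("infidelity of minimum-norm solution:", infidelity(phi_det, f, x, I_det))
```

In this sketch the deterministic case yields a rank-one matrix, so the inverse does not exist and the minimizer is not unique; the minimum-norm solution is only one of many explanations attaining zero sampled infidelity.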

I understand that Definition 2.1 is new, but this is not entirely clear to me. The authors should explain what is new in Definition 2.1. Has it been considered in the literature in the past? The implications of Def. 2.1 presented in Sections 2.2 and 2.3 are strong and interesting.
One of the key contributions of the paper addresses the question of how to modify explanations to improve sensitivity and fidelity. The authors show that they can improve both sensitivity and fidelity simultaneously. In order to judge the significance of this contribution, I would expect the authors to clearly explain the differences between fidelity and sensitivity. Fidelity is based on the effect of perturbations, whereas in lines 58-59 the authors say "It is natural to wish for our explanation to not be too sensitive, since that would entail differing explanations with minor variations in the input (or prediction values)". It seems clear to me that fidelity and sensitivity are highly correlated, and I am not sure why it is important to show that sensitivity and fidelity can be improved simultaneously. Sensitivity seems to be a weaker metric because it does not use information about the outputs when the inputs are perturbed, whereas infidelity uses the error. So it seems somewhat trivial that infidelity, which has both perturbations and access to the outputs, is consistent with sensitivity, which knows about perturbations but ignores the outputs. I believe that if the authors could explain the difference between sensitivity and fidelity, it would be possible to see the importance of this contribution.
It seems to me that the idea of smoothing explanations is related to "ensembles" in machine learning. If so, the authors should mention this fact clearly and state that their explanations are in fact ensemble explanations.
Although I am familiar with explainability in machine learning, my knowledge of metrics to evaluate explanations is limited.
For this reason, my questions above ask for some clarifications. The experimental results are very comprehensive, and they indicate that the methods proposed in this paper can generate explanations that are useful both with respect to a subjective evaluation by humans and quantitative metrics. I have to admit that the empirical evaluation is strong. The authors could proofread the paper to fix small errors, e.g., "to to" in line 96.
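
The two measures discussed in this review can be made concrete with a minimal sketch. It assumes that max-sensitivity is the largest change in the explanation over an L2 ball of radius r around the test point, and that infidelity is the expected squared error between I^T Phi and f(x) - f(x - I). The toy function, the gradient explanation, the sampling scheme, and all names are hypothetical choices for illustration.

```python
import numpy as np

def sample_ball(rng, x, r, n):
    """Draw n points uniformly from the L2 ball of radius r around x."""
    d = x.shape[0]
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = r * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return x + radii * directions

def max_sensitivity(explain, x, r, n, rng):
    """Sampled lower bound on max over ||y - x|| <= r of ||explain(y) - explain(x)||.

    Only the explanation is queried; model outputs are never compared.
    """
    base = explain(x)
    return float(max(np.linalg.norm(explain(y) - base)
                     for y in sample_ball(rng, x, r, n)))

def infidelity(explain, f, x, perturbations):
    """Monte Carlo estimate of E_I[(I^T explain(x) - (f(x) - f(x - I)))^2].

    Here the model f is evaluated at every perturbed input.
    """
    phi = explain(x)
    diffs = np.array([f(x) - f(x - I) for I in perturbations])
    return float(np.mean((perturbations @ phi - diffs) ** 2))

# Toy setting (assumption): f(z) = sum(z^2) with its exact gradient as explanation.
f = lambda z: float(np.sum(z ** 2))
grad_explain = lambda z: 2.0 * z

rng = np.random.default_rng(0)
x = np.ones(3)
sens = max_sensitivity(grad_explain, x, r=0.1, n=500, rng=rng)
infd = infidelity(grad_explain, f, x, 0.1 * rng.normal(size=(500, 3)))

print("sampled max-sensitivity:", sens)
print("sampled infidelity:", infd)
```

In this sketch the sensitivity estimate depends only on how the explanation changes between nearby inputs, while the infidelity estimate compares the explanation against changes in the model output.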

Paper ID:	5866
