NeurIPS 2020

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

Meta Review

Three out of four reviewers accepted this work as original and as offering convincing experiments. However, a knowledgeable reviewer (R4) issued a clear reject. The ensuing discussion over the reasons for the rejection shows that the meta-reviewer agrees with R4's concerns, but also that the debate this paper triggers may make it worth publishing.

This paper offers two clearly distinct algorithms:
- one, based on Gaussian Processes (GP), builds a loss in which the distance between an example and the training data in the last hidden layer is taken into account for OOD modelling;
- one, based on Spectral Normalization (SN), better ties the distance in the hidden space to the distance in the input space. This is justified by Lipschitz bounds that seem very loose.

The objections raised by R4, and also hinted at by other reviewers, are serious: in a deep learning architecture, since the input data lives on a low-dimensional manifold, there is no reason for a distance that is not aware of this manifold to be meaningful (except locally, as has been shown in adversarial learning). Many distance-based methods for OOD detection look at the activations in the penultimate layer and do not justify this through a mapping back to the input space. However, experiments reported in the appendix (Section C.2, Table 6) show that the SN algorithm is essential for performance. While I agree with R4 that this algorithm is unlikely to be properly justified by the loose Lipschitz bounds, the authors may, as with Batch Normalization, have stumbled upon a very powerful algorithm with an unsatisfactory explanation. Acting in the opposite direction to Batch Normalization, this algorithm seems to reduce the range of weights and activations and to improve calibration.
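For concreteness, the SN step under discussion can be sketched as follows. This is an illustrative implementation of standard spectral normalization via power iteration (function name, defaults, and the `norm_bound` parameter are mine, not taken from the paper): bounding each weight matrix's largest singular value bounds the Lipschitz constant of the layer, which is the mechanism by which SN ties hidden-space distances to input-space distances.

```python
import numpy as np

def spectral_normalize(W, n_power_iter=50, norm_bound=1.0, eps=1e-12):
    """Rescale W so its largest singular value is at most `norm_bound`.

    The spectral norm is estimated by power iteration. Since
    ||W x - W y|| <= sigma_max(W) * ||x - y||, capping sigma_max caps
    the Lipschitz constant of the linear map x -> W @ x.
    """
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_power_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v  # estimated largest singular value
    if sigma > norm_bound:
        W = W * (norm_bound / sigma)
    return W

# Usage: a random dense layer before and after normalization.
W = np.random.default_rng(1).normal(size=(64, 32))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W, 2), np.linalg.norm(W_sn, 2))
```

In the paper's setting the residual structure matters as well: with spectrally normalized residual blocks, the loose Lipschitz bounds the reviewers criticize come from composing these per-layer constants across the whole network.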