NeurIPS 2020

Deep Metric Learning with Spherical Embedding


Meta Review

This paper points out a widespread problem with angular losses and proposes a simple, elegant scheme to address it (regularizing each embedding to lie on a shell), obtaining moderate but consistent improvements across a range of problem settings and datasets. As pointed out by Reviewer 5, the majority of the theoretical results were already known from Section 3.3 of "Heated-Up Softmax Embedding" (2018, unpublished, https://arxiv.org/abs/1809.04157 ). That paper, however, did not really propose a solution to the problem; it merely noted its existence. Reviewer 5 also complains that the interaction with the Adam optimizer is under-explored in this work. Reviewer 3 points out that previous work, e.g. "Improved Deep Metric Learning with Multi-class N-pair Loss Objective," also regularized the L2 norm of embedding vectors (towards 0; see their Section 3.2.2). Table 2 in your rebuttal, however, shows clearly that this yields substantially worse performance than your proposal of regularizing towards the average norm, on one particular problem. Reviewer 4's primary remaining complaint after the rebuttal is that the proposed SEC scheme is too simple. In a case like this, where there are dozens of papers that might have benefited from this scheme but (so far as any of us know) have not invented it, I view this not as a disadvantage but as a benefit, since it will be trivially easy for future work using angular loss functions to employ SEC. Thus, I am left to agree with Reviewer 2, who views this work as "a novel method that can be easily incorporated to several tasks and domains with clear explanation and good evidence to support it," and therefore recommend acceptance.

Some changes are necessary in the camera-ready version:

- First, you need to cite "Heated-Up Softmax Embedding" in your theory section (and introduction) as having made most of the same observations in the past. The novel contribution of this paper to the literature, then, is primarily the SEC method and the demonstration that it works reliably in several settings.
- The various changes to the discussion, clarifications, additional experiments, etc. mentioned in the rebuttal will all improve the paper. In particular, some more discussion (and, ideally, experimentation) of the situation with the common Adam optimizer would be helpful. (Incidentally, I also found the terminology "vertical to" quite unusual; the standard term would be "perpendicular" or "orthogonal.")
- In addition, I would strongly recommend adding at least a subset of the mu=0 methods considered in the rebuttal's Table 2 to all of your experiments in the paper, to demonstrate more convincingly that regularizing towards the mean norm is much superior to regularizing towards zero in a variety of settings.
- Finally, one of the main application areas for SEC is facial recognition, a (rightfully) extremely controversial area; see, e.g., https://dl.acm.org/doi/10.1145/3313129 or https://en.wikipedia.org/wiki/Facial_recognition_system#Controversies . This is not by any means grounds to reject your paper. But seeing a paper significantly about facial recognition claim that discussions of broader impacts on society and ethical considerations are "not applicable for our work" reduces public trust that machine learning researchers are responsible members of society who, say, can be trusted to operate without onerous regulation. You must update the Broader Impacts section to note that, although facial recognition is quite controversial as a technology, there is no reason to expect that SEC's mild improvement to facial recognition performance will make any substantial difference to its societal application, nor is it expected to exacerbate, e.g., its racial imbalances. (In fact, it would be interesting to study, though perhaps beyond the scope of this paper, whether the unregularized embedding norms end up correlated with various demographic attributes, in which case SEC might even improve fairness in these systems.)
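For readers unfamiliar with the two regularizers contrasted above, the following minimal sketch illustrates the distinction. It is a hypothetical NumPy reimplementation based solely on this review's description (penalizing each embedding's L2 norm's deviation from the batch-average norm, versus penalizing the norm itself toward zero, as in the N-pair loss paper); it is not the authors' code, and the function names are invented here.

```python
import numpy as np

def sec_regularizer(embeddings):
    """Sketch of the SEC idea as described in this review: push each
    embedding's L2 norm toward the batch-average norm, so that all
    embeddings lie approximately on a common shell.

    embeddings: (batch, dim) array. Returns a scalar penalty.
    """
    norms = np.linalg.norm(embeddings, axis=1)  # per-sample L2 norms
    mu = norms.mean()                           # shell radius = batch mean norm
    return np.mean((norms - mu) ** 2)           # squared deviation from the shell

def l2_regularizer(embeddings):
    """Baseline (mu = 0): regularize embedding norms toward zero,
    as in the N-pair loss paper's Section 3.2.2."""
    return np.mean(np.sum(embeddings ** 2, axis=1))
```

Note that `sec_regularizer` vanishes whenever all embeddings already share the same norm, regardless of what that norm is, whereas `l2_regularizer` always shrinks the embeddings toward the origin; this is the qualitative difference the rebuttal's Table 2 quantifies.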