Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Janine Thoma, Danda Pani Paudel, Luc V. Gool
Localization by image retrieval is inexpensive and scalable due to its simple mapping and matching techniques. Such localization, however, depends on the quality of the image features, which are often obtained using contrastive learning frameworks. Most contrastive learning strategies learn features that distinguish different classes. In the context of localization, however, there is no natural definition of classes. Therefore, images are usually divided artificially into positive and negative classes with respect to the chosen anchor images, based on some geometric proximity measure. In this paper, we show why such divisions are problematic for learning localization features. We argue that any artificial division based on a proximity measure is undesirable, because the supervision is inherently ambiguous for images near the proximity threshold. To this end, we propose a novel technique that uses soft positive/negative assignments of images for contrastive learning, avoiding the aforementioned problem. Our soft assignment makes a gradual distinction between close and far images in both geometric and feature spaces. Experiments on four large-scale benchmark datasets demonstrate the superiority of the proposed soft contrastive learning over the state-of-the-art method for retrieval-based visual localization.
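The threshold ambiguity described above can be illustrated with a minimal sketch. It is not the loss proposed in the paper: the logistic weighting, the function names, and the parameters `threshold`, `softness`, and `margin` are all assumptions chosen for illustration, and only the geometric distance is softened here, whereas the abstract describes a gradual distinction in both geometric and feature spaces.

```python
import math


def hard_assignment(geo_dist, threshold=10.0):
    """Binary label: 1.0 (positive) below the threshold, 0.0 (negative) otherwise."""
    return 1.0 if geo_dist < threshold else 0.0


def soft_assignment(geo_dist, threshold=10.0, softness=2.0):
    """Hypothetical soft label: a logistic falloff centred on the threshold.

    Values near 1.0 mean "close to the anchor", values near 0.0 mean "far".
    """
    return 1.0 / (1.0 + math.exp((geo_dist - threshold) / softness))


def contrastive_loss(feat_dist, positive_weight, margin=1.0):
    """Classic pairwise contrastive loss, blended by a weight in [0, 1].

    With a weight of exactly 0 or 1 this reduces to the usual hard
    negative or positive term; intermediate weights mix the two.
    """
    pull = positive_weight * feat_dist ** 2
    push = (1.0 - positive_weight) * max(0.0, margin - feat_dist) ** 2
    return pull + push


if __name__ == "__main__":
    # Two images whose geometric distance to the anchor differs only slightly,
    # on either side of the assumed threshold of 10.0.
    for geo_dist in (9.9, 10.1):
        hard = hard_assignment(geo_dist)
        soft = soft_assignment(geo_dist)
        print(
            f"geo_dist={geo_dist}: "
            f"hard={hard:.1f} loss={contrastive_loss(0.2, hard):.4f} | "
            f"soft={soft:.4f} loss={contrastive_loss(0.2, soft):.4f}"
        )
```

Under the hard assignment, the two nearly identical images receive opposite labels and therefore opposite training signals. Under the soft weighting, their labels and losses differ only slightly.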