Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)



Ehsan Amid, Manfred K. Warmuth, Rohan Anil, Tomer Koren


We introduce a temperature into the exponential function and replace the softmax output layer of neural networks with a high-temperature generalization. Similarly, the logarithm in the loss we use for training is replaced by a low-temperature logarithm. By tuning the two temperatures, we create loss functions that are already non-convex in the single-layer case. When the last layer of a neural network is replaced by our bi-temperature generalization of the logistic loss, training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large datasets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method that uses the Tsallis divergence.
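
As an illustration of the two tempered functions described above, here is a minimal NumPy sketch assuming the standard definitions log_t(x) = (x^(1-t) - 1)/(1 - t) and exp_t(x) = [1 + (1 - t)x]_+^(1/(1-t)), with the tempered-softmax normalizer found by fixed-point iteration in the heavy-tail regime t2 > 1. The function names, iteration count, and epsilon are illustrative choices, not the authors' reference implementation:

```python
import numpy as np

def log_t(x, t):
    # Tempered logarithm: reduces to log(x) as t -> 1.
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    # Tempered exponential: reduces to exp(x) as t -> 1;
    # heavy-tailed for t > 1.
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def tempered_softmax(activations, t, num_iters=30):
    # High-temperature softmax replacement: probs_i = exp_t(a_i - lambda),
    # with the normalizer lambda found by fixed-point iteration
    # (this simple scheme assumes t > 1).
    if t == 1.0:
        e = np.exp(activations - activations.max())
        return e / e.sum()
    mu = activations.max()
    a_tilde = activations - mu
    for _ in range(num_iters):
        z = exp_t(a_tilde, t).sum()
        a_tilde = z ** (1.0 - t) * (activations - mu)
    probs = exp_t(a_tilde, t)
    return probs / probs.sum()

def bi_tempered_logistic_loss(activations, labels, t1, t2):
    # Bi-tempered loss: a low-temperature log (t1 < 1) bounds the loss,
    # a high-temperature softmax (t2 > 1) gives it a heavy tail.
    probs = tempered_softmax(activations, t2)
    eps = 1e-10  # keeps log_t finite at zero label entries when t1 = 1
    return np.sum(
        labels * (log_t(labels + eps, t1) - log_t(probs, t1))
        - (labels ** (2.0 - t1) - probs ** (2.0 - t1)) / (2.0 - t1)
    )

# Usage: a one-hot label with the temperature pair (t1, t2) = (0.8, 1.2).
activations = np.array([2.0, -1.0, 0.5])
labels = np.array([1.0, 0.0, 0.0])
print(bi_tempered_logistic_loss(activations, labels, t1=0.8, t2=1.2))
```

Setting t1 = t2 = 1 recovers the ordinary logistic loss (softmax cross-entropy): the tempered softmax becomes the standard softmax, and the extra power terms in the sum cancel because both the labels and the probabilities sum to one.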