Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Mathieu Blondel, Felipe Llinares-Lopez, Robert Dadashi, Leonard Hussenot, Matthieu Geist
Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function, typically parametrized by a neural network. This allows one to capture potentially complex relationships between inputs andoutputs.To learn the parameters of the energy function, the solution to thatoptimization problem is typically fed into a loss function.The key challenge for training energy networks lies in computing loss gradients,as this typically requires argmin/argmax differentiation.In this paper, building upon a generalized notion of conjugate function,which replaces the usual bilinear pairing with a general energy function,we propose generalized Fenchel-Young losses, a natural loss construction forlearning energy networks. Our losses enjoy many desirable properties and theirgradients can be computed efficiently without argmin/argmax differentiation.We also prove the calibration of their excess risk in the case of linear-concaveenergies. We demonstrate our losses on multilabel classification and imitation learning tasks.