Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

*Marco Cuturi, Olivier Teboul, Jean-Philippe Vert*

Sorting is used pervasively in machine learning, either to define elementary algorithms, such as $k$-nearest neighbors ($k$-NN) rules, or to define test-time metrics, such as top-$k$ classification accuracy or ranking losses. Sorting is however a poor match for the end-to-end, automatically differentiable pipelines of deep learning. Indeed, sorting procedures output two vectors, neither of which is differentiable: the vector of sorted values is piecewise linear, while the sorting permutation itself (or its inverse, the vector of ranks) has no differentiable properties to speak of, since it is integer-valued. We propose in this paper to replace the usual \texttt{sort} procedure with a differentiable proxy. Our proxy builds upon the fact that sorting can be seen as an optimal assignment problem, one in which the $n$ values to be sorted are matched to an \emph{auxiliary} probability measure supported on any \emph{increasing} family of $n$ target values. From this observation, we propose extended rank and sort operators by considering optimal transport (OT) problems (the natural relaxation for assignments) where the auxiliary measure can be any weighted measure supported on $m$ increasing values, where $m \ne n$. We recover differentiable operators by regularizing these OT problems with an entropic penalty, and solve them by applying Sinkhorn iterations. Using these smoothed rank and sort operators, we propose differentiable proxies for the classification 0/1 loss as well as for the quantile regression loss.

Do not remove: This comment is monitored to verify that the site is working properly