Paper ID: | 3730 |
---|---|

Title: | Differentiable Ranking and Sorting using Optimal Transport |

Post-response comments: In light of great enthusiasm from other reviewers, I see no reason to stand in the way of acceptance. I was already leaning positive on this paper, so will bump my rating to a 7. I was glad that the authors provided more convincing experimental results in their response, and that they agreed to provide more examples to accompany their verbal descriptions and equations. These will help to prevent obfuscation of the simple and attractive generalization that they present in the paper. Related to this: I think they misinterpreted my point here, and suggested a change to the introduction that I don't think is necessary. I was fine with their introduction as is. I think that grasping the basic idea of their generalization is well within the capability of the average NIPS reader, if presented in a concrete manner. Lastly, I might suggest that they briefly add notational notes on their use of \circ for elementwise multiplication and vector notation for permutations. ============================================= I found the paper to be a nice theoretical discussion of generalized sorting, quantiles, and CDFs. It includes some experiments that show its applicability, but it was not clear to me that these were effective uses of the concepts constructed in the first part of the work. I'm interested to hear other opinions on this. As such, I tend very slightly towards acceptance, with a recommendation to work on finding more effective uses for the tools presented, if rejection is the result. More below: Originality: I think the basic idea of generalizing sorting is quite fundamental and interesting to consider, so I quite liked this part. As acknowledged by the authors, there seem to be a few other works that have targeted differentiability of related concepts via Sinkhorn, so I'm less certain on the degree of novelty for this portion. The applications did not seem overly original. Lastly, I look forward to hearing the perspective of others since I'm not intimately familiar with works [1,11,15,16]. Quality: The theoretical portion is quite complete, but I did not feel like the experimental results were terribly supportive of its efficacy in the suggested uses. Clarity: The paper was mostly clear, but I found the notation to be a little confounding. The use of function composition as elementwise product in Def. 1 was a particularly unfortunate use, IMO. The greatest contribution to my understanding of the concept was given by the single example of Fig. 1, so I applaud its inclusion. Further small notes in the Improvements section later. Significance: As noted, high on the theoretical front, low on the experimental front, as far as I can tell.

# Basic idea is cool I think the basic idea is to do stuff like this: def compute_differentiable_quantile(Xs, quant, epsilon, n_iterations, metric): P = sinkorn(Xs, [0,1,2 ..., len(Xs)], epsilon, n_iterations,metric) return (Xs @ P) [int(quant * (len(Xs)-1))] Note that this is differentiable w.r.t X. What's more, as long as the metric is translation-invariant, it turns out you get exact quantile computation in the limit as epsilon ->0 and n_iterations -> infinity. I suppose as epsilon -> infinity you'll get the mean of the Xs. This is cool. I don't think you needed to motivate it as much as you did, first describing the exact OT and proving the connection to quantiles and then loosening it to sinkorn. Instead, if you just write the procedure down and then say "hey look this is a quantile when epsilon -> 0" and prove in the appendix, everyone will be just as happy. I actually found the motivation distracting, I'm afraid. Anyway, idea is cool. # A question left undiscussed So your set \mathbb{O}_n seems really important. In the exact OT case I see that it doesn't matter, but in Sinkhorn land it seems like your choice could be really important. Maybe I'm missing something, but I'd like to see this discussed. # Results Not great, but that's ok. This thing seems so general that its got to have a use somewhere.

After reading the feedback from the authors: I am pleased to see that the authors have already managed to obtain much more convincing numerical results. I maintain my rating, and look forward to other applications of the Sinkhorn sorting operator in the future. == The article starts with an equivalence between sorting and an optimal transport problem. It leverages this equivalence to introduce a new generalized sorting operator, called Kantorovich sorting, and an entropy-regularized version called Sinkhorn sorting. The latter is special in that it corresponds to a differentiable function of its input vector, as opposed to standard sorting. Consequently some uses of sorting operators in machine learning can then benefit from gradients, which would be unavailable with the standard sorting operator. The paper introduces new fundamental objects, Kantorovich and Sinkhorn sorting operators (as well as CDFs and quantile functions), in a very insightful and elegant presentation (despite a number of typos, see below). Applications to a variety of tasks, such as top-k loss, quantile regression, or soft quantile normalization, help to understand the potential use for machine learning. Nevertheless, the main contribution is of a conceptual nature, and while it is hard to anticipate where exactly the proposed objects will be most useful, they are clearly of interest on their own right, and very innovative. The numerical experiments are a nice addition but it seems clear that the most fruitful applications of the new operators are yet to be found. There are also some open questions about the choice of "m" and of the regularization parameter. Overall I find the proposed manuscript to be very exciting. It is very original and very clear thanks to examples and diagrams. It can have far-reaching implications for a vast number of tasks in machine learning, despite the relatively modest advantages illustrated in the numerical experiments. There are a number of typos that can be easily fixed.