Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Originality:
- Compared to Cotter et al., the authors show that more classes of problems (i.e., rate-metric objectives and constraints, sums of ratios) can be reduced to finding a saddle point.
- In terms of the theoretical techniques for solving the problem, I am not entirely sure (due to my confusion, which I describe further below) exactly what is novel and what is not, especially compared to Cotter et al. My understanding is that even with surrogate approaches, the authors of this paper have shown that there is a way to get convergence by having both the max and min players play no-regret algorithms (with respect to the true Lagrangian and the surrogate Lagrangian, respectively). In Cotter et al., the non-zero-sum issue can be side-stepped by having one player play a no-swap-regret algorithm.

Quality:
- Overall, the paper flows nicely, with sections divided to address the different problems (rate metrics, objectives and constraints, sums of ratios). Most of the proofs in the appendix appear to be sound.
- The empirical evaluations of the algorithm in different settings and on different datasets demonstrate the practicality of the approach and should be given credit.

Clarity:
- Personally, the 'three-player' game framing seems a little misleading: rather, for the min player (i.e., the player trying to minimize the Lagrangian), the objective function is separable, so it can be minimized by minimizing L_1 and L_2 separately. Essentially, these two players trying to minimize the Lagrangian should be viewed simply as one player. Even when the surrogate functions are used, the minimizing player is trying to minimize L_1 + tilde(L)_2.
- I don't quite understand the connection to Cotter et al. In both of these papers, the difficulty arises when the constraints are non-convex and surrogate functions need to be used.
In my understanding, Cotter et al.'s approach is to provide a proxy Lagrangian (using the surrogate in place of the real constraints), where one player plays a no-regret algorithm while the other plays a no-swap-regret algorithm; the primal player minimizes a proxy Lagrangian involving the surrogate constraints, whereas the dual player maximizes the Lagrangian with respect to the true constraints. In this paper, it is shown that one can do a similar thing through tilde(L)_2 (using the surrogate in place of the real constraints), but that simply having both players play no-regret converges to an approximately optimal solution. In this sense, my understanding is that the proxy Lagrangian and tilde(L)_2 serve the same purpose, and the only difference is how the approximately optimal solution is obtained. For this reason, with my understanding (which may be wrong), I think it is more fitting to say that a different approach has been proposed than to say that an 'open problem' has been solved.

Significance:
- I think the result that the dynamics between a primal and a dual player can solve many more constrained optimization problems involving non-convex functions is significant. Their unifying approach, showing how previous works (SPADE, NEMSIS, etc.) can be re-derived and/or improved upon (Appendices B and C), is also quite nice, since in the future one can turn to this framework to solve such problems. Even the fact that these problems can be reduced to this approach deserves credit. However, the significance of the framework should be qualified by noting that some previous papers have already provided a similar framework.
- Showing that both players playing no-regret, with respect to the true constraints and the surrogate constraints respectively, attains an approximately optimal solution is of theoretical importance.
- The availability of the TensorFlow code will also be of significance.
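To make the both-players-play-no-regret dynamic concrete, here is a minimal toy sketch of my own construction (not the authors' algorithm or code; the one-dimensional problem, the hinge surrogate, and the step sizes are all my assumptions): the primal (min) player runs online gradient descent on a surrogate Lagrangian whose constraint term is differentiable, while the dual (max) player ascends the true Lagrangian built from the original non-differentiable, rate-like constraint.

```python
# Toy problem (hypothetical): minimize f(x) = (x - 2)^2 subject to the
# step ("rate-like") constraint g(x) = 1[x > 1] - 1/2 <= 0, i.e. x <= 1.

def true_g(x):
    # true constraint value, used by the dual player (non-differentiable step)
    return 0.5 if x > 1.0 else -0.5

def surrogate_g_grad(x):
    # gradient of a hinge-style surrogate max(0, x - 1), used by the primal player
    return 1.0 if x > 1.0 else 0.0

eta, T = 0.1, 2000
x, lam = 0.0, 0.0
xs = []
for _ in range(T):
    # primal no-regret (OGD) step on the surrogate Lagrangian f(x) + lam * g~(x)
    x -= eta * (2.0 * (x - 2.0) + lam * surrogate_g_grad(x))
    # dual ascent step on the true Lagrangian f(x) + lam * g(x), with lam >= 0
    lam = max(lam + eta * true_g(x), 0.0)
    xs.append(x)

# The averaged primal iterate hovers near the constrained optimum x = 1,
# even though the two players optimize different (true vs. surrogate) Lagrangians.
x_avg = sum(xs) / len(xs)
```

The point of the sketch is only the asymmetry: the dual player sees the true constraint value, so feasibility is tracked with respect to the real problem, while the primal player still gets usable gradients from the surrogate.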
Typos:
- Right after line 485, a right parenthesis is missing.
- Line 568: surrogate functions are not used in the oracle-based approach.
- Line 578: "first state a lemma that relate[s]" — should be "relates".

********** POST REBUTTAL **********
The authors have clarified the paper's relationship to Cotter et al. and said that they will add some clarifications in the main body and the appendix. My confusion about this has been addressed well, so I still vote for accept.
The paper is well-written. See answers to question 1 for comments on originality etc.
Overall, the paper provides a new and insightful approach to tackling learning problems with non-decomposable performance metrics. The results generalize some existing approaches to optimizing non-decomposable metrics and give provable algorithms for more general objective functions and constraints composed of classification rates. The authors also resolve an open question from the earlier work of Cotter et al. (ALT 2019, JMLR 2019) about using OGD on the true and surrogate Lagrangians to achieve convergence, as a positive result, albeit with a slightly worse convergence rate. The problem of optimizing non-decomposable metrics is significant, and this paper provides algorithms with theoretical guarantees for a class of such learning problems, so the results are themselves significant.