Sunday, December 8 through Saturday, December 14, 2019, at the Vancouver Convention Center
This is an interesting and very well-written paper. The observation that previous methods (such as extragradient) fail in the stochastic setting for minimax problems is novel and important. The main algorithm (SVRE) aims to address this failure. My major comment is that the theoretical guarantees given in the paper require strong assumptions; for example, the assumptions do not hold even for the counterexample provided in Section 2.2. Can the authors find a counterexample that satisfies Assumption 1 (i.e., both L^G and L^D are strongly monotone)? Also, can the authors provide a theoretical guarantee for the case where the functions are non-convex?
Overall, I found this paper interesting and well written. It was somewhat surprising to me that stochastic extragradient does not converge in cases where batch extragradient does, and the authors did a nice job of presenting the counterexample and explaining why this happens. The proposed algorithm, SVRE, appears to work fairly well in practice, and the main theorem (Thm. 2) appears to be correct and does not rely on excessively restrictive assumptions. I expect these results to be of interest to the community, in part because the paper combines two timely topics: variance reduction and optimization of games. In the description of variance-reduced gradient methods on page 4, I think you could have explained a little more about the work of Palaniappan and Bach and why it is a natural model for your own. In particular, what are the main differences, and how does the analysis differ? I would also have been curious to see a comparison of the IS (or a comparable metric) as a function of wall-clock time for BatchE, in addition to SE. Is there a reason this was not included? More generally, the authors did not seem to spend as much time or space evaluating the computational tradeoffs between the algorithms as a function of wall-clock time, which I think is ultimately the more important metric (as opposed to, for instance, the number of mini-batches). As a minor comment, there are some odd grammatical errors, particularly in the introduction; it may be worth proofreading that section.
In this paper, the authors first investigate the interplay between noise and multi-objective problems in the context of GAN training, and then propose a new method, "stochastic variance reduced extragradient" (SVRE), which combines SVRG estimates of the gradient with the extragradient method (EG). This method succeeds in reducing noise in GAN training and improves upon the best convergence rates.
Advantages: The paper shows, with a motivating example, that noise can make stochastic extragradient fail. SVRE combines the advantages of the stochastic variance reduced gradient method (SVRG) and the extragradient method (EG). Experimentally, it effectively reduces the noise in GAN training. As shown in the experiments (Table 2), it can improve state-of-the-art deep models in the late stage of their optimization.
Disadvantages: In Table 2, SVRE has the worst performance on CIFAR-10. The authors give no explanation for this; they only show that WS-SVRE, which applies SVRE starting from an iterate produced by another method, performs best. It would be better if the reason for this were explored carefully.
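To make the combination described in this review concrete, here is a minimal sketch of an SVRE-style update: SVRG-corrected operator estimates plugged into both the extrapolation and the update step of extragradient. It is an illustration under stated assumptions, not the authors' implementation. The game is hypothetical: a finite sum of components f_i(x, y) = (a/2)x^2 + b_i x y - (a/2)y^2 + c_i x - d_i y, which is strongly monotone for a > 0. The names `make_game`, `full_operator`, and `svre`, the step size, and the fixed-length epochs are all assumptions; the paper's algorithm may refresh its snapshot differently.

```python
import random

# Hypothetical finite-sum game on w = (x, y). Component i has the game
# vector field F_i(w) = (df_i/dx, -df_i/dy) for
#   f_i(x, y) = (a/2) x^2 + b_i x y - (a/2) y^2 + c_i x - d_i y,
# which is strongly monotone for a > 0 (x minimizes, y maximizes).
def make_game(n, a=1.0, seed=0):
    rng = random.Random(seed)
    comps = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
             for _ in range(n)]

    def F_i(i, w):
        b, c, d = comps[i]
        x, y = w
        return (a * x + b * y + c, -b * x + a * y + d)

    return F_i, comps


def full_operator(F_i, n, w):
    # Full-batch operator: average of the n component operators at w.
    gx = gy = 0.0
    for i in range(n):
        fx, fy = F_i(i, w)
        gx += fx
        gy += fy
    return (gx / n, gy / n)


def svre(F_i, n, w0, lr=0.1, epochs=50, seed=1):
    rng = random.Random(seed)
    w = tuple(w0)
    for _ in range(epochs):
        # Snapshot and its full-batch operator (the SVRG anchor).
        snap = w
        mu = full_operator(F_i, n, snap)
        for _ in range(n):  # assumption: fixed epoch length of n inner steps
            # Extrapolation step with a variance-reduced estimate.
            i = rng.randrange(n)
            gi, si = F_i(i, w), F_i(i, snap)
            est = tuple(g - s + m for g, s, m in zip(gi, si, mu))
            w_half = tuple(wk - lr * ek for wk, ek in zip(w, est))
            # Update step: fresh sample, estimate taken at the extrapolated point.
            j = rng.randrange(n)
            gj, sj = F_i(j, w_half), F_i(j, snap)
            est = tuple(g - s + m for g, s, m in zip(gj, sj, mu))
            w = tuple(wk - lr * ek for wk, ek in zip(w, est))
    return w
```

For example, `F_i, _ = make_game(20)` followed by `w = svre(F_i, 20, (3.0, -2.0))` should drive `full_operator(F_i, 20, w)` close to zero. Because each estimate's noise scales with the distance from the snapshot, it shrinks as the iterates converge, which is the property that lets a variance-reduced method keep a constant step size where plain stochastic extragradient stalls at a noise floor.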