Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Yuanhao Wang, Jian Li
This paper studies minimax optimization problems $\min_\x \max_\y f(\x,\y)$, where $f(\x,\y)$ is $m_\x$-strongly convex with respect to $\x$, $m_\y$-strongly concave with respect to $\y$ and $(L_\x,L_{\x\y},L_\y)$-smooth. Zhang et al. \cite{zhang2019lower} provided the following lower bound of the gradient complexity for any first-order method: $\Omega\Bigl(\sqrt{\frac{L_\x}{m_\x}+\frac{L_{\x\y}^2}{m_\x m_\y}+\frac{L_\y}{m_\y}}\ln(1/\epsilon)\Bigr).$ This paper proposes a new algorithm and proved a gradient complexity bound of $\Tilde{O}\Bigl(\sqrt{\frac{L_\x}{m_\x}+\frac{L\cdot L_{\x\y}}{m_\x m_\y}+\frac{L_\y}{m_\y}}\ln\left(1/\epsilon\right)\Bigr),$ where $L=\max\{L_\x,L_{\x\y},L_\y\}$. This improves over the best known upper bound $\Tilde{O}\left(\sqrt{\nicefrac{L^2}{m_\x m_\y}} \ln^3\left(1/\epsilon\right)\right)$ by Lin et al. \cite{lin2020near}. Our bound achieves linear convergence rate and tighter dependency on condition numbers, especially when $L_{\x\y}\ll L$ (i.e., the weak interaction regime). Via simple reduction, our new bound also implies improved bounds for strongly convex-concave problems and convex-concave problems. When $f$ is quadratic, we can further improve the bound to $O\Bigl(\sqrt{\frac{L_\x}{m_\x}+\frac{L_{\x\y}^2}{m_\x m_\y}+\frac{L_\y}{m_\y}}\left(\frac{L^2}{m_\x m_\y}\right)^{o(1)}\ln(1/\epsilon)\Bigr)$, which matches the lower bound up to a sub-polynomial factor.