Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track
Luo Luo, Yujun Li, Cheng Chen
We study the smooth minimax optimization problem \min_{\bf x}\max_{\bf y} f({\bf x},{\bf y}), where f is \ell-smooth, strongly concave in {\bf y} but possibly nonconvex in {\bf x}. Most existing works focus on finding a first-order stationary point of the function f({\bf x},{\bf y}) or of its primal function P({\bf x})\triangleq \max_{\bf y} f({\bf x},{\bf y}); few address second-order stationary points, which are essential for nonconvex problems. In this paper, we propose a novel approach for minimax optimization, called Minimax Cubic Newton (MCN), which finds an {\mathcal O}\left(\varepsilon,\kappa^{1.5}\sqrt{\rho\varepsilon}\right)-second-order stationary point of P({\bf x}) with {\mathcal O}\left(\kappa^{1.5}\sqrt{\rho}\,\varepsilon^{-1.5}\right) calls to a second-order oracle and \tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\,\varepsilon^{-1.5}\right) calls to a first-order oracle, where \kappa is the condition number and \rho is the Lipschitz constant of the Hessian of f({\bf x},{\bf y}). In addition, we propose an inexact variant of MCN for high-dimensional problems that avoids calling the expensive second-order oracle; instead, it solves the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still finds the desired approximate second-order stationary point with high probability, while requiring only \tilde{\mathcal O}\left(\kappa^{1.5}\ell\varepsilon^{-2}\right) Hessian-vector oracle calls and \tilde{\mathcal O}\left(\kappa^{2}\sqrt{\rho}\,\varepsilon^{-1.5}\right) first-order oracle calls. To the best of our knowledge, this is the first work to establish non-asymptotic convergence to second-order stationary points for minimax problems without convex-concave assumptions.
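To make the outer loop concrete, here is a minimal sketch of a cubic-Newton iteration on the primal function P({\bf x}), in the spirit of MCN but not the paper's algorithm verbatim. The toy objective f(x, y) = sum(cos(x)) + x^T A y - (mu/2)||y||^2, the helper names grad_P, hess_P, and cubic_step, and the regularization parameter M are all illustrative assumptions; this toy admits a closed-form inner maximizer y*(x), whereas MCN approximates it with an inner solver.

```python
import numpy as np

# Hypothetical toy instance (not from the paper): f(x, y) is smooth,
# mu-strongly concave in y, and nonconvex in x:
#   f(x, y) = sum(cos(x)) + x^T A y - (mu/2) ||y||^2
# The inner maximizer is available in closed form, y*(x) = A^T x / mu,
# so the primal function is P(x) = sum(cos(x)) + ||A^T x||^2 / (2 mu).
rng = np.random.default_rng(0)
d, mu = 5, 1.0
A = rng.standard_normal((d, d)) / d

def grad_P(x):
    y_star = A.T @ x / mu                       # exact inner maximizer
    return -np.sin(x) + A @ y_star              # grad_x f(x, y*(x)) = grad P(x)

def hess_P(x):
    return np.diag(-np.cos(x)) + A @ A.T / mu   # Hessian of P at x

def cubic_step(g, H, M):
    """Global minimizer of the cubic model g^T s + 0.5 s^T H s + (M/6)||s||^3,
    found by bisection on the secular equation lam = (M/2)||s(lam)||,
    where s(lam) = -(H + lam I)^{-1} g."""
    w, Q = np.linalg.eigh(H)
    gq = Q.T @ g
    norm_s = lambda lam: np.linalg.norm(gq / (w + lam))
    lo = max(0.0, -w.min()) + 1e-12             # keep H + lam I positive definite
    hi = lo + 1.0
    while M / 2 * norm_s(hi) > hi:              # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):                        # bisection on a 1-D equation
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if M / 2 * norm_s(mid) > mid else (lo, mid)
    return -Q @ (gq / (w + lo))

x, M = rng.standard_normal(d), 10.0             # M: cubic-regularization parameter
for _ in range(30):
    x = x + cubic_step(grad_P(x), hess_P(x), M)

print("||grad P(x)||        :", np.linalg.norm(grad_P(x)))
print("lambda_min(hess P(x)):", np.linalg.eigvalsh(hess_P(x)).min())
```

On this toy instance the iterate reaches a point where the gradient of P nearly vanishes and the Hessian of P is nearly positive semidefinite, i.e. an approximate second-order stationary point in the sense above.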
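For the high-dimensional regime, the inexact variant replaces the exact cubic solve with gradient descent aided by a matrix Chebyshev expansion, touching the Hessian only through Hessian-vector products. The sketch below shows just the gradient-descent part of that idea (the Chebyshev acceleration is omitted); the name cubic_step_gd, the step size eta, and the iteration budget are assumptions chosen for illustration, not the paper's prescribed values.

```python
import numpy as np

def cubic_step_gd(g, hvp, M, eta=0.05, iters=2000, seed=1):
    """Approximately minimize m(s) = g^T s + 0.5 s^T H s + (M/6)||s||^3 by
    plain gradient descent, where H is accessed only through the
    Hessian-vector-product callback `hvp` (no explicit matrix is formed)."""
    rng = np.random.default_rng(seed)
    # A small random perturbation of the start point escapes the degenerate
    # "hard case" of the cubic subproblem with high probability.
    s = -eta * (g + 1e-3 * np.linalg.norm(g) * rng.standard_normal(g.shape))
    for _ in range(iters):
        grad_m = g + hvp(s) + (M / 2) * np.linalg.norm(s) * s  # gradient of m(s)
        s -= eta * grad_m
    return s
```

Dropped into the loop above as cubic_step_gd(grad_P(x), lambda v: hess_P(x) @ v, M), this gives a matrix-free analogue of the exact step; in a genuinely high-dimensional problem the callback would come from automatic differentiation rather than an assembled Hessian, and eta must be set below the inverse smoothness of the cubic model for the descent to be stable.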