NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 855
SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points

### Reviewer 1

**After response:** I keep my evaluation regarding the technical innovation and suboptimality of this paper.

1. The basic SPIDER and SpiderBoost algorithms are both for finding first-order stationary points; they are almost the same, and both give the n^{1/2} rate. The simple way to modify either algorithm to escape saddle points is to add a Negative Curvature Search (NCS) subroutine (which can be done in a very modular way, and is already shown in the SPIDER paper). I'd say it is almost trivial to also show that SpiderBoost+NCS finds a second-order stationary point at the n^{1/2} rate. Comparing this paper with SpiderBoost+NCS, there is no improvement from n^{2/3} to n^{1/2} (since SpiderBoost is already n^{1/2}) and no simplification of SPIDER (as SpiderBoost already did that). The only difference is replacing NCS by perturbations, which again requires some work, but most of the techniques are already there.

2. There are two error quantities in the guarantee: epsilon and delta. When people talk about an optimal rate, it should be optimal with respect to both epsilon and delta, i.e., for any choice of the pair. It is already suboptimal when one can match the best rate only in some restricted range. It is also clear from Table 2 that when we choose epsilon = delta^2, which is a natural setting in most previous papers, this paper gives 1/epsilon^{3.5}, while SPIDER-SFO+ (+Neon2) gives 1/epsilon^3. It is clearly suboptimal.

**Original review:** I think the result is solid, but this paper lacks innovation, since most of the techniques are already available in the field. It uses SPIDER [11] to do variance reduction and achieve a nearly optimal rate, and uses perturbations to escape saddle points, which is also available in [15, 18]. It seems to be a combination of both papers, and the rate for escaping saddle points is actually worse than [11].
I do not view the contribution as significant enough to be published in NIPS, especially since the improvement is rather straightforward and incremental compared to [15].
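The exponent substitution in the reviewer's second point can be checked mechanically. A minimal sketch: assuming (our assumption, not quoted from the paper) that the binding saddle-escape term is of order 1/(eps^2 · delta^3), substituting eps = delta^2 collapses it to the 1/epsilon^{3.5} figure the reviewer reads off Table 2.

```python
from fractions import Fraction

def collapsed_eps_exponent(a, b):
    """Exponent of 1/eps in a term of order 1/(eps^a * delta^b)
    after substituting eps = delta^2, i.e. delta = eps^(1/2)."""
    return Fraction(a) + Fraction(b) * Fraction(1, 2)

# Assumed term 1/(eps^2 * delta^3): under eps = delta^2 this becomes
# 1/eps^(2 + 3/2) = 1/eps^3.5, matching the reviewer's 1/epsilon^{3.5}.
print(collapsed_eps_exponent(2, 3))  # 7/2
```

The same helper applied to a term like 1/eps^3 (a = 3, b = 0) is unchanged by the substitution, which is why SPIDER-SFO+ (+Neon2) keeps its 1/epsilon^3 rate in this regime.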

### Reviewer 2

- Originality: This paper proposes a novel and simple algorithm called SSRGD for finding local minima in nonconvex optimization.
- Quality: This work is solid.
- Clarity: This paper is well-written and easy to follow.
- Significance: The proposed SSRGD algorithm does not need an extra negative-curvature search subroutine, unlike previous algorithms, and improves previous results to nearly optimal ones.

### Reviewer 3

This paper proposes a simple algorithm for escaping saddle points in nonconvex optimization. Unlike previous algorithms, the proposed SSRGD algorithm does not need an extra negative-curvature search subroutine; it only needs to add a uniform perturbation occasionally. Thus it is more attractive and easier to use in practice. Moreover, the authors prove that the convergence results of SSRGD improve on previous results and are now near-optimal. The authors also give a clear interpretation of the comparison between the proposed SSRGD and SVRG algorithms, which is very useful for better understanding the two algorithms. The paper is well-written and the proofs are easy to follow.
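The perturbation mechanism the reviewer highlights, adding uniform noise when the gradient is small rather than running a negative-curvature search subroutine, can be illustrated on a toy strict-saddle function. This is a hedged sketch, not the paper's SSRGD algorithm: the variance-reduced (SPIDER-type) gradient estimator is replaced by the exact gradient for brevity, and all names and constants below are ours.

```python
import numpy as np

def grad(x):
    # f(x, y) = x^2 - y^2 has a strict saddle at the origin:
    # gradient (2x, -2y), Hessian diag(2, -2).
    return np.array([2.0 * x[0], -2.0 * x[1]])

def perturbed_gd(x0, eta=0.05, g_thresh=1e-3, radius=1e-2, steps=800, seed=0):
    """Gradient descent that adds a uniform perturbation whenever the
    gradient is small (a possible saddle), instead of invoking a separate
    negative-curvature search."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            # Near a stationary point: perturb uniformly in a small box so
            # that plain gradient descent can pick up the negative-curvature
            # (y) direction on its own.
            x = x + rng.uniform(-radius, radius, size=x.shape)
        else:
            x = x - eta * g
    return x

# Start on the stable manifold of the saddle: unperturbed GD from (1, 0)
# converges to the saddle (0, 0), while the perturbed variant escapes along
# the y-axis, where f decreases without bound.
x_final = perturbed_gd([1.0, 0.0])
print(abs(x_final[1]))  # far outside the saddle region
```

The design point the reviews contrast: a negative-curvature search needs Hessian-vector products or an extra subroutine, whereas the perturbation only needs the same gradient oracle the descent step already uses.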

### Reviewer 4

**After author's rebuttal:** Although the rebuttal did not address much of my concern about a high-level intuition for the improvement, or R1's opinion that the result is incremental, I think this is still a good result, so I suggest a poster acceptance.

- Originality: This paper reviews many related variants and improves the theoretical bound upon them. I believe it is a totally novel analysis.
- Quality: This paper gives a detailed analysis, and the flow of the proof is clear and inspiring.
- Clarity: The main paper gives a high-level sketch of the proof and also the intuition for why it can improve the existing results. The supplement is also well organized, with full details.
- Significance: It improves the known bound and matches the $n^{1/2}$ lower bound. I think it is of high significance.