NeurIPS 2019
Sun Dec 8 through Sat Dec 14, 2019, Vancouver Convention Center
Paper ID 3202: Efficiently escaping saddle points on manifolds

### Reviewer 1

The result is very nice. It is only slightly unsatisfactory that smoothness Assumptions 3.2 and 3.3 are stated in terms of the function composed with the retraction (the pullback), which makes these assumptions somewhat indirect and hard to verify. It also hides interesting information about the geometry of the manifold.

Post-rebuttal: After reading the response, I agree that obtaining clean results and a clean analysis is even nicer (compared to a complicated analysis). Obtaining results on escaping saddle points on manifolds is also an important contribution. I will vote for acceptance.

### Reviewer 2

This paper is a nice extension of Jin et al.'s work. However, I have two concerns. First, the analysis is very similar to Jin et al.'s, except for a few steps needed to connect the gradient/Hessian of the original function to those of the pullback function. This slightly reduces the technical novelty of the paper. Second, I am not sure how b can be estimated. Many parameters can be tuned in practice, but I am not sure where to start when tuning b. Although b = +∞ in the examples in the paper, I believe there are cases where b is finite and unknown. For other parameters such as L and rho, we can at least give some (perhaps conservative) estimate, but for b it is not clear how. Are there examples where b < +∞ and a conservative estimate of b is possible?

Post-rebuttal: I have read the authors' response and I am satisfied with it. I will slightly increase my score.

### Reviewer 3

This paper shows that a perturbed variant of gradient descent on Riemannian manifolds can also escape saddle points efficiently. The assumptions and preliminaries are clearly described. The PRGD algorithm for Riemannian manifolds (analogous to PRGD on Euclidean spaces) takes one of two steps: if the gradient is large, perform a gradient step; otherwise, perturb and take tau gradient steps along the tangent space. The authors should further discuss how to set the step size, perturbation radius, and number of perturbation steps in the context of different examples, to highlight the practicality of their algorithm.

Post-rebuttal: Having read the authors' responses, I would like to keep my current score.
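To make the two-step structure the reviewer describes concrete, here is a minimal sketch of a perturbed Riemannian gradient descent loop on the unit sphere, using normalization as the retraction and the Rayleigh quotient as an illustrative objective. The function name `prgd_sphere`, the parameter defaults, and the sphere/Rayleigh-quotient setup are all assumptions for illustration, not the paper's actual algorithm or constants.

```python
import numpy as np

def prgd_sphere(grad_f, x0, eta=0.1, eps=1e-3, radius=1e-2, tau=10,
                max_iter=500, seed=0):
    """Hypothetical sketch of perturbed Riemannian GD on the unit sphere.

    If the Riemannian gradient is large, take a gradient step and retract;
    otherwise, perturb within the tangent space and take tau gradient steps.
    """
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        rg = g - (x @ g) * x  # project Euclidean gradient onto tangent space
        if np.linalg.norm(rg) > eps:
            step = x - eta * rg
            x = step / np.linalg.norm(step)  # retraction: renormalize
        else:
            # Near a critical point: perturb in the tangent space at x,
            # then take tau gradient steps to escape a potential saddle.
            xi = rng.standard_normal(x.shape)
            xi = xi - (x @ xi) * x            # tangent-space component
            xi = radius * xi / np.linalg.norm(xi)
            x = (x + xi) / np.linalg.norm(x + xi)
            for _ in range(tau):
                g = grad_f(x)
                rg = g - (x @ g) * x
                step = x - eta * rg
                x = step / np.linalg.norm(step)
    return x
```

For f(x) = x^T A x on the sphere, the eigenvector of A with the second-smallest eigenvalue is a saddle point with zero Riemannian gradient; started there, plain Riemannian gradient descent stalls, while the tangent-space perturbation lets the iterate escape toward the minimum eigenvector.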

### Reviewer 4

This paper proposes a perturbed Riemannian gradient descent method (PRGD) for minimizing a smooth nonconvex function over a Riemannian manifold. It sheds light on RGD and its capability for escaping saddle points. There are two main concerns about this work.

1. The algorithm depends on many unknown parameters, which makes it impractical, although other works such as Jin et al.'s in the Euclidean setting have the same issue.
2. This is more serious. The algorithm design and proof techniques are largely the same as those in previous works by Jin et al. The same topic has also been studied by Sun et al. on submanifolds of Euclidean space. As a result, it seems that the only contribution of this paper is to formalize or generalize these techniques to more general manifolds. Therefore, the contribution is a bit thin.

Comments after rebuttal: I appreciate the authors' detailed feedback, which clarified some of my previous concerns. I thus increased my score.