NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:344
Title:No Pressure! Addressing the Problem of Local Minima in Manifold Learning Algorithms

Reviewer 1

-- Post author-feedback comments -- I appreciate the authors taking the time to address the optional improvements I suggested. I think the addition of UMAP will improve the paper, and I would like to the encourage the authors to keep exploring the other discussed aspects in future work. In my initial review I already strongly recommended to accept this paper, as I still do now, so my overall score remains unchanged. -- Initial review -- This work aims to alleviate the sensitivity of popular dimensionality reduction approaches such as SNE and tSNE to initialization and local minima in the optimization of low dimensional coordinates to fit the neighborhood structure of the original (high dimensional) data. In particular, the authors focus on identifying points that get stuck in suboptimal positions due to the inability of the optimization to find a low dimensional path that will ultimately improve the placement of the points, even though in the short term it might incur higher penalty due to the way attractive and repulsive forces interact in the optimization loss (hence the local minima). To address such cases, the authors define a notion of pressure applied to data points by the dimensionality of the embedded coordinates, and propose to release such pressure by temporarily allowing the optimization process to use an additional auxiliary dimension for identified pressured points. Then, as the optimization progresses, a regularization is applied to gradually restrict the use of such auxiliary dimensions. The manuscript is well written, well motivated, and convincingly establishes the reasoning behind the proposed approach as well as its effectiveness. Further, the problem dealt with here is indeed timely and crucial for the reliability of visualization methods based on SNE, tSNE, and other variations. Therefore, I recommend this paper be accepted and look forward to seeing it presented at NeurIPS. Optional improvements appear in a separate part of the review.

Reviewer 2

The paper is written rather clearly, and I think the method itself is interesting from a practical point of view, although it will be great if there is more theory to it.

Reviewer 3

I have read the authors' response and other reviewers' comments. I choose to raise my score. ------------------------------------------------------------- Pros: - The idea of identifying pressure points is new in the context of nonlinear dimension reduction. The way these points are utilized is also inspirational. - Extensive results are run on several datasets. Cons: - Like many other work in nonlinear dimension reduction, this paper does not provide a systematic, convincing way to evaluate and compare the new approach against previous methods which are claimed to be less effective. The visual comparison in Figure 6 appears cherry-picking in methodology. In the second comparison experiments, even with this dimension increasing strategy, the algorithm failed to identify the unique global minimum, which is frustrating; the authors claimed the objective function value was close to optimal but it makes little sense without interpretable units. - It may well be the case that the premise of embedding all high-dimensional points into a uniform low-dimensional space is misleading. As demonstrated in the motivation and experiments in this paper, if local minimum is really a concern for nonlinear dimension reduction algorithms, maybe one should use different dimensions for different points. - Are there interpretable properties associated with those "pressure points"? Is there any particular explanation for why they become stubborn during the optimization procedure? Do they depend on initialization and randomness of the algorithm? Only with such in-depth questions (at least partially) addressed will this work be justified as a solid contribution instead of a coincidental case study. - Before applied to real data, algorithms should be first applied to synthetic data for which one knows what the "ground truth" to expect. An algorithm that successfully recovers the expected truth can then be trusted. Otherwise, it is difficult to argue the algorithm works in real data, as for such high-dimensional datasets it is not unlikely that some correlations will create non-existent patterns captured by human perception.