NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4134 An adaptive nearest neighbor rule for classification

### Reviewer 1

After reading the author feedback, I am increasing my score as the authors have largely addressed my main concern of comparison/relating to existing locally adaptive NN estimators. The authors address this concern on the theory side. I still think additional numerical experiments to compare the proposed method with other adaptive NN approaches would strengthen the paper and be beneficial to practitioners (and could possibly lead to interesting theoretical questions if, for instance, there are some surprising performance gaps). Thanks to the authors for a well-thought-out response to the reviews! * * * * * Original review below * * * * * I found this paper very well-written and clear. The analysis is quite clean and elegant due to the newly introduced local notion of "advantage" as an alternative to the usual local Lipschitz conditions. The paper, however, does not discuss enough related work, especially adaptive nearest neighbor and kernel methods based on Lepski's method (for example, the adaptive k-NN method by Kpotufe (2011)). I realize that much of these existing methods are specified in the context of regression rather than classification but it's straightforward casting the regression results in terms of classification (by thresholding on the regression function, taken to be the conditional probability function). Mainly I would like to better understand fundamentally what is different about the strategy used by Kpotufe (2011) (as well as Lepski in many papers that are more focused on automatic bandwidth selection for kernel methods) vs your approach (I realize the theoretical analysis is quite different, and the proof ideas using advantage are much cleaner). Separately, there's also the $k^*$-NN approach for adaptively choosing k per data point (Anava and Levy 2016), for which theoretical analysis for a toy case is presented in Section 3.7 of Chen and Shah (2018). Is there some toy setup for which all three approaches could be analyzed to compare them theoretically? Separately, there are various works on adaptive k-NN classification for which the metric rather than k is adaptive (e.g., Short and Fukunaga 1981; Hastie and Tibshirani 1996; Domeniconi, Peng, and Gunopulos 2002; Wang, Neskovic, and Cooper 2007, Kpotufe, Boularias, Schultz, and Kim 2016). In experimental results, perhaps it would be helpful to benchmark against existing locally adaptive methods for choosing k (Kpotufe 2011; Anava and Levy 2016) as well as some simple sample splitting/cross validation kind of approach to selecting a single k, preferably on more datasets. In particular, it would be helpful to practitioners to understand how these locally adaptive nearest neighbor methods compare with one another, and when using a locally adaptive k yields a significant accuracy improvement over a global choice of k.

### Reviewer 2

The algorithm is new. A novel concept was introduced to the theoretical analysis.

### Reviewer 3

My main concern with this this paper is that, although the authors provide an algorithm that chooses k adaptively, they have introduced another tuning parameter \delta, and there is no guidance on how to choose this in practice. In the introduction the authors claim that their results are instance optimal'. However, there is no result that proves the optimality of the bounds. The rates of convergence' provided are not rates of convergence in the usual sense. They provide a threshold above which n must be for a given point x to be classified correctly with high probability. This says nothing about the average behaviour over test points x. The comparison with the standard k-NN classifier in Section 4.2 does not seem fully convincing to me. [CD14] show that the points in \mathcal{X}_{n,k}' are likely to be correctly classified by k-NN, but it is not clear whether there could be other points that are also likely to be correctly classified by k-NN. I think that more work needs to be done here to compare the methods convincingly. =========================================================== UPDATE (after author rebuttal): Thank you to the authors for carefully reading my review. I still feel that there should be more comparison to the literature on kNN classification, but my other concerns have now been mostly addressed. I have changed the score for the submission.