Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper has the feature of being one of the first (arguably the first) to address the problem of sparse isotonic regression, thus it is not easy to establish comparisons with other works. Moreover, isotonic regression is somewhat a niche area, although important, thus it is not clear what type of impact and significance this work may have. The paper is quite well written and, as far as I can assess (not having checked the proofs in detail) it is technically solid. In addition to a simple synthetic problem, the experimental section of the manuscript is focused on a single real-data experiment with gene expression data for cancer classification. One thing that is missing is an explanation of the motivation for using isotonic regression for this problem. I think the authors did a good job in addressing the reviewers' comments, so I'm keeping my original score 8 ("A very good submission; a clear accept.").
The proposed problems by the authors is interesting. Essentially one is looking for dimensions according to which the target variable becomes monotonic. The writing of the paper requires attention, and it suffers from the conference format. Most of the algoritmic side of the paper is pushed in the appendix, and the explanation of the algorithm is barebone: - it is not clear whether the problem is NP-hard or is it tractable. - Algorithm 1 requires integer programming, do you do this in practice, or is the algorithm just the theoretical construct? - Ignoring the relaxation of v_k, the authors do not prove that Algorithm 2/3+4 actually solves the same thing as Algorithm 1. Specifically, the objective in Eq. 10 is not properly explained, and how it is linked to Eq. 5. - Line 343 in Appendix seems incorrect: The new objective should be sum (1 - F_i)^2 + sum F_i^2 which is not a linear objective, note that F_i may be real numbers even if target is binary. This seems to be problematic for Lemma 1. The experiments should be improved. Additional benchmark dataserts would add more credibility. Also, the chosen baselines should perform relatively badly for the chosen data: 1. Based on Figure 1, the decision boundary is not linear which puts SVM at disadvantage. 2. barebone k-nn with (probably) high-dimensional data suffers form curse of dimension. A better comparison would be a decision tree and k-nn with feature selection/dimension reduction.
Isotonic regression is an important model both in theory and practice, and thus I think the work in this article is meaningful. As suggesting in the theorem, it is desirable for estimator to have the property such as consistency and support recovery, however, I'm interested in the convergence rate is how optimal. Actually, the convergence rate would seem a little bit slow in practice. In experimental results, the authors check the performance of the support recovery, while estimation accuracy should also be assessed for confirming the consistency. In addition, for example in Theorem 2, $N=sn$ is defined as a samples for algorithm. However, since the input of algorithm is only $n$ sample and sparsity level, I'm not sure actually $N$ means. Please make it clear. As a minor comment, Algorithm 3 in Theorem 3 would be Algorithm 1 from the context.