NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6721
Title:Selective Sampling-based Scalable Sparse Subspace Clustering

Reviewer 1

This is a well-written original paper. The key intuition of the proposed algorithm is neat. The effectiveness of the algorithm is well supported by theoretical guarantees and experimental results. I think this paper's contribution is sufficiently significant for NeurIPS publication. One main question is that I wonder how this algorithm performs in noisy subspace clustering. I think that limiting the support set will make the algorithm performing worse in noisy setting. Some experiments with synthetic data will help understanding this. [Minor comments/Typos] (3) : subject to C_{ii} = 0, ==> subject to C_{i' i} = 0, (9) : min aff ==> max aff? Shouldn't max be upper bounded in the sufficient condition?

Reviewer 2

The proposed algorithm is well motivated and provided significant improvement over the original SSC in terms of computational cost. The approximation has been well justified by performance guarantees that hold with high probability. I think the contributions are significant. On the other hand, there are several questions and major issues particularly with the presentation. 1. The algorithm is not clearly presented in the main manuscript. (It was better described in an appendix in the supplementary material.) To understand the implication of the main theoretic results, it will be important to see the key structures of the algorithm. So I suggest to elaborate on Section 3.1 in this perspective. 2. Continued from the previous item, due to some missing critical details about the algorithm, the statement of Theorem 1 is not very clear. For example, there is no randomness in the model. The only source of randomness is random selection per iteration within the algorithm. Without clearly explaining this aspect, one may wonder what the probability means in the statement of the theorem. 3. Another issue with the statement of the main theorems is the interpretation of the result. T is rather the size of selection. The number of iteration has to do with how the algorithm finds S. So its meaning as the cardinality of S looks more important. 4. It would be interesting to see how the main results reduces to the analogous results by the original SSC. Does it reproduce the original result as a special case? 5. How can the result be compared to fully stochastic setup where new selection is given not by an optimization of a given large set but by random sampling of the union of subspaces? What would be a gain from "optimizing" from a given set rather than purely random increment? 6. As the authors discussed, the original SSC formulation decouples over columns and can be implemented as a set of Lasso problems in a distributed way. If one uses a cluster to solve the problem, the implementation can be parallelized. For example, each lasso can be solved by FISTA or ADMM. If this is correct, can one still suffer from failures in Table 1 for the original SSC? 7. There were a few critical typos. For example equation (3) and line 193 (SEP?) 8. It might be useful to provide a geometric interpretation of the affinity in Definition 6 in terms of principal angles between subspaces.

Reviewer 3

This paper proposes an incremental algorithm based on a subgradient approximation to select a subsample of the data to improve the computational complexity of Sparse Subspace Clustering (SSC). The authors show that under standard coherence conditions, this subsampling strategy (S5C) will in fact recover the true underlying union of subspaces in the data, thus improving the computational complexity of SSC without compromising performance. In addition, the paper gives several experiments to show the effectiveness of their approach. One of my main concerns is whether the paper is novel/significant enough for NeurIPS. I am also curious as to the choice of datasets for experiments, some of which are not related to/justified for subspace clustering. I am also a bit confused. If S5C is indeed running SSC on a subsample of the data, shouldn't SSC (and other variants that exploit the entire sample) do better than S5C (precisely because they are exploiting the entire sample)? It would be interesting to understand why S5C seems to be performing better. I initially thought it was because the parameters were only fine tuned for S5C, but after carefully reading Appendix E in the supplementary material, it seems like the authors did a very thorough and fair job at fine tuning all other algorithms as well. Hence I am curious as to what is the explanation behind this performance improvement -- which could be very interesting; maybe somehow this sub sampling is helping in some other way, for example ignoring outliers, or something like that? Finally, I think such a thorough review/discussion of SSC, spectral clustering, and some technical definitions could be shrunk considerably. ---- After Rebuttal ---- The authors have addressed my concerns, so I am changing my score to accept.