NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6177
Title:Subquadratic High-Dimensional Hierarchical Clustering

Reviewer 1

This paper proposes a new approach to approximating hierarchical agglomerative clustering (HAC) by requiring that at each round, only a gamma-best merge be performed (gamma being the multiplicative approximation factor to the closest pair). Two algorithms are introduced to approximate HAC - one for Ward and one for Average linkage. In both cases, the algorithms rely on using approximate nearest neighbor ANN as a black box. In addition, a bucketing datastructure is used in Wards algorithm and a subsampling procedure in used for Average linkage to guarantee the subquadratic runtime. This is a new contribution to the theoretical literature on HAC, a provable subquadratic algorithm for (an approximation to) HAC cases other than single linkage. The paper demonstrates the performance of Ward’s approximation on real and synthetic datasets. The paper is well organized. There were parts where clarity could be improved. Specifically, here are some suggestions / typos: -59 fist -> first -95: Meaning of epsilon not clear. E.g. How does epsilon trade off with accuracy. Role of epsilon vs gamma becomes clear later, but it might help to explain this here (or simplify to a single parameter) -223: follows -> can be computed -Parts of section 3.2 seems to overlap with section 2 - e.g. description of general HAC procedure -Concept of merge value explained twice in sections 2 and 3 - can be combined -272/273 - resolution parameter alpha not defined / introduced, this part is unclear. -Explanation of the average linkage approximation algorithm was hard to follow. Pseudocode would be helpful Regarding the experimental section, the range of parameter tuning (3 combinations of epsilon, T, and L) does not seem comprehensive. Furthermore choosing a single combination as ‘best’ for evaluating the runtime does not seem fully justified. Finally, the speedup of 2.5X (on n=20K points and d=10) does not seem that impressive.

Reviewer 2

- Strength: This paper is well written and the ideas are clearly presented, which makes the paper easy to follow. The algorithm seems easy to implement and is likely to have some value in practice. - Weakness: My main concern is about the problem itself. The proposed method uses ANN to accelerate NN query, and claims achieving some "constant approximation". The "approximation" here means that in each step the two merged clusters differ only by constant times of the minimum "dissimilarity". However, with such relaxation, the output hierarchical tree can be completely different from the exact solution. It is not clear from the paper how to compare the quality of these two hierarchical trees. Obviously, if the two trees are not close, the acceleration would not make much sense. Although the proposed algorithm runs faster, it may not achieve the desired goal. The paper does give some empirical evidence (Table 1) to suggest that the approximate solution is of "good quality". But the experiment is conducted only on a few small datasets and thus not very convincing. There are a few papers discussing how to define a suitable objective function for the HC problem (e.g. [1] and [2]). I would like to see some discussions and comparisons with those approaches. - [1] Sanjoy Dasgupta. A cost function for similarity-based hierarchical clustering. STOC'16 - [2] Vincent Cohen-Addad et al. Hierarchical Clustering: Objective Functions and Algorithms. SODA'17

Reviewer 3

The authors present sub-quadratic algorithms for hierarchical clustering. In particular they present an efficient version of the average linkage algorithm and of the Ward algorithm. The main idea behind the algorithm is to slightly relax the decision principle used by these algorithms in every step and then to show that there exists efficient algorithms for the relaxed version of the problem. Interestingly, the authors also argue(via a reductions to closest pair) that it is unlikely to obtain fast algorithms without relaxing the decision principles. The main idea behind the algorithm is to use LSH to compute approximate nearest neighbors and to then use LSH to obtain fast algorithm. Unfortunately, LSH cannot be applied directly so the authors carefully design algorithms to obtain good theoretical guarantees. I found the results for average linkage particularly interesting. Minor comment: - ln. 189 and 191, I found confusing to use i for both indices - I find unmerged cluster a confusing name because the set may contained merged clusters - ln273, where is defined \alpha_{C_1\cup C_2} - ln.483, 488. It is odd to point to the appendix in the appendix