NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6991
Title:Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves

Reviewer 1

The contribution to prove effectiveness of random projection for polygonal curves is novel and is a neat extension of previous work on median and center clustering of polygonal curves using Frechet distance. As I understand, the sublinear dependence for 1-median clustering does not have much novelty beyond using a previously known sampling idea from Indyk's thesis (2000), already cited by the authors. I found the empirical results meaningful and the authors do improve the efficiency of Alt-Godau implementation by an order of magnitude or more. However, I cannot judge the empricial work and its significance or broader impact effectively as I am familiar with the previous theoretical results on this problem but have not studied their practical applications carefully.

Reviewer 2

This paper demonstrates a theoretically guaranteed approximation for computing the Frechet distance between two polygonal curves.The paper also provides a CUDA-parallelized implementation of the Alt-Gadau algorithm for computing the distance (parallelizing the distance computation over each pair of line segments, one from each curve). In the experiments, the paper 1) Verifies the accuracy of the approximate JL distance computation 2) Demonstrates the speedup in runtime - 3 hours vs 40 seconds on data consisting of 6 curves in 6K dimensional with 2K points in each. From eyeballing the results (Fig 2), the JL projection and parallelization each contribute roughly an order of magnitude (10X) to the speedup. 3) Applies the algorithm to cluster real word data (based on k-center approximation algorithm by [Buchin 2019a]). Finally, it also proposes a k-centers approximation (following [Indyk 2000], ongoing work). The theoretical contribution of the paper is a rather straightforward application of the JL algorithm, by projecting each vertex of the curve and applying the JL distortion guarantees. The main technical contribution here is a distance computation lemma (Lemma 8). While source code is provided for the CUDA implementation, there is no mention of any algorithmic advances to parallelize the algorithm. In the experimental section, in 1) it would help to have a discussion of the theoretical gain in performance from the JL projection (presumably ~log d*eps^2/d factor) and the tradeoff of quality vs speed. In 2) the speed-up of 3 hours to 40 sec is substantial. However this speed up is a combination of the JL projection and the parallization. It would be useful to decouple the contributions of these two factors, and to also evaluate the runtime on a larger collection of datasets (beyond the single hydraulic test rig dataset). Finally, the validation of cluster quality in 3) is not very compelling. due to the absence of stated ground truth / lack of a clustering metric and a standard baseline algorithm for comparison. Using datasets (synthetic or otherwise) with well defined ground truth clusters, a choice of cluster quality metrics, and a baseline algorithm for comparison of cluster quality would improve this experiment. The paper is well organized. One section that could improve clarity is the description of the datasets which is rather long-winded. A table would be helpful in describing the dataset charateristics. The notion of filtering and heuristic filters in lines 244-255 was not clear. The plots in all figures can be improved (e.g. increasing font size in Fig 3, showing more clearly the speedups obtained in Fig 2). The number of clusters expected (line 265/266) for the DELTA dataset can be explicitly stated (I assume k=5). Addendum: I think that the authors do address the experimental concerns I originally had, specifically the point regarding comparison to baseline /clustering metrics. Regarding the theoretical results, in light of the other reviews, I agree that these contributions are of some significance. I am adjusting my score accordingly.

Reviewer 3

This paper provides theorems and experiments that the well-known curve distance, the (continuous) Frechet distance, is preserved under JL random projections. The main theoretical result is not just a (1+eps)-relative error, it also requires a roughly gamma * sqrt{eps} additive term where gamma is the maximum distance between two points. This is not too surprising to have the additive term, but also not entirely clear it is needed. The experiments use a state of the art code (from SOCG 2019) to compute the distance, and the paper mentions a GPU / parallel extension. It shows results where curves are of length n and the dimension d = Theta(n), that the random projection provides significant speed up with reasonable loss of accuracy. These results are nice, and the mathematical / theoretical part is very well written. I have few concerns: * the paper does not specify what dimension the curves are projected down to. It shows these results empirically in terms of eps. But the JL results these are based on are assyptotic results require O((1/eps)^2 * log (n/delta)) dimensions. The paper does not say what constant is assumed, what delta is, and what n is. I can only guess they are all 1, but am not sure. * It would be useful to compare the result to just using PCA, and also understanding when PCA works just as well as JL for these problems, empirically and theoretically. Often PCA works much better in practice, but can be slower. Does that hold in this setting? * I am not completely sold on the application of the Frechet distance in high dimensions. Is this a good modeling choice, and how is that justified? In particular, why focus on the bottleneck-distance based Frechet, and not DTW or something else that uses a more "average" case approach to curve distances. And do the sequence information really matter as opposed to the distribution of the n data points (and thus a cosine or KL sort of distance)? Overall, the paper is fairly nicely done, and the results will generate some interest, so I am ok accepting. But there are some (small) concerns with the presentation of the experiments, and with the empirical motivation (I'll add, mathematically, the question is clearly interesting). Typo in Theorem 11. I think "(2+delta)-app..." should "(2+eps)-app..." ADDENDUM: I have read the response, and it has definitely addressed the main concerns above. I'd appreciate even more discussion on the application, since I think these insights into why specifically the Frechet distance is useful is pretty interesting. Also, the poor performance in this case of PCA (in initial experiments) is surprising, and deserves further investigation, IMO. The other comments, if leading to changes in the paper, I feel will make it stronger. I am elevating my score.