Reviews: Flattening a Hierarchical Clustering through Active Learning

This paper considers a setting in which we are given some hierarchical clustering over data, and we want to find some pruning/cut of this hierarchy by adaptively querying for must-link/cannot-link constraints among its leaves. There are three settings considered in this paper, each with its own algorithms and analysis: (i) the noiseless setting, where each queried pair is consistent with some true cut of the tree, (ii) the persistent noise setting, where each queried pair is consistent with some true cut of the tree except for some randomly selected subset of pairs for which the response is flipped, and (iii) the agnostic setting, in which no assumptions are made over the distribution of responses. The paper presents active learning algorithms for each of these settings, demonstrating that the number of queries can be upper bounded by quantities that depend on the tree and possibly the ground truth clustering. Some of these bounds are also accompanied by nearly-tight lower bounds on the query complexity of any active learning algorithm. In all, this paper represents a very thorough investigation into this problem, and I vote for acceptance. The settings considered here are quite diverse, and the algorithms are both practical and have rigorous query complexity and running time guarantees. The comparison to related work is also quite thorough, showing improvement over the bounds given in more generic active learning algorithms. Minor comments: - The notation is a little difficult to follow, possibly due to the overloading of notation. In particular, defining the prior distribution over cuts as P(c) in terms of the values P(i) makes certain statements confusing. Typos: - Line 22: obiquitous

Reviewer 3

This paper derives complexity results for active learning queries to hierarchical clustering. The result is a partition or "cut", c, of the cluster tree, where the "flat" clustering is defined by the clusters at the leaves of a subtree of nodes AB(c) that have the same root as the original cluster tree. Learning occurs by making pairwise judgments on items (leaf nodes). All pairwise judgments form a "ground truth" matrix \Sigma. Given consistency conditions, this is an equivalent way to represent a clustering. The objective of the learning process (e.g. "accuracy") not highlighted in the paper; the so called "good cut" mentioned p.3 l. 124, however the measure of success appears to be the ability to learn a clustering consistent with the \Sigma matrix. This raises a concern: A consistent or realizable \Sigma implies one possible clustering out of all those generated by cuts on one cluster tree. The Error Measure proposed (p. 4 l. 139) simply computes a distance from the one \Sigma - defined clustering. So \Sigma has no sense of a clustering "close" in the tree to the one it represents. How then can \Sigma be used to determine the "right level of granularity of the flat clustering? (p.7, l 273). In the so called "non realizable" case, apparently assuming an arbitrary \Sigma matrix, the paper builds on methods of Importance weighted active learning (IWAL). Ironically, it appears that any practical active learning setting will fall into the non-realizable case, and here the results are limited so far. . How one might exploit the methods proposed in the paper deserves further thought. Active learning presupposes an individual, an actual person comes to mind, whose effort one intends to conserve. The generation of pairwise queries on top of a set of priors assigned to nodes in T as a form of cluster learning, given one already has an agglumerative clustering may not be an efficent way to do active learning. Given the approach taken, the development of ideas in the paper is tedious, and the results appear to be modest, with shallow and not deep insight Typos: p.1 l. 36 We are lead -> we are led p. 2, l 55. by adaptating -> by adapting

Reviewer 4

Originality: I believe the authors have introduced novel approaches for selecting a flat clustering from a hierarchical clustering. I believe that this work is interesting and novel. Some closely related work is only mentioned in the supplementary material and I think it would be beneficial to move to the body of the paper with further discussion. Particularly, the line of work on clustering with same/different cluster membership queries [Wang et al 2012], [Vesdapunt et al, 2014], [Firmani et al, 2016] (others mentioned in this paper). Particularly, I'd be very interested to better understand the tradeoffs between the assumptions of this work and the related work and to better understand the motivations for the assumptions of this paper. I believe that the proposed work requires fewer queries, but at the same time is limited in selecting a tree consistent partition. Other related work that I think should be more thoroughly discussed includes: active learning methods for building the hierarchy by querying for pairwise similarities [Eriksson et al, 2011] and [Krishnamurthy et al, 2012] and the constraint-based approach of Vikram & Dasgupta [2016]. In these cases feedback is used to construct the hierarchy. While some use pairwise similarities, I imagine the works could be extended to think about those similarities as 1/0 labels for same cluster / different cluster. It would be insightful to understand / highlight advantages / disadvantages of the two approaches-- particularly regarding the fact that the desired clustering I believe is always realizable in the related work. Quality: The approaches in the paper seem to be technically sound. My main concerns about quality regard the problem setting and how being restricted to a tree-consistent partition is limited. I think a conclusion should be added to the paper. Clarity: The paper is clearly written. Clarifications on the above related work would be beneficial. While I realize that space is limited, I think more detail on the non-realizable case would be helpful-- perhaps some kind of illustration that shows for a ground truth clustering C* several tree structures for which C* is not realizable and for each cut that minimizes the error measure. Significance: I believe the authors contribute results that are important to the area of work on active/interactive clustering. If the authors were to provide additional motivation for problem setting, I think that this would be very beneficial. In particular, when would it be preferable to do gather feedback for selecting a cut from an existing tree rather than gathering feedback in building the tree itself (and say using that feedback to select a cut). [Wang et al 2012] Jiannan Wang, Tim Kraska, Michael J. Franklin, and Jianhua Feng. "Crowder: Crowdsourcing entity resolution." Proceedings of the VLDB Endowment 5, no. 11 (2012): 1483-1494. [Vesdapunt et al, 2014] Norases Vesdapunt, Kedar Bellare, and Nilesh Dalvi. "Crowdsourcing algorithms for entity resolution." VLDB 2014 http://www.vldb.org/pvldb/vol7/p1071-vesdapunt.pdf [Firmani et al, 2016] Donatella Firmani, Barna Saha, and Divesh Srivastava. "Online entity resolution using an oracle." Proceedings of the VLDB Endowment 9, no. 5 (2016): 384-395. http://www.vldb.org/pvldb/vol9/p384-firmani.pdf [Eriksson et al, 2011] Brian Eriksson, Gautam Dasarathy, Aarti Singh, and Rob Nowak. Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities. AISTATS. 2011. http://proceedings.mlr.press/v15/eriksson11a/eriksson11a.pdf [Krishnamurthy et al, 2012] Akshay Krishnamurthy, Sivaraman Balakrishnan, Min Xu, and Aarti Singh. "Efficient active algorithms for hierarchical clustering." ICML (2012). https://icml.cc/2012/papers/473.pdf [Vikram & Dasgupta, 2016] Sharad Vikram, and Sanjoy Dasgupta. "Interactive bayesian hierarchical clustering." ICML. 2016.

Paper ID:	8768
Title:	Flattening a Hierarchical Clustering through Active Learning

Reviewer 1

Reviewer 2

Reviewer 3

Reviewer 4