NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4148 Coresets for Clustering with Fairness Constraints

### Reviewer 1

This paper introduces a new coreset construction mechanism for fair clustering in which the points can be of multiple disjoint types. As in classic fair clustering, the goal of this work is to construct a clustering in which the types represented in each cluster are balanced. Unlike previous work, the focus here is on constructing the clustering efficiently via coresets. This work provides a coreset construction algorithm for fair k-median (previously unknown) and improves the previously known coreset construction algorithm for fair k-means. In addition to theoretical contributions with respect to coreset size and construction time, the authors also provide a small empirical study. The main strength of this paper is theoretical. The authors prove that their algorithms construct coresets with size independent of N for fair k-medians and k-means. To do so, they adapt the proof technique from a previous paper that does not deal with fair clustering to the fair clustering setting. This is a meaningful contribution as well as timely as fair clustering is an active and quickly growing research area. The primary weakness of this paper is the clarity of the technical sections. Section 3 helps with this, but Sections 4 and 5 are difficult to understand without carefully studying reference [25] (in fact, there is a longer version of Section 5 in the appendix!). Elements including an algorithm box and/or visual description of the 1-d case (section 4.1) would improve the presentation. The source code was helpful here although an algorithm box may be better for readers without access to the source. In the experiments, this paper compares their coreset construction algorithm to one other coreset construction algorithm for fair k-means. While this comparison is informative, the work misses other clear comparisons that would improve the paper. For example, there are no comparisons to fair clustering approaches that do not use coresets (for example, reference [4] in particular) to help explore the tradeoff between error and speed. **Edit**: thank you for including the additional experimentation. I think this does improve your paper. Additionally, your commitment to include visualization and pseudocode should improve readability. Originality: the paper exhibits a reasonable amount of originality. Many of the proof techniques and other ideas stem from reference 25. However, the authors show how that work can be extended to the fair clustering setting with multiple disjoint types, which requires novel analysis. Clarity: the writing and overall storyline of the paper is relatively clear however there are many details and definitions that are omitted that make understanding the paper challenging. For example, the definitions/differences between the terms “groups” and “types” is never made explicit. Also, this paper draws heavily on reference [25]; understanding the details this work was challenging for me without familiarity with [25]. Significance: this paper contributes to an active and important area of coresets for fair clustering problems. This includes both fair k-medians and fair k-means. The results presented in this paper will be important for other researchers studying scalable approaches to fair clustering with coresets. Quality: the theoretical pieces of this paper were challenging to evaluate but seem to be correct. === Smaller issues === Section 1: “Due the scale at which one is required to clustering”, ungrammatical Section 1: “Save the storage” → “Save storage” Section 1: “size independent on N” → “size independent of N” Section 1: “with size depend on” → “with size that depends on” Definition 2.1: K_z(S,F,C) \in (1 \pm \epsilon) K_z(X,F,C). I don’t think that “\in” is the correct notation here since K_z is a value and not a set. Section 4: “In a high level” → “at a high level” Section 4: “is the same to” → “is the same as” Section 5: “we show how the construction of coresets” → “we show how to construct coresets” Section 5: “same to [25]” → “similar to [25]” General comment: I think the terms “collections of groups” and “types” in this paper are closely connected but never defined and thus a bit confusing. Define (roughly) these and give examples in the introduction. Later on this becomes more clear.

### Reviewer 2

(Originality) The authors claims the novelty lies with the construction of a novel epsilon-corsets with k-median and k-mean objective. The constructed corsets which is of smaller size than the full set is used in existing fair clustering algorithms. This construction is very similar to that proposed in [25]. Originality seems minor. (Quality) It would be more meaningful if the corset construction was conducted with k-center objective. (Clarity) Well written and relatively clear- better if there were illustrations. (significance) Empirical error that is superlinear with increasing epsilon. How is this a concern?

### Reviewer 3

The paper clearly identifies which parts/techniques are taken from other papers (i.e. known coreset construction from Har-Peled et al), and where extensions have been made. It also gives a good overivew on the contributions and experiments and is easy to follow. The main techniques behind the theoretical coreset constructions are well-known from [25]. The two new components are the observation regarding the reduction of constraint types (theorem 4.2) and the Lemmata regarding the analysis of the number of "contiguous intervals" (Lemma 4.1 and Lemma 5.1 / B.2). (Note that Theorem 4.2 is a generalisation of Lemma 6 in [36].) It is great that, in contrast to many theoretical clustering papers, this paper includes an evaluation of a practical version of the algorithm that's analysed in the theoretical part of the paper. However, there are also some things about the experiments that I struggle with: * As always, it is a bit hard to interpret the cost alone (without any goal of what actually will be done with the clustering). For errors of up to 10%, the results seem to be comparable with the BICO solution [36]. After that, there's a small gap (up to about 5%) between the solutions. I'm not sure if this is significant or not. * There is *no* comparison to simpler baselines, ie.g. to uniform sampling where, to ensure fairness, one could just sample uniformly at random per "constraint type". (One might suspect that the number of constraint types grows, the performance of this simple baseline should become better and better (as the constraint types pre-partition the space kind of)). * The experiments are only on two specific data sets. Given the fast processing times (cf. T_S and T_X in Tables 3 and 4), it is surprising that not more datasets have been considered. * ... and the times needed to compute the coresets are not reported. Moreover, there is one point (right in the problem definition section) that remains unclear to me: the re-definition described in ll. 131 (after Definition 2.1): "a point [...] may be partially assigned to more than one cluster". This seems unnecessary. I guess the original intent was just to point out that the pointwise costs are now weighted? (otherwise, I might be missing something here) Besides, some minor remarks: - Theorem 4.2 is not a pure "existence" statement (and it would be useless in the following if it was just that) - this should be made clear. - Table 2 is a bit confusing since some rows still contain eps, which should be Omega(1) - It should be made clear that a practial variant of the theoretical algorithm is evaluated. - There are some typos: several upper/lower-case typos in the references , l. 40 (due to the scale at which one is required to clustering), l. 175 (let ... denotes the)