{"title": "Fair Clustering Through Fairlets", "book": "Advances in Neural Information Processing Systems", "page_first": 5029, "page_last": 5037, "abstract": "We study the question of fair clustering under the {\\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the \\emph{price of fairness} by quantifying the value of fair clustering on real-world datasets with sensitive attributes.", "full_text": "Fair Clustering Through Fairlets\n\nFlavio Chierichetti\n\nDipartimento di Informatica\n\nSapienza University\n\nRome, Italy\n\nRavi Kumar\n\nGoogle Research\n\n1600 Amphitheater Parkway\nMountain View, CA 94043\n\nSilvio Lattanzi\nGoogle Research\n\n76 9th Ave\n\nNew York, NY 10011\n\nSergei Vassilvitskii\nGoogle Research\n\n76 9th Ave\n\nNew York, NY 10011\n\nAbstract\n\nWe study the question of fair clustering under the disparate impact doctrine, where\neach protected class must have approximately equal representation in every clus-\nter. 
We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions\u2014for instance a point may no longer be assigned to its nearest cluster center!\nEn route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow.\nWe empirically demonstrate the price of fairness by quantifying the value of fair clustering on real-world datasets with sensitive attributes.\n\n1 Introduction\n\nFrom self-driving cars, to smart thermostats, and digital assistants, machine learning is behind many of the technologies we use and rely on every day. Machine learning is also increasingly used to aid with decision making\u2014in awarding home loans or in sentencing recommendations in courts of law (Kleinberg et al., 2017a). While the learning algorithms are not inherently biased, or unfair, the algorithms may pick up and amplify biases already present in the training data that is available to them. Thus a recent line of work has emerged on designing fair algorithms.\nThe first challenge is to formally define the concept of fairness, and indeed recent work shows that some natural conditions for fairness cannot be simultaneously achieved (Kleinberg et al., 2017b; Corbett-Davies et al., 2017). In our work we follow the notion of disparate impact as articulated by Feldman et al. (2015), following the Griggs v. Duke Power Co. 
US Supreme Court case. Informally, the doctrine codifies the notion that not only should protected attributes, such as race and gender, not be explicitly used in making decisions, but even after the decisions are made they should not be disproportionately different for applicants in different protected classes. In other words, if an unprotected feature, for example, height, is closely correlated with a protected feature, such as gender, then decisions made based on height may still be unfair, as they can be used to effectively discriminate based on gender.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nWhile much of the previous work deals with supervised learning, in this work we consider the most common unsupervised learning problem, that of clustering. In modern machine learning systems, clustering is often used for feature engineering, for instance augmenting each example in the dataset with the id of the cluster it belongs to, in an effort to bring expressive power to simple learning methods. In this way we want to make sure that the features that are generated are themselves fair. As in the standard clustering literature, we are given a set X of points lying in some metric space, and our goal is to find a partition of X into k different clusters, optimizing a particular objective function. We assume that the coordinates of each point x \u2208 X are unprotected; however each point also has a color, which identifies its protected class. The notion of disparate impact and fair representation then translates to that of color balance in each cluster. We study the two color case, where each point is either red or blue, and show that even this simple version has a lot of underlying complexity. 
We formalize these views and define a fair clustering objective that incorporates both fair representation and the traditional clustering cost; see Section 2 for exact definitions.\nA clustering algorithm that is colorblind, and thus does not take a protected attribute into its decision making, may still result in very unfair clusterings; see Figure 1. This means that we must explicitly use the protected attribute to find a fair solution. Moreover, this implies that a fair clustering solution could be strictly worse (with respect to an objective function) than a colorblind solution.\nFinally, the example in Figure 1 also shows the main technical hurdle in looking for fair clusterings. Unlike the classical formulation, where every point is assigned to its nearest cluster center, this may no longer be the case. Indeed, a fair clustering is defined not just by the position of the centers, but also by an assignment function that assigns a cluster label to each input.\n\nFigure 1: A colorblind k-center clustering algorithm would group points a, b, c into one cluster, and x, y, z into a second cluster, with centers at a and z respectively. A fair clustering algorithm, on the other hand, may give a partition indicated by the dashed line. Observe that in this case a point is no longer assigned to its nearest cluster center. For example, x is assigned to the same cluster as a even though z is closer.\n\nOur contributions. In this work we show how to reduce the problem of fair clustering to that of classical clustering via a pre-processing step that ensures that any resulting solution will be fair. In this way, our approach is similar to that of Zemel et al. (2013), although we formulate the first step as an explicit combinatorial problem, and show approximation guarantees that translate to approximation guarantees on the optimal solution. 
Specifically we:\n\n(i) Define fair variants of classical clustering problems such as k-center and k-median;\n(ii) Define the concepts of fairlets and fairlet decompositions, which encapsulate minimal fair sets;\n(iii) Show that any fair clustering problem can be reduced to first finding a fairlet decomposition, and then using a classical (not necessarily fair) clustering algorithm;\n(iv) Develop approximation algorithms for finding fair decompositions for a large range of fairness values, and complement these results with NP-hardness; and\n(v) Empirically quantify the price of fairness, i.e., the ratio of the cost of traditional clustering to the cost of fair clustering.\n\nRelated work. Data clustering is a classic problem in unsupervised learning that takes on many forms, from partition clustering, to soft clustering, hierarchical clustering, and spectral clustering, among many others. See, for example, the books by Aggarwal & Reddy (2013) and Xu & Wunsch (2009) for an extensive list of problems and algorithms. In this work, we focus our attention on the k-center and k-median problems. Both of these problems are NP-hard but have known efficient approximation algorithms. The state of the art approaches give a 2-approximation for k-center (Gonzalez, 1985) and a (1 + \u221a3 + \u03b5)-approximation for k-median (Li & Svensson, 2013).\nUnlike clustering, the exploration of fairness in machine learning is relatively nascent. There are two broad lines of work. The first is in codifying what it means for an algorithm to be fair. See for example the work on statistical parity (Luong et al., 2011; Kamishima et al., 2012), disparate impact (Feldman et al., 2015), and individual fairness (Dwork et al., 2012). More recent work by Corbett-Davies et al. (2017) and Kleinberg et al. 
(2017b) also shows that some of the desired properties of fairness may be incompatible with each other.\nA second line of work takes a specific notion of fairness and looks for algorithms that achieve fair outcomes. Here the focus has largely been on supervised learning (Luong et al., 2011; Hardt et al., 2016) and online learning (Joseph et al., 2016). The direction that is most similar to our work is that of learning intermediate representations that are guaranteed to be fair; see for example the work by Zemel et al. (2013) and Kamishima et al. (2012). However, unlike their work, we give strong guarantees on the relationship between the quality of the fairlet representation and the quality of any fair clustering solution.\nIn this paper we use the notion of fairness known as disparate impact, introduced by Feldman et al. (2015). This notion is also closely related to the p%-rule as a measure for fairness. The p%-rule is a generalization of the 80%-rule advocated by the US Equal Employment Opportunity Commission (Biddle, 2006) and was used in a recent paper on mechanisms for fair classification (Zafar et al., 2017). In particular, our paper addresses an open question of Zafar et al. (2017) by presenting a framework to solve an unsupervised learning task respecting the p%-rule.\n\n2 Preliminaries\nLet X be a set of points in a metric space equipped with a distance function d : X\u00b2 \u2192 R\u22650. For an integer k, let [k] denote the set {1, . . . , k}.\nWe first recall standard concepts in clustering. A k-clustering C is a partition of X into k disjoint subsets, C1, . . . , Ck, called clusters. We can evaluate the quality of a clustering C with different objective functions. 
In the k-center problem, the goal is to minimize\n\n\u03c6(X, C) = max_{C \u2208 C} min_{c \u2208 C} max_{x \u2208 C} d(x, c),\n\nand in the k-median problem, the goal is to minimize\n\n\u03c8(X, C) = \u2211_{C \u2208 C} min_{c \u2208 C} \u2211_{x \u2208 C} d(x, c).\n\nA clustering C can be equivalently described via an assignment function \u03b1 : X \u2192 [k]. The points in cluster Ci are simply the pre-image of i under \u03b1, i.e., Ci = {x \u2208 X | \u03b1(x) = i}.\nThroughout this paper we assume that each point in X is colored either red or blue; let \u03c7 : X \u2192 {RED, BLUE} denote the color of a point. For a subset Y \u2286 X and for c \u2208 {RED, BLUE}, let c(Y) = {x \u2208 Y | \u03c7(x) = c} and let #c(Y) = |c(Y)|.\nWe first define a natural notion of balance.\nDefinition 1 (Balance). For a subset \u2205 \u2260 Y \u2286 X, the balance of Y is defined as:\n\nbalance(Y) = min(#RED(Y) / #BLUE(Y), #BLUE(Y) / #RED(Y)) \u2208 [0, 1].\n\nThe balance of a clustering C is defined as:\n\nbalance(C) = min_{C \u2208 C} balance(C).\n\nA subset with an equal number of red and blue points has balance 1 (perfectly balanced) and a monochromatic subset has balance 0 (fully unbalanced). To gain more intuition about the notion of balance, we investigate some basic properties that follow from its definition.\nLemma 2 (Combination). Let Y, Y' \u2286 X be disjoint. If C is a clustering of Y and C' is a clustering of Y', then balance(C \u222a C') = min(balance(C), balance(C')).\nIt is easy to see that for any clustering C of X, we have balance(C) \u2264 balance(X). In particular, if X is not perfectly balanced, then no clustering of X can be perfectly balanced. We next show an interesting converse, relating the balance of X to the balance of a well-chosen clustering.\nLemma 3. Let balance(X) = b/r for some integers 1 \u2264 b \u2264 r such that gcd(b, r) = 1. 
Then there exists a clustering Y = {Y1, . . . , Ym} of X such that (i) |Yj| \u2264 b + r for each Yj \u2208 Y, i.e., each cluster is small, and (ii) balance(Y) = b/r = balance(X).\n\nFairness and fairlets. Balance encapsulates a specific notion of fairness, where a clustering with a monochromatic cluster (i.e., fully unbalanced) is considered unfair. We call the clustering Y described in Lemma 3 a (b, r)-fairlet decomposition of X and call each cluster Y \u2208 Y a fairlet.\nEquipped with the notion of balance, we now revisit the clustering objectives defined earlier. The objectives do not consider the color of the points, so they can lead to solutions with monochromatic clusters. We now extend them to incorporate fairness.\nDefinition 4 ((t, k)-fair clustering problems). In the (t, k)-fair center (resp., (t, k)-fair median) problem, the goal is to partition X into C such that |C| = k, balance(C) \u2265 t, and \u03c6(X, C) (resp. \u03c8(X, C)) is minimized.\nTraditional formulations of k-center and k-median eschew the notion of an assignment function. Instead it is implicit through a set {c1, . . . , ck} of centers, where each point is assigned to its nearest center, i.e., \u03b1(x) = arg min_{i \u2208 [k]} d(x, ci). Without fairness as an issue, the two formulations are equivalent; however, with fairness, we need an explicit assignment function (see Figure 1).\nMissing proofs are deferred to the full version of the paper.\n\n3 Fairlet decomposition and fair clustering\n\nAt first glance, the fair version of a clustering problem appears harder than its vanilla counterpart. In this section we prove, interestingly, a reduction from the former to the latter. 
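Before turning to the reduction, Definition 1 and Lemma 2 can be made concrete with a short sketch. This is a minimal illustration, not code from the paper; the helper names and the color encoding are our own:

```python
from collections import Counter

def balance(Y, color):
    """balance(Y) = min(#RED(Y)/#BLUE(Y), #BLUE(Y)/#RED(Y)); 0 if Y is monochromatic."""
    counts = Counter(color[y] for y in Y)
    red, blue = counts["RED"], counts["BLUE"]
    if red == 0 or blue == 0:
        return 0.0
    return min(red / blue, blue / red)

def clustering_balance(clusters, color):
    """balance(C) = minimum balance over the clusters of the clustering."""
    return min(balance(C, color) for C in clusters)
```

For instance, with colors {1: RED, 2: BLUE, 3: RED}, the clustering {{1, 2}, {3}} has balance 0 because of the monochromatic cluster {3}, in line with the combination property of Lemma 2.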
We do this by first clustering the original points into small clusters preserving the balance, and then applying vanilla clustering on these smaller clusters instead of on the original points.\nAs noted earlier, there are different ways to partition the input to obtain a fairlet decomposition. We will show next that the choice of the partition directly impacts the approximation guarantees of the final clustering algorithm.\nBefore proving our reduction we need to introduce some additional notation. Let Y = {Y1, . . . , Ym} be a fairlet decomposition. For each cluster Yj, we designate an arbitrary point yj \u2208 Yj as its center. Then for a point x, we let \u03b2 : X \u2192 [m] denote the index of the fairlet to which it is mapped. We are now ready to define the cost of a fairlet decomposition.\nDefinition 5 (Fairlet decomposition cost). For a fairlet decomposition, we define its k-median cost as \u2211_{x \u2208 X} d(x, y_{\u03b2(x)}), and its k-center cost as max_{x \u2208 X} d(x, y_{\u03b2(x)}). We say that a (b, r)-fairlet decomposition is optimal if it has minimum cost among all (b, r)-fairlet decompositions.\nSince (X, d) is a metric, we have from the triangle inequality that for any other point c \u2208 X,\n\nd(x, c) \u2264 d(x, y_{\u03b2(x)}) + d(y_{\u03b2(x)}, c).\n\nNow suppose that we aim to obtain a (t, k)-fair clustering of the original points X. (As we observed earlier, necessarily t \u2264 balance(X).) To solve the problem we can cluster instead the centers of the fairlets, i.e., the set Y = {y1, . . . , ym}, into k clusters. In this way we obtain a set of centers {c1, . . . , ck} and an assignment function \u03b1_Y : Y \u2192 [k].\nWe can then define the overall assignment function as \u03b1(x) = \u03b1_Y(y_{\u03b2(x)}) and denote the clustering induced by \u03b1 as C_\u03b1. From the definition of Y and the property of fairlets and balance, we get that balance(C_\u03b1) = t. We now need to bound its cost. 
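The two-stage pipeline just described (decompose into fairlets, cluster the fairlet centers, compose the assignments) can be sketched end-to-end. Here `fairlet_decompose` and `vanilla_cluster` are hypothetical stand-ins for the decomposition routines of Section 4 and any classical clustering algorithm:

```python
def fair_clustering(points, fairlet_decompose, vanilla_cluster, k):
    """Reduction sketch (hypothetical subroutine names, not the paper's API).

    fairlet_decompose(points) -> list of fairlets (each a list of points)
    vanilla_cluster(centers, k) -> dict mapping each center to a cluster index
    Returns the overall assignment alpha(x) = alpha_Y(y_beta(x)).
    """
    fairlets = fairlet_decompose(points)
    centers = [f[0] for f in fairlets]      # arbitrary representative y_j per fairlet
    alpha_Y = vanilla_cluster(centers, k)   # vanilla clustering of the centers only
    # every point inherits the cluster of its fairlet's center
    return {x: alpha_Y[centers[j]] for j, f in enumerate(fairlets) for x in f}
```

Because every point inherits the label of its fairlet center, each final cluster is a union of whole fairlets, which is exactly why the balance guarantee of the decomposition carries over to the clustering.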
Let \u1ef8 be a multiset containing each center yi with multiplicity |Yi|.\nLemma 6. \u03c8(X, C_\u03b1) \u2264 \u03c8(X, Y) + \u03c8(\u1ef8, C_\u03b1) and \u03c6(X, C_\u03b1) \u2264 \u03c6(X, Y) + \u03c6(\u1ef8, C_\u03b1).\nTherefore in both cases we can reduce the fair clustering problem to the problem of finding a good fairlet decomposition and then solving the vanilla clustering problem on the centers of the fairlets. We refer to \u03c8(X, Y) and \u03c6(X, Y) as the k-median and k-center costs of the fairlet decomposition.\n\n4 Algorithms\n\nIn the previous section we presented a reduction from the fair clustering problem to its regular counterpart. In this section we use it to design efficient algorithms for fair clustering.\nWe first focus on the k-center objective and show in Section 4.3 how to adapt the reasoning to solve the k-median objective. We begin with the most natural case in which we require the clusters to be perfectly balanced, and give efficient algorithms for the (1, k)-fair center problem. Then we analyze the more challenging (t, k)-fair center problem for t < 1. Let B = BLUE(X) and R = RED(X).\n\n4.1 Fair k-center warmup: (1, 1)-fairlets\nSuppose balance(X) = 1, i.e., |R| = |B|, and we wish to find a perfectly balanced clustering. We now show how we can obtain it using a good (1, 1)-fairlet decomposition.\nLemma 7. An optimal (1, 1)-fairlet decomposition for k-center can be found in polynomial time.\n\nProof. To find the best decomposition, we first relate this question to a graph covering problem. Consider a bipartite graph G = (B \u222a R, E) where we create an edge (bi, rj) with weight wij = d(bi, rj) between every bichromatic pair of nodes. In this setting a decomposition into fairlets corresponds to a perfect matching in the graph. Each edge in the matching represents a fairlet, Yi. 
Let Y = {Yi} be the set of edges in the matching.\nObserve that the k-center cost \u03c6(X, Y) is exactly the weight of the maximum weight edge in the matching; therefore our goal is to find a perfect matching that minimizes the weight of the maximum edge. This can be done by defining a threshold graph G_\u03c4 that has the same nodes as G but only those edges of weight at most \u03c4. We then look for the minimum \u03c4 for which the corresponding graph has a perfect matching, which can be done by (binary) searching through the O(n\u00b2) distinct values.\nFinally, for each fairlet (edge) Yi we can arbitrarily set one of the two endpoints as the center, yi.\n\nSince any fair solution to the clustering problem induces a set of minimal fairlets (as described in Lemma 3), the cost of the fairlet decomposition found is at most the cost of the clustering solution.\nLemma 8. Let Y be the partition found above, and let \u03c6*_t be the cost of the optimal (t, k)-fair center clustering. Then \u03c6(X, Y) \u2264 \u03c6*_t.\n\nThis, combined with the fact that the best approximation algorithm for k-center yields a 2-approximation (Gonzalez, 1985), gives us the following.\nTheorem 9. The algorithm that first finds fairlets and then clusters them is a 3-approximation for the (1, k)-fair center problem.\n\n4.2 Fair k-center: (1, t')-fairlets\nNow suppose that instead we look for a clustering with balance t < 1. In this section we assume t = 1/t' for some integer t' > 1. We show how to extend the intuition in the matching construction above to find approximately optimal (1, t')-fairlet decompositions for integral t' > 1.\nIn this case, we transform the problem into a minimum cost flow (MCF) problem.1 Let \u03c4 > 0 be a parameter of the algorithm. Given the points B, R, and an integer t', we construct a directed graph H_\u03c4 = (V, E). 
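Before describing H_\u03c4, the threshold-matching search from the proof of Lemma 7 can be sketched as follows. This is an illustrative sketch with hypothetical names, using a plain augmenting-path matching test; the paper binary-searches over the O(n\u00b2) candidate thresholds, whereas this version scans them in increasing order for clarity:

```python
def has_perfect_matching(reds, blues, allowed):
    """Augmenting-path (Kuhn's algorithm) test for a perfect matching in the
    bipartite graph whose edges are the pairs in `allowed`."""
    match = {}  # blue node -> matched red node
    def augment(r, seen):
        for b in blues:
            if (r, b) in allowed and b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[b] = r
                    return True
        return False
    return all(augment(r, set()) for r in reds)

def min_max_fairlet_matching(reds, blues, d):
    """(1,1)-fairlet decomposition cost for k-center (Lemma 7 sketch):
    the smallest threshold tau whose threshold graph admits a perfect matching."""
    taus = sorted({d(r, b) for r in reds for b in blues})
    for tau in taus:
        allowed = {(r, b) for r in reds for b in blues if d(r, b) <= tau}
        if has_perfect_matching(reds, blues, allowed):
            return tau
    return float("inf")
```

Each matched bichromatic pair is a fairlet, and the returned threshold is exactly the k-center cost \u03c6(X, Y) of the decomposition.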
Its node set V is composed of two special nodes \u03b2 and \u03c1, all of the nodes in B \u222a R, and t' additional copies of each node v \u2208 B \u222a R. More formally,\n\nV = {\u03b2, \u03c1} \u222a B \u222a R \u222a {b_i^j | bi \u2208 B and j \u2208 [t']} \u222a {r_i^j | ri \u2208 R and j \u2208 [t']}.\n\nThe directed edges of H_\u03c4 are as follows:\n(i) A (\u03b2, \u03c1) edge with cost 0 and capacity min(|B|, |R|).\n(ii) A (\u03b2, bi) edge for each bi \u2208 B, and an (ri, \u03c1) edge for each ri \u2208 R. All of these edges have cost 0 and capacity t' \u2212 1.\n(iii) For each bi \u2208 B and each j \u2208 [t'], a (bi, b_i^j) edge, and for each ri \u2208 R and each j \u2208 [t'], an (ri, r_i^j) edge. All of these edges have cost 0 and capacity 1.\n(iv) Finally, for each bi \u2208 B, rj \u2208 R, and each 1 \u2264 k, \u2113 \u2264 t', a (b_i^k, r_j^\u2113) edge with capacity 1. The cost of this edge is 1 if d(bi, rj) \u2264 \u03c4 and \u221e otherwise.\nTo finish the description of this MCF instance, we now specify supply and demand at every node. Each node in B has a supply of 1, each node in R has a demand of 1, \u03b2 has a supply of |R|, and \u03c1 has a demand of |B|. Every other node has zero supply and demand. In Figure 2 we show an example of this construction for a small graph.\n\n1Given a graph with edge costs and capacities, a source, and a sink, the goal is to push a given amount of flow from the source to the sink, respecting flow conservation at nodes and capacity constraints on the edges, at the least possible cost.\n\nThe MCF problem can be solved in polynomial time, and since all of the demands and capacities are integral, there exists an optimal solution that sends integral flow on each edge. In our case, the solution is a set of edges of H_\u03c4 that have non-zero flow, plus the total flow on the (\u03b2, \u03c1) edge. In the rest of this section we assume for simplicity that any two distinct elements of the metric are at a positive distance apart, and we show that starting from a solution to the described MCF instance we can build a low cost (1, t')-fairlet decomposition. We start by showing that every (1, t')-fairlet decomposition can be used to construct a feasible solution for the MCF instance, and then prove that an optimal solution for the MCF instance can be used to obtain a (1, t')-fairlet decomposition.\n\nFigure 2: The construction of the MCF instance for the bipartite graph for t' = 2. Note that the only nodes with positive demands or supplies are \u03b2, \u03c1, b1, b2, b3, r1, and r2, and all the dotted edges have cost 0.\n\nLemma 10. Let Y be a (1, t')-fairlet decomposition of cost C for the (1/t', k)-fair center problem. Then it is possible to construct a feasible solution of cost 2C to the MCF instance.\n\nProof. We begin by building a feasible solution and then bound its cost. Consider each fairlet in the (1, t')-fairlet decomposition.\nSuppose the fairlet contains 1 red node and c blue nodes, with c \u2264 t', i.e., the fairlet is of the form {r1, b1, . . . , bc}. 
For any such fairlet we send a unit of flow from each node bi to b_i^1, for i \u2208 [c], and a unit of flow from each b_i^1 to r_1^i. Furthermore we send a unit of flow from each r_1^i to r1, and c \u2212 1 units of flow from r1 to \u03c1. Note that in this way we saturate the demands of all nodes in this fairlet.\nSimilarly, suppose the fairlet contains c red nodes and 1 blue node, with c \u2264 t', i.e., the fairlet is of the form {r1, . . . , rc, b1}. For any such fairlet, we send c \u2212 1 units of flow from \u03b2 to b1. Then we send a unit of flow from b1 to each of the nodes b_1^1, . . . , b_1^c, a unit of flow from each b_1^j to r_j^1, and a unit of flow from each r_j^1 to the corresponding rj. Note that also in this case we saturate the demands of all nodes in this fairlet.\nSince every node v \u2208 B \u222a R is contained in a fairlet, all of the demands of these nodes are satisfied. Hence, the only nodes that can still have unsatisfied demand are \u03b2 and \u03c1, but we can use the direct edge (\u03b2, \u03c1) to route the excess demand, since the total demand is equal to the total supply. In this way we obtain a feasible solution for the MCF instance starting from a (1, t')-fairlet decomposition.\nTo bound the cost of the solution, note that the only edges with positive cost in the constructed solution are the edges between nodes b_i^j and r_k^\u2113. Furthermore, such an edge is part of the solution only if the nodes bi and rk are contained in the same fairlet F. Given that the k-center cost of the fairlet decomposition is C, the cost of the edges between nodes in any fairlet F in the constructed feasible solution is at most twice this cost. The claim follows.\n\nNow we show that given an optimal solution for the MCF instance of cost C, we can construct a (1, t')-fairlet decomposition of cost no bigger than C.\nLemma 11. 
Let Y be an optimal solution of cost C to the MCF instance. Then it is possible to construct a (1, t')-fairlet decomposition for the (1/t', k)-fair center problem of cost at most C.\nCombining Lemma 10 and Lemma 11 yields the following.\nLemma 12. By reducing the (1, t')-fairlet decomposition problem to an MCF problem, it is possible to compute a 2-approximation for the optimal (1, t')-fairlet decomposition for the (1/t', k)-fair center problem.\nNote that the cost of a (1, t')-fairlet decomposition is necessarily smaller than the cost of a (1/t', k)-fair clustering. Our main theorem follows.\nTheorem 13. The algorithm that first finds fairlets and then clusters them is a 4-approximation for the (1/t', k)-fair center problem for any positive integer t'.\n\nFigure 3: Empirical performance of the classical and fair clustering median and center algorithms on the three datasets. The cost of each solution is on the left axis, and its balance on the right axis.\n\n4.3 Fair k-median\n\nThe results in the previous section can be modified to yield results for the (t, k)-fair median problem with the minor changes that we describe below.\nFor the perfectly balanced case, as before, we look for a perfect matching on the bichromatic graph. Unlike the k-center case, we let the weight of a (bi, rj) edge be the distance between the two points. Our goal is to find a perfect matching of minimum total cost, since that exactly represents the cost of the fairlet decomposition. Since the best known approximation for k-median is 1 + \u221a3 + \u03b5 (Li & Svensson, 2013), we have:\nTheorem 14. 
The algorithm that first finds fairlets and then clusters them is a (2 + \u221a3 + \u03b5)-approximation for the (1, k)-fair median problem.\nTo find (1, t')-fairlet decompositions for integral t' > 1, we again resort to MCF and create an instance as in the k-center case, but for each bi \u2208 B, rj \u2208 R, and each 1 \u2264 k, \u2113 \u2264 t', we set the cost of the edge (b_i^k, r_j^\u2113) to d(bi, rj).\nTheorem 15. The algorithm that first finds fairlets and then clusters them is a (t' + 1 + \u221a3 + \u03b5)-approximation for the (1/t', k)-fair median problem for any positive integer t'.\n\n4.4 Hardness\n\nWe complement our algorithmic results with a discussion of computational hardness for fair clustering. We show that the question of finding a good fairlet decomposition is itself computationally hard. Thus, ensuring fairness causes hardness, regardless of the underlying clustering objective.\nTheorem 16. For each fixed t' \u2265 3, finding an optimal (1, t')-fairlet decomposition is NP-hard. Also, finding the minimum cost (1/t', k)-fair median clustering is NP-hard.\n\n5 Experiments\n\nIn this section we illustrate our algorithm by performing experiments on real data. The goal of our experiments is two-fold: first, we show that traditional algorithms for k-center and k-median tend to produce unfair clusters; second, we show that by using our algorithms one can obtain clusters that respect the fairness guarantees. We show that in the latter case, the cost of the solution tends to converge to the cost of the fairlet decomposition, which serves as a lower bound on the cost of the optimal solution.\nDatasets. 
We consider 3 datasets from the UCI repository (Lichman, 2013) for experimentation.\nDiabetes. This dataset2 represents the outcomes of patients pertaining to diabetes. We chose numeric attributes, such as age and time in hospital, to represent points in the Euclidean space, and gender as the sensitive dimension, i.e., we aim to balance gender. We subsampled the dataset to 1000 records.\nBank. This dataset3 contains one record for each phone call in a marketing campaign run by a Portuguese banking institution (Moro et al., 2014). Each record contains information about the client that was contacted by the institution. We chose numeric attributes, such as age, balance, and duration, to represent points in the Euclidean space, and we aim to cluster so as to balance married and not married clients. We subsampled the dataset to 1000 records.\nCensus. This dataset4 contains the census records extracted from the 1994 US census (Kohavi, 1996). Each record contains information about individuals including education, occupation, hours worked per week, etc. 
We chose numeric attributes such as age, fnlwgt, education-num, capital-gain and hours-per-week to represent points in the Euclidean space, and we aim to cluster the dataset so as to balance gender. We subsampled the dataset to 600 records.\n\nAlgorithms. We implement the flow-based fairlet decomposition algorithm as described in Section 4. To solve the k-center problem we augment it with the greedy furthest point algorithm due to Gonzalez (1985), which is known to obtain a 2-approximation. To solve the k-median problem we use the single swap algorithm due to Arya et al. (2004), which guarantees a 5-approximation in the worst case, but performs much better in practice (Kanungo et al., 2002).\n\nResults. Figure 3 shows the results for k-center on the three datasets, together with the corresponding results for the k-median objective. In all of the cases, we run with t' = 2, that is, we aim for balance of at least 0.5 in each cluster.\nObserve that the balance of the solutions produced by the classical algorithms is very low, and in four out of the six cases the balance is 0 for larger values of k, meaning that the optimal solution has monochromatic clusters. Moreover, this is not an isolated incident; for instance, the k-median instance of the Bank dataset has three monochromatic clusters starting at k = 12. Finally, left unchecked, the balance in all datasets keeps decreasing as the clustering becomes more discriminative with increased k.\nOn the other hand, the fair clustering solutions maintain a balanced solution even as k increases. Not surprisingly, the balance comes with a corresponding increase in cost, and the fair solutions are costlier than their unfair counterparts. 
In each plot we also show the cost of the fairlet decomposition, which represents the limit of the cost of the fair clustering; in all of the scenarios the overall cost of the clustering converges to the cost of the fairlet decomposition.

6 Conclusions

In this work we initiate the study of fair clustering algorithms. Our main result is a reduction of fair clustering to classical clustering via the notion of fairlets. We gave efficient approximation algorithms for finding fairlet decompositions, and proved lower bounds showing that fairness can introduce a computational bottleneck. An immediate future direction is to tighten the gap between lower and upper bounds by improving the approximation ratio of the decomposition algorithms, or giving stronger hardness results. A different avenue is to extend these results to situations where the protected class is not binary, but can take on multiple values. Here there are multiple challenges, including defining an appropriate version of fairness.

Acknowledgments

Flavio Chierichetti was supported in part by the ERC Starting Grant DMAP 680153, by a Google Focused Research Award, and by the SIR Grant RBSI14Q743.

2https://archive.ics.uci.edu/ml/datasets/diabetes
3https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
4https://archive.ics.uci.edu/ml/datasets/adult

References

Aggarwal, Charu C., & Reddy, Chandan K. 2013. Data Clustering: Algorithms and Applications. 1st edn. Chapman & Hall/CRC.

Arya, Vijay, Garg, Naveen, Khandekar, Rohit, Meyerson, Adam, Munagala, Kamesh, & Pandit, Vinayaka. 2004. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3), 544–562.

Biddle, Dan. 2006. Adverse Impact and Test Validation: A Practitioner's Guide to Valid and Defensible Employment Testing. Gower Publishing, Ltd.

Corbett-Davies, Sam, Pierson, Emma, Feller, Avi, Goel, Sharad, & Huq, Aziz. 2017.
Algorithmic Decision Making and the Cost of Fairness. Pages 797–806 of: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '17. New York, NY, USA: ACM.

Dwork, Cynthia, Hardt, Moritz, Pitassi, Toniann, Reingold, Omer, & Zemel, Richard. 2012. Fairness through awareness. Pages 214–226 of: ITCS.

Feldman, Michael, Friedler, Sorelle A., Moeller, John, Scheidegger, Carlos, & Venkatasubramanian, Suresh. 2015. Certifying and removing disparate impact. Pages 259–268 of: KDD.

Gonzalez, T. 1985. Clustering to minimize the maximum intercluster distance. TCS, 38, 293–306.

Hardt, Moritz, Price, Eric, & Srebro, Nati. 2016. Equality of opportunity in supervised learning. Pages 3315–3323 of: NIPS.

Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie H., & Roth, Aaron. 2016. Fairness in learning: Classic and contextual bandits. Pages 325–333 of: NIPS.

Kamishima, Toshihiro, Akaho, Shotaro, Asoh, Hideki, & Sakuma, Jun. 2012. Fairness-aware classifier with prejudice remover regularizer. Pages 35–50 of: ECML/PKDD.

Kanungo, Tapas, Mount, David M., Netanyahu, Nathan S., Piatko, Christine D., Silverman, Ruth, & Wu, Angela Y. 2002. An efficient k-means clustering algorithm: Analysis and implementation. PAMI, 24(7), 881–892.

Kleinberg, Jon, Lakkaraju, Himabindu, Leskovec, Jure, Ludwig, Jens, & Mullainathan, Sendhil. 2017a. Human decisions and machine predictions. Working Paper 23180. NBER.

Kleinberg, Jon M., Mullainathan, Sendhil, & Raghavan, Manish. 2017b. Inherent trade-offs in the fair determination of risk scores. In: ITCS.

Kohavi, Ron. 1996. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. Pages 202–207 of: KDD.

Li, Shi, & Svensson, Ola. 2013. Approximating k-median via pseudo-approximation. Pages 901–910 of: STOC.

Lichman, M. 2013.
UCI Machine Learning Repository.

Luong, Binh Thanh, Ruggieri, Salvatore, & Turini, Franco. 2011. k-NN as an implementation of situation testing for discrimination discovery and prevention. Pages 502–510 of: KDD.

Moro, Sérgio, Cortez, Paulo, & Rita, Paulo. 2014. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, 22–31.

Xu, Rui, & Wunsch, Don. 2009. Clustering. Wiley-IEEE Press.

Zafar, Muhammad Bilal, Valera, Isabel, Gomez-Rodriguez, Manuel, & Gummadi, Krishna P. 2017. Fairness constraints: Mechanisms for fair classification. Pages 259–268 of: AISTATS.

Zemel, Richard S., Wu, Yu, Swersky, Kevin, Pitassi, Toniann, & Dwork, Cynthia. 2013. Learning fair representations. Pages 325–333 of: ICML.