{"title": "When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness", "book": "Advances in Neural Information Processing Systems", "page_first": 6414, "page_last": 6423, "abstract": "Machine learning is now being used to make crucial decisions about people's lives. For nearly all of these decisions there is a risk that individuals of a certain race, gender, sexual orientation, or any other subpopulation are unfairly discriminated against. Our recent method has demonstrated how to use techniques from counterfactual inference to make predictions fair across different subpopulations. This method requires that one provides the causal model that generated the data at hand. In general, validating all causal implications of the model is not possible without further assumptions. Hence, it is desirable to integrate competing causal models to provide counterfactually fair decisions, regardless of which causal \"world\" is the correct one. In this paper, we show how it is possible to make predictions that are approximately fair with respect to multiple possible causal models at once, thus mitigating the problem of exact causal specification. We frame the goal of learning a fair classifier as an optimization problem with fairness constraints entailed by competing causal explanations. We show how this optimization problem can be efficiently solved using gradient-based methods. We demonstrate the flexibility of our model on two real-world fair classification problems. We show that our model can seamlessly balance fairness in multiple worlds with prediction accuracy.", "full_text": "When Worlds Collide: Integrating Different\n\nCounterfactual Assumptions in Fairness\n\nChris Russell\u2217\n\nThe Alan Turing Institute and\n\nUniversity of Surrey\n\ncrussell@turing.ac.uk\n\nMatt J. Kusner\u2217\n\nThe Alan Turing Institute and\n\nUniversity of Warwick\n\nmkusner@turing.ac.uk\n\nJoshua R. 
Loftus†
New York University
loftus@nyu.edu

Ricardo Silva

The Alan Turing Institute and
University College London
ricardo@stats.ucl.ac.uk

Abstract

Machine learning is now being used to make crucial decisions about people's lives. For nearly all of these decisions there is a risk that individuals of a certain race, gender, sexual orientation, or any other subpopulation are unfairly discriminated against. Our recent method has demonstrated how to use techniques from counterfactual inference to make predictions fair across different subpopulations. This method requires that one provides the causal model that generated the data at hand. In general, validating all causal implications of the model is not possible without further assumptions. Hence, it is desirable to integrate competing causal models to provide counterfactually fair decisions, regardless of which causal "world" is the correct one. In this paper, we show how it is possible to make predictions that are approximately fair with respect to multiple possible causal models at once, thus mitigating the problem of exact causal specification. We frame the goal of learning a fair classifier as an optimization problem with fairness constraints entailed by competing causal explanations. We show how this optimization problem can be efficiently solved using gradient-based methods. We demonstrate the flexibility of our model on two real-world fair classification problems. We show that our model can seamlessly balance fairness in multiple worlds with prediction accuracy.

1 Introduction

Machine learning algorithms can do extraordinary things with data, from generating realistic images from noise [7] to predicting what you will look like as you age [18]. 
Today, governments and other organizations make use of machine learning in criminal sentencing [4], in predicting where to allocate police officers [3, 16], and in estimating an individual's risk of failing to pay back a loan [8]. However, in many of these settings, the data used to train machine learning algorithms contains biases against certain races, sexes, or other subgroups in the population [3, 6]. Unwittingly, this discrimination is then reflected in the predictions of such algorithms. Simply being born male or female can change the opportunities that follow from automated decision making trained to reflect historical biases. The implication is that, without taking this into account, classifiers that maximize accuracy risk perpetuating biases present in society.

*Equal contribution.
†This work was done while JL was a Research Fellow at the Alan Turing Institute.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

For instance, consider the rise of 'predictive policing', described as "taking data from disparate sources, analyzing them, and then using the results to anticipate, prevent and respond more effectively to future crime" [16]. Today, 38% of U.S. police departments surveyed by the Police Executive Research Forum are using predictive policing and 70% plan to do so in the next 2 to 5 years. However, significant doubts have been raised by researchers, journalists, and activists: if the data used by these algorithms is collected by departments that have been biased against minority groups, the predictions of these algorithms could reflect that bias [9, 12].

At the same time, fundamental mathematical results make it difficult to design fair classifiers. In criminal sentencing, the COMPAS score [4] predicts whether a prisoner will commit a crime upon release, and is widely used by judges to set bail and parole. 
While it has been shown that black and white defendants with the same COMPAS score commit a crime at similar rates after being released [1], it was also shown that black individuals were more often incorrectly predicted by COMPAS to commit crimes after release than white individuals were [2]. In fact, except for very specific cases, it is impossible to balance these measures of fairness [3, 10, 20].

The question becomes how to address the fact that the data itself may bias the learning algorithm, and even addressing this is theoretically difficult. One promising avenue is a recent approach, introduced by us in [11], called counterfactual fairness. In this work, we model how unfairness enters a dataset using techniques from causal modeling. Given such a model, we say that an algorithm is fair if it would give the same predictions had an individual's race, sex, or other sensitive attributes been different. We show how to formalize this notion using counterfactuals, following a rich tradition of causal modeling in the artificial intelligence literature [15], and how it can be placed into a machine learning pipeline. The big challenge in applying this work is that evaluating a counterfactual, e.g., "What if I had been born a different sex?", requires a causal model which describes how your sex changes your predictions, other things being equal.

Using "world" to describe any causal model evaluated at a particular counterfactual configuration, we have dependent "worlds" within the same causal model that can never be jointly observed, and possibly incompatible "worlds" across different models. 
Questions requiring the joint distribution of counterfactuals are hard to answer, as they demand partially untestable "cross-world" assumptions [5, 17], and even many of the empirically testable assumptions cannot be falsified from observational data alone [14], requiring possibly infeasible randomized trials. Because of this, different experts as well as different algorithms may disagree about the right causal model. Further disputes may arise due to the conflict between accurately modeling unfair data and producing a fair result, or because some degrees of unfairness may be considered allowable while others are not.

To address these problems, we propose a method for ensuring fairness within multiple causal models. We do so by introducing continuous relaxations of counterfactual fairness. With these relaxations in hand, we frame learning a fair classifier as an optimization problem with fairness constraints. We give efficient algorithms for solving these optimization problems for different classes of causal models. We demonstrate on two real-world fair classification datasets how our model is able to simultaneously achieve fairness in multiple models while flexibly trading off classification accuracy.

2 Background

We begin by describing aspects of causal modeling and counterfactual inference relevant for modeling fairness in data. We then briefly review counterfactual fairness [11], but we recommend that the interested reader read the original paper in full. We describe how uncertainty may arise over the correct causal model, along with some difficulties with the original counterfactual fairness definition.

We will use A to denote the set of protected attributes, a scalar in all of our examples but which without loss of generality can take the form of a set. Likewise, we denote as Y the outcome of interest that needs to be predicted using a predictor Ŷ. 
Finally, we will use X to denote the set of observed variables other than A and Y, and U to denote a set of hidden variables, which without loss of generality can be assumed to have no observable causes in a corresponding causal model.

2.1 Causal Modeling and Counterfactual Inference

We will use the causal framework of Pearl [15], which we describe using a simple example. Imagine we have a dataset of university students and we would like to model the causal relationships that lead up to whether a student graduates on time.

Figure 1: Dark nodes correspond to observed variables and light nodes are unobserved. (Left) This model predicts that both study S and motivation U directly cause graduation rate Y. However, this model does not take into account how an individual's race may affect observed variables. (Center) In this model, we encode how an individual's race may affect whether they need to have a job J while attending university. (Right) We may wonder if there are further biases in society to expect different rates of study for different races. We may also suspect that having a job may influence one's graduation likelihood, independent of study.

In our dataset, we have information about whether a student holds a job J, the number of hours they study per week S, and whether they graduate Y. Because we are interested in modeling any unfairness in our data, we also have information about a student's race A. Pearl's framework allows us to model causal relationships between these variables and any postulated unobserved latent variables, such as some U quantifying how motivated a student is to graduate. This uses a directed acyclic graph (DAG) with causal semantics, called a causal diagram. We show a possible causal diagram for this example in Figure 1 (Left). 
Each node corresponds to a variable and each set of edges into a node corresponds to a generative model specifying how the "parents" of that node causally generated it. In its most specific description, this generative model is a functional relationship deterministically generating its output given a set of observed and latent variables. For instance, one possible set of functions described by this model could be as follows:

S = g(J, U) + ε,   Y = I[φ(h(S, U)) ≥ 0.5]   (1)

where g, h are arbitrary functions and I is the indicator function that evaluates to 1 if the condition holds and 0 otherwise. Additionally, φ is the logistic function φ(a) = 1/(1 + exp(−a)) and ε is drawn independently of all variables from the standard normal distribution N(0, 1). It is also possible to specify non-deterministic relationships:

U ∼ N(0, 1),   S ∼ N(g(J, U), σ_S),   Y ∼ Bernoulli(φ(h(S, U)))   (2)

where σ_S is a model parameter. The power of this causal modeling framework is that, given a fully-specified set of equations, we can compute what (the distribution of) any of the variables would have been had certain other variables been different, other things being equal. For instance, given the causal model we can ask "Would individual i have graduated (Y = 1) if they hadn't had a job?", even if they did not actually graduate in the dataset. Questions of this type are called counterfactuals. For any observed variables V, W we denote the value of the counterfactual "What would V have been if W had been equal to w?" as V_{W←w}. Pearl et al. [15] describe how to compute these counterfactuals (or, for non-deterministic models, how to compute their distribution) using three steps: 1. Abduction: Given the set of observed variables X = {X1, . . . , Xd}, compute the values of the set of unobserved variables U = {U1, . . . 
, Up} given the model (for non-deterministic models, we compute the posterior distribution P(U | X)); 2. Action: Replace all occurrences of the variable W with the value w in the model equations; 3. Prediction: Using the new model equations and U (or P(U | X)), compute the value of V (or P(V | X)). This final step provides the value or distribution of V_{W←w} given the observed, factual, variables.

2.2 Counterfactual Fairness

In the above example, the university may wish to predict Y, whether a student will graduate, in order to determine if they should admit them into an honors program. While the university prefers to admit students who will graduate on time, it is willing to give a chance to some students without a confident graduation prediction in order to remedy unfairness associated with race in the honors program. The university believes that whether a student needs a job J may be influenced by their race. As evidence they cite the National Center for Education Statistics, which reported3 that fewer (25%) Asian-American students were employed while attending university as full-time students relative to students of other races (at least 35%). We show the corresponding causal diagram for this in Figure 1 (Center). As having a job J affects study, which affects graduation likelihood Y, this may mean different races take longer to graduate and thus unfairly have a harder time getting into the honors program.

Counterfactual fairness aims to correct predictions of a label variable Y that are unfairly altered by an individual's sensitive attribute A (race in this case). Fairness is defined in terms of counterfactuals:

Definition 1 (Counterfactual Fairness [11]). 
A predictor Ŷ of Y is counterfactually fair given the sensitive attribute A = a and any observed variables X if

P(Ŷ_{A←a} = y | X = x, A = a) = P(Ŷ_{A←a′} = y | X = x, A = a)   (3)

for all y and a′ ≠ a.

In what follows, we will also refer to Ŷ as a function f(x, a) of hidden variables U, of (usually a subset of) an instantiation x of X, and of the protected attribute A. We leave U implicit in this notation since, as we will see, this set might differ across different competing models. The notation implies

Ŷ_{A←a} = f(x_{A←a}, a).   (4)

Notice that if counterfactual fairness holds exactly for Ŷ, then this predictor can only be a non-trivial function of X for those elements X ∈ X such that X_{A←a} = X_{A←a′}. Moreover, by construction U_{A←a} = U_{A←a′}, as each element of U is defined to have no causes in A ∪ X.

The probabilities in eq. (3) are given by the posterior distribution over the unobserved variables P(U | X = x, A = a). Hence, a counterfactual Ŷ_{A←a} may be deterministic if this distribution is degenerate, that is, if U is a deterministic function of X and A. One nice property of this definition is that it is easy to interpret: a decision is fair if it would have been the same had a person had a different A (e.g., a different race4), other things being equal. In [11], we give an efficient algorithm for designing a predictor that is counterfactually fair. In the university graduation example, a predictor constructed from the unobserved motivation variable U is counterfactually fair.

One difficulty of the definition of counterfactual fairness is that it requires one to postulate causal relationships between variables, including latent variables that may be impractical to measure directly. In general, different causal models will create different fair predictors Ŷ. 
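To make the abduction, action, and prediction steps concrete, the following sketch instantiates the deterministic model of eq. (1) with hypothetical choices of g and h; the function bodies and numeric values are illustrative stand-ins, not taken from the paper, and the noise term is dropped so that abduction has a unique solution.

```python
import numpy as np

def phi(z):
    # Logistic function from eq. (1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical structural functions standing in for the arbitrary g, h of eq. (1).
def g(j, u):
    return 2.0 * u - 1.5 * j

def h(s, u):
    return 0.3 * s + 0.5 * u

def abduct_u(s_obs, j_obs):
    # Abduction: solve S = g(J, U) for the latent U given the observed data.
    return (s_obs + 1.5 * j_obs) / 2.0

def counterfactual_y(s_obs, j_obs, j_new):
    # Action: set J <- j_new.  Prediction: re-run the structural equations
    # with the abducted U held fixed ("other things being equal").
    u = abduct_u(s_obs, j_obs)
    s_cf = g(j_new, u)
    return int(phi(h(s_cf, u)) >= 0.5)

# "Would this individual have graduated had they not held a job?"
y_cf = counterfactual_y(s_obs=3.0, j_obs=1.0, j_new=0.0)
```

In the non-deterministic model of eq. (2), abduction would instead compute the posterior P(U | X), and the counterfactual becomes a distribution over Y.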
But there are several reasons why it may be unrealistic to assume that any single, fixed causal model will be appropriate. There may not be a consensus among experts or previous literature about the existence, functional form, direction, or magnitude of a particular causal effect, and it may be impossible to determine these from the available data without untestable assumptions. And given the sensitive, even political nature of problems involving fairness, it is also possible that disputes may arise over the presence of a feature of the causal model, based on competing notions of dependencies and latent variables. Consider the following example, formulated as a dispute over the presence of edges. For the university graduation model, one may ask if differences in study are due only to differences in employment, or whether instead there is some other direct effect of A on study levels. Also, having a job may directly affect graduation likelihood. We show these changes to the model in Figure 1 (Right). There is also potential for disagreement over whether some causal paths from A to graduation should be excluded from the definition of fairness. For example, an adherent to strict meritocracy may argue that the number of hours a student has studied should not be given a counterfactual value. This could be incorporated in a separate model by omitting chosen edges when propagating counterfactual information through the graph in the Prediction step of counterfactual inference5. To summarize, there may be disagreements about the right causal model due to: 1. Changing the structure of the DAG, e.g. adding an edge; 2. Changing the latent variables, e.g. changing the function generating a vertex to have a different signal vs. noise decomposition; 3. Preventing certain paths from propagating counterfactual values.

3 https://nces.ed.gov/programs/coe/indicator_ssa.asp
4 At the same time, the notion of a "counterfactual race," sex, etc. 
often raises debate. See [11] for our take on this.
5 In the Supplementary Material of [11], we explain how counterfactual fairness can be restricted to particular paths from A to Y, as opposed to all paths.

3 Fairness under Causal Uncertainty

In this section, we describe a technique for learning a fair predictor without knowing the true causal model. We first describe why, in general, counterfactual fairness will often not hold in multiple different models. We then describe a relaxation of the definition of counterfactual fairness for both deterministic and non-deterministic models. Finally, we show an efficient method for learning classifiers that are simultaneously accurate and fair in multiple worlds. In all that follows we denote sets in calligraphic script X, random variables in uppercase X, scalars in lowercase x, matrices in bold uppercase X, and vectors in bold lowercase x.

3.1 Exact Counterfactual Fairness Across Worlds

We can imagine extending the definition of counterfactual fairness so that it holds for every plausible causal world. To see why this is inherently difficult, consider the setting of deterministic causal models. If each causal model of the world generates different counterfactuals, then each additional model induces a new set of constraints that the classifier must satisfy, and in the limit the only classifiers that are fair across all possible worlds are constant classifiers. For non-deterministic counterfactuals, these issues are magnified. To guarantee counterfactual fairness, Kusner et al. [11] assumed access to latent variables that hold the same value in an original datapoint and in its corresponding counterfactuals. While the latent variables of one world can remain constant under the generation of counterfactuals from its corresponding model, there is no guarantee that they remain constant under the counterfactuals generated from different models. 
Even in a two-model case, if one model's counterfactual has non-zero density everywhere (as is the case under Gaussian noise assumptions), it may be that the only classifiers satisfying counterfactual fairness in both worlds are the constant classifiers. If we are to achieve some measure of fairness from informative classifiers, and over a family of different worlds, we need a more robust alternative to counterfactual fairness.

3.2 Approximate Counterfactual Fairness

We define two approximations to counterfactual fairness to solve the problem of learning a fair classifier across multiple causal worlds.

Definition 2 ((ε, δ)-Approximate Counterfactual Fairness). A predictor f(X, A) satisfies (ε, 0)-approximate counterfactual fairness ((ε, 0)-ACF) if, given the sensitive attribute A = a and any instantiation x of the other observed variables X, we have that:

|f(x_{A←a}, a) − f(x_{A←a′}, a′)| ≤ ε   (5)

for all a′ ≠ a, if the system deterministically implies the counterfactual values of X. For a non-deterministic causal system, f satisfies (ε, δ)-approximate counterfactual fairness ((ε, δ)-ACF) if:

P( |f(X_{A←a}, a) − f(X_{A←a′}, a′)| ≤ ε | X = x, A = a ) ≥ 1 − δ   (6)

for all a′ ≠ a.

Both definitions must hold uniformly over the sample space of X × A. 
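Given posterior samples of the counterfactual features, both conditions of Definition 2 can be checked empirically with a few lines of numpy. The following is a sketch; the function and variable names are illustrative, not the paper's.

```python
import numpy as np

def satisfies_acf(f, x_cf_a, x_cf_a_prime, a, a_prime, eps, delta=0.0):
    # Empirical check of (eps, delta)-ACF.  x_cf_a and x_cf_a_prime hold S
    # posterior samples (rows) of the counterfactuals X_{A<-a} and X_{A<-a'}
    # for one individual; S = 1 recovers the deterministic case, eq. (5).
    gaps = np.abs(f(x_cf_a, a) - f(x_cf_a_prime, a_prime))
    # Eq. (6): the eps-gap must hold with posterior probability >= 1 - delta.
    return np.mean(gaps <= eps) >= 1.0 - delta

# Toy classifier that just returns the first feature, for illustration.
f_toy = lambda X, a: X[:, 0]
ok = satisfies_acf(f_toy, np.zeros((10, 1)), np.full((10, 1), 0.05), 0, 1, eps=0.1)
```

Setting delta = 0 demands the ε-gap on every posterior sample, matching the deterministic condition almost surely.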
The probability measures used are with respect to the conditional distribution of the background latent variables U given the observations. We leave a discussion of the asymptotic statistical properties of such plug-in estimators for future work. These definitions relax counterfactual fairness to ensure that, for deterministic systems, predictions f change by at most ε when an input is replaced by its counterfactual. For non-deterministic systems, the condition in (6) means that this ε change must occur with high probability, where the probability is again given by the posterior distribution P(U | X) computed in the Abduction step of counterfactual inference. If ε = 0, the deterministic definition eq. (5) is equivalent to the original counterfactual fairness definition. If also δ = 0, the non-deterministic definition eq. (6) is actually a stronger condition than the counterfactual fairness definition eq. (3), as it guarantees equality in probability instead of equality in distribution6.

Algorithm 1 Multi-World Fairness
1: Input: features X = [x1, . . . , xn], labels y = [y1, . . . , yn], sensitive attributes a = [a1, . . . , an], fairness parameters (ε, δ), trade-off parameters L = [λ1, . . . , λl].
2: Fit causal models: M1, . . . , Mm using X, a (and possibly y).
3: Sample counterfactuals: X_{A1←a′}, . . . , X_{Am←a′} for all unobserved values a′.
4: for λ ∈ L do
5:   Initialize classifier fλ.
6:   while loop until convergence do
7:     Select random batches Xb of inputs and batches of counterfactuals X_{A1←a′}, . . . , X_{Am←a′}.
8:     Compute the gradient of equation (7).
9:     Update fλ using any stochastic gradient optimization method.
10:   end while
11: end for
12: Select model fλ: For deterministic models, select the smallest λ such that equation (5) holds using fλ. For non-deterministic models, select the λ that corresponds to δ given fλ.

6 In the Supplementary Material of [11], we describe in more detail the implications of the stronger condition.

3.3 Learning a Fair Classifier

Assume we are given a dataset of n observations a = [a1, . . . , an] of the sensitive attribute A and of other features X = [x1, . . . , xn] drawn from X. We wish to accurately predict a label Y given observations y = [y1, . . . , yn] while also satisfying (ε, δ)-approximate counterfactual fairness. We learn a classifier f(x, a) by minimizing a loss function ℓ(f(x, a), y). At the same time, we incorporate an unfairness term µj(f, x, a, a′) for each causal model j to reduce the unfairness in f. We formulate this as a penalized optimization problem:

min_f (1/n) Σ_{i=1}^{n} ℓ(f(x_i, a_i), y_i) + (λ/n) Σ_{i=1}^{n} Σ_{j=1}^{m} Σ_{a′≠a_i} µj(f, x_i, a_i, a′)   (7)

where λ trades off classification accuracy for multi-world fair predictions. We show how to naturally define the unfairness function µj for deterministic and non-deterministic counterfactuals.

Deterministic counterfactuals. To enforce (ε, 0)-approximate counterfactual fairness, a natural penalty for unfairness is an indicator function which is one whenever (ε, 0)-ACF does not hold, and zero otherwise:

µj(f, x_i, a_i, a′) := I[ |f(x_{i,Aj←a_i}, a_i) − f(x_{i,Aj←a′}, a′)| ≥ ε ]   (8)

Unfortunately, the indicator function I is non-convex, discontinuous and difficult to optimize. 
Instead, we propose to use the tightest convex relaxation of the indicator function:

µj(f, x_i, a_i, a′) := max{ 0, |f(x_{i,Aj←a_i}, a_i) − f(x_{i,Aj←a′}, a′)| − ε }   (9)

Note that when (ε, 0)-approximate counterfactual fairness is not satisfied, µj is non-zero and thus the optimization problem will penalize f for this unfairness. Where (ε, 0)-approximate counterfactual fairness is satisfied, µj evaluates to 0 and does not affect the objective. For sufficiently large λ, the value of µj will dominate the training loss (1/n) Σ_{i=1}^{n} ℓ(f(x_i, a_i), y_i) and any solution will satisfy (ε, 0)-approximate counterfactual fairness. However, an overly large choice of λ causes numeric instability, and will decrease the accuracy of the classifier found. Thus, to find the most accurate classifier that satisfies the fairness condition, one can simply perform a grid or binary search for the smallest λ such that the condition holds.

Non-deterministic counterfactuals. For non-deterministic counterfactuals we begin by writing a Monte Carlo approximation to (ε, δ)-ACF, eq. (6), as follows:

(1/S) Σ_{s=1}^{S} I( |f(x^s_{i,Aj←a_i}, a_i) − f(x^s_{i,Aj←a′}, a′)| ≥ ε ) ≤ δ   (10)

where x^s is sampled from the posterior distribution P(U | X). We can again form the tightest convex relaxation of the left-hand side of the expression to yield our unfairness function:

µj(f, x_i, a_i, a′) := (1/S) Σ_{s=1}^{S} max{ 0, |f(x^s_{i,Aj←a_i}, a_i) − f(x^s_{i,Aj←a′}, a′)| − ε }   (11)

Note that different choices of λ in eq. (7) correspond to different values of δ. 
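The penalized objective of eq. (7) with the hinge penalties of eqs. (9) and (11) can be sketched in a few lines of numpy. This minimal illustration assumes a linear classifier, squared loss as a placeholder for ℓ, and naive finite-difference subgradient descent rather than the stochastic solver of Algorithm 1; all names are ours.

```python
import numpy as np

def f_linear(w, X):
    # Classifier score, linear in the learned weights w (the sensitive
    # attribute, if used, can simply be appended as a column of X).
    return X @ w

def hinge_unfairness(w, X_cf, X_cf_prime, eps):
    # Convex hinge relaxation of eq. (9)/(11): average over counterfactual
    # samples of max{0, |f(x_{A<-a}) - f(x_{A<-a'})| - eps}.
    gap = np.abs(f_linear(w, X_cf) - f_linear(w, X_cf_prime))
    return np.mean(np.maximum(0.0, gap - eps))

def mwf_objective(w, X, y, worlds, eps, lam):
    # Eq. (7): `worlds` holds one (X_{A<-a}, X_{A<-a'}) pair of
    # counterfactual design matrices per candidate causal model.
    loss = np.mean((f_linear(w, X) - y) ** 2)
    unfair = sum(hinge_unfairness(w, Xc, Xcp, eps) for Xc, Xcp in worlds)
    return loss + lam * unfair

def fit(X, y, worlds, eps, lam, lr=0.05, steps=500):
    # Subgradient descent via central finite differences: a sketch only.
    w, h = np.zeros(X.shape[1]), 1e-5
    for _ in range(steps):
        grad = np.zeros_like(w)
        for k in range(w.size):
            wp, wm = w.copy(), w.copy()
            wp[k] += h
            wm[k] -= h
            grad[k] = (mwf_objective(wp, X, y, worlds, eps, lam)
                       - mwf_objective(wm, X, y, worlds, eps, lam)) / (2 * h)
        w -= lr * grad
    return w
```

Sweeping lam over a grid, as in Algorithm 1, then traces out the trade-off between accuracy and multi-world fairness.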
Indeed, by choosing λ = 0 we have the (ε, δ)-fair classifier corresponding to an unfair classifier7, while a sufficiently large, but finite, λ will correspond to an (ε, 0)-approximately counterfactually fair classifier. By varying λ between these two extremes, we induce classifiers that satisfy (ε, δ)-ACF for different values of δ. With these unfairness functions we have a differentiable optimization problem, eq. (7), which can be solved with gradient-based methods. Thus, our method allows practitioners to smoothly trade off accuracy with multi-world fairness. We call our method Multi-World Fairness (MWF). We give a complete method for learning an MWF classifier in Algorithm 1.

For both deterministic and non-deterministic models, this convex approximation essentially describes an expected unfairness that is allowed by the classifier:

Definition 3 (Expected ε-Unfairness). For any counterfactual a′ ≠ a, the Expected ε-Unfairness of a classifier f, or E_ε[f], is

E[ max{ 0, |f(X_{A←a}, a) − f(X_{A←a′}, a′)| − ε } | X = x, A = a ]   (12)

where the expectation is over any unobserved U (and is degenerate for deterministic counterfactuals).

We note that the term max{ 0, |f(X_{A←a}, a) − f(X_{A←a′}, a′)| − ε } is strictly non-negative and therefore the expected ε-unfairness is zero if and only if f satisfies (ε, 0)-approximate counterfactual fairness almost everywhere.

Linear Classifiers and Convexity. Although we have presented these results in their most general form, it is worth noting that for linear classifiers, convexity guarantees are preserved. 
The family of linear classifiers we consider is relatively broad, consisting of those linear in their learned weights w; as such, it includes both SVMs and a variety of regression methods used in conjunction with kernels or finite polynomial bases.

Consider any classifier whose output is linear in the learned parameters, i.e., the family of classifiers f of the form f(X, A) = Σ_l w_l g_l(X, a), for a set of fixed kernels g_l. Then the expected ε-unfairness is a convex function of w taking the form:

E[ max{ 0, |f(X_{A←a}, a) − f(X_{A←a′}, a′)| − ε } ] = E[ max{ 0, |Σ_l w_l (g_l(X_{A←a}, a) − g_l(X_{A←a′}, a′))| − ε } ]   (13)

This expression is convex in w and therefore, if the classification loss is also convex (as is the case for most regression tasks), a global optimum can be readily found via convex programming. In particular, globally optimal linear classifiers satisfying (ε, 0)-ACF or (ε, δ)-ACF can be found efficiently.

Bayesian alternatives and their shortcomings. One may argue that a more direct alternative is to provide probabilities associated with each world and to marginalize the set of optimal counterfactually fair classifiers over all possible worlds. We argue this is undesirable for two reasons: first, the averaged prediction for any particular individual may violate (3) by an undesirable margin for one, several, or even all considered worlds; second, a practitioner may be restricted by regulations to show that, to the best of their knowledge, the worst-case violation is bounded across all viable worlds with high probability. 
However, if the number of possible models is extremely large (for example, if the causal structure of the world is known but the associated parameters are not) and we have a probability associated with each world, then one natural extension is to adapt the Expected ε-Unfairness, eq. (3), to marginalize over the space of possible worlds. We leave this extension to future work.

4 Experiments

We demonstrate the flexibility of our method on two real-world fair classification problems: 1. fair prediction of student performance in law schools; and 2. predicting whether criminals will re-offend upon being released. For each dataset we begin by giving details of the fair prediction problem. We then introduce multiple causal models that each possibly describe how unfairness plays a role in the data. Finally, we give results of Multi-World Fairness (MWF) and show how it changes for different settings of the fairness parameters (ε, δ).

⁷In the worst case, δ may equal 1.

Figure 2: Causal models for the law school and COMPAS datasets. Shaded nodes are observed and unshaded nodes are unobserved. For each dataset we consider two possible causal worlds. The first law school model is a deterministic causal model with additive unobserved variables $\epsilon_G, \epsilon_L, \epsilon_Y$. The second is a non-deterministic causal model with a latent variable U. For COMPAS, the first causal model omits the dotted lines, and the second includes them. Both models are non-deterministic models with latent variables $U_J, U_D$. The large white arrows signify that variables A, E are connected to every variable contained in the box they point to. The law school model equations are given in eq. (14) and the COMPAS model equations are shown in eq. (15).

4.1 Fairly predicting law grades

We begin by investigating a dataset of survey results across 163 U.S. law schools conducted by the Law School Admission Council [19].
It contains information on over 20,000 students, including their race A (here we look at just black and white students, as this difference had the largest effect in counterfactuals in [11]), their grade-point average G obtained prior to law school, their law school entrance exam scores L, and their first-year average grade Y. Law schools may be interested in predicting Y for all applicants using G and L, in order to decide whether to accept or deny them entrance. However, due to societal inequalities, an individual's race may have affected their access to educational opportunities, and thus affected G and L. Accordingly, we model this possibility using the causal graphs in Figure 2 (Left). In these graphs we also model the fact that G, L may have been affected by other unobserved quantities. However, we may be uncertain about the right way to model these unobserved quantities, so we propose to model this dataset with the two worlds described in Figure 2 (Left). Note that these are the same models as used in Kusner et al. [11] (except here we consider race as the sensitive variable). The corresponding equations for these two worlds are as follows:

$$\begin{array}{ll}
\textbf{World 1 (deterministic):} & \textbf{World 2 (non-deterministic):} \\
G = b_G + w_G^A A + \epsilon_G & G \sim \mathcal{N}(b_G + w_G^A A + w_G^U U,\ \sigma_G) \\
L = b_L + w_L^A A + \epsilon_L & L \sim \mathrm{Poisson}(\exp(b_L + w_L^A A + w_L^U U)) \\
Y = b_Y + w_Y^A A + \epsilon_Y & Y \sim \mathcal{N}(w_Y^A A + w_Y^U U,\ 1) \\
\epsilon_G, \epsilon_L, \epsilon_Y \sim \mathcal{N}(0, 1) & U \sim \mathcal{N}(0, 1)
\end{array} \tag{14}$$

where the variables b, w are parameters of the causal models.

Figure 3: Test prediction results for different ε on the law school dataset.

Results. Figure 3 shows the result of learning a linear MWF classifier on the law school models. We split the law school data into a random 80/20 train/test split, fit causal models and classifiers on the training set, and evaluate performance on the test set.
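Concretely, fitting the deterministic world of eq. (14) amounts to least-squares regressions of G and L on A, with the residuals playing the role of $\epsilon_G, \epsilon_L$; counterfactuals are then obtained by holding each individual's residual fixed and flipping A (abduction-action). The sketch below uses synthetic data and hypothetical variable names; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for race A (binary), GPA G and entrance exam score L.
n = 500
A = rng.integers(0, 2, size=n).astype(float)
G = 3.0 - 0.4 * A + rng.normal(size=n)      # ground truth used only for demo
L = 35.0 - 3.0 * A + rng.normal(size=n)

def fit_linear(A, V):
    # Least-squares fit of V = b + w * A; returns (b, w) and residuals.
    X = np.column_stack([np.ones_like(A), A])
    coef, *_ = np.linalg.lstsq(X, V, rcond=None)
    return coef, V - X @ coef

(bG, wG), eps_G = fit_linear(A, G)          # residuals estimate eps_G
(bL, wL), eps_L = fit_linear(A, L)          # residuals estimate eps_L

# Abduction-action: keep each individual's residual, intervene A <- 1 - A.
A_cf = 1.0 - A
G_cf = bG + wG * A_cf + eps_G
L_cf = bL + wL * A_cf + eps_L
```

Because this world is deterministic given the residuals, each counterfactual is a point value; in the non-deterministic world one would instead infer a posterior over the latent U and average over it.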
We plot the test RMSE of the constant predictor satisfying counterfactual fairness (in red), the unfair predictor (λ = 0), and MWF, averaged across 5 runs. As we have one deterministic and one non-deterministic model, we evaluate MWF for different ε and δ (with the knowledge that the only change in the MWF classifier for different δ is due to the non-deterministic model). For each ε, δ, we selected the smallest λ across a grid (λ ∈ $\{10^{-5}, 10^{-4}, \ldots, 10^{10}\}$) such that the constraint in eq. (6) held across 95% of the individuals in both models. We see that MWF is able to reliably sacrifice accuracy for fairness as ε is reduced. Note that as we change δ we can further alter the accuracy/fairness trade-off.

Figure 4: Test prediction results for different ε and δ on the COMPAS dataset.

4.2 Fair recidivism prediction (COMPAS)

We next turn our attention to predicting whether a criminal will re-offend, or "recidivate", after being released from prison. ProPublica [13] released data on prisoners in Broward County, Florida who were awaiting a sentencing hearing. For each of the prisoners we have information on their race A (as above, we only consider black versus white individuals), their age E, their number of juvenile felonies $J_F$, juvenile misdemeanors $J_M$, the type of crime they committed T, the number of prior offenses they have P, and whether they recidivated Y.
There is also a proprietary COMPAS score C [13], designed to indicate the likelihood that a prisoner recidivates.

We model this dataset with two different non-deterministic causal models, shown in Figure 2 (Right). The first model includes the dotted edges; the second omits them. In both models we believe that two unobserved latent factors, juvenile criminality $U_J$ and adult criminality $U_D$, also contribute to $J_F, J_M, C, T, P$. We show the equations for both of our causal models below, where the first causal model includes the blue terms and the second does not:

$$\begin{aligned}
T &\sim \mathrm{Bernoulli}\big(\phi(b_T + w_T^{U_D} U_D + w_T^E E + w_T^A A)\big) \\
C &\sim \mathcal{N}\big(b_C + w_C^{U_D} U_D + w_C^E E + w_C^A A + w_C^T T + w_C^P P + w_C^{J_F} J_F + w_C^{J_M} J_M,\ \sigma_C\big) \\
P &\sim \mathrm{Poisson}\big(\exp(b_P + w_P^{U_D} U_D + w_P^E E + w_P^A A)\big) \\
J_F &\sim \mathrm{Poisson}\big(\exp(b_{J_F} + w_{J_F}^{U_J} U_J + w_{J_F}^E E + w_{J_F}^A A)\big) \\
J_M &\sim \mathrm{Poisson}\big(\exp(b_{J_M} + w_{J_M}^{U_J} U_J + w_{J_M}^E E + w_{J_M}^A A)\big) \\
[U_J, U_D] &\sim \mathcal{N}(0, \Sigma)
\end{aligned} \tag{15}$$

Results. Figure 4 shows how classification accuracy using both logistic regression (linear) and a 3-layer neural network (deep) changes as both ε and δ change. We split the COMPAS dataset randomly into an 80/20 train/test split, and report all results on the test set. As in the law school experiment, we grid-search over λ to find the smallest value such that, for any ε and δ, the (ε, δ)-ACF constraint in eq. (6) is satisfied for at least 95% of the individuals in the dataset, across both worlds. We average all results except the constant classifier over 5 runs and plot the means and standard deviations. We see that for small δ (high fairness) both linear and deep MWF classifiers significantly outperform the constant classifier and begin to approach the accuracy of the unfair classifier as ε increases.
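The λ selection used in both experiments (take the smallest λ on a grid such that the constraint in eq. (6) holds for at least 95% of individuals in every world) can be sketched as a simple loop. Here `train` and `violates` are hypothetical stand-ins for the classifier-fitting routine and the per-individual constraint check; this illustrates the selection step only, not the authors' Algorithm 1.

```python
import numpy as np

def select_lambda(train, violates, worlds, grid=None, coverage=0.95):
    """Smallest lambda on the grid whose trained classifier satisfies the
    per-individual fairness constraint for at least `coverage` of the
    individuals in every world; returns (None, None) if none qualifies."""
    if grid is None:
        grid = [10.0 ** k for k in range(-5, 11)]   # 1e-5, ..., 1e10
    for lam in sorted(grid):
        clf = train(lam)
        if all(np.mean(~violates(clf, w)) >= coverage for w in worlds):
            return lam, clf
    return None, None

# Toy stand-in: the "classifier" is just lam itself, and each individual's
# fairness gap shrinks as lam grows, mimicking the real trade-off.
gaps = np.random.default_rng(2).uniform(0.0, 1.0, size=100)
lam_star, _ = select_lambda(
    train=lambda lam: lam,
    violates=lambda clf, w: gaps / (1.0 + clf) > 0.05,
    worlds=[0, 1],
)
```

In the real pipeline, `train` would solve the penalized problem of eq. (7) for the given λ, and `violates` would evaluate the counterfactual gap for each individual under each causal world.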
As we increase δ (lowering fairness), the deep classifier is better able to learn a decision boundary that trades off accuracy for fairness. But if ε, δ are increased enough (e.g., ε ≥ 0.13, δ = 0.5), the linear MWF classifier matches the performance of the deep classifier.

5 Conclusion

This paper has presented a natural extension to counterfactual fairness that allows us to guarantee fair properties of algorithms, even when we are unsure of the causal model that describes the world.

As the use of machine learning becomes widespread across many domains, it becomes more important to take algorithmic fairness out of the hands of experts and make it available to everybody. The conceptual simplicity of our method, our robust use of counterfactuals, and the ease of implementing our method mean that it can be directly applied to many interesting problems. A further benefit of our approach over previous work on counterfactual fairness is that it only requires the estimation of counterfactuals at training time, and no knowledge of latent variables during testing. As such, our classifiers offer a fair drop-in replacement for other existing classifiers.

6 Acknowledgments

This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. CR acknowledges additional support under the EPSRC Platform Grant EP/P022529/1.

References

[1] COMPAS risk scales: Demonstrating accuracy equity and predictive parity performance of the COMPAS risk scales in Broward County, 2016.

[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, 2016. Accessed: Fri 19 May 2017.
[3] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. arXiv preprint arXiv:1703.09207, 2017.

[4] Tim Brennan, William Dieterich, and Beate Ehret. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 36(1):21–40, 2009.

[5] A. P. Dawid. Causal inference without counterfactuals. Journal of the American Statistical Association, pages 407–448, 2000.

[6] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

[7] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[8] Amir E. Khandani, Adlar J. Kim, and Andrew W. Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.

[9] Keith Kirkpatrick. It's not the algorithm, it's the data. Communications of the ACM, 60(2):21–23, 2017.

[10] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.

[11] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, 2017.

[12] Moish Kutnowski. The ethical dangers and merits of predictive policing. Journal of Community Safety and Well-Being, 2(1):13–17, 2017.

[13] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. ProPublica, May 2016.
[14] David Lopez-Paz. From dependence to causation. arXiv preprint arXiv:1607.03300, 2016.

[15] J. Pearl, M. Glymour, and N. Jewell. Causal Inference in Statistics: A Primer. Wiley, 2016.

[16] Beth Pearsall. Predictive policing: The future of law enforcement. National Institute of Justice Journal, 266(1):16–19, 2010.

[17] T. S. Richardson and J. Robins. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Working Paper Number 128, Center for Statistics and the Social Sciences, University of Washington, 2013.

[18] Paul Upchurch, Jacob Gardner, Kavita Bala, Robert Pless, Noah Snavely, and Kilian Weinberger. Deep feature interpolation for image content changes. arXiv preprint arXiv:1611.05507, 2016.

[19] Linda F. Wightman. LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series, 1998.

[20] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. arXiv preprint arXiv:1610.08452, 2016.