Reviews: When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness

Reviewer 1

This paper tackles the primary criticism aimed at applications of causal graphical models for fairness: one needs to completely believe an assumed causal model for the results to be valid. Instead, it presents a definition of fairness under which we can assume many plausible causal models, and it requires fairness violations to be bounded below a threshold for all such plausible models. The authors present a simple way to formally express this idea: they define an approximate notion of counterfactual fairness and use the amount of fairness violation as a regularizer for a supervised learner. This is an important theoretical advance, and I think it can lead to promising work.

The key part, then, is to develop a method to construct counterfactual estimates. This is a hard problem because, even for a single causal model, there might be unknown and unobserved confounders that affect relationships between observed variables. The authors use a method from past work in which they first estimate the distribution of the unobserved confounders and then construct counterfactuals assuming perfect knowledge of the confounders. I find this method problematic because confounders can be a combination of many variables and can take many levels. It is unclear whether an arbitrary parameterization for them can account for all of their effects in a causal model. Further, if it were possible to model confounders and estimate counterfactuals from observed data in this way, then we could use it for every causal inference application (which is unlikely). It seems, therefore, that the estimated counterfactuals will depend heavily on the exact parameterization used for the confounders. I suggest that the authors discuss these limitations of their specific approach. It might also be useful to separate out counterfactual estimation as simply a pluggable component of their main contribution, which is to propose a learning algorithm robust to multiple causal models.

That said, this is exactly where the concept of comparing multiple causal models can shine. To decrease dependence on specific parameterizations, one could imagine optimizing over many possible parameterized causal models. In the results section, the authors do test their method on 2 or 3 different worlds, but I think it would be useful if they could extend their analysis to many more causal worlds for each application. I am not sure whether there are constraints in doing so (computational or otherwise); if there are, it would be good to mention them explicitly.
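
To make the suggestion of a pluggable counterfactual component concrete, here is a minimal sketch of a multi-world objective in which the counterfactual inputs are simply passed in, one array per candidate causal model. The function name, the hinge-style penalty with tolerance `eps`, the weight `lam`, and the toy numbers are all illustrative assumptions; the paper's exact objective may differ.

```python
import numpy as np

def multi_world_objective(predict, X, y, X_cf_by_world, lam=1.0, eps=0.05):
    """Squared-error loss plus a fairness penalty summed over candidate worlds.

    X_cf_by_world is a list with one array per candidate causal model; each
    array holds the counterfactual version of X under that model. How these
    arrays are produced is deliberately left outside this function.
    """
    preds = predict(X)
    loss = np.mean((preds - y) ** 2)
    penalty = 0.0
    for X_cf in X_cf_by_world:
        gap = np.abs(preds - predict(X_cf))
        # Only gaps larger than the tolerance eps count as violations.
        penalty += np.mean(np.maximum(0.0, gap - eps))
    return loss + lam * penalty

# Toy usage with a fixed linear predictor (all numbers are made up).
w = np.array([0.5, 2.0])
predict = lambda X: X @ w
X = np.array([[1.0, 0.2], [0.0, 0.4]])   # column 0: sensitive attribute
y = np.array([0.9, 0.8])

world_a = X.copy()
world_a[:, 0] = 1.0 - world_a[:, 0]      # world A: only the attribute flips
world_b = world_a.copy()
world_b[:, 1] += 0.1                     # world B: the second feature shifts too

value = multi_world_objective(predict, X, y, [world_a, world_b])
```

Because the counterfactual arrays are inputs, adding more causal worlds only lengthens the list, which is one way the analysis could be extended beyond 2 or 3 worlds.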

Reviewer 2

The authors consider a novel supervised learning problem with fairness constraints, where the goal is to find an optimal predictor that is counterfactually fair with respect to a list of candidate causal models. The parameterizations of each candidate causal model are known. The authors incorporate the fairness constraint as a regularization term in the loss function. Evaluations are performed on two real-world datasets, and the results show that the proposed method balances fairness in multiple worlds with prediction accuracy. While the idea of exploring a novel fairness measure in counterfactual semantics and enforcing it over multiple candidate models is interesting, there are a few issues I find confusing, which are listed next:

1. The Counterfactual Fairness definition (Def. 1) is not clear. It is not immediately apparent which counterfactual quantity the authors are trying to measure. Eq. 2, the probabilistic counterfactual fairness definition, measures the total causal effect (P(Y_x)) of the sensitive feature X on the predicted outcome Y. It is a relatively simple counterfactual quantity, which can be directly computed by physically setting X to a fixed value, without using Pearl’s algorithm of three steps.

2. If the authors are referring to the total causal effect, it is unnecessary to use a rather complicated algorithm to compute counterfactuals (lines 87-94). If the authors are indeed referring to the counterfactual fairness defined in [1], the motivation for using this novel counterfactual fairness has not been properly justified. A newly proposed fairness definition should usually fall into one of the following categories: i. It provides a stronger condition than existing fairness definitions; ii. It captures discrimination that is not covered by existing definitions; or iii. It provides a reasonable relaxation to improve prediction accuracy. I went back to the original counterfactual fairness paper [1] and found discussions regarding this problem. Since this is a rather recent result, it would be better if the authors could explain it a bit further in the background section.

3. The contribution of this paper seems to be incremental unless I am missing something. The authors claim that the proposed technique “learns a fair predictor without knowing the true causal model”, but it still requires a finite list of known candidate causal models and then ranges over them. The natural question at this point is how to obtain such a list of candidate models. In the causal literature, “not knowing the true causal model” often means that only observational data is available, and certainly not that a list of fully parametrized candidate models exists. The relaxation considered in this paper may be a good starting point, but it does not address the fundamental challenge of the unavailability of the underlying model.

Minor comments:
- All references to Figure 2 in Sec. 2.2 should be to Figure 1.

[1] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. arXiv preprint arXiv:1703.06856, 2017.

Post-rebuttal: Some of the main issues were addressed by the authors. One of the issues was around Definition 1, and I believe the authors can fix that in the camera-ready version. Connections with existing fairness measures can be added in the Supplement. Still, unless I am missing something, it seems the paper requires that each of the causal models is *fully* specified, which means that the authors know precisely the underlying structural functions and the distributions over the exogenous variables. This is, generally, an overly strong requirement. I felt that the argument in the rebuttal saying that “These may come from expert knowledge or causal discovery algorithms like the popular PC or FCI algorithms [P. Spirtes, C. Glymour, and R. Scheines. “Causation, Prediction, and Search”, 2000]” is somewhat misleading. Even when learning algorithms like FCI can pin down a unique causal structure (almost never the case), it is still not accurate to say that they provide a fully specified model with the structural functions and the distributions over the exogenous variables. If one does not have this type of knowledge and setting, one cannot run Pearl’s 3-step algorithm. I am, therefore, unable to find any reasonable justification or setting that would support the feasibility of the proposed approach.
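
To illustrate why the 3-step algorithm needs a fully specified model, here is a minimal sketch on a toy additive-noise model X = coef * A + U_X, where A is the sensitive attribute. The model, the coefficient values, and the predictor are hypothetical and are not taken from the paper; the point is only that the abduction step uses the structural function itself, so two candidate worlds with the same graph but different coefficients give different counterfactuals for the same individual.

```python
def counterfactual_prediction(a, x, a_new, coef, predictor):
    """Pearl's three steps for the toy model X = coef * A + U_X."""
    u_x = x - coef * a          # 1. abduction: recover the exogenous noise
    a_cf = a_new                # 2. action: set A to the new value
    x_cf = coef * a_cf + u_x    # 3. prediction: recompute X downstream of A
    return predictor(a_cf, x_cf)

def predictor(a, x):
    # Hypothetical fixed predictor that looks only at X.
    return 2.0 * x

# One observed individual: A = 1, X = 2.5.
factual = predictor(1.0, 2.5)

# Same graph, two different assumed coefficients (two candidate worlds).
cf_world_1 = counterfactual_prediction(1.0, 2.5, 0.0, coef=1.5, predictor=predictor)
cf_world_2 = counterfactual_prediction(1.0, 2.5, 0.0, coef=0.5, predictor=predictor)
```

Here the factual prediction is 5.0, while the counterfactual prediction is 2.0 in the first world and 4.0 in the second. A causal structure alone, without `coef` and the noise model, would not determine either number.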

Reviewer 3

Summary. This paper addresses the problem of learning predictors that trade off prediction accuracy and fairness. A fair predictor with respect to attribute A is defined using the notion of counterfactual fairness, which basically means that predictions should be independent of which value a sensitive attribute A attains (for example, predictions are the same in distribution for both A=male and A=female). The contribution of the paper is to relax the problem of attaining exact fairness according to a known causal model of the world to the problem of attaining approximate fairness without assuming that the correct causal model specification is known. Instead, a family of M model specifications is allowed. Concretely, this is done by adding new terms to the loss function that penalise deviations from perfect fairness under each of the causal models in the family. By varying the importance of these terms, we can then trade off prediction accuracy and fairness. The model is applied to two real-world datasets, and trade-off curves are presented to showcase the functionality of the approach.

Comments. The paper is clear, well-written, technically sound and addresses an important problem domain. The idea of trading off predictive accuracy against fairness by introducing penalty terms for deviations from fairness within each causal model is natural and intuitive. The paper sets up the optimisation problem and proposes an algorithm to solve it. It does not address the issue of how to weigh the different causal models in the family, and it does not provide a baseline trade-off strategy for comparison with the proposed approach. This is perhaps acceptable, but I find that it weakens the contribution of the paper. Could the authors address these concerns in their response, please? Certainly, the simplicity of the approach is appealing, but it is not easy to infer from the text how practical the approach may be in its current form.
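
As a rough illustration of the trade-off described in the summary, the sketch below fits a linear predictor on simulated data for several values of the penalty weight and records accuracy and unfairness for each. Everything here is an assumption made for illustration: the simulated data, the two candidate worlds, the equal weighting of the worlds, and the squared penalty, which is chosen because it gives a closed-form solution and is not necessarily the penalty used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.integers(0, 2, size=n).astype(float)       # sensitive attribute
u = rng.normal(size=n)                             # background noise
x = 1.0 * a + u                                    # feature caused by A
y = 2.0 * x + 1.0 * a + 0.1 * rng.normal(size=n)   # outcome
X = np.column_stack([a, x])

def cf_shift(coef):
    """X - X_cf when A is flipped in a world where X = coef * A + U."""
    delta_a = a - (1.0 - a)
    return np.column_stack([delta_a, coef * delta_a])

# Two hypothetical candidate causal models, weighted equally.
worlds = [cf_shift(1.0), cf_shift(0.5)]

def fit(lam):
    # Minimises mean((Xw - y)^2) + lam * sum_j mean((D_j w)^2) in closed form.
    G = X.T @ X / n + lam * sum(D.T @ D / n for D in worlds)
    w = np.linalg.solve(G, X.T @ y / n)
    mse = np.mean((X @ w - y) ** 2)
    # Worst-case mean absolute prediction gap across the candidate worlds.
    unfair = max(np.mean(np.abs(D @ w)) for D in worlds)
    return mse, unfair

# Each entry is (lam, mse, unfair); plotting mse against unfair gives a curve.
curve = [(lam, *fit(lam)) for lam in (0.0, 0.1, 1.0, 10.0)]
```

A baseline of the kind requested above could be compared against the same curve, and unequal weights on the worlds would enter as per-world multipliers inside `G`.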

Paper ID:	3209