NeurIPS 2020

Fair Performance Metric Elicitation


Meta Review

The following meta-review is based on reading all reviews, the author response, the discussion, and the submission (though not the supplementary material). All reviewers agree that this paper studies an important problem: how to elicit fair performance metrics from user (or oracle) feedback. The paper proposes a non-trivial extension of the recently introduced Metric Elicitation framework as an approach to this task. Overall, this was assessed to be a borderline submission.

While the paper is strong in its technical formalism and the presentation is very precise, it is also very challenging for all but the most dedicated reader to follow. This is due in large part to two factors: (1) the abundance of notation introduced and relied upon throughout in lieu of more descriptive terms; and (2) the emphasis on technical precision over a more reader-friendly, narrative style of prose.

Furthermore, while the main contribution of the paper is to the fairness literature, it lacks a meaningful discussion of how the work is situated within the broader literature on algorithmic fairness. I second here all of the points raised by R6. I encourage the authors to focus their revision on (1) making the paper more accessible; and (2) providing a more in-depth, "practitioner-oriented" discussion of the contributions and limitations of the work within the algorithmic fairness literature. The paper would be made much stronger, not weaker, if the authors clarified, for instance, how reducing classifiers to rates may sweep under the rug distinctions that users feel are important (again, see R6), and how the assumptions and inputs to the method constrain which dimensions of fairness can and cannot be learned about.

To be clear: I see the current paper as one that other researchers may be interested in building upon. However, the current presentation is likely to severely limit its audience and ultimate impact.