NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Originality: I think the task at hand is challenging and has important applications. The proposed method is a combination of two well-known techniques, with a necessary modification of the support size to facilitate the theoretical regret guarantees. The proposed method is well motivated and has a solid theoretical analysis.

Quality: I think the paper is a work in progress. No experimental results are provided, and the paper ends abruptly with no conclusion. Comparisons to other methods or a proposed baseline are clearly missing, as is any discussion of weaknesses.

Clarity: The paper is well written; however, it ends unfinished, with no experimental section or conclusion.

Significance: Since the empirical results are missing, it is hard to judge the significance of the method. However, the idea is an interesting follow-up to the previous works [1,2], with good theoretical support.
Reviewer 2
Clarity: The paper is very well written, and the contribution is clear. However, it would help if the authors made clearer which parts are standard bandit techniques and which parts are new. For example, they could explain in more detail why adding the fairness constraint makes the problem challenging, what new technique they use to handle it, and how far that technique extends to other constraints.

Originality/significance: I believe this is the first paper to propose an algorithm that satisfies approximate EO at each round. However, the comparison to related work is not very clear. As the authors mention, one could instead enforce either of the following two notions in the contextual bandit setting (formalized below):

(1) For a fixed time horizon T, the algorithm satisfies EO in aggregate, i.e., false-positive rates averaged over all T rounds match across groups.
(2) At each round, each individual has the same probability of a false positive, independent of the sensitive attribute.

What is special about being fair at each round? Do you obtain a similar regret rate under (2)? The related work mentions that (2) needs strong assumptions; does your algorithm provide any guarantee at the individual level?

Quality: All the theorems in the paper appear sound (I only checked some of them in the appendix).
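For concreteness, a possible formalization of the two notions above; the notation is a paraphrase of the reviewer's description and may differ from the paper's exact definitions:

```latex
% Illustrative formalization only; the notation is a paraphrase of the
% reviewer's description, not necessarily the paper's exact definitions.
% \hat{Y}_t : the algorithm's decision at round t (1 = accept)
% Y_t       : the true label at round t
% X_t       : the context (individual) at round t
% A_t       : the sensitive attribute; a, a' range over its values

% (1) Aggregate EO over a fixed horizon T: false-positive rates,
%     averaged over all rounds, match across groups.
\[
  \frac{1}{T}\sum_{t=1}^{T}\Pr\!\left[\hat{Y}_t = 1 \mid Y_t = 0,\ A_t = a\right]
  =
  \frac{1}{T}\sum_{t=1}^{T}\Pr\!\left[\hat{Y}_t = 1 \mid Y_t = 0,\ A_t = a'\right]
  \qquad \text{for all } a, a'.
\]

% (2) Per-round, individual-level EO: at every round t, each individual's
%     false-positive probability (over the algorithm's randomness) is
%     independent of the sensitive attribute.
\[
  \Pr\!\left[\hat{Y}_t = 1 \mid X_t = x,\ Y_t = 0,\ A_t = a\right]
  =
  \Pr\!\left[\hat{Y}_t = 1 \mid X_t = x,\ Y_t = 0,\ A_t = a'\right]
  \qquad \text{for all } t,\ x,\ a,\ a'.
\]
```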
Reviewer 3
This paper studies the problem of online classification with partial feedback under the new constraint that the policy satisfies a fairness constraint (equality of false-positive rates) at each round. The paper leverages careful modification of a number of technical tools to prove O(sqrt(T)) regret with a gamma + O(T^(-1/4)) fairness rate. In particular, the authors reduce the partial-feedback setting to a contextual bandits problem, construct an approximate "fair oracle" using a modification of the reductions approach to fair classification, and then modify ILOVETOCONBANDITS to use this approximate oracle. The relevant inspiration is clearly cited, and the main contribution is combining these tools to effectively handle the fairness constraint in the online learning problem.

The proposed algorithm is intuitive: accept everyone in the early rounds to gather data, and use this data to determine which classifiers satisfy the constraint; then run a "fair" contextual bandits algorithm in the later rounds on this subset of the classifiers (sketched below). The analysis appears correct, though I did not do a detailed pass over some of the lemmas in appendix B.4. The matching lower bound makes the story more complete -- the analysis isn't too loose.

The paper does suffer from some clarity issues. The problem and result statements are sufficiently clear, but the analysis in section 3 is not very modular; it could be reorganized around the key lemmas and the structure of the argument rather than exposing all of the vagaries of the modifications to the tools used in the analysis.

Overall, issues of partial feedback abound in fairness applications, and this paper presents a principled procedure to achieve good regret while ensuring fairness at every round. I wouldn't be surprised to see more work in the partial-feedback setting, and this paper offers a reasonable baseline and intuition for how to proceed.

After rebuttal: Thanks to the authors for their response. Aside from the writing concerns about section 3, I'm still a fan of this paper. I don't think the lack of an experimental section is a show-stopping flaw, and I'll argue for acceptance.
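A minimal sketch of the two-phase structure the reviewer describes, assuming a finite, enumerable policy class and i.i.d. arrivals. This is an illustration, not the paper's algorithm: the paper uses an oracle-based reduction and an ILOVETOCONBANDITS-style routine, whereas here enumeration and epsilon-greedy stand in for both. All names (draw_individual, reveal_label, T0, gamma, slack) are hypothetical:

```python
import random

# Hypothetical sketch of the two-phase "fair bandit" procedure described
# above; enumeration + epsilon-greedy stand in for the paper's approximate
# fair oracle and ILOVETOCONBANDITS-style routine. All names illustrative.

def false_positive_rate(policy, data, group):
    """Empirical FPR of `policy` on negatively labeled examples in `group`."""
    negatives = [x for (x, y, a) in data if y == 0 and a == group]
    if not negatives:
        return 0.0
    return sum(policy(x) for x in negatives) / len(negatives)

def fpr_gap(policy, data, groups):
    """Largest pairwise gap in empirical FPR across groups."""
    rates = [false_positive_rate(policy, data, g) for g in groups]
    return max(rates) - min(rates)

def two_phase_fair_bandit(policies, draw_individual, reveal_label,
                          T, T0, gamma, slack, eps=0.1):
    # Phase 1: accept everyone for the first T0 rounds. In the
    # partial-feedback setting the true label is observed only on
    # acceptance, so this yields an unbiased labeled sample.
    data = []
    for _ in range(T0):
        x, a = draw_individual()        # context and sensitive attribute
        y = reveal_label(x)             # label observed because we accepted
        data.append((x, y, a))

    # Keep only classifiers whose empirical FPR gap across groups is
    # within the relaxed constraint gamma + slack (approximate EO).
    groups = {a for (_, _, a) in data}
    fair = [p for p in policies
            if fpr_gap(p, data, groups) <= gamma + slack]
    assert fair, "no policy satisfies the relaxed constraint"

    # Phase 2: run a bandit algorithm restricted to the fair subset.
    # Epsilon-greedy is a placeholder for the paper's contextual-bandit
    # routine; reward +1 for accepting a positive, -1 for a false positive.
    total = [0.0] * len(fair)
    pulls = [0] * len(fair)
    for _ in range(T - T0):
        x, a = draw_individual()
        if random.random() < eps:
            i = random.randrange(len(fair))
        else:
            i = max(range(len(fair)),
                    key=lambda j: total[j] / pulls[j] if pulls[j] else 0.0)
        if fair[i](x):                  # accept: label (and reward) observed
            y = reveal_label(x)
            total[i] += 1.0 if y == 1 else -1.0
        pulls[i] += 1
    return fair
```

Note that phase 2 in the paper computes an exploration distribution over fair policies via an optimization, which is what yields the O(sqrt(T)) regret guarantee; the epsilon-greedy loop above only conveys the shape of the procedure.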