Fairness in Learning: Classic and Contextual Bandits

Part of Advances in Neural Information Processing Systems 29 (NIPS 2016)


Authors

Matthew Joseph, Michael Kearns, Jamie H. Morgenstern, Aaron Roth

Abstract

We introduce the study of fairness in multi-armed bandit problems. Our fairness definition demands that, given a pool of applicants, a worse applicant is never favored over a better one, despite a learning algorithm’s uncertainty over the true payoffs. In the classic stochastic bandits problem we provide a provably fair algorithm based on “chained” confidence intervals, and prove a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence, providing a strong separation between fair and unfair learning that extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm and vice versa. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and unfair learning algorithms.
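To make the “chained” confidence-interval idea concrete, the following is a minimal sketch of one round of a fair selection rule for the stochastic case. It assumes rewards bounded in [0, 1], caller-maintained per-arm play counts and empirical means, and a generic Hoeffding-style interval width; it is an illustration of the chaining technique, not a reproduction of the paper's exact algorithm, constants, or union bounds.

```python
import math
import random

def fair_round(counts, means, t, n_arms, delta):
    """One round of a fair selection rule based on chained confidence intervals.

    Illustrative sketch only (assumed interface): `counts[i]` and `means[i]`
    are the number of pulls and empirical mean reward of arm i so far,
    `t` is the current round, and `delta` is the allowed failure probability.
    """
    # Hoeffding-style confidence interval for each arm's mean payoff.
    intervals = []
    for i in range(n_arms):
        if counts[i] == 0:
            lo, hi = 0.0, 1.0  # no data yet: trivial interval for [0, 1] rewards
        else:
            width = math.sqrt(
                math.log(2 * n_arms * (t + 1) ** 2 / delta) / (2 * counts[i])
            )
            lo, hi = means[i] - width, means[i] + width
        intervals.append((lo, hi))

    # Start the chain from the arm with the highest upper confidence bound.
    top = max(range(n_arms), key=lambda i: intervals[i][1])
    chain = {top}

    # Link in any arm whose interval overlaps an interval already in the
    # chain; repeat until no further arms can be added (transitive chaining).
    changed = True
    while changed:
        changed = False
        for i in range(n_arms):
            if i in chain:
                continue
            overlaps = any(
                intervals[i][1] >= intervals[j][0] and intervals[j][1] >= intervals[i][0]
                for j in chain
            )
            if overlaps:
                chain.add(i)
                changed = True

    # Fairness: arms that cannot be confidently distinguished from the
    # plausibly-best arm are all played with equal probability.
    return random.choice(sorted(chain))
```

Playing uniformly over the chained set is what enforces the fairness condition in this sketch: two arms are treated differently only when their confidence intervals certify that one is genuinely better than the other.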