Group-Envy Fairness in the Stochastic Bandit Setting




Scinocca, Stephen

Journal Title

Journal ISSN

Volume Title



We introduce a new, group fairness-inspired stochastic multi-armed bandit problem in the pure exploration setting. We look at the discrepancy between an arm’s mean reward from a group and the highest mean reward for any arm from that group, and call this the disappointment that group suffers from that arm. We define the optimal arm to be the one that minimizes the maximum disappointment over all groups. This optimal arm addresses one problem with maximin fairness, where the group used to choose the maximin best arm suffers little disappointment regardless of what arm is picked, but another group suffers significantly more disappointment by picking that arm as the best one. The challenge of this problem is that the highest mean reward for a group and the arm that gives that reward are unknown. This means we need to pull arms for multiple goals: to find the optimal arm, and to estimate the highest mean reward of certain groups. This leads to the new adaptive sampling algorithm for best arm identification in the fixed confidence setting called MD-LUCB, or Minimax Disappointment LUCB. We prove bounds on MD-LUCB’s sample complexity and then study its performance with empirical simulations.



Multi-armed bandits, Machine learning theory, Algorithmic Fairness