Bandit algorithms with graphical feedback models and privacy awareness




Hu, Bingshan



This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits.

Unlike the basic MAB setting, where the learning algorithm receives only a single observation per round, in a bandit problem under a graphical feedback model the learning algorithm may receive more than one observation each time it interacts with the environment. Meanwhile, as in the basic MAB setting, the learning algorithm incurs regret only from the pulled arm when it is not the optimal one. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models.

In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, this kind of non-private learning algorithm may “leak” sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true information it has learnt when making future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the algorithm's output, the adversary can hardly infer any information associated with a single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting.
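To illustrate the graphical feedback model described above, the following is a minimal Python sketch of one interaction round: pulling an arm reveals rewards for the arm and its neighbours in a feedback graph, while regret is charged only for the pulled arm. The graph, the Bernoulli reward model, and all names here are illustrative assumptions, not the thesis's algorithms.

```python
import random

def play_graphical_bandit(means, neighbors, arm, rng=random):
    """One round under a graphical feedback model.

    means     : list of Bernoulli reward means, one per arm
    neighbors : dict mapping each arm to the arms it reveals when pulled
    arm       : the arm pulled this round

    Returns the observed rewards (pulled arm plus its neighbours)
    and the instantaneous regret, which depends only on the pulled arm.
    """
    observations = {}
    for a in {arm} | set(neighbors[arm]):
        # Bernoulli reward with mean means[a]
        observations[a] = 1 if rng.random() < means[a] else 0
    regret = max(means) - means[arm]
    return observations, regret
```

For example, with a fully connected feedback graph over two arms, pulling either arm reveals the rewards of both, yet only the suboptimality gap of the pulled arm contributes to regret.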
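The privacy guarantee sketched above is commonly achieved with output perturbation. As a hedged illustration (a standard Laplace-mechanism sketch, not the algorithms developed in this thesis), the snippet below privatizes an arm's empirical mean by adding Laplace noise to the sum of its rewards before averaging:

```python
import math
import random

def private_mean(rewards, epsilon, rng=random):
    """Differentially private estimate of an arm's mean reward.

    rewards : rewards observed for the arm, each in [0, 1]
    epsilon : privacy parameter; smaller epsilon means stronger privacy

    Adds Laplace(sensitivity / epsilon) noise to the sum of rewards,
    so changing one individual's reward changes the sum by at most 1.
    """
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) via the inverse-CDF method
    u = rng.random() - 0.5
    noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return (sum(rewards) + noise) / len(rewards)
```

A bandit algorithm that ranks arms by such noisy estimates, rather than the true empirical means, cannot reveal much about any single individual's reward: its decisions are nearly unchanged when one individual's data is replaced.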