Multi-task bandit and reinforcement learning through heterogeneous feedback aggregation

Abstract:

In many real-world applications, multiple learning agents seek to learn how to perform highly related yet slightly different tasks in an online learning protocol. For example, in healthcare robotics, robots are paired with people with dementia to perform personalized cognitive training activities by learning their preferences (e.g., Kubota et al., 2020). We formulate this problem as the ε-multi-task multi-armed bandit / reinforcement learning problem, in which a set of players interact with a set of environments and, for each state and action, the reward distributions and transition probabilities across all tasks have dissimilarity at most ε. We design both Upper Confidence Bound (UCB)-based and Thompson Sampling (TS)-based algorithms in this setting that provide robust knowledge transfer across tasks, and we prove that they enjoy near-optimal regret guarantees. Empirically, we evaluate our algorithms in the multi-task bandit learning setting, demonstrating the benefit of cross-task knowledge transfer.
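As a rough illustration of the robust knowledge transfer described above, here is a minimal Python sketch of an ε-aware UCB index that pools reward samples across tasks. The data layout (per-task pull counts n and empirical means mu), the exact bonus form, and the min-based combination are assumptions made for illustration; this is not the exact algorithm from the papers linked below.

```python
import numpy as np

def ucb_index(task, arm, n, mu, eps, t):
    """Optimistic index for `arm` in `task` at round `t` (a hypothetical sketch).

    n[m][a]  -- number of times task m has pulled arm a
    mu[m][a] -- empirical mean reward of arm a in task m
    eps      -- assumed bound on cross-task mean-reward dissimilarity
    """
    M = len(n)
    # Individual estimate: uses only this task's own samples.
    n_own = n[task][arm]
    ind = np.inf if n_own == 0 else mu[task][arm] + np.sqrt(2 * np.log(t) / n_own)
    # Aggregate estimate: pools every task's samples for this arm, paying an
    # extra eps in the bonus since other tasks' means may differ by up to eps.
    n_all = sum(n[m][arm] for m in range(M))
    if n_all == 0:
        return np.inf
    mu_all = sum(n[m][arm] * mu[m][arm] for m in range(M)) / n_all
    agg = mu_all + np.sqrt(2 * np.log(t) / n_all) + eps
    # Robust aggregation: trust whichever confidence bound is tighter, so
    # pooling helps when eps is small and costs little when it is large.
    return min(ind, agg)

# Each player then pulls the arm maximizing its index, e.g.:
# a_t = max(range(K), key=lambda a: ucb_index(m, a, n, mu, eps, t))
```

The intuition: when ε is small, the aggregate bound dominates early on (more samples, smaller confidence width), which is where the regret savings from transfer come from; when ε is large, the individual bound takes over and each player falls back to single-task UCB.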

The work is based on the following papers:
https://arxiv.org/abs/2010.15390
https://arxiv.org/abs/2107.08622
https://arxiv.org/abs/2206.08556

Brief bio: Chicheng Zhang has been an assistant professor in the Department of Computer Science at the University of Arizona since 2019. He obtained his PhD in 2017 from the University of California, San Diego, and was a postdoctoral researcher at Microsoft Research from 2017 to 2019. His research interests lie in the theory and applications of interactive machine learning, including active learning, contextual bandits, reinforcement learning, imitation learning, and beyond. His work has appeared in many top machine learning venues, such as ICML, COLT, and NeurIPS, and one of his papers won an Outstanding Paper Runner-Up Award at ICML 2022.