Langevin Monte Carlo

Langevin Thompson sampling with logarithmic communication: bandits and reinforcement learning

(ICML 2023) We study approximate Thompson sampling with Markov chain Monte Carlo in bandit and reinforcement learning settings, providing algorithms that achieve optimal performance with low computational and communication costs.

Nikki Lijing Kuang*, Siddharth Mitra*, Amin Karbasi, Yi-An Ma