NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 1927
Title: Thompson Sampling with Information Relaxation Penalties

The paper makes a conceptual algorithmic contribution to sequential decision making under uncertainty. It proposes what appears to be a new way of viewing sampling-based multi-armed bandit algorithms such as Thompson sampling, and unifies them with a range of other methods extending all the way to the (intractable) Bayes-optimal sequential bandit algorithm. All reviewers agreed on the conceptual value that this new viewpoint brings to algorithm design and analysis, although they could not reach a clear consensus. Nevertheless, in the discussion following the author feedback, a majority of the reviewers championed the paper's contribution, which justifies accepting the paper for presentation.