Reviews: Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes

In this paper, the authors provide a method for incorporating observational data (possibly subject to unobserved confounding) to improve the performance of policy learning in online settings (the crucial theorems are 5, 7, and 8). After a period of discussion, the reviewers reached a consensus that this paper merits publication at NeurIPS and will contribute to the RL literature by giving a principled method for incorporating observational data, even when that data is confounded.

Paper ID:	7406
Title:	Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes