With reviewer scores of (9, 7, 6), it seems extremely likely that this paper will be accepted. I generally agree with the reviewers that the approach is novel and clever, has the nice property of being behavior-agnostic (it does not require knowledge of the behavior policy that collected the off-policy data), and takes a duality-based approach to constructing confidence bounds.