Title: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

This paper was on the borderline and generated significant discussion. The meta-reviewer ended up reading the paper in detail and decided to recommend acceptance. Please read the reviewers' comments carefully when revising your paper. P.S. The meta-reviewer consulted with the authors of POEM regarding previously observed discrepancies in its empirical performance (e.g., in Ma et al.) and confirmed that POEM can become unstable when learning from datasets whose propensity scores span a very wide range.