Efficient Contextual Bandits with Continuous Actions

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

AuthorFeedback Bibtex MetaReview Paper Review Supplemental

Authors

Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

Abstract

We create a computationally tractable learning algorithm for contextual bandits with continuous actions having unknown structure. The new reduction-style algorithm composes with most supervised learning representations. We prove that this algorithm works in a general sense and verify the new functionality with large-scale experiments.