Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

*Omid Sadeghi, Prasanna Raut, Maryam Fazel*

In this paper, we consider an online optimization problem in which the reward functions are DR-submodular, and in addition to maximizing the total reward, the sequence of decisions must satisfy some convex constraints on average. Specifically, at each round $t\in\{1,\dots,T\}$, upon committing to an action $x_t$, a DR-submodular utility function $f_t(\cdot)$ and a convex constraint function $g_t(\cdot)$ are revealed, and the goal is to maximize the overall utility while ensuring the average of the constraint functions $\frac{1}{T}\sum_{t=1}^T g_t(x_t)$ is non-positive. Such cumulative constraints arise naturally in applications where the average resource consumption is required to remain below a prespecified threshold. We study this problem under an adversarial model and a stochastic model for the convex constraints, where the functions $g_t$ can vary arbitrarily or according to an i.i.d. process over time slots $t\in\{1,\dots,T\}$, respectively. We propose a single algorithm which achieves sub-linear (with respect to $T$) regret as well as sub-linear constraint violation bounds in both settings, without prior knowledge of the regime. Prior works have studied this problem in the special case of linear constraint functions. Our results not only improve upon the existing bounds under linear cumulative constraints, but also give the first sub-linear bounds for general convex long-term constraints.

Do not remove: This comment is monitored to verify that the site is working properly