Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

*Corinna Cortes, Mehryar Mohri, Javier Gonzalvo, Dmitry Storcheus*

Most machine learning tasks are inherently multi-objective: the learner must find a model that performs well across several base objectives $\mathcal{L}_{1}, \ldots, \mathcal{L}_{p}$ rather than a single one. Because optimizing with respect to multiple objectives at the same time is often computationally expensive, the base objectives are commonly combined into an ensemble $\sum_{k=1}^{p}\lambda_{k}\mathcal{L}_{k}$, which reduces the problem to scalar optimization. The mixture weights $\lambda_{k}$ are typically set to the uniform distribution or to some other fixed distribution chosen according to the learner's preferences. We argue that learning with a fixed distribution over the mixture weights risks overfitting to some individual objectives and significantly harming others, even when performance on the ensemble as a whole is good. Moreover, in practice the learner's true preferences across multiple objectives are often unknown or hard to express as a specific distribution. Instead, we propose a new framework, \emph{Agnostic Learning with Multiple Objectives} (ALMO), in which a model is optimized for \emph{any} weights in the mixture of base objectives. We present data-dependent Rademacher complexity guarantees for learning in the ALMO framework, which are used to guide a scalable optimization algorithm and the corresponding regularization. We present convergence guarantees for this algorithm, assuming convexity of the loss functions and of the underlying hypothesis space. We further implement the algorithm in a popular symbolic gradient computation framework and empirically demonstrate, on a number of datasets, the benefits of the ALMO framework over learning with a fixed distribution of mixture weights.
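As a rough illustration of the contrast drawn above, optimizing for any mixture weights can be read as a min-max problem: minimize over the model the worst-case value of $\sum_{k}\lambda_{k}\mathcal{L}_{k}$ over weights $\lambda$ in the simplex. The sketch below is not the paper's algorithm. It assumes a one-parameter model, two hypothetical quadratic objectives, gradient descent on the model, an exponentiated-gradient ascent step on the weights, and hand-picked step sizes, all chosen only for illustration.

```python
import math

# Hypothetical toy objectives in one parameter w (illustration only).
def losses(w):
    return [(w - 1.0) ** 2, 4.0 * (w + 1.0) ** 2]

def grads(w):
    return [2.0 * (w - 1.0), 8.0 * (w + 1.0)]

def fit_fixed(lam, steps=5000, lr=0.01):
    """Gradient descent on the fixed mixture sum_k lam[k] * L_k(w)."""
    w = 0.0
    for _ in range(steps):
        g = sum(l * gk for l, gk in zip(lam, grads(w)))
        w -= lr * g
    return w

def fit_agnostic(steps=5000, lr_w=0.01, lr_lam=0.01):
    """Descent on w, exponentiated-gradient ascent on lam (stays on the simplex).

    Returns the running average of w and the final mixture weights.
    """
    w = 0.0
    lam = [0.5, 0.5]
    w_avg = 0.0
    for t in range(1, steps + 1):
        current_losses = losses(w)
        g = sum(l * gk for l, gk in zip(lam, grads(w)))
        # Model step: decrease the currently weighted loss.
        w -= lr_w * g
        # Weight step: shift mass toward the objectives with larger loss.
        unnorm = [l * math.exp(lr_lam * v) for l, v in zip(lam, current_losses)]
        total = sum(unnorm)
        lam = [u / total for u in unnorm]
        # Running average of the model iterates.
        w_avg += (w - w_avg) / t
    return w_avg, lam

if __name__ == "__main__":
    w_uniform = fit_fixed([0.5, 0.5])
    w_agnostic, lam = fit_agnostic()
    print("uniform weights: w =", w_uniform, "losses =", losses(w_uniform))
    print("agnostic:        w =", w_agnostic, "losses =", losses(w_agnostic))
    print("final mixture weights:", lam)
```

In this toy setting the uniform mixture settles on a parameter that favors the more steeply weighted objective, while the min-max variant moves toward the point where the two losses are balanced, so its worst single-objective loss is smaller.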
