Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Enrique Nueve, Bo Waggoner, Dhamma Kimpara, Jessie Finocchiaro
In multiclass classification over n outcomes, we typically optimize some surrogate loss L : R^d × Y → R assigning real-valued error to predictions in R^d. In this paradigm, outcomes must be embedded into the reals with dimension d ≈ n in order to design a consistent surrogate loss. Consistent losses are well-motivated theoretically, yet for large n, such as in information retrieval and structured prediction tasks, their optimization may be computationally infeasible. In practice, outcomes are typically embedded into some R^d for d ≪ n, with little known about their suitability for multiclass classification. We investigate two approaches for trading off consistency and dimensionality in multiclass classification while using a convex surrogate loss. We first formalize partial consistency when the optimized surrogate has dimension d ≪ n. We then check whether partial consistency holds under a given embedding and low-noise assumption, providing insight into when to use a particular embedding into R^d. Finally, we present a new method to construct (fully) consistent losses with d ≪ n out of multiple problem instances. Our practical approach leverages parallelism to sidestep lower bounds on d.
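To make the low-dimensional embedding paradigm concrete, the following is a minimal illustrative sketch, not the paper's construction: each of n outcomes is assigned a fixed embedding in R^d with d ≪ n, a convex surrogate (here, a ridge-regularized squared loss against the embedded label, chosen purely for simplicity) is optimized over predictions in R^d, and predictions are decoded by nearest embedded outcome. The embedding `phi`, the `decode` link, and all sizes are hypothetical placeholders; whether such a pipeline is (partially) consistent is exactly what depends on the embedding and noise assumptions studied in the paper.

```python
# Minimal sketch (assumed setup, not the authors' method): embedding-based
# multiclass prediction with a convex surrogate in dimension d << n.
import numpy as np

rng = np.random.default_rng(0)

n_outcomes = 1000          # n: number of classes
d = 16                     # d << n: surrogate / embedding dimension
n_features, n_samples = 32, 5000

# Hypothetical fixed embedding phi : {0,...,n-1} -> R^d (random projection).
phi = rng.standard_normal((n_outcomes, d)) / np.sqrt(d)

# Synthetic data: features X and integer labels y.
X = rng.standard_normal((n_samples, n_features))
y = rng.integers(0, n_outcomes, size=n_samples)

# Convex surrogate: squared loss between a linear prediction u = W^T x in R^d
# and the embedded label phi[y]; ridge regularization gives a closed form.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ phi[y])

# Decoding ("link"): map a surrogate prediction in R^d back to an outcome
# by nearest embedded class.
def decode(u):
    return int(np.argmin(np.linalg.norm(phi - u, axis=1)))

preds = np.array([decode(u) for u in X @ W])
print("train accuracy:", np.mean(preds == y))
```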