Learning Task-Agnostic Representations through Multi-Teacher Distillation

Philippe Formont, Maxime Darrin, Banafsheh Karimian, Eric Granger, Jackie CK Cheung, Ismail Ayed, Mohammadhadi Shateri, Pablo Piantanida

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. We introduce a task-agnostic framework based on a ``majority vote" objective function. We demonstrate that this function is bounded by the mutual information between the student and the teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Comprehensive evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.