Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation

Chenyang Jiang, Hang Zhao, Xinyu Zhang, Zhengcen Li, Qiben Shan, Shaocong Wu, Jingyong Su

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Dataset distillation compresses large-scale datasets into compact, highly informative synthetic data, significantly reducing storage and training costs. However, existing research focuses primarily on balanced datasets and struggles under the long-tailed distributions common in real-world data. In this work, we emphasize the critical role of soft labels in long-tailed dataset distillation and uncover the underlying mechanisms contributing to performance degradation. Specifically, we derive an imbalance-aware generalization bound for models trained on distilled datasets. By systematically perturbing the level of data imbalance, we then identify two primary sources of soft-label bias: one originating from the distillation model and one from the distilled images. To address these entangled biases, we propose ADSA, an Adaptive Soft-label Alignment module that calibrates them. This lightweight module integrates seamlessly into existing distillation pipelines and consistently improves performance. On ImageNet-1k-LT with EDC and IPC=50, ADSA improves tail-class accuracy by up to 11.8% and raises overall accuracy to 41.4%. Extensive experiments demonstrate that ADSA provides a robust and generalizable solution under limited label budgets and across a range of distillation techniques.
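
The abstract does not detail ADSA's calibration rule, so the sketch below is only a generic illustration of what recalibrating teacher soft labels under class imbalance can look like; it uses a logit-adjustment-style correction (Menon et al., 2021), not the paper's actual method. The function name `align_soft_labels` and the parameters `class_counts` and `tau` are hypothetical.

```python
import torch
import torch.nn.functional as F

def align_soft_labels(teacher_logits: torch.Tensor,
                      class_counts: torch.Tensor,
                      tau: float = 1.0) -> torch.Tensor:
    """Recalibrate teacher soft labels by removing a class-frequency bias.

    Subtracts tau * log(prior) from the teacher logits before the softmax
    (the logit-adjustment idea), so that head classes no longer dominate
    the soft-label mass assigned to tail-class examples.

    Args:
        teacher_logits: (batch, num_classes) raw logits from the
            distillation (teacher) model.
        class_counts: (num_classes,) per-class sample counts of the
            long-tailed training set.
        tau: strength of the adjustment (a hypothetical tuning knob).
    """
    prior = class_counts.float() / class_counts.sum()
    adjusted = teacher_logits - tau * torch.log(prior + 1e-12)
    return F.softmax(adjusted, dim=-1)

# Toy usage: a 3-class long-tailed setting (1000 / 100 / 10 samples).
counts = torch.tensor([1000, 100, 10])
logits = torch.randn(4, 3)                 # stand-in teacher outputs
soft_labels = align_soft_labels(logits, counts)
print(soft_labels.sum(dim=-1))             # each row sums to 1
```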