Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang
Fine-tuning pre-trained deep neural networks (DNNs) on a target dataset, also known as transfer learning, is widely used in computer vision and NLP. Because task-specific layers mainly contain categorical information and categories vary across datasets, practitioners only \textit{partially} transfer pre-trained models by discarding the task-specific layers and fine-tuning the bottom layers. However, simply discarding the task-specific parameters, which account for up to $20\%$ of the total parameters in pre-trained models, is wasteful. To \textit{fully} transfer pre-trained models, we propose a two-step framework named \textbf{Co-Tuning}: (i) learn the relationship between source categories and target categories from the pre-trained model and calibrated predictions; (ii) target labels (one-hot labels), together with source labels (probabilistic labels) translated via the category relationship, collaboratively supervise the fine-tuning process. A simple instantiation of the framework shows strong empirical results on four visual classification tasks and one NLP classification task, bringing up to $20\%$ relative improvement. While state-of-the-art fine-tuning techniques mainly focus on how to impose regularization when data are not abundant, Co-Tuning works not only on medium-scale datasets (100 samples per class) but also on large-scale datasets (1000 samples per class), where regularization-based methods bring no gains over vanilla fine-tuning. Co-Tuning relies only on the typically valid assumption that the pre-training dataset is diverse enough, implying its broad applicability.
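To make the collaborative supervision in step (ii) concrete, below is a minimal PyTorch sketch of such a combined objective: hard cross-entropy on the target labels plus soft cross-entropy on source labels obtained by translating each target label through a category-relationship matrix. The names (`CoTuningLoss`, `relationship`, `trade_off`) and the equal-weight default are illustrative assumptions, not the paper's exact implementation; the relationship matrix $p(y_s \mid y_t)$ is assumed to have been estimated beforehand from calibrated source predictions as in step (i).

```python
# Illustrative sketch only: a combined hard/soft supervision loss in the spirit of Co-Tuning.
# `relationship`, `trade_off`, and the class name are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoTuningLoss(nn.Module):
    """Supervise fine-tuning with one-hot target labels and translated source labels."""

    def __init__(self, relationship: torch.Tensor, trade_off: float = 1.0):
        super().__init__()
        # relationship[t] approximates p(y_source | y_target = t),
        # shape: (num_target_classes, num_source_classes), rows sum to 1.
        self.register_buffer("relationship", relationship)
        self.trade_off = trade_off  # assumed weight balancing the two terms

    def forward(self, target_logits, source_logits, target_labels):
        # Hard cross-entropy on the target head with one-hot target labels.
        ce_target = F.cross_entropy(target_logits, target_labels)
        # Translate each target label into a probabilistic source label.
        soft_source = self.relationship[target_labels]        # (batch, num_source_classes)
        # Soft cross-entropy on the retained source head.
        log_probs = F.log_softmax(source_logits, dim=1)
        ce_source = -(soft_source * log_probs).sum(dim=1).mean()
        return ce_target + self.trade_off * ce_source
```

In this sketch the pre-trained task-specific (source) head is kept alongside a newly added target head, and both heads share the fine-tuned backbone; the extra soft term is what lets the otherwise discarded source parameters contribute supervision during fine-tuning.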