Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Yanbo Fan, Siwei Lyu, Yiming Ying, Baogang Hu
In this work, we introduce the average top-$k$ (\atk) loss as a new ensemble loss for supervised learning. The \atk loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the \atk loss combines the advantages of them and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the \atk loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The \atk loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based method. We further study the Statistical Learning Theory of \matk by establishing its classification calibration and statistical consistency of \matk which provide useful insights on the practical choice of the parameter $k$. We demonstrate the applicability of \matk learning combined with different individual loss functions for binary and multi-class classification and regression using synthetic and real datasets.