John Wright, Yangyu Tao, Zhouchen Lin, Yi Ma, Heung-yeung Shum
We present a simple new criterion for classiﬁcation, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the min- imum number of additional bits to code the test sample, subject to an allowable distortion. We prove asymptotic optimality of this criterion for Gaussian data and analyze its relationships to classical classiﬁers. Theoretical results provide new insights into relationships among popular classiﬁers such as MAP and RDA, as well as unsupervised clustering methods based on lossy compression . Mini- mizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small-sample setting. Compression also provides a uniform means of handling classes of varying dimension. This simple classi- ﬁcation criterion and its kernel and local versions perform competitively against existing classiﬁers on both synthetic examples and real imagery data such as hand- written digits and human faces, without requiring domain-speciﬁc information.