Classification via Minimum Incremental Coding Length (MICL)

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

John Wright, Yangyu Tao, Zhouchen Lin, Yi Ma, Heung-yeung Shum


We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We prove asymptotic optimality of this criterion for Gaussian data and analyze its relationships to classical classifiers. Theoretical results provide new insights into relationships among popular classifiers such as MAP and RDA, as well as unsupervised clustering methods based on lossy compression [13]. Minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small-sample setting. Compression also provides a uniform means of handling classes of varying dimension. This simple classification criterion and its kernel and local versions perform competitively against existing classifiers on both synthetic examples and real imagery data such as handwritten digits and human faces, without requiring domain-specific information.
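The criterion described above can be sketched in a few lines of NumPy. This is a simplified, illustrative version only: it uses the zero-mean lossy coding length function of the form used in [13], L(X) = ((m+d)/2) log2 det(I + d/(eps^2 m) X X^T) for d-dimensional data X with m samples, omits the class-label (prior) term from the full criterion, and assumes equal class priors. The function names `coding_length` and `micl_classify` are hypothetical, not from the paper's code.

```python
import numpy as np

def coding_length(X, eps=1.0):
    """Approximate bits needed to code the columns of X (d x m) up to
    distortion eps, via (m+d)/2 * log2 det(I + d/(eps^2 m) X X^T)."""
    d, m = X.shape
    # slogdet is used for numerical stability; convert natural log to log2.
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (eps**2 * m)) * (X @ X.T))
    return (m + d) / 2.0 * logdet / np.log(2.0)

def micl_classify(x, classes, eps=1.0):
    """Assign test sample x to the class whose training set needs the
    fewest additional bits to also code x (equal priors assumed)."""
    best_label, best_delta = None, np.inf
    for label, X in classes.items():
        X_aug = np.hstack([X, x.reshape(-1, 1)])
        # Incremental coding length: bits to code the class with x included,
        # minus bits to code the class alone.
        delta = coding_length(X_aug, eps) - coding_length(X, eps)
        if delta < best_delta:
            best_label, best_delta = label, delta
    return best_label
```

For example, with two well-separated Gaussian clouds as the training classes, a test point drawn near one cloud's mean incurs a much smaller incremental coding length for that class and is assigned to it.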