Optimizing Classifiers for Imbalanced Training Sets

Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)


Grigoris Karakoulas, John Shawe-Taylor


Following recent results [9, 8] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates the implications of these results for the case of imbalanced datasets and develops two approaches to setting the threshold. The approaches are incorporated into ThetaBoost, a boosting algorithm for dealing with unequal loss functions. The performance of ThetaBoost and of the two approaches is tested experimentally.

Keywords: Computational Learning Theory, Generalization, fat-shattering, large margin, PAC estimates, unequal loss, imbalanced datasets