Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Gunnar Rätsch, Bernhard Schölkopf, Alex Smola, Klaus-Robert Müller, Takashi Onoda, Sebastian Mika
AdaBoost and other ensemble methods have successfully been applied to a number of classification tasks, seemingly defying problems of overfitting. AdaBoost performs gradient descent in an error function with respect to the margin, asymptotically concentrating on the patterns which are hardest to learn. For very noisy problems, however, this can be disadvantageous. Indeed, theoretical analysis has shown that the margin distribution, as opposed to just the minimal margin, plays a crucial role in understanding this phenomenon. Loosely speaking, some outliers should be tolerated if this has the benefit of substantially increasing the margin on the remaining points. We propose a new boosting algorithm which allows for the possibility of a pre-specified fraction of points to lie in the margin area or even on the wrong side of the decision boundary.
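The reweighting mechanism the abstract refers to can be illustrated with a minimal AdaBoost sketch (not the paper's proposed algorithm). The toy dataset, the decision-stump weak learner, and all function names below are our own illustrative choices. After each round, misclassified points gain weight, which is why AdaBoost concentrates on the hardest patterns; the normalized margin y·f(x)/Σα is the quantity whose distribution the paper analyzes.

```python
# Illustrative vanilla AdaBoost with decision stumps on toy 1-D data.
# (Our own sketch; the paper's new algorithm modifies this scheme to
# tolerate a pre-specified fraction of margin violations.)
import math

# Toy separable dataset: 1-D features and +/-1 labels (our assumption).
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [-1, -1, -1, -1, -1, +1, +1, +1]

def stump_predict(threshold, sign, x):
    """Weak learner: predict `sign` if x > threshold, else -sign."""
    return sign if x > threshold else -sign

def best_stump(w):
    """Return (weighted_error, threshold, sign) of the best stump."""
    best = None
    for t in [xi - 0.5 for xi in X] + [X[-1] + 0.5]:
        for sign in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(t, sign, xi) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

w = [1.0 / len(X)] * len(X)          # uniform initial weights
ensemble = []                         # list of (alpha, threshold, sign)
for _ in range(10):
    err, t, sign = best_stump(w)
    err = max(err, 1e-10)             # avoid log(0) on separable data
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, t, sign))
    # Reweight: misclassified (hard) patterns gain weight -- the
    # mechanism by which AdaBoost concentrates on hard-to-learn points.
    w = [wi * math.exp(-alpha * yi * stump_predict(t, sign, xi))
         for xi, yi, wi in zip(X, y, w)]
    Z = sum(w)
    w = [wi / Z for wi in w]

def margin(x_i, y_i):
    """Normalized margin y * f(x) / sum(alpha), in [-1, 1]."""
    A = sum(a for a, _, _ in ensemble)
    f = sum(a * stump_predict(t, s, x_i) for a, t, s in ensemble)
    return y_i * f / A

margins = [margin(xi, yi) for xi, yi in zip(X, y)]
```

On this separable toy set every point ends with a positive normalized margin. With label noise, the same reweighting rule piles weight onto the mislabeled points; the paper's soft-margin variant instead lets a chosen fraction of points violate the margin.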