Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)

*Tong Zhang*

The purpose of this paper is to investigate infinity-sample properties of risk minimization based multi-category classification methods. These methods can be considered natural extensions of binary large margin classification. We establish conditions that guarantee the infinity-sample consistency of classifiers obtained in the risk minimization framework. Examples are provided for two specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information is useful for statistical inference tasks beyond classification.

1 Motivation

Consider a binary classification problem where we want to predict the label y ∈ {±1} based on an observation x. One of the most significant achievements for binary classification in machine learning is the invention of large margin methods, which include support vector machines and boosting algorithms. Based on a set of observations (X1, Y1), . . . , (Xn, Yn), a large margin classification algorithm produces a decision function f̂n by empirically minimizing a loss function that is often a convex upper bound of the binary classification error function. Given f̂n, the binary decision rule is to predict y = 1 if f̂n(x) ≥ 0, and to predict y = −1 otherwise (the decision rule at f̂n(x) = 0 is not important). In the literature, the following form of large margin binary classification is often encountered: we minimize the empirical risk associated with a convex function φ in a pre-chosen function class Cn:

$$\hat f_n = \arg\min_{f \in C_n} \frac{1}{n} \sum_{i=1}^n \phi\big(f(X_i) Y_i\big).$$
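As a concrete illustration of this kind of empirical risk minimization, the sketch below fits a linear classifier f(x) = w·x by gradient descent on the logistic surrogate φ(v) = log(1 + exp(−v)), a convex upper bound on the 0-1 loss, and then applies the sign decision rule. This is a minimal sketch under assumed choices (logistic φ, linear function class, plain gradient descent); the function names and hyperparameters are illustrative, not from the paper.

```python
import math
import random

def fit_linear_classifier(X, y, lr=0.1, steps=500):
    """Gradient descent on the empirical risk (1/n) sum_i phi(y_i * f(x_i)),
    with phi(v) = log(1 + exp(-v)) and f(x) = w . x (illustrative choices)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            g = -1.0 / (1.0 + math.exp(margin))  # phi'(margin)
            for j in range(d):
                grad[j] += g * yi * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if x0 + x1 > 0 else -1 for x0, x1 in X]  # noiseless linear labels
w = fit_linear_classifier(X, y)
# Decision rule from the text: predict y = 1 when f(x) >= 0, else y = -1.
pred = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
print(sum(p == t for p, t in zip(pred, y)) / len(y))
```

Because φ upper-bounds the 0-1 loss, driving the empirical φ-risk down also controls the training classification error; the consistency question studied in the paper is whether this carries over to the population risk as n → ∞.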
