Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Tong Zhang
The purpose of this paper is to investigate infinity-sample properties of risk minimization based multi-category classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the infinity-sample consistency of classifiers obtained in the risk minimization framework. Examples are provided for two specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information will be useful for statistical inferencing tasks beyond classification.
1 Motivation

Consider a binary classification problem where we want to predict a label $y \in \{\pm 1\}$ based on an observation $x$. One of the most significant achievements for binary classification in machine learning is the invention of large margin methods, which include support vector machines and boosting algorithms. Based on a set of observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, a large margin classification algorithm produces a decision function $\hat{f}_n$ by empirically minimizing a loss function that is often a convex upper bound of the binary classification error function. Given $\hat{f}_n$, the binary decision rule is to predict $y = 1$ if $\hat{f}_n(x) \ge 0$, and to predict $y = -1$ otherwise (the decision rule at $\hat{f}_n(x) = 0$ is not important). In the literature, the following form of large margin binary classification is often encountered: we minimize the empirical risk associated with a convex function $\phi$ in a pre-chosen function class $C_n$:
$$\hat{f}_n = \arg\min_{f \in C_n} \frac{1}{n} \sum_{i=1}^{n} \phi\bigl(f(X_i)\, Y_i\bigr)$$
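To make this formulation concrete, the following is a minimal sketch (not code from the paper) of the empirical $\phi$-risk minimization above, assuming $\phi$ is the logistic loss $\phi(v) = \log(1 + e^{-v})$, the function class $C_n$ consists of linear functions $f(x) = w \cdot x + b$, and the minimization is done by plain gradient descent; the function names (`fit_linear_margin`, `predict`) are illustrative, not from the paper.

```python
# Sketch: empirical phi-risk minimization for binary large margin
# classification, with phi = logistic loss and a linear function class.
# These choices are assumptions for illustration only.
import numpy as np

def phi_logistic(v):
    # phi(v) = log(1 + exp(-v)), a convex upper bound (up to scaling)
    # on the 0-1 loss; computed stably via logaddexp.
    return np.logaddexp(0.0, -v)

def fit_linear_margin(X, y, lr=0.1, n_iters=500):
    """Minimize (1/n) sum_i phi(f(X_i) Y_i) over f(x) = w.x + b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)            # f(X_i) Y_i
        # d phi / d margin = -1 / (1 + exp(margin))
        g = -1.0 / (1.0 + np.exp(margins))
        grad_w = (X * (g * y)[:, None]).mean(axis=0)
        grad_b = (g * y).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Decision rule from the text: y = 1 if f(x) >= 0, else y = -1.
    return np.where(X @ w + b >= 0.0, 1, -1)

# Toy usage on linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] >= 0, 1, -1)
w, b = fit_linear_margin(X, y)
print("training accuracy:", (predict(X, w, b) == y).mean())
```

Any convex upper bound on the 0-1 loss (hinge, exponential, logistic) could be substituted for `phi_logistic` here; the choice of $\phi$ and of $C_n$ is exactly what distinguishes the different large margin methods the paper refers to.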