John Platt, Nello Cristianini, John Shawe-Taylor
We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N -class problem, the DDAG con(cid:173) tains N(N - 1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the re(cid:173) sulting bound on the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision-node of the DDAG. The DAGSVM is substantially faster to train and evalu(cid:173) ate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms.