Shifting, One-Inclusion Mistake Bounds and Tight Multiclass Expected Risk Bounds

Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)

Bibtex Metadata Paper


Benjamin Rubinstein, Peter Bartlett, J. Rubinstein


Under the prediction model of learning, a prediction strategy is presented with an i.i.d. sample of n - 1 points in X and corresponding labels from a concept f F , and aims to minimize the worst-case probability of erring on an nth point. By exploiting the structure of F , Haussler et al. achieved a VC(F )/n bound for the natural one-inclusion prediction strategy, improving on bounds implied by PAC-type results by a O(log n) factor. The key data structure in their result is the natural subgraph of the hypercube--the one-inclusion graph; the key step is a d = VC(F ) bound on one-inclunion graph density. The first main result of this s /n -1 paper is a density bound of n d-1 ( d ) < d, which positively resolves a conjecture of Kuzmin & Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved mistake bound for the randomized (deterministic) one-inclusion strategy for all d (for d (n)). The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is a k -class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This bound on expected risk improves on known PAC-based results by a factor of O(log n) and is shown to be optimal up to a O(log k ) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.