PAC Generalization Bounds for Co-training

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Authors

Sanjoy Dasgupta, Michael Littman, David McAllester

Abstract

The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences (partial rules and partial labeling of the unlabeled data) and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of k ≥ 2 labels.
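
The agreement-based objective mentioned above scores a pair of view-specific rules by how often they agree on unlabeled data. As a minimal illustrative sketch (not the paper's algorithm or bound), the snippet below estimates the empirical disagreement rate between two hypothetical rules h1 and h2, each operating on its own view of the instances; the function name, the stand-in rules, and the toy data are all assumptions for illustration.

```python
# Illustrative sketch only: estimating the empirical disagreement rate
# between two view-specific hypotheses on unlabeled data, the quantity
# an agreement-based objective seeks to drive down.
import numpy as np

def disagreement_rate(h1, h2, view1, view2):
    """Fraction of unlabeled instances on which the two view-specific
    rules assign different labels. h1 sees only view 1, h2 only view 2."""
    pred1 = np.asarray([h1(x) for x in view1])
    pred2 = np.asarray([h2(x) for x in view2])
    return float(np.mean(pred1 != pred2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view1 = rng.normal(size=(100, 3))   # first view of each unlabeled instance
    view2 = rng.normal(size=(100, 3))   # second view of the same instances
    h1 = lambda x: int(x[0] > 0)        # stand-in rule on view one
    h2 = lambda x: int(x[1] > 0)        # stand-in rule on view two
    print(f"empirical disagreement: {disagreement_rate(h1, h2, view1, view2):.2f}")
```

Roughly, the paper's result says that under suitable independence-style assumptions on the two views, a low disagreement rate on unlabeled data constrains the generalization error of each rule; the PAC-style bound makes this precise.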