Omid Madani, David Pennock, Gary Flake
In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the "variance of prediction error", or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross validation overestimates error.
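The disagreement measure described above can be illustrated with a minimal Python sketch. It assumes the two models have already been trained on independent training sets and that their 0/1 predictions on the same unlabeled pool are available. The function names, variable names, and toy data are hypothetical and are not taken from the paper. The final check illustrates only an elementary fact: in binary classification, whenever two models disagree on an instance, exactly one of them is wrong, so half the disagreement rate cannot exceed the models' average error. It is not meant to reproduce the exact bounds stated above.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of instances on which two models predict different labels.

    No true labels are needed, so this can be computed on unlabeled data.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if len(preds_a) == 0:
        raise ValueError("need at least one instance")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def error_rate(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of instances misclassified (requires true labels)."""
    if len(preds) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: labels are shown only to check the bound;
# the disagreement rate itself never looks at them.
labels = [0, 1, 1, 0, 1, 0, 0, 1]
preds_a = [0, 1, 0, 0, 1, 0, 1, 1]  # predictions of model A
preds_b = [0, 1, 1, 0, 0, 0, 1, 1]  # predictions of model B

d = disagreement_rate(preds_a, preds_b)
mean_error = (error_rate(preds_a, labels) + error_rate(preds_b, labels)) / 2

# Each disagreement means exactly one model is wrong on that instance,
# so d <= error_a + error_b, i.e. d / 2 <= mean_error.
assert d / 2 <= mean_error
print(f"disagreement = {d:.3f}, mean error = {mean_error:.3f}")
```

In practice the labeled check at the end would not be available; only `disagreement_rate` would be evaluated on the unlabeled pool, and its value would be read as a label-free signal about error.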