Rajat Raina, Yirong Shen, Andrew McCallum, Andrew Ng
Although discriminatively trained classiﬁers are usually more accurate when labeled training data is abundant, previous work has shown that when training data is limited, generative classiﬁers can out-perform them. This paper describes a hybrid model in which a high-dimensional subset of the parameters are trained to maximize generative likelihood, and another, small, subset of parameters are discriminatively trained to maximize conditional likelihood. We give a sample complexity bound showing that in order to ﬁt the discriminative parameters well, the num- ber of training examples required depends only on the logarithm of the number of feature occurrences and feature set size. Experimental results show that hybrid models can provide lower test error and can produce better accuracy/coverage curves than either their purely generative or purely discriminative counterparts. We also discuss several advantages of hybrid models, and advocate further work in this area.