Posterior vs Parameter Sparsity in Latent Variable Models
In this paper we explore the problem of biasing unsupervised models to favor sparsity. We extend the posterior regularization framework to encourage the model to achieve posterior sparsity on the unlabeled training data. We apply this new method to learn first-order HMMs for unsupervised part-of-speech (POS) tagging, and show that HMMs learned this way consistently and significantly outperform both EM-trained HMMs and HMMs with a sparsity-inducing Dirichlet prior trained by variational EM. We evaluate these HMMs on three languages (English, Bulgarian and Portuguese) under four conditions. We find that our method always improves performance with respect to both baselines, while variational Bayes actually degrades performance in most cases. We increase accuracy with respect to EM by 2.5% to 8.7% absolute, and we see improvements even in a semi-supervised condition where a limited dictionary is provided.