Slav Petrov, Dan Klein
We demonstrate that log-linear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure which allows feature expectations to be efficiently approximated in gradient-based optimization. We compare L1 and L2 regularization and show that L1 regularization is superior, requiring fewer iterations to converge and yielding sparser solutions. On full-scale treebank parsing experiments, the discriminative latent models outperform both the comparable generative latent models and the discriminative non-latent baselines.
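To see why feature expectations dominate the cost of training, consider the following sketch of a regularized conditional log-likelihood objective for a latent log-linear grammar. The notation here is assumed for illustration and may differ from the paper's. For each training sentence $w_i$ with observed treebank tree $T_i$, $t \to T_i$ ranges over the latent-annotated derivations that project to $T_i$, $t'$ ranges over all derivations of $w_i$, $f(w, t)$ is a feature vector, $\theta$ is the weight vector, and $\lambda > 0$ is a hypothetical regularization strength:

$$
\mathcal{L}(\theta) = \sum_i \log \frac{\sum_{t \to T_i} \exp\big(\theta^\top f(w_i, t)\big)}{\sum_{t'} \exp\big(\theta^\top f(w_i, t')\big)} \;-\; \lambda \lVert \theta \rVert_1
$$

$$
\nabla_\theta \mathcal{L} = \sum_i \Big( \mathbb{E}_\theta\big[f \mid w_i, T_i\big] - \mathbb{E}_\theta\big[f \mid w_i\big] \Big) \;-\; \lambda s, \qquad s \in \partial \lVert \theta \rVert_1
$$

Here $s_j = \operatorname{sign}(\theta_j)$ when $\theta_j \neq 0$ and $s_j \in [-1, 1]$ otherwise. The first expectation is restricted to latent refinements of the gold tree, while the second ranges over every latent parse of $w_i$; the second term is the expensive one, and it is the natural target for approximation by pruning. Under L2 regularization the penalty would instead be $\tfrac{\lambda}{2}\lVert \theta \rVert_2^2$, contributing $-\lambda\theta$ to the gradient; the nondifferentiability of the L1 penalty at zero is what allows weights to be driven exactly to zero, consistent with the sparser solutions reported above.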