Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)


Authors

Ryan McDonald, Mehryar Mohri, Nathan Silberman, Dan Walker, Gideon Mann

Abstract

Training conditional maximum entropy models on massive data requires significant time and computational resources. In this paper, we investigate three common distributed training strategies: distributed gradient, majority voting ensembles, and parameter mixtures. We analyze the worst-case runtime and resource costs of each and present a theoretical foundation for the convergence of parameters under parameter mixtures, the most efficient strategy. We present large-scale experiments comparing the different strategies and demonstrate that parameter mixtures over independent models use fewer resources and achieve loss comparable to that of standard approaches.
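
The parameter-mixture strategy highlighted in the abstract trains independent models on disjoint shards of the data and combines them by averaging (or weighting) their parameter vectors, so only the trained parameters need to be communicated. The sketch below is an illustrative toy version of this idea and not the authors' implementation: it fits a multinomial logistic regression (conditional maximum entropy) model on each shard with full-batch gradient descent and returns a uniform mixture of the per-shard weight matrices. The function names, learning rate, regularization strength, and toy data are assumptions made for the example.

```python
# Illustrative sketch (not the paper's implementation) of parameter mixtures
# for conditional maximum entropy models: train one model per data shard,
# then average the resulting parameter vectors.
import numpy as np

def train_maxent(X, y, num_classes, lr=0.1, epochs=50, l2=1e-3):
    """Train a single maxent model on one shard by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((num_classes, d))
    for _ in range(epochs):
        scores = X @ W.T                            # (n, num_classes)
        scores -= scores.max(axis=1, keepdims=True) # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(n), y] -= 1.0               # gradient of the negative log-likelihood
        grad = probs.T @ X / n + l2 * W
        W -= lr * grad
    return W

def parameter_mixture(shards, num_classes, weights=None):
    """Train independent models on each shard and return the mixed (averaged) parameters."""
    models = [train_maxent(X, y, num_classes) for X, y in shards]
    if weights is None:
        weights = np.full(len(models), 1.0 / len(models))
    return sum(w * W for w, W in zip(weights, models))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data split into 4 shards; in a real distributed setting each shard
    # would live on a separate machine.
    X = rng.normal(size=(4000, 20))
    true_W = rng.normal(size=(3, 20))
    y = (X @ true_W.T).argmax(axis=1)
    shards = [(X[i::4], y[i::4]) for i in range(4)]
    W_mix = parameter_mixture(shards, num_classes=3)
    acc = ((X @ W_mix.T).argmax(axis=1) == y).mean()
    print(f"accuracy of parameter-mixture model: {acc:.3f}")
```

In this sketch each shard is trained with no communication at all, which is what makes the strategy cheap relative to distributed-gradient training, where gradients must be exchanged at every update.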