Reviews: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

The paper proposes and theoretically analyzes a distributed SGD algorithm where the workers are pulled towards the best performing worker rather than the average worker. All three reviewers consider the theoretical contribution (analysis of convergence and cost of communication) to be interesting and rigorous. At the same time, one reviewer feels the theoretical analysis applies to a simplified case and may not shed light on the experiments that are done in more complex settings. The reviewers were not satisfied by the rebuttal, but maintained that the paper is publishable. Overall, there is a consensus that this is a fine paper and I recommend acceptance.

Paper ID:	5205
Title:	Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models