Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track


Authors

Yutong He, Xinmeng Huang, Kun Yuan

Abstract

Communication compression is a common technique in distributed optimization that can alleviate communication overhead by transmitting compressed gradients and model parameters. However, compression can introduce information distortion, which slows down convergence and incurs more communication rounds to achieve desired solutions. Given the trade-off between lower per-round communication costs and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, as well as the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with unbiased compressors. We demonstrate that unbiased compression alone does not necessarily save the total communication cost, but this outcome can be achieved if the compressors used by all workers are further assumed independent. We establish lower bounds on the communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions and show that these lower bounds are tight by refining the analysis for ADIANA. Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n,\kappa\}})$ when all local smoothness constants are constrained by a common upper bound, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.
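
The rand-k sparsifier is one standard example of an unbiased compressor of the kind the abstract refers to. The sketch below (not taken from the paper; the compressor choice and all names are illustrative assumptions) shows the defining unbiasedness property $\mathbb{E}[C(x)] = x$ and the use of independent randomness on each worker, which is the setting under which the paper establishes communication savings.

```python
import numpy as np

def rand_k_compress(x, k, rng):
    """Rand-k unbiased compressor: keep k uniformly chosen coordinates
    and rescale by d/k so that E[C(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

# Hypothetical setup: n workers each compress their local gradient with
# independent randomness (separate RNGs), i.e., independent unbiased compression.
d, k, n = 1000, 10, 8
rng = np.random.default_rng(0)
grads = [rng.standard_normal(d) for _ in range(n)]
compressed = [rand_k_compress(g, k, np.random.default_rng(i)) for i, g in enumerate(grads)]

# Empirical check of unbiasedness for one worker's gradient:
samples = np.stack([rand_k_compress(grads[0], k, np.random.default_rng(s)) for s in range(5000)])
print(np.abs(samples.mean(axis=0) - grads[0]).max())  # should be close to 0
```

Each call transmits only k of the d coordinates, which lowers per-round cost; whether this lowers the *total* communication cost, given the extra rounds the distortion induces, is the question the paper analyzes.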