Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Michal Derezinski, Michael W. Mahoney
In distributed optimization and distributed numerical linear algebra,
we often encounter an inversion bias: if we want to compute a
quantity that depends on the inverse of a sum of distributed matrices,
then the sum of the inverses does not equal the inverse of the sum.
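As a concrete illustration (ours, not the paper's), the bias is visible already in the scalar case: averaging the inverses of 1 and 3 gives

\[
\frac{1}{2}\left(1^{-1} + 3^{-1}\right) = \frac{2}{3}
\;\neq\;
\left(\frac{1+3}{2}\right)^{-1} = \frac{1}{2}.
\]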
An example of this occurs in distributed Newton's method, where we
wish to compute (or implicitly work with) the inverse Hessian
multiplied by the gradient.
In this case, each locally computed estimate of the Newton step is
biased, so uniformly averaging them does not recover the exact step.
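To make the bias concrete, here is a minimal NumPy sketch (our illustration; the subsampling scheme and all variable names are assumptions, not the paper's setup): each machine forms a Newton step from a subsampled Hessian, and the uniform average of these steps stays bounded away from the exact step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 200            # data points, dimension, machines

X = rng.standard_normal((n, d))   # rows x_i; Hessian H = (1/n) X^T X
g = rng.standard_normal(d)        # a fixed gradient vector
H = X.T @ X / n
exact_step = np.linalg.solve(H, g)

# Each "machine" sees a uniform subsample of the data; its local
# Hessian estimate is unbiased for H, but its local Newton step
# H_i^{-1} g is a biased estimate of H^{-1} g.
k = 50                            # local subsample size (k > d)
local_steps = []
for _ in range(m):
    idx = rng.choice(n, size=k, replace=False)
    H_i = X[idx].T @ X[idx] / k
    local_steps.append(np.linalg.solve(H_i, g))

naive_avg = np.mean(local_steps, axis=0)
# This error does not vanish as m grows: the bias remains.
print(np.linalg.norm(naive_avg - exact_step))
```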
To address this, we propose determinantal averaging, a new
approach for correcting the inversion bias.
This approach involves reweighting the local estimates of the Newton
step in proportion to the determinants of the local Hessian estimates,
and then averaging them together to obtain an improved global
estimate. This method provides the first known distributed Newton step that is
asymptotically consistent, i.e., it recovers the exact step in
the limit as the number of distributed partitions grows to infinity.
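A minimal sketch of this reweighting, under the same toy setup as above (again our illustration; the paper's precise sampling model and conditions may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 2000, 5, 200, 50
X = rng.standard_normal((n, d))
g = rng.standard_normal(d)
exact_step = np.linalg.solve(X.T @ X / n, g)

num = np.zeros(d)   # running sum of det(H_i) * H_i^{-1} g
den = 0.0           # running sum of det(H_i)
for _ in range(m):
    idx = rng.choice(n, size=k, replace=False)
    H_i = X[idx].T @ X[idx] / k
    w_i = np.linalg.det(H_i)            # determinantal weight
    num += w_i * np.linalg.solve(H_i, g)
    den += w_i

det_avg = num / den                     # determinantal average of local steps
# Typically much smaller than the naive average's error above,
# and shrinking as m grows.
print(np.linalg.norm(det_avg - exact_step))
```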
To show this, we develop new expectation identities and moment bounds
for the determinant and adjugate of a random matrix.
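The determinant is the natural weight because of the standard adjugate identity $\mathrm{adj}(A) = \det(A)\,A^{-1}$: writing each weighted local step as an adjugate, the law of large numbers (for i.i.d. local estimates) gives

\[
\frac{\sum_{i=1}^m \det(\hat H_i)\, \hat H_i^{-1} g}{\sum_{i=1}^m \det(\hat H_i)}
= \frac{\frac{1}{m}\sum_{i=1}^m \mathrm{adj}(\hat H_i)\, g}{\frac{1}{m}\sum_{i=1}^m \det(\hat H_i)}
\;\longrightarrow\;
\frac{\mathbb{E}\big[\mathrm{adj}(\hat H)\big]}{\mathbb{E}\big[\det(\hat H)\big]}\, g
\quad\text{as } m \to \infty,
\]

and the expectation identities developed in the paper are what identify this limit with the exact step $H^{-1} g$.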
Determinantal averaging can be applied not only to Newton's method,
but to computing any quantity that is a linear transformation of a
matrix inverse, e.g., the trace of the inverse covariance matrix,
which is used in data uncertainty quantification.
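For example, a hedged sketch of the same weighting applied to the trace of the inverse covariance matrix (setup and names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, k = 5000, 4, 300, 60
# Mean-zero data with anisotropic covariance.
X = rng.standard_normal((n, d)) @ np.diag([1.0, 2.0, 3.0, 4.0])
exact = np.trace(np.linalg.inv(X.T @ X / n))

num, den = 0.0, 0.0
for _ in range(m):
    idx = rng.choice(n, size=k, replace=False)
    C_i = X[idx].T @ X[idx] / k            # local covariance estimate
    num += np.linalg.det(C_i) * np.trace(np.linalg.inv(C_i))
    den += np.linalg.det(C_i)

# The determinantal average tracks the exact trace of the inverse.
print(exact, num / den)
```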