Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)
Todd Leen, John Moody
The ensemble dynamics of stochastic learning algorithms can be studied using theoretical techniques from statistical physics. We develop the equations of motion for the weight space probability densities for stochastic learning algorithms. We discuss equilibria in the diffusion approximation and provide expressions for special cases of the LMS algorithm. The equilibrium densities are not in general thermal (Gibbs) distributions in the objective function being minimized, but rather depend upon an effective potential that includes diffusion effects. Finally we present an exact analytical expression for the time evolution of the density for a learning algorithm with weight updates proportional to the sign of the gradient.
1 Introduction: Theoretical Framework
Stochastic learning algorithms involve weight updates of the form
w(n+1) = w(n) + μ(n) H[w(n), x(n)]    (1)
where w ∈ R^m is the vector of m weights, μ is the learning rate, H[·] ∈ R^m is the update function, and x(n) is the exemplar (input or input/target pair) presented at iteration n.
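As a concrete illustration of the update rule (1), the sketch below instantiates H with the LMS update function mentioned in the abstract, H[w, x] = (t − wᵀx)x, i.e. the negative stochastic gradient of the per-exemplar squared error. The constant learning rate, the Gaussian exemplar distribution, and the noisy linear teacher are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lms_update(w, x, t, mu):
    """One step of Eq. (1) with the LMS update function:
    H[w, x] = (t - w.x) x, the negative gradient of the
    per-exemplar squared error (1/2)(t - w.x)^2."""
    return w + mu * (t - w @ x) * x

rng = np.random.default_rng(0)

# Illustrative setup: exemplars from a noisy linear teacher.
w_true = np.array([1.0, -2.0])
w = np.zeros(2)
mu = 0.05  # constant learning rate mu(n) = mu

for _ in range(2000):
    x = rng.normal(size=2)            # random exemplar x(n)
    t = w_true @ x + 0.1 * rng.normal()  # noisy target
    w = lms_update(w, x, t, mu)

print(w)
```

Because the exemplars arrive at random, the weight trajectory is itself a stochastic process; it is the probability density over such trajectories, rather than any single run, that the paper's equations of motion describe.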