Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)

*Genevieve Orr, Todd Leen*

In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. At each time-step n, the weights can be described by a probability density function P(w, n). We summarize the theory of the time evolution of P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning.

1 Weight-Space Probability Densities

Despite the recent application of convergence theorems from stochastic approximation theory to neural network learning (Oja 1982, White 1989), there remain outstanding questions about the search dynamics in stochastic learning. For example, the convergence theorems do not tell us to which of several optima the algorithm is likely to converge.
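The qualitative picture described above can be illustrated with a toy simulation (this is an illustrative sketch, not the paper's formalism): an ensemble of "networks", each a single scalar weight, descends a double-well cost under noisy gradient updates. All function names and parameter values below are hypothetical choices for the demonstration. The histogram of final weights approximates the density P(w, n), and with nonzero noise some fraction of the ensemble hops over the barrier into the other basin, whereas batch (noise-free) gradient descent would leave every member in its starting basin.

```python
import random

def grad(w):
    # Gradient of the double-well cost E(w) = (w^2 - 1)^2 / 4,
    # which has minima at w = -1 and w = +1 and a barrier at w = 0.
    return w * (w * w - 1.0)

def sgd_ensemble(n_nets=2000, n_steps=400, lr=0.05, noise=2.5, seed=0):
    """Evolve an ensemble of scalar weights under noisy gradient descent.

    Each member sees the true gradient plus zero-mean Gaussian noise,
    mimicking the sample-to-sample fluctuation of a stochastic update.
    Returns the final weights, whose histogram approximates P(w, n).
    """
    rng = random.Random(seed)
    # Every member starts near the left optimum, w = -1.
    ws = [-1.0 + 0.01 * rng.gauss(0.0, 1.0) for _ in range(n_nets)]
    for _ in range(n_steps):
        ws = [w - lr * (grad(w) + noise * rng.gauss(0.0, 1.0)) for w in ws]
    return ws

ws = sgd_ensemble()
frac_right = sum(1 for w in ws if w > 0) / len(ws)
# With noise = 0 every member would remain in the left basin
# (frac_right == 0); with noise, part of the ensemble hops across.
print(f"fraction hopped to the right basin: {frac_right:.2f}")
```

Averaging over many independent members is the empirical counterpart of tracking the density P(w, n) directly; the fraction in each basin as a function of n gives a crude estimate of the noise-induced hopping time.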
