The Efficiency and the Robustness of Natural Gradient Descent Learning Rule

Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)



Howard Yang, Shun-ichi Amari


The inverse of the Fisher information matrix is used in the natural gradient descent algorithm to train single-layer and multi-layer perceptrons. We have discovered a new scheme to represent the Fisher information matrix of a stochastic multi-layer perceptron. Based on this scheme, we have designed an algorithm to compute the natural gradient. When the input dimension n is much larger than the number of hidden neurons, the complexity of this algorithm is of order O(n). It is confirmed by simulations that the natural gradient descent learning rule is not only efficient but also robust.
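
To make the role of the inverse Fisher information matrix concrete, here is a minimal sketch of one natural gradient step. It is not the O(n) algorithm described in the abstract. It assumes a much simpler model, a single linear unit with additive Gaussian noise, whose Fisher information matrix is E[x x^T] / noise_var, and it uses a generic linear solve. The names `solve`, `natural_gradient_step`, and `noise_var` are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b]; copy rows so the caller's data is untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def natural_gradient_step(theta, X, y, lr=1.0, noise_var=1.0):
    """One step theta <- theta - lr * F^{-1} grad for y = theta . x + noise.

    Assumption: Gaussian noise with variance noise_var, so the Fisher
    matrix is E[x x^T] / noise_var, estimated here from the inputs X.
    """
    m, n = len(X), len(theta)
    residuals = [sum(t * xi for t, xi in zip(theta, x)) - yi
                 for x, yi in zip(X, y)]
    # Gradient of the average negative log-likelihood.
    grad = [sum(r * x[j] for r, x in zip(residuals, X)) / (m * noise_var)
            for j in range(n)]
    # Fisher information matrix estimated from the sample inputs.
    fisher = [[sum(x[j] * x[k] for x in X) / (m * noise_var)
               for k in range(n)] for j in range(n)]
    # Natural gradient: solve F d = grad instead of forming F^{-1} explicitly.
    nat_grad = solve(fisher, grad)
    return [t - lr * g for t, g in zip(theta, nat_grad)]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    theta_true = [2.0, -3.0]
    y = [sum(t * xi for t, xi in zip(theta_true, x)) for x in X]
    print(natural_gradient_step([0.0, 0.0], X, y))
```

In this toy model the loss is quadratic, so a single step with `lr=1.0` on noise-free data lands on the generating parameters. The generic solve costs on the order of n^3 operations in the number of parameters, which is the kind of cost a structured representation of the Fisher information matrix is meant to avoid.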