Part of Advances in Neural Information Processing Systems 1 (NIPS 1988)
This paper presents a variation of the back-propagation algorithm that makes optimal use of a network's hidden units by decreasing an "energy" term written as a function of the squared activations of these hidden units. The algorithm can automatically find optimal or nearly optimal architectures necessary to solve known Boolean functions, facilitate the interpretation of the activation of the remaining hidden units, and automatically estimate the complexity of architectures appropriate for phonetic labeling problems. The general principle of the algorithm can also be adapted to different tasks: for example, it can be used to eliminate the [0, 0] local minimum of the [-1, +1] logistic activation function while preserving much faster convergence and forcing binary activations over the set of hidden units.
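The core idea can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: a one-hidden-layer network is trained by gradient descent on squared error plus an "energy" penalty proportional to the squared hidden activations, so that hidden units the task does not need are driven toward zero activation and can be pruned. The task (XOR), the penalty weight `lam`, and all hyperparameters are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR, a small Boolean task
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

n_hidden = 8      # deliberately oversized; the penalty should idle spare units
lam = 1e-3        # energy-penalty weight (assumed value)
lr = 0.5

W1 = rng.normal(0, 1, (2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 1, (n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)

    # objective: E = mean((out - y)^2) + lam * mean(h^2)
    # backward pass; the energy term contributes 2*lam*h to dE/dh
    d_out = 2 * (out - y) * out * (1 - out) / len(X)
    d_h = d_out @ W2.T + 2 * lam * h / len(X)
    d_z1 = d_h * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_z1
    b1 -= lr * d_z1.sum(0)

# Units whose mean activation energy stays near zero are candidates
# for removal, shrinking the architecture toward the minimum needed.
energy = (h ** 2).mean(axis=0)
print(np.round(energy, 3))
```

The penalty enters the backward pass only through the extra `2 * lam * h` term in the hidden-layer gradient; everything else is standard back-propagation.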