Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters

Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)

John Bridle


One of the attractions of neural network approaches to pattern recognition is the use of a discrimination-based training method. We show that once we have modified the output layer of a multi-layer perceptron to provide mathematically correct probability distributions, and replaced the usual squared error criterion with a probability-based score, the result is equivalent to Maximum Mutual Information training, which has been used successfully to improve the performance of hidden Markov models for speech recognition. If the network is specially constructed to perform the recognition computations of a given kind of stochastic model based classifier then we obtain a method for discrimination-based training of the parameters of the models. Examples include an HMM-based word discriminator, which we call an 'Alphanet'.
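
The equivalence claimed above can be illustrated with a minimal sketch. It assumes the output layer is a normalized exponential (softmax) and that the probability-based score is the log probability assigned to the correct class; the abstract does not spell out either form, so both are assumptions here. It also assumes the scores fed to the output layer are log joint likelihoods, log p(x|c) + log P(c), from class-conditional models with fixed priors. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Normalized exponential: maps real-valued scores to a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_prob_score(scores, correct):
    """Log of the probability the output layer assigns to the correct class."""
    return math.log(softmax(scores)[correct])

def mmi_score(log_likelihoods, log_priors, correct):
    """Log posterior of the correct class under the class-conditional models:
    log p(x|c)P(c) - log sum_k p(x|k)P(k)."""
    joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    m = max(joint)
    log_evidence = m + math.log(sum(math.exp(j - m) for j in joint))
    return joint[correct] - log_evidence

# Hypothetical log-likelihoods of one observation under three class models.
log_likelihoods = [-12.0, -10.5, -14.2]
log_priors = [math.log(0.5), math.log(0.3), math.log(0.2)]

# Assumption: the network's pre-softmax scores are the log joint likelihoods.
scores = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]

network_score = log_prob_score(scores, correct=1)
model_score = mmi_score(log_likelihoods, log_priors, correct=1)
print(abs(network_score - model_score) < 1e-9)
```

Under these assumptions the two scores agree up to floating-point error for any class, so gradient ascent on the network's log-probability score adjusts the model parameters in the same direction as the discriminative criterion computed from the models directly.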