Andrew Brown, Geoffrey E. Hinton
Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMMs. Experiments on speech data show it to be superior to the standard method of discriminatively training HMMs.
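
The opening claim can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes two isotropic Gaussians that share a single variance, a restriction the abstract does not state, and all function and parameter names are hypothetical. Under that assumption, the posterior probability of one Gaussian given a data point equals the output of a logistic unit whose weights and bias are determined by the two means, the shared variance, and the prior.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log density of an isotropic Gaussian with variance `var` per dimension."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * d * math.log(2 * math.pi * var) - sq_dist / (2 * var)


def posterior_from_densities(x, mean_a, mean_b, var, prior_a=0.5):
    """P(A | x) via Bayes' rule, computed directly from the two densities."""
    log_a = math.log(prior_a) + gaussian_log_density(x, mean_a, var)
    log_b = math.log(1 - prior_a) + gaussian_log_density(x, mean_b, var)
    return 1.0 / (1.0 + math.exp(log_b - log_a))


def logistic_unit_params(mean_a, mean_b, var, prior_a=0.5):
    """Weights and bias of the equivalent logistic unit (shared variance assumed)."""
    w = [(a - b) / var for a, b in zip(mean_a, mean_b)]
    bias = (sum(b * b for b in mean_b) - sum(a * a for a in mean_a)) / (2 * var)
    bias += math.log(prior_a / (1 - prior_a))
    return w, bias


def logistic_unit(x, w, bias):
    """Standard logistic unit: sigmoid of a weighted sum plus bias."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Arbitrary example values, chosen only for illustration.
mean_a, mean_b, var, prior_a = [1.0, 2.0], [-1.0, 0.5], 1.5, 0.3
w, bias = logistic_unit_params(mean_a, mean_b, var, prior_a)
x = [0.2, -0.7]
difference = abs(posterior_from_densities(x, mean_a, mean_b, var, prior_a)
                 - logistic_unit(x, w, bias))
```

Here `difference` should be zero up to floating-point rounding. The quadratic terms in the data point cancel only because the two variances are equal; with unequal covariances the log odds become quadratic in the input, so a single logistic unit with a linear input no longer reproduces them exactly.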