Hermann Hild, Alex Waibel
The Multi-State Time Delay Neural Network (MS-TDNN) integrates a nonlinear time alignment procedure (DTW) and the high-accuracy phoneme spotting capabilities of a TDNN into a connectionist speech recognition system with word-level classification and error backpropagation. We present an MS-TDNN for recognizing continuously spelled letters, a task characterized by a small but highly confusable vocabulary. Our MS-TDNN achieves 98.5/92.0% word accuracy on speaker dependent/independent tasks, outperforming previously reported results on the same databases. We propose training techniques aimed at improving sentence level performance, including free alignment across word boundaries, word duration modeling and error backpropagation on the sentence rather than the word level. Architectures integrating submodules specialized on a subset of speakers achieved further improvements.
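To make the role of the time alignment step concrete, the following is a minimal sketch of how a DTW pass might turn frame-level phoneme scores into a word-level score. It is an illustration under stated assumptions, not the system described here: the function name `dtw_word_score`, the representation of a word as a left-to-right sequence of phoneme indices, the stay-or-advance transition rule, and the toy score values are all hypothetical, and the scores would in practice come from the TDNN's phoneme layer.

```python
from typing import Sequence

NEG_INF = float("-inf")


def dtw_word_score(
    frame_scores: Sequence[Sequence[float]],
    word_states: Sequence[int],
) -> float:
    """Best accumulated score of a monotonic alignment of frames to word states.

    frame_scores[t][p] is an (assumed) phoneme score for phoneme p at frame t.
    word_states lists the phoneme index of each state of a hypothetical word
    model, in left-to-right order. The path must start in the first state,
    end in the last state, and at each frame either stay or advance by one.
    """
    num_frames = len(frame_scores)
    num_states = len(word_states)
    if num_states == 0 or num_frames < num_states:
        raise ValueError("need at least one frame per word state")

    # acc[s]: best score of any valid path that is in state s at the current frame.
    acc = [NEG_INF] * num_states
    acc[0] = frame_scores[0][word_states[0]]

    for t in range(1, num_frames):
        new_acc = [NEG_INF] * num_states
        for s in range(num_states):
            stay = acc[s]
            advance = acc[s - 1] if s > 0 else NEG_INF
            best_prev = max(stay, advance)
            if best_prev != NEG_INF:
                new_acc[s] = best_prev + frame_scores[t][word_states[s]]
        acc = new_acc

    # The alignment has to end in the final state of the word model.
    return acc[-1]


# Toy example (made-up numbers): 4 frames, 3 phonemes, two 2-state word models.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.7, 0.3],
]
words = {"A": [0, 1], "B": [2, 1]}
best_word = max(words, key=lambda w: dtw_word_score(scores, words[w]))
```

Because the word score is a sum of phoneme scores along the best path, an error signal defined at the word level can be passed back to exactly those phoneme units and frames that lie on the path, which is the sense in which alignment and backpropagation are combined in a single network.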