Cesare Furlanello, Diego Giuliani, Edmondo Trentin
The paper presents a rapid speaker-normalization technique based on neural network spectral mapping. The neural network is used as a front-end of a continuous speech recognition system (speaker(cid:173) dependent, HMM-based) to normalize the input acoustic data from a new speaker. The spectral difference between speakers can be reduced using a limited amount of new acoustic data (40 phonet(cid:173) ically rich sentences). Recognition error of phone units from the acoustic-phonetic continuous speech corpus APASCI is decreased with an adaptability ratio of 25%. We used local basis networks of elliptical Gaussian kernels, with recursive allocation of units and on-line optimization of parameters (GRAN model). For this ap(cid:173) plication, the model included a linear term. The results compare favorably with multivariate linear mapping based on constrained orthonormal transformations.