Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Achan, Kannan; Roweis, Sam; Frey, Brendan J.

Kannan Achan, Sam T. Roweis, Brendan J. Frey

Advances in Neural Information Processing Systems 16 (NIPS 2003)

Abstract

Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram. In contrast to techniques that alternate between estimating the phase and a spectrally-consistent signal, our technique directly infers the speech signal, thus jointly optimizing the phase and a spectrally-consistent signal. We compare our technique with a standard method using signal-to-noise ratios, but we also provide audio files on the web for the purpose of demonstrating the improvement in perceptual quality that our technique offers.
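The idea of inferring the time-domain signal directly, rather than alternating between a phase estimate and a signal estimate, can be illustrated with a minimal sketch. The code below is not the authors' generative model or optimizer. It assumes only a spectrogram-consistency term (no prior over speech signals), a Hann window, a synthetic two-tone test signal, and plain gradient descent as a stand-in for the efficient optimizer mentioned in the abstract. All function names and parameter values are hypothetical.

```python
import numpy as np

def frames(s, n, hop):
    """Slice signal s into overlapping frames of length n."""
    starts = range(0, len(s) - n + 1, hop)
    return np.stack([s[i:i + n] for i in starts])

def magnitude_spectrogram(s, window, hop):
    """Phaseless short-time spectrum: |FFT| of each windowed frame."""
    return np.abs(np.fft.fft(frames(s, len(window), hop) * window, axis=1))

def loss_and_grad(s, target, window, hop):
    """Squared magnitude mismatch and its gradient w.r.t. the samples of s."""
    n = len(window)
    X = np.fft.fft(frames(s, n, hop) * window, axis=1)
    mag = np.abs(X)
    resid = mag - target
    loss = np.sum(resid ** 2)
    # Gradient per windowed frame, mapped back through the window.
    unit_phase = X / np.maximum(mag, 1e-12)
    g_frames = 2.0 * n * np.real(np.fft.ifft(resid * unit_phase, axis=1)) * window
    # Overlap-add the per-frame gradients into a gradient over samples.
    grad = np.zeros_like(s)
    for j, i in enumerate(range(0, len(s) - n + 1, hop)):
        grad[i:i + n] += g_frames[j]
    return loss, grad

def infer_signal(target, window, hop, length, iters=200, seed=0):
    """Gradient descent on the time-domain signal itself (assumed optimizer)."""
    rng = np.random.default_rng(seed)
    s = 0.01 * rng.standard_normal(length)
    step = 0.25 / len(window)  # conservative step size (assumption)
    history = []
    for _ in range(iters):
        loss, grad = loss_and_grad(s, target, window, hop)
        history.append(loss)
        s -= step * grad
    return s, history

# Synthetic stand-in for speech: a sum of two sinusoids.
n, hop = 64, 16
t = np.arange(1024)
clean = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
window = np.hanning(n)
target = magnitude_spectrogram(clean, window, hop)
estimate, history = infer_signal(target, window, hop, len(clean))
print(f"initial loss {history[0]:.3f}, final loss {history[-1]:.3f}")
```

Because every update acts on the samples of the signal, the phase is never estimated as a separate quantity. It is implied by the current signal estimate at each step. A magnitude spectrogram cannot distinguish a signal from its negation, so a signal-to-noise comparison against the original would have to account for that sign ambiguity.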
