Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Brendan J. Frey, Nebojsa Jojic
In previous work on transformed mixtures of Gaussians'' and
transformed hidden Markov models'', we showed how the EM al- gorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabili- ties over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we de- scribe how a tremendous speed-up is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For NN images, learning C clusters under N rotations, N scales,
N x-translations and N y-translations takes only (C + 2 log N)N
2
scalar operations per iteration. In contrast, the original algorithm takes CN
6
operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm.