Nicolas Loeff, Himanshu Arora, Alexander Sorokin, David Forsyth
We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparable methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features. This results in improvements in both accuracy and localization. Our model has been carefully tested on standard datasets; we compare with a number of recent template models. In particular, we demonstrate state-of-the-art results for detection and localization.