Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Victor Lavrenko, R. Manmatha, Jiwoon Jeon
We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.
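To make the idea of "predicting the probability of a word given the image regions" concrete, here is a minimal sketch of a relevance-model-style estimator in that spirit. It scores each vocabulary word w by summing, over training images J, P(J) · P(w | J) · Π_i P(r_i | J), with P(r | J) taken as a Gaussian kernel density over J's region feature vectors. The function names, the kernel and bandwidth choice, the uniform prior over training images, and the crude word smoothing are assumptions made for illustration, not the paper's exact estimator.

```python
# Illustrative sketch only: a kernel-density, relevance-model style estimate of
# P(word | image regions), in the spirit of the joint model described above.
import numpy as np

def region_likelihood(regions, train_regions, bandwidth=1.0):
    """P(r | J) for each query region r, via a Gaussian kernel density placed
    on training image J's region feature vectors (bandwidth is an assumption)."""
    # regions: (n, d) query region features; train_regions: (m, d)
    diffs = regions[:, None, :] - train_regions[None, :, :]    # (n, m, d)
    sq = np.sum(diffs ** 2, axis=-1)                           # (n, m)
    d = regions.shape[1]
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    kernels = np.exp(-sq / (2 * bandwidth ** 2)) / norm        # (n, m)
    return kernels.mean(axis=1)                                # average kernel per query region

def annotate(query_regions, training_set, vocabulary, bandwidth=1.0):
    """Score each word w by summing P(J) * P(w | J) * prod_i P(r_i | J)
    over training images J, then normalise over the vocabulary."""
    scores = np.zeros(len(vocabulary))
    p_J = 1.0 / len(training_set)                        # uniform prior over training images
    for regions_J, words_J in training_set:
        p_r = region_likelihood(query_regions, regions_J, bandwidth)
        image_term = p_J * np.prod(p_r)                  # P(J) * prod_i P(r_i | J)
        for k, w in enumerate(vocabulary):
            p_w_given_J = 1.0 if w in words_J else 1e-6  # crude smoothed P(w | J)
            scores[k] += p_w_given_J * image_term
    return scores / scores.sum()                         # normalised word scores

# Toy usage: two annotated training images with 4-dimensional region features.
rng = np.random.default_rng(0)
training_set = [
    (rng.normal(0, 1, (3, 4)), {"tiger", "grass"}),
    (rng.normal(3, 1, (3, 4)), {"sky", "plane"}),
]
vocab = ["tiger", "grass", "sky", "plane"]
query = rng.normal(0, 1, (3, 4))
print(dict(zip(vocab, annotate(query, training_set, vocab).round(3))))
```

The top-scoring words annotate the query image; for retrieval, the same per-word scores can be computed for every image in a collection and the images ranked by the score of the query word.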