Euclidean Embedding of Co-Occurrence Data

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)

Bibtex Metadata Paper


Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby


Embedding algorithms search for low dimensional structure in complex data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for em- bedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to con- vex optimization over positive semidefinite matrices. The local struc- ture of our embedding corresponds to the statistical correlations via ran- dom walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and signifi- cantly outperforms standard methods of statistical correspondence mod- eling, such as multidimensional scaling and correspondence analysis.