Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)
Marcus Held, Jan Puzicha, Joachim Buhmann
Cluster analysis is a fundamental principle in exploratory data analysis, providing the user with a description of the group structure of given data. A key problem in this context is the interpretation and visualization of clustering solutions in high-dimensional or abstract data spaces. In particular, probabilistic descriptions of the group structure, essential to capture inter-cluster relationships, are hardly assessable by simple inspection of the probabilistic assignment variables. We present a novel approach to the visualization of group structure. It is based on a statistical model of the object assignments which have been observed or estimated by a probabilistic clustering procedure. The objects or data points are embedded in a low dimensional Euclidean space by approximating the observed data statistics with a Gaussian mixture model. The algorithm provides a new approach to the visualization of the inherent structure for a broad variety of data types, e.g. histogram data, proximity data and co-occurrence data. To demonstrate the power of the approach, histograms of textured images are visualized as an example of a large-scale data mining application.
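The embedding idea can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading of the abstract under stated assumptions: the probabilistic assignments are given as an N x K matrix `P` with rows summing to one, every mixture component is an isotropic unit-variance Gaussian with equal weight, and the fit is done by plain gradient descent on the cross-entropy between `P` and the mixture posteriors. The names `mixture_posteriors`, `cross_entropy` and `embed_assignments`, as well as the step size and iteration count, are hypothetical.

```python
import numpy as np

def mixture_posteriors(X, Y):
    """Posteriors q[i, k] of an equal-weight, unit-variance Gaussian mixture
    with centres Y (K x d), evaluated at the points X (N x d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    s = -0.5 * d2
    s -= s.max(axis=1, keepdims=True)          # stabilise the softmax
    q = np.exp(s)
    return q / q.sum(axis=1, keepdims=True)

def cross_entropy(P, Q, eps=1e-12):
    """Cross-entropy between given assignments P and model posteriors Q.
    It differs from the summed KL divergence only by a constant in Q."""
    return -(P * np.log(Q + eps)).sum()

def embed_assignments(P, dim=2, steps=400, lr=0.05, seed=0):
    """Assumed procedure: gradient descent on the cross-entropy with respect
    to point coordinates X and cluster centres Y."""
    rng = np.random.default_rng(seed)
    n, k = P.shape
    X = 0.1 * rng.standard_normal((n, dim))    # embedded objects
    Y = rng.standard_normal((k, dim))          # embedded cluster centres
    for _ in range(steps):
        D = mixture_posteriors(X, Y) - P       # derivative w.r.t. the logits
        grad_X = D @ Y                         # rows of D sum to zero
        grad_Y = D.T @ X - D.sum(axis=0)[:, None] * Y
        X -= lr * grad_X
        Y -= lr * grad_Y / n                   # average over objects
    return X, Y

# Toy soft assignments of four objects to three clusters (made-up values).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.3, 0.3, 0.4]])
X, Y = embed_assignments(P)
```

Plotting the rows of `X` together with the centres `Y` gives a two-dimensional picture in which objects with ambiguous assignments lie between the clusters they are shared among.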