Proximity Graphs for Clustering and Manifold Learning

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)

Bibtex Metadata Paper


Richard Zemel, Miguel Carreira-Perpiñán


Many machine learning algorithms for clustering or dimensionality re- duction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then parti- tioned (clustering) or used to redefine metric information (dimensional- ity reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fully- connected graph, a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually pro- duces a better representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimension- ality reduction algorithm based on the graph.