Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Suvrit Sra, Joel Tropp, Inderjit Dhillon
Various problems in machine learning, databases, and statistics involve pairwise distances among a set of objects. It is often desirable for these distances to satisfy the properties of a metric, especially the triangle inequality. Applications where metric data is useful include clustering, classification, metric-based indexing, and approximation algorithms for various graph problems. This paper presents the Metric Nearness Problem: Given a dissimilarity matrix, find the "nearest" matrix of distances that satisfy the triangle inequalities. For $\ell_p$ nearness measures, this paper develops efficient triangle-fixing algorithms that compute globally optimal solutions by exploiting the inherent structure of the problem. Empirically, the algorithms have time and storage costs that are linear in the number of triangle constraints. The methods can also be easily parallelized for additional speed.
Imagine that a lazy graduate student has been asked to measure the pairwise distances among a group of objects in a metric space. He does not complete the experiment, and he must figure out the remaining numbers before his adviser returns from her conference. Obviously, all the distances need to be consistent, but the student does not know very much about the space in which the objects are embedded. One way to solve his problem is to find the "nearest" complete set of distances that satisfy the triangle inequalities. This procedure respects the measurements that have already been taken while forcing the missing numbers to behave like distances.
More charitably, suppose that the student has finished the experiment, but--measurements being what they are--the numbers do not satisfy the triangle inequality. The student knows that they must represent distances, so he would like to massage the data so that it corresponds with his a priori knowledge. Once again, the solution seems to require the "nearest" set of distances that satisfy the triangle inequalities.
Matrix nearness problems offer a natural framework for developing this idea. If there are $n$ points, we may collect the measurements into an $n \times n$ symmetric matrix whose $(j, k)$ entry represents the dissimilarity between the $j$-th and $k$-th points. Then, we seek to approximate this matrix by another whose entries satisfy the triangle inequalities. That is,
$m_{ik} \le m_{ij} + m_{jk}$ for every triple $(i, j, k)$. Any such matrix will represent the distances among $n$ points in some metric space. We calculate approximation error with a distortion measure that depends on how the corrected matrix should relate to the input matrix. For example, one might prefer to change a few entries significantly or to change all the entries a little.
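To make the constraint concrete, a short routine can enumerate the ordered triples of a dissimilarity matrix and report which triangle inequalities fail. This is only an illustrative sketch (the function name and the example matrix are ours, and the brute-force scan costs $O(n^3)$):

```python
import itertools

def triangle_violations(D, tol=1e-9):
    """Return every ordered triple (i, j, k) of distinct indices
    with D[i][k] > D[i][j] + D[j][k] + tol."""
    n = len(D)
    return [(i, j, k)
            for i, j, k in itertools.permutations(range(n), 3)
            if D[i][k] > D[i][j] + D[j][k] + tol]

# A symmetric dissimilarity matrix; the entry 3.0 is too large,
# since the route through the middle point costs only 1.0 + 1.0 = 2.0.
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0],
     [3.0, 1.0, 0.0]]
print(triangle_violations(D))  # [(0, 1, 2), (2, 1, 0)]
```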
We call the problem of approximating general dissimilarity data by metric data the Metric Nearness (MN) Problem. This simply stated problem has not previously been studied, although the literature does contain some related topics (see Section 1.1). This paper presents a formulation of the Metric Nearness Problem (Section 2), and it shows that every locally optimal solution is globally optimal. To solve the problem we present triangle-fixing algorithms that take advantage of its structure to produce globally optimal solutions. It can be computationally prohibitive, both in time and storage, to solve the MN problem without these efficiencies.
1.1 Related Work
The Metric Nearness (MN) problem is novel, but the literature contains some related work.
The most relevant research appears in a recent paper of Roth et al. They observe that machine learning applications often require metric data, and they propose a technique for metrizing dissimilarity data. Their method, constant-shift embedding, increases all the dissimilarities by an equal amount to produce a set of Euclidean distances (i.e., a set of numbers that can be realized as the pairwise distances among an ensemble of points in a Euclidean space). The size of the translation depends on the data, so the relative and absolute changes to the dissimilarity values can be large. Our approach to metrizing data is completely different. We seek a consistent set of distances that deviates as little as possible from the original measurements. In our approach, the resulting set of distances can arise from an arbitrary metric space; we do not restrict our attention to obtaining Euclidean distances. In consequence, we expect metric nearness to provide superior denoising. Moreover, our techniques can also learn distances that are missing entirely.
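The translation idea can be illustrated in a few lines: since $(d_{ik} + c) \le (d_{ij} + c) + (d_{jk} + c)$ whenever $c \ge d_{ik} - d_{ij} - d_{jk}$, a large enough uniform shift repairs every triangle violation. The sketch below computes only this minimal triangle-repairing shift; it is a simplification of ours, not Roth et al.'s actual procedure, which chooses the shift so that the result is realizable in a Euclidean space (a strictly stronger property):

```python
def min_repair_shift(D):
    """Smallest c >= 0 such that adding c to every off-diagonal entry
    of D satisfies all triangle inequalities.  The bound comes from
    (d_ik + c) <= (d_ij + c) + (d_jk + c), i.e. c >= d_ik - d_ij - d_jk."""
    n = len(D)
    return max([0.0] + [D[i][k] - D[i][j] - D[j][k]
                        for i in range(n) for j in range(n) for k in range(n)
                        if len({i, j, k}) == 3])

D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0],
     [3.0, 1.0, 0.0]]
print(min_repair_shift(D))  # 1.0: after the shift, 4.0 <= 2.0 + 2.0
```

Note that the shift changes every entry by the same absolute amount, which is exactly the behavior the text criticizes: small dissimilarities can be distorted badly relative to their original values.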
There is at least one other method for inferring a metric. An article of Xing et al. proposes a technique for learning a Mahalanobis distance for data in $\mathbb{R}^s$. That is, a metric $\mathrm{dist}(x, y) = \sqrt{(x - y)^T G (x - y)}$, where $G$ is an $s \times s$ positive semi-definite matrix. The user specifies that various pairs of points are similar or dissimilar. Then the matrix $G$ is computed by minimizing the total squared distances between similar points while forcing the total distances between dissimilar points to exceed one. The article provides explicit algorithms for the cases where $G$ is diagonal and where $G$ is an arbitrary positive semi-definite matrix. In comparison, the metric nearness problem is not restricted to Mahalanobis distances; it can learn a general discrete metric. It also allows us to use specific distance measurements and to indicate our confidence in those measurements (by means of a weight matrix), rather than forcing a binary choice of "similar" or "dissimilar."
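A direct evaluation of the Mahalanobis form above may help fix ideas. This sketch (function name ours) computes $\sqrt{(x-y)^T G (x-y)}$ for a given $G$; with $G = I$ it reduces to the ordinary Euclidean distance:

```python
import math

def mahalanobis(x, y, G):
    """dist(x, y) = sqrt((x - y)^T G (x - y)) for positive semi-definite G."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * G[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

# With G equal to the identity, the metric is the Euclidean distance:
print(mahalanobis([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))  # 5.0

# A diagonal G simply reweights the coordinates:
print(mahalanobis([0.0, 0.0], [3.0, 4.0], [[4.0, 0.0], [0.0, 1.0]]))
```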
The Metric Nearness Problem may appear similar to metric Multi-Dimensional Scaling (MDS), but we emphasize that the two problems are distinct. The MDS problem endeavors to find an ensemble of points in a prescribed metric space (usually a Euclidean space) such that the distances between these points are close to the set of input distances. In contrast, the MN problem does not seek to find an embedding. In fact, MN does not impose any hypotheses on the underlying space other than requiring it to be a metric space.
The outline of the rest of the paper is as follows. Section 2 formally describes the MN problem. In Section 3, we present algorithms that allow us to solve MN problems with $\ell_p$ nearness measures. Some applications and experimental results follow in Section 4. Section 5 discusses our results, some interesting connections, and possibilities for future research.
2 The Metric Nearness Problem
We begin with some basic definitions. We define a dissimilarity matrix to be a nonnegative, symmetric matrix with zero diagonal. Meanwhile, a distance matrix is defined to be a dissimilarity matrix whose entries satisfy the triangle inequalities. That is, $M$ is a distance matrix if and only if it is a dissimilarity matrix and $m_{ik} \le m_{ij} + m_{jk}$ for every triple of distinct indices $(i, j, k)$. Distance matrices arise from measuring the distances among $n$ points in a pseudo-metric space (i.e., two distinct points can lie at zero distance from each other). A distance matrix contains $N = n(n-1)/2$ free parameters, so we denote the collection of all distance matrices by $\mathcal{M}_N$. The set $\mathcal{M}_N$ is a closed, convex cone.
The metric nearness problem requests a distance matrix $M$ that is closest to a given dissimilarity matrix $D$ with respect to some measure of "closeness." In this work, we restrict our attention to closeness measures that arise from norms. Specifically, we seek a distance matrix $M$ so that,
$$M \in \operatorname*{argmin}_{X \in \mathcal{M}_N} \| W \odot (X - D) \|, \qquad (2.1)$$
where $\|\cdot\|$ is a norm, $W$ is a symmetric non-negative weight matrix, and $\odot$ denotes the elementwise (Hadamard) product of two matrices. The weight matrix reflects our confidence in the entries of $D$. When each $d_{ij}$ represents a measurement with variance $\sigma_{ij}^2$, we might set $w_{ij} = 1/\sigma_{ij}^2$. If an entry of $D$ is missing, one can set the corresponding weight to zero.
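The role of the weight matrix is easy to see by evaluating the objective of (2.1) directly. The sketch below uses the Frobenius norm (one concrete choice of norm; the function name and example matrices are ours): a zero weight makes the corresponding entry of $D$ irrelevant, which is how missing measurements are handled.

```python
import math

def weighted_frobenius_error(X, D, W):
    """Frobenius-norm instance of || W . (X - D) ||, where '.' is the
    elementwise (Hadamard) product."""
    n = len(D)
    return math.sqrt(sum((W[j][k] * (X[j][k] - D[j][k])) ** 2
                         for j in range(n) for k in range(n)))

D = [[0.0, 2.0, 9.9],    # the (0, 2) measurement is missing; 9.9 is junk
     [2.0, 0.0, 1.0],
     [9.9, 1.0, 0.0]]
W = [[0.0, 1.0, 0.0],    # w_02 = 0, so the junk entry is ignored
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
X = [[0.0, 2.5, 3.0],    # a candidate distance matrix
     [2.5, 0.0, 1.0],
     [3.0, 1.0, 0.0]]
print(weighted_frobenius_error(X, D, W))  # sqrt(2 * 0.5**2), about 0.7071
```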
Theorem 2.1. The function $X \mapsto \| W \odot (X - D) \|$ always attains its minimum on $\mathcal{M}_N$. Moreover, every local minimum is a global minimum. If, in addition, the norm is strictly convex and the weight matrix has no zeros or infinities off its diagonal, then there is a unique global minimum.
Proof. The main task is to show that the objective function has no directions of recession, so it must attain a finite minimum on MN . Details appear in .
It is possible to use any norm in the metric nearness problem. We further restrict our attention to the $\ell_p$ norms. The associated Metric Nearness Problems are
$$\min_{X \in \mathcal{M}_N} \Bigl( \sum_{j \ne k} w_{jk} \, |x_{jk} - d_{jk}|^p \Bigr)^{1/p} \quad \text{for } 1 \le p < \infty, \qquad (2.2)$$
$$\min_{X \in \mathcal{M}_N} \max_{j \ne k} \; w_{jk} \, |x_{jk} - d_{jk}| \quad \text{for } p = \infty. \qquad (2.3)$$
Note that the $\ell_p$ norms are strictly convex for $1 < p < \infty$, and therefore the solution to (2.2) is unique. There is a basic intuition for choosing $p$. The $\ell_1$ norm gives the absolute sum of the (weighted) changes to the input matrix, while the $\ell_\infty$ norm only reflects the maximum absolute change. The other $\ell_p$ norms interpolate between these extremes. Therefore, a small value of $p$ typically results in a solution that makes a few large changes to the original data, while a large value of $p$ typically yields a solution with many small changes.
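To convey the flavor of triangle fixing before the formal development, here is a deliberately simplified sketch for the $\ell_2$ case: sweep over the triangles and orthogonally project onto each violated constraint $x_{ik} \le x_{ij} + x_{jk}$, which spreads the violation $v$ equally over the three edges ($x_{ik} \mathrel{-}= v/3$, $x_{ij} \mathrel{+}= v/3$, $x_{jk} \mathrel{+}= v/3$). The names and the example matrix are ours, and this plain cyclic projection only finds a nearby feasible matrix; the actual algorithms add correction terms (in the manner of Dykstra's method) so the iterates converge to the globally nearest distance matrix, not merely a feasible one.

```python
def fix_triangles(D, sweeps=100, tol=1e-8):
    """Cyclically project a dissimilarity matrix onto violated triangle
    constraints (l2 geometry).  Simplified sketch, not the full algorithm."""
    X = [row[:] for row in D]
    n = len(X)
    for _ in range(sweeps):
        worst = 0.0
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    if len({i, j, k}) < 3:
                        continue
                    v = X[i][k] - X[i][j] - X[j][k]
                    if v > tol:
                        # Orthogonal projection onto x_ik <= x_ij + x_jk:
                        # distribute the violation equally over the edges.
                        X[i][k] -= v / 3.0; X[k][i] = X[i][k]
                        X[i][j] += v / 3.0; X[j][i] = X[i][j]
                        X[j][k] += v / 3.0; X[k][j] = X[j][k]
                        worst = max(worst, v)
        if worst <= tol:
            break
    return X

D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0],
     [3.0, 1.0, 0.0]]
M = fix_triangles(D)
print(M[0][2], M[0][1])  # the violation 3 > 1 + 1 is split three ways
```

On this example a single projection repairs the only violated triangle: the large entry shrinks to $8/3$ while the two small entries grow to $4/3$, and the triangle inequality then holds with equality.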