{"title": "Triangle Fixing Algorithms for the Metric Nearness Problem", "book": "Advances in Neural Information Processing Systems", "page_first": 361, "page_last": 368, "abstract": null, "full_text": " Triangle Fixing Algorithms for the Metric\n Nearness Problem\n\n\n\n Inderjit S. Dhillon Suvrit Sra Joel A. Tropp\n Dept. of Computer Sciences Dept. of Mathematics\n The Univ. of Texas at Austin The Univ. of Michigan at Ann Arbor\n Austin, TX 78712. Ann Arbor, MI, 48109.\n {inderjit,suvrit}@cs.utexas.edu jtropp@umich.edu\n\n\n\n\n Abstract\n\n Various problems in machine learning, databases, and statistics involve\n pairwise distances among a set of objects. It is often desirable for these\n distances to satisfy the properties of a metric, especially the triangle in-\n equality. Applications where metric data is useful include clustering,\n classification, metric-based indexing, and approximation algorithms for\n various graph problems. This paper presents the Metric Nearness Prob-\n lem: Given a dissimilarity matrix, find the \"nearest\" matrix of distances\n that satisfy the triangle inequalities. For p nearness measures, this pa-\n per develops efficient triangle fixing algorithms that compute globally\n optimal solutions by exploiting the inherent structure of the problem.\n Empirically, the algorithms have time and storage costs that are linear\n in the number of triangle constraints. The methods can also be easily\n parallelized for additional speed.\n\n\n1 Introduction\n\nImagine that a lazy graduate student has been asked to measure the pairwise distances\namong a group of objects in a metric space. He does not complete the experiment, and\nhe must figure out the remaining numbers before his adviser returns from her conference.\nObviously, all the distances need to be consistent, but the student does not know very much\nabout the space in which the objects are embedded. 
One way to solve his problem is to find the "nearest" complete set of distances that satisfy the triangle inequalities. This procedure respects the measurements that have already been taken while forcing the missing numbers to behave like distances.

More charitably, suppose that the student has finished the experiment, but--measurements being what they are--the numbers do not satisfy the triangle inequality. The student knows that they must represent distances, so he would like to massage the data so that it corresponds with his a priori knowledge. Once again, the solution seems to require the "nearest" set of distances that satisfy the triangle inequalities.

Matrix nearness problems [6] offer a natural framework for developing this idea. If there are n points, we may collect the measurements into an n × n symmetric matrix whose (j, k) entry represents the dissimilarity between the j-th and k-th points. Then, we seek to approximate this matrix by another whose entries satisfy the triangle inequalities. That is, mik ≤ mij + mjk for every triple (i, j, k). Any such matrix will represent the distances among n points in some metric space. We calculate approximation error with a distortion measure that depends on how the corrected matrix should relate to the input matrix. For example, one might prefer to change a few entries significantly or to change all the entries a little.

We call the problem of approximating general dissimilarity data by metric data the Metric Nearness (MN) Problem. This simply stated problem has not previously been studied, although the literature does contain some related topics (see Section 1.1). This paper presents a formulation of the Metric Nearness Problem (Section 2), and it shows that every locally optimal solution is globally optimal. To solve the problem we present triangle-fixing algorithms that take advantage of its structure to produce globally optimal solutions. 
It can be computationally prohibitive, both in time and storage, to solve the MN problem without these efficiencies.

1.1 Related Work

The Metric Nearness (MN) problem is novel, but the literature contains some related work.

The most relevant research appears in a recent paper of Roth et al. [11]. They observe that machine learning applications often require metric data, and they propose a technique for metrizing dissimilarity data. Their method, constant-shift embedding, increases all the dissimilarities by an equal amount to produce a set of Euclidean distances (i.e., a set of numbers that can be realized as the pairwise distances among an ensemble of points in a Euclidean space). The size of the translation depends on the data, so the relative and absolute changes to the dissimilarity values can be large. Our approach to metrizing data is completely different. We seek a consistent set of distances that deviates as little as possible from the original measurements. In our approach, the resulting set of distances can arise from an arbitrary metric space; we do not restrict our attention to obtaining Euclidean distances. In consequence, we expect metric nearness to provide superior denoising. Moreover, our techniques can also learn distances that are missing entirely.

There is at least one other method for inferring a metric. An article of Xing et al. [12] proposes a technique for learning a Mahalanobis distance for data in R^s, that is, a metric dist(x, y) = ((x − y)ᵀ G (x − y))^{1/2}, where G is an s × s positive semi-definite matrix. The user specifies that various pairs of points are similar or dissimilar. Then the matrix G is computed by minimizing the total squared distances between similar points while forcing the total distances between dissimilar points to exceed one. The article provides explicit algorithms for the cases where G is diagonal and where G is an arbitrary positive semi-definite matrix. 
In comparison, the metric nearness problem is not restricted to Mahalanobis distances; it can learn a general discrete metric. It also allows us to use specific distance measurements and to indicate our confidence in those measurements (by means of a weight matrix), rather than forcing a binary choice of "similar" or "dissimilar."

The Metric Nearness Problem may appear similar to metric Multi-Dimensional Scaling (MDS) [8], but we emphasize that the two problems are distinct. The MDS problem endeavors to find an ensemble of points in a prescribed metric space (usually a Euclidean space) such that the distances between these points are close to the set of input distances. In contrast, the MN problem does not seek to find an embedding. In fact, MN does not impose any hypotheses on the underlying space other than requiring it to be a metric space.

The outline of the rest of the paper is as follows. Section 2 formally describes the MN problem. In Section 3, we present algorithms that allow us to solve MN problems with ℓp nearness measures. Some applications and experimental results follow in Section 4. Section 5 discusses our results, some interesting connections, and possibilities for future research.

2 The Metric Nearness Problem

We begin with some basic definitions. We define a dissimilarity matrix to be a nonnegative, symmetric matrix with zero diagonal. Meanwhile, a distance matrix is defined to be a dissimilarity matrix whose entries satisfy the triangle inequalities. That is, M is a distance matrix if and only if it is a dissimilarity matrix and mik ≤ mij + mjk for every triple of distinct indices (i, j, k). Distance matrices arise from measuring the distances among n points in a pseudo-metric space (i.e., two distinct points can lie at zero distance from each other). A distance matrix contains N = n(n − 1)/2 free parameters, so we denote the collection of all distance matrices by M_N. 
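As a concrete check of this definition, one can test whether a given dissimilarity matrix is already a distance matrix by verifying every triangle inequality. The following sketch is ours, not the paper's; the helper name is an assumption:

```python
import numpy as np

def is_distance_matrix(D, tol=1e-12):
    """Check that D is a dissimilarity matrix whose entries satisfy
    d(i,k) <= d(i,j) + d(j,k) for every triple (i, j, k)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Dissimilarity matrix: square, symmetric, nonnegative, zero diagonal.
    if not (D.shape == (n, n) and np.allclose(D, D.T)
            and np.all(D >= -tol) and np.allclose(np.diag(D), 0)):
        return False
    # Triangle inequalities: for each intermediate point j, compare
    # D[i, k] against D[i, j] + D[j, k] for all pairs (i, k) at once.
    for j in range(n):
        if np.any(D > D[:, [j]] + D[[j], :] + tol):
            return False
    return True
```

For an n × n matrix this check costs O(n³) comparisons, which matches the number of triangle constraints.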
The set M_N is a closed, convex cone.

The metric nearness problem requests a distance matrix M that is closest to a given dissimilarity matrix D with respect to some measure of "closeness." In this work, we restrict our attention to closeness measures that arise from norms. Specifically, we seek a distance matrix M so that

    M ∈ argmin_{X ∈ M_N} ‖W ⊙ (X − D)‖,    (2.1)

where ‖·‖ is a norm, W is a symmetric non-negative weight matrix, and '⊙' denotes the elementwise (Hadamard) product of two matrices. The weight matrix reflects our confidence in the entries of D. When each dij represents a measurement with variance σ²ij, we might set wij = 1/σ²ij. If an entry of D is missing, one can set the corresponding weight to zero.

Theorem 2.1. The function X ↦ ‖W ⊙ (X − D)‖ always attains its minimum on M_N. Moreover, every local minimum is a global minimum. If, in addition, the norm is strictly convex and the weight matrix has no zeros or infinities off its diagonal, then there is a unique global minimum.

Proof. The main task is to show that the objective function has no directions of recession, so it must attain a finite minimum on M_N. Details appear in [4].

It is possible to use any norm in the metric nearness problem. We further restrict our attention to the ℓp norms. The associated Metric Nearness Problems are

    min_{X ∈ M_N} ( Σ_{j≠k} |wjk (xjk − djk)|^p )^{1/p}    for 1 ≤ p < ∞, and    (2.2)

    min_{X ∈ M_N} max_{j≠k} |wjk (xjk − djk)|    for p = ∞.    (2.3)

Note that the ℓp norms are strictly convex for 1 < p < ∞, and therefore the solution to (2.2) is unique. There is a basic intuition for choosing p. The ℓ1 norm gives the absolute sum of the (weighted) changes to the input matrix, while the ℓ∞ norm only reflects the maximum absolute change. The other ℓp norms interpolate between these extremes. 
Therefore, a small value of p typically results in a solution that makes a few large changes to the original data, while a large value of p typically yields a solution with many small changes.

3 Algorithms

This section describes efficient algorithms for solving the Metric Nearness Problems (2.2) and (2.3). For ease of exposition, we assume that all the weights equal one. At first, it may appear that one should use quadratic programming (QP) software when p = 2, linear programming (LP) software when p = 1 or p = ∞, and convex programming software for the remaining p. It turns out that the time and storage requirements of this approach can be prohibitive. An efficient algorithm must exploit the structure of the triangle inequalities. In this paper, we develop one such approach, which may be viewed as a triangle-fixing algorithm. This method examines each triple of points in turn and optimally enforces any triangle inequality that fails. (The definition of "optimal" depends on the ℓp nearness measure.) By introducing appropriate corrections, we can ensure that this iterative algorithm converges to a globally optimal solution of MN.

Notation. We must introduce some additional notation before proceeding. To each matrix X of dissimilarities or distances, we associate the vector x formed by stacking the columns of the lower triangle, left to right. We use xij to refer to the (i, j) entry of the matrix as well as the corresponding component of the vector. Define a constraint matrix A so that M is a distance matrix if and only if Am ≤ 0. Note that each row of A contains three nonzero entries, +1, −1, and −1.

3.1 MN for the ℓ2 norm

We first develop a triangle-fixing algorithm for solving (2.2) with respect to the ℓ2 norm. This case turns out to be the simplest and most illuminating case. 
It also plays a pivotal role in the algorithms for the ℓ1 and ℓ∞ MN problems.

Given a dissimilarity vector d, we wish to find its orthogonal projection m onto the cone M_N. Let us introduce an auxiliary variable e = m − d that represents the changes to the original distances. We also define b = −Ad. The negative entries of b indicate how much each triangle inequality is violated. The problem becomes

    min_e ‖e‖_2    subject to    Ae ≤ b.    (3.1)

After finding the minimizer e⋆, we can use the relation m⋆ = d + e⋆ to recover the optimal distance vector.

Here is our approach. We initialize the vector of changes to zero (e = 0), and then we begin to cycle through the triangles. Suppose that the (i, j, k) triangle inequality is violated, i.e., eij − ejk − eki > bijk. We wish to remedy this violation by making an ℓ2-minimal adjustment of eij, ejk, and eki. In other words, the vector e is projected orthogonally onto the constraint set {e' : e'ij − e'jk − e'ki ≤ bijk}. This is tantamount to solving

    min_{e'} ½ ((e'ij − eij)² + (e'jk − ejk)² + (e'ki − eki)²)    subject to    e'ij − e'jk − e'ki = bijk.    (3.2)

It is easy to check that the solution is given by

    e'ij ← eij − δijk,    e'jk ← ejk + δijk,    and    e'ki ← eki + δijk,    (3.3)

where δijk = (1/3)(eij − ejk − eki − bijk) > 0. Only three components of the vector e need to be updated. The updates in (3.3) show that the largest edge weight in the triangle is decreased, while the other two edge weights are increased.

In turn, we fix each violated triangle inequality using (3.3). We must also introduce a correction term to guide the algorithm to the global minimum. The corrections have a simple interpretation in terms of the dual of the minimization problem (3.1). Each dual variable corresponds to the violation in a single triangle inequality, and each individual correction results in a decrease in the violation. 
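As an illustration, the single-triangle update (3.3) can be sketched in code. This fragment is ours, not the paper's; it omits the correction terms that the full algorithm maintains, and the function name is an assumption:

```python
def fix_triangle_l2(e_ij, e_jk, e_ki, b_ijk):
    """ell_2-optimal enforcement of one triangle inequality, cf. eq. (3.3):
    project (e_ij, e_jk, e_ki) onto the halfspace e_ij - e_jk - e_ki <= b_ijk."""
    delta = (e_ij - e_jk - e_ki - b_ijk) / 3.0
    if delta <= 0:  # inequality already holds; no adjustment needed
        return (e_ij, e_jk, e_ki)
    # Decrease the (too long) edge ij and increase the other two edges.
    return (e_ij - delta, e_jk + delta, e_ki + delta)
```

After the update, the constraint holds with equality, which is exactly the orthogonal projection onto the violated halfspace.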
We continue until no triangle receives a significant update.

Algorithm 3.1 displays the complete iterative scheme that performs triangle fixing along with appropriate corrections.

Algorithm 3.1: Triangle Fixing for the ℓ2 norm.

    TRIANGLE_FIXING(D, ε)
    Input: dissimilarity matrix D, tolerance ε
    Output: M = argmin_{X ∈ M_N} ‖X − D‖_2
      for 1 ≤ i < j < k ≤ n
          (zijk, zjki, zkij) ← 0        {initialize correction terms}
      for 1 ≤ i < j ≤ n
          eij ← 0                       {initial error values for each dissimilarity dij}
      δ ← 1 + ε                         {parameter for testing convergence}
      while (δ > ε)                     {convergence test}
          foreach triangle (i, j, k)
              b ← dki + djk − dij
              μ ← (1/3)(eij − ejk − eki − b)                  (⋆)
              θ ← min{−μ, zijk}         {stay within half-space of constraint}
              eij ← eij + θ,  ejk ← ejk − θ,  eki ← eki − θ   (⋆⋆)
              zijk ← zijk − θ           {update correction term}
          end foreach
          δ ← sum of changes in the e values
      end while
      return M = D + E

Remark: Algorithm 3.1 is an efficient adaptation of Bregman's method [1]. By itself, Bregman's method would suffer the same storage and computation costs as a general convex optimization algorithm. Our triangle fixing operations allow us to compactly represent and compute the intermediate variables required to solve the problem. The correctness and convergence properties of Algorithm 3.1 follow from those of Bregman's method. Furthermore, our algorithms are very easy to implement.

3.2 MN for the ℓ1 and ℓ∞ norms

The basic triangle fixing algorithm succeeds only when the norm used in (2.2) is strictly convex. Hence, it cannot be applied directly to the ℓ1 and ℓ∞ cases. These require a more sophisticated approach.

First, observe that the problem of minimizing the ℓ1 norm of the changes can be written as an LP:

    min_{e,f} 0ᵀe + 1ᵀf    subject to    Ae ≤ b,   −e − f ≤ 0,   e − f ≤ 0.    (3.4)

The auxiliary variable f can be interpreted as the absolute value of e. 
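For small instances, the LP (3.4) can also be assembled explicitly and handed to an off-the-shelf solver, which makes a useful sanity check against the triangle-fixing machinery. The following sketch is ours, not the paper's: the helper name and the stacking of the variables as [e; f] are assumptions, and the brute-force constraint enumeration is exactly the cost that triangle fixing avoids.

```python
import numpy as np
from scipy.optimize import linprog

def l1_metric_nearness_lp(D):
    """Solve the ell_1 MN problem via the LP (3.4) for a small
    dissimilarity matrix D, stacking the variables as [e; f]."""
    n = D.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    idx = {p: c for c, p in enumerate(pairs)}
    N = len(pairs)

    # Triangle constraints: for each triple, each edge in turn plays the
    # role of the "long" edge:  e_long - e_other1 - e_other2 <= b.
    rows, rhs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                edges = [(i, j), (j, k), (i, k)]
                for long_e in edges:
                    a = np.zeros(2 * N)
                    a[idx[long_e]] = 1.0
                    for other in edges:
                        if other != long_e:
                            a[idx[other]] = -1.0
                    rows.append(a)
                    rhs.append(sum(D[e] for e in edges if e != long_e) - D[long_e])

    # |e| <= f encoded as  e - f <= 0  and  -e - f <= 0.
    I = np.eye(N)
    A_ub = np.vstack(rows + [np.hstack([I, -I]), np.hstack([-I, -I])])
    b_ub = np.concatenate([rhs, np.zeros(2 * N)])
    c = np.concatenate([np.zeros(N), np.ones(N)])  # minimize 1^T f

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (2 * N))
    e = res.x[:N]
    M = np.array(D, dtype=float)
    for (i, j), col in idx.items():
        M[i, j] = M[j, i] = D[i, j] + e[col]
    return M
```

Note that the number of triangle constraints grows as O(n³), so this explicit formulation is only practical for very small n.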
Similarly, minimizing the ℓ∞ norm of the changes can be accomplished with the LP

    min_{e,ω} 0ᵀe + ω    subject to    Ae ≤ b,   −e − ω1 ≤ 0,   e − ω1 ≤ 0.    (3.5)

We interpret ω = ‖e‖∞.

Solving these linear programs using standard software can be prohibitively expensive because of the large number of constraints. Moreover, the solutions are not unique because the ℓ1 and ℓ∞ norms are not strictly convex. Instead, we replace the LP by a quadratic program (QP) that is strictly convex and returns the solution of the LP that has minimum ℓ2-norm. For the ℓ1 case, we have the following result.

Theorem 3.1 (ℓ1 Metric Nearness). Let z = [e; f] and c = [0; 1] be partitioned conformally. If (3.4) has a solution, then there exists an ε0 > 0, such that for all 0 < ε ≤ ε0,

    argmin_{z ∈ Z} ‖z + ε⁻¹c‖_2 = argmin_{z ∈ Z⋆} ‖z‖_2,    (3.6)

where Z is the feasible set for (3.4) and Z⋆ is the set of optimal solutions to (3.4). The minimizer of (3.6) is unique.

Theorem 3.1 follows from a result of Mangasarian [9, Theorem 2.1-a-i]. A similar theorem may be stated for the ℓ∞ case.

The QP (3.6) can be solved using an augmented triangle-fixing algorithm since the majority of the constraints in (3.6) are triangle inequalities. As in the ℓ2 case, the triangle constraints are enforced using (3.3). Each remaining constraint is enforced by computing an orthogonal projection onto the corresponding halfspace. We refer the reader to [5] for the details.

3.3 MN for ℓp norms (1 < p < ∞)

Next, we explain how to use triangle fixing to solve the MN problem for the remaining ℓp norms, 1 < p < ∞. The computational costs are somewhat higher because the algorithm requires solving a nonlinear equation. The problem may be phrased as

    min_e (1/p)‖e‖_p^p    subject to    Ae ≤ b.    (3.7)

To enforce a triangle constraint optimally in the ℓp norm, we need to compute a projection of the vector e onto the constraint set. Define φ(x) = (1/p)‖x‖_p^p, and note that (∇φ(x))_i = sgn(xi)|xi|^{p−1}. 
The projection of e onto the violating constraint (i, j, k) is the solution of

    min_{e'} φ(e') − φ(e) − ⟨∇φ(e), e' − e⟩    subject to    aᵀijk e' = bijk,

where aijk is the row of the constraint matrix corresponding to the triangle inequality (i, j, k). The projection may be determined by solving

    ∇φ(e') = ∇φ(e) + μijk aijk    so that    aᵀijk e' = bijk.    (3.8)

Since aijk has only three nonzero entries, we see that e only needs to be updated in three components. Therefore, in Algorithm 3.1 we may replace the computation of the projection parameter by an appropriate numerical computation of μijk, and replace the update of e by the computation of its new value. Further details are available in [5].

4 Applications and Experiments

Replacing a general graph (dissimilarity matrix) by a metric graph (distance matrix) can enable us to use efficient approximation algorithms for NP-Hard graph problems (e.g., MAX-CUT, clustering) that have guaranteed error bounds for metric data; for example, see [7]. The error from MN will carry over to the graph problem, while retaining the bounds on total error incurred. As an example, constant factor approximation algorithms for MAX-CUT exist for metric graphs [3], and can be used for clustering applications. See [4] for more details.

Applications that use dissimilarity values, such as clustering, classification, searching, and indexing, could potentially be sped up if the data is metric. MN is a natural candidate for enforcing metric properties on the data to permit these speedups.

We were originally motivated to formulate and solve MN by a problem that arose in connection with biological databases [13]. This problem involves approximating mPAM matrices, which are a derivative of mutation probability matrices [2] that arise in protein sequencing. They represent a certain measure of dissimilarity for an application in protein sequencing. Owing to the manner in which these matrices are formed, they tend not to be distance matrices. 
Query operations in biological databases have the potential to be dramatically sped up if the data were metric (using a metric based indexing scheme). Thus, one approach is to find the nearest distance matrix to each mPAM matrix and use that approximation in the metric based indexing scheme.

We approximated various mPAM matrices by their nearest distance matrices. The relative errors of the approximations ‖D − M‖/‖D‖ are reported in Table 1.

Table 1: Relative errors for mPAM dataset (ℓ1, ℓ2, ℓ∞ nearness, respectively)

    Dataset    ‖D−M‖1/‖D‖1    ‖D−M‖2/‖D‖2    ‖D−M‖∞/‖D‖∞
    mPAM50     0.339          0.402          0.278
    mPAM100    0.142          0.231          0.206
    mPAM150    0.054          0.121          0.151
    mPAM250    0.004          0.025          0.042
    mPAM300    0.002          0.017          0.056

4.1 Experiments

The MN problem has an input of size N = n(n − 1)/2, and the number of constraints is roughly N^{3/2}. We ran experiments to ascertain the empirical behavior of the algorithm. Figure 1 shows log-log plots of the running time of our algorithms for solving the ℓ1 and ℓ2 Metric Nearness Problems.

[Figure 1: Running time for ℓ1 and ℓ2 norm solutions (plots have different scales). The fitted lines are y = 1.6x − 6.3 for ℓ1 and y = 1.5x − 6.1 for ℓ2, where y is the log of the running time in seconds and x is log(N).]

Note that the time cost appears to be O(N^{3/2}), which is linear in the number of constraints. The results plotted in the figure were obtained by executing the algorithms on random dissimilarity matrices. The procedure was halted when the distance values changed less than 10⁻³ from one iteration to the next. For both problems, the results were obtained with a simple MATLAB implementation. 
Nevertheless, this basic version outperforms MATLAB's optimization package by one or two orders of magnitude (depending on the problem), while numerically achieving similar results. A more sophisticated (C or parallel) implementation could improve the running time even more, which would allow us to study larger problems.

5 Discussion

In this paper, we have introduced the Metric Nearness problem, and we have developed algorithms for solving it for ℓp nearness measures. The algorithms proceed by fixing violated triangles in turn, while introducing correction terms to guide the algorithm to the global optimum. Our experiments suggest that the algorithms require O(N^{3/2}) time, where N is the total number of distances, so the cost is linear in the number of constraints. An open problem is to obtain an algorithm with better computational complexity.

Metric Nearness is a rich problem. It can be shown that a special case (allowing only decreases in the dissimilarities) is identical with the All Pairs Shortest Path problem [10]. Thus one may check whether the N distances satisfy metric properties in O(APSP) time. However, we are not aware whether this is a lower bound.

It is also possible to incorporate other types of linear and convex constraints into the Metric Nearness Problem. Some other possibilities include putting box constraints on the distances (l ≤ m ≤ u), allowing relaxed triangle inequalities (mij ≤ λ1 mik + λ2 mkj), or enforcing order constraints (dij < dkl implies mij < mkl).

We plan to further investigate the application of MN to other problems in data mining, machine learning, and database query retrieval.

Acknowledgments

This research was supported by NSF grant CCF-0431257, NSF Career Award ACI-0093404, and NSF-ITR award IIS-0325116.

References

[1] Y. Censor and S. A. Zenios. Parallel Optimization: Theory, Algorithms, and Applications. Numerical Mathematics and Scientific Computation. OUP, 1997.

[2] M. O. Dayhoff, R. M. 
Schwartz, and B. C. Orcutt. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure, 5(Suppl. 3), 1978.

[3] W. F. de la Vega and C. Kenyon. A randomized approximation scheme for Metric MAX-CUT. J. Comput. Sys. and Sci., 63:531–541, 2001.

[4] I. S. Dhillon, S. Sra, and J. A. Tropp. The Metric Nearness Problems with Applications. Tech. Rep. TR-03-23, Comp. Sci., Univ. of Texas at Austin, 2003.

[5] I. S. Dhillon, S. Sra, and J. A. Tropp. Triangle Fixing Algorithms for the Metric Nearness Problem. Tech. Rep. TR-04-22, Comp. Sci., Univ. of Texas at Austin, 2004.

[6] N. J. Higham. Matrix nearness problems and applications. In M. J. C. Gower and S. Barnett, editors, Applications of Matrix Theory, pages 1–27. Oxford University Press, 1989.

[7] P. Indyk. Sublinear time algorithms for metric space problems. In 31st Symposium on Theory of Computing, pages 428–434, 1999.

[8] J. B. Kruskal and M. Wish. Multidimensional Scaling. Number 07-011. Sage Publications, 1978. Series: Quantitative Applications in the Social Sciences.

[9] O. L. Mangasarian. Normal solutions of linear programs. Mathematical Programming Study, 22:206–216, 1984.

[10] C. G. Plaxton. Personal Communication, 2003–2004.

[11] V. Roth, J. Laub, J. M. Buhmann, and K.-R. Müller. Going metric: Denoising pairwise data. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems (NIPS) 15, 2003.

[12] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side constraints. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems (NIPS) 15, 2003.

[13] W. Xu and D. P. Miranker. A metric model of amino acid substitution. 
Bioinformatics,\n 20(0):18, 2004.\n\n\f\n", "award": [], "sourceid": 2598, "authors": [{"given_name": "Suvrit", "family_name": "Sra", "institution": null}, {"given_name": "Joel", "family_name": "Tropp", "institution": null}, {"given_name": "Inderjit", "family_name": "Dhillon", "institution": null}]}