Amir Globerson, Sam Roweis
We present an algorithm for learning a quadratic Gaussian metric (Maha- lanobis distance) for use in classiﬁcation tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes inﬁnitely far away. We show that when the metric we learn is used in simple clas- siﬁers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efﬁcient classiﬁcation with very little reduction in performance.
1 Supervised Learning of Metrics
The problem of learning a distance measure (metric) over an input space is of fundamental importance in machine learning [10, 9], both supervised and unsupervised. When such measures are learned directly from the available data, they can be used to improve learn- ing algorithms which rely on distance computations such as nearest neighbour classiﬁ- cation , supervised kernel machines (such as GPs or SVMs) and even unsupervised clustering algorithms . Good similarity measures may also provide insight into the inter-protein distances), and may aid in building bet- underlying structure of data (e.g. ter data visualizations via embedding. In fact, there is a close link between distance
learning and feature extraction since whenever we construct a featurefx for an input spaceX, we can measure distances betweenx1;x22X using a simple distance func- tion (e.g. Euclidean)d[fx1;fx2℄ in feature space. Thus by ﬁxingd, any feature illustration of this approach is when thefx is a linear projection ofx2< fx=Wx. The Euclidean distance betweenfx1 andfx2 is then the Mahalanobis distancekfx1fx2k2=x1x2TAx1x2, whereA=WTW is a positive learning the matrixA. This is also the goal of the
semideﬁnite matrix. Much of the recent work on metric learning has indeed focused on learning Mahalanobis distances, i.e. current work.
extraction algorithm may be considered a metric learning method. Perhaps the simplest so that
A common approach to learning metrics is to assume some knowledge in the form of equiv-
alence relations, i.e. which points should be close and which should be far (without speci- fying their exact distances). In the classiﬁcation setting there is a natural equivalence rela- tion, namely whether two points are in the same class or not. One of the classical statistical methods which uses this idea for the Mahalanobis distance is Fisher’s Linear Discriminant Analysis (see e.g. ). Other more recent methods are [10, 9, 5] which seek to minimize various separation criteria between the classes under the new metric.
2 The Approach of Collapsing Classes