{"title": "Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 388, "page_last": 396, "abstract": "This paper introduces a novel mathematical and computational framework, namely {\\it Log-Hilbert-Schmidt metric} between positive definite operators on a Hilbert space. This is a generalization of the Log-Euclidean metric on the Riemannian manifold of positive definite matrices to the infinite-dimensional setting. The general framework is applied in particular to compute distances between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), for which we obtain explicit formulas via the corresponding Gram matrices. Empirically, we apply our formulation to the task of multi-category image classification, where each image is represented by an infinite-dimensional RKHS covariance operator. On several challenging datasets, our method significantly outperforms approaches based on covariance matrices computed directly on the original input features, including those using the Log-Euclidean metric, Stein and Jeffreys divergences, achieving new state of the art results.", "full_text": "Log-Hilbert-Schmidt metric between positive de\ufb01nite\n\noperators on Hilbert spaces\n\nH`a Quang Minh\n\nMarco San Biagio\n\nVittorio Murino\n\n{minh.haquang,marco.sanbiagio,vittorio.murino}@iit.it\n\nVia Morego 30, Genova 16163, ITALY\n\nIstituto Italiano di Tecnologia\n\nAbstract\n\nThis paper introduces a novel mathematical and computational framework,\nnamely Log-Hilbert-Schmidt metric between positive de\ufb01nite operators on a\nHilbert space. 
This is a generalization of the Log-Euclidean metric on the Rie-\nmannian manifold of positive de\ufb01nite matrices to the in\ufb01nite-dimensional setting.\nThe general framework is applied in particular to compute distances between co-\nvariance operators on a Reproducing Kernel Hilbert Space (RKHS), for which we\nobtain explicit formulas via the corresponding Gram matrices. Empirically, we\napply our formulation to the task of multi-category image classi\ufb01cation, where\neach image is represented by an in\ufb01nite-dimensional RKHS covariance operator.\nOn several challenging datasets, our method signi\ufb01cantly outperforms approaches\nbased on covariance matrices computed directly on the original input features,\nincluding those using the Log-Euclidean metric, Stein and Jeffreys divergences,\nachieving new state of the art results.\n\n1\n\nIntroduction and motivation\n\nSymmetric Positive De\ufb01nite (SPD) matrices, in particular covariance matrices, have been playing\nan increasingly important role in many areas of machine learning, statistics, and computer vision,\nwith applications ranging from kernel learning [12], brain imaging [9], to object detection [24, 23].\nOne key property of SPD matrices is the following. For a \ufb01xed n \u2208 N, the set of all SPD matrices\nof size n \u00d7 n is not a subspace in Euclidean space, but is a Riemannian manifold with nonpositive\ncurvature, denoted by Sym++(n). As a consequence of this manifold structure, computational\nmethods for Sym++(n) that simply rely on Euclidean metrics are generally suboptimal.\nIn the current literature, many methods have been proposed to exploit the non-Euclidean structure\nof Sym++(n). For the purposes of the present work, we brie\ufb02y describe three common approaches\nhere, see e.g. [9] for other methods. The \ufb01rst approach exploits the af\ufb01ne-invariant metric, which\nis the classical Riemannian metric on Sym++(n) [18, 16, 3, 19, 4, 24]. 
The main drawback of this\nframework is that it tends to be computationally intensive, especially for large scale applications.\nOvercoming this computational complexity is one of the main motivations for the recent develop-\nment of the Log-Euclidean metric framework of [2], which has been exploited in many computer\nvision applications, see e.g. [25, 11, 17]. The third approach de\ufb01nes and exploits Bregman diver-\ngences on Sym++(n), such as Stein and Jeffreys divergences, see e.g. [12, 22, 8], which are not\nRiemannian metrics but are fast to compute and have been shown to work well on nearest-neighbor\nretrieval tasks.\nWhile each approach has its advantages and disadvantages, the Log-Euclidean metric possesses\nseveral properties which are lacking in the other two approaches. First, it is faster to compute than\nthe af\ufb01ne-invariant metric. Second, unlike the Bregman divergences, it is a Riemannian metric\non Sym++(n) and thus can better capture its manifold structure. Third, in the context of kernel\n\n1\n\n\flearning, it is straightforward to construct positive de\ufb01nite kernels, such as the Gaussian kernel,\nusing this metric. This is not always the case with the other two approaches: the Gaussian kernel\nconstructed with the Stein divergence, for instance, is only positive de\ufb01nite for certain choices of\nparameters [22], and the same is true with the af\ufb01ne-invariant metric, as can be numerically veri\ufb01ed.\nOur contributions:\nIn this work, we generalize the Log-Euclidean metric to the in\ufb01nite-\ndimensional setting, both mathematically, computationally, and empirically. Our novel metric,\ntermed Log-Hilbert-Schmidt metric (or Log-HS for short), measures the distances between positive\nde\ufb01nite unitized Hilbert-Schmidt operators, which are scalar perturbations of Hilbert-Schmidt oper-\nators on a Hilbert space and which are in\ufb01nite-dimensional generalizations of positive de\ufb01nite ma-\ntrices. 
These operators have recently been shown to form an in\ufb01nite-dimensional Riemann-Hilbert\nmanifold by [14, 1, 15], who formulated the in\ufb01nite-dimensional version of the af\ufb01ne-invariant\nmetric from a purely mathematical viewpoint. While our Log-Hilbert-Schmidt metric framework\nincludes the Log-Euclidean metric as a special case, the in\ufb01nite-dimensional formulation is signi\ufb01-\ncantly different from its corresponding \ufb01nite-dimensional version, as we demonstrate throughout the\npaper. In particular, one cannot obtain the in\ufb01nite-dimensional formulas from the \ufb01nite-dimensional\nones by letting the dimension approach in\ufb01nity.\nComputationally, we apply our abstract mathematical framework to compute distances between co-\nvariance operators on an RKHS induced by a positive de\ufb01nite kernel. From a kernel learning per-\nspective, this is motivated by the fact that covariance operators de\ufb01ned on nonlinear features, which\nare obtained by mapping the original data into a high-dimensional feature space, can better cap-\nture input correlations than covariance matrices de\ufb01ned on the original data. This is a viewpoint\nthat goes back to KernelPCA [21]. In our setting, we obtain closed form expressions for the Log-\nHilbert-Schmidt metric between covariance operators via the Gram matrices.\nEmpirically, we apply our framework to the task of multi-class image classi\ufb01cation. In our approach,\nthe original features extracted from each input image are implicitly mapped into the RKHS induced\nby a positive de\ufb01nite kernel. The covariance operator de\ufb01ned on the RKHS is then used as the rep-\nresentation for the image and the distance between two images is the Log-Hilbert-Schmidt distance\nbetween their corresponding covariance operators. 
On several challenging datasets, our method sig-\nni\ufb01cantly outperforms approaches based on covariance matrices computed directly on the original\ninput features, including those using the Log-Euclidean metric, Stein and Jeffreys divergences.\nRelated work: The approach most closely related to our current work is [26], which computed\nprobabilistic distances in RKHS. This approach has recently been employed by [10] to compute\nBregman divergences between RKHS covariance operators. There are two main theoretical issues\nwith the approach in [26, 10]. The \ufb01rst issue is that it is assumed implicitly that the concepts of\ntrace and determinant can be extended to any bounded linear operator on an in\ufb01nite-dimensional\nHilbert space H. This is not true in general, as the concepts of trace and determinant are only well-\nde\ufb01ned for certain classes of operators. Many quantities involved in the computation of the Bregman\ndivergences in [10] are in fact in\ufb01nite when dim(H) = \u221e, which is the case if H is the Gaussian\nRKHS, and only cancel each other out in special cases 1. The second issue concerns the use of\nthe Stein divergence by [10] to de\ufb01ne the Gaussian kernel, which is not always positive de\ufb01nite, as\ndiscussed above. In contrast, the Log-HS metric formulation proposed in this paper is theoretically\nrigorous and it is straightforward to de\ufb01ne many positive de\ufb01nite kernels, including the Gaussian\nkernel, with this metric. Furthermore, our empirical results consistently outperform those of [10].\nOrganization: After some background material in Section 2, we describe the manifold of positive\nde\ufb01nite operators in Section 3. Sections 4 and 5 form the core of the paper, where we develop the\ngeneral framework for the Log-Hilbert-Schmidt metric together with the explicit formulas for the\ncase of covariance operators on an RKHS. Empirical results for image classi\ufb01cation are given in\nSection 6. 
The proofs for all mathematical results are given in the Supplementary Material.

2 Background

The Riemannian manifold of positive definite matrices: The manifold structure of Sym++(n) has been studied extensively, both mathematically and computationally. This study goes as far back as [18]; for more recent treatments see e.g. [16, 3, 19, 4]. The most commonly encountered Riemannian metric on Sym++(n) is the affine-invariant metric, in which the geodesic distance between two positive definite matrices A and B is given by

d(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F,   (1)

where log denotes the matrix logarithm operation and || ||_F is a Euclidean norm on the space of symmetric matrices Sym(n). Following the classical literature, in this work we take || ||_F to be the Frobenius norm, which is induced by the standard inner product on Sym(n). From a practical viewpoint, the metric (1) tends to be computationally intensive, which is one of the main motivations for the Log-Euclidean metric of [2], in which the geodesic distance between A and B is given by

d_logE(A, B) = || log(A) - log(B) ||_F.   (2)

The main goal of this paper is to generalize the Log-Euclidean metric to what we term the Log-Hilbert-Schmidt metric between positive definite operators on an infinite-dimensional Hilbert space, and to apply this metric in particular to compute distances between covariance operators on an RKHS.

Footnote 1: We will provide a theoretically rigorous formulation for the Bregman divergences between positive definite operators in a longer version of the present work.

Covariance operators: Let the input space X be an arbitrary non-empty set. Let x = [x1, . . . , xm] be a data matrix sampled from X, where m ∈ N is the number of observations. Let K be a positive definite kernel on X × X and H_K its induced reproducing kernel Hilbert space (RKHS). Let Φ : X → H_K be the corresponding feature map, which gives the (potentially infinite) mapped data matrix Φ(x) = [Φ(x1), . . . , Φ(xm)] of size dim(H_K) × m in the feature space H_K. The corresponding covariance operator for Φ(x) is defined to be

CΦ(x) = (1/m) Φ(x) Jm Φ(x)^T : H_K → H_K,   (3)

where Jm is the centering matrix, defined by Jm = Im − (1/m) 1m 1m^T with 1m = (1, . . . , 1)^T ∈ R^m. The matrix Jm is symmetric, with rank(Jm) = m − 1, and satisfies Jm² = Jm. The covariance operator CΦ(x) can be viewed as a (potentially infinite) covariance matrix in the feature space H_K, with rank at most m − 1. If X = R^n and K(x, y) = ⟨x, y⟩_{R^n}, then CΦ(x) = Cx, the standard n × n covariance matrix encountered in statistics (see Footnote 2).

Regularization: Generally, covariance matrices may not be full-rank and thus may only be positive semi-definite. In order to apply the theory of Sym++(n), one needs to consider the regularized version (Cx + γI_{R^n}) for some γ > 0. In the infinite-dimensional setting, with dim(H_K) = ∞, CΦ(x) is always rank-deficient and regularization is always necessary. With γ > 0, (CΦ(x) + γI_{H_K}) is strictly positive and invertible, both of which are needed to define the Log-Hilbert-Schmidt metric.

3 Positive definite unitized Hilbert-Schmidt operators

Throughout the paper, let H be a separable Hilbert space of arbitrary dimension. Let L(H) be the Banach space of bounded linear operators on H and Sym(H) be the subspace of self-adjoint operators in L(H).
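The finite-dimensional ingredients reviewed in Section 2 — the affine-invariant distance (1), the Log-Euclidean distance (2), and the regularized covariance matrix built with the centering matrix Jm — can be sketched in a few lines of Python. This is a minimal illustration under our own conventions, not the authors' code:

```python
import numpy as np

def spd_log(M):
    # matrix logarithm of an SPD matrix via its eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def covariance(X, gamma=1e-8):
    # Eq. (3) with the linear kernel: C = (1/m) X J_m X^T, then regularized
    d, m = X.shape
    J = np.eye(m) - np.ones((m, m)) / m   # centering matrix J_m
    return X @ J @ X.T / m + gamma * np.eye(d)

def d_logE(A, B):
    # Log-Euclidean distance, Eq. (2)
    return np.linalg.norm(spd_log(A) - spd_log(B), 'fro')

def d_affine(A, B):
    # affine-invariant distance, Eq. (1)
    w, V = np.linalg.eigh(A)
    A_inv_half = (V / np.sqrt(w)) @ V.T   # A^{-1/2}
    return np.linalg.norm(spd_log(A_inv_half @ B @ A_inv_half), 'fro')

rng = np.random.default_rng(0)
A = covariance(rng.standard_normal((5, 100)))
B = covariance(rng.standard_normal((5, 100)))
print(d_logE(A, B), d_affine(A, B))
```

Note that d_logE needs one eigendecomposition per matrix, computable once per dataset item, whereas the affine-invariant distance couples A and B; this is the computational advantage of the Log-Euclidean metric mentioned above.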
We first describe in this section the manifold of positive definite unitized Hilbert-Schmidt operators on which the Log-Hilbert-Schmidt metric is defined. This manifold setting is motivated by the following two crucial differences between the finite and infinite-dimensional cases.

(A) Positive definite: If A ∈ Sym(H) and dim(H) = ∞, in order for log(A) to be well-defined and bounded, it is not sufficient to require that all eigenvalues of A be strictly positive. Instead, it is necessary to require that all eigenvalues of A be bounded below by a positive constant (Section 3.1).

(B) Unitized Hilbert-Schmidt: The infinite-dimensional generalization of the Frobenius norm is the Hilbert-Schmidt norm. However, if dim(H) = ∞, the identity operator I is not Hilbert-Schmidt and would have infinite distance from any Hilbert-Schmidt operator. To have a satisfactory framework, it is necessary to enlarge the algebra of Hilbert-Schmidt operators to include I (Section 3.2).

These differences between the cases dim(H) = ∞ and dim(H) < ∞ are sharp and manifest themselves in the concrete formulas for the Log-Hilbert-Schmidt metric which we obtain in Sections 4.2 and 5. In particular, the formulas for the case dim(H) = ∞ are not obtainable from their corresponding finite-dimensional versions when dim(H) → ∞.

Footnote 2: One can also define CΦ(x) = (1/(m−1)) Φ(x)JmΦ(x)^T. This should not make much practical difference if m is large.

3.1 Positive definite operators

Positive and strictly positive operators: Let us discuss the first crucial difference between the finite and infinite-dimensional settings. Recall that an operator A ∈ Sym(H) is said to be positive if ⟨Ax, x⟩ ≥ 0 for all x ∈ H. The eigenvalues of A, if they exist, are all nonnegative. If A is positive and ⟨Ax, x⟩ = 0 ⟺ x = 0, then A is said to be strictly positive, and all its eigenvalues are positive. We denote the sets of all positive and strictly positive operators on H by Sym+(H) and Sym++(H), respectively. Let A ∈ Sym++(H). Assume that A is compact; then A has a countable spectrum of positive eigenvalues {λk(A)}_{k=1}^{dim(H)}, counting multiplicities, with lim_{k→∞} λk(A) = 0 if dim(H) = ∞. Let {φk(A)}_{k=1}^{dim(H)} denote the corresponding normalized eigenvectors; then

A = Σ_{k=1}^{dim(H)} λk(A) φk(A) ⊗ φk(A),   (4)

where φk(A) ⊗ φk(A) : H → H is defined by (φk(A) ⊗ φk(A))w = ⟨w, φk(A)⟩ φk(A), w ∈ H. The logarithm of A is defined by

log(A) = Σ_{k=1}^{dim(H)} log(λk(A)) φk(A) ⊗ φk(A).   (5)

Clearly, log(A) is bounded if and only if dim(H) < ∞, since for dim(H) = ∞ we have lim_{k→∞} log(λk(A)) = −∞. Thus, when dim(H) = ∞, the condition that A be strictly positive is not sufficient for log(A) to be bounded. Instead, the following stronger condition is necessary.

Positive definite operators: A self-adjoint operator A ∈ L(H) is said to be positive definite (see e.g. [20]) if there exists a constant M_A > 0 such that

⟨Ax, x⟩ ≥ M_A ||x||²  for all x ∈ H.   (6)

The eigenvalues of A, if they exist, are bounded below by M_A. This condition is equivalent to requiring that A be strictly positive and invertible, with A^{−1} ∈ L(H). Clearly, if dim(H) < ∞, then strict positivity is equivalent to positive definiteness. Let P(H) denote the open cone of self-adjoint, positive definite, bounded operators on H, that is

P(H) = {A ∈ L(H) : A* = A, ∃M_A > 0 s.t. ⟨Ax, x⟩ ≥ M_A ||x||² ∀x ∈ H}.   (7)

Throughout the remainder of the paper, we use the following notation: A > 0 ⟺ A ∈ P(H).

3.2 The Riemann-Hilbert manifold of positive definite unitized Hilbert-Schmidt operators

Let HS(H) denote the two-sided ideal of Hilbert-Schmidt operators on H in L(H), which is a Banach algebra with the Hilbert-Schmidt norm, defined by

||A||²_HS = tr(A*A) = Σ_{k=1}^{dim(H)} λk(A*A).   (8)

We now discuss the second crucial difference between the finite and infinite-dimensional settings. If dim(H) = ∞, then the identity operator I is not Hilbert-Schmidt, since ||I||_HS = ∞. Thus, given γ ≠ µ > 0, we have ||log(γI) − log(µI)||_HS = |log(γ) − log(µ)| ||I||_HS = ∞; that is, even the distance between two different multiples of the identity operator is infinite. This problem is resolved by considering the following extended (or unitized) Hilbert-Schmidt algebra [14, 1, 15]:

H_R = {A + γI : A* = A, A ∈ HS(H), γ ∈ R}.   (9)

This can be endowed with the extended Hilbert-Schmidt inner product

⟨A + γI, B + µI⟩_eHS = tr(A*B) + γµ = ⟨A, B⟩_HS + γµ,   (10)

under which the scalar operators are orthogonal to the Hilbert-Schmidt operators. The corresponding extended Hilbert-Schmidt norm is given by

||A + γI||²_eHS = ||A||²_HS + γ²,  where A ∈ HS(H).   (11)

If dim(H) < ∞, then we set || ||_eHS = || ||_HS, with ||A + γI||_eHS = ||A + γI||_HS.

Manifold of positive definite unitized Hilbert-Schmidt operators: Define

Σ(H) = P(H) ∩ H_R = {A + γI > 0 : A* = A, A ∈ HS(H), γ ∈ R}.   (12)

If (A + γI) ∈ Σ(H), then it has a countable spectrum {λk(A) + γ}_{k=1}^{dim(H)} satisfying λk + γ ≥ M_A for some constant M_A > 0.
Thus (A + γI)^{−1} exists and is bounded, and log(A + γI), as defined by (5), is well-defined and bounded, with log(A + γI) ∈ H_R.

The main results of [15] state that when dim(H) = ∞, Σ(H) is an infinite-dimensional Riemann-Hilbert manifold and the map log : Σ(H) → H_R and its inverse exp : H_R → Σ(H) are diffeomorphisms. The Riemannian distance between two operators (A + γI), (B + µI) ∈ Σ(H) is given by

d[(A + γI), (B + µI)] = || log[(A + γI)^{−1/2}(B + µI)(A + γI)^{−1/2}] ||_eHS.   (13)

This is the infinite-dimensional version of the affine-invariant metric (1) (see Footnote 3).

4 Log-Hilbert-Schmidt metric

This section defines and develops the Log-Hilbert-Schmidt metric, which is the infinite-dimensional generalization of the Log-Euclidean metric (2). The general formulation presented in this section is then applied to RKHS covariance operators in Section 5.

4.1 The general setting

Consider the following operations on Σ(H):

(A + γI) ⊙ (B + µI) = exp(log(A + γI) + log(B + µI)),   (14)

λ ⊛ (A + γI) = exp(λ log(A + γI)) = (A + γI)^λ,  λ ∈ R.   (15)

Vector space structure on Σ(H): The key property of the operation ⊙ is that, unlike the usual operator product, it is commutative, making (Σ(H), ⊙) an abelian group and (Σ(H), ⊙, ⊛) a vector space, which is isomorphic to the vector space (H_R, +, ·), as shown by the following.

Theorem 1. Under the two operations ⊙ and ⊛, (Σ(H), ⊙, ⊛) becomes a vector space, with ⊙ acting as vector addition and ⊛ acting as scalar multiplication. The zero element in (Σ(H), ⊙, ⊛) is the identity operator I and the inverse of (A + γI) is (A + γI)^{−1}. Furthermore, the map

ψ : (Σ(H), ⊙, ⊛) → (H_R, +, ·)  defined by  ψ(A + γI) = log(A + γI),   (16)

is a vector space isomorphism, so that for all (A + γI), (B + µI) ∈ Σ(H) and λ ∈ R,

ψ((A + γI) ⊙ (B + µI)) = log(A + γI) + log(B + µI),  ψ(λ ⊛ (A + γI)) = λ log(A + γI),   (17)

where + and · denote the usual operator addition and multiplication operations, respectively.

Metric space structure on Σ(H): Motivated by the vector space isomorphism between (Σ(H), ⊙, ⊛) and (H_R, +, ·) via the mapping ψ, the following is our generalization of the Log-Euclidean metric to the infinite-dimensional setting.

Definition 1. The Log-Hilbert-Schmidt distance between two operators (A + γI) ∈ Σ(H), (B + µI) ∈ Σ(H) is defined to be

d_logHS[(A + γI), (B + µI)] = || log[(A + γI) ⊙ (B + µI)^{−1}] ||_eHS.   (18)

Remark 1. For our purposes in the current work, we focus on the Log-HS metric as defined above, based on the one-to-one correspondence between the algebraic structures of (Σ(H), ⊙, ⊛) and (H_R, +, ·). An in-depth treatment of the Log-HS metric in connection with the manifold structure of Σ(H) will be provided in a longer version of the paper.

The following theorem shows that the Log-Hilbert-Schmidt distance satisfies all the axioms of a metric, making (Σ(H), d_logHS) a metric space. Furthermore, the square Log-Hilbert-Schmidt distance decomposes uniquely into a sum of a square Hilbert-Schmidt norm plus a scalar term.

Theorem 2. The Log-Hilbert-Schmidt distance as defined in (18) is a metric, making (Σ(H), d_logHS) a metric space. Let (A + γI) ∈ Σ(H), (B + µI) ∈ Σ(H).
If dim(H) = ∞, then there exist unique operators A1, B1 ∈ HS(H) ∩ Sym(H) and scalars γ1, µ1 ∈ R such that

A + γI = exp(A1 + γ1 I),  B + µI = exp(B1 + µ1 I),   (19)

and

d²_logHS[(A + γI), (B + µI)] = ||A1 − B1||²_HS + (γ1 − µ1)².   (20)

If dim(H) < ∞, then (19) and (20) hold with A1 = log(A + γI), B1 = log(B + µI), γ1 = µ1 = 0.

Footnote 3: We give a more detailed discussion of Eqs. (12) and (13) in the Supplementary Material.

Log-Euclidean metric: Theorem 2 states that when dim(H) < ∞, we have d_logHS[(A + γI), (B + µI)] = d_logE[(A + γI), (B + µI)]. We have thus recovered the Log-Euclidean metric as a special case of our framework.

Hilbert space structure on (Σ(H), ⊙, ⊛): Motivated by formula (20), whose right hand side is a square extended Hilbert-Schmidt distance, we now show that (Σ(H), ⊙, ⊛) can be endowed with an inner product, under which it becomes a Hilbert space.

Definition 2. Let (A + γI), (B + µI) ∈ Σ(H). Let A1, B1 ∈ HS(H) ∩ Sym(H) and γ1, µ1 ∈ R be the unique operators and scalars, respectively, such that A + γI = exp(A1 + γ1 I) and B + µI = exp(B1 + µ1 I), as in Theorem 2. The Log-Hilbert-Schmidt inner product between (A + γI) and (B + µI) is defined by

⟨A + γI, B + µI⟩_logHS = ⟨log(A + γI), log(B + µI)⟩_eHS = ⟨A1, B1⟩_HS + γ1 µ1.   (21)

Theorem 3. The inner product ⟨ , ⟩_logHS as given in (21) is well-defined on (Σ(H), ⊙, ⊛). Endowed with this inner product, (Σ(H), ⊙, ⊛, ⟨ , ⟩_logHS) becomes a Hilbert space. The corresponding Log-Hilbert-Schmidt norm is given by

||A + γI||²_logHS = || log(A + γI) ||²_eHS = ||A1||²_HS + γ1².   (22)

In terms of this norm, the Log-Hilbert-Schmidt distance is given by

d_logHS[(A + γI), (B + µI)] = || (A + γI) ⊙ (B + µI)^{−1} ||_logHS.   (23)

Positive definite kernels defined with the Log-Hilbert-Schmidt metric: An important consequence of the Hilbert space structure of (Σ(H), ⊙, ⊛, ⟨ , ⟩_logHS) is that it is straightforward to generalize many positive definite kernels on Euclidean space to Σ(H) × Σ(H).

Corollary 1. The following kernels defined on Σ(H) × Σ(H) are positive definite:

K[(A + γI), (B + µI)] = (c + ⟨A + γI, B + µI⟩_logHS)^d,  c > 0, d ∈ N,   (24)

K[(A + γI), (B + µI)] = exp(−d^p_logHS[(A + γI), (B + µI)]/σ²),  0 < p ≤ 2.   (25)

4.2 Log-Hilbert-Schmidt metric between regularized positive operators

For our purposes in the present work, we focus on the following subset of Σ(H):

Σ+(H) = {A + γI : A ∈ HS(H) ∩ Sym+(H), γ > 0} ⊂ Σ(H).   (26)

Examples of operators in Σ+(H) are the regularized covariance operators (CΦ(x) + γI) with γ > 0. In this case the formulas in Theorems 2 and 3 have the following concrete forms.

Theorem 4. Assume that dim(H) = ∞. Let A, B ∈ HS(H) ∩ Sym+(H). Let γ, µ > 0. Then

d²_logHS[(A + γI), (B + µI)] = || log((1/γ)A + I) − log((1/µ)B + I) ||²_HS + (log γ − log µ)².   (27)

Their Log-Hilbert-Schmidt inner product is given by

⟨(A + γI), (B + µI)⟩_logHS = ⟨log((1/γ)A + I), log((1/µ)B + I)⟩_HS + (log γ)(log µ).   (28)

Finite-dimensional case: As a consequence of the differences between the cases dim(H) < ∞ and dim(H) = ∞, we have different formulas for the case dim(H) < ∞, which depend on dim(H) and which are surprisingly more complicated than in the case dim(H) = ∞.

Theorem 5. Assume that dim(H) < ∞. Let A, B ∈ Sym+(H). Let γ, µ > 0. Then

d²_logHS[(A + γI), (B + µI)] = || log(A/γ + I) − log(B/µ + I) ||²_HS + 2(log γ − log µ) tr[log(A/γ + I) − log(B/µ + I)] + (log γ − log µ)² dim(H).   (29)

The Log-Hilbert-Schmidt inner product between (A + γI) and (B + µI) is given by

⟨(A + γI), (B + µI)⟩_logHS = ⟨log(A/γ + I), log(B/µ + I)⟩_HS + (log µ) tr[log(A/γ + I)] + (log γ) tr[log(B/µ + I)] + (log γ log µ) dim(H).   (30)

5 Log-Hilbert-Schmidt metric between regularized covariance operators

Let X be an arbitrary non-empty set. In this section, we apply the general results of Section 4 to compute the Log-Hilbert-Schmidt distance between covariance operators on an RKHS induced by a positive definite kernel K on X × X. In this case, we have explicit formulas for d_logHS and the inner product ⟨ , ⟩_logHS via the corresponding Gram matrices.
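The contrast between Theorems 4 and 5 can be checked numerically on finite matrices (a sketch with hypothetical finite-rank stand-ins for A and B; the helper names are ours): the two formulas coincide when γ = µ and differ otherwise, illustrating why the infinite-dimensional formula is not simply the dim(H) → ∞ limit of the finite-dimensional one.

```python
import numpy as np

def spd_log(M):
    # matrix logarithm of an SPD matrix via its eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def d2_loghs_inf(A, gamma, B, mu):
    # Eq. (27): the infinite-dimensional formula, evaluated on matrices
    n = len(A)
    D = spd_log(A / gamma + np.eye(n)) - spd_log(B / mu + np.eye(n))
    return np.sum(D * D) + (np.log(gamma) - np.log(mu)) ** 2

def d2_loghs_fin(A, gamma, B, mu):
    # Eq. (29): the finite-dimensional formula, which depends on dim(H) = n
    n = len(A)
    LA = spd_log(A / gamma + np.eye(n))
    LB = spd_log(B / mu + np.eye(n))
    dlog = np.log(gamma) - np.log(mu)
    return np.sum((LA - LB) ** 2) + 2 * dlog * np.trace(LA - LB) + dlog ** 2 * n

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)); A = X @ X.T   # random SPD matrices
Y = rng.standard_normal((4, 4)); B = Y @ Y.T
# with gamma == mu the two formulas agree; with gamma != mu they do not
print(d2_loghs_inf(A, 0.1, B, 0.1) - d2_loghs_fin(A, 0.1, B, 0.1))
print(d2_loghs_inf(A, 0.1, B, 0.2) - d2_loghs_fin(A, 0.1, B, 0.2))
```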
Let x = [x_i]_{i=1}^m and y = [y_i]_{i=1}^m, m ∈ N, be two data matrices sampled from X and let CΦ(x), CΦ(y) be the corresponding covariance operators induced by the kernel K, as defined in Section 2. Let K[x], K[y], and K[x, y] be the m × m Gram matrices defined by (K[x])_ij = K(x_i, x_j), (K[y])_ij = K(y_i, y_j), (K[x, y])_ij = K(x_i, y_j), 1 ≤ i, j ≤ m. Let A = (1/√(γm)) Φ(x)Jm : R^m → H_K and B = (1/√(µm)) Φ(y)Jm : R^m → H_K, so that

A^T A = (1/(γm)) Jm K[x] Jm,  B^T B = (1/(µm)) Jm K[y] Jm,  A^T B = (1/(m√(γµ))) Jm K[x, y] Jm.   (31)

Let N_A and N_B be the numbers of nonzero eigenvalues of A^T A and B^T B, respectively. Let Σ_A and Σ_B be the diagonal matrices of size N_A × N_A and N_B × N_B, and U_A and U_B be the matrices of size m × N_A and m × N_B, respectively, which are obtained from the spectral decompositions

(1/(γm)) Jm K[x] Jm = U_A Σ_A U_A^T,  (1/(µm)) Jm K[y] Jm = U_B Σ_B U_B^T.   (32)

In the following, let ◦ denote the Hadamard (element-wise) matrix product. Define

C_AB = 1_{N_A}^T log(I_{N_A} + Σ_A) Σ_A^{−1} (U_A^T A^T B U_B ◦ U_A^T A^T B U_B) Σ_B^{−1} log(I_{N_B} + Σ_B) 1_{N_B}.   (33)

Theorem 6. Assume that dim(H_K) = ∞. Let γ > 0, µ > 0. Then

d²_logHS[(CΦ(x) + γI), (CΦ(y) + µI)] = tr[log(I_{N_A} + Σ_A)]² + tr[log(I_{N_B} + Σ_B)]² − 2C_AB + (log γ − log µ)².   (34)

The Log-Hilbert-Schmidt inner product between (CΦ(x) + γI) and (CΦ(y) + µI) is

⟨(CΦ(x) + γI), (CΦ(y) + µI)⟩_logHS = C_AB + (log γ)(log µ).   (35)

Theorem 7. Assume that dim(H_K) < ∞. Let γ > 0, µ > 0. Then

d²_logHS[(CΦ(x) + γI), (CΦ(y) + µI)] = tr[log(I_{N_A} + Σ_A)]² + tr[log(I_{N_B} + Σ_B)]² − 2C_AB + 2 log(γ/µ) (tr[log(I_{N_A} + Σ_A)] − tr[log(I_{N_B} + Σ_B)]) + (log(γ/µ))² dim(H_K).   (36)

The Log-Hilbert-Schmidt inner product between (CΦ(x) + γI) and (CΦ(y) + µI) is

⟨(CΦ(x) + γI), (CΦ(y) + µI)⟩_logHS = C_AB + (log µ) tr[log(I_{N_A} + Σ_A)] + (log γ) tr[log(I_{N_B} + Σ_B)] + (log γ log µ) dim(H_K).   (37)

6 Experimental results

This section demonstrates the empirical performance of the Log-HS metric on the task of multi-category image classification. For each input image, the original features extracted from the image are implicitly mapped into the infinite-dimensional RKHS induced by the Gaussian kernel. The covariance operator defined on the RKHS is called the GaussianCOV and is used as the representation for the image. In a classification algorithm, the distance between two images is the Log-HS distance between their corresponding GaussianCOVs. This is compared with the directCOV representation, that is, covariance matrices defined using the original input features. In all of the experiments, we employed LIBSVM [7] as the classification method. The following algorithms were evaluated in our experiments: Log-E (directCOV and Gaussian SVM using the Log-Euclidean metric), Log-HS (GaussianCOV and Gaussian SVM using the Log-HS metric), and Log-HS∆ (GaussianCOV and SVM with the Laplacian kernel K(x, y) = exp(−||x − y||/σ)).
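The Gram-matrix computation of Theorem 6 (Eqs. (31)-(34)) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the Gaussian kernel bandwidth convention exp(−||x − y||²/σ²), the eigenvalue cutoff, and equal sample sizes m for the two data matrices are our assumptions:

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    # (K[x, y])_ij = exp(-||x_i - y_j||^2 / sigma^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0) / sigma**2)

def d2_loghs_rkhs(X, Y, sigma=1.0, gamma=1e-8, mu=1e-8):
    # X, Y: m x d sample matrices (same m assumed for simplicity)
    m = X.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m                 # centering matrix J_m
    KA = J @ gaussian_gram(X, X, sigma) @ J / (gamma * m)              # A^T A, Eq. (31)
    KB = J @ gaussian_gram(Y, Y, sigma) @ J / (mu * m)                 # B^T B
    KAB = J @ gaussian_gram(X, Y, sigma) @ J / (np.sqrt(gamma * mu) * m)  # A^T B
    wa, Ua = np.linalg.eigh(KA)                          # Eq. (32)
    ka = wa > wa.max() * 1e-10; Sa, Ua = wa[ka], Ua[:, ka]
    wb, Ub = np.linalg.eigh(KB)
    kb = wb > wb.max() * 1e-10; Sb, Ub = wb[kb], Ub[:, kb]
    M = Ua.T @ KAB @ Ub                                  # U_A^T A^T B U_B
    # Eq. (33): the Hadamard square M * M contracted with the diagonal weights
    C_AB = (np.log1p(Sa) / Sa) @ (M * M) @ (np.log1p(Sb) / Sb)
    return (np.sum(np.log1p(Sa) ** 2) + np.sum(np.log1p(Sb) ** 2)      # Eq. (34)
            - 2 * C_AB + (np.log(gamma) - np.log(mu)) ** 2)

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((40, 5)), rng.standard_normal((40, 5)) + 0.5
print(d2_loghs_rkhs(X, Y), d2_loghs_rkhs(X, X))   # second value is ~0
```

Only m × m matrices appear, so the cost is driven by the sample size m, never by dim(H_K) = ∞.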
For all experiments, the kernel parameters were chosen by cross-validation, while the regularization parameters were fixed to γ = μ = 10^{-8}. We also compare with the empirical results of the different algorithms in [10], namely J-SVM and S-SVM (SVM with the Jeffreys and Stein divergences between directCOVs, respectively), JH-SVM and SH-SVM (SVM with the Jeffreys and Stein divergences between GaussianCOVs, respectively), and with the results of the Covariance Discriminant Learning (CDL) technique of [25], which can be considered the state of the art for COV-based classification. All results are reported in Table 1.

Table 1: Results over all the datasets

Methods      | Kylberg texture | KTH-TIPS2b     | KTH-TIPS2b (RGB) | Fish
GaussianCOV:
Log-HS       | 92.58% (±1.23)  | 81.91% (±3.3)  | 79.94% (±4.6)    | 56.74% (±2.87)
Log-HS∆      | 92.56% (±1.26)  | 81.50% (±3.90) | 77.53% (±5.2)    | 56.43% (±3.02)
SH-SVM [10]  | 91.36% (±1.27)  | 80.10% (±4.60) | -                | -
JH-SVM [10]  | 91.25% (±1.33)  | 79.90% (±3.80) | -                | -
directCOV:
Log-E        | 87.49% (±1.54)  | 74.11% (±7.41) | 74.13% (±6.1)    | 42.70% (±3.45)
S-SVM [10]   | 81.27% (±1.07)  | 78.30% (±4.84) | -                | -
J-SVM [10]   | 82.19% (±1.30)  | 74.70% (±2.81) | -                | -
CDL [25]     | 79.87% (±1.06)  | 76.30% (±5.10) | -                | -

Texture classification: For this task, we used the Kylberg texture dataset [13], which contains 28 texture classes of different natural and man-made surfaces, with each class consisting of 160 images. For this dataset, we followed the validation protocol of [10]: each image is resized to 128 × 128, with m = 1024 observations computed on a coarse grid (i.e., every 4 pixels in the horizontal and vertical directions). At each point, we extracted a set of n = 5 low-level features F(x, y) = [I_{x,y}, |I_x|, |I_y|, |I_{xx}|, |I_{yy}|], where I, I_x, I_y, I_{xx}, and I_{yy} are the intensity and the first- and second-order derivatives of the texture image. We randomly selected 5 images in each class for training and used the remaining ones as test data, repeating the entire procedure 10 times. We report the mean and standard deviation of the classification accuracies over all 10 random training/testing splits.
Material classification: For this task, we used the KTH-TIPS2b dataset [6], which contains images of 11 materials captured under 4 different illuminations, in 3 poses, and at 9 scales. The total number of images per class is 108. We applied the same protocol as used for the previous dataset [10], extracting 23 low-level dense features: F(x, y) = [R_{x,y}, G_{x,y}, B_{x,y}, |G^{0,0}_{x,y}|, ..., |G^{4,5}_{x,y}|], where R_{x,y}, G_{x,y}, B_{x,y} are the color intensities and the |G^{o,s}_{x,y}| are the 20 Gabor filters at 4 orientations and 5 scales. We report the mean and standard deviation over all 4 splits of the dataset.
Fish recognition: The third dataset used is the Fish Recognition dataset [5]. The fish data are acquired from a live video dataset, resulting in 27370 verified fish images. The whole dataset is divided into 23 classes; the number of images per class ranges from 21 to 12112, with a medium resolution of roughly 150 × 120 pixels. The significant variations in color, pose, and illumination within each class make this dataset very challenging. We applied the same protocol as used for the previous datasets, extracting the 3 color intensities from each image to show the effectiveness of our method: F(x, y) = [R_{x,y}, G_{x,y}, B_{x,y}].
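For concreteness, the baseline directCOV construction used in the texture experiment can be sketched as follows. This is an illustrative Python rendering under assumed conventions (finite-difference derivatives via np.gradient, grid sampling every 4 pixels as in the protocol above), not the exact feature-extraction code used in the experiments.

```python
import numpy as np

def direct_cov(image, step=4):
    """directCOV sketch for the texture protocol: n = 5 low-level
    features F = [I, |I_x|, |I_y|, |I_xx|, |I_yy|] sampled on a coarse
    grid (every `step` pixels), then the 5 x 5 covariance matrix of
    the resulting feature vectors.  `image` is a 2-D grayscale array;
    a 128 x 128 image with step=4 yields m = 1024 observations."""
    I = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(I)              # derivatives along rows, columns
    Iyy = np.gradient(Iy, axis=0)        # second-order derivatives
    Ixx = np.gradient(Ix, axis=1)
    sl = (slice(None, None, step), slice(None, None, step))
    F = np.stack([I[sl], np.abs(Ix[sl]), np.abs(Iy[sl]),
                  np.abs(Ixx[sl]), np.abs(Iyy[sl])], axis=0)
    F = F.reshape(5, -1)                 # n = 5 features x m observations
    return np.cov(F)                     # 5 x 5 SPD covariance matrix
```

The GaussianCOV representation differs only in that the m feature vectors are first mapped (implicitly, through the Gram matrix) into the RKHS of the Gaussian kernel before the covariance operator is formed.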
We randomly selected 5 images from each class for training and 15 for testing, repeating the entire procedure 10 times.
Discussion of results: As one can observe in Table 1, on all of the datasets the Log-HS framework, operating on GaussianCOVs, significantly outperforms approaches based on directCOVs computed from the original input features, including those using the Log-Euclidean metric and the Stein and Jeffreys divergences. Across all datasets, our improvement over the Log-Euclidean metric reaches up to 14% in accuracy. This is consistent with kernel-based learning theory: GaussianCOVs, defined on the infinite-dimensional RKHS, capture nonlinear correlations between the input features that directCOVs cannot. To the best of our knowledge, our results on the Texture and Material classification experiments set a new state of the art for these datasets. Furthermore, our results, obtained within a theoretically rigorous framework, also consistently outperform those of [10]. The computational complexity of our framework, its two-layer kernel machine interpretation, and other discussions are given in the Supplementary Material.
Conclusion and future work
We have presented a novel mathematical and computational framework, namely the Log-Hilbert-Schmidt metric, which generalizes the Log-Euclidean metric between SPD matrices to the infinite-dimensional setting. Empirically, on the task of image classification, where each image is represented by an infinite-dimensional RKHS covariance operator, the Log-HS framework substantially outperforms other approaches based on covariance matrices computed directly on the original input features. Given the widespread use of covariance matrices, we believe that the Log-HS framework can be potentially useful for many problems in machine learning, computer vision, and other applications.
Many more properties of the Log-HS metric, along with further applications, will be reported in a longer version of the current paper and in future work.

References
[1] E. Andruchow and A. Varela. Non positively curved metric in the space of positive definite infinite matrices. Revista de la Unión Matemática Argentina, 48(1):7–15, 2007.
[2] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. on Matrix An. and App., 29(1):328–347, 2007.
[3] R. Bhatia. Positive Definite Matrices. Princeton University Press, 2007.
[4] D. A. Bini and B. Iannazzo. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700–1710, 2013.
[5] B. J. Boom, J. He, S. Palazzo, P. X. Huang, C. Beyan, H.-M. Chou, F.-P. Lin, C. Spampinato, and R. B. Fisher. A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage. Ecological Informatics, in press, 2013.
[6] B. Caputo, E. Hayman, and P. Mallikarjuna. Class-specific material categorisation. In ICCV, pages 1597–1604, 2005.
[7] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1–27:27, May 2011.
[8] A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices. TPAMI, 35(9):2161–2174, 2013.
[9] I. L. Dryden, A. Koloydenko, and D. Zhou. Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Annals of Applied Statistics, 3:1102–1123, 2009.
[10] M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite dimensional covariance matrices. In CVPR, 2014.
[11] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In CVPR, 2013.
[12] B. Kulis, M. A. Sustik, and I. S. Dhillon. Low-rank kernel learning with Bregman matrix divergences. The Journal of Machine Learning Research, 10:341–376, 2009.
[13] G. Kylberg. The Kylberg texture dataset v. 1.0. External report (Blue series) 35, Centre for Image Analysis, Swedish University of Agricultural Sciences and Uppsala University, 2011.
[14] G. Larotonda. Geodesic Convexity, Symmetric Spaces and Hilbert-Schmidt Operators. PhD thesis, Universidad Nacional de General Sarmiento, Buenos Aires, Argentina, 2005.
[15] G. Larotonda. Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators. Differential Geometry and its Applications, 25:679–700, 2007.
[16] J. D. Lawson and Y. Lim. The geometric mean, matrices, metrics, and more. The American Mathematical Monthly, 108(9):797–812, 2001.
[17] P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning. In ICCV, 2013.
[18] G. D. Mostow. Some new decomposition theorems for semi-simple groups. Memoirs of the American Mathematical Society, 14:31–54, 1955.
[19] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, 2006.
[20] W. V. Petryshyn. Direct and iterative methods for the solution of linear operator equations in Hilbert spaces. Transactions of the American Mathematical Society, 105:136–175, 1962.
[21] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10(5), July 1998.
[22] S. Sra. A new metric on the manifold of kernel matrices with application to matrix geometric means. In NIPS, 2012.
[23] D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds. TPAMI, 35(8):1972–1984, Aug 2013.
[24] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. TPAMI, 30(10):1713–1727, 2008.
[25] R. Wang, H. Guo, L. S. Davis, and Q. Dai. Covariance discriminative learning: A natural and efficient approach to image set classification. In CVPR, pages 2496–2503, 2012.
[26] S. K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space. TPAMI, 28(6):917–929, 2006.