#### Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)

#### Authors

*Vivien Seguy, Marco Cuturi*

#### Abstract

We consider in this work the space of probability measures $P(X)$ on a Hilbert space $X$ endowed with the 2-Wasserstein metric. Given a finite family of probability measures in $P(X)$, we propose an iterative approach to compute geodesic principal components that summarize efficiently that dataset. The 2-Wasserstein metric provides $P(X)$ with a Riemannian structure and associated concepts (Fr\'echet mean, geodesics, tangent vectors) which prove crucial to follow the intuitive approach laid out by standard principal component analysis. To make our approach feasible, we propose to use an alternative parameterization of geodesics proposed by \citet[\S 9.2]{ambrosio2006gradient}. These \textit{generalized} geodesics are parameterized with two velocity fields defined on the support of the Wasserstein mean of the data, each pointing towards an ending point of the generalized geodesic. The resulting optimization problem of finding principal components is solved by adapting a projected gradient descend method. Experiment results show the ability of the computed principal components to capture axes of variability on histograms and probability measures data.