{"title": "Riemannian batch normalization for SPD neural networks", "book": "Advances in Neural Information Processing Systems", "page_first": 15489, "page_last": 15500, "abstract": "Covariance matrices have attracted attention for machine learning applications due\nto their capacity to capture interesting structure in the data. The main challenge\nis that one needs to take into account the particular geometry of the Riemannian\nmanifold of symmetric positive definite (SPD) matrices they belong to. In the con-\ntext of deep networks, several architectures for these matrices have recently been\nproposed. In our article, we introduce a Riemannian batch normalization (batch-\nnorm) algorithm, which generalizes the one used in Euclidean nets. This novel\nlayer makes use of geometric operations on the manifold, notably the Riemannian\nbarycenter, parallel transport and non-linear structured matrix transformations. We\nderive a new manifold-constrained gradient descent algorithm working in the space\nof SPD matrices, allowing to learn the batchnorm layer. We validate our proposed\napproach with experiments in three different contexts on diverse data types: a\ndrone recognition dataset from radar observations, and on emotion and action\nrecognition datasets from video and motion capture data. 
Experiments show that the Riemannian batchnorm systematically gives better classification performance compared with leading methods and a remarkable robustness to lack of data.", "full_text": "Riemannian batch normalization for SPD neural networks

Daniel Brooks (Thales Land and Air Systems, BU ARC, Limours, France; Sorbonne Université, CNRS, LIP6, Paris, France)
Olivier Schwander (Sorbonne Université, CNRS, LIP6, Paris, France)
Jean-Yves Schneider (Thales Land and Air Systems, BU ARC, Limours, France)
Frédéric Barbaresco (Thales Land and Air Systems, BU ARC, Limours, France)
Matthieu Cord (Sorbonne Université, CNRS, LIP6, Paris, France)

Abstract

Covariance matrices have attracted attention for machine learning applications due to their capacity to capture interesting structure in the data. The main challenge is that one needs to take into account the particular geometry of the Riemannian manifold of symmetric positive definite (SPD) matrices they belong to. In the context of deep networks, several architectures for these matrices have recently been proposed. In our article, we introduce a Riemannian batch normalization (batchnorm) algorithm, which generalizes the one used in Euclidean nets. This novel layer makes use of geometric operations on the manifold, notably the Riemannian barycenter, parallel transport and non-linear structured matrix transformations. We derive a new manifold-constrained gradient descent algorithm working in the space of SPD matrices, allowing us to learn the batchnorm layer. We validate our proposed approach with experiments in three different contexts on diverse data types: a drone recognition dataset from radar observations, and on emotion and action recognition datasets from video and motion capture data. 
Experiments show that the Riemannian batchnorm systematically gives better classification performance compared with leading methods and a remarkable robustness to lack of data.

1 Introduction and related works

Covariance matrices are ubiquitous in any statistics-related field, but their direct usage as a representation of the data for machine learning is less common. They have nonetheless proved useful in a variety of applications: object detection in images [46], analysis of Magnetic Resonance Imaging (MRI) data [41], and classification of time-series for Brain-Computer Interfaces (BCI) [8]. They are particularly interesting in the case of temporal data, since a global covariance matrix is a straightforward way to capture and represent the temporal fluctuations of data points of different lengths. The main difficulty is that these matrices, which are symmetric positive definite (SPD), cannot be seen as points in a Euclidean space: the set of SPD matrices is a curved Riemannian manifold, so tools from non-Euclidean geometry must be used; see [10] for a plethora of theoretical justifications and properties on the matter. For this reason most classification methods (which implicitly make the hypothesis of a Euclidean input space) cannot be used successfully.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Interestingly, relatively simple machine learning techniques can produce state-of-the-art results as soon as the particular Riemannian geometry is taken into account. This is the case for BCI: [8, 7] use a nearest-barycenter scheme (with a Riemannian barycenter) and an SVM (on the tangent space at the barycenter of the data points) to successfully classify covariance matrices computed on multivariate electroencephalography (EEG) signals; in the same field, [51] propose kernel methods for metric learning on the SPD manifold. 
Another example is in MRI, where [41, 4] develop a k-nearest neighbors algorithm using a Riemannian distance. Motion recognition from skeletal motion data also benefits from Riemannian geometry, as exposed in [16], [30] and [29]. In the context of neural networks, an architecture (SPDNet) specifically adapted to these matrices has been proposed [28]. The overall aspect is similar to a classical (Euclidean) network (transformations, activations and a final stage of classification), but each layer processes a point on the SPD manifold; the final layer transforms the feature manifold to a Euclidean space for further classification. More architectures have followed, proposing alternatives to the basic building blocks: in [23] and [27], a more lightweight transformation layer is proposed; in [52] and [18], the authors propose alternative convolutional layers, respectively based on multi-channel SPD representations and Riemannian means; a recurrent model is further proposed in [19]; in [37] and [36], an approximate matrix square-root layer replaces the final Euclidean projection to lighten computational complexity. In [15], an SPD neural network is appended to a fully-convolutional net to improve performance and robustness to data scarcity. All in all, most of these developments focus on improving or modifying existing blocks in an effort to converge to their most relevant form, both theoretically and practically; in this work, we propose a new building block for SPD neural networks, inspired by the well-known and widely used batch normalization layer [31]. This layer makes use of batch centering and biasing, operations which need to be defined on the SPD manifold. As an additional, independent SPD building block, this novel layer is agnostic to the particular way the other layers are computed, and as such can fit into any of the above architectures. 
Throughout the paper we choose to focus on the original architecture proposed in [28]. Although the overall structure of the original batchnorm is preserved, its generalization to SPD matrices requires geometric tools on the manifold, both for the forward and backward passes. In this study, we further assess the particular interest of batch-normalized SPD nets in the context of learning on scarce data with lightweight models: indeed, many fields are faced with costly, private or evasive data, which strongly motivates the exploration of architectures naturally resilient to such challenging situations. Medical imaging data is well known to face these issues [41], as is the field of drone radar classification [14], which we study in this work: indeed, radar signal acquisition is prohibitively expensive, the acquired data is usually of a confidential nature, and drone classification in particular is plagued with an ever-changing pool of targets, which we can never reasonably hope to encapsulate in comprehensive datasets. Furthermore, hardware integration limitations additionally motivate the development of lightweight models based on a powerful representation of the data. 
As such, our contributions are the following:

• a Riemannian batch normalization layer for SPD neural networks, respecting the manifold's geometry;
• a generalized gradient descent allowing us to learn the batchnorm layer;
• extensive experiments on three datasets from three different fields (experiments are made reproducible with our open-source PyTorch library, released along with the article).

Our article is organized as follows: we first recall the essential required tools of manifold geometry; we then proceed to describe our proposed Riemannian batchnorm algorithm; next, we devise the projected gradient descent algorithm for learning the batchnorm; finally, we validate our proposed architecture experimentally.

2 Geometry on the manifold of SPD matrices

We start by recalling some useful geometric notions on the SPD manifold, noted S+* in the following.

2.1 Riemannian metrics on SPD matrices

In a general setting, a Riemannian distance δ_R(P1, P2) between two points P1 and P2 on a manifold is defined as the length of the geodesic γ_{P1→P2}, i.e. the shortest parameterized curve ξ(t) linking them:

δ_R(P1, P2) = inf_{ξ | ξ(0)=P1, ξ(1)=P2} ∫_0^1 ds(t) dt,  with ds(t)² = ξ'(t)ᵀ F_{ξ(t)} ξ'(t)   (1)

In the equation above, ds is the infinitesimal distance between two close points and F is the metric tensor, which defines a local metric at each point on the manifold; ξ' is the velocity of the curve, sometimes noted dξ. For manifolds of exponential-family distributions, F is none other than the Fisher information matrix (FIM) (the inverse of which defines the well-known Cramér-Rao bound), which is the Hessian matrix of the entropy. This connection between entropy and differential metrics was first made in 1945 by C.R. Rao [42] and in 1943 by M. 
Fréchet [26], and further axiomatized in 1965 by N.N. Chentsov [17]. Then, in a 1976 confidential report cited in [5], S.T. Jensen derived the infinitesimal distance between two centered multivariate distributions: ds(ξ)² = ½ tr(ξ⁻¹ ξ' ξ⁻¹ ξ'). Such distributions being defined entirely by their covariance matrix, they are isomorphic to the SPD manifold, so the integration of ds along the geodesic leads to the globally-defined natural distance on S+* [38], also called the affine-invariant Riemannian metric (AIRM) [41], which can be expressed using the standard Frobenius norm ||·||_F:

δ_R(P1, P2) = ||log(P1^(-1/2) P2 P1^(-1/2))||_F   (2)

The interested reader may note that while the above metric is the correct one from the information-geometric viewpoint, it is notoriously computation-heavy. Other metrics or divergences either closely approximate it or provide an alternative theoretical approach, while contributing the highly desirable property of lightweight computational complexity, especially in the modern context of machine learning. Notable examples include the Fisher-Bures metric [45], Bregman divergences [11, 44, 6], and optimal transport [3].

Another matter of importance is the definition of the natural mappings between the manifold and its tangent bundle, which groups the tangent Euclidean spaces at each point of the manifold. At any given reference point P0 ∈ S+*, we call logarithmic mapping Log_{P0} the operation sending another point P ∈ S+* to the corresponding vector S in the tangent space T_{P0} at P0. The inverse operation is the exponential mapping Exp_{P0}. 
In S+*, both mappings (not to be confused with the matrix log and exp functions) are known in closed form [2]:

∀S ∈ T_{P0}, Exp_{P0}(S) = P0^(1/2) exp(P0^(-1/2) S P0^(-1/2)) P0^(1/2) ∈ S+*   (3a)
∀P ∈ S+*, Log_{P0}(P) = P0^(1/2) log(P0^(-1/2) P P0^(-1/2)) P0^(1/2) ∈ T_{P0}   (3b)

2.2 Riemannian barycenter

The first step of the batchnorm algorithm is the computation of batch means; while it may be possible to use the arithmetic mean (1/N) Σ_{i≤N} Pi of a batch B of N SPD matrices {Pi}_{i≤N}, we will rather use the more geometrically appropriate Riemannian barycenter G, also known as the Fréchet mean [48], which we note Bar({Pi}_{i≤N}) or Bar(B). The Riemannian barycenter has shown strong theoretical and practical interest in Riemannian data analysis [41], which justifies its usage in this context. By definition, G is the point on the manifold that minimizes inertia in terms of the Riemannian metric defined in equation 2. The definition is trivially extensible to a weighted Riemannian barycenter, noted Bar_w({Pi}_{i≤N}) or Bar_w(B), where the weights w := {wi}_{i≤N} respect the convexity constraint:

G = Bar_w({Pi}_{i≤N}) := argmin_{G∈S+*} Σ_{i=1}^N wi δ_R²(G, Pi),  with wi ≥ 0 and Σ_{i≤N} wi = 1   (4)

Figure 1: Illustration of one iteration of the Karcher flow [34].

When N = 2, i.e. 
when w = {w, 1 − w}, a closed-form solution exists, which exactly corresponds to the geodesic between two points P1 and P2, parameterized by w ∈ [0, 1] [12]:

Bar_{(w,1−w)}(P1, P2) = P2^(1/2) (P2^(-1/2) P1 P2^(-1/2))^w P2^(1/2)   (5)

Unfortunately, when N > 2, the solution to the minimization problem is not known in closed form: G is thus usually computed using the so-called Karcher flow algorithm [34, 49], which we illustrate in Figure 1. In short, the Karcher flow is an iterative process in which the data points, projected to the tangent space using the logarithmic mapping (equation 3b), are averaged there, the average then being mapped back to the manifold using the exponential mapping (equation 3a). The initialization of G is arbitrary, but a reasonable choice is the arithmetic mean. A key point is that convergence is guaranteed on a manifold with constant negative curvature, which is the case for the SPD manifold S+* [34]. Another point of interest is that selecting K = 1 (that is, only one iteration of the flow) and α = 1 (unit step size) in the Karcher algorithm corresponds exactly to the barycenter from the Log-Euclidean metric viewpoint [41]. We actually use this setting in the layer: as the batch barycenter is but a noisy estimation of the true barycenter, a lax approximation is sufficient, and it also allows for much faster inference.

2.3 Centering SPD matrices using parallel transport

The Euclidean batchnorm involves centering and biasing the batch B, which is done via subtraction and addition. However, on a curved manifold there is no such group structure in general, so these seemingly basic operations are ill-defined. 
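For concreteness, the mappings (3a)-(3b), the AIRM distance (2) and the Karcher flow can be sketched in a few lines of NumPy/SciPy. This is an illustrative sketch, not the paper's released PyTorch library; the helper names (`spd_log`, `spd_exp`, `airm_dist`, `karcher_barycenter`) are ours.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def spd_log(P0, P):
    """Logarithmic mapping Log_{P0}(P), eq. (3b)."""
    s = np.real(sqrtm(P0)); si = inv(s)
    return s @ np.real(logm(si @ P @ si)) @ s

def spd_exp(P0, S):
    """Exponential mapping Exp_{P0}(S), eq. (3a)."""
    s = np.real(sqrtm(P0)); si = inv(s)
    return s @ np.real(expm(si @ S @ si)) @ s

def airm_dist(P1, P2):
    """Affine-invariant Riemannian distance, eq. (2)."""
    si = inv(np.real(sqrtm(P1)))
    return np.linalg.norm(np.real(logm(si @ P2 @ si)), 'fro')

def karcher_barycenter(Ps, n_iter=1, step=1.0):
    """Karcher flow for eq. (4); n_iter=1, step=1.0 is the lax setting used in the paper."""
    G = np.mean(Ps, axis=0)  # arithmetic-mean initialization
    for _ in range(n_iter):
        S = np.mean([spd_log(G, P) for P in Ps], axis=0)  # average in tangent space
        G = spd_exp(G, step * S)                          # map back to the manifold
    return G
```

For commuting (e.g. diagonal) matrices, one flow iteration already returns the exact geometric mean, regardless of the initialization.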
To shift SPD matrices around their mean G, or towards a bias parameter G, we propose to rather use parallel transport on the manifold [2]. In short, the parallel transport (PT) operator Γ_{P1→P2}(S) of a vector S ∈ T_{P1} in the tangent plane at P1, between P1, P2 ∈ S+*, moves S along the path from P1 to P2 such that S remains parallel to itself in the tangent planes along the path. The geodesic γ_{P1→P2} is itself a special case of PT, when S is chosen to be the direction vector γ'_{P1→P2}(0) from P1 to P2. The expression for PT is known on S+*:

∀S ∈ T_{P1}, Γ_{P1→P2}(S) = (P2 P1⁻¹)^(1/2) S (P2 P1⁻¹)^(1/2) ∈ T_{P2}   (6)

The equation above defines PT for tangent vectors, while we wish to transport points on the manifold. To do so, we simply project the data points to the tangent space using the logarithmic mapping, parallel transport the resulting vectors using Eq. 6, and map the result back to the manifold using the exponential mapping. It can be shown (see [47], appendix C, for a full proof) that the resulting operation, which we call SPD transport, turns out to be given by exactly the same formula as above, which is not an obvious result in itself. By abuse of notation, we also use Γ_{P1→P2} to denote the SPD transport. 
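A minimal sketch of eq. (6) follows. Note one assumption on our part: writing E = (P2 P1⁻¹)^(1/2), we apply E on the left and Eᵀ on the right, which is the standard form of this transport; the two coincide with the formula as printed whenever E is symmetric, in particular in the centering case P2 = Id used by the batchnorm.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def spd_transport(P1, P2, S):
    """Parallel transport of S from T_{P1} to T_{P2}, eq. (6).
    For P2 = Id this reduces to P1^{-1/2} S P1^{-1/2} (whitening by the mean)."""
    E = np.real(sqrtm(P2 @ inv(P1)))  # (P2 P1^{-1})^{1/2}, not symmetric in general
    return E @ S @ E.T
```

Transporting the barycenter G itself towards the identity sends it exactly to Id, which is what makes eq. (6) usable as a centering operation.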
Therefore, we can now define the centering of a batch of matrices {Pi}_{i≤N} with Riemannian barycenter G as the PT from G to the identity Id, and the biasing of the batch towards a parametric SPD matrix G as the PT from Id to G.

Batch centering and biasing. We now have the tools to define the batch centering and biasing:

Centering from G := Bar(B): ∀i ≤ N, P̄i = Γ_{G→Id}(Pi) = G^(-1/2) Pi G^(-1/2)   (7a)
Biasing towards parameter G: ∀i ≤ N, P̃i = Γ_{Id→G}(P̄i) = G^(1/2) P̄i G^(1/2)   (7b)

3 Batchnorm for SPD data

In this section we introduce the Riemannian batch normalization (Riemannian BN, or RBN) algorithm for SPD matrices. We first briefly recall the basic architecture of an SPD neural network.

3.1 Basic layers for SPD neural networks

The SPDNet architecture mimics that of classical neural networks, with a first stage devoted to computing a pertinent representation of the input data points and a second stage which performs the final classification. The particular structure of S+*, the manifold of SPD matrices, is taken into account by layers crafted to respect and exploit this geometry. 
The layers introduced in [28] are threefold.

The BiMap (bilinear transformation) layer, analogous to the usual dense layer; the induced dimension reduction eases the computational burden often found in learning algorithms on SPD data:

X^(l) = W^(l)ᵀ P^(l−1) W^(l),  with W^(l) semi-orthogonal   (8)

The ReEig (rectified eigenvalues activation) layer, analogous to the ReLU activation; it can also be seen as an eigen-regularization, protecting the matrices from degeneracy:

X^(l) = U^(l) max(Σ^(l), εI_n) U^(l)ᵀ,  with P^(l) = U^(l) Σ^(l) U^(l)ᵀ   (9)

The LogEig (log eigenvalues Euclidean projection) layer:

X^(l) = vec(U^(l) log(Σ^(l)) U^(l)ᵀ),  with again U^(l) the eigenspace of P^(l)   (10)

This final layer has no Euclidean counterpart: its purpose is the projection and vectorization of the output feature manifold to a Euclidean space, which allows for further classification with a traditional dense layer. As stated previously, it is possible to envision different formulations for each of the layers defined above (see [23, 52, 37] for varied examples). Our following definition of the batchnorm can fit any formulation, as it remains an independent layer.

3.2 Statistical distributions on SPD matrices

In traditional neural nets, batch normalization is defined as the centering and standardization of the data within one batch, followed by multiplication by a parameterized variance and addition of a parameterized bias, to emulate data sampling from a learnt Gaussian distribution. In order to generalize to batches of SPD matrices, we must first define the notion of a Gaussian density on S+*. Although this definition has not yet been settled for good, several approaches have been proposed. In [33], the authors proceed by introducing mean and variance as second- and fourth-order tensors. On the other hand, [43] derive a scalar variance. 
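The three layers (8)-(10) can be sketched directly from their eigen-decomposition form; a minimal NumPy illustration (forward pass only, names ours):

```python
import numpy as np

def bimap(P, W):
    """BiMap layer, eq. (8): X = W^T P W, with W assumed semi-orthogonal (W^T W = I)."""
    return W.T @ P @ W

def reeig(P, eps=1e-4):
    """ReEig layer, eq. (9): clamp the eigenvalues from below by eps."""
    s, U = np.linalg.eigh(P)
    return U @ np.diag(np.maximum(s, eps)) @ U.T

def logeig(P):
    """LogEig layer, eq. (10): matrix logarithm followed by vectorization."""
    s, U = np.linalg.eigh(P)
    return (U @ np.diag(np.log(s)) @ U.T).ravel()
```

Since all three act either bilinearly or on the eigenvalues only, each output of `bimap` and `reeig` remains symmetric positive definite, while `logeig` produces an ordinary Euclidean vector.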
In another line of work, synthesized in [9], which we adopt in this work, the Gaussian density is derived from the definition of maximum entropy on exponential families, using information geometry on the cone of SPD matrices. In this setting, the natural parameter of the resulting exponential family is simply the Riemannian mean; in other words, the notion of variance, which appears in the Euclidean setting, takes no part in this definition of a Gaussian density on S+*. Specifically, such a density p on SPD matrices P of dimension n writes:

p(P) ∝ det(α G⁻¹) e^(−tr(α G⁻¹ P)),  with α = (n + 1)/2   (11)

In the equation above, G is the Riemannian mean of the distribution. Again, there is no notion of variance: the main consequence is that a Riemannian BN on SPD matrices will only involve centering and biasing of the batch.

3.3 Final batchnorm algorithm

While the normalization is done on the current batch during training, the statistics used at inference time are computed as running estimations. For instance, the running mean over the training set, noted G_S, is iteratively updated at each batch. In a Euclidean setting, this would amount to a weighted average between the batch mean and the current running mean, the weight being a momentum typically set to 0.9. The same concept holds for SPD matrices, but the running mean should be a Riemannian mean weighted by η, i.e. Bar_{(η,1−η)}(G_S, G_B), which amounts to transporting the running mean towards the current batch mean by an amount (1 − η) along the geodesic. We can now write the full RBN algorithm (Algorithm 1). 
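Since N = 2 admits the closed form (5), the running-mean update never needs the Karcher flow; a sketch (function names ours, using SciPy's fractional matrix power):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def weighted_geodesic(P1, P2, w):
    """Bar_{(w,1-w)}(P1, P2), eq. (5): w = 1 returns P1, w = 0 returns P2."""
    h = np.real(fractional_matrix_power(P2, 0.5))
    hi = np.real(fractional_matrix_power(P2, -0.5))
    M = np.real(fractional_matrix_power(hi @ P1 @ hi, w))
    return h @ M @ h

def update_running_mean(G_S, G_B, eta=0.9):
    """Keep a fraction eta of the running mean G_S, moving it by (1 - eta)
    along the geodesic towards the batch mean G_B."""
    return weighted_geodesic(G_S, G_B, eta)
```

With η = 0.9, each batch therefore pulls the running statistic a tenth of the way along the geodesic towards the current batch barycenter, the Riemannian analogue of the usual exponential moving average.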
In practice, Riemannian BN is appended after each BiMap layer in the network.

Algorithm 1: Riemannian batch normalization on S+*, training and testing phases

TRAINING PHASE
Require: batch of N SPD matrices {Pi}_{i≤N}, running mean G_S, bias G, momentum η
1: G_B ← Bar({Pi}_{i≤N})        // compute batch mean
2: G_S ← Bar_η(G_S, G_B)        // update running mean
3: for i ≤ N do
4:   P̄i ← Γ_{G_B→Id}(Pi)       // center batch
5:   P̃i ← Γ_{Id→G}(P̄i)        // bias batch
6: end for
return normalized batch {P̃i}_{i≤N}

TESTING PHASE
Require: batch of N SPD matrices {Pi}_{i≤N}, final running mean G_S, learnt bias G
1: for i ≤ N do
2:   P̄i ← Γ_{G_S→Id}(Pi)       // center batch using set statistics
3:   P̃i ← Γ_{Id→G}(P̄i)        // bias batch using learnt parameter
4: end for
return normalized batch {P̃i}_{i≤N}

4 Learning the batchnorm

The specificities of the proposed batchnorm algorithm are the non-linear manipulation of manifold values in both inputs and parameters, and the use of a Riemannian barycenter. Here we present the two results necessary to correctly fit the learning of the RBN into a standard back-propagation framework.

4.1 Learning with the SPD constraint

The bias parameter matrix G of the RBN is by construction constrained to the SPD manifold. However, noting L the network's loss function, the usual Euclidean gradient ∂L/∂G, which we note ∂G_eucl, has no particular reason to respect this constraint. To enforce it, ∂G_eucl is projected onto the tangent space of the manifold at G using the manifold's tangential projection operator Π_{T_G}, resulting in the tangential gradient ∂G_riem. The update is then obtained by computing the geodesic on the SPD manifold emanating from G in the direction ∂G_riem, using the exponential mapping defined in equation 3a. 
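Putting the pieces together, the training phase of Algorithm 1 can be sketched end to end in NumPy, here with the K = 1 log-Euclidean approximation of the batch mean mentioned in Section 2.2. All names are ours, and the released PyTorch library should be preferred in practice:

```python
import numpy as np
from scipy.linalg import logm, expm, fractional_matrix_power

def _pow(P, a):
    return np.real(fractional_matrix_power(P, a))

def log_euclidean_mean(Ps):
    """Batch barycenter under the lax K = 1 Karcher setting (log-Euclidean mean)."""
    return np.real(expm(np.mean([np.real(logm(P)) for P in Ps], axis=0)))

def rbn_train_step(batch, G_S, G_bias, eta=0.9):
    """One training-time pass of Algorithm 1: returns the normalized batch
    and the updated running mean."""
    G_B = log_euclidean_mean(batch)          # step 1: batch mean
    # step 2: running mean, eq. (5) with weights (eta, 1 - eta)
    h = _pow(G_S, 0.5); hi = _pow(G_S, -0.5)
    G_S = h @ _pow(hi @ G_B @ hi, 1.0 - eta) @ h
    # steps 4-5: center (eq. 7a) then bias (eq. 7b)
    c, b = _pow(G_B, -0.5), _pow(G_bias, 0.5)
    out = [b @ (c @ P @ c) @ b for P in batch]
    return out, G_S
```

After the pass, the barycenter of the normalized batch sits at the bias parameter G, exactly as the centered-then-biased Euclidean batchnorm places the batch mean at its learnt bias.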
Both operators are known in S+* [50]:

∀P, Π_{T_G}(P) = G ((P + Pᵀ)/2) G ∈ T_G   (12)

We illustrate this two-step process in Figure 2, explained in detail in [24], which allows us to learn the parameter in a manifold-constrained fashion.

Figure 2: Illustration of the manifold-constrained gradient update. The Euclidean gradient is projected to the tangent space, then mapped to the manifold.

However, this is still not enough for the optimization of the layer, as the BN involves not simply G and G, but G^(1/2) and G^(-1/2), which are structured matrix functions of G, i.e. functions which act non-linearly on a matrix's eigenvalues without affecting its associated eigenspace. The next subsection deals with backpropagation through such functions.

4.2 Structured matrix backpropagation

Classically, the functions involved in the chain rule are vector functions in Rⁿ [35], whereas we deal here with structured (symmetric) matrix functions in S+*, specifically the square root (·)^(1/2) for the bias and the inverse square root (·)^(-1/2) for the barycenter (in equations 7b and 7a). A generalization of the chain rule to S+* is thus required for the backpropagation through the RBN layer to be correct. Note that a similar requirement applies to the ReEig and LogEig layers, respectively with a threshold and a log function. We generically note f a monotonous non-linear function; both (·)^(1/2) and (·)^(-1/2) satisfy this hypothesis. A general formula for the gradient of f, applied to an SPD matrix's eigenvalues (σi)_{i≤n} grouped in Σ's diagonal, was independently developed by [32] and [13]. 
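The two-step update (tangential projection of eq. (12), then retraction along the geodesic via eq. (3a)) can be sketched as one descent step; a hypothetical sketch with a plain loss gradient, not the paper's optimizer:

```python
import numpy as np
from scipy.linalg import sqrtm, expm, inv

def spd_grad_step(G, egrad, lr):
    """One manifold-constrained descent step for an SPD parameter G."""
    # 1) tangential projection of the Euclidean gradient, eq. (12)
    rgrad = G @ ((egrad + egrad.T) / 2.0) @ G
    # 2) retract along the geodesic from G in the descent direction, eq. (3a)
    s = np.real(sqrtm(G)); si = inv(s)
    return s @ np.real(expm(si @ (-lr * rgrad) @ si)) @ s
```

By construction, the new parameter is again symmetric positive definite, so the SPD constraint never has to be re-imposed after the fact.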
In short: given the function P ↦ X := f(P) and the succeeding gradient ∂L^(l+1)/∂X, the output gradient ∂L^(l)/∂P is:

∂L^(l)/∂P = U (L ⊙ (Uᵀ (∂L^(l+1)/∂X) U)) Uᵀ   (13)

The equation above, also described in [39], is called the Daleckii-Krein formula and dates back to 1956 (though it was only translated from Russian 9 years later), predating the other formulation by 60 years. It involves the eigenspace U of the input matrix P, and the Loewner matrix L, or finite-difference matrix, defined by:

L_ij = (f(σi) − f(σj)) / (σi − σj)  if σi ≠ σj,  and L_ij = f'(σi)  otherwise   (14)

In the case at hand, ((·)^(1/2))' = ½(·)^(-1/2) and ((·)^(-1/2))' = −½(·)^(-3/2). We credit [25] for first showing the equivalence between the two cited formulations, of which we expose the most concise.

In summary, the Riemannian barycenter (approximated via the Karcher flow for a batch of matrices, or computed exactly for two matrices), the parallel transport and its extension on the SPD manifold, the SPD-constrained gradient descent and the derivation of a non-linear SPD-valued structured function's gradient allow for training and inference of the proposed Riemannian batchnorm algorithm.

Table 1: Accuracy comparison of SPDNet, SPDNetBN and FCNs on NATO radar data, as a function of the amount of training data.

Model            | SPDNet        | SPDNetBN      | FCN           | FCN (small)   | MRDRM
# Parameters     | ~500          | ~500          | ~10000        | ~500          | -
Acc. (all data)  | 72.6% ± 0.61  | 82.3% ± 0.80  | 88.7% ± 0.83  | 73.4% ± 3.66  | 69.7% ± 1.12
Acc. (10% data)  | 69.1% ± 0.97  | 77.7% ± 0.95  | 65.6% ± 2.74  | 61.1% ± 3.50  | 67.1% ± 2.17

5 Experiments

Here we evaluate the gain in performance of the RBN against the baseline SPDNet on different tasks: radar data classification, emotion recognition from video, and action recognition from motion capture data. We call the depth L of an SPDNet the number of BiMap layers in the network, and denote the dimensions as {n0, ..., nL}. The vectorized input to the final classification layer is thus of length nL². All networks are trained for 200 epochs using SGD with momentum set to 0.9, with a batch size of 30 and learning rate 5e−3, 1e−2 or 5e−2. We provide the data in a pre-processed form alongside the PyTorch [40] code for reproducibility purposes. We call SPDNetBN an SPDNet using an RBN after each BiMap layer. Finally, we also report the performance of a shallow learning method on SPD data, namely a minimum Riemannian distance to Riemannian mean scheme (MRDRM), described in [7], in order to bring elements of comparison between shallow and deep learning on SPD data.

5.1 Drone recognition

Our first experimental target is drone micro-Doppler [21] radar classification. First we validate the usage of our proposed method over a baseline SPDNet, and also compare to state-of-the-art deep learning methods. Then, we study the models' robustness to lack of data, a challenge which, as stated previously, plagues the task of radar classification as well as many other tasks. Experiments are conducted on a confidential dataset of real recordings issued from the NATO organization¹. 
To spur reproducibility, we also experiment on synthetic, publicly available data.

Radar data description. A radar signal is the result of an emitted wave reflected on a target; as such, one data point is a time series of N values, which can be considered as multiple realizations of a locally stationary centered Gaussian process, as done in [20]. The signal is split into windows of length n = 20, and from this series of windows a single covariance matrix of size 20 × 20 is sampled, which represents one radar data point. The NATO data features 10 classes of drones, whereas the synthetic data is generated by a realistic simulator for 3 different classes of drones, following the protocol described in [14]. We chose here to mimic the real dataset's configuration, i.e. we consider a couple of minutes of continuous recordings per class, which corresponds to 500 data points per class.

Comparison of SPDNetBN against SPDNet and the radar state-of-the-art. We test the two SPD-based models in a {20, 16, 8}, 2-layer configuration for the synthetic data, and in a {20, 16, 14, 12, 10, 8}, 5-layer configuration for the NATO data, over a 5-fold cross-validation with a 75%-25% train-test split. We also wish to compare the Riemannian models to the common Euclidean ones, which currently constitute the state-of-the-art in micro-Doppler classification. We compare two fully convolutional networks (FCN): the first one is used as given in [14]; for the second one, the number of parameters is set to approximately the same number as for the SPD neural networks, which amounts to an unusually small deep net. All in all, the SPDNet, SPDNetBN and small FCN on the one hand, and the full-size FCN on the other hand, respectively have approximately 500 and 10000 parameters. 
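The covariance representation described above can be sketched as follows; the window length n = 20 follows the paper, while the function name, the zero-mean centering step and the small eigenvalue regularization are our own assumptions:

```python
import numpy as np

def covariance_representation(signal, n=20, eps=1e-6):
    """Turn one recorded (real-valued) time series into a single n x n SPD
    data point: split into windows of length n, treat the windows as
    observations of a centered process, and form their covariance."""
    T = (len(signal) // n) * n                              # drop the ragged tail
    X = np.asarray(signal[:T], dtype=float).reshape(-1, n)  # one window per row
    X = X - X.mean(axis=0)                                  # center the observations
    C = X.T @ X / max(len(X) - 1, 1)
    return C + eps * np.eye(n)                              # keep it strictly SPD
```

The eigen-regularization term plays the same protective role as the ReEig layer: it guarantees the resulting matrix lies strictly inside the SPD cone even when windows are degenerate.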
Table 1 reports the average accuracies and variances on the NATO data. We observe a strong gain in performance of the SPDNetBN over the SPDNet and over the small FCN, which validates the usage of the batchnorm along with the exploitation of the geometric structure underlying the data. All in all, we reach better performance with far fewer parameters. Finally, in the interest of convergence analysis, we also report learning curves for the models' accuracy with and without Riemannian batchnorm in Figure 3.

¹We would like to thank the NATO working group SET245 for providing the drone micro-Doppler database and allowing for publication of classification results.

Figure 3: Test accuracy on the NATO dataset of the SPDNet with and without RBN, measured in hours. The RBN exhibits a steeper learning curve. For the same number of epochs, it does take more time overall, but it reaches better accuracy much faster, making it possible to reduce the number of epochs.

Figure 4: Performance of all models as a function of the amount of synthetic radar data. The SPDNetBN model outperforms the other ones and continues to work even with a small fraction of the training data.

Robustness to lack of data. As stated previously, it is of great interest to consider the robustness of learning algorithms when faced with a critically low amount of data. The last line of Table 1 shows that when given only 10% of the available training data, the SPD-based models remain highly robust to the lack of data while the FCNs plummet. Further, we study robustness on synthetic data, artificially varying the amount of training data while comparing performance over the same test set. As the simulator is unbounded on potential training data, we also increase the initial training set up to double its original size. Results are reported in Figure 4. 
We can conclude from these results that the SPDNetBN both exhibits higher robustness to lack of data and performs much better than the state-of-the-art deep method with far fewer parameters. When the amount of available training data grows large, we do observe that the FCN comes back to par with the SPDNetBN, to the point of outperforming it by a small margin in the extreme scenario; meanwhile, the SPDNet lags far behind the SPDNetBN, which thus seems to benefit strongly from the normalization. In any case, the manifold framework seems well suited to a scarce-data learning context, especially considering the introduced normalization layers, which again highlights the interest of taking into account the geometric structure of the data, all without introducing prior knowledge during training.

5.2 Other experiments

Here we validate the use of the RBN on a broader set of tasks. We first clarify that we do not necessarily seek state-of-the-art performance in the general sense for the following tasks, but rather within the specific family of SPD-based methods.
The performance of our own SPDNet implementation (released as an open PyTorch library) matches that reported in [28], ensuring a fair comparison.

Table 2: Accuracy comparison of SPDNet with and without Riemannian BN on the AFEW dataset.

Model architecture   {400, 50}   {400, 100, 50}   {400, 200, 100, 50}   {400, 300, 200, 100, 50}
SPDNet               29.9%       33.7%            34.5%                 31.2%
SPDNetBN (ours)      34.9%       37.1%            36.2%                 35.2%

Table 3: Accuracy comparison of SPDNet with and without Riemannian BN on the HDM05 dataset.

Model architecture   {93, 30}
SPDNet               61.6% ± 1.35
SPDNetBN (ours)      65.2% ± 1.15

Emotion recognition In this section we experiment on the AFEW dataset [22], which consists of videos depicting 7 classes of emotions; we follow the setup and protocol in [28]. Results for 4 architectures are summarized in Table 2. In comparison, the MRDRM yields a 20.5% accuracy. We observe a consistent improvement using our normalization scheme. This dataset being our largest-scale experiment, we also report the increase in computation time using the RBN, specifically for the deepest net: one training lasted on average 81s for SPDNet, and 88s (+8.6%) for SPDNetBN.

Action recognition In this section we experiment on the HDM05 motion capture dataset. We use the same experimental setup as in [28]; results are shown in Table 3. Note that all tested models exhibit noticeable variance depending on the weight initialization and the initial random split of the dataset; the results displayed were obtained by setting a fixed seed of 0 for both. In comparison, the MRDRM yields a 27.3% ± 1.06 accuracy.
Again, we observe better performance using the batchnorm.

Conclusion

We proposed a batch normalization algorithm for SPD neural networks, mimicking the original batchnorm in Euclidean neural networks. The algorithm makes use of the SPD Riemannian manifold's geometric structure, namely the Riemannian barycenter, parallel transport, and manifold-constrained backpropagation through non-linear structured functions on SPD matrices. We demonstrate a systematic, and in some cases considerable, performance increase across a diverse range of data types. An additional observation is the better robustness to lack of data compared to the baseline SPD neural network and to a state-of-the-art convolutional network, as well as better performance than a widely used, more traditional Riemannian learning method (the closest-barycenter scheme). The overall performance of our proposed SPDNetBN makes it a suitable candidate in learning scenarios where data is structured and scarce, and where model size is a relevant issue.

References
[1] Motion Database HDM05.
[2] S.-i. Amari. Information Geometry and Its Applications. Applied Mathematical Sciences. Springer Japan, 2016.
[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein Generative Adversarial Networks. In International Conference on Machine Learning, pages 214–223, July 2017.
[4] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine, 56(2):411–421, Aug. 2006.
[5] C. Atkinson and A. F. S. Mitchell. Rao's Distance Measure. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 43(3):345–365, 1981.
[6] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman Divergences. Journal of Machine Learning Research, 6(Oct):1705–1749, 2005.
[7] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Multiclass Brain–Computer Interface Classification by Riemannian Geometry. IEEE Transactions on Biomedical Engineering, 59(4):920–928, Apr. 2012.
[8] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten. Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing, 112:172–178, July 2013.
[9] F. Barbaresco. Jean-Louis Koszul and the Elementary Structures of Information Geometry. In F. Nielsen, editor, Geometric Structures of Information, Signals and Communication Technology, pages 333–392. Springer International Publishing, Cham, 2019.
[10] R. Bhatia. Positive Definite Matrices. Princeton University Press, Princeton, NJ, USA, 2015.
[11] J.-D. Boissonnat, F. Nielsen, and R. Nock. Bregman Voronoi diagrams. Discrete and Computational Geometry, page 200, 2010.
[12] S. Bonnabel and R. Sepulchre. Riemannian Metric and Geometric Mean for Positive Semidefinite Matrices of Fixed Rank. SIAM Journal on Matrix Analysis and Applications, 31(3):1055–1070, Jan. 2010.
[13] M. Brodskiĭ, J. Daleckiĭ, O. Èĭdus, I. Iohvidov, M. Kreĭn, O. Ladyženskaja, V. Lidskiĭ, J. Ljubič, V. Macaev, A. Povzner, L. Sahnovič, J. Šmuljan, I. Suharevskiĭ, and N. Uralceva. Thirteen Papers on Functional Analysis and Partial Differential Equations, volume 47 of American Mathematical Society Translations: Series 2. American Mathematical Society, Dec. 1965.
[14] D. A. Brooks, O. Schwander, F. Barbaresco, J. Schneider, and M. Cord. Temporal Deep Learning for Drone Micro-Doppler Classification. In 2018 19th International Radar Symposium (IRS), pages 1–10, June 2018.
[15] D. A. Brooks, O. Schwander, F. Barbaresco, J. Schneider, and M. Cord. Exploring Complex Time-series Representations for Riemannian Machine Learning of Radar Data.
In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3672–3676, May 2019.
[16] J. Cavazza, P. Morerio, and V. Murino. When Kernel Methods Meet Feature Learning: Log-Covariance Network for Action Recognition From Skeletal Data. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1251–1258, July 2017.
[17] N. N. Cencov. Statistical Decision Rules and Optimal Inference. American Mathematical Soc., Apr. 2000.
[18] R. Chakraborty, J. Bouza, J. Manton, and B. C. Vemuri. ManifoldNet: A Deep Network Framework for Manifold-valued Data. arXiv:1809.06211 [cs], Sept. 2018.
[19] R. Chakraborty, C.-H. Yang, X. Zhen, M. Banerjee, D. Archer, D. Vaillancourt, V. Singh, and B. Vemuri. A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 8883–8894. Curran Associates, Inc., 2018.
[20] N. Charon and F. Barbaresco. A new approach for target detection in radar images based on geometric properties of covariance matrices' spaces, 2009.
[21] V. C. Chen, F. Li, S.-S. Ho, and H. Wechsler. Micro-Doppler effect in radar: phenomenon, model, and simulation study. IEEE Transactions on Aerospace and Electronic Systems, 42(1):2–21, 2006.
[22] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 2106–2112, Nov. 2011.
[23] Z. Dong, S. Jia, V. C. Zhang, M. Pei, and Y. Wu. Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition. In AAAI, 2017.
[24] A. Edelman, T.
Arias, and S. Smith. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, Jan. 1998.
[25] M. Engin, L. Wang, L. Zhou, and X. Liu. DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition. arXiv:1711.04047 [cs], Nov. 2017.
[26] M. Fréchet. Sur l'extension de certaines evaluations statistiques au cas de petits echantillons. Revue de l'Institut International de Statistique / Review of the International Statistical Institute, 11(3/4):182–205, 1943.
[27] Z. Gao, Y. Wu, X. Bu, and Y. Jia. Learning a Robust Representation via a Deep Network on Symmetric Positive Definite Manifolds. arXiv:1711.06540 [cs], Nov. 2017.
[28] Z. Huang and L. Van Gool. A Riemannian Network for SPD Matrix Learning. arXiv:1608.04233 [cs], Aug. 2016.
[29] Z. Huang, C. Wan, T. Probst, and L. Van Gool. Deep Learning on Lie Groups for Skeleton-based Action Recognition. arXiv:1612.05877 [cs], Dec. 2016.
[30] Z. Huang, J. Wu, and L. Van Gool. Building Deep Networks on Grassmann Manifolds. arXiv:1611.05742 [cs], Nov. 2016.
[31] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 [cs], Feb. 2015.
[32] C. Ionescu, O. Vantzos, and C. Sminchisescu. Matrix Backpropagation for Deep Networks with Structured Layers. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2965–2973, Santiago, Chile, Dec. 2015. IEEE.
[33] N. Jaquier and S. Calinon. Gaussian mixture regression on symmetric positive definite matrices manifolds: Application to wrist motion estimation with sEMG. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 59–64, Vancouver, BC, Sept.
2017. IEEE.
[34] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, 30(5):509–541, Sept. 1977.
[35] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov. 1998.
[36] P. Li, J. Xie, Q. Wang, and Z. Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 947–955, Salt Lake City, UT, June 2018. IEEE.
[37] Y. Mao, R. Wang, S. Shan, and X. Chen. COSONet: Compact Second-Order Network for Video Face Recognition. page 16.
[38] G. Marceau-Caron and Y. Ollivier. Natural Langevin Dynamics for Neural Networks. In F. Nielsen and F. Barbaresco, editors, Geometric Science of Information, volume 10589, pages 451–459. Springer International Publishing, Cham, 2017.
[39] F. Nielsen and R. Bhatia, editors. Matrix Information Geometry. Springer-Verlag, Berlin Heidelberg, 2013.
[40] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. Oct. 2017.
[41] X. Pennec, P. Fillard, and N. Ayache. A Riemannian Framework for Tensor Computing. International Journal of Computer Vision, 66(1):41–66, Jan. 2006.
[42] C. R. Rao. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. In Breakthroughs in Statistics, Springer Series in Statistics, pages 235–247. Springer, New York, NY, 1992.
[43] S. Said, L. Bombrun, Y. Berthoumieu, and J. Manton. Riemannian Gaussian Distributions on the Space of Symmetric Positive Definite Matrices. arXiv:1507.01760 [math, stat], July 2015.
[44] A. Siahkamari, V. Saligrama, D. Castanon, and B. Kulis.
Learning Bregman Divergences. arXiv:1905.11545 [cs, stat], May 2019.
[45] K. Sun, P. Koniusz, and Z. Wang. Fisher-Bures Adversary Graph Convolutional Networks. arXiv:1903.04154 [cs, stat], Mar. 2019.
[46] O. Tuzel, F. Porikli, and P. Meer. Region Covariance: A Fast Descriptor for Detection and Classification. In Computer Vision – ECCV 2006, Lecture Notes in Computer Science, pages 589–600. Springer, Berlin, Heidelberg, May 2006.
[47] O. Yair, M. Ben-Chen, and R. Talmon. Parallel Transport on the Cone Manifold of SPD Matrices for Domain Adaptation. July 2018.
[48] L. Yang, M. Arnaudon, and F. Barbaresco. Riemannian median, geometry of covariance matrices and radar target detection. In The 7th European Radar Conference, pages 415–418, Sept. 2010.
[49] L. Yang, M. Arnaudon, and F. Barbaresco. Riemannian median, geometry of covariance matrices and radar target detection. pages 415–418, Nov. 2010.
[50] F. Yger. A review of kernels on covariance matrices for BCI applications. In 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6, Sept. 2013.
[51] F. Yger and M. Sugiyama. Supervised LogEuclidean Metric Learning for Symmetric Positive Definite Matrices. arXiv:1502.03505 [cs], Feb. 2015.
[52] T. Zhang, W. Zheng, Z. Cui, and C. Li. Deep Manifold-to-Manifold Transforming Network.
In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 4098–4102, Oct. 2018.