{"title": "$L_1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry", "book": "Advances in Neural Information Processing Systems", "page_first": 441, "page_last": 449, "abstract": "We propose a new approach to the problem of robust estimation in multiview geometry. Inspired by recent advances in the sparse recovery problem of statistics, our estimator is defined as a Bayesian maximum a posteriori with multivariate Laplace prior on the vector describing the outliers. This leads to an estimator in which the fidelity to the data is measured by the $L_\\infty$-norm while the regularization is done by the $L_1$-norm. The proposed procedure is fairly fast since the outlier removal is done by solving one linear program (LP). An important difference compared to existing algorithms is that for our estimator it is not necessary to specify neither the number nor the proportion of the outliers. The theoretical results, as well as the numerical example reported in this work, confirm the efficiency of the proposed approach.", "full_text": "L1-Penalized Robust Estimation for a Class of Inverse\n\nProblems Arising in Multiview Geometry\n\nArnak S. Dalalyan and Renaud Keriven\n\nIMAGINE/LabIGM,\n\ndalalyan,keriven@imagine.enpc.fr\n\nUniversit\u00b4e Paris Est - Ecole des Ponts ParisTech,\n\nMarne-la-Vall\u00b4ee, France\n\nAbstract\n\nWe propose a new approach to the problem of robust estimation in multiview ge-\nometry. Inspired by recent advances in the sparse recovery problem of statistics,\nwe de\ufb01ne our estimator as a Bayesian maximum a posteriori with multivariate\nLaplace prior on the vector describing the outliers. This leads to an estimator\nin which the \ufb01delity to the data is measured by the L\u221e-norm while the regular-\nization is done by the L1-norm. The proposed procedure is fairly fast since the\noutlier removal is done by solving one linear program (LP). 
An important difference compared to existing algorithms is that for our estimator it is not necessary to specify either the number or the proportion of the outliers. We present strong theoretical results assessing the accuracy of our procedure, as well as a numerical example illustrating its efficiency on real data.\n\n1 Introduction\n\nIn the present paper, we are concerned with a class of non-linear inverse problems appearing in the structure and motion problem of multiview geometry. This problem, which has received a great deal of attention from the computer vision community in the last decade, consists in recovering a set of 3D points (structure) and a set of camera matrices (motion), when only 2D images of the aforementioned 3D points by some cameras are available. Throughout this work we assume that the internal parameters of the cameras as well as their orientations are known. Thus, only the locations of the camera centers and the 3D points are to be estimated. In solving the structure and motion problem by state-of-the-art methods, it is customary to start by establishing correspondences between pairs of 2D data points. We will assume in the present study that these point correspondences have already been established.\n\nOne can think of the structure and motion problem as the inverse problem of inverting the operator $O$ that takes as input the set of 3D points and the set of cameras, and produces as output the 2D images of the 3D points by the cameras. This approach will be further formalized in the next section. Generally, the operator $O$ is not injective, but in many situations (for example, when for each pair of cameras there are at least five 3D points in general position that are seen by these cameras [23]), there is only a small number of inputs, up to an overall similarity transform, having the same image by $O$. 
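For readers who prefer code, the forward operator $O$ acts on each camera and scene point through a simple projective mapping; the following is a toy NumPy sketch of that mapping (the calibration values, pose, and point are made up for illustration and are not taken from the paper):

```python
import numpy as np

def project(K, R, t, X):
    """Image of a 3D point X by the camera P = K[R|t] (pinhole model)."""
    u = K @ (R @ X + t)        # homogeneous image coordinates
    return u[:2] / u[2]        # perspective division

def forward(cameras, points):
    """Toy forward operator O: every camera observes every point."""
    return np.array([[project(K, R, t, X) for X in points]
                     for (K, R, t) in cameras])

# Made-up calibration: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                      # orientation is assumed known
t = np.array([0.0, 0.0, 5.0])      # translation (the unknown, in this paper)
X = np.array([0.0, 0.0, 0.0])      # scene point on the optical axis

print(project(K, R, t, X))         # -> [320. 240.], the principal point
```

The structure and motion problem then amounts to inverting `forward` from noisy 2D outputs.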
In such cases, the solutions to the structure and motion problem can be found using algebraic arguments.\n\nThe main flaw of algebraic solutions is their sensitivity to noise in the data: very often, because of measurement noise, there is no input that could have generated the observed output. A natural approach to cope with such situations consists in searching for the input providing the output closest to the observed data. A major issue is then how to choose the metric in the output space. A standard approach [16] consists in measuring the distance between two elements of the output space by the Euclidean $L_2$-norm.\n\nFigure 1: (a) One image from the dinosaur sequence. Camera locations and scene points estimated by the blind $L_\infty$-cost minimization (b,c) and by the proposed “outlier aware” procedure (d,e).\n\nIn the structure and motion problem with more than two cameras, this leads to a hard non-convex optimization problem. A particularly elegant way of circumventing the non-convexity issues inherent in the use of the $L_2$-norm consists in replacing it by the $L_\infty$-norm [15, 18, 24, 25, 27, 13, 26]. It has been shown that, for a number of problems, $L_\infty$-norm based estimators can be computed very efficiently using, for example, the iterative bisection method [18, Algorithm 1, p. 1608] that solves a convex program at each iteration. There is, however, an issue with the $L_\infty$-techniques that dampens the enthusiasm of practitioners: they are highly sensitive to outliers (cf. Fig. 1). In fact, among all $L_q$-metrics with $q \ge 1$, the $L_\infty$-metric is the most seriously affected by outliers in the data. Two procedures have been introduced [27, 19] that make the $L_\infty$-estimator less sensitive to outliers. 
Although these procedures demonstrate satisfactory empirical performance, they suffer from a lack of sufficient theoretical support assessing the accuracy of the produced estimates.\n\nThe purpose of the present work is to introduce and to theoretically investigate a new procedure of estimation in the presence of noise and outliers. Our procedure combines the $L_\infty$-norm for measuring the fidelity to the data and the $L_1$-norm for regularization. It can be seen as a maximum a posteriori (MAP) estimator under uniformly distributed random noise and a sparsity-favoring prior on the vector of outliers. Interestingly, this study bridges the work on robust estimation in multiview geometry [12, 27, 19, 21] and the theory of sparse recovery in statistics and signal processing [10, 2, 5, 6].\n\nThe rest of the paper is organized as follows. The next section gives the precise formulation of the translation estimation and triangulation problem to which the presented methodology can be applied. A brief review of the $L_\infty$-norm minimization algorithm is presented in Section 3. In Section 4, we introduce the statistical framework and derive a new procedure as a MAP estimator. The main result on the accuracy of this procedure is stated and proved in Section 5, while Section 6 contains some numerical experiments. The methodology of our study is summarized in Section 7.\n\n2 Translation estimation and triangulation\n\nLet us start by presenting a problem of multiview geometry to which our approach can be successfully applied, namely the problem of translation estimation and triangulation in the case of known rotations. For rotation estimation algorithms, we refer the interested reader to [22, 14] and the references therein.\n\nLet $P^*_i$, $i = 1, \ldots, m$, be a sequence of $m$ cameras that are known up to a translation. 
Recall that a camera is characterized by a $3 \times 4$ matrix $P$ with real entries that can be written as $P = K[R|t]$, where $K$ is an invertible $3 \times 3$ matrix called the camera calibration matrix, $R$ is a $3 \times 3$ rotation matrix and $t \in \mathbb{R}^3$. We will refer to $t$ as the translation of the camera $P$. We can thus write $P^*_i = K_i[R_i|t^*_i]$, $i = 1, \ldots, m$. For a set of unknown scene points $U^*_j$, $j = 1, \ldots, n$, expressed in homogeneous coordinates (i.e., $U^*_j$ is an element of the projective space $\mathbb{P}^3$), we assume that noisy images of each $U^*_j$ by some cameras $P^*_i$ are observed. Thus, we have at our disposal the measurements\n\n$$x_{ij} = \Big[\frac{e_1^T P^*_i U^*_j}{e_3^T P^*_i U^*_j};\ \frac{e_2^T P^*_i U^*_j}{e_3^T P^*_i U^*_j}\Big]^T + \xi_{ij}, \qquad j = 1, \ldots, n, \quad i \in I_j, \qquad (1)$$\n\nwhere $e_\ell$, $\ell = 1, 2, 3$, stands for the unit vector of $\mathbb{R}^3$ having one as the $\ell$th coordinate and $I_j$ is the set of indices of the cameras for which the point $U^*_j$ is visible. We assume that the set $\{U^*_j\}$ does not contain points at infinity: $U^*_j = [X^{*T}_j|1]^T$ for some $X^*_j \in \mathbb{R}^3$ and for every $j = 1, \ldots, n$.\n\nWe are now in a position to state the problem of translation estimation and triangulation in the context of multiview geometry. It consists in recovering the 3-vectors $\{t^*_i\}$ (translation estimation) and the 3D points $\{X^*_j\}$ (triangulation) from the noisy measurements $\{x_{ij};\ j = 1, \ldots, n;\ i \in I_j\} \subset \mathbb{R}^2$. In what follows, we use the notation $\theta^* = (t^{*T}_1, \ldots, t^{*T}_m, X^{*T}_1, \ldots, X^{*T}_n)^T \in \mathbb{R}^{3(m+n)}$. Thus, we are interested in estimating $\theta^*$.\n\nRemark 1 (Cheirality). It should be noted right away that if the point $U^*_j$ is in front of the camera $P^*_i$, then $e_3^T P^*_i U^*_j \ge 0$. This is termed the cheirality condition. Furthermore, we will assume that none of the true 3D points $U^*_j$ lies on the principal plane of a camera $P^*_i$. This assumption implies that $e_3^T P^*_i U^*_j > 0$, so that the quotients $e_\ell^T P^*_i U^*_j / e_3^T P^*_i U^*_j$, $\ell = 1, 2$, are well defined.\n\nRemark 2 (Identifiability). The parameter $\theta$ we have just defined is, in general, not identifiable from the measurements $\{x_{ij}\}$. In fact, one easily checks that, for every $\alpha \ne 0$ and for every $t \in \mathbb{R}^3$, the parameters $\{t^*_i, X^*_j\}$ and $\{\alpha(t^*_i - R_i t), \alpha(X^*_j + t)\}$ generate the same measurements. To cope with this issue, we assume that $t^*_1 = 0_3$ and that $\min_{i,j} e_3^T P^*_i U^*_j = 1$. Thus, in what follows we assume that $t^*_1$ is removed from $\theta^*$ and that $\theta^* \in \mathbb{R}^{3(m+n-1)}$. Further assumptions ensuring the identifiability of $\theta^*$ are given below.\n\n3 Estimation by Sequential Convex Programming\n\nThis section presents results on the estimation of $\theta$ based on reprojection error (RE) minimization. This material is essential for understanding the results that are at the core of the present work. In what follows, for every $s \ge 1$, we denote by $\|x\|_s$ the $L_s$-norm of a vector $x$, i.e., $\|x\|_s^s = \sum_j |x_j|^s$ if $x = (x_1, \ldots, x_d)^T$. As usual, we extend this to $s = +\infty$ by setting $\|x\|_\infty = \max_j |x_j|$.\n\nA classical method [16] for estimating the parameter $\theta$ is based on minimizing the sum of the squared REs. This defines the estimator $\hat\theta$ as a minimizer of the cost function $C_{2,2}(\theta) = \sum_{i,j} \|x_{ij} - x_{ij}(\theta)\|_2^2$, where $x_{ij}(\theta) := [e_1^T P_i U_j;\ e_2^T P_i U_j]^T / e_3^T P_i U_j$ is the 2-vector that we would obtain if $\theta$ were the true parameter. 
It can also be written as\n\n$$x_{ij}(\theta) = \Big[\frac{e_1^T K_i(R_i X_j + t_i)}{e_3^T K_i(R_i X_j + t_i)};\ \frac{e_2^T K_i(R_i X_j + t_i)}{e_3^T K_i(R_i X_j + t_i)}\Big]^T. \qquad (2)$$\n\nThe minimization of $C_{2,2}$ is a hard nonconvex problem. In general, it does not admit a closed-form solution and the existing iterative algorithms may often get stuck in local minima. An ingenious idea to overcome this difficulty [15, 17] is based on the minimization of the $L_\infty$ cost function\n\n$$C_{\infty,s}(\theta) = \max_{j=1,\ldots,n} \max_{i \in I_j} \|x_{ij} - x_{ij}(\theta)\|_s, \qquad s \in [1, +\infty]. \qquad (3)$$\n\nNote that the substitution of the $L_2$-cost function by the $L_\infty$-cost function has been proved to lead to improved algorithms in other estimation problems as well, cf., e.g., [8]. This cost function has a clear practical advantage in that all its sublevel sets are convex. This property ensures that all minima of $C_{\infty,s}$ form a convex set and that an element of this set can be computed by solving a sequence of convex programs [18], e.g., by the bisection algorithm. Note that for $s = 1$ and $s = +\infty$, the minimization of $C_{\infty,s}$ can be recast as a sequence of LPs. The main idea behind the bisection algorithm can be summarized as follows. We aim to design an algorithm computing $\hat\theta_s \in \arg\min_\theta C_{\infty,s}(\theta)$, for any prespecified $s \ge 1$, over the set of all vectors $\theta$ satisfying the cheirality condition. Let us introduce the residuals $r_{ij}(\theta) = x_{ij} - x_{ij}(\theta)$, which can be represented as\n\n$$r_{ij}(\theta) = \Big[\frac{a_{ij1}^T\theta}{c_{ij}^T\theta};\ \frac{a_{ij2}^T\theta}{c_{ij}^T\theta}\Big]^T \qquad (4)$$\n\nfor some vectors $a_{ij\ell}, c_{ij}$ computable from the data. Furthermore, as explained in Remark 2, the cheirality conditions imply the set of linear constraints $c_{ij}^T\theta \ge 1$. 
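The bisection idea just mentioned can be sketched generically: quasi-convexity of $C_{\infty,s}$ makes feasibility of the sublevel set monotone in $\gamma$, so the optimal value can be bracketed by repeated feasibility tests. A minimal sketch, with the convex feasibility test (an LP or SOCP in the actual problem) abstracted into a user-supplied callable `feasible`:

```python
def bisect_min(feasible, lo, hi, tol=1e-6):
    """Approximate the smallest gamma in [lo, hi] with feasible(gamma) True.

    Assumes feasibility is monotone in gamma, which holds when the sublevel
    sets of the cost (here C_infty,s) are nested convex sets.
    """
    assert feasible(hi), "upper bound must be feasible"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):      # in practice: solve an LP/SOCP feasibility test
            hi = mid
        else:
            lo = mid
    return hi

# Toy oracle standing in for the convex solver: pretend the optimal
# max-residual is 0.37 pixels.
gamma_star = bisect_min(lambda g: g >= 0.37, 0.0, 10.0)
print(round(gamma_star, 3))    # -> 0.37
```

The placeholder oracle is invented for illustration; in the paper's setting each call would solve a convex feasibility problem in $\theta$.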
Thus, the problem of computing $\hat\theta_s$ can be rewritten as\n\n$$\text{minimize } \gamma \quad \text{subject to} \quad \|r_{ij}(\theta)\|_s \le \gamma, \qquad c_{ij}^T\theta \ge 1. \qquad (5)$$\n\nNote that the inequality $\|r_{ij}(\theta)\|_s \le \gamma$ can be replaced by $\|A_{ij}^T\theta\|_s \le \gamma c_{ij}^T\theta$ with $A_{ij} = [a_{ij1}; a_{ij2}]$. Although (5) is not a convex problem, its solution can be well approximated by solving a sequence of convex feasibility problems.\n\n4 Robust estimation by linear programming\n\nThis and the next sections contain the main theoretical contribution of the present work. We start with the precise formulation of the statistical model. We then exhibit a prior distribution on the unknown parameters of the model that leads to a MAP estimator.\n\n4.1 The statistical model\n\nLet us first observe that, in view of (1) and (4), the model we are considering can be rewritten as\n\n$$\Big[\frac{a_{ij1}^T\theta^*}{c_{ij}^T\theta^*};\ \frac{a_{ij2}^T\theta^*}{c_{ij}^T\theta^*}\Big]^T = \xi_{ij}, \qquad j = 1, \ldots, n;\ i \in I_j. \qquad (6)$$\n\nLet $N = 2\sum_{j=1}^n |I_j|$ be the total number of measurements and let $M = 3(n+m-1)$ be the size of the vector $\theta^*$. Let us denote by $A$ (resp. $C$) the $M \times N$ matrix formed by the concatenation of the column-vectors $a_{ij\ell}$ (resp. $c_{ij}$)¹. Similarly, let us denote by $\xi$ the $N$-vector formed by concatenating the vectors $\xi_{ij}$. In this notation, Eq. (6) is equivalent to $a_p^T\theta^* = (c_p^T\theta^*)\xi_p$, $p = 1, \ldots, N$. This equation defines the statistical model in the case where there is no outlier. 
To extend this model to cover the situation where some outliers are present in the measurements, we introduce the vector $\omega^* \in \mathbb{R}^N$ defined by $\omega^*_p = a_p^T\theta^* - (c_p^T\theta^*)\xi_p$, so that $\omega^*_p = 0$ if the $p$th measurement is an inlier and $|\omega^*_p| > 0$ otherwise. This leads us to the model\n\n$$A^T\theta^* = \omega^* + \mathrm{diag}(C^T\theta^*)\,\xi, \qquad (7)$$\n\nwhere $\mathrm{diag}(v)$ stands for the diagonal matrix having the components of $v$ as diagonal entries.\n\nStatement of the problem: Given the matrices $A$ and $C$, estimate the parameter vector $\beta^* = [\theta^{*T}; \omega^{*T}]^T$ based on the following prior information:\nC1: Eq. (7) holds with some small noise vector $\xi$,\nC2: $\min_p c_p^T\theta^* = 1$,\nC3: $\omega^*$ is sparse, i.e., only a small number of coordinates of $\omega^*$ are different from zero.\n\n4.2 Sparsity prior and MAP estimator\n\nTo derive an estimator of the parameter $\beta^*$, we place ourselves in the Bayesian framework. To this end, we impose a probabilistic structure on the noise vector $\xi$ and introduce a prior distribution on the unknown vector $\beta$.\n\nSince the noise $\xi$ represents the difference (in pixels) between the measurements and the true image points, it is naturally bounded and, generally, does not exceed the level of a few pixels. Therefore, it is reasonable to assume that the components of $\xi$ are uniformly distributed in some compact set of $\mathbb{R}^2$, centered at the origin. We assume in what follows that the subvectors $\xi_{ij}$ of $\xi$ are uniformly distributed in the square $[-\sigma, \sigma]^2$ and are mutually independent. Note that this implies that all the coordinates of $\xi$ are independent. In practice, this assumption can be enforced by decorrelating the measurements using the empirical covariance matrix [20]. 
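To make the model concrete, here is a toy simulation of (7): synthetic matrices $A$ and $C$ (random, not derived from real cameras), bounded uniform noise, and a sparse outlier vector. All sizes, seeds, and magnitudes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 4, 30, 0.5          # toy sizes; all values are synthetic

# Parameter theta* normalized so that min_p c_p^T theta* = 1 (condition C2).
theta = rng.uniform(0.1, 1.0, size=M)
C = rng.uniform(0.1, 1.0, size=(M, N))
theta /= (C.T @ theta).min()
depth = C.T @ theta               # the "depths" c_p^T theta*, all >= 1

# Bounded uniform noise and a sparse outlier vector omega* (condition C3).
xi = rng.uniform(-sigma, sigma, size=N)
omega = np.zeros(N)
omega[:3] = rng.choice([-1.0, 1.0], 3) * rng.uniform(5.0, 10.0, 3)

# Build A consistent with model (7): A^T theta* = omega* + diag(C^T theta*) xi.
B = rng.normal(size=(M, N))
A = B + np.outer(theta, omega + depth * xi - B.T @ theta) / (theta @ theta)

assert np.allclose(A.T @ theta, omega + depth * xi)   # Eq. (7) holds exactly
print(np.count_nonzero(omega), "outliers among", N, "measurements")
```

Running it prints `3 outliers among 30 measurements`; the estimation problem of the next subsections is to recover `theta` and the support of `omega` from `A`, `C`, and `sigma` alone.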
We define the prior on $\theta$ as the uniform distribution on the polytope $P = \{\theta \in \mathbb{R}^M : C^T\theta \ge 1\}$, where the inequality is understood componentwise. The density of this distribution is $p_1(\theta) \propto 1_P(\theta)$, where $\propto$ stands for the proportionality relation and $1_P(\theta) = 1$ if $\theta \in P$ and $0$ otherwise. When $P$ is unbounded, this results in an improper prior, which is, however, not a problem for defining the Bayes estimator.\n\nThe task of choosing a prior on $\omega$ is more delicate in that it should reflect the information that $\omega$ is sparse. The most natural prior would be one having a density which is a decreasing function of the $L_0$-norm of $\omega$, i.e., of the number of its nonzero coefficients. However, the computation of estimators based on this type of prior is NP-hard. An approach for overcoming this difficulty relies on using the $L_1$-norm instead of the $L_0$-norm. Following this idea, we define the prior distribution on $\omega$ by the probability density $p_2(\omega) \propto f(\|\omega\|_1)$, where $f$ is some decreasing function² defined on $[0, \infty)$. Assuming in addition that $\theta$ and $\omega$ are independent, we get the following prior on $\beta$:\n\n$$\pi(\beta) = \pi(\theta; \omega) \propto 1_P(\theta) \cdot f(\|\omega\|_1). \qquad (8)$$\n\n¹To get a matrix of the same size as $A$, in the matrix $C$ each column is duplicated two times.\n²The most common choice is $f(x) = e^{-x}$, corresponding to the multivariate Laplace density.\n\nTheorem 1. 
Assume that the noise $\xi$ has independent entries which are uniformly distributed in $[-\sigma, \sigma]$ for some $\sigma > 0$. Then the MAP estimator $\hat\beta = [\hat\theta^T; \hat\omega^T]^T$ based on the prior $\pi$ defined by Eq. (8) is the solution of the optimization problem:\n\n$$\text{minimize } \|\omega\|_1 \quad \text{subject to} \quad |a_p^T\theta - \omega_p| \le \sigma c_p^T\theta,\ \forall p; \qquad c_p^T\theta \ge 1,\ \forall p. \qquad (9)$$\n\nThe proof of this theorem is a simple exercise and is left to the reader.\n\nRemark 3 (Condition C2). One easily checks that any solution of (9) satisfies condition C2. Indeed, if for some solution $\hat\beta$ it were not the case, then $\tilde\beta = \hat\beta / \min_p c_p^T\hat\theta$ would satisfy the constraints of (9) and $\tilde\omega$ would have a smaller $L_1$-norm than that of $\hat\omega$, which is in contradiction with the fact that $\hat\beta$ solves (9).\n\nRemark 4 (The role of $\sigma$). In the definition of $\hat\beta$, $\sigma$ is a free parameter that can be interpreted as the level of separation of inliers from outliers. The proposed algorithm implicitly assumes that all the measurements $x_{ij}$ for which $\|\xi_{ij}\|_\infty > \sigma$ are outliers, while all the others are treated as inliers. If $\sigma$ is unknown, a reasonable way of acting is to impose a prior distribution on the possible values of $\sigma$ and to define the estimator $\hat\beta$ as a MAP estimator based on the prior incorporating the uncertainty on $\sigma$. When there are no outliers and the prior on $\sigma$ is decreasing, this approach leads to the estimator minimizing the $L_\infty$ cost function. In the presence of outliers, the shape of the prior on $\sigma$ becomes more important for the definition of the estimator. 
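Problem (9) becomes an LP after the usual splitting $|\omega_p| \le s_p$. The following is a minimal sketch with `scipy.optimize.linprog` on a small synthetic instance (sizes, seed, and outlier magnitudes are invented; the paper's own experiments use MatLab/SeDuMi instead). Since $\beta^*$ is feasible for (9), the optimal $L_1$ value can never exceed $\|\omega^*\|_1$, which the final check confirms:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M, N, sigma = 3, 20, 0.5                     # toy sizes; synthetic data

# Synthetic instance of model (7) with a sparse outlier vector omega*.
theta_star = rng.uniform(0.1, 1.0, size=M)
C = rng.uniform(0.1, 1.0, size=(M, N))
theta_star /= (C.T @ theta_star).min()       # enforce min_p c_p^T theta* = 1
depth = C.T @ theta_star
xi = rng.uniform(-sigma, sigma, size=N)
omega_star = np.zeros(N)
omega_star[:2] = [8.0, -6.0]                 # two gross outliers
B = rng.normal(size=(M, N))
A = B + np.outer(theta_star, omega_star + depth * xi
                 - B.T @ theta_star) / (theta_star @ theta_star)

# Variables z = [theta (M), omega (N), s (N)]; minimize sum(s) subject to
#   a_p^T theta - omega_p <= sigma c_p^T theta,
#  -a_p^T theta + omega_p <= sigma c_p^T theta,
#   omega_p <= s_p,  -omega_p <= s_p,   c_p^T theta >= 1.
I = np.eye(N)
Z = np.zeros((N, N))
A_ub = np.block([
    [ A.T - sigma * C.T, -I, Z],
    [-A.T - sigma * C.T,  I, Z],
    [np.zeros((N, M)),    I, -I],
    [np.zeros((N, M)),   -I, -I],
    [-C.T,                Z,  Z],
])
b_ub = np.concatenate([np.zeros(4 * N), -np.ones(N)])
cost = np.concatenate([np.zeros(M + N), np.ones(N)])
bounds = [(None, None)] * (M + N) + [(0, None)] * N   # theta, omega free; s >= 0

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.status, res.fun <= np.abs(omega_star).sum() + 1e-6)   # -> 0 True
```

Note that `bounds` must be passed explicitly, since `linprog` defaults to nonnegative variables, while $\theta$ and $\omega$ are free here.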
This is an interesting point for future investigation.\n\n4.3 Two-step procedure\n\nBuilding on the previous arguments, we introduce the following two-step algorithm.\n\nInput: $\{a_p, c_p;\ p = 1, \ldots, N\}$ and $\sigma$.\nStep 1: Compute $[\hat\theta^T; \hat\omega^T]^T$ as a solution to (9) and set $J = \{p : \hat\omega_p = 0\}$.\nStep 2: Apply the bisection algorithm to the reduced data set $\{x_p;\ p \in J\}$.\n\nTwo observations are in order. First, when applying the bisection algorithm at Step 2, we can use $C_{\infty,s}(\hat\theta)$ as the initial value of $\gamma_u$. The second observation is that a better way of acting would be to minimize the weighted $L_1$-norm of $\omega$, where the weight assigned to $\omega_p$ is inversely proportional to the depth $c_p^T\theta^*$. Since $\theta^*$ is unknown, a reasonable strategy consists in adding a step between Step 1 and Step 2, which performs the weighted minimization with weights $\{(c_p^T\hat\theta)^{-1};\ p = 1, \ldots, N\}$.\n\n5 Accuracy of estimation\n\nLet us introduce some additional notation. Recall the definition of $P$ and set $\partial P = \{\theta : \min_p c_p^T\theta = 1\}$ and $\Delta P^* = \{\theta - \theta' : \theta, \theta' \in \partial P,\ \theta \ne \theta'\}$. For every subset of indices $J \subset \{1, \ldots, N\}$, we denote by $A_J$ the $M \times N$ matrix obtained from $A$ by replacing the columns that have an index outside $J$ by zero. Furthermore, let us define\n\n$$\delta_J(\theta) = \sup_{\theta' \in \partial P,\ A^T\theta' \ne A^T\theta} \frac{\|A_J^T(\theta' - \theta)\|_2}{\|A^T(\theta' - \theta)\|_2}, \qquad \forall J \subset \{1, \ldots, N\},\ \forall \theta \in \partial P. \qquad (10)$$\n\nOne easily checks that $\delta_J \in [0, 1]$ and $\delta_J \le \delta_{J'}$ if $J \subset J'$.\n\nAssumption A: The real number $\lambda$ defined by $\lambda = \min_{g \in \Delta P^*} \|A^Tg\|_2 / \|g\|_2$ is strictly positive.\n\nAssumption A is necessary for identifying the parameter vector $\theta^*$ even in the case without outliers. In fact, if $\omega^* = 0$ and if Assumption A is not fulfilled, then³ there exists $g \in \Delta P^*$ such that $A^Tg = 0$. That is, given the matrices $A$ and $C$, there are two distinct vectors $\theta_1$ and $\theta_2$ in $\partial P$ such that $A^T\theta_1 = A^T\theta_2$. Therefore, if $\theta_1$ happens to be the true parameter vector satisfying C1 and C3, then $\theta_2$ satisfies these conditions as well. As a consequence, the true vector cannot be accurately estimated.\n\n³We assume for simplicity that $\partial P$ is compact.\n\n5.1 The noise free case\n\nTo evaluate the quality of estimation, we first place ourselves in the case where $\sigma = 0$. The estimator $\hat\beta$ of $\beta^*$ is then defined as a solution to the optimization problem\n\n$$\min \|\omega\|_1 \quad \text{over } \beta = [\theta; \omega] \quad \text{s.t. } A^T\theta = \omega, \quad C^T\theta \ge 1. \qquad (11)$$\n\nFrom now on, for every index set $T$ and for every vector $h$, $h_T$ stands for the vector equal to $h$ on the index set $T$ and zero elsewhere. The complementary set of $T$ will be denoted by $T^c$.\n\nTheorem 2. Let Assumption A be fulfilled and let $T_0$ (resp. $T_1$) denote the index set corresponding to the locations of the $S$ largest entries⁴ of $\omega^*$ (resp. $(\omega^* - \hat\omega)_{T_0^c}$). If $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) < 1$ then, for some constant $C_0$, it holds:\n\n$$\|\hat\beta - \beta^*\|_2 \le C_0 \|\omega^* - \omega^*_S\|_1, \qquad (12)$$\n\nwhere $\omega^*_S$ stands for the vector $\omega^*$ with all but the $S$ largest entries set to zero. In particular, if $\omega^*$ has no more than $S$ nonzero entries, then the estimation is exact: $\hat\beta = \beta^*$.\n\n⁴in absolute value\n\nProof. We set $h = \omega^* - \hat\omega$ and $g = \theta^* - \hat\theta$. It follows from Remark 3 that $g \in \Delta P$. To proceed with the proof, we need the following auxiliary result, the proof of which can be easily deduced from [4].\n\nLemma 1. Let $v \in \mathbb{R}^d$ be some vector and let $S \le d$ be a positive integer. If we denote by $T$ the indices of the $S$ largest entries of the vector $|v|$, then $\|v_{T^c}\|_2 \le S^{-1/2}\|v\|_1$.\n\nApplying Lemma 1 to the vector $v = h_{T_0^c}$ and to the index set $T = T_1$, we get\n\n$$\|h_{(T_0 \cup T_1)^c}\|_2 \le S^{-1/2}\|h_{T_0^c}\|_1. \qquad (13)$$\n\nOn the other hand, since $\beta^*$ satisfies the constraints of the optimization problem (11), a solution of which is $\hat\beta$, we have $\|\hat\omega\|_1 \le \|\omega^*\|_1$. Summing up the inequalities $\|h_{T_0^c}\|_1 \le \|(\omega^* - h)_{T_0^c}\|_1 + \|\omega^*_{T_0^c}\|_1$ and $\|\omega^*_{T_0}\|_1 \le \|(\omega^* - h)_{T_0}\|_1 + \|h_{T_0}\|_1$, and using the relation $\|(\omega^* - h)_{T_0}\|_1 + \|(\omega^* - h)_{T_0^c}\|_1 = \|\omega^* - h\|_1 = \|\hat\omega\|_1 \le \|\omega^*\|_1$, we get $\|h_{T_0^c}\|_1 \le 2\|\omega^*_{T_0^c}\|_1 + \|h_{T_0}\|_1$. This inequality, in conjunction with (13), implies\n\n$$\|h_{(T_0 \cup T_1)^c}\|_2 \le S^{-1/2}\|h_{T_0}\|_1 + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1 \le \|h_{T_0}\|_2 + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1, \qquad (14)$$\n\nwhere the last step follows from the Cauchy-Schwarz inequality. Therefore,\n\n$$\|h\|_2 \le \|h_{T_0 \cup T_1}\|_2 + \|h_{(T_0 \cup T_1)^c}\|_2 \le \|h_{T_0 \cup T_1}\|_2 + \|h_{T_0}\|_2 + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1. \qquad (15)$$\n\nUsing once again the fact that both $\hat\beta$ and $\beta^*$ satisfy the constraints of (11), we get $h = A^Tg$. Therefore, writing $\delta_S = \delta_{T_0}(\theta^*)$ and $\delta_{2S} = \delta_{T_0 \cup T_1}(\theta^*)$ for short,\n\n$$\|h_{T_0 \cup T_1}\|_2 + \|h_{T_0}\|_2 = \|A_{T_0 \cup T_1}^Tg\|_2 + \|A_{T_0}^Tg\|_2 \le (\delta_{2S} + \delta_S)\|A^Tg\|_2 = (\delta_{2S} + \delta_S)\|h\|_2.$$\n\nSince $\omega^*_{T_0^c} = \omega^* - \omega^*_S$, the last inequality combined with (15) yields\n\n$$\|h\|_2 \le \big(2S^{-1/2}/(1 - \delta_S - \delta_{2S})\big)\|\omega^* - \omega^*_S\|_1. \qquad (16)$$\n\nTo complete the proof, it suffices to observe that\n\n$$\|\hat\beta - \beta^*\|_2 \le \|g\|_2 + \|h\|_2 \le \lambda^{-1}\|A^Tg\|_2 + \|h\|_2 = (\lambda^{-1} + 1)\|h\|_2 \le C_0\|\omega^* - \omega^*_S\|_1.$$\n\nRemark 5. The assumption $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) < 1$ is close in spirit to the restricted isometry assumption (cf., e.g., [10, 6, 3] and the references therein). It is very likely that results similar to that of Theorem 2 hold under other kinds of assumptions recently introduced in the theory of $L_1$-minimization [11, 29, 2]. This investigation is left for future research.\n\nWe emphasize that the constant $C_0$ is rather small. For example, if $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) = 0.5$, then $\max(\|\hat\omega - \omega^*\|_2, \|A^T(\hat\theta - \theta^*)\|_2) \le (4/\sqrt{S})\|\omega^* - \omega^*_S\|_1$.\n\n5.2 The noisy case\n\nThe assumption $\sigma = 0$ is an idealization of reality that has the advantage of simplifying the mathematical derivations. While such a simplified setting is useful for conveying the main ideas behind the proposed methodology, it is of major practical importance to discuss the extensions to the more realistic noisy model. To this end, we introduce the vector $\hat\xi$ of estimated residuals satisfying $A^T\hat\theta = \hat\omega + \mathrm{diag}(C^T\hat\theta)\hat\xi$ and $\|\hat\xi\|_\infty \le \sigma$.\n\nTheorem 3. Let the assumptions of Theorem 2 be fulfilled. If for some $\epsilon > 0$ we have $\max(\|\mathrm{diag}(C^T\hat\theta)\hat\xi\|_2;\ \|\mathrm{diag}(C^T\theta^*)\xi\|_2) \le \epsilon$, then\n\n$$\|\hat\beta - \beta^*\|_2 \le C_0\|\omega^* - \omega^*_S\|_1 + C_1\epsilon, \qquad (17)$$\n\nwhere $C_0$ and $C_1$ are some constants.\n\nProof. 
Let us define $\eta = \mathrm{diag}(C^T\theta^*)\xi$ and $\hat\eta = \mathrm{diag}(C^T\hat\theta)\hat\xi$. On the one hand, in view of (15), we have $\|h_{(T_0 \cup T_1)^c}\|_2 \le \|h_{T_0}\|_2 + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1$ with $h = \omega^* - \hat\omega$. On the other hand, since $h = A^Tg + \hat\eta - \eta$, we have\n\n$$\|h_{(T_0 \cup T_1)^c}\|_2 \ge \|A_{(T_0 \cup T_1)^c}^Tg\|_2 - \|\hat\eta_{(T_0 \cup T_1)^c}\|_2 - \|\eta_{(T_0 \cup T_1)^c}\|_2 \ge \|A_{(T_0 \cup T_1)^c}^Tg\|_2 - 2\epsilon$$\n\nand $\|h_{T_0}\|_2 \le \|A_{T_0}^Tg\|_2 + \|\hat\eta_{T_0}\|_2 + \|\eta_{T_0}\|_2 \le \|A_{T_0}^Tg\|_2 + 2\epsilon$. These inequalities imply that\n\n$$\|A^Tg\|_2 \le \|A_{T_0 \cup T_1}^Tg\|_2 + \|A_{T_0}^Tg\|_2 + 4\epsilon + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1 \le (\delta_{T_0 \cup T_1} + \delta_{T_0})\|A^Tg\|_2 + 4\epsilon + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1.$$\n\nTo complete the proof, it suffices to remark that\n\n$$\|\hat\beta - \beta^*\|_2 \le \|h\|_2 + \|g\|_2 \le \|A^Tg\|_2 + \|g\|_2 + 2\epsilon \le (1 + \lambda^{-1})\|A^Tg\|_2 + 2\epsilon \le \frac{1 + \lambda^{-1}}{1 - \delta_{T_0 \cup T_1} - \delta_{T_0}}\big(4\epsilon + 2S^{-1/2}\|\omega^*_{T_0^c}\|_1\big) + 2\epsilon.$$\n\n5.3 Discussion\n\nThe main assumption in Theorems 2 and 3 is that $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) < 1$. While this assumption is by no means necessary, it should be recognized that it cannot be significantly relaxed. 
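Lemma 1, the pivotal inequality in the proofs above, is elementary and easy to sanity-check numerically; a small sketch (the test vectors are random and invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def lemma1_holds(v, S):
    """Check ||v_{T^c}||_2 <= S^{-1/2} ||v||_1, T = indices of the S largest |v_i|."""
    T = np.argsort(np.abs(v))[-S:]          # indices of the S largest entries
    v_Tc = np.delete(v, T)                  # v restricted to the complement of T
    return np.linalg.norm(v_Tc) <= np.linalg.norm(v, 1) / np.sqrt(S) + 1e-12

checks = [lemma1_holds(rng.normal(size=50), S) for S in (1, 5, 20)]
print(all(checks))   # -> True
```

The inequality holds for every vector, since each entry outside $T$ is at most $\|v\|_1/S$, so the check never fails regardless of the seed.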
In fact, the condition $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) < 1$ is necessary for $\theta^*$ to be consistently estimated. Indeed, if $\delta_{T_0}(\theta^*) + \delta_{T_0 \cup T_1}(\theta^*) = 1$, then it is possible to find $\theta' \in \partial P$ such that $A^T\theta^* = A^T\theta'$, which makes the problem of robust estimation ill-posed, since both $\theta^*$ and $\theta'$ satisfy (7) with the same number of outliers.\n\nNote also that the mapping $J \mapsto \delta_J(\theta)$ is subadditive, that is, $\delta_{J \cup J'}(\theta) \le \delta_J(\theta) + \delta_{J'}(\theta)$. Therefore, the condition of Thm. 2 is fulfilled as soon as $\delta_J(\theta^*) < 1/3$ for every index set $J$ of cardinality $\le S$. Thus, the condition $\max_{J : |J| \le S} \delta_J(\theta^*) < 1/3$ is sufficient for identifying $\theta^*$ in the presence of $S$ outliers, while $\max_{J : |J| \le S} \delta_J(\theta^*) < 1$ is necessary.\n\nA simple upper bound on $\delta_J$, obtained by replacing the sup over $\partial P$ by the sup over $\mathbb{R}^M$, is $\delta_J(\theta) \le \|O_J^T\|$, $\forall \theta \in \partial P$, where $O = O(A)$ stands for the $\mathrm{Rank}(A) \times N$ matrix with orthonormal rows spanning the image of $A^T$. The matrix norm is understood as the largest singular value. Note that for a given $J$, the computation of $\|O_J^T\|$ is far easier than that of $\delta_J(\theta)$.\n\nWe emphasize that the model we have investigated comprises the robust linear model as a particular case. Indeed, if the last row of the matrix $A$ is equal to zero, as well as all the rows of $C$ except the last one, which has all its entries equal to one, then the model described by (7) is nothing else but a linear model with unknown noise variance.\n\nTo close this section, let us stress that other approaches (cf., for instance, [9, 7, 1]) recently introduced in sparse learning and estimation may potentially be useful for the problem of robust estimation.\n\n6 Numerical illustration\n\nWe implemented the algorithm in MatLab, using the SeDuMi package for solving LPs [28]. We applied our algorithm of robust estimation to the well-known dinosaur sequence⁵, 
which consists of 36 images of a dinosaur on a turntable, see Fig. 1(a) for one example; the data set is available at http://www.robots.ox.ac.uk/~vgg/data1.html. The 2D image points that are tracked across the image sequence and the projection matrices of the 36 cameras are provided as well. There are 16,432 image points corresponding to 4,983 scene points. This data set is severely affected by outliers, which results in a very poor accuracy of the "blind" L∞-cost minimization procedure: its maximal RE equals 63 pixels and, as shown in Fig. 1, the estimated camera centers are not on the same plane and the scatter plot of the scene points is inaccurate.

Figure 2: (a)-(c) Overhead view of the scene points estimated by the KK-procedure (a), by the SH-procedure (b) and by our procedure (c). (d) Boxplots of the errors when estimating the camera centers by our procedure (left) and by the KK-procedure. (e) Boxplots of the errors when estimating the camera centers by our procedure (left) and by the SH-procedure.

We ran our procedure with σ = 0.5 pixel. If, for the pth measurement, |ω_p / c_p^T θ| was larger than σ/4, then it was considered an outlier and removed from the data set. The corresponding 3D scene point was also removed if, after the step of outlier removal, it was seen by only one camera. This resulted in removing 1,306 image points and 297 scene points. The plots (d) and (e) of Fig. 1 show the estimated camera centers and estimated scene points. We see, in particular, that the camera centers are almost coplanar. Note that in this example the second step of the procedure described in Section 4.3 does not improve on the estimator computed at the first step. Thus, an accurate estimate is obtained by solving only one linear program.

We compared our procedure with the procedures proposed by Sim and Hartley [27], hereafter referred to as the SH-procedure, and by Kanade and Ke [19], hereafter the KK-procedure.
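Before turning to the comparison, the thresholding rule used for outlier removal above can be sketched in a few lines. The snippet is in Python/NumPy rather than MatLab, and the argument names (omega, C, theta) are illustrative; C stacks the vectors c_p as its columns so that C^T θ has one entry c_p^T θ per measurement.

```python
import numpy as np

def flag_outliers(omega, C, theta, sigma=0.5):
    """Flag the p-th measurement as an outlier when |omega_p / (c_p^T theta)|
    exceeds sigma / 4, the rule applied to the dinosaur sequence."""
    depths = C.T @ theta                 # entries c_p^T theta, one per measurement
    return np.abs(omega / depths) > sigma / 4.0
```

Measurements flagged True are removed; a 3D scene point is then dropped as well if it remains visible in only one camera.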
For the SH-procedure, we iteratively computed the L∞-cost minimizer by removing, at each step j, the measurements that had an RE larger than E_max,j − 0.5ε, where E_max,j was the largest RE. We stopped the SH-procedure when the number of removed measurements exceeded 1,500. This number was attained after 53 cycles. Therefore, the execution time was approximately 50 times larger than for our procedure. The estimator obtained by the SH-procedure has a maximal RE equal to 1.33 pixels, whereas the maximal RE for our estimator is 0.62 pixel. Concerning the KK-procedure, we ran it with the parameter value m = N − N_O = 15,000, which is approximately the number of inliers detected by our method. Recall that the KK-procedure aims at minimizing the mth largest RE. As shown in Fig. 2, our procedure performs better than that of [19].

7 Conclusion

In this paper, we presented a rigorous Bayesian framework for the problem of translation estimation and triangulation, which has led to a new robust estimation procedure. We formulated the problem under consideration as a nonlinear inverse problem with a high-dimensional unknown parameter vector. This parameter vector encapsulates the information on the scene points and the camera locations, as well as the information on the location of outliers in the data. The proposed estimator exploits the sparse nature of the vector of outliers through L1-norm minimization. We have given a mathematical proof of the result demonstrating the efficiency of the proposed estimator under mild assumptions. Real data analysis conducted on the dinosaur sequence supports our theoretical results.

Acknowledgments

The work of the first author was partially supported by ANR under grants Callisto and Parcimonie.

References

[1] F. Bach. Bolasso: model consistent Lasso estimation through the bootstrap.
In Twenty-fifth International Conference on Machine Learning (ICML), 2008.
[2] P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of lasso and Dantzig selector. Ann. Statist., 37(4):1705-1732, 2009.
[3] E. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist., 35(6):2313-2351, 2007.
[4] E. J. Candès. The restricted isometry property and its implications for compressed sensing. C. R. Math. Acad. Sci. Paris, 346(9-10):589-592, 2008.
[5] E. J. Candès and P. A. Randall. Highly robust error correction by convex programming. IEEE Trans. Inform. Theory, 54(7):2829-2840, 2008.
[6] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207-1223, 2006.
[7] C. Chesneau and M. Hebiri. Some theoretical results on the grouped variables Lasso. Math. Methods Statist., 17(4):317-326, 2008.
[8] A. S. Dalalyan, A. Juditsky, and V. Spokoiny. A new algorithm for estimating the effective dimension-reduction subspace. Journal of Machine Learning Research, 9:1647-1678, Aug. 2008.
[9] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning, 72(1-2):39-61, 2008.
[10] D. Donoho, M. Elad, and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory, 52(1):6-18, 2006.
[11] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory, 47(7):2845-2862, 2001.
[12] O. Enqvist and F. Kahl. Robust optimal pose estimation. In ECCV, pages I: 141-153, 2008.
[13] R. Hartley and F. Kahl. Optimal algorithms in multiview geometry. In ACCV, volume 1, pages 13-34, Nov. 2007.
[14] R. Hartley and F. Kahl. Global optimization through rotation space search. IJCV, 2009.
[15] R. I. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. In CVPR (1), pages 504-509, 2004.
[16] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, June 2004.
[17] F. Kahl. Multiple view geometry and the L∞-norm. In ICCV, pages 1002-1009. IEEE Computer Society, 2005.
[18] F. Kahl and R. I. Hartley. Multiple-view geometry under the L∞ norm. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(9):1603-1617, Sep. 2008.
[19] T. Kanade and Q. Ke. Quasiconvex optimization for robust geometric reconstruction. In ICCV, pages II: 986-993, 2005.
[20] Q. Ke and T. Kanade. Uncertainty models in quasiconvex optimization for geometric reconstruction. In CVPR, pages I: 1199-1205, 2006.
[21] H. D. Li. A practical algorithm for L∞ triangulation with outliers. In CVPR, pages 1-8, 2007.
[22] D. Martinec and T. Pajdla. Robust rotation and translation estimation in multiview reconstruction. In CVPR, pages 1-8, 2007.
[23] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell., 26(6):756-777, 2004.
[24] C. Olsson, A. P. Eriksson, and F. Kahl. Efficient optimization for L∞ problems using pseudoconvexity. In ICCV, pages 1-8, 2007.
[25] Y. D. Seo and R. I. Hartley. A fast method to minimize L∞ error norm for geometric vision problems. In ICCV, pages 1-8, 2007.
[26] Y. D. Seo, H. J. Lee, and S. W. Lee. Sparse structures in L-infinity norm minimization for structure and motion reconstruction. In ECCV, pages I: 780-793, 2008.
[27] K. Sim and R. Hartley. Removing outliers using the L∞ norm. In CVPR, pages I: 485-494, 2006.
[28] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 11/12(1-4):625-653, 1999.
[29] P. Zhao and B. Yu. On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541-2563, 2006.