{"title": "Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing", "book": "Advances in Neural Information Processing Systems", "page_first": 1545, "page_last": 1553, "abstract": "The replica method is a non-rigorous but widely-used technique from statistical physics used in the asymptotic analysis of many large random nonlinear problems. This paper applies the replica method to non-Gaussian MAP estimation. It is shown that with large random linear measurements and Gaussian noise, the asymptotic behavior of the MAP estimate of an n-dimensional vector ``decouples as n scalar MAP estimators. The result is a counterpart to Guo and Verdus replica analysis on MMSE estimation. The replica MAP analysis can be readily applied to many estimators used in compressed sensing, including basis pursuit, lasso, linear estimation with thresholding and zero-norm estimation. In the case of lasso estimation, the scalar estimator reduces to a soft-thresholding operator and for zero-norm estimation it reduces to a hard-threshold. Among other benefits, the replica method provides a computationally tractable method for exactly computing various performance metrics including MSE and sparsity recovery.", "full_text": "Asymptotic Analysis of MAP Estimation via the\n\nReplica Method and Compressed Sensing\u2217\n\nSundeep Rangan\n\nQualcomm Technologies\n\nBedminster, NJ\n\nAlyson K. Fletcher\n\nUniversity of California, Berkeley\n\nBerkeley, CA\n\nVivek K Goyal\n\nMass. Inst. of Tech.\n\nCambridge, MA\n\nsrangan@qualcomm.com\n\nalyson@eecs.berkeley.edu\n\nvgoyal@mit.edu\n\nAbstract\n\nThe replica method is a non-rigorous but widely-accepted technique from statis-\ntical physics used in the asymptotic analysis of large, random, nonlinear prob-\nlems. This paper applies the replica method to non-Gaussian maximum a pos-\nteriori (MAP) estimation. 
It is shown that with random linear measurements and Gaussian noise, the asymptotic behavior of the MAP estimate of an n-dimensional vector \u201cdecouples\u201d as n scalar MAP estimators. The result is a counterpart to Guo and Verd\u00fa\u2019s replica analysis of minimum mean-squared error estimation. The replica MAP analysis can be readily applied to many estimators used in compressed sensing, including basis pursuit, lasso, linear estimation with thresholding, and zero norm-regularized estimation. In the case of lasso estimation, the scalar estimator reduces to a soft-thresholding operator, and for zero norm-regularized estimation it reduces to a hard threshold. Among other benefits, the replica method provides a computationally tractable method for exactly computing various performance metrics including mean-squared error and sparsity pattern recovery probability.\n\n1 Introduction\n\nEstimating a vector x \u2208 R^n from measurements of the form\n\ny = \u03a6x + w, (1)\n\nwhere \u03a6 \u2208 R^{m\u00d7n} represents a known measurement matrix and w \u2208 R^m represents measurement errors or noise, is a generic problem that arises in a range of circumstances. One of the most basic estimators for x is the maximum a posteriori (MAP) estimate\n\n\u02c6x_map(y) = arg max_{x\u2208R^n} p_{x|y}(x|y), (2)\n\nwhich is defined assuming some prior on x. For most priors, the MAP estimate is nonlinear and its behavior is not easily characterizable. Even if the priors for x and w are separable, the analysis of the MAP estimate may be difficult since the matrix \u03a6 couples the n unknown components of x with the m measurements in the vector y.\n\nThe primary contribution of this paper\u2014an abridged version of [1]\u2014is to show that with certain large random \u03a6 and Gaussian w, there is an asymptotic decoupling of (1) into n scalar MAP estimation problems. 
Each equivalent scalar problem has an appropriate scalar prior and Gaussian noise with an effective noise level. The analysis yields the asymptotic joint distribution of each component x_j of x and its corresponding estimate \u02c6x_j in the MAP estimate vector \u02c6x_map(y). From the joint distribution, various further computations can be made, such as the mean-squared error (MSE) of the MAP estimate or the error probability of a hypothesis test computed from the MAP estimate.\n\n\u2217This work was supported in part by a University of California President\u2019s Postdoctoral Fellowship and National Science Foundation CAREER Award 0643836.\n\nReplica Method. Our analysis is based on a powerful but non-rigorous technique from statistical physics known as the replica method. The replica method was originally developed by Edwards and Anderson [2] to study the statistical mechanics of spin glasses. Although not fully rigorous from the perspective of probability theory, the technique was able to provide explicit solutions for a range of complex problems where many other methods had previously failed [3].\n\nThe replica method was first applied to the study of nonlinear MAP estimation problems by Tanaka [4] and M\u00fcller [5]. These papers studied the behavior of the MAP estimator of a vector x with i.i.d. binary components observed through linear measurements of the form (1) with a large random \u03a6 and Gaussian w. The results were then generalized in a remarkable paper by Guo and Verd\u00fa [6] to vectors x with arbitrary distributions. Guo and Verd\u00fa\u2019s result was also able to incorporate a large class of minimum postulated MSE estimators, where the estimator may assume a prior that is different from the actual prior. The main result in this paper is the corresponding MAP statement to Guo and Verd\u00fa\u2019s result. 
In fact, our result is derived from Guo and Verd\u00fa\u2019s by taking appropriate limits with large deviations arguments.\n\nThe non-rigorous aspect of the replica method involves a set of assumptions that include a self-averaging property, the validity of a \u201creplica trick,\u201d and the ability to exchange certain limits. Some progress has been made in formally proving these assumptions; a survey of this work can be found in [7]. Also, some of the predictions of the replica method have been validated rigorously by other means [8]. To emphasize our dependence on these unproven assumptions, we will refer to Guo and Verd\u00fa\u2019s result as the Replica MMSE Claim. Our main result, which depends on Guo and Verd\u00fa\u2019s analysis, will be called the Replica MAP Claim.\n\nApplications to Compressed Sensing. As an application of our main result, we will develop a few analyses of estimation problems that arise in compressed sensing [9\u201311]. In compressed sensing, one estimates a sparse vector x from random linear measurements. Generically, optimal estimation of x with a sparse prior is NP-hard [12]. Thus, most attention has focused on greedy heuristics such as matching pursuit and convex relaxations such as basis pursuit [13] or lasso [14]. While successful in practice, these algorithms are difficult to analyze precisely.\n\nRecent compressed sensing research has provided scaling laws on numbers of measurements that guarantee good performance of these methods [15\u201317]. However, these scaling laws are in general conservative. There are, of course, notable exceptions including [18] and [19], which provide matching necessary and sufficient conditions for recovery of strictly sparse vectors with basis pursuit and lasso. 
However, even these results only consider exact recovery and are limited to measurements that are noise-free or measurements with a signal-to-noise ratio (SNR) that scales to infinity.\n\nMany common sparse estimators can be seen as MAP estimators with certain postulated priors. Most importantly, lasso and basis pursuit are MAP estimators assuming a Laplacian prior. Other commonly-used sparse estimation algorithms, including linear estimation with and without thresholding and zero norm-regularized estimators, can also be seen as MAP-based estimators. For these algorithms, the replica method provides\u2014under the assumption of the replica hypotheses\u2014not just bounds, but the exact asymptotic behavior. This in turn permits exact expressions for various performance metrics such as MSE or fraction of support recovery. The expressions apply for arbitrary ratios k/n, n/m, and SNR.\n\n2 Estimation Problem and Assumptions\n\nConsider the estimation of a random vector x \u2208 R^n from linear measurements of the form\n\ny = \u03a6x + w = AS^{1/2}x + w, (3)\n\nwhere y \u2208 R^m is a vector of observations, \u03a6 = AS^{1/2}, A \u2208 R^{m\u00d7n} is a measurement matrix, S is a diagonal matrix of positive scale factors,\n\nS = diag(s_1, . . . , s_n), s_j > 0, (4)\n\nand w \u2208 R^m is zero-mean, white Gaussian noise. We consider a sequence of such problems indexed by n, with n \u2192 \u221e. For each n, the problem is to determine an estimate \u02c6x of x from the observations y, knowing the measurement matrix A and scale factor matrix S.\n\nThe components x_j of x are modeled as zero mean and i.i.d. with some prior probability distribution p_0(x_j). The per-component variance of the Gaussian noise is E|w_j|^2 = \u03c3_0^2. We use the subscript \u201c0\u201d on the prior and noise level to differentiate these quantities from certain \u201cpostulated\u201d values to be defined later.\n\nIn (3), we have factored \u03a6 = AS^{1/2} so that even with the i.i.d. 
assumption on the x_j\u2019s above and an i.i.d. assumption on entries of A, the model can capture variations in powers of the components of x that are known a priori at the estimator. Variations in the power of x that are not known to the estimator should be captured in the distribution of x.\n\nWe summarize the situation and make additional assumptions to specify the problem precisely as follows:\n\n(a) The number of measurements m = m(n) is a deterministic quantity that varies with n and satisfies lim_{n\u2192\u221e} n/m(n) = \u03b2 for some \u03b2 \u2265 0. (The dependence of m on n is usually omitted for brevity.)\n\n(b) The components x_j of x are i.i.d. with probability distribution p_0(x_j).\n\n(c) The noise w is Gaussian with w \u223c N(0, \u03c3_0^2 I_m).\n\n(d) The components of the matrix A are i.i.d. zero mean with variance 1/m.\n\n(e) The scale factors s_j are i.i.d. and satisfy s_j > 0 almost surely.\n\n(f) The scale factor matrix S, measurement matrix A, vector x, and noise w are independent.\n\n3 Review of the Replica MMSE Claim\n\nWe begin by reviewing the Replica MMSE Claim of Guo and Verd\u00fa [6]. Suppose one is given a \u201cpostulated\u201d prior distribution p_post and a postulated noise level \u03c3_post^2 that may be different from the true values p_0 and \u03c3_0^2. We define the minimum postulated MSE (MPMSE) estimate of x as\n\n\u02c6x_mpmse(y) = E(x | y ; p_post, \u03c3_post^2) = \u222b x p_{x|y}(x | y ; p_post, \u03c3_post^2) dx,\n\nwhere p_{x|y}(x | y ; q, \u03c3^2) is the conditional distribution of x given y under the x distribution and noise variance specified as parameters after the semicolon:\n\np_{x|y}(x | y ; q, \u03c3^2) = C^{\u22121} exp(\u2212(1/(2\u03c3^2))\u2016y \u2212 AS^{1/2}x\u2016^2) q(x), q(x) = \u220f_{j=1}^n q(x_j), (5)\n\nwhere C is a normalization constant.\n\nThe Replica MMSE Claim describes the asymptotic behavior of the postulated MMSE estimator via an equivalent scalar estimator. Let q(x) be a probability distribution defined on some set X \u2286 R. Given \u00b5 > 0, let p_{x|z}(x | z ; q, \u00b5) be the conditional distribution\n\np_{x|z}(x | z ; q, \u00b5) = \u03c6(z \u2212 x ; \u00b5)q(x) / \u222b_{x\u2032\u2208X} \u03c6(z \u2212 x\u2032 ; \u00b5)q(x\u2032) dx\u2032, (6)\n\nwhere \u03c6(\u00b7) is the Gaussian distribution\n\n\u03c6(v ; \u00b5) = (1/\u221a(2\u03c0\u00b5)) e^{\u2212|v|^2/(2\u00b5)}. (7)\n\nThe distribution p_{x|z}(x | z ; q, \u00b5) is the conditional distribution of the scalar random variable x \u223c q(x) from an observation of the form\n\nz = x + \u221a\u00b5 v, (8)\n\nwhere v \u223c N(0, 1). Using this distribution, we can define the scalar conditional MMSE estimate\n\n\u02c6x_scalar^mmse(z ; q, \u00b5) = \u222b_{x\u2208X} x p_{x|z}(x | z ; q, \u00b5) dx. (9)\n\nAlso, given two distributions, p_0(x) and p_1(x), and two noise levels, \u00b5_0 > 0 and \u00b5_1 > 0, define\n\nmse(p_1, p_0, \u00b5_1, \u00b5_0, z) = \u222b_{x\u2208X} |x \u2212 \u02c6x_scalar^mmse(z ; p_1, \u00b5_1)|^2 p_{x|z}(x | z ; p_0, \u00b5_0) dx, (10)\n\nwhich is the mean-squared error in estimating the scalar x from the variable z in (8) when x has a true distribution x \u223c p_0(x) and the noise level is \u00b5 = \u00b5_0, but the estimator assumes a distribution x \u223c p_1(x) and noise level \u00b5 = \u00b5_1.\n\nReplica MMSE Claim [6]. Consider the estimation problem in Section 2. Let \u02c6x_mpmse(y) be the MPMSE estimator based on a postulated prior p_post and postulated noise level \u03c3_post^2. 
For each n, let j = j(n) be some deterministic component index with j(n) \u2208 {1, . . . , n}. Then there exist effective noise levels \u03c3_eff^2 and \u03c3_{p\u2212eff}^2 such that:\n\n(a) As n \u2192 \u221e, the random vectors (x_j, s_j, \u02c6x_j^mpmse) converge in distribution to the random vector (x, s, \u02c6x), where x, s, and v are independent with x \u223c p_0(x), s \u223c p_S(s), v \u223c N(0, 1), and\n\n\u02c6x = \u02c6x_scalar^mmse(z ; p_post, \u00b5_p), z = x + \u221a\u00b5 v, (11)\n\nwhere \u00b5 = \u03c3_eff^2/s and \u00b5_p = \u03c3_{p\u2212eff}^2/s.\n\n(b) The effective noise levels satisfy the equations\n\n\u03c3_eff^2 = \u03c3_0^2 + \u03b2E[s mse(p_post, p_0, \u00b5_p, \u00b5, z)], (12a)\n\u03c3_{p\u2212eff}^2 = \u03c3_post^2 + \u03b2E[s mse(p_post, p_post, \u00b5_p, \u00b5_p, z)], (12b)\n\nwhere the expectations are taken over s \u223c p_S(s) and z generated by (11).\n\nThe Replica MMSE Claim asserts that the asymptotic behavior of the joint estimation of the n-dimensional vector x can be described by n equivalent scalar estimators. In the scalar estimation problem, a component x \u223c p_0(x) is corrupted by additive Gaussian noise yielding a noisy measurement z. The additive noise variance is \u00b5 = \u03c3_eff^2/s, which is the effective noise divided by the scale factor s. The estimate of that component is then described by the (generally nonlinear) scalar estimator \u02c6x(z ; p_post, \u00b5_p).\n\nThe effective noise levels \u03c3_eff^2 and \u03c3_{p\u2212eff}^2 are described by the solutions to the fixed-point equations (12). Note that \u03c3_eff^2 and \u03c3_{p\u2212eff}^2 appear implicitly on the left- and right-hand sides of these equations via the terms \u00b5 and \u00b5_p. When there are multiple solutions to these equations, the true solution is the minimizer of a certain Gibbs\u2019 function [6].\n\n4 Replica MAP Claim\n\nWe now turn to MAP estimation. 
Let X \u2286 R be some (measurable) set and consider an estimator of the form\n\n\u02c6x_map(y) = arg min_{x\u2208X^n} (1/(2\u03b3))\u2016y \u2212 AS^{1/2}x\u2016_2^2 + \u2211_{j=1}^n f(x_j), (13)\n\nwhere \u03b3 > 0 is an algorithm parameter and f : X \u2192 R is some scalar-valued, non-negative cost function. We will assume that the objective function in (13) has a unique essential minimizer for almost all y.\n\nThe estimator (13) can be interpreted as a MAP estimator. Specifically, for any u > 0, it can be verified that \u02c6x_map(y) is the MAP estimate\n\n\u02c6x_map(y) = arg max_{x\u2208X^n} p_{x|y}(x | y ; p_u, \u03c3_u^2),\n\nwhere p_u(x) and \u03c3_u^2 are the prior and noise level\n\np_u(x) = [\u222b_{x\u2208X^n} exp(\u2212uf(x)) dx]^{\u22121} exp(\u2212uf(x)), \u03c3_u^2 = \u03b3/u, (14)\n\nwhere f(x) = \u2211_j f(x_j). To analyze this MAP estimator, we consider a sequence of MMSE estimators\n\n\u02c6x_u(y) = E(x | y ; p_u, \u03c3_u^2). (15)\n\nThe proof of the Replica MAP Claim below (see [1]) uses a standard large deviations argument to show that\n\nlim_{u\u2192\u221e} \u02c6x_u(y) = \u02c6x_map(y)\n\nfor all y. Under the assumption that the behaviors of the MMSE estimators are described by the Replica MMSE Claim, we can then extrapolate the behavior of the MAP estimator.\n\nTo state the claim, define the scalar MAP estimator\n\n\u02c6x_scalar^map(z ; \u03bb) = arg min_{x\u2208X} F(x, z, \u03bb), F(x, z, \u03bb) = (1/(2\u03bb))|z \u2212 x|^2 + f(x), (16)\n\nwhere, again, we assume that (16) has a unique essential minimizer for almost all \u03bb and almost all z. We also assume that the limit\n\n\u03c3^2(z, \u03bb) = lim_{x\u2192\u02c6x} |x \u2212 \u02c6x|^2 / (2(F(x, z, \u03bb) \u2212 F(\u02c6x, z, \u03bb))) (17)\n\nexists, where \u02c6x = \u02c6x_scalar^map(z ; \u03bb). 
We make the following additional assumptions:\n\nAssumption 1 Consider the MAP estimator (13) applied to the estimation problem in Section 2. Assume:\n\n(a) For all u > 0 sufficiently large, assume the postulated prior p_u and noise level \u03c3_u^2 satisfy the Replica MMSE Claim. Also, assume that for the corresponding effective noise levels, \u03c3_eff^2(u) and \u03c3_{p\u2212eff}^2(u), the following limits exist:\n\n\u03c3_{eff,map}^2 = lim_{u\u2192\u221e} \u03c3_eff^2(u), \u03b3_p = lim_{u\u2192\u221e} u\u03c3_{p\u2212eff}^2(u).\n\n(b) Suppose that for each n, \u02c6x_j^u(n) is the MMSE estimate of the component x_j for some index j \u2208 {1, . . . , n}, based on the postulated prior p_u and noise level \u03c3_u^2. Then, assume that the following limits can be interchanged:\n\nlim_{n\u2192\u221e} lim_{u\u2192\u221e} \u02c6x_j^u(n) = lim_{u\u2192\u221e} lim_{n\u2192\u221e} \u02c6x_j^u(n),\n\nwhere the limits are in distribution.\n\n(c) Assume that f(x) is non-negative and satisfies f(x)/log |x| \u2192 \u221e as |x| \u2192 \u221e.\n\nItem (a) is stated to reiterate that we are assuming the Replica MMSE Claim is valid. See [1, Sect. IV] for additional discussion of technical assumptions.\n\nReplica MAP Claim [1]. Consider the estimation problem in Section 2. Let \u02c6x_map(y) be the MAP estimator (13) defined for some f(x) and \u03b3 > 0 satisfying Assumption 1. For each n, let j = j(n) be some deterministic component index with j(n) \u2208 {1, . . . , n}. 
Then:\n\n(a) As n \u2192 \u221e, the random vectors (x_j, s_j, \u02c6x_j^map) converge in distribution to the random vector (x, s, \u02c6x), where x, s, and v are independent with x \u223c p_0(x), s \u223c p_S(s), v \u223c N(0, 1), and\n\n\u02c6x = \u02c6x_scalar^map(z ; \u03bb_p), z = x + \u221a\u00b5 v, (18)\n\nwhere \u00b5 = \u03c3_{eff,map}^2/s and \u03bb_p = \u03b3_p/s.\n\n(b) The limiting effective noise levels \u03c3_{eff,map}^2 and \u03b3_p satisfy the equations\n\n\u03c3_{eff,map}^2 = \u03c3_0^2 + \u03b2E[s|x \u2212 \u02c6x|^2], (19a)\n\u03b3_p = \u03b3 + \u03b2E[s\u03c3^2(z, \u03bb_p)], (19b)\n\nwhere the expectations are taken over x \u223c p_0(x), s \u223c p_S(s), and v \u223c N(0, 1), with \u02c6x and z defined in (18).\n\nAnalogously to the Replica MMSE Claim, the Replica MAP Claim asserts that the asymptotic behavior of the MAP estimate of any single component of x is described by a simple equivalent scalar estimator. In the equivalent scalar model, the component of the true vector x is corrupted by Gaussian noise and the estimate of that component is given by a scalar MAP estimate of the component from the noise-corrupted version.\n\n5 Analysis of Compressed Sensing\n\nOur results thus far hold for any separable distribution for x and under mild conditions on the cost function f. The role of f is to determine the estimator. In this section, we first consider choices of f that yield MAP estimators relevant to compressed sensing. We then additionally impose a sparse prior for x for numerical evaluations of asymptotic performance.\n\nLasso Estimation. We first consider the lasso or basis pursuit estimate [13, 14] given by\n\n\u02c6x_lasso(y) = arg min_{x\u2208R^n} (1/(2\u03b3))\u2016y \u2212 AS^{1/2}x\u2016_2^2 + \u2016x\u2016_1, (20)\n\nwhere \u03b3 > 0 is an algorithm parameter. 
This estimator is identical to the MAP estimator (13) with the cost function\n\nf(x) = |x|.\n\nWith this cost function, the scalar MAP estimator in (16) is given by\n\n\u02c6x_scalar^map(z ; \u03bb) = T_\u03bb^soft(z), (21)\n\nwhere T_\u03bb^soft(z) is the soft-thresholding operator\n\nT_\u03bb^soft(z) = z \u2212 \u03bb if z > \u03bb; 0 if |z| \u2264 \u03bb; z + \u03bb if z < \u2212\u03bb. (22)\n\nThe Replica MAP Claim now states that there exist effective noise levels \u03c3_{eff,map}^2 and \u03b3_p such that for any component index j, the random vector (x_j, s_j, \u02c6x_j) converges in distribution to the vector (x, s, \u02c6x) where x \u223c p_0(x), s \u223c p_S(s), and \u02c6x is given by\n\n\u02c6x = T_{\u03bb_p}^soft(z), z = x + \u221a\u00b5 v, (23)\n\nwhere v \u223c N(0, 1), \u03bb_p = \u03b3_p/s, and \u00b5 = \u03c3_{eff,map}^2/s. Hence, the asymptotic behavior of lasso has a remarkably simple description: the asymptotic distribution of the lasso estimate \u02c6x_j of the component x_j is identical to x_j being corrupted by Gaussian noise and then soft-thresholded to yield the estimate \u02c6x_j.\n\nTo calculate the effective noise levels, one can perform a simple calculation to show that \u03c3^2(z, \u03bb) in (17) is given by\n\n\u03c3^2(z, \u03bb) = \u03bb if |z| > \u03bb; 0 if |z| \u2264 \u03bb. (24)\n\nHence,\n\nE[s\u03c3^2(z, \u03bb_p)] = E[s\u03bb_p Pr(|z| > \u03bb_p)] = \u03b3_p Pr(|z| > \u03b3_p/s), (25)\n\nwhere we have used the fact that \u03bb_p = \u03b3_p/s. Substituting (21) and (25) into (19), we obtain the fixed-point equations\n\n\u03c3_{eff,map}^2 = \u03c3_0^2 + \u03b2E[s|x \u2212 T_{\u03bb_p}^soft(z)|^2], (26a)\n\u03b3_p = \u03b3 + \u03b2\u03b3_p Pr(|z| > \u03b3_p/s), (26b)\n\nwhere the expectations are taken with respect to x \u223c p_0(x), s \u223c p_S(s), and z in (23). 
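As a concrete illustration of the scalar equivalence in (23), the soft-thresholding operator (22) and the expectation on the right-hand side of (26a) can be approximated by Monte Carlo sampling. This is a minimal sketch, not part of the paper: it assumes a Bernoulli-Gaussian prior (the one used in the simulations of Section 5), takes s = 1, and treats the values of \u00b5 and \u03bb_p as given inputs rather than solving the fixed-point equations.

```python
import numpy as np

def t_soft(z, lam):
    # soft-thresholding operator (22), vectorized over z
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
n_mc = 100000
# Bernoulli-Gaussian prior: N(0, 1) with probability 0.1, zero otherwise (assumption)
x = rng.standard_normal(n_mc) * (rng.random(n_mc) < 0.1)
v = rng.standard_normal(n_mc)

mu, lam_p = 0.01, 0.1               # illustrative values of mu and lambda_p (assumptions)
z = x + np.sqrt(mu) * v             # equivalent scalar channel (23), with s = 1
mse_term = np.mean((x - t_soft(z, lam_p)) ** 2)   # Monte Carlo estimate of E|x - T_soft(z)|^2 in (26a)
print(mse_term)
```

The same samples also give sparsity pattern statistics of the scalar model, since the event that the lasso estimate of a component is nonzero corresponds to |z| > \u03bb_p.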
Again, while these fixed-point equations do not have a closed-form solution, they can be relatively easily solved numerically given distributions of x and s.\n\nZero Norm-Regularized Estimation. Lasso can be regarded as a convex relaxation of zero norm-regularized estimation\n\n\u02c6x_zero(y) = arg min_{x\u2208R^n} (1/(2\u03b3))\u2016y \u2212 AS^{1/2}x\u2016_2^2 + \u2016x\u2016_0, (27)\n\nwhere \u2016x\u2016_0 is the number of nonzero components of x. For certain strictly sparse priors, zero norm-regularized estimation may provide better performance than lasso. While computing the zero norm-regularized estimate is generally very difficult, we can use the replica analysis to provide a simple characterization of its performance. This analysis can provide a bound on the performance achievable by practical algorithms.\n\nThe zero norm-regularized estimator is identical to the MAP estimator (13) with the cost function\n\nf(x) = 0 if x = 0; 1 if x \u2260 0. (28)\n\nTechnically, this cost function does not satisfy the conditions of the Replica MAP Claim. To avoid this problem, we can consider an approximation of (28),\n\nf_{\u03b4,M}(x) = 0 if |x| < \u03b4; 1 if |x| \u2208 [\u03b4, M],\n\nwhich is defined on the set X = {x : |x| \u2264 M}. 
We can then take the limits \u03b4 \u2192 0 and M \u2192 \u221e. To simplify the presentation, we will just apply the Replica MAP Claim with f(x) in (28) and omit the details in taking the appropriate limits.\n\nWith f(x) given by (28), the scalar MAP estimator in (16) is given by\n\n\u02c6x_scalar^map(z ; \u03bb) = T_t^hard(z), t = \u221a(2\u03bb), (29)\n\nwhere T_t^hard is the hard thresholding operator,\n\nT_t^hard(z) = z if |z| > t; 0 if |z| \u2264 t. (30)\n\nNow, similar to the case of lasso estimation, the Replica MAP Claim states there exist effective noise levels \u03c3_{eff,map}^2 and \u03b3_p such that for any component index j, the random vector (x_j, s_j, \u02c6x_j) converges in distribution to the vector (x, s, \u02c6x) where x \u223c p_0(x), s \u223c p_S(s), and \u02c6x is given by\n\n\u02c6x = T_t^hard(z), z = x + \u221a\u00b5 v, (31)\n\nwhere v \u223c N(0, 1), \u03bb_p = \u03b3_p/s, \u00b5 = \u03c3_{eff,map}^2/s, and\n\nt = \u221a(2\u03bb_p) = \u221a(2\u03b3_p/s). (32)\n\nThus, the zero norm-regularized estimation of a vector x is equivalent to n scalar components corrupted by some effective noise level \u03c3_{eff,map}^2 and hard-thresholded based on an effective noise level \u03b3_p.\n\nThe fixed-point equations for the effective noise levels \u03c3_{eff,map}^2 and \u03b3_p can be computed similarly to the case of lasso. Specifically, one can verify that (24) and (25) are both satisfied for the hard thresholding operator as well. Substituting (25) and (29) into (19), we obtain the fixed-point equations\n\n\u03c3_{eff,map}^2 = \u03c3_0^2 + \u03b2E[s|x \u2212 T_t^hard(z)|^2], (33a)\n\u03b3_p = \u03b3 + \u03b2\u03b3_p Pr(|z| > t), (33b)\n\nwhere the expectations are taken with respect to x \u223c p_0(x), s \u223c p_S(s), z in (31), and t given by (32). 
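One way to solve the fixed-point equations (33) numerically is plain successive substitution with Monte Carlo expectations. The sketch below is illustrative only: it assumes s = 1 (so \u03bb_p = \u03b3_p and \u00b5 = \u03c3_eff,map^2), a Bernoulli-Gaussian prior, illustrative values of \u03b2, \u03b3, and \u03c3_0^2, and that the iteration converges to the relevant solution (as the paper notes for the MMSE case, multiple solutions can exist in general).

```python
import numpy as np

def t_hard(z, t):
    # hard-thresholding operator (30), vectorized over z
    return np.where(np.abs(z) > t, z, 0.0)

rng = np.random.default_rng(1)
n_mc = 100000
x = rng.standard_normal(n_mc) * (rng.random(n_mc) < 0.1)  # Bernoulli-Gaussian prior (assumption)
v = rng.standard_normal(n_mc)

beta, gamma, sigma0_sq = 0.5, 0.1, 0.01   # illustrative problem parameters (assumptions)
sigma_sq, gamma_p = sigma0_sq, gamma      # initialize the iteration at the beta = 0 solution
for _ in range(50):
    t = np.sqrt(2.0 * gamma_p)            # threshold (32), with s = 1 so lambda_p = gamma_p
    z = x + np.sqrt(sigma_sq) * v         # scalar channel (31), mu = sigma_eff^2 when s = 1
    sigma_sq = sigma0_sq + beta * np.mean((x - t_hard(z, t)) ** 2)   # update via (33a)
    gamma_p = gamma + beta * gamma_p * np.mean(np.abs(z) > t)        # update via (33b)
print(sigma_sq, gamma_p)
```

The converged pair then fully determines the asymptotic per-component error and detection statistics of the zero norm-regularized estimator through the scalar model (31).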
These fixed-point equations can be solved numerically.\n\nFigure 1: MSE performance prediction with the Replica MAP Claim. Plotted is the median normalized SE for various sparse recovery algorithms: linear MMSE estimation, lasso, zero norm-regularized estimation, and optimal MMSE estimation. Solid lines show the asymptotic predicted MSE from the Replica MAP Claim. For the linear and lasso estimators, the circles and triangles show the actual median SE over 1000 Monte Carlo simulations. (Axes: median squared error in dB versus measurement ratio \u03b2 = n/m.)\n\nNumerical Simulation. To validate the predictive power of the Replica MAP Claim for finite dimensions, we performed numerical simulations where the components of x are a zero-mean Bernoulli\u2013Gaussian process. Specifically,\n\nx_j \u223c N(0, 1) with prob. 0.1; 0 with prob. 0.9.\n\nWe took the vector x to have n = 100 i.i.d. components, and we used ten values of m to vary \u03b2 = n/m from 0.5 to 3. For each problem size, we simulated the lasso and linear MMSE estimators over 1000 independent instances with noise levels chosen such that the SNR with perfect side information is 10 dB. Each set of trials is represented by its median squared error in Fig. 1.\n\nThe simulated performance is matched very closely by the asymptotic values predicted by the replica analysis. (Analysis of the linear MMSE estimator using the Replica MAP Claim is detailed in [1]; the Replica MMSE Claim is also applicable to this estimator.) 
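A single trial of the simulation setup just described can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes S = I, substitutes a fixed noise level for the paper's 10 dB SNR calibration, and solves the lasso problem (20) with iterative soft thresholding (ISTA), an assumed solver choice.

```python
import numpy as np

def t_soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(2)
n, beta = 100, 2.0
m = int(n / beta)
x = rng.standard_normal(n) * (rng.random(n) < 0.1)   # Bernoulli-Gaussian signal, n = 100
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # i.i.d. entries with variance 1/m (S = I assumed)
sigma0 = 0.1                                         # stand-in noise level (assumption)
y = Phi @ x + sigma0 * rng.standard_normal(m)

# ISTA for objective (20): gradient step on the quadratic term, then soft threshold
gamma = 0.02                                         # illustrative algorithm parameter (assumption)
step = 1.0 / np.linalg.norm(Phi, 2) ** 2             # step size 1/L for the smooth term
x_hat = np.zeros(n)
for _ in range(500):
    x_hat = t_soft(x_hat + step * Phi.T @ (y - Phi @ x_hat), gamma * step)
print(np.mean((x - x_hat) ** 2))                     # per-component squared error for this trial
```

Repeating such trials over many instances and values of \u03b2, and taking the median squared error, reproduces the kind of empirical curves compared against the replica predictions in Fig. 1.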
In addition, the replica analysis can be applied to zero norm-regularized and optimal MMSE estimators that are computationally infeasible for large problems. These results are also shown in Fig. 1, illustrating the potential of the replica method to quantify the precise performance losses of practical algorithms.\n\nAdditional numerical simulations in [1] illustrate convergence to the replica MAP limit, applicability to discrete distributions for x, effects of power variations in the components, and accurate prediction of the probability of sparsity pattern recovery.\n\n6 Conclusions\n\nWe have shown that the behavior of vector MAP estimators with large random measurement matrices and Gaussian noise asymptotically matches that of a set of decoupled scalar estimation problems. We believe that this equivalence to a simple scalar model will open up numerous doors for analysis, particularly in problems of interest in compressed sensing. One can use the model to dramatically improve upon existing performance analyses for sparsity pattern recovery and MSE. Also, the technique is sufficiently general to study effects of dynamic range.\n\nReferences\n\n[1] S. Rangan, A. K. Fletcher, and V. K. Goyal. Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing. arXiv:0906.3234v1 [cs.IT], June 2009.\n\n[2] S. F. Edwards and P. W. Anderson. Theory of spin glasses. J. Phys. F: Metal Physics, 5:965\u2013974, 1975.\n\n[3] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction. International Series of Monographs on Physics. Oxford Univ. Press, Oxford, UK, 2001.\n\n[4] T. Tanaka. A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors. IEEE Trans. Inform. Theory, 48(11):2888\u20132910, November 2002.\n\n[5] R. R. M\u00fcller. Channel capacity and minimum probability of error in large dual antenna array systems with binary modulation. 
IEEE Trans. Signal Process., 51(11):2821\u20132828, November 2003.\n\n[6] D. Guo and S. Verd\u00fa. Randomly spread CDMA: Asymptotics via statistical physics. IEEE Trans. Inform. Theory, 51(6):1983\u20132010, June 2005.\n\n[7] M. Talagrand. Spin Glasses: A Challenge for Mathematicians. Springer, New York, 2003.\n\n[8] A. Montanari and D. Tse. Analysis of belief propagation for non-linear problems: The example of CDMA (or: How to prove Tanaka\u2019s formula). arXiv:cs/0602028v1 [cs.IT], February 2006.\n\n[9] E. J. Cand\u00e8s, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489\u2013509, February 2006.\n\n[10] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289\u20131306, April 2006.\n\n[11] E. J. Cand\u00e8s and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406\u20135425, December 2006.\n\n[12] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Computing, 24(2):227\u2013234, April 1995.\n\n[13] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1):33\u201361, 1999.\n\n[14] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc., Ser. B, 58(1):267\u2013288, 1996.\n\n[15] D. L. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory, 52(1):6\u201318, January 2006.\n\n[16] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231\u20132242, October 2004.\n\n[17] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory, 52(3):1030\u20131051, March 2006.\n\n[18] M. J. Wainwright. 
Sharp thresholds for high-dimensional and noisy sparsity recovery using \u21131-constrained quadratic programming (lasso). IEEE Trans. Inform. Theory, 55(5):2183\u20132202, May 2009.\n\n[19] D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. J. Amer. Math. Soc., 22(1):1\u201353, January 2009.\n", "award": [], "sourceid": 1034, "authors": [{"given_name": "Sundeep", "family_name": "Rangan", "institution": null}, {"given_name": "Vivek", "family_name": "Goyal", "institution": null}, {"given_name": "Alyson", "family_name": "Fletcher", "institution": null}]}