{"title": "b-bit Marginal Regression", "book": "Advances in Neural Information Processing Systems", "page_first": 2062, "page_last": 2070, "abstract": "We consider the problem of sparse signal recovery from $m$ linear measurements quantized to $b$ bits. $b$-bit Marginal Regression is proposed as recovery algorithm. We study the question of choosing $b$ in the setting of a given budget of bits $B = m \\cdot b$ and derive a single easy-to-compute expression characterizing the trade-off between $m$ and $b$. The choice $b = 1$ turns out to be optimal for estimating the unit vector corresponding to the signal for any level of additive Gaussian noise before quantization as well as for adversarial noise. For $b \\geq 2$, we show that Lloyd-Max quantization constitutes an optimal quantization scheme and that the norm of the signal canbe estimated consistently by maximum likelihood.", "full_text": "b-bit Marginal Regression\n\nMartin Slawski\n\nPing Li\n\nDepartment of Statistics and Biostatistics\n\nDepartment of Statistics and Biostatistics\n\nDepartment of Computer Science\n\nDepartment of Computer Science\n\nRutgers University\n\nRutgers University\n\nmartin.slawski@rutgers.edu\n\npingli@stat.rutgers.edu\n\nAbstract\n\nWe consider the problem of sparse signal recovery from m linear measurements\nquantized to b bits. b-bit Marginal Regression is proposed as recovery algorithm.\nWe study the question of choosing b in the setting of a given budget of bits B =\nm \u00b7 b and derive a single easy-to-compute expression characterizing the trade-off\nbetween m and b. The choice b = 1 turns out to be optimal for estimating the unit\nvector corresponding to the signal for any level of additive Gaussian noise before\nquantization as well as for adversarial noise. 
For b \u2265 2, we show that Lloyd-Max\nquantization constitutes an optimal quantization scheme and that the norm of the\nsignal can be estimated consistently by maximum likelihood by extending [15].\n\n1 Introduction\nConsider the common compressed sensing (CS) model\n\nyi = hai, x\u2217i + \u03c3\u03b5i,\ny = Ax\u2217 + \u03c3\u03b5, y = (yi)m\n\ni = 1, . . . , m, or equivalently\n\ni=1, A = (Aij )m,n\n\ni,j=1, {ai = (Aij )n\n\nj=1}m\n\ni=1, \u03b5 = (\u03b5i)m\n\ni=1,\n\n(1)\n\nwhere the {Aij} and the {\u03b5i} are i.i.d. N (0, 1) (i.e. standard Gaussian) random variables, the latter\nof which will be referred to by the term \u201cadditive noise\u201d and accordingly \u03c3 > 0 as \u201cnoise level\u201d, and\nx\u2217 \u2208 Rn is the signal of interest to be recovered given (A, y). Let s = kx\u2217k0 := |S(x\u2217)|, where\nS(x\u2217) = {j : |x\u2217j| > 0}, be the \u21130-norm of x\u2217 (i.e. the cardinality of its support S(x\u2217)). One of the\ncelebrated results in CS is that accurate recovery of x\u2217 is possible as long as m & s log n, and can\nbe carried out by several computationally tractable algorithms e.g. [3, 5, 21, 26, 29].\n\nSubsequently, the concept of signal recovery from an incomplete set (m < n) of linear measure-\nments was developed further to settings in which only coarsely quantized versions of such linear\nmeasurements are available, with the extreme case of single-bit measurements [2, 8, 11, 22, 23, 28,\n16]. More generally, one can think of b-bit measurements, b \u2208 {1, 2, . . .}. Assuming that one is free\nto choose b given a \ufb01xed budget of bits B = m \u00b7 b gives rise to a trade-off between m and b. An\noptimal balance of these two quantities minimizes the error in recovering the signal. Such optimal\ntrade-off depends on the quantization scheme, the noise level, and the recovery algorithm. This\ntrade-off has been considered in previous CS literature [13]. 
However, the analysis therein concerns an oracle-assisted recovery algorithm equipped with knowledge of $S(x^*)$, which is not fully realistic. In [9] a specific variant of Iterative Hard Thresholding [1] for $b$-bit measurements is considered. It is shown via numerical experiments that choosing $b \geq 2$ can in fact achieve improvements over $b = 1$ at the level of the total number of bits required for approximate signal recovery. On the other hand, there is no analysis supporting this observation. Moreover, the experiments in [9] only concern a noiseless setting. Another approach is to treat quantization as additive error and to perform signal recovery by means of variations of recovery algorithms for infinite-precision CS [10, 14, 18]. In this line of research, $b$ is assumed to be fixed and a discussion of the aforementioned trade-off is missing.

In the present paper we provide an analysis of compressed sensing from $b$-bit measurements using a specific approach to signal recovery which we term $b$-bit Marginal Regression. This approach builds on a method for one-bit compressed sensing proposed in an influential paper by Plan and Vershynin [23] which has subsequently been refined in several recent works [4, 24, 28]. As indicated by the name, $b$-bit Marginal Regression can be seen as a quantized version of Marginal Regression, a simple yet surprisingly effective approach to support recovery that stands out due to its low computational cost, requiring only a single matrix-vector multiplication and a sorting operation [7]. Our analysis yields a precise characterization of the above trade-off involving $m$ and $b$ in various settings. It turns out that the choice $b = 1$ is optimal for recovering the normalized signal $x^*_u = x^*/\|x^*\|_2$, under additive Gaussian noise as well as under adversarial noise.
It is shown that the choice $b = 2$ additionally enables one to estimate $\|x^*\|_2$, while being optimal for recovering $x^*_u$ among all $b \geq 2$. Hence for the specific recovery algorithm under consideration, it does not pay off to take $b > 2$. Furthermore, once the noise level is sufficiently high, $b$-bit Marginal Regression is empirically shown to perform roughly as well as several alternative recovery algorithms, a finding suggesting that in high-noise settings taking $b > 2$ does not pay off in general. As an intermediate step in our analysis, we prove that Lloyd-Max quantization [19, 20] constitutes an optimal $b$-bit quantization scheme in the sense that it leads to a minimization of an upper bound on the reconstruction error.

Notation: We use $[d] = \{1, \ldots, d\}$ and $S(x)$ for the support of $x \in \mathbb{R}^n$; $x \odot x' = (x_j \cdot x'_j)_{j=1}^n$. $I(P)$ is the indicator function of expression $P$. The symbol $\propto$ means "up to a positive universal constant". Supplement: Proofs and additional experiments can be found in the supplement.

2 From Marginal Regression to b-bit Marginal Regression

Some background on Marginal Regression. It is common to perform sparse signal recovery by solving an optimization problem of the form

$\min_x \frac{1}{2m}\|y - Ax\|_2^2 + \frac{\gamma}{2} P(x)$, $\gamma \geq 0$,  (2)

where $P$ is a penalty term encouraging sparse solutions. Standard choices for $P$ are $P(x) = \|x\|_0$, which is computationally not feasible in general, its convex relaxation $P(x) = \|x\|_1$, or non-convex penalty terms like SCAD or MCP that are more amenable to optimization than the $\ell_0$-norm [27]. Alternatively, $P$ can as well be used to enforce a constraint by setting $P(x) = \iota_C(x)$, where $\iota_C(x) = 0$ if $x \in C$ and $+\infty$ otherwise, with $C = \{x \in \mathbb{R}^n : \|x\|_0 \leq s\}$ or $C = \{x \in \mathbb{R}^n : \|x\|_1 \leq r\}$ being standard choices.
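For each of these standard choices of $P$, the resulting estimator reduces to a componentwise thresholding of $\eta = A^\top y/m$ (as derived in the text below). A minimal sketch of the three thresholding operations, assuming the separable surrogate objective; function names are ours, not from the paper:

```python
import numpy as np

def soft_threshold(eta, gam):
    # P(x) = ||x||_1: componentwise soft thresholding
    return np.sign(eta) * np.maximum(np.abs(eta) - gam, 0.0)

def hard_threshold(eta, gam):
    # P(x) = ||x||_0: keep entries with |eta_j| >= sqrt(gam)
    return eta * (np.abs(eta) >= np.sqrt(gam))

def top_s(eta, s):
    # P = indicator of {||x||_0 <= s}: keep the s largest entries in magnitude
    out = np.zeros_like(eta)
    idx = np.argsort(-np.abs(eta))[:s]
    out[idx] = eta[idx]
    return out
```

The $\ell_1$-constrained case is the same soft thresholding with $\gamma$ replaced by the smallest value meeting the constraint.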
Note that (2) is equivalent to the optimization problem

$\min_x -\langle \eta, x \rangle + \frac{1}{2} x^\top \frac{A^\top A}{m} x + \frac{\gamma}{2} P(x)$, where $\eta = \frac{A^\top y}{m}$.

Replacing $A^\top A/m$ by $\mathbb{E}[A^\top A/m] = I$ (recall that the entries of $A$ are i.i.d. $N(0,1)$), we obtain

$\min_x -\langle \eta, x \rangle + \frac{1}{2}\|x\|_2^2 + \frac{\gamma}{2} P(x)$, $\eta = \frac{A^\top y}{m}$,  (3)

which tends to be much simpler to solve than (2) as the first two terms are separable in the components of $x$. For the choices of $P$ mentioned above, we obtain closed-form solutions for $j \in [n]$:

$P(x) = \|x\|_1$: $\hat{x}_j = (|\eta_j| - \gamma)_+ \operatorname{sign}(\eta_j)$;  $P(x) = \|x\|_0$: $\hat{x}_j = \eta_j I(|\eta_j| \geq \gamma^{1/2})$;
$P(x) = \iota_{x:\|x\|_0 \leq s}$: $\hat{x}_j = \eta_j I(|\eta_j| \geq |\eta_{(s)}|)$;  $P(x) = \iota_{x:\|x\|_1 \leq r}$: $\hat{x}_j = (|\eta_j| - \gamma^*)_+ \operatorname{sign}(\eta_j)$,  (4)

where $(\cdot)_+$ denotes the positive part, $|\eta_{(s)}|$ is the $s$-th largest entry of $\eta$ in absolute magnitude, and $\gamma^* = \min\{\gamma \geq 0 : \sum_{j=1}^n (|\eta_j| - \gamma)_+ \leq r\}$. In other words, the estimators are hard- respectively soft-thresholded versions of $\eta_j = A_j^\top y/m$, which are essentially equal to the univariate (or marginal) regression coefficients $\theta_j = A_j^\top y/\|A_j\|_2^2$ in the sense that $\eta_j = \theta_j(1 + O_P(m^{-1}))$, $j \in [n]$; hence the term "marginal regression". In the literature, it is the estimator in the left half of (4) that is popular [7], albeit as a means to infer the support of $x^*$ rather than $x^*$ itself. Under model (1) the performance with respect to signal recovery can still be reasonable in view of the statement below.

Proposition 1. Consider model (1) with $x^* \neq 0$ and the Marginal Regression estimator $\hat{x}$ defined component-wise by $\hat{x}_j = \eta_j I(|\eta_j| \geq |\eta_{(s)}|)$, $j \in [n]$, where $\eta = A^\top y/m$.
Then there exist positive constants $c, C > 0$ such that with probability at least $1 - cn^{-1}$,

$\frac{\|\hat{x} - x^*\|_2}{\|x^*\|_2} \leq C\, \frac{\|x^*\|_2 + \sigma}{\|x^*\|_2} \sqrt{\frac{s \log n}{m}}$.  (5)

In comparison, the relative $\ell_2$-error of more sophisticated methods like the lasso scales as $O(\{\sigma/\|x^*\|_2\}\sqrt{s \log(n)/m})$, which is comparable to (5) once $\sigma$ is of the same order of magnitude as $\|x^*\|_2$. Marginal Regression can also be interpreted as a single projected gradient iteration from $0$ for problem (2) with $P = \iota_{x:\|x\|_0 \leq s}$. Taking more than one projected gradient iteration gives rise to a popular recovery algorithm known as Iterative Hard Thresholding (IHT, [1]).

Compressed sensing with non-linear observations and the method of Plan & Vershynin. As a generalization of (1) one can consider measurements of the form

$y_i = Q(\langle a_i, x^* \rangle + \sigma\varepsilon_i)$, $i \in [m]$,  (6)

for some map $Q$. Without loss of generality, one may assume that $\|x^*\|_2 = 1$ as long as $x^* \neq 0$ (which is assumed in the sequel) by defining $Q$ accordingly. Plan and Vershynin [23] consider the following optimization problem for recovering $x^*$, and develop a framework for analysis that covers even more general measurement models than (6). The proposed estimator minimizes

$\min_{x:\|x\|_2 \leq 1,\, \|x\|_1 \leq \sqrt{s}} -\langle \eta, x \rangle$, $\eta = A^\top y/m$.  (7)

Note that the constraint set $\{x : \|x\|_2 \leq 1, \|x\|_1 \leq \sqrt{s}\}$ contains $\{x : \|x\|_2 \leq 1, \|x\|_0 \leq s\}$. The authors prefer the former, first because it is suited for approximately sparse signals as well and second because it is convex. However, the optimization problem with sparsity constraint is easy to solve:

$\min_{x:\|x\|_2 \leq 1,\, \|x\|_0 \leq s} -\langle \eta, x \rangle$, $\eta = A^\top y/m$.  (8)

Lemma 1.
The solution of problem (8) is given by $\hat{x} = \tilde{x}/\|\tilde{x}\|_2$, $\tilde{x}_j = \eta_j I(|\eta_j| \geq |\eta_{(s)}|)$, $j \in [n]$.

While this is elementary, we state it as a separate lemma as there has been some confusion in the existing literature. In [4] the same solution is obtained after (unnecessarily) convexifying the constraint set, which yields the unit ball of the so-called $s$-support norm. In [24] a family of concave penalty terms including the SCAD and MCP is proposed in place of the cardinality constraint. However, in light of Lemma 1, the use of such penalty terms lacks motivation.

The minimization problem (8) is essentially that of Marginal Regression (3) with $P = \iota_{x:\|x\|_0 \leq s}$, the only difference being that the norm of the solution is fixed to one. Note that the Marginal Regression estimator is equi-variant w.r.t. re-scaling of $y$, i.e. for $a \cdot y$ with $a > 0$, $\hat{x}$ changes to $a\hat{x}$. In addition, let $\alpha, \beta > 0$ and define $\hat{x}(\alpha)$ and $\hat{x}[\beta]$ as the minimizers of the optimization problems

$\min_{x:\|x\|_0 \leq s} -\langle \eta, x \rangle + \frac{\alpha}{2}\|x\|_2^2$,  $\min_{x:\|x\|_2 \leq \beta,\, \|x\|_0 \leq s} -\langle \eta, x \rangle$.  (9)

It is not hard to verify that $\hat{x}(\alpha)/\|\hat{x}(\alpha)\|_2 = \hat{x}[\beta]/\|\hat{x}[\beta]\|_2 = \hat{x}[1]$. In summary, for estimating the direction $x^*_u = x^*/\|x^*\|_2$ it does not matter whether a quadratic term in the objective or an $\ell_2$-norm constraint is used. Moreover, estimation of the 'scale' $\psi^* = \|x^*\|_2$ and the direction can be separated. Adopting the framework in [23], we provide a straightforward bound on the $\ell_2$-error of $\hat{x}$ minimizing (8).
To this end we define two quantities which will be of central interest in subsequent analysis:

$\lambda = \mathbb{E}[g\,\theta(g)]$, $g \sim N(0,1)$, where $\theta$ is defined by $\mathbb{E}[y_1 \mid a_1] = \theta(\langle a_1, x^* \rangle)$,
$\Psi = \inf\{C > 0 : \mathbb{P}\{\max_{1 \leq j \leq n} |\eta_j - \mathbb{E}[\eta_j]| \leq C\sqrt{\log(n)/m}\} \geq 1 - 1/n\}$.  (10)

The quantity $\lambda$ concerns the deterministic part of the analysis as it quantifies the distortion of the linear measurements under the map $Q$, while $\Psi$ is used to deal with the stochastic part. The definition of $\Psi$ is based on the usual tail bound for the maximum of centered sub-Gaussian random variables. In fact, as long as $Q$ has bounded range, Gaussianity of the $\{A_{ij}\}$ implies that the $\{\eta_j - \mathbb{E}[\eta_j]\}_{j=1}^n$ are zero-mean sub-Gaussian. Accordingly, the constant $\Psi$ is proportional to the sub-Gaussian norm of the $\{\eta_j - \mathbb{E}[\eta_j]\}_{j=1}^n$, cf. [25].

Proposition 2. Consider model (6) s.t. $\|x^*\|_2 = 1$ and (10). Suppose that $\lambda > 0$ and denote by $\hat{x}$ the minimizer of (8). Then with probability at least $1 - 1/n$, it holds that

$\|x^* - \hat{x}\|_2 \leq 2\sqrt{2}\, \frac{\Psi}{\lambda} \sqrt{\frac{s \log n}{m}}$.  (11)

So far $s$ has been assumed to be known. If that is not the case, $s$ can be estimated as follows.

Proposition 3. In the setting of Proposition 2, consider $\hat{s} = |\{j : |\eta_j| > \Psi\sqrt{\log(n)/m}\}|$ and $\hat{x}$ as the minimizer of (8) with $s$ replaced by $\hat{s}$. Then with probability at least $1 - 1/n$, $S(\hat{x}) \subseteq S(x^*)$ (i.e. no false positive selection). Moreover, if

$\min_{j \in S(x^*)} |x^*_j| > (2\Psi/\lambda)\sqrt{\log(n)/m}$,  (12)

one has $S(\hat{x}) = S(x^*)$.

b-bit Marginal Regression. $b$-bit quantized measurements directly fit into the non-linear observation model (6).
Here the map $Q$ represents a quantizer that partitions $\mathbb{R}_+$ into $K = 2^{b-1}$ bins $\{R_k\}_{k=1}^K$ given by distinct thresholds $t = (t_1, \ldots, t_{K-1})^\top$ (in increasing order) and $t_0 = 0$, $t_K = +\infty$, such that $R_1 = [t_0, t_1), \ldots, R_K = [t_{K-1}, t_K)$. Each bin is assigned a distinct representative from $M = \{\mu_1, \ldots, \mu_K\}$ (in increasing order) so that $Q : \mathbb{R} \to -M \cup M$ is defined by $z \mapsto Q(z) = \operatorname{sign}(z)\sum_{k=1}^K \mu_k I(|z| \in R_k)$. Expanding model (6) accordingly, we obtain

$y_i = \operatorname{sign}(\langle a_i, x^* \rangle + \sigma\varepsilon_i)\sum_{k=1}^K \mu_k I(|\langle a_i, x^* \rangle + \sigma\varepsilon_i| \in R_k) = \operatorname{sign}(\langle a_i, x^*_u \rangle + \tau\varepsilon_i)\sum_{k=1}^K \mu_k I(|\langle a_i, x^*_u \rangle + \tau\varepsilon_i| \in R_k/\psi^*)$, $i \in [m]$,

where $\psi^* = \|x^*\|_2$, $x^*_u = x^*/\psi^*$ and $\tau = \sigma/\psi^*$. Thus the scale $\psi^*$ of the signal can be absorbed into the definition of the bins respectively thresholds, which should be proportional to $\psi^*$. We may thus again fix $\psi^* = 1$ and in turn $x^* = x^*_u$, $\sigma = \tau$ w.l.o.g. for the analysis below. Estimation of $\psi^*$ separately from $x^*_u$ will be discussed in an extra section.

3 Analysis

In this section we study in detail the central question of the introduction. Suppose we have a fixed budget $B$ of bits available and are free to choose the number of measurements $m$ and the number of bits per measurement $b$ subject to $B = m \cdot b$ such that the $\ell_2$-error $\|\hat{x} - x^*\|_2$ of $b$-bit Marginal Regression is as small as possible. What is the optimal choice of $(m, b)$? In order to answer this question, let us go back to the error bound (11). That bound applies to $b$-bit Marginal Regression for any choice of $b$ and varies with $\lambda = \lambda_b$ and $\Psi = \Psi_b$, both of which additionally depend on $\sigma$, the choice of the thresholds $t$ and the representatives $\mu$.
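The quantizer just described is a one-liner given the thresholds $t$ and representatives $\mu$; as a minimal sketch (function name is ours):

```python
import numpy as np

def quantize(z, t, mu):
    """b-bit quantizer Q: z -> sign(z) * mu_k, where |z| lies in bin R_k.

    t:  thresholds (t_1, ..., t_{K-1}), increasing; implicitly t_0 = 0, t_K = +inf.
    mu: representatives (mu_1, ..., mu_K), increasing; K = 2**(b-1).
    """
    z = np.asarray(z, dtype=float)
    edges = np.concatenate(([0.0], t, [np.inf]))
    # bin index k with |z| in [t_{k-1}, t_k)
    k = np.searchsorted(edges, np.abs(z), side="right") - 1
    return np.sign(z) * np.asarray(mu)[k]
```

For $b = 1$ there are no thresholds and `quantize(z, [], [mu1])` reduces to `mu1 * sign(z)`.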
It can be shown that the dependence of (11) on the ratio $\Psi/\lambda$ is tight asymptotically as $m \to \infty$. Hence it makes sense to compare two different choices $b$ and $b'$ in terms of the ratio of $\Omega_b = \Psi_b/\lambda_b$ and $\Omega_{b'} = \Psi_{b'}/\lambda_{b'}$. Since the bound (11) decays with $\sqrt{m}$, for $b'$-bit measurements, $b' > b$, to improve over $b$-bit measurements with respect to the total #bits used, it is then required that $\Omega_b/\Omega_{b'} > \sqrt{b'/b}$. The route to be taken is thus as follows: we first derive expressions for $\lambda_b$ and $\Psi_b$ and then minimize the resulting expression for $\Omega_b$ w.r.t. the free parameters $t$ and $\mu$. We are then in position to compare $\Omega_b/\Omega_{b'}$ for $b \neq b'$.

Evaluating $\lambda_b = \lambda_b(t, \mu)$. Below, $\odot$ denotes the entry-wise multiplication between vectors.

Lemma 2. We have $\lambda_b(t, \mu) = \langle \alpha(t), E(t) \odot \mu \rangle / (1 + \sigma^2)$, where
$\alpha(t) = (\alpha_1(t), \ldots, \alpha_K(t))^\top$, $\alpha_k(t) = \mathbb{P}\{|\tilde{g}| \in R_k(t)\}$, $\tilde{g} \sim N(0, 1 + \sigma^2)$, $k \in [K]$,
$E(t) = (E_1(t), \ldots, E_K(t))^\top$, $E_k(t) = \mathbb{E}[\tilde{g} \mid \tilde{g} \in R_k(t)]$, $\tilde{g} \sim N(0, 1 + \sigma^2)$, $k \in [K]$.

Evaluating $\Psi_b = \Psi_b(t, \mu)$. Exact evaluation proves to be difficult. We hence resort to an analytically more tractable approximation which is still sufficiently accurate as confirmed by experiments.

Lemma 3. As $|x^*_j| \to 0$, $j = 1, \ldots, n$, and as $m \to \infty$, we have $\Psi_b(t, \mu) \propto \sqrt{\langle \alpha(t), \mu \odot \mu \rangle}$.

Note that the proportionality constant (not depending on $b$) in front of the given expression does not need to be known as it cancels out when computing ratios $\Omega_b/\Omega_{b'}$. The asymptotics $|x^*_j| \to 0$, $j \in [n]$, is limiting but still makes sense for $s$ growing with $n$ (recall that we fix $\|x^*\|_2 = 1$ w.l.o.g.).

Optimal choice of $t$ and $\mu$. It turns out that the optimal choice of $(t, \mu)$ minimizing $\Psi_b/\lambda_b$ coincides with the solution of an instance of the classical Lloyd-Max quantization problem [19, 20] stated below. Let $h$ be a random variable with finite variance and $Q$ the quantization map from above:

$\min_{t,\mu} \mathbb{E}[\{h - Q(h; t, \mu)\}^2] = \min_{t,\mu} \mathbb{E}[\{h - \operatorname{sign}(h)\sum_{k=1}^K \mu_k I(|h| \in R_k(t))\}^2]$.  (13)

Problem (13) can be seen as a one-dimensional k-means problem at the population level, and it is solved in practice by an alternating scheme similar to that used for k-means. For $h$ from a log-concave distribution (e.g. Gaussian) that scheme can be shown to deliver the global optimum [12].

Theorem 1. Consider the minimization problem $\min_{t,\mu} \Psi_b(t, \mu)/\lambda_b(t, \mu)$. Its minimizer $(t^*, \mu^*)$ equals that of the Lloyd-Max problem (13) for $h \sim N(0, 1 + \sigma^2)$. Moreover,

$\Omega_b(t^*, \mu^*) = \Psi_b(t^*, \mu^*)/\lambda_b(t^*, \mu^*) \propto \sqrt{(\sigma^2 + 1)/\lambda_{b,0}(t^*_0, \mu^*_0)}$,

where $\lambda_{b,0}(t^*_0, \mu^*_0)$ denotes the value of $\lambda_b$ for $\sigma = 0$ evaluated at $(t^*_0, \mu^*_0)$, the choice of $(t, \mu)$ minimizing $\Omega_b$ for $\sigma = 0$.

Regarding the choice of $(t, \mu)$, the result of Theorem 1 may not come as a surprise as the measurements prior to quantization are i.i.d. $N(0, 1 + \sigma^2)$. It is less immediate though that this specific choice can also be motivated as the one leading to the minimization of the error bound (11). Furthermore, Theorem 1 implies that the relative performance of $b$- and $b'$-bit measurements does not depend on $\sigma$ as long as the respective optimal choice of $(t, \mu)$ is used, which requires $\sigma$ to be known. Theorem 1 provides an explicit expression for $\Omega_b$ that is straightforward to compute.
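Since (13) is a one-dimensional k-means problem at the population level, the alternating scheme is a few lines of code. A sketch for $h \sim N(0,1)$ (the case $\sigma = 0$; for general $\sigma$, thresholds and representatives are scaled by $\sqrt{1+\sigma^2}$), using the closed-form conditional means of the half-normal distribution; function names are ours:

```python
import math

def phi(x):
    # standard Gaussian pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # standard Gaussian cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lloyd_max_halfnormal(K, iters=200):
    """Optimal K-bin quantizer of |g|, g ~ N(0,1): thresholds t, representatives mu."""
    mu = [2.0 * (k + 0.5) / K for k in range(K)]  # crude initialization
    for _ in range(iters):
        # nearest-neighbor condition: thresholds are midpoints of representatives
        t = [0.5 * (mu[k] + mu[k + 1]) for k in range(K - 1)]
        edges = [0.0] + t + [float("inf")]
        # centroid condition: representative = conditional mean over its bin
        for k in range(K):
            a, b = edges[k], edges[k + 1]
            mu[k] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))
    return t, mu
```

For $K = 1$ this yields $\mu_1 = \sqrt{2/\pi} \approx 0.798$ (the optimal 1-bit representative), and for $K = 2$ it reproduces the classical 4-level Lloyd-Max quantizer of the standard Gaussian ($t_1 \approx 0.98$, $\mu \approx (0.45, 1.51)$).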
The following table lists ratios $\Omega_b/\Omega_{b'}$ for selected values of $b$ and $b'$:

                                  $b=1, b'=2$                $b=2, b'=3$                  $b=3, b'=4$
  $\Omega_b/\Omega_{b'}$:         1.178                      1.046                        1.013
  required for $b'$ to beat $b$:  $\sqrt{2} \approx 1.414$   $\sqrt{3/2} \approx 1.225$   $\sqrt{4/3} \approx 1.155$

These figures suggest that the smaller $b$, the better the performance for a given budget of bits $B$.

Beyond additive noise. Additive Gaussian noise is perhaps the most studied form of perturbation, but one can of course think of numerous other mechanisms whose effect can be analyzed on the basis of the same scheme used for additive noise, as long as it is feasible to obtain the corresponding expressions for $\lambda$ and $\Psi$. We here do so for the following mechanisms acting after quantization.

(I) Random bin flip. For $i \in [m]$: with probability $1 - p$, $y_i$ remains unchanged. With probability $p$, $y_i$ is changed to an element from $(-M \cup M) \setminus \{y_i\}$ uniformly at random.
(II) Adversarial bin flip. For $i \in [m]$: write $y_i = q\mu_k$ for $q \in \{-1, 1\}$ and $\mu_k \in M$. With probability $1 - p$, $y_i$ remains unchanged. With probability $p$, $y_i$ is changed to $-q\mu_K$.

Note that for $b = 1$, (I) and (II) coincide (sign flip with probability $p$). Depending on the magnitude of $p$, the corresponding value $\lambda = \lambda_{b,p}$ may even be negative, which is unlike the case of additive noise. Recall that the error bound (11) requires $\lambda > 0$. Borrowing terminology from robust statistics, we consider $\bar{p}_b = \min\{p : \lambda_{b,p} \leq 0\}$ as the breakdown point, i.e. the (expected) proportion of contaminated observations that can still be tolerated so that (11) continues to hold.
Mechanism (II) produces a natural counterpart of gross corruptions in the standard setting (1). It can be shown that among all maps $-M \cup M \to -M \cup M$ applied randomly to the observations with a fixed probability, (II) maximizes the ratio $\Psi/\lambda$, hence the attribute "adversarial". In Figure 1 we display $\Psi_{b,p}/\lambda_{b,p}$ for $b \in \{1, 2, 3, 4\}$ for both (I) and (II). The table below lists the corresponding breakdown points. For simplicity, $(t, \mu)$ are not optimized but set to the optimal (in the sense of Lloyd-Max) choice $(t^*_0, \mu^*_0)$ in the noiseless case. The underlying derivations can be found in the supplement.

  (I)  $\bar{p}_b$:  $b=1$: 1/2,  $b=2$: 3/4,  $b=3$: 7/8,  $b=4$: 15/16
  (II) $\bar{p}_b$:  $b=1$: 1/2,  $b=2$: 0.42,  $b=3$: 0.36,  $b=4$: 0.31

Figure 1 and the table provide one more argument in favour of one-bit measurements as they offer better robustness vis-à-vis adversarial corruptions. In fact, once the fraction of such corruptions reaches 0.2, $b = 1$ performs best, already on the measurement scale. For the milder corruption scheme (I), $b = 2$ turns out to be the best choice for significant but moderate $p$.

Figure 1: $\Psi_{b,p}/\lambda_{b,p}$ ($\log_{10}$-scale), $b \in \{1, 2, 3, 4\}$, $p \in [0, 0.5]$ for mechanisms (I, L) and (II, R).

4 Scale estimation

In Section 2, we have decomposed $x^* = x^*_u \psi^*$ into a product of a unit vector $x^*_u$ and a scale parameter $\psi^* > 0$.
We have pointed out that $x^*_u$ can be estimated by $b$-bit Marginal Regression separately from $\psi^*$, since the latter can be absorbed into the definition of the bins $\{R_k\}$. Accordingly, we may estimate $x^*$ as $\hat{x} = \hat{x}_u \hat{\psi}$ with $\hat{x}_u$ and $\hat{\psi}$ estimating $x^*_u$ and $\psi^*$, respectively. We here consider the maximum likelihood estimator (MLE) for $\psi^*$, following [15], which studied the estimation of the scale parameter for the entire $\alpha$-stable family of distributions. The work of [15] was motivated by a different line of one-scan 1-bit CS algorithms [16] based on $\alpha$-stable designs [17].

First, we consider the case $\sigma = 0$, so that the measurements prior to quantization are i.i.d. $N(0, (\psi^*)^2)$. The likelihood function is

$L(\psi) = \prod_{i=1}^m \sum_{k=1}^K I(|y_i| \in R_k)\, \mathbb{P}(|y_i| \in R_k) = \prod_{k=1}^K \{2(\Phi(t_k/\psi) - \Phi(t_{k-1}/\psi))\}^{m_k}$,  (14)

where $m_k = |\{i : |y_i| \in R_k\}|$, $k \in [K]$, and $\Phi$ denotes the standard Gaussian cdf. Note that for $K = 1$, $L(\psi)$ is constant (i.e. does not depend on $\psi$), which confirms that for $b = 1$ it is impossible to recover $\psi^*$. For $K = 2$ (i.e. $b = 2$), the MLE has a simple closed-form expression given by $\hat{\psi} = t_1/\Phi^{-1}(0.5(1 + m_1/m))$. The following tail bound establishes fast convergence of $\hat{\psi}$ to $\psi^*$.

Proposition 4. Let $\varepsilon \in (0, 1)$ and $c = 2\{\varphi'(t_1/\psi^*)\}^2$, where $\varphi'$ denotes the derivative of the standard Gaussian pdf. With probability at least $1 - 2\exp(-cm\varepsilon^2)$, we have $|\hat{\psi}/\psi^* - 1| \leq \varepsilon$.

The exponent $c$ is maximized for $t_1 = \psi^*$ and becomes smaller as $t_1/\psi^*$ moves away from 1. While scale estimation from 2-bit measurements is possible, convergence can be slow if $t_1$ is not well chosen. For $b \geq 3$, convergence can be faster but the MLE is not available in closed form [15]. We now turn to the case $\sigma > 0$.
The MLE based on (14) is no longer consistent. If $x^*_u$ is known, then the joint likelihood for $(\psi^*, \sigma)$ is given by

$L(\psi, \tilde{\sigma}) = \prod_{i=1}^m \left\{\Phi\left(\frac{u_i - \psi\langle a_i, x^*_u \rangle}{\tilde{\sigma}}\right) - \Phi\left(\frac{l_i - \psi\langle a_i, x^*_u \rangle}{\tilde{\sigma}}\right)\right\}$,  (15)

where $[l_i, u_i]$ denotes the interval the $i$-th observation is contained in before quantization, $i \in [m]$. It is not clear to us whether the likelihood is log-concave, which would ensure that the global optimum can be obtained by convex programming. Empirically, we have not encountered any issue with spurious local minima when using as starting point $\psi = 0$ and, for $\tilde{\sigma}$, the MLE from the noiseless case. The only issue with (15) we are aware of concerns the case in which there exists $\psi$ so that $\psi\langle a_i, x^*_u \rangle \in [l_i, u_i]$, $i \in [m]$. In this situation, the MLE for $\sigma$ equals zero and the MLE for $\psi$ may not be unique. However, this is a rather unlikely scenario as long as there is a noticeable noise level. As $x^*_u$ is typically unknown, we may follow the plug-in principle, replacing $x^*_u$ by an estimator $\hat{x}_u$.

5 Experiments

We here provide numerical results supporting and illustrating some of the key points made in the previous sections. We also compare $b$-bit Marginal Regression to alternative recovery algorithms.

Setup. Our simulations follow model (1) with $n = 500$, $s \in \{10, 20, \ldots, 50\}$, $\sigma \in \{0, 1, 2\}$ and $b \in \{1, 2\}$. Regarding $x^*$, the support and the signs of its entries are selected uniformly at random, while the absolute magnitudes of the entries corresponding to the support are drawn from the uniform distribution on $[\beta, 2\beta]$, where $\beta = f \cdot (1/\lambda_{1,\sigma})\sqrt{\log(n)/m}$ and $m = f^2 (1/\lambda_{1,\sigma})^2 s \log n$, with $f \in \{1.5, 3, 4.5, \ldots, 12\}$ controlling the signal strength. The resulting signal is then normalized to unit 2-norm.
Before normalization, the norm of the signal lies in $[1, \sqrt{2}]$ by construction, which ensures that as $f$ increases the signal strength condition (12) is satisfied with increasing probability. For $b = 2$, we use Lloyd-Max quantization for a $N(0,1)$ random variable, which is optimal for $\sigma = 0$ but not for $\sigma > 0$. Each possible configuration for $s$, $f$ and $\sigma$ is replicated 20 times. Due to space limits, a representative subset of the results is shown; the rest can be found in the supplement.

Empirical verification of the analysis in Section 3. The experiments reveal that the relative performance of 1-bit and 2-bit measurements for estimating $x^*$ predicted by the analysis of Section 3 closely agrees with what is observed empirically, as can be seen in Figure 2.

Estimation of the scale and the noise level. Figure 3 suggests that the plug-in MLE for $(\psi^* = \|x^*\|_2, \sigma)$ is a suitable approach, at least as long as $\psi^*/\sigma$ is not too small. For $\sigma = 2$, the plug-in MLE for $\psi^*$ appears to have a noticeable bias as it tends to 0.92 instead of 1 for increasing $f$ (and thus increasing $m$).
Observe that for $\sigma = 0$, convergence to the true value 1 is slower than for $\sigma = 1$, while $\sigma$ is over-estimated (by about 0.2) for small $f$. The above two issues are presumably a plug-in effect, i.e. a consequence of using $\hat{x}_u$ in place of $x^*_u$.

Figure 2: Average $\ell_2$-estimation errors $\|x^* - \hat{x}\|_2$ for $b = 1$ and $b = 2$ on the $\log_2$-scale in dependence of the signal strength $f$. The curve 'predicted improvement' (of $b = 2$ vs. $b = 1$) is obtained by scaling the $\ell_2$-estimation error by the factor predicted by the theory of Section 3. Likewise, the curve 'required improvement' results by scaling the error of $b = 1$ by $1/\sqrt{2}$ and indicates what would be required by $b = 2$ to improve over $b = 1$ at the level of total #bits.

Figure 3: Estimation of $\psi^* = \|x^*\|_2$ (here 1) and $\sigma$. The curves depict the average of the plug-in MLE discussed in Section 4, while the bars indicate $\pm 1$ standard deviation.

b-bit Marginal Regression and alternative recovery algorithms. We compare the $\ell_2$-estimation error of $b$-bit Marginal Regression to several common recovery algorithms. Compared to apparently more principled methods which try to enforce agreement of $Q(y)$ and $Q(A\hat{x})$ w.r.t. the Hamming distance (or a surrogate thereof), $b$-bit Marginal Regression can be seen as a crude approach as it is based on maximizing the inner product between $y$ and $Ax$. One may thus expect that its performance is inferior. In summary, our experiments confirm that this is true in low-noise settings, but not so if the noise level is substantial. Below we briefly present the alternatives that we consider.

Plan-Vershynin: The approach in [23] based on (7), which only differs in that the constraint set results from a relaxation.
As shown in Figure 4, the performance is similar though slightly inferior.

IHT-quadratic: Standard Iterative Hard Thresholding based on quadratic loss [1]. As pointed out above, b-bit Marginal Regression can be seen as a one-step version of Iterative Hard Thresholding.

IHT-hinge (b = 1): The variant of Iterative Hard Thresholding for binary observations using a hinge-type loss function as proposed in [11].

SVM (b = 1): Linear SVM with squared hinge loss and an ℓ1-penalty, implemented in LIBLINEAR [6]. The cost parameter is chosen from 1/√(m log m) · {2^-3, 2^-2, ..., 2^3} by 5-fold cross-validation.

IHT-Jacques (b = 2): A variant of Iterative Hard Thresholding for quantized observations based on a specific piecewise linear loss function [9].

SVM-type (b = 2): This approach is based on solving the following convex optimization problem:

min_{x, {ξ_i}} γ‖x‖_1 + Σ_{i=1}^m ξ_i subject to l_i − ξ_i ≤ ⟨a_i, x⟩ ≤ u_i + ξ_i, ξ_i ≥ 0, i ∈ [m],

where [l_i, u_i] is the bin observation i is assigned to. The essential idea is to enforce consistency of the observed and predicted bin assignments up to slacks {ξ_i} while promoting sparsity of the solution via an ℓ1-penalty. The parameter γ is chosen from √(m log m) · {2^-10, 2^-9, ..., 2^3} by 5-fold cross-validation.

Turning to the results as depicted by Figure 4, the difference between a noiseless (σ = 0) and a heavily noisy (σ = 2) setting is perhaps most striking.

σ = 0: Both IHT variants significantly outperform b-bit Marginal Regression. By comparing errors for IHT, b = 2 can be seen to improve over b = 1 at the level of the total # bits.

σ = 2: b-bit Marginal Regression is on par with the best performing methods. IHT-quadratic for b = 2 only achieves a moderate reduction in error over b = 1, while IHT-hinge is supposedly affected by convergence issues.
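The SVM-type program above becomes a linear program once the ℓ1-norm is linearized via the split x = x⁺ − x⁻ with x⁺, x⁻ ≥ 0. The following is a minimal scipy-based sketch under that standard reformulation (our illustration, not the implementation used in the experiments):

```python
import numpy as np
from scipy.optimize import linprog

def svm_type_recovery(A, lower, upper, gamma):
    """Solve min_{x, xi} gamma*||x||_1 + sum_i xi_i
    s.t. lower_i - xi_i <= <a_i, x> <= upper_i + xi_i, xi_i >= 0,
    as an LP in the nonnegative variables [x_pos (n), x_neg (n), xi (m)]."""
    m, n = A.shape
    c = np.concatenate([gamma * np.ones(2 * n), np.ones(m)])
    I = np.eye(m)
    # Upper bin edges:  A x_pos - A x_neg - xi <= upper
    # Lower bin edges: -A x_pos + A x_neg - xi <= -lower
    A_ub = np.block([[A, -A, -I], [-A, A, -I]])
    b_ub = np.concatenate([upper, -lower])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:2 * n]   # recovered signal x = x_pos - x_neg
```

In the experiments, γ would then be selected from √(m log m) · {2^-10, ..., 2^3} by cross-validation, as described above.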
Overall, the results suggest that a setting with substantial noise favours a crude approach (low-bit measurements and conceptually simple recovery algorithms).

Figure 4: Average ℓ2-estimation errors for several recovery algorithms (Marginal, Plan-Vershynin, IHT-quadratic, IHT-hinge/IHT-Jacques, SVM/SVM-type) on the log2-scale in dependence of the signal strength f, with s = 50 throughout. We contrast σ = 0 (left) vs. σ = 2 (right), b = 1 (top) vs. b = 2 (bottom).

6 Conclusion
Bridging Marginal Regression and a popular approach to 1-bit CS due to Plan & Vershynin, we have considered signal recovery from b-bit quantized measurements. The main finding is that for b-bit Marginal Regression it is not beneficial to increase b beyond 2.
A compelling argument for b = 2 is the fact that the norm of the signal can be estimated, unlike in the case b = 1. Compared to high-precision measurements, 2-bit measurements also exhibit strong robustness properties. It is of interest whether and under what circumstances the conclusion may differ for other recovery algorithms.

Acknowledgement. This work is partially supported by NSF-Bigdata-1419210, NSF-III-1360971, ONR-N00014-13-1-0764, and AFOSR-FA9550-13-1-0137.

References

[1] T. Blumensath and M. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27:265-274, 2009.
[2] P. Boufounos and R. Baraniuk. 1-bit compressive sensing. In Information Science and Systems, 2008.
[3] E. Candes and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, 35:2313-2351, 2007.
[4] S. Chen and A. Banerjee. One-bit compressed sensing with the k-support norm. In AISTATS, 2015.
[5] D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52:1289-1306, 2006.
[6] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[7] C. Genovese, J. Jin, L. Wasserman, and Z. Yao. A comparison of the Lasso and marginal regression. Journal of Machine Learning Research, 13:2107-2143, 2012.
[8] S. Gopi, P. Netrapalli, P. Jain, and A. Nori. One-bit compressed sensing: Provable support and vector recovery. In ICML, 2013.
[9] L. Jacques, K. Degraux, and C. De Vleeschouwer. Quantized iterative hard thresholding: Bridging 1-bit and high-resolution quantized compressed sensing. arXiv:1305.1786, 2013.
[10] L. Jacques, D. Hammond, and M. Fadili. Dequantizing compressed sensing: When oversampling and non-Gaussian constraints combine. IEEE Transactions on Information Theory, 57:559-571, 2011.
[11] L. Jacques, J. Laska, P. Boufounos, and R. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59:2082-2102, 2013.
[12] J. Kieffer. Uniqueness of locally optimal quantizer for log-concave density and convex error weighting function. IEEE Transactions on Information Theory, 29:42-47, 1983.
[13] J. Laska and R. Baraniuk. Regime change: Bit-depth versus measurement-rate in compressive sensing. arXiv:1110.3450, 2011.
[14] J. Laska, P. Boufounos, M. Davenport, and R. Baraniuk. Democracy in action: Quantization, saturation, and compressive sensing. Applied and Computational Harmonic Analysis, 31:429-443, 2011.
[15] P. Li. Binary and multi-bit coding for stable random projections. arXiv:1503.06876, 2015.
[16] P. Li. One scan 1-bit compressed sensing. arXiv:1503.02346, 2015.
[17] P. Li, C.-H. Zhang, and T. Zhang. Compressed counting meets compressed sensing. In COLT, 2014.
[18] J. Liu and S. Wright. Robust dequantized compressive sensing. Applied and Computational Harmonic Analysis, 37:325-346, 2014.
[19] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28:129-137, 1982.
[20] J. Max. Quantizing for minimum distortion. IRE Transactions on Information Theory, 6:7-12, 1960.
[21] D. Needell and J. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26:301-321, 2008.
[22] Y. Plan and R. Vershynin. One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics, 66:1275-1297, 2013.
[23] Y. Plan and R. Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Transactions on Information Theory, 59:482-494, 2013.
[24] R. Zhu and Q. Gu. Towards a lower sample complexity for robust one-bit compressed sensing. In ICML, 2015.
[25] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[26] M. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55:2183-2202, 2009.
[27] C.-H. Zhang and T. Zhang. A general theory of concave regularization for high-dimensional sparse estimation problems. Statistical Science, 27:576-593, 2013.
[28] L. Zhang, J. Yi, and R. Jin. Efficient algorithms for robust one-bit compressive sensing. In ICML, 2014.
[29] T. Zhang. Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Transactions on Information Theory, 57:4689-4708, 2011.