{"title": "High Dimensional Linear Regression using Lattice Basis Reduction", "book": "Advances in Neural Information Processing Systems", "page_first": 1842, "page_last": 1852, "abstract": "We consider a high dimensional linear regression problem where the goal is to efficiently recover an unknown vector \\beta^* from n noisy linear observations Y=X \\beta^*+W  in R^n, for known X in R^{n \\times p} and unknown W in R^n. Unlike most of the literature on this model we make no sparsity assumption on \\beta^*. Instead we adopt a regularization based on assuming that the underlying vectors \\beta^* have rational entries with the same denominator Q. We call this Q-rationality assumption.  We propose a new polynomial-time algorithm for this task which is based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm.  We establish that under the Q-rationality assumption, our algorithm recovers exactly the vector \\beta^* for a large class of distributions for the iid entries of X and non-zero noise W. We prove that it is successful under small noise, even when the learner has access to only one observation (n=1). Furthermore, we prove that in the case of the Gaussian white noise for W, n=o(p/\\log p) and Q sufficiently large, our algorithm tolerates a nearly optimal information-theoretic level of the noise.", "full_text": "High Dimensional Linear Regression\n\nusing Lattice Basis Reduction\n\nDavid Gamarnik\n\nSloan School of Management\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\ngamarnik@mit.edu\n\nIlias Zadik\n\nOperations Research Center\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n\nizadik@mit.edu\n\nAbstract\n\nWe consider a high dimensional linear regression problem where the goal is to\nef\ufb01ciently recover an unknown vector \u03b2\u2217 from n noisy linear observations Y =\nX\u03b2\u2217 + W \u2208 Rn, for known X \u2208 Rn\u00d7p and unknown W \u2208 Rn.
Unlike most\nof the literature on this model we make no sparsity assumption on \u03b2\u2217. Instead\nwe adopt a regularization based on assuming that the underlying vectors \u03b2\u2217 have\nrational entries with the same denominator Q \u2208 Z>0. We call this Q-rationality\nassumption. We propose a new polynomial-time algorithm for this task which\nis based on the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction\nalgorithm. We establish that under the Q-rationality assumption, our algorithm\nrecovers exactly the vector \u03b2\u2217 for a large class of distributions for the iid entries of\nX and non-zero noise W . We prove that it is successful under small noise, even\nwhen the learner has access to only one observation (n = 1). Furthermore, we\nprove that in the case of the Gaussian white noise for W , n = o (p/ log p) and Q\nsuf\ufb01ciently large, our algorithm tolerates a nearly optimal information-theoretic\nlevel of the noise.\n\n1\n\nIntroduction\n\nWe consider the following high-dimensional linear regression model. Consider n samples of a vector\n\u03b2\u2217 \u2208 Rp in a vector form Y = X\u03b2\u2217 + W for some X \u2208 Rn\u00d7p and W \u2208 Rn. Given the knowledge\nof Y and X the goal is to infer \u03b2\u2217 using an ef\ufb01cient algorithm and the minimum number n of samples\npossible. Throughout the paper we call p the number of features, X the measurement matrix and W\nthe noise vector.\nWe focus on the high-dimensional case where n may be much smaller than p and p grows to in\ufb01nity,\na setting that has been very popular in the literature during the last years Chen et al. (2001), Donoho\n(2006), Candes et al. (2006), Foucart and Rauhut (2013), Wainwright (2009). In this case, and under\nno additional structural assumption, the inference task becomes impossible, even in the noiseless case\nW = 0, as the underlying linear system becomes underdetermined. 
Most papers address this issue by\nimposing a sparsity assumption on \u03b2\u2217, which refers to \u03b2\u2217 having only a limited number of non-zero\nentries compared to its dimension Donoho (2006), Candes et al. (2006), Foucart and Rauhut (2013).\nDuring the past decades, the sparsity assumption led to a fascinating line of research in statistics and\ncompressed sensing, which established, among other results, that several polynomial-time algorithms,\nsuch as Basis Pursuit Denoising Scheme and LASSO, can ef\ufb01ciently recover a sparse \u03b2\u2217 with number\nof samples much smaller than the number of features Candes et al. (2006), Wainwright (2009),\nFoucart and Rauhut (2013). For example, it is established that if \u03b2\u2217 is constrained to have at most\nk \u2264 p non-zero entries, X has iid N (0, 1) entries, W has iid N (0, \u03c32) entries and n is of the order\nk log(p/k), then both of the mentioned algorithms can recover \u03b2\u2217, up to the level of the noise. Different\nstructural assumptions than sparsity have also been considered in the literature. For example, a recent\npaper Bora et al. (2017) makes the assumption that \u03b2\u2217 lies near the range of an L-Lipschitz generative\nmodel G : Rk \u2192 Rp and it proposes an algorithm which succeeds with n = O(k log L) samples.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nA downside of all of the above results is that they provide no guarantee in the case n is much smaller\nthan k log(p/k). Consider for example the case where the components of a sparse \u03b2\u2217 are binary-valued,\nand X, W follow the Gaussian assumptions described above. 
Supposing that \u03c3 is suf\ufb01ciently small,\nit is a straightforward argument that even when n = 1, \u03b2\u2217 is recoverable from Y = \u27e8X, \u03b2\u2217\u27e9 + W by\na brute-force method with probability tending to one as p goes to in\ufb01nity (whp). On the other hand,\nfor sparse and binary-valued \u03b2\u2217, the Basis Pursuit method in the noiseless case Donoho and Tanner\n(2006) and the Basis Pursuit Denoising Scheme in the noisy case Gamarnik and Zadik (2017b) have\nbeen proven to fail to recover a binary \u03b2\u2217 with n = o(k log(p/k)) samples. Furthermore, LASSO has\nbeen proven to fail to recover a vector with the same support of \u03b2\u2217, with n = o(k log p) samples\nWainwright (2009). This failure to capture the complexity of the problem accurately enough for\nsmall sample sizes also led to an algorithmic hardness conjecture for the regime n = o(k log(p/k))\nGamarnik and Zadik (2017a), Gamarnik and Zadik (2017b). While this conjecture still stands in\nthe general case, as we show in this paper, in the special case where \u03b2\u2217 is rational-valued and the\nmagnitude of the noise W is suf\ufb01ciently small, the statistical computational gap can be closed and \u03b2\u2217\ncan be recovered even when n = 1.\nThe structural assumption we impose on \u03b2\u2217 is that its entries are rational numbers with denominator\nequal to some \ufb01xed positive integer value Q \u2208 Z>0, something we refer to as the Q-rationality\nassumption. Note that for any Q, this assumption is trivially satis\ufb01ed by the binary-valued \u03b2\u2217\nwhich was discussed above. The 1-rationality assumption corresponds to \u03b2\u2217 having integer entries,\nwhich is well-motivated in practice. For example, this assumption appears frequently in the study of\nglobal navigation satellite systems (GPS) and communications Hassibi and Boyd (1998), Hassibi\nand Vikalo (2002), Brunel and Boutros (1999), Borno (2011).
In the \ufb01rst reference the authors\npropose a mixed linear/integer model of the form Y = Ax + Bz + W where z is an integer valued\nvector corresponding to integer multiples of certain wavelength. Several examples corresponding to\nregression models with integer valued regression coef\ufb01cients and zero noise (though not always in the\nsame model) are also discussed in the book Foucart and Rauhut (2013). In particular one application\nis the so-called Single-Pixel camera. In this model a vector \u03b2 corresponds to color intensities of\nan image for different pixels and thus takes discrete values. The model assumes no noise, which\nis one of the assumptions we adopt in our model, though the corresponding regression matrix has\ni.i.d. +1/ \u2212 1 Bernoulli entries, as opposed to a continuous distribution we assume. Two other\napplications involving noiseless regression models found in the same reference are MRI imaging and\nRadar detection.\nA large body of literature on noiseless regression type models is a series of papers on phase retrieval.\nHere the coef\ufb01cients of the regression vector \u03b2\u2217 and the entries of the regression matrix X are\ncomplex valued, but the observation vector Y = X\u03b2\u2217 is only observed through absolute values. This\nmodel has many applications, including crystallography, see Candes et al. (2015). The aforementioned\npaper provides many references to phase retrieval model including the cases when the entries of \u03b2\u2217\nhave a \ufb01nite support. We believe that our method can also be extended so that to model the case\nwhere the entries of the regression vector have a \ufb01nite support, even if irrationally valued, and the\nentries of Y are only observed through their magnitude. 
In other words, we expect that the method of\nthe present paper applies to the phase retrieval problem at least in some of the cases and this is one of\nthe current directions we are exploring.\nNoiseless regression models with integer valued regression coef\ufb01cients were also important in the\ntheoretical development of compressive sensing methods. Speci\ufb01cally, Donoho (2006) and\nDonoho and Tanner (2005), Donoho and Tanner (2006), Donoho and Tanner\n(2009) consider a noiseless regression model of the form AB where A is a random (say Gaussian)\nmatrix and B is the unit cube [0, 1]p. One of the goals of these papers was to count the number of extreme\npoints of the projected polytope AB in order to explain the effectiveness of the linear programming\nbased methods. The extreme points of this polytope can only appear as projections of extreme points\nof B which are all length-p binary vectors, namely one deals with a noiseless regression model with\nbinary coef\ufb01cients, an important special case of the model we consider in our paper.\nIn the Bayesian setting, where the ground truth \u03b2\u2217 is sampled according to a discrete distribution,\nDonoho et al. (2013) proposes a low-complexity algorithm which provably recovers \u03b2\u2217 with n = o(p)\nsamples. This algorithm uses the technique of approximate message passing (AMP) and is motivated\nby ideas from statistical physics Krzakala et al. (2012). Even though the result from Donoho et al.\n(2013) applies to the general discrete case for \u03b2\u2217, it requires the matrix X to be spatially coupled, a\nproperty that in particular does not hold for X with iid standard Gaussian entries. 
Furthermore the\nrequired sample size for the algorithm to work is only guaranteed to be sublinear in p, a sample size\npotentially much bigger than the information-theoretic limit for recovery under suf\ufb01ciently small\nnoise (n = 1). In the present paper, where \u03b2\u2217 satis\ufb01es the Q-rationality assumption, we propose\na polynomial-time algorithm which applies for a large class of continuous distributions for the iid\nentries of X, including the normal distribution, and provably works even when n = 1.\nThe algorithm we propose is inspired by the algorithm introduced in Lagarias and Odlyzko (1985)\nwhich solves, in polynomial time, a certain version of the so-called Subset-Sum problem. To\nbe more speci\ufb01c, consider the following NP-hard algorithmic problem. Given p \u2208 Z>0 and\ny, x1, x2, . . . , xp \u2208 Z>0 the goal is to \ufb01nd a \u2205 \u2260 S \u2282 [p] with y = \u2211_{i\u2208S} xi when at least one such\nset S is assumed to exist. Over 30 years ago, this problem received a lot of attention in the \ufb01eld of\ncryptography, based on the belief that the problem would be hard to solve in many \u201creal\u201d instances.\nThis would imply that several already built public key cryptosystems, called knapsack public key\ncryptosystems, could be considered safe from attacks Lempel (1979), Merkle and Hellman (1978).\nThis belief though was proven wrong by several papers in the early 80s, see for example Shamir\n(1982). Motivated by this line of research, Lagarias and Odlyzko in Lagarias and Odlyzko (1985), and\na year later Frieze in Frieze (1986), using a cleaner and shorter argument, proved the same surprising\nfact: if x1, x2, . . . , xp follow an iid uniform distribution on [2^{(1/2)(1+\u03b5)p^2}] := {1, 2, 3, . . . , 2^{(1/2)(1+\u03b5)p^2}}\nfor some \u03b5 > 0 then there exists a polynomial-in-p time algorithm which solves the subset-sum\nproblem whp as p \u2192 +\u221e. In other words, even though the problem is NP-hard in the worst-case,\nassuming a quadratic-in-p number of bits for the coordinates of x, the algorithmic complexity of the\ntypical such problem is polynomial in p. 
The successful ef\ufb01cient algorithm is based on an elegant\napplication of a seminal algorithm in the computational study of lattices called the Lenstra-Lenstra-Lovasz\n(LLL) algorithm, introduced in Lenstra et al. (1982). This algorithm receives as an input\na basis {b1, . . . , bm} \u2282 Zm of a full-dimensional lattice L and returns in time polynomial in m\nand max_{i=1,2,...,m} log \u2016bi\u2016\u221e a non-zero vector \u02c6z in the lattice, such that \u2016\u02c6z\u20162 \u2264 2^{m/2} \u2016z\u20162, for all\nz \u2208 L \\ {0}.\nBesides its signi\ufb01cance in cryptography, the result of Lagarias and Odlyzko (1985) and Frieze (1986)\nenjoys an interesting linear regression interpretation as well. One can show that under the iid uniform\nin [2^{(1/2)(1+\u03b5)p^2}] assumption for x1, x2, . . . , xp, there exists exactly one set S with y = \u2211_{i\u2208S} xi whp\nas p tends to in\ufb01nity. Therefore if \u03b2\u2217 is the indicator vector of this unique set S, that is \u03b2\u2217_i = 1(i \u2208 S)\nfor i = 1, 2, . . . , p, we have that y = \u2211_i xi \u03b2\u2217_i = \u27e8x, \u03b2\u2217\u27e9 where x := (x1, x2, . . . , xp). Furthermore\nusing only the knowledge of y, x as input to the Lagarias-Odlyzko algorithm we obtain a polynomial\nin p time algorithm which recovers exactly \u03b2\u2217 whp as p \u2192 +\u221e. Written in this form, and given our\nearlier discussion on high-dimensional linear regression, this statement is equivalent to the statement\nthat the noiseless high-dimensional linear regression problem with binary \u03b2\u2217 and X generated with\niid elements from Unif[2^{(1/2)(1+\u03b5)p^2}] is polynomial-time solvable even with one sample (n = 1), whp\nas p grows to in\ufb01nity. The main focus of this paper is to extend this result to \u03b2\u2217 satisfying the\nQ-rationality assumption, continuous distributions on the iid entries of X and non-trivial noise levels.\n\nSummary of the Results\n\nWe propose a polynomial time algorithm for high-dimensional linear regression problem and establish\na general result for its performance. We show that if the entries of X \u2208 Rn\u00d7p are iid from an arbitrary\ncontinuous distribution with bounded density and \ufb01nite expected value, \u03b2\u2217 satis\ufb01es the Q-rationality\nassumption, \u2016\u03b2\u2217\u2016\u221e \u2264 R for some R > 0, and W is either an adversarial vector with in\ufb01nity norm\nat most \u03c3 or has iid mean-zero entries with variance at most \u03c32, then under some explicitly stated\nassumption on the parameters n, p, \u03c3, R, Q our algorithm recovers exactly the vector \u03b2\u2217 in time\nwhich is polynomial in n, p, log(1/\u03c3), log R, log Q, whp as p tends to in\ufb01nity. As a corollary, we\nshow that for any Q and R our algorithm can infer correctly \u03b2\u2217, when \u03c3 is at most exponential in\n\u2212(p^2/2 + (2 + p) log(QR)), even from one observation (n = 1). We show that for general n our\nalgorithm can tolerate noise level \u03c3 which is exponential in \u2212((2n + p)^2/(2n) + (2 + p/n) log(QR)).\nWe complement our results with the information-theoretic limits of our problem. We show that in\nthe case of Gaussian white noise W, a noise level which is exponential in \u2212(p/n) log(QR), which is\nessentially the second part of our upper bound, cannot be tolerated. This allows us to conclude that\nin the regime n = o (p/ log p) and RQ = 2^{\u03c9(p)} our algorithm tolerates the optimal information\ntheoretic level of noise.\nThe algorithm we propose receives as input real-valued data Y, X but importantly it truncates in\nthe \ufb01rst step the data by keeping the \ufb01rst N bits after zero of every entry. In particular, this allows\nthe algorithm to perform only \ufb01nite-precision arithmetic operations. 
Here N is a parameter of our\nalgorithm chosen by the algorithm designer. For our recovery results it is chosen to be polynomial in\np and log(1/\u03c3).\nA crucial step towards our main result is the extension of the Lagarias-Odlyzko algorithm Lagarias and\nOdlyzko (1985), Frieze (1986) to not necessarily binary, integer vectors \u03b2\u2217 \u2208 Zp, for measurement\nmatrix X \u2208 Zn\u00d7p with iid entries not necessarily from the uniform distribution, and \ufb01nally, for\nnon-zero noise vector W. As in Lagarias and Odlyzko (1985) and Frieze (1986), the algorithm we\nconstruct depends crucially on building an appropriate lattice and applying the LLL algorithm on it.\nThere is though an important additional step in the algorithm presented in the present paper compared\nwith the algorithm in Lagarias and Odlyzko (1985) and Frieze (1986). The latter algorithm is proven\nto recover a non-zero integer multiple \u03bb\u03b2\u2217 of the underlying binary vector \u03b2\u2217. Then since \u03b2\u2217 is\nknown to be binary, the exact recovery becomes a matter of renormalizing out the factor \u03bb from every\nnon-zero coordinate. On the other hand, even if we establish in our case the corresponding result and\nrecover a non-zero integer multiple of \u03b2\u2217 whp, this last renormalizing step would be impossible as\nthe ground truth vector is not assumed to be binary. We address this issue as follows. First we notice\nthat the renormalization step remains valid if the greatest common divisor of the elements of \u03b2\u2217 is\n1. Under this assumption, from any non-zero integer multiple \u03bb\u03b2\u2217 of \u03b2\u2217 we can obtain the vector\nitself by observing that the greatest common divisor of \u03bb\u03b2\u2217 equals \u03bb, and computing \u03bb by using\nfor instance Euclid\u2019s algorithm. We then generalize our recovery guarantee to arbitrary \u03b2\u2217.
We\ndo this by \ufb01rst translating implicitly the vector \u03b2\u2217 with a random integer vector Z via translating our\nobservations Y = X\u03b2\u2217 + W by XZ to obtain Y + XZ = X(\u03b2\u2217 + Z) + W. We then prove that the\nelements of \u03b2\u2217 + Z have greatest common divisor equal to unity with probability tending to one. This\nlast step is based on an analytic number theory argument which slightly extends a beautiful result from\nprobabilistic number theory (see for example, Theorem 332 in Hardy and Wright (1975)) according\nto which lim_{m\u2192+\u221e} P_{P,Q\u223cUnif{1,2,...,m}, P\u22a5\u22a5Q} [gcd (P, Q) = 1] = 6/\u03c0^2, where P \u22a5\u22a5 Q refers to P, Q\nbeing independent random variables. This result is not of clear origin in the literature, but possibly\nit is attributed to Chebyshev, as mentioned in Erdos and Lorentz (1985). A key implication of this\nresult for us is the fact that the limit above is strictly positive.\n\nNotation\nLet Z\u2217 denote Z \\ {0}. For k \u2208 Z>0 we set [k] := {1, 2, . . . , k}. For a vector x \u2208 Rd we\nde\ufb01ne Diagd\u00d7d (x) \u2208 Rd\u00d7d to be the diagonal matrix with Diagd\u00d7d (x)ii = xi, for i \u2208 [d]. For\n1 \u2264 p < \u221e by Lp we refer to the standard p-norm notation for \ufb01nite dimensional real vectors.\nGiven two vectors x, y \u2208 Rd the Euclidean inner product is denoted by \u27e8x, y\u27e9 := \u2211_{i=1}^{d} xi yi.\nBy log : R>0 \u2192 R we refer to the logarithm with base 2. The lattice L \u2286 Zk generated by a set of\nlinearly independent b1, . . . , bk \u2208 Zk is de\ufb01ned as {\u2211_{i=1}^{k} zi bi | z1, z2, . . . , zk \u2208 Z}.
Throughout\nthe paper we use the standard asymptotic notation, o, O, \u0398, \u2126 for comparing the growth of two\nreal-valued sequences an, bn, n \u2208 Z>0. Finally, we say that a sequence of events {Ap}p\u2208N holds with\nhigh probability (whp) as p \u2192 +\u221e if lim_{p\u2192+\u221e} P (Ap) = 1.\n\n2 Main Results\n\n2.1 Extended Lagarias-Odlyzko algorithm\nLet n, p, R \u2208 Z>0. Given X \u2208 Zn\u00d7p, \u03b2\u2217 \u2208 (Z \u2229 [\u2212R, R])p and W \u2208 Zn, set Y = X\u03b2\u2217 + W.\nFrom the knowledge of Y, X the goal is to infer exactly \u03b2\u2217. For this task we propose the following\nalgorithm which is an extension of the algorithm in Lagarias and Odlyzko (1985) and Frieze (1986).\nFor realistic purposes the values of R, \u2016W\u2016\u221e are not assumed to be known exactly. As a result, the\nfollowing algorithm, besides Y, X, receives as an input a number \u02c6R \u2208 Z>0 which is an estimated\nupper bound in absolute value for the entries of \u03b2\u2217 and a number \u02c6W \u2208 Z>0 which is an estimated\nupper bound in absolute value for the entries of W.\n\nAlgorithm 1 Extended Lagarias-Odlyzko (ELO) Algorithm\nInput: (Y, X, \u02c6R, \u02c6W ), Y \u2208 Zn, X \u2208 Zn\u00d7p, \u02c6R, \u02c6W \u2208 Z>0.\nOutput: \u02c6\u03b2\u2217 an estimate of \u03b2\u2217\n1 Generate a random vector Z \u2208 { \u02c6R + 1, \u02c6R + 2, . . . , 2 \u02c6R + log p}p with iid entries uniform in\n{ \u02c6R + 1, \u02c6R + 2, . . . , 2 \u02c6R + log p}.\n2 Set Y1 = Y + XZ.\n3 For each i = 1, 2, . . . 
, n, if |(Y1)i| < 3 set (Y2)i = 3 and otherwise set (Y2)i = (Y1)i.\n4 Set m = 2^{n+\u2308p/2\u2309+3} p ( \u02c6R\u2308\u221ap\u2309 + \u02c6W\u2308\u221an\u2309 ).\n5 Output \u02c6z \u2208 R2n+p from running the LLL basis reduction algorithm on the lattice generated by the\ncolumns of the following (2n + p) \u00d7 (2n + p) integer-valued matrix,\n\nAm :=\n[ mX      \u2212mDiagn\u00d7n (Y2)   mIn\u00d7n ]\n[ Ip\u00d7p   0p\u00d7n              0p\u00d7n  ]\n[ 0n\u00d7p   0n\u00d7n              In\u00d7n  ]    (1)\n\n6 Compute g = gcd (\u02c6zn+1, \u02c6zn+2, . . . , \u02c6zn+p), using Euclid\u2019s algorithm.\n7 If g \u2260 0, output \u02c6\u03b2\u2217 = (1/g) (\u02c6zn+1, \u02c6zn+2, . . . , \u02c6zn+p)t \u2212 Z. Otherwise, output \u02c6\u03b2\u2217 = 0p\u00d71.\n\nWe explain here informally the steps of the (ELO) algorithm and brie\ufb02y sketch the motivation behind\neach one of them. In the \ufb01rst and second steps the algorithm translates Y by XZ where Z is a\nrandom vector with iid elements chosen uniformly from { \u02c6R + 1, \u02c6R + 2, . . . , 2 \u02c6R + log p}. In that\nway \u03b2\u2217 is translated implicitly to \u03b2 = \u03b2\u2217 + Z because Y1 = Y + XZ = X(\u03b2\u2217 + Z) + W. As we\nwill establish using a number theoretic argument, gcd (\u03b2) = 1 whp as p \u2192 +\u221e with respect to the\nrandomness of Z, even though this is not necessarily the case for the original \u03b2\u2217. This is an essential\nrequirement for our technique to exactly recover \u03b2\u2217 and steps six and seven to be meaningful. In\nthe third step the algorithm gets rid of observations of very small magnitude. The minor but necessary\nmodi\ufb01cation of the noise level affects the observations in a negligible way.\nThe fourth and \ufb01fth steps of the algorithm provide a basis for a speci\ufb01c lattice in 2n + p dimensions.\nThe lattice is built with the knowledge of the input and Y2, the modi\ufb01ed Y.
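
As a concrete illustration of steps one to five, the following Python sketch assembles the integer matrix A_m from (Y, X, R_hat, W_hat). It is a minimal sketch under stated assumptions, not the authors' implementation: the helper name build_elo_basis is hypothetical, step four is read as m = 2^(n + ceil(p/2) + 3) * p * (R_hat * ceil(sqrt(p)) + W_hat * ceil(sqrt(n))), the upper end 2 R_hat + log p of the range for Z is rounded down to an integer, and the LLL call itself is left out.

```python
import math
import random


def build_elo_basis(Y, X, R_hat, W_hat, seed=0):
    # Y: list of n ints, X: n x p list of lists of ints (hypothetical interface).
    n, p = len(X), len(X[0])
    rng = random.Random(seed)

    # Step 1: Z has iid entries uniform on {R_hat + 1, ..., 2 R_hat + log p}.
    # Assumption: log is base 2 and the upper end is rounded down.
    hi = 2 * R_hat + int(math.log2(p))
    Z = [rng.randint(R_hat + 1, hi) for _ in range(p)]

    # Step 2: Y1 = Y + X Z.
    Y1 = [Y[i] + sum(X[i][j] * Z[j] for j in range(p)) for i in range(n)]

    # Step 3: entries with |(Y1)_i| < 3 are replaced by 3.
    Y2 = [3 if abs(v) < 3 else v for v in Y1]

    # Step 4: the scaling factor m (exact integer arithmetic).
    def ceil_sqrt(k):
        return math.isqrt(k - 1) + 1  # ceil(sqrt(k)) for integers k >= 1

    m = 2 ** (n + (p + 1) // 2 + 3) * p * (R_hat * ceil_sqrt(p) + W_hat * ceil_sqrt(n))

    # Step 5: the (2n + p) x (2n + p) matrix A_m whose columns span the lattice.
    d = 2 * n + p
    A = [[0] * d for _ in range(d)]
    for i in range(n):
        for j in range(p):
            A[i][j] = m * X[i][j]      # block m X
        A[i][p + i] = -m * Y2[i]       # block -m Diag(Y2)
        A[i][p + n + i] = m            # block m I_n
    for j in range(p):
        A[n + j][j] = 1                # block I_p
    for i in range(n):
        A[n + p + i][p + n + i] = 1    # block I_n
    return A, Z, Y2, m
```

When no entry of Y1 is modified in step three, multiplying A_m by the integer vector (beta, 1, ..., 1, W) with beta = beta* + Z gives (0, beta, W): the first n coordinates, which carry the large factor m, cancel. This is the short lattice vector that the analysis relies on.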
The algorithm in step \ufb01ve\ncalls the LLL basis reduction algorithm to run for the columns of Am as initial basis for the lattice.\nThe fact that Y has been modi\ufb01ed to be non-zero on every coordinate is essential here so that Am\nis full-rank and the LLL basis reduction algorithm, de\ufb01ned in Lenstra et al. (1982), can be applied.\nThis application of the LLL basis reduction algorithm is similar to the one used in Frieze (1986) with\none important modi\ufb01cation. In order to deal here with multiple equations and non-zero noise, we\nuse 2n + p dimensions instead of 1 + p in Frieze (1986). Following though a similar strategy as in\nFrieze (1986), it can be established that the n + 1 to n + p coordinates of the output of the algorithm,\n\u02c6z \u2208 Z2n+p, correspond to a vector which is a non-zero integer multiple of \u03b2, say \u03bb\u03b2 for \u03bb \u2208 Z\u2217,\nwhp as p \u2192 +\u221e.\nThe proof of the above result is an important part in the analysis of the algorithm and it is heavily\nbased on the fact that the matrix Am, which generates the lattice, has its \ufb01rst n rows multiplied by the\n\u201clarge enough\u201d and appropriately chosen integer m which is de\ufb01ned in step four. It can be shown that\nthis property of Am implies that any vector z in the lattice with \u201csmall enough\u201d L2 norm necessarily\nsatis\ufb01es (zn+1, zn+2, . . . , zn+p) = \u03bb\u03b2 for some \u03bb \u2208 Z\u2217 whp as p \u2192 +\u221e. In particular, using\nthat \u02c6z is guaranteed to satisfy \u2016\u02c6z\u20162 \u2264 2^{(2n+p)/2} \u2016z\u20162 for all non-zero z in the lattice, it can be derived\nthat \u02c6z has a \u201csmall enough\u201d L2 norm and therefore indeed satis\ufb01es the desired property whp as\np \u2192 +\u221e. Assuming now the validity of the gcd (\u03b2) = 1 property, step six \ufb01nds in polynomial time\nthis unknown integer \u03bb that corresponds to \u02c6z, because gcd (\u02c6zn+1, \u02c6zn+2, . . . 
, \u02c6zn+p) = gcd (\u03bb\u03b2) = \u03bb.\nFinally step seven scales out \u03bb from every coordinate and then subtracts the known random vector Z,\nto output exactly \u03b2\u2217.\nOf course the above is based on an informal reasoning. Formally we establish the following result.\nTheorem 2.1. Suppose\n\n(1) X \u2208 Zn\u00d7p is a matrix with iid entries generated according to a distribution D on Z which\nfor some N \u2208 Z>0 and constants C, c > 0, assigns at most c/2^N probability on each element\nof Z and satis\ufb01es E[|V |] \u2264 C 2^N, for V \u223c D;\n\n(2) \u03b2\u2217 \u2208 (Z \u2229 [\u2212R, R])p, W \u2208 Zn;\n(3) Y = X\u03b2\u2217 + W.\n\nSuppose furthermore that \u02c6R \u2265 R and\n\nN \u2265 (1/(2n)) (2n + p) [ 2n + p + 10 log ( \u02c6R\u221ap + (\u2016W\u2016\u221e + 1)\u221an ) ] + 6 log ((1 + c) np).    (2)\n\nFor any \u02c6W \u2265 \u2016W\u2016\u221e the algorithm ELO with input (Y, X, \u02c6R, \u02c6W ) outputs exactly \u03b2\u2217 w.p. 1 \u2212 O(1/(np))\n(whp as p \u2192 +\u221e) and terminates in time at most polynomial in n, p, N, log \u02c6R and log \u02c6W.\nRemark 2.2. In the statement of Theorem 2.1 the only parameters that are assumed to grow to\nin\ufb01nity are p and whichever other parameters among n, R, \u2016W\u2016\u221e, N are implied to grow to in\ufb01nity\nbecause of (2). Note in particular that n can remain bounded, including the case n = 1, if N grows\nfast enough.\nRemark 2.3. It can be easily checked that the assumptions of Theorem 2.1 are satis\ufb01ed for n = 1,\nN = (1 + \u03b5) p^2/2, R = 1, D = Unif{1, 2, 3, . . . 
, 2^{(1+\u03b5) p^2/2}} and W = 0. Under these assumptions,\nthe Theorem\u2019s implication is a generalization of the result from Lagarias and Odlyzko (1985) and\nFrieze (1986) to the case \u03b2\u2217 \u2208 {\u22121, 0, 1}p.\n\n2.2 Applications to High-Dimensional Linear Regression\n\nThe Model\n\nWe \ufb01rst de\ufb01ne the Q-rationality assumption.\nDe\ufb01nition 2.4. Let p, Q \u2208 Z>0. We say that a vector \u03b2 \u2208 Rp satis\ufb01es the Q-rationality assumption\nif for all i \u2208 [p], \u03b2i = Ki/Q, for some Ki \u2208 Z.\n\nThe high-dimensional linear regression model we are considering is as follows.\nAssumptions 1. Let n, p, Q \u2208 Z>0 and R, \u03c3, c > 0. Suppose\n\n(1) measurement matrix X \u2208 Rn\u00d7p with iid entries generated according to a continuous\ndistribution C which has density f with \u2016f\u2016\u221e \u2264 c and satis\ufb01es E[|V |] < +\u221e, where\nV \u223c C;\n\n(2) ground truth vector \u03b2\u2217 satis\ufb01es \u03b2\u2217 \u2208 [\u2212R, R]p and the Q-rationality assumption;\n(3) Y = X\u03b2\u2217 + W for some noise vector W \u2208 Rn. It is assumed that either \u2016W\u2016\u221e \u2264 \u03c3 or\nW has iid entries with mean zero and variance at most \u03c32, depending on the context.\n\nObjective: Based on the knowledge of Y and X the goal is to recover \u03b2\u2217 using an ef\ufb01cient algorithm\nand using the smallest number n of samples possible. The recovery should occur with high probability\n(whp), as p diverges to in\ufb01nity.\n\nThe Lattice-Based Regression (LBR) Algorithm\n\nAs mentioned in the Introduction, we propose an algorithm to solve the regression problem, which\nwe call the Lattice-Based Regression (LBR) algorithm. The exact knowledge of Q, R, \u2016W\u2016\u221e is\nnot assumed. Instead the algorithm receives as an input, additional to Y and X, \u02c6Q \u2208 Z>0 which is\nan estimated multiple of Q, \u02c6R \u2208 Z>0 which is an estimated upper bound in absolute value for the\nentries of \u03b2\u2217 and \u02c6W \u2208 R>0 which is an estimated upper bound in absolute value for the entries of\nthe noise vector W.
Furthermore an integer number N \u2208 Z>0 is given to the algorithm as an input,\nwhich, as we will explain, corresponds to a truncation in the data in the \ufb01rst step of the algorithm.\n\nAlgorithm 2 Lattice Based Regression (LBR) Algorithm\nInput: (Y, X, N, \u02c6Q, \u02c6R, \u02c6W ), Y \u2208 Rn, X \u2208 Rn\u00d7p and N, \u02c6Q, \u02c6R, \u02c6W \u2208 Z>0.\nOutput: \u02c6\u03b2\u2217 an estimate of \u03b2\u2217\n\n1 Set YN = ((Yi)N )i\u2208[n] and XN = ((Xij)N )i\u2208[n],j\u2208[p].\n2 Set ( \u02c6\u03b21)\u2217 to be the output of the ELO algorithm with input:\n\n( 2^N \u02c6Q YN , 2^N XN , \u02c6Q \u02c6R, 2 \u02c6Q ( 2^N \u02c6W + \u02c6Rp ) ).\n\n3 Output \u02c6\u03b2\u2217 = (1/ \u02c6Q) ( \u02c6\u03b21)\u2217.\n\nGiven x \u2208 R and N \u2208 Z>0 let xN = sign(x) \u230a2^N |x|\u230b / 2^N, which corresponds to the operation of keeping\nthe \ufb01rst N bits after zero of a real number x.\nWe now explain informally the steps of the algorithm. In the \ufb01rst step, the algorithm truncates each\nentry of Y and X by keeping only its \ufb01rst N bits after zero, for some N \u2208 Z>0. This in particular\nallows the algorithm to perform \ufb01nite-precision operations and to call the ELO algorithm in the next step which\nis designed for integer input. In the second step, the algorithm naturally scales up the truncated\ndata to integer values, that is it scales YN by 2^N \u02c6Q and XN by 2^N. The additional\nmultiplication of the observation vector Y by \u02c6Q is necessary to make sure the ground truth vector \u03b2\u2217\ncan be treated as integer-valued.
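
The truncation and scaling of the first two steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names truncate and lbr_integer_input are hypothetical, floating point inputs are converted to exact fractions before truncation, and only the first two arguments passed to ELO (the scaled Y_N and X_N) are produced.

```python
import math
from fractions import Fraction


def truncate(x, N):
    # x_N = sign(x) * floor(2^N |x|) / 2^N, kept exact as a Fraction.
    x = Fraction(x)
    sign = (x > 0) - (x < 0)
    return sign * Fraction(math.floor(2 ** N * abs(x)), 2 ** N)


def lbr_integer_input(Y, X, N, Q_hat):
    # Integer data handed to ELO: 2^N * Q_hat * Y_N and 2^N * X_N.
    Y_int = [int(2 ** N * Q_hat * truncate(y, N)) for y in Y]
    X_int = [[int(2 ** N * truncate(x, N)) for x in row] for row in X]
    return Y_int, X_int
```

For example, with N = 2 the value 0.8125 is truncated to 3/4 and -1.5 stays -3/2, so after scaling every entry is an exact integer, which is what the integer-input ELO algorithm requires.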
To see this notice that Y = X\u03b2\u2217 + W and YN , XN being \u201cclose\u201d to\nY, X imply\n\n2^N \u02c6Q YN = 2^N XN ( \u02c6Q\u03b2\u2217) + \u201cextra noise terms\u201d + 2^N \u02c6Q W.\n\nTherefore, assuming the control of the magnitude of the extra noise terms, by using the Q-rationality\nassumption and that \u02c6Q is estimated to be a multiple of Q, the new ground truth vector becomes \u02c6Q\u03b2\u2217\nwhich is integer-valued. The \ufb01nal step of the algorithm consists of rescaling now the output of Step 2,\nto an output which is estimated to be the original \u03b2\u2217. In the next subsection, we turn this discussion\ninto a provable recovery guarantee.\n\nRecovery Guarantees for the LBR algorithm\n\nWe state now our main result, explicitly stating the assumptions on the parameters, under which the\nLBR algorithm recovers exactly \u03b2\u2217 from bounded but adversarial noise W.\nTheorem 2.5.A. Under Assumption 1 and assuming W \u2208 [\u2212\u03c3, \u03c3]n for some \u03c3 \u2265 0, the following\nholds. Suppose \u02c6Q is a multiple of Q, \u02c6R \u2265 R and\n\nN > (1/2) (2n + p) ( 2n + p + 10 log \u02c6Q + 10 log ( 2^N \u03c3 + \u02c6Rp ) + 20 log(3 (1 + c) np) ).    (3)\n\nFor any \u02c6W \u2265 \u03c3, the LBR algorithm with input (Y, X, N, \u02c6Q, \u02c6R, \u02c6W ) terminates with \u02c6\u03b2\u2217 = \u03b2\u2217 w.p.\n1 \u2212 O(1/(np)) (whp as p \u2192 +\u221e) and in time polynomial in n, p, N, log \u02c6R, log \u02c6W and log \u02c6Q.\n\nApplying Theorem 2.5.A we establish the following result handling random noise W.\nTheorem 2.5.B. Under Assumption 1 and assuming W \u2208 Rn is a vector with iid entries generated\naccording to a distribution W on R, independent from X, with mean zero and variance at most \u03c32\nfor some \u03c3 \u2265 0 the following holds.
Suppose that \u02c6Q is a multiple of Q, \u02c6R \u2265 R, and\n\n$$N > \\frac{1}{2}(2n + p)\\left(2n + p + 10\\log\\hat{Q} + 10\\log\\left(2^N\\sqrt{np}\\,\\sigma + \\hat{R}p\\right) + 20\\log(3(1 + c)np)\\right). \\qquad (4)$$\n\nFor any \u02c6W \u2265 \u221a(np) \u03c3, the LBR algorithm with input (Y, X, N, \u02c6Q, \u02c6R, \u02c6W) terminates with \u02c6\u03b2\u2217 = \u03b2\u2217 w.p.\n1 \u2212 O(1/(np)) (whp as p \u2192 +\u221e) and in time polynomial in n, p, N, log \u02c6R, log \u02c6W and log \u02c6Q.\n\nNoise tolerance of the LBR algorithm\n\nThe assumptions (2) and (4) might make it hard to build an intuition for the truncation level at which the LBR\nalgorithm provably works. For this reason, in this subsection we simplify them and state a Proposition\nexplicitly mentioning the optimal truncation level and hence characterizing the optimal level of noise\nthat the LBR algorithm can tolerate with n samples.\nFirst note that in the statements of Theorem 2.5.A and Theorem 2.5.B the only parameters that\nare assumed to grow are p and whichever other parameter is implied to grow because of (2) and\n(4). Therefore, importantly, n does not necessarily grow to infinity, if for example N and 1/\u03c3 grow\nappropriately with p. That means that Theorem 2.5.A and Theorem 2.5.B imply non-trivial guarantees\nfor arbitrary sample size n. The proposition below shows that if \u03c3 is at most exponential in\n$-(1 + \\epsilon)\\left[\\frac{(p+2n)^2}{2n} + \\left(2 + \\frac{p}{n}\\right)\\log(RQ)\\right]$ for some \u03b5 > 0, then for an appropriately chosen truncation level N\nthe LBR algorithm recovers exactly the vector \u03b2\u2217 with n samples. In particular, with one sample\n(n = 1) the LBR algorithm tolerates noise level up to exponential in \u2212(1 + \u03b5)[p^2/2 + (2 + p) log(QR)]\nfor some \u03b5 > 0. 
On the other hand, if n = \u0398(p) and log (RQ) = o(p), the LBR algorithm tolerates\nnoise level up to exponential in \u2212O(p).\nProposition 2.6. Under Assumption 1 and assuming W \u2208 R^n is a vector with iid entries generated\naccording to a distribution W on R, independent of X, with mean zero and variance at most \u03c3^2\nfor some \u03c3 \u2265 0, the following holds. Suppose for some \u03b5 > 0,\n\n$$\\sigma \\le 2^{-(1+\\epsilon)\\left[\\frac{(p+2n)^2}{2n} + \\left(2 + \\frac{p}{n}\\right)\\log(RQ)\\right]}.$$\n\nThen the LBR algorithm with input Y, X, \u02c6Q = Q, \u02c6R = R, \u02c6W_\u221e = 1, features $p \\ge \\frac{300}{\\epsilon}\\log\\left(\\frac{300}{(1+c)\\epsilon}\\right)$\nand N satisfying $\\log\\left(\\frac{1}{\\sigma}\\right) \\ge N \\ge (1 + \\epsilon)\\left[\\frac{(p+2n)^2}{2n} + \\left(2 + \\frac{p}{n}\\right)\\log(RQ)\\right]$, terminates with \u02c6\u03b2\u2217 = \u03b2\u2217\nw.p. 1 \u2212 O(1/(np)) (whp as p \u2192 +\u221e) and in time polynomial in n, p, N, log \u02c6R, log \u02c6W and log \u02c6Q.\n\nInformation Theoretic Bounds\n\nIn this subsection, we discuss the maximum noise that can be tolerated information-theoretically in\nrecovering a \u03b2\u2217 \u2208 [\u2212R, R]^p satisfying the Q-rationality assumption. We establish that under Gaussian\nwhite noise, any successful recovery mechanism can tolerate noise level at most exponential\nin \u2212[p log(QR)/n].\nProposition 2.7. Suppose that X \u2208 R^{n\u00d7p} is a matrix with iid entries following a continuous\ndistribution D with E[|V|] < +\u221e, where V is distributed according to D, \u03b2\u2217 \u2208 [\u2212R, R]^p satisfies the Q-rationality\nassumption, W \u2208 R^n has iid N(0, \u03c3^2) entries and Y = X\u03b2\u2217 + W. Suppose furthermore that\n\n$$\\sigma > R(np)^3\\left(2^{\\frac{2p\\log(2QR+1)}{n}} - 1\\right)^{-\\frac{1}{2}}.$$\n\nThen there is no mechanism which, whp as p \u2192 +\u221e, recovers\nexactly \u03b2\u2217 with knowledge of Y, X, Q, R, \u03c3. 
That is, for any \u02c6\u03b2\u2217 = \u02c6\u03b2\u2217(Y, X, Q, R, \u03c3) we have\n\n$$\\limsup_{p \\to +\\infty} P\\left(\\hat{\\beta}^* = \\beta^*\\right) < 1.$$\n\nSharp Optimality of the LBR Algorithm\n\nUsing Propositions 2.6 and 2.7 the following sharp result is established.\nProposition 2.8. Under Assumption 1, where W \u2208 R^n is a vector with iid N(0, \u03c3^2) entries, the\nfollowing holds. Suppose that $n = o\\left(\\frac{p}{\\log p}\\right)$ and $RQ = 2^{\\omega(p)}$. Then for $\\sigma_0 := 2^{-\\frac{p\\log(RQ)}{n}}$ and\n\u03b5 > 0:\n\n\u2022 if $\\sigma > \\sigma_0^{1-\\epsilon}$, then the w.h.p. exact recovery of \u03b2\u2217 from the knowledge of Y, X, Q, R, \u03c3 is\nimpossible.\n\u2022 if $\\sigma < \\sigma_0^{1+\\epsilon}$, then the w.h.p. exact recovery of \u03b2\u2217 from the knowledge of Y, X, Q, R, \u03c3 is\npossible by the LBR algorithm.\n\n3 Synthetic Experiments\n\nIn this section we present an experimental analysis of the ELO and LBR algorithms.\nELO algorithm: We focus on p = 30 features, sample sizes n = 1, n = 10 and n = 30, R = 100 and\nzero noise W = 0. Each entry of \u03b2\u2217 is iid Unif({1, 2, . . . , R = 100}). For 10 values of \u03b1 \u2208 (0, 3),\nspecifically \u03b1 \u2208 {0.25, 0.5, 0.75, 1, 1.3, 1.6, 1.9, 2.25, 2.5, 2.75}, we generate the entries of X iid\nUnif({1, 2, 3, . . . , 2^N}) for N = p^2/(2\u03b1n). For each combination of n, \u03b1 we generate 20 independent\ninstances of inputs. We plot in Figure 1 the fraction of instances where the ELO\nalgorithm outputs exactly \u03b2\u2217 and the average termination time of the algorithm.\n\nFigure 1: Average performance and runtime of ELO over 20 instances with p = 30 features\nand n = 1, 10, 30 samples.\n\nComments: First, we observe that, importantly, the algorithm recovers the vectors correctly on all\ninstances with \u03b1 < 1 and p = 30 features, even though our theoretical guarantees are only for large enough\np. 
Second, Theorem 2.1 implies that if N > (2n + p)^2/(2n) and p is large, ELO recovers \u03b2\u2217 with\nhigh probability. In the experiments we observe that the ELO algorithm indeed works in that regime,\nas then \u03b1 = p^2/(2nN) < 1. The experiments also show that ELO works for larger values of \u03b1. Finally,\nthe termination time of the algorithm was on average 1 minute and in the worst case 5 minutes, making it\nreasonable for many applications.\nLBR algorithm: We focus on p = 30 features, n = 10 samples, Q = 1 and R = 100. We generate\neach entry of \u03b2\u2217 to be zero w.p. 0.5 and Unif({1, 2, . . . , R = 100}) w.p. 0.5. We generate\nthe entries of X iid U(0, 1) and of W iid U(\u2212\u03c3, \u03c3) for \u03c3 \u2208 {0, e^{\u221220}, e^{\u221212}, e^{\u22124}}. We generate 20\nindependent instances for each combination of \u03c3 and truncation level N. We plot the fraction of\ninstances where the output of the LBR algorithm is exactly \u03b2\u2217.\n\nFigure 2: Average performance of LBR algorithm\nfor various noise and truncation levels.\n\nComments: The experiments show, first, that LBR works correctly in many cases for the moderate\nvalue of p = 30 and, second, that there is indeed an appropriately tuned truncation level (2n + p)^2/(2n) <\nN < log(1/\u03c3) for which LBR succeeds. The latter is in exact agreement with Proposition 2.6.\n\nAcknowledgments\n\nThe authors would like to gratefully acknowledge the work of Patricio Foncea and Andrew Zheng\non performing the synthetic experiments for the ELO and LBR algorithms, as part of a project for a\ngraduate-level class at MIT, during Spring 2018.\n\nReferences\nBora, A., Jalal, A., Price, E., and Dimakis, A. G. (2017). Compressed sensing using generative\nmodels. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017,\npages 537\u2013546.\n\nBorno, M. A. (2011). Reduction in solving some integer least squares problems. arXiv Preprint.\n\nBrunel, L. and Boutros, J. (1999). 
Euclidean space lattice decoding for joint detection in CDMA\nsystems. In Proceedings of the 1999 IEEE Information Theory and Communications Workshop\n(Cat. No. 99EX253).\n\nCandes, E. J., Eldar, Y. C., Strohmer, T., and Voroninski, V. (2015). Phase retrieval via matrix\ncompletion. SIAM Review, 57(2):225\u2013251.\n\nCandes, E. J., Romberg, J. K., and Tao, T. (2006). Stable signal recovery from incomplete and\ninaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207\u20131223.\n\nChen, S. S., Donoho, D. L., and Saunders, M. A. (2001). Atomic decomposition by basis pursuit.\nSIAM Review, 43(1):129\u2013159.\n\nCover, T. M. and Thomas, J. A. (2006). Elements of Information Theory (Wiley Series in Telecommunications\nand Signal Processing). Wiley-Interscience.\n\nDonoho, D. and Tanner, J. (2009). Observed universality of phase transitions in high-dimensional\ngeometry, with implications for modern data analysis and signal processing. Philosophical\nTransactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences,\n367(1906):4273\u20134293.\n\nDonoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289\u20131306.\n\nDonoho, D. L., Javanmard, A., and Montanari, A. (2013). Information-theoretically optimal compressed\nsensing via spatial coupling and approximate message passing. IEEE Transactions on\nInformation Theory, 59(11):7434\u20137464.\n\nDonoho, D. L. and Tanner, J. (2005). Neighborliness of randomly projected simplices in high\ndimensions. Proceedings of the National Academy of Sciences of the United States of America,\n102(27):9452\u20139457.\n\nDonoho, D. L. and Tanner, J. (2006). Counting faces of randomly-projected polytopes when the\nprojection radically lowers dimension.\n\nErdos, P. and Lorentz, G. (1985). On the probability that n and g(n) are relatively prime. Acta Arith.,\n5:524\u2013531.\n\nFoucart, S. and Rauhut, H. (2013). 
A mathematical introduction to compressive sensing. Springer.\n\nFrieze, A. M. (1986). On the Lagarias-Odlyzko algorithm for the subset sum problem. SIAM J.\nComput., 15:536\u2013539.\n\nGamarnik, D. and Zadik, I. (2017a). High dimensional linear regression with binary coefficients:\nMean squared error and a phase transition. Conference on Learning Theory (COLT).\n\nGamarnik, D. and Zadik, I. (2017b). Sparse high dimensional linear regression: Algorithmic barrier\nand a local search algorithm.\n\nHardy, G. and Wright, E. (1975). An Introduction to the Theory of Numbers. Oxford Science\nPublications, fifth edition.\n\nHassibi, A. and Boyd, S. (1998). Integer parameter estimation in linear models with applications to\nGPS. IEEE Transactions on Signal Processing.\n\nHassibi, B. and Vikalo, H. (2002). On the expected complexity of integer least-squares problems. In\n2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.\n\nKrzakala, F., M\u00e9zard, M., Sausset, F., Sun, Y. F., and Zdeborov\u00e1, L. (2012). Statistical-physics-based\nreconstruction in compressed sensing. Phys. Rev. X, 2:021005.\n\nLagarias, J. C. and Odlyzko, A. M. (1985). Solving low-density subset sum problems. Journal of the\nACM (JACM), 32(1):229\u2013246.\n\nLempel, A. (1979). Cryptology in transition. ACM Comput. Surv., 11(4):285\u2013303.\n\nLenstra, A. K., Lenstra, H. W., and Lov\u00e1sz, L. (1982). Factoring polynomials with rational coefficients.\nMathematische Annalen, 261(4):515\u2013534.\n\nMerkle, R. and Hellman, M. (1978). Hiding information and signatures in trapdoor knapsacks. IEEE\nTransactions on Information Theory, 24(5):525\u2013530.\n\nShamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem.\nIn 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), pages 145\u2013152.\n\nWainwright, M. J. (2009). 
Sharp thresholds for high-dimensional and noisy sparsity recovery using\nconstrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183\u20132202.", "award": [], "sourceid": 922, "authors": [{"given_name": "Ilias", "family_name": "Zadik", "institution": "MIT"}, {"given_name": "David", "family_name": "Gamarnik", "institution": "Massachusetts Institute of Technology"}]}