{"title": "Super-Resolution Off the Grid", "book": "Advances in Neural Information Processing Systems", "page_first": 2665, "page_last": 2673, "abstract": "Super-resolution is the problem of recovering a superposition of point sources using bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging problems, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest is in obtaining estimation procedures which are robust to noise, with the following desirable statistical and computational properties: we seek to use coarse Fourier measurements (bounded by some \\emph{cutoff frequency}); we hope to take a (quantifiably) small number of measurements; we desire our algorithm to run quickly. Suppose we have $k$ point sources in $d$ dimensions, where the points are separated by at least $\\Delta$ from each other (in Euclidean distance). This work provides an algorithm with the following favorable guarantees:1. The algorithm uses Fourier measurements, whose frequencies are bounded by $O(1/\\Delta)$ (up to log factors). Previous algorithms require a \\emph{cutoff frequency} which may be as large as $\\Omega(\\sqrt{d}/\\Delta)$.2. The number of measurements taken by and the computational complexity of our algorithm are bounded by a polynomial in both the number of points $k$ and the dimension $d$, with \\emph{no} dependence on the separation $\\Delta$. In contrast, previous algorithms depended inverse polynomially on the minimal separation and exponentially on the dimension for both of these quantities.Our estimation procedure itself is simple: we take random bandlimited measurements (as opposed to taking an exponential number of measurements on the hyper-grid). 
Furthermore, our analysis and algorithm are elementary (based on concentration bounds for sampling and the singular value decomposition).", "full_text": "Super-Resolution Off the Grid

Qingqing Huang
MIT, EECS, LIDS
qqh@mit.edu

Sham M. Kakade
University of Washington
Department of Statistics, Computer Science & Engineering
sham@cs.washington.edu

Abstract

Super-resolution is the problem of recovering a superposition of point sources using bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging problems, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest is obtaining estimation procedures which are robust to noise, with the following desirable statistical and computational properties: we seek to use coarse Fourier measurements (bounded by some cutoff frequency); we hope to take a (quantifiably) small number of measurements; we desire our algorithm to run quickly.
Suppose we have k point sources in d dimensions, where the points are separated by at least Δ from each other (in Euclidean distance). This work provides an algorithm with the following favorable guarantees:

• The algorithm uses Fourier measurements, whose frequencies are bounded by O(1/Δ) (up to log factors). Previous algorithms require a cutoff frequency which may be as large as Ω(√d/Δ).

• The number of measurements taken by, and the computational complexity of, our algorithm are bounded by a polynomial in both the number of points k and the dimension d, with no dependence on the separation Δ. 
In contrast, previous algorithms depended inverse polynomially on the minimal separation and exponentially on the dimension for both of these quantities.

Our estimation procedure itself is simple: we take random bandlimited measurements (as opposed to taking an exponential number of measurements on the hyper-grid). Furthermore, our analysis and algorithm are elementary (based on concentration bounds for sampling and the singular value decomposition).

1 Introduction

We follow the standard mathematical abstraction of this problem (Candes & Fernandez-Granda [4, 3]): consider a d-dimensional signal x(t) modeled as a weighted sum of k Dirac measures in R^d:

x(t) = Σ_{j=1}^k w_j δ_{μ^(j)}(t),    (1)

where the point sources, the μ^(j)'s, are in R^d. Assume that the weights w_j are complex valued, whose absolute values are lower and upper bounded by some positive constant. Assume that we are given k, the number of point sources¹.

¹An upper bound on the number of point sources suffices.

Define the measurement function f(s) : R^d → C to be the convolution of the point source signal x(t) with a low-pass point spread function e^{iπ⟨t,s⟩}, as below:

f(s) = ∫_{t ∈ R^d} e^{iπ⟨t,s⟩} x(dt) = Σ_{j=1}^k w_j e^{iπ⟨μ^(j),s⟩}.    (2)

In the noisy setting, the measurements are corrupted by a uniformly bounded perturbation z:

f̃(s) = f(s) + z(s),  |z(s)| ≤ ε_z, ∀s.    (3)

Suppose that we are only allowed to measure the signal x(t) by evaluating the measurement function f̃(s) at any s ∈ R^d, and we want to recover the parameters of the point source signal, i.e., {w_j, μ^(j) : j ∈ [k]}. 
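To make the model concrete, the measurements in (2) and (3) are easy to simulate. The following sketch is ours, not code from the paper; the helper name `measure` and the particular bounded-noise draw are illustrative assumptions:

```python
import numpy as np

def measure(mus, weights, S, eps_z=0.0, rng=None):
    """Evaluate f(s) = sum_j w_j * exp(i*pi*<mu^(j), s>) at each row s of S,
    optionally adding a perturbation z(s) with |z(s)| <= eps_z."""
    rng = rng or np.random.default_rng(0)
    phases = np.pi * (S @ mus.T)            # phases[n, j] = pi * <mu^(j), s^(n)>
    f = np.exp(1j * phases) @ weights       # shape (m,)
    if eps_z > 0:
        mag = eps_z * rng.uniform(0, 1, size=len(S))
        arg = rng.uniform(0, 2 * np.pi, size=len(S))
        f = f + mag * np.exp(1j * arg)      # uniformly bounded complex noise
    return f

# toy instance: k = 2 point sources in d = 3 dimensions
mus = np.array([[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]])
weights = np.array([0.9 + 0.1j, 0.5 - 0.2j])
S = np.random.default_rng(1).normal(0.0, 1.0, size=(5, 3))
print(measure(mus, weights, S).shape)  # (5,)
```

At s = 0 the measurement reduces to Σ_j w_j, which gives a quick sanity check on any implementation.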
We follow the standard normalization and assume that:

μ^(j) ∈ [−1, +1]^d,  |w_j| ∈ [0, 1]  ∀ j ∈ [k].

Let w_min = min_j |w_j| denote the minimal weight, and let Δ be the minimal separation of the point sources, defined as follows:

Δ = min_{j≠j′} ‖μ^(j) − μ^(j′)‖₂,    (4)

where we use the Euclidean distance between the point sources for ease of exposition². These quantities are key parameters in our algorithm and analysis. Intuitively, the recovery problem is harder if the minimal separation Δ is small and the minimal weight w_min is small.
The first question is: given exact measurements, namely ε_z = 0, where and how many measurements should we take so that the original signal x(t) can be exactly recovered?
Definition 1.1 (Exact recovery). In the exact case, i.e., ε_z = 0, we say that an algorithm achieves exact recovery with m measurements of the signal x(t) if, upon input of these m measurements, the algorithm returns the exact set of parameters {w_j, μ^(j) : j ∈ [k]}.
Moreover, we want the algorithm to be tolerant to measurement noise, in the sense that in the presence of measurement noise we can still recover good estimates of the point sources.
Definition 1.2 (Stable recovery). 
In the noisy case, i.e., ε_z ≥ 0, we say that an algorithm achieves stable recovery with m measurements of the signal x(t) if, upon input of these m measurements, the algorithm returns estimates {ŵ_j, μ̂^(j) : j ∈ [k]} such that

min_π max{ ‖μ̂^(j) − μ^(π(j))‖₂ : j ∈ [k] } ≤ poly(d, k) ε_z,

where the min is over permutations π on [k] and poly(d, k) is a polynomial function in d and k.
By definition, if an algorithm achieves stable recovery with m measurements, it also achieves exact recovery with these m measurements.
The terminology of “super-resolution” is appropriate due to the following remarkable result (in the noiseless case) of Donoho [9]: suppose we want to accurately recover the point sources to an error of δ, where δ ≪ Δ. Naively, we may expect to require measurements whose frequency depends inversely on the desired accuracy δ. Donoho [9] showed that it suffices to obtain a finite number of measurements, whose frequencies are bounded by O(1/Δ), in order to achieve exact recovery; thus resolving the point sources far more accurately than that which is naively implied by using frequencies of O(1/Δ). Furthermore, the work of Candes & Fernandez-Granda [4, 3] showed that stable recovery, in the univariate case (d = 1), is achievable with a cutoff frequency of O(1/Δ) using a convex program and a number of measurements whose size is polynomial in the relevant quantities.

²Our claims hold without using the “wrap around metric”, as in [4, 3], due to our random sampling. 
Also, it is possible to extend these results to the ℓ_p-norm case.

Table 1: Comparison of algorithms. See Section 1.2 for description; see Lemma 2.3 for details about the cutoff frequency. Here, we are implicitly using O(·) notation.

       |                 d = 1                              |                 d ≥ 1
       | cutoff freq | measurements      | runtime          | cutoff freq | measurements    | runtime
SDP    | 1/Δ         | k log(k) log(1/Δ) | poly(1/Δ, k)     | C_d/Δ∞      | (1/Δ∞)^d        | poly((1/Δ∞)^d, k)
MP     | 1/Δ         | 1/Δ               | (1/Δ)^3          | —           | —               | —
Ours   | 1/Δ         | (k log(k))^2      | (k log(k))^2     | log(kd)/Δ   | (k log(k)+d)^2  | (k log(k)+d)^2

1.1 This work

We are interested in stable recovery procedures with the following desirable statistical and computational properties: we seek to use coarse (low frequency) measurements; we hope to take a (quantifiably) small number of measurements; we desire our algorithm to run quickly. Informally, our main result is as follows:
Theorem 1.3 (Informal statement of Theorem 2.2). For a fixed probability of error, the proposed algorithm achieves stable recovery with a number of measurements and with computational runtime that are both on the order of O((k log(k) + d)^2). Furthermore, the algorithm makes measurements which are bounded in frequency by O(1/Δ) (ignoring log factors).

Notably, our algorithm and analysis directly deal with the multivariate case, with the univariate case as a special case. Importantly, the number of measurements and the computational runtime do not depend on the minimal separation of the point sources. This may be important even in certain low dimensional imaging applications where taking physical measurements is costly (indeed, super-resolution is important in settings where Δ is small). 
Furthermore, our technical contribution of how to decompose a certain tensor constructed with Fourier measurements may be of broader interest to related questions in statistics, signal processing, and machine learning.

1.2 Comparison to related work

Table 1 summarizes the comparison between our algorithm and the existing results. The multi-dimensional cutoff frequency we refer to in the table is the maximal coordinate-wise entry of any measurement frequency s (i.e., ‖s‖∞). “SDP” refers to the semidefinite programming (SDP) based algorithms of Candes & Fernandez-Granda [3, 4]; in the univariate case, the number of measurements can be reduced by the method in Tang et al. [23] (this is reflected in the table). “MP” refers to the matrix pencil type of methods, studied in [14] and [15] for the univariate case. Here, we define the infinity-norm separation as Δ∞ = min_{j≠j′} ‖μ^(j) − μ^(j′)‖∞, which is understood as the wrap-around distance on the unit circle. C_d ≥ 1 is a problem dependent constant (discussed below).
Observe the following differences between our algorithm and prior work:

1) Our minimal separation is measured under the ℓ₂-norm instead of the infinity norm, as in the SDP based algorithm. Note that Δ∞ depends on the coordinate system; in the worst case, it can underestimate the separation by a 1/√d factor, namely Δ∞ ≈ Δ/√d.

2) The computational complexity and the number of measurements are polynomial in the dimension d and the number of point sources k, and surprisingly do not depend on the minimal separation of the point sources! 
Intuitively, when the minimal separation between the point sources is small, the problem should be harder; this is reflected only in the sampling range and the cutoff frequency of the measurements in our algorithm.

3) Furthermore, one could project the multivariate signal onto the coordinates and solve multiple univariate problems (such as in [19, 17], which provided only exact recovery results). Naive random projections would lead to a cutoff frequency of O(√d/Δ).

SDP approaches: The work in [3, 4, 10] formulates the recovery problem as a total-variation minimization problem; they then show the dual problem can be formulated as an SDP. They focused on the analysis of d = 1 and only explicitly extend the proofs for d = 2. For d ≥ 1, Ingham-type theorems (see [20, 12]) suggest that C_d = O(√d).
The number of measurements can be reduced by the method in [23] for the d = 1 case, which is noted in the table. Their method uses sampling “off the grid”; technically, their sampling scheme actually samples random points from the grid, though with far fewer measurements.
Matrix pencil approaches: The matrix pencil method, MUSIC, and Prony's method are essentially the same underlying idea, executed in different ways. The original Prony's method directly attempts to find roots of a high degree polynomial, where the root stability has few guarantees. Other methods aim to robustify the algorithm.
Recently, for the univariate matrix pencil method, Liao & Fannjiang [14] and Moitra [15] provided a stability analysis of the MUSIC algorithm. Moitra [15] studied the optimal relationship between the cutoff frequency and Δ, showing that if the cutoff frequency is less than 1/Δ, then stable recovery is not possible with the matrix pencil method (with high probability).

1.3 Notation
Let R, C, and Z denote the real numbers, complex numbers, and integers, respectively. For d ∈ Z with d ≥ 1, [d] denotes the set [d] = {1, . . . , d}. 
For a set S, |S| denotes its cardinality. We use ⊕ to denote the direct sum of sets, namely S₁ ⊕ S₂ = {(a + b) : a ∈ S₁, b ∈ S₂}.
Let e_n denote the n-th standard basis vector in R^d, for n ∈ [d]. Let P^d_{R,2} = {x ∈ R^d : ‖x‖₂ = R} denote the sphere of radius R in the d-dimensional standard Euclidean space.
Denote the condition number of a matrix X ∈ R^{m×n} as cond₂(X) = σ_max(X)/σ_min(X), where σ_max(X) and σ_min(X) are the maximal and minimal singular values of X.
We use ⊗ to denote the tensor product. Given matrices A, B, C ∈ C^{m×k}, the tensor product V = A ⊗ B ⊗ C ∈ C^{m×m×m} is given by V_{i₁,i₂,i₃} = Σ_{n=1}^k A_{i₁,n} B_{i₂,n} C_{i₃,n}. Another view of a tensor is that it defines a multi-linear mapping. For given dimensions m_A, m_B, m_C, the mapping V(·, ·, ·) : C^{m×m_A} × C^{m×m_B} × C^{m×m_C} → C^{m_A×m_B×m_C} is defined as:

[V(X_A, X_B, X_C)]_{i₁,i₂,i₃} = Σ_{j₁,j₂,j₃ ∈ [m]} V_{j₁,j₂,j₃} [X_A]_{j₁,i₁} [X_B]_{j₂,i₂} [X_C]_{j₃,i₃}.

In particular, for a ∈ C^m, we use V(I, I, a) to denote the projection of the tensor V along the 3rd dimension. Note that if the tensor admits a decomposition V = A ⊗ B ⊗ C, it is straightforward to verify that

V(I, I, a) = A Diag(Cᵀa) Bᵀ.

It is well known that if the factors A, B, C have full column rank, then the rank-k decomposition is unique up to re-scaling and a common column permutation. Moreover, if the condition numbers of the factors are upper bounded by a positive constant, then one can compute the unique tensor decomposition of V with stability guarantees (see [1] for a review; Lemma 2.5 herein provides an explicit statement).

2 Main Results

2.1 The algorithm

We briefly describe the steps of Algorithm 1 below:
(Take measurements) Given positive numbers m and R, randomly draw a sampling set S = {s^(1), . . . , s^(m)} of m i.i.d. samples from the Gaussian distribution N(0, R² I_{d×d}). Form the set S′ = S ∪ {s^(m+1) = e₁, . . . , s^(m+d) = e_d, s^(m+d+1) = 0} ⊂ R^d. 
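The sampling step just described can be sketched in a few lines. This is an illustrative rendering of ours (the function name and defaults are our choices), not the authors' code:

```python
import numpy as np

def sampling_set(m, d, R, seed=0):
    """Draw S = {s^(1), ..., s^(m)} i.i.d. from N(0, R^2 I_{dxd}), then append
    the standard basis vectors e_1, ..., e_d and the zero vector, giving the
    set S' with m' = m + d + 1 elements (one frequency per row)."""
    rng = np.random.default_rng(seed)
    S = rng.normal(0.0, R, size=(m, d))
    return np.vstack([S, np.eye(d), np.zeros((1, d))])

S_prime = sampling_set(m=20, d=3, R=2.0)
print(S_prime.shape)  # (24, 3), i.e. m' = m + d + 1 = 24 rows
```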
Denote m′ = m + d + 1. Take another independent random sample v from the unit sphere, and define v^(1) = v, v^(2) = 2v.

Algorithm 1: General algorithm
Input: R, m, noisy measurement function f̃(·).
Output: Estimates {ŵ_j, μ̂^(j) : j ∈ [k]}.
1. Take measurements:
Let S = {s^(1), . . . , s^(m)} be m i.i.d. samples from the Gaussian distribution N(0, R² I_{d×d}).
Set s^(m+n) = e_n for all n ∈ [d] and s^(m+d+1) = 0. Denote m′ = m + d + 1.
Take another random sample v from the unit sphere, and set v^(1) = v and v^(2) = 2v.
Construct a tensor F̃ ∈ C^{m′×m′×3}: F̃_{n₁,n₂,n₃} = f̃(s)|_{s = s^(n₁)+s^(n₂)+v^(n₃)}, with the convention v^(3) = 0.
2. Tensor decomposition: Set (V̂_{S′}, D̂_w) = TensorDecomp(F̃).
For j = 1, . . . , k, set [V̂_{S′}]_j = [V̂_{S′}]_j / [V̂_{S′}]_{m′,j}.
3. Read off estimates: For j = 1, . . . , k, set μ̂^(j) = Real(log([V̂_{S′}]_{[m+1:m+d], j}) / (iπ)).
4. Set Ŵ = argmin_{W ∈ C^k} ‖F̃ − V̂_{S′} ⊗ V̂_{S′} ⊗ V̂₂ D̂_w‖_F.

(Tensor decomposition) Construct the 3rd-order tensor F̃ ∈ C^{m′×m′×3} with the noise-corrupted measurements f̃(s) evaluated at the points in S′ ⊕ S′ ⊕ {v^(1), v^(2)}, arranged in the following way:

F̃_{n₁,n₂,n₃} = f̃(s)|_{s = s^(n₁)+s^(n₂)+v^(n₃)},  ∀ n₁, n₂ ∈ [m′], n₃ ∈ [2].    (5)

Define the characteristic matrix V_S ∈ C^{m×k} to be:

V_S = [ e^{iπ⟨μ^(j), s^(n)⟩} ]_{n ∈ [m], j ∈ [k]},    (6)

and define the matrix V_{S′} ∈ C^{m′×k} to be

V_{S′} = [ V_S ; V_d ; 1ᵀ ],    (7)

where V_d ∈ C^{d×k} is defined in (17). Define V₂ ∈ C^{3×k} to be the matrix whose rows are
[e^{iπ⟨μ^(j), v^(1)⟩}]_{j ∈ [k]}, [e^{iπ⟨μ^(j), v^(2)⟩}]_{j ∈ [k]}, and the all-ones row [1, . . . , 1].

Note that in the exact case (ε_z = 0) the tensor F constructed in (5) admits a rank-k decomposition:

F = V_{S′} ⊗ V_{S′} ⊗ (V₂ D_w).    (8)

Assume that V_{S′} has full column rank; then this tensor decomposition is unique up to column permutation and rescaling, with very high probability over the randomness of the random unit vector v. Since each element of V_{S′} has unit modulus, and we know that the last row of V_{S′} and the last row of V₂ are all ones, there exists a proper scaling so that we can uniquely recover the w_j's and the columns of V_{S′} up to a common permutation.
In this paper, we adopt Jennrich's algorithm (see Algorithm 2) for the tensor decomposition. Other algorithms, for example the tensor power method ([1]) and recursive projection ([24]), which are possibly more stable than Jennrich's algorithm, can also be applied here.
(Read off estimates) Let log(V_d) denote the element-wise logarithm of V_d. The estimates of the point sources are given by:

[μ^(1), μ^(2), . . . , μ^(k)] = log(V_d) / (iπ).

Algorithm 2: TensorDecomp
Input: Tensor F̃ ∈ C^{m×m×3}, rank k.
Output: Factor V̂ ∈ C^{m×k}.
1. Compute the truncated SVD F̃(I, I, e₁) = P̂ Λ̂ P̂ᵀ with the k leading singular values.
2. Set Ê = F̃(P̂, P̂, I). Set Ê₁ = Ê(I, I, e₁) and Ê₂ = Ê(I, I, e₂).
3. Let the columns of Û be the eigenvectors of Ê₁ Ê₂† corresponding to the k eigenvalues with the largest absolute value.
4. Set V̂ = √m P̂ Û.

Remark 2.1. In the toy example, the simple algorithm corresponds to using the sampling set S′ = {e₁, . . . , e_d}. The conventional univariate matrix pencil method corresponds to using the sampling set S′ = {0, 1, . .
. , m} and the set of measurements S′ ⊕ S′ ⊕ S′ corresponds to the grid [m]³.

2.2 Guarantees

In this section, we discuss how to pick the two parameters m and R, and prove that the proposed algorithm indeed achieves stable recovery in the presence of measurement noise.
Theorem 2.2 (Stable recovery). There exists a universal constant C such that the following holds. Fix ε_x, δ_s, δ_v ∈ (0, 1/2); pick m such that m ≥ (k/ε_x) √(8 log(k/δ_s)); for d ≥ 2, pick R ≥ √(2 log(k/ε_x)) / (πΔ); for d = 1, pick R ≥ √(2 log(1 + 2/ε_x)) / (πΔ).
Assume the bounded measurement noise model as in (3), and that

ε_z ≤ (δ_v w²_min / (100 √d k⁵)) ((1 − 2ε_x)/(1 + 2ε_x))^{2.5}.

With probability at least (1 − δ_s) over the random sampling of S, and with probability at least (1 − δ_v) over the random projections in Algorithm 2, the proposed Algorithm 1 returns an estimate of the point source signal x̂(t) = Σ_{j=1}^k ŵ_j δ_{μ̂^(j)} with accuracy:

min_π max{ ‖μ̂^(j) − μ^(π(j))‖₂ : j ∈ [k] } ≤ C (√d k⁵ / δ_v) (w_max / w²_min) ((1 + 2ε_x)/(1 − 2ε_x))^{2.5} ε_z,

where the min is over permutations π on [k]. Moreover, the proposed algorithm has time complexity on the order of O((m′)³).

The next lemma shows that, with overwhelming probability, all the frequencies taken concentrate within the hyper-cube with cutoff frequency R′ on each coordinate, where R′ is comparable to R.
Lemma 2.3 (The cutoff frequency). For d > 1, with high probability, all of the 2(m′)² sampling frequencies in S′ ⊕ S′ ⊕ {v^(1), v^(2)} satisfy ‖s^(j₁) + s^(j₂) + v^(j₃)‖∞ ≤ R′, ∀ j₁, j₂ ∈ [m′], j₃ ∈ [2], where the per-coordinate cutoff frequency is given by R′ = O(R √(log(md))). For the d = 1 case, the cutoff frequency R′ can be made to be on the order of R′ = O(1/Δ).
Remark 2.4 (Failure probability). Overall, the failure probability consists of two pieces: δ_v for the random projection v, and δ_s for the random sampling that ensures a bounded condition number of V_S. This may be boosted to arbitrarily high probability through repetition.

2.3 Key Lemmas

Stability of tensor decomposition: In this paragraph, we give a brief description and the stability guarantee of the well-known Jennrich's algorithm ([11, 13]) for low rank 3rd-order tensor decomposition. We only state it for the symmetric tensors as they appear in the proposed algorithm.
Consider a tensor F = V ⊗ V ⊗ (V₂ D_w) ∈ C^{m×m×3} where the factor V has full column rank k. Then the decomposition is unique up to column permutation and rescaling, and Algorithm 2 finds the factors efficiently. Moreover, the eigen-decomposition is stable if the factor V is well-conditioned and the eigenvalues of F_a F_b† are well separated.
Lemma 2.5 (Stability of Jennrich's algorithm). Consider the 3rd-order tensor F = V ⊗ V ⊗ (V₂ D_w) ∈ C^{m×m×3} of rank k ≤ m, constructed as in Step 1 in Algorithm 1. Given a tensor F̃ that is element-wise close to F, namely |F̃_{n₁,n₂,n₃} − F_{n₁,n₂,n₃}| ≤ ε_z for all n₁, n₂, n₃, and assume that the noise is small:

ε_z ≤ δ_v w²_min / (100 √d k w_max cond₂(V)⁵).

Use F̃ as the input to Algorithm 2. With probability at least (1 − δ_v) over the random projections v^(1) and v^(2), we can bound the distance between the columns of the output V̂ and those of V by:

min_π max{ ‖V̂_j − V_{π(j)}‖₂ : j ∈ [k] } ≤ C (√d k² / δ_v) (w_max / w²_min) cond₂(V)⁵ ε_z,    (9)

where C is a universal constant.

Condition number of V_{S′}: The following lemma is helpful:
Lemma 2.6. 
Let V_{S′} ∈ C^{(m+d+1)×k} be the factor as defined in (7). Recall that V_{S′} = [V_S; V_d; 1ᵀ], where V_d is defined in (17), and V_S is the characteristic matrix defined in (6).
We can bound the condition number of V_{S′} by

cond₂(V_{S′}) ≤ √(1 + √k) · cond₂(V_S).    (10)

Condition number of the characteristic matrix V_S: Therefore, the stability analysis of the proposed algorithm boils down to understanding the relation between the random sampling set S and the condition number of the characteristic matrix V_S. This is analyzed in Lemma 2.8 (the main technical lemma).
Lemma 2.7. Fix any number ε_x ∈ (0, 1/2). Consider a Gaussian vector s with distribution N(0, R² I_{d×d}), where R ≥ √(2 log(k/ε_x)) / (πΔ) for d ≥ 2, and R ≥ √(2 log(1 + 2/ε_x)) / (πΔ) for d = 1. Define the Hermitian random matrix X_s ∈ C^{k×k} to be

X_s = u_s u_s^H,  where u_s = [e^{iπ⟨μ^(1),s⟩}, e^{iπ⟨μ^(2),s⟩}, . . . , e^{iπ⟨μ^(k),s⟩}]ᵀ.    (11)

We can bound the spectrum of E_s[X_s] by:

(1 − ε_x) I_{k×k} ⪯ E_s[X_s] ⪯ (1 + ε_x) I_{k×k}.    (12)

Lemma 2.8 (Main technical lemma). In the same setting as Lemma 2.7, let S = {s^(1), . . . , s^(m)} be m independent samples of the Gaussian vector s. For m ≥ (k/ε_x) √(8 log(k/δ_s)), with probability at least 1 − δ_s over the random sampling, the condition number of the factor V_S is bounded by:

cond₂(V_S) ≤ √((1 + 2ε_x) / (1 − 2ε_x)).    (13)

Acknowledgments

The authors thank Rong Ge and Ankur Moitra for very helpful discussions. Sham Kakade acknowledges funding from the Washington Research Foundation for Innovation in Data-intensive Discovery.

References
[1] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research, 15(1):2773–2832, 2014.
[2] A. Anandkumar, D. Hsu, and S. 
M. Kakade. A method of moments for mixture models and hidden Markov models. arXiv preprint arXiv:1203.0683, 2012.
[3] E. J. Candès and C. Fernandez-Granda. Super-resolution from noisy data. Journal of Fourier Analysis and Applications, 19(6):1229–1254, 2013.
[4] E. J. Candès and C. Fernandez-Granda. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics, 67(6):906–956, 2014.
[5] Y. Chen and Y. Chi. Robust spectral compressed sensing via structured matrix completion. IEEE Transactions on Information Theory, 60(10):6576–6601, 2014.
[6] S. Dasgupta. Learning mixtures of Gaussians. In Foundations of Computer Science, 1999. 40th Annual Symposium on, pages 634–644. IEEE, 1999.
[7] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60–65, 2003.
[8] S. Dasgupta and L. J. Schulman. A two-round variant of EM for Gaussian mixtures. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 152–159. Morgan Kaufmann Publishers Inc., 2000.
[9] D. L. Donoho. Superresolution via sparsity constraints. SIAM Journal on Mathematical Analysis, 23(5):1309–1331, 1992.
[10] C. Fernandez-Granda. A Convex-programming Framework for Super-resolution. PhD thesis, Stanford University, 2014.
[11] R. A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. 1970.
[12] V. Komornik and P. Loreti. Fourier Series in Control Theory. Springer Science & Business Media, 2005.
[13] S. Leurgans, R. Ross, and R. Abel. A decomposition for three-way arrays. SIAM Journal on Matrix Analysis and Applications, 14(4):1064–1083, 1993.
[14] W. Liao and A. Fannjiang. MUSIC for single-snapshot spectral estimation: Stability and super-resolution. 
Applied and Computational Harmonic Analysis, 2014.
[15] A. Moitra. The threshold for super-resolution via extremal functions. arXiv preprint arXiv:1408.1681, 2014.
[16] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. In Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, pages 366–375. ACM, 2005.
[17] S. Nandi, D. Kundu, and R. K. Srivastava. Noise space decomposition method for two-dimensional sinusoidal model. Computational Statistics & Data Analysis, 58:147–161, 2013.
[18] K. Pearson. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A, pages 71–110, 1894.
[19] D. Potts and M. Tasche. Parameter estimation for nonincreasing exponential sums by Prony-like methods. Linear Algebra and its Applications, 439(4):1024–1039, 2013.
[20] D. L. Russell. Controllability and stabilizability theory for linear partial differential equations: recent progress and open questions. SIAM Review, 20(4):639–739, 1978.
[21] A. Sanjeev and R. Kannan. Learning mixtures of arbitrary Gaussians. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing, pages 247–257. ACM, 2001.
[22] G. Schiebinger, E. Robeva, and B. Recht. Superresolution without separation. arXiv preprint arXiv:1506.03144, 2015.
[23] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht. Compressed sensing off the grid. IEEE Transactions on Information Theory, 59(11):7465–7490, 2013.
[24] S. S. Vempala and Y. F. Xiao. Max vs min: Independent component analysis with nearly linear sample complexity. arXiv preprint arXiv:1412.2954, 2014.
", "award": [], "sourceid": 1552, "authors": [{"given_name": "Qingqing", "family_name": "Huang", "institution": "MIT"}, {"given_name": "Sham", "family_name": "Kakade", "institution": "University of Washington"}]}