{"title": "Structure-Blind Signal Recovery", "book": "Advances in Neural Information Processing Systems", "page_first": 4817, "page_last": 4825, "abstract": "We consider the problem of recovering a signal observed in Gaussian noise. If the set of signals is convex and compact, and can be specified beforehand, one can use classical linear estimators that achieve a risk within a constant factor of the minimax risk. However, when the set is unspecified, designing an estimator that is blind to the hidden structure of the signal remains a challenging problem. We propose a new family of estimators to recover signals observed in Gaussian noise. Instead of specifying the set where the signal lives, we assume the existence of a well-performing linear estimator. Proposed estimators enjoy exact oracle inequalities and can be efficiently computed through convex optimization. We present several numerical illustrations that show the potential of the approach.", "full_text": "Structure-Blind Signal Recovery\n\nDmitry Ostrovsky\u2217 Zaid Harchaoui\u2020 Anatoli Juditsky\u2217 Arkadi Nemirovski\u2021\n\nfirstname.lastname@imag.fr\n\nAbstract\n\nWe consider the problem of recovering a signal observed in Gaussian noise. If\nthe set of signals is convex and compact, and can be speci\ufb01ed beforehand, one\ncan use classical linear estimators that achieve a risk within a constant factor of\nthe minimax risk. However, when the set is unspeci\ufb01ed, designing an estimator\nthat is blind to the hidden structure of the signal remains a challenging problem.\nWe propose a new family of estimators to recover signals observed in Gaussian\nnoise. Instead of specifying the set where the signal lives, we assume the existence\nof a well-performing linear estimator. Proposed estimators enjoy exact oracle\ninequalities and can be ef\ufb01ciently computed through convex optimization. 
We present several numerical illustrations that show the potential of the approach.\n\n1 Introduction\n\nWe consider the problem of recovering a complex-valued signal $(x_t)_{t \\in \\mathbb{Z}}$ from the noisy observations\n\n$$y_\\tau = x_\\tau + \\sigma\\zeta_\\tau, \\quad -n \\le \\tau \\le n. \\qquad (1)$$\n\nHere $n \\in \\mathbb{Z}_+$, and $\\zeta_\\tau \\sim \\mathcal{CN}(0, 1)$ are i.i.d. standard complex-valued Gaussian random variables, meaning that $\\zeta_0 = \\xi_0^1 + \\imath\\xi_0^2$ with i.i.d. $\\xi_0^1, \\xi_0^2 \\sim \\mathcal{N}(0, 1)$. Our goal is to recover $x_t$, $0 \\le t \\le n$, given the sequence of observations $y_{t-n}, ..., y_t$ up to instant $t$, a task usually referred to as (pointwise) filtering in machine learning, statistics, and signal processing [5].\n\nThe traditional approach to this problem considers linear estimators, or linear filters, which are written as\n\n$$\\widehat{x}_t = \\sum_{\\tau=0}^{n} \\phi_\\tau y_{t-\\tau}, \\quad 0 \\le t \\le n.$$\n\nLinear estimators have been thoroughly studied in various forms; they are both theoretically attractive [7, 3, 2, 16, 17, 11, 13] and easy to use in practice. If the set $\\mathcal{X}$ of signals is well-specified, one can usually compute a (nearly) minimax on $\\mathcal{X}$ linear estimator in a closed form. In particular, if $\\mathcal{X}$ is a class of smooth signals, such as a H\u00f6lder or a Sobolev ball, then the corresponding estimator is given by the kernel estimator with the properly set bandwidth parameter [16] and is minimax among all possible estimators. Moreover, as shown by [6, 2], if only $\\mathcal{X}$ is convex, compact, and centrally symmetric, the risk of the best linear estimator of $x_t$ is within a small constant factor of the minimax risk over $\\mathcal{X}$. Besides, if the set $\\mathcal{X}$ can be specified in a computationally tractable way, which clearly is still a weaker assumption than classical smoothness assumptions, the best linear estimator can be efficiently computed by solving a convex optimization problem on $\\mathcal{X}$. 
In other words, given a computationally tractable set $\\mathcal{X}$ on the input, one can compute a nearly-minimax linear estimator and the corresponding (nearly-minimax) risk over $\\mathcal{X}$. The strength of this approach, however, comes at a price: the set $\\mathcal{X}$ still must be specified beforehand. Therefore, when one faces a recovery problem without any prior knowledge of $\\mathcal{X}$, this approach cannot be implemented.\n\n\u2217 LJK, University of Grenoble Alpes, 700 Avenue Centrale, 38401 Domaine Universitaire de Saint-Martin-d\u2019H\u00e8res, France.\n\u2020 University of Washington, Seattle, WA 98195, USA.\n\u2021 Georgia Institute of Technology, Atlanta, GA 30332, USA.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nWe adopt here a novel approach to filtering, which we refer to as structure-blind recovery. While we do not require $\\mathcal{X}$ to be specified beforehand, we assume that there exists a linear oracle \u2013 a well-performing linear estimator of $x_t$. Previous works [8, 10, 4], following a similar philosophy, proved that one can efficiently adapt to the linear oracle filter of length $m = O(n)$ if the corresponding filter $\\phi$ is time-invariant, i.e. it recovers the target signal uniformly well in the $O(n)$-sized neighbourhood of $t$, and if its $\\ell_2$-norm is small \u2013 bounded by $\\rho/\\sqrt{m}$ for a moderate $\\rho \\ge 1$. The adaptive estimator is computed by minimizing the $\\ell_\\infty$-norm of the filter discrepancy, in the Fourier domain, under the constraint on the $\\ell_1$-norm of the filter in the Fourier domain. 
Put in contrast to the oracle linear filter, the price for adaptation is proved to be $O(\\rho^3\\sqrt{\\ln n})$, with the lower bound of $O(\\rho\\sqrt{\\ln n})$ [8, 4].\n\nWe make the following contributions:\n\n\u2022 we propose a new family of recovery methods, obtained by solving a least-squares problem constrained or penalized by the $\\ell_1$-norm of the filter in the Fourier domain;\n\u2022 we prove exact oracle inequalities for the $\\ell_2$-risk of these methods;\n\u2022 we show that the price for adaptation improves upon previous works [8, 4] to $O(\\rho^2\\sqrt{\\ln n})$ for the point-wise risk and to $O(\\rho\\sqrt{\\ln n})$ for the $\\ell_2$-risk;\n\u2022 we present numerical experiments that show the potential of the approach on synthetic and real-world images and signals.\n\nBefore presenting the theoretical results, let us introduce the notation we use throughout the paper.\n\nFilters Let $C(\\mathbb{Z})$ be the linear space of all two-sided complex-valued sequences $x = \\{x_t \\in \\mathbb{C}\\}_{t \\in \\mathbb{Z}}$. For $k, k' \\in \\mathbb{Z}$ we consider finite-dimensional subspaces\n\n$$C(\\mathbb{Z}_k^{k'}) = \\{x \\in C(\\mathbb{Z}) : x_t = 0, \\; t \\notin [k, k']\\}.$$\n\nIt is convenient to identify $m$-dimensional complex vectors, $m = k' - k + 1$, with elements of $C(\\mathbb{Z}_k^{k'})$ by means of the notation\n\n$$x_k^{k'} := [x_k; ...; x_{k'}] \\in \\mathbb{C}^{k'-k+1}.$$\n\nWe associate to linear mappings $C(\\mathbb{Z}_k^{k'}) \\to C(\\mathbb{Z}_j^{j'})$ $(j'-j+1) \\times (k'-k+1)$ matrices with complex entries. The convolution $u * v$ of two sequences $u, v \\in C(\\mathbb{Z})$ is a sequence with elements\n\n$$[u * v]_t = \\sum_{\\tau \\in \\mathbb{Z}} u_\\tau v_{t-\\tau}, \\quad t \\in \\mathbb{Z}.$$\n\nGiven observations (1) and $\\varphi \\in C(\\mathbb{Z}_0^m)$, consider the (left) linear estimation of $x$ associated with filter $\\varphi$:\n\n$$\\widehat{x}_t = [\\varphi * y]_t$$\n\n($\\widehat{x}_t$ is merely a kernel estimate of $x_t$ by a kernel $\\varphi$ supported on $[0, ..., m]$).\n\nDiscrete Fourier transform We define the unitary Discrete Fourier transform (DFT) operator $F_n : \\mathbb{C}^{n+1} \\to \\mathbb{C}^{n+1}$ by\n\n$$z \\mapsto F_n z, \\quad [F_n z]_k = (n+1)^{-1/2} \\sum_{t=0}^{n} z_t \\, e^{\\frac{2\\pi\\imath kt}{n+1}}, \\quad 0 \\le k \\le n.$$\n\nThe inverse Discrete Fourier transform (iDFT) operator $F_n^{-1}$ is given by $F_n^{-1} := F_n^H$ (here $A^H$ stands for Hermitian adjoint of $A$). By the Fourier inversion theorem, $F_n^{-1}(F_n z) = z$.\n\nWe denote by $\\|\\cdot\\|_p$ the usual $\\ell_p$-norms on $C(\\mathbb{Z})$: $\\|x\\|_p = (\\sum_{t \\in \\mathbb{Z}} |x_t|^p)^{1/p}$, $p \\in [1, \\infty]$. 
Usually, the argument will be finite-dimensional \u2013 an element of $C(\\mathbb{Z}_k^{k'})$; we reserve the special notation\n\n$$\\|x\\|_{n,p} := \\|x_0^n\\|_p.$$\n\nFurthermore, DFT allows to equip $C(\\mathbb{Z}_0^n)$ with the norms associated with $\\ell_p$-norms in the spectral domain:\n\n$$\\|x\\|_{n,p}^* := \\|x_0^n\\|_p^* := \\|F_n x_0^n\\|_p, \\quad p \\in [1, \\infty];$$\n\nnote that unitarity of the DFT implies the Parseval identity: $\\|x\\|_{n,2} = \\|x\\|_{n,2}^*$.\n\nFinally, $c$, $C$, and $C'$ stand for generic absolute constants.\n\n2 Oracle inequality for constrained recovery\n\nGiven observations (1) and $\\varrho > 0$, we first consider the constrained recovery $\\widehat{x}_{\\mathrm{con}}$ given by\n\n$$[\\widehat{x}_{\\mathrm{con}}]_t = [\\widehat\\varphi * y]_t, \\quad t = 0, ..., n,$$\n\nwhere $\\widehat\\varphi$ is an optimal solution of the constrained optimization problem\n\n$$\\min_{\\varphi \\in C(\\mathbb{Z}_0^n)} \\left\\{ \\|y - \\varphi * y\\|_{n,2} : \\|\\varphi\\|_{n,1}^* \\le \\varrho/\\sqrt{n+1} \\right\\}. \\qquad (2)$$\n\nThe constrained recovery estimator minimizes a least-squares fit criterion under a constraint on $\\|\\varphi\\|_{n,1}^* = \\|F_n \\varphi_0^n\\|_1$, that is an $\\ell_1$ constraint on the discrete Fourier transform of the filter. 
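As a rough illustration of the quantities entering problem (2), the following Python sketch evaluates the spectral-domain norm of a filter and the least-squares residual for a given filter. It assumes NumPy; the function names (unitary_dft, spectral_l1_norm, filter_residual, is_feasible), the argument name varrho, and the array layout of y are choices made for this example only and do not come from the paper. The sketch evaluates the objective and the constraint; it does not solve (2).

```python
import numpy as np

def unitary_dft(z):
    # Unitary DFT with the sign convention of the text:
    # [F z]_k = (n+1)^(-1/2) * sum_t z_t * exp(+2*pi*i*k*t/(n+1)).
    # NumPy's ifft uses the same sign with a 1/(n+1) factor, hence the rescaling.
    z = np.asarray(z, dtype=complex)
    return np.sqrt(z.size) * np.fft.ifft(z)

def spectral_l1_norm(phi):
    # l1-norm of the filter in the Fourier domain.
    return float(np.abs(unitary_dft(phi)).sum())

def filter_residual(phi, y):
    # Assumed layout: phi = (phi_0, ..., phi_n), y = (y_{-n}, ..., y_n).
    # Returns the l2-norm of y_t - [phi * y]_t over t = 0, ..., n.
    phi = np.asarray(phi, dtype=complex)
    y = np.asarray(y, dtype=complex)
    n = phi.size - 1
    assert y.size == 2 * n + 1
    conv = np.convolve(phi, y)[n:2 * n + 1]  # [phi * y]_t for t = 0, ..., n
    return float(np.linalg.norm(y[n:] - conv))

def is_feasible(phi, varrho):
    # Constraint of (2): spectral l1-norm at most varrho / sqrt(n + 1).
    return spectral_l1_norm(phi) <= varrho / np.sqrt(len(phi)) + 1e-12
```

For the trivial filter phi = (1, 0, ..., 0) the residual is zero, but its spectral norm equals sqrt(n + 1), so it satisfies the constraint only when the bound varrho is at least n + 1; a small bound is what forces the estimator to actually filter the noise.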
While the least-squares objective naturally follows from the Gaussian noise assumption, the constraint can be motivated as follows.\n\nSmall-error linear filters Linear filter $\\varphi^o$ with a small $\\ell_1$ norm in the spectral domain and small recovery error exists, essentially, whenever there exists a linear filter with small recovery error [8, 4]. Indeed, let us say that $x \\in C(\\mathbb{Z}_0^n)$ is simple [4] with parameters $m \\in \\mathbb{Z}_+$ and $\\rho \\ge 1$ if there exists $\\phi^o \\in C(\\mathbb{Z}_0^m)$ such that for all $-m \\le \\tau \\le 2m$,\n\n$$\\left[ E\\left\\{ |x_\\tau - [\\phi^o * y]_\\tau|^2 \\right\\} \\right]^{1/2} \\le \\frac{\\sigma\\rho}{\\sqrt{m+1}}. \\qquad (3)$$\n\nIn other words, $x$ is $(m, \\rho)$-simple if there exists a hypothetical filter $\\phi^o$ of the length at most $m+1$ which recovers $x_\\tau$ with squared risk uniformly bounded by $\\frac{\\sigma^2\\rho^2}{m+1}$ in the interval $-m \\le \\tau \\le 2m$. Note that (3) clearly implies that $\\|\\phi^o\\|_2 \\le \\rho/\\sqrt{m+1}$, and that $|[x - \\phi^o * x]_\\tau| \\le \\sigma\\rho/\\sqrt{m+1}$ $\\forall \\tau$, $-m \\le \\tau \\le 2m$. Now, let $n = 2m$, and let\n\n$$\\varphi^o = \\phi^o * \\phi^o \\in \\mathbb{C}^{n+1}.$$\n\nAs proved in [15, Appendix C], we have\n\n$$\\|\\varphi^o\\|_{n,1}^* \\le 2\\rho^2/\\sqrt{n+1}, \\qquad (4)$$\n\nand, for a moderate absolute constant $c$,\n\n$$\\|x - \\varphi^o * y\\|_{n,2} \\le c\\sigma\\rho^2\\sqrt{1 + \\ln[1/\\alpha]} \\qquad (5)$$\n\nwith probability $1 - \\alpha$. 
To summarize, if $x$ is $(m, \\rho)$-simple, i.e., when there exists a filter $\\phi^o$ of length $\\le m+1$ which recovers $x$ with small risk on the interval $[-m, 2m]$, then the filter $\\varphi^o = \\phi^o * \\phi^o$ of the length at most $n+1$, with $n = 2m$, has small norm $\\|\\varphi^o\\|_{n,1}^*$ and recovers the signal $x$ with (essentially the same) small risk on the interval $[0, n]$.\n\nHidden structure The constrained recovery estimator is completely blind to a possible hidden structure of the signal, yet can seamlessly adapt to it when such a structure exists, in a way that we can rigorously establish. Using the right-shift operator on $C(\\mathbb{Z})$, $[\\Delta x]_t = x_{t-1}$, we formalize the hidden structure as an unknown shift-invariant linear subspace of $C(\\mathbb{Z})$, $\\Delta S = S$, of a small dimension $s$. We do not assume that $x$ belongs to that subspace. Instead, we make a more general assumption that $x$ is close to this subspace, that is, it may be decomposed into a sum of a component that lies in the subspace and a component whose norm we can control.\n\nAssumption A We suppose that $x$ admits the decomposition\n\n$$x = x^S + \\varepsilon, \\quad x^S \\in S,$$\n\nwhere $S$ is an (unknown) shift-invariant, $\\Delta S = S$, subspace of $C(\\mathbb{Z})$ of dimension $s$, $1 \\le s \\le n+1$, and $\\varepsilon$ is \u201csmall\u201d, namely,\n\n$$\\|\\Delta^\\tau \\varepsilon\\|_{n,2} \\le \\sigma\\kappa, \\quad 0 \\le \\tau \\le n.$$\n\nShift-invariant subspaces of $C(\\mathbb{Z})$ are exactly the sets of solutions of homogeneous linear difference equations with polynomial operators. This is summarized by the following lemma (we believe it is a known fact; for completeness we provide a proof in [15, Appendix C]).\n\nLemma 2.1. 
Solution set of a homogeneous difference equation with a polynomial operator $p(\\Delta)$,\n\n$$[p(\\Delta)x]_t = \\left[ \\sum_{\\tau=0}^{s} p_\\tau x_{t-\\tau} \\right] = 0, \\quad t \\in \\mathbb{Z}, \\qquad (6)$$\n\nwith $\\deg(p(\\cdot)) = s$, $p(0) = 1$, is a shift-invariant subspace of $C(\\mathbb{Z})$ of dimension $s$. Conversely, any shift-invariant subspace $S \\subset C(\\mathbb{Z})$, $\\Delta S \\subseteq S$, $\\dim(S) = s < \\infty$, is the set of solutions of some homogeneous difference equation (6) with $\\deg(p(\\cdot)) = s$, $p(0) = 1$. Moreover, such $p(\\cdot)$ is unique.\n\nOn the other hand, for any polynomial $p(\\cdot)$, solutions of (6) are exponential polynomials [?] with frequencies determined by the roots of $p(\\cdot)$. For instance, discrete-time polynomials $x_t = \\sum_{k=0}^{s-1} c_k t^k$, $t \\in \\mathbb{Z}$, of degree $s-1$ (that is, exponential polynomials with all zero frequencies) form a linear space of dimension $s$ of solutions of the equation (6) with a polynomial $p(\\Delta) = (1 - \\Delta)^s$ with a unique root of multiplicity $s$, having coefficients $p_k = (-1)^k \\binom{s}{k}$. Naturally, signals which are close, in the $\\ell_2$ distance, to discrete-time polynomials are Sobolev-smooth functions sampled over the regular grid [10]. Sum of harmonic oscillations $x_t = \\sum_{k=1}^{s} c_k e^{\\imath\\omega_k t}$, $\\omega_k \\in [0, 2\\pi)$ being all different, is another example; here, $p(\\Delta) = \\prod_{k=1}^{s} (1 - e^{\\imath\\omega_k}\\Delta)$.\n\nWe can now state an oracle inequality for the constrained recovery estimator; see [15, Appendix B].\n\nTheorem 2.1. Let $\\varrho \\ge 1$, and let $\\varphi^o \\in C(\\mathbb{Z}_0^n)$ be such that\n\n$$\\|\\varphi^o\\|_{n,1}^* \\le \\varrho/\\sqrt{n+1}.$$\n\nSuppose that Assumption A holds for some $s \\in \\mathbb{Z}_+$ and $\\kappa < \\infty$. Then for any $\\alpha$, $0 < \\alpha \\le 1$, it holds with probability at least $1 - \\alpha$:\n\n$$\\|x - \\widehat{x}_{\\mathrm{con}}\\|_{n,2} \\le \\|x - \\varphi^o * y\\|_{n,2} + C\\sigma\\sqrt{s + \\varrho\\left(\\kappa\\sqrt{\\ln[1/\\alpha]} + \\ln[n/\\alpha]\\right)}. \\qquad (7)$$\n\nWhen considering simple signals, Theorem 2.1 gives the following.\n\nCorollary 2.1. Assume that signal $x$ is $(m, \\rho)$-simple, $\\rho \\ge 1$ and $m \\in \\mathbb{Z}_+$. Let $n = 2m$, $\\varrho \\ge 2\\rho^2$, and let Assumption A hold for some $s \\in \\mathbb{Z}_+$ and $\\kappa < \\infty$. Then for any $\\alpha$, $0 < \\alpha \\le 1$, it holds with probability at least $1 - \\alpha$:\n\n$$\\|x - \\widehat{x}_{\\mathrm{con}}\\|_{n,2} \\le C\\sigma\\rho^2\\sqrt{\\ln[1/\\alpha]} + C'\\sigma\\sqrt{s + \\varrho\\left(\\kappa\\sqrt{\\ln[1/\\alpha]} + \\ln[n/\\alpha]\\right)}.$$\n\nAdaptation and price The price for adaptation in Theorem 2.1 and Corollary 2.1 is determined by three parameters: the bound on the filter norm $\\varrho$, the deterministic error $\\kappa$, and the subspace dimension $s$. Assuming that the signal to recover is simple, and that $\\varrho = 2\\rho^2$, let us compare the magnitude of the oracle error to the term of the risk which reflects the \u201cprice of adaptation\u201d. Typically (in fact, in all cases known to us of recovery of signals from a shift-invariant subspace), the parameter $\\rho$ is at least $\\sqrt{s}$. Therefore, the bound (5) implies the \u201ctypical bound\u201d $O(\\sigma\\sqrt{\\gamma}\\rho^2) = \\sigma s\\sqrt{\\gamma}$ for the term $\\|x - \\varphi^o * y\\|_{n,2}$ (we denote $\\gamma = \\ln(1/\\alpha)$). As a result, for instance, in the \u201cparametric situation\u201d, when the signal belongs or is very close to the subspace, that is when $\\kappa = O(\\ln(n))$, the price of adaptation $O\\left(\\sigma[s + \\rho^2(\\gamma + \\sqrt{\\gamma}\\ln n)]^{1/2}\\right)$ is much smaller than the bound on the oracle error. 
In the \u201cnonparametric situation\u201d, when $\\kappa = O(\\rho^2)$, the price of adaptation has the same order of magnitude as the oracle error.\n\nFinally, note that under the premise of Corollary 2.1 we can also bound the pointwise error. We state the result for $\\varrho = 2\\rho^2$ for simplicity; the proof can be found in [15, Appendix B].\n\nTheorem 2.2. Assume that signal $x$ is $(m, \\rho)$-simple, $\\rho \\ge 1$ and $m \\in \\mathbb{Z}_+$. Let $n = 2m$, $\\varrho = 2\\rho^2$, and let Assumption A hold for some $s \\in \\mathbb{Z}_+$ and $\\kappa < \\infty$. Then for any $\\alpha$, $0 < \\alpha \\le 1$, the constrained recovery $\\widehat{x}_{\\mathrm{con}}$ satisfies\n\n$$|x_n - [\\widehat{x}_{\\mathrm{con}}]_n| \\le C\\frac{\\sigma\\rho}{\\sqrt{m+1}}\\left[ \\rho^2\\sqrt{\\ln[n/\\alpha]} + \\rho\\sqrt{\\kappa\\sqrt{\\ln[1/\\alpha]}} + \\sqrt{s} \\right].$$\n\n3 Oracle inequality for penalized recovery\n\nTo use the constrained recovery estimator with a provable guarantee, see e.g. Theorem 2.1, one must know the norm of a small-error linear filter $\\varrho$, or at least have an upper bound on it. However, if this parameter is unknown, but instead the noise variance is known (or can be estimated from data), we can build a more practical estimator that still enjoys an oracle inequality.\n\nThe penalized recovery estimator $[\\widehat{x}_{\\mathrm{pen}}]_t = [\\widehat\\varphi * y]_t$ uses an optimal solution $\\widehat\\varphi$ of a regularized least-squares minimization problem, where the regularization penalizes the $\\ell_1$-norm of the filter in the Fourier domain:\n\n$$\\widehat\\varphi \\in \\mathop{\\mathrm{Argmin}}_{\\varphi \\in C(\\mathbb{Z}_0^n)} \\left\\{ \\|y - \\varphi * y\\|_{n,2}^2 + \\lambda\\sqrt{n+1}\\,\\|\\varphi\\|_{n,1}^* \\right\\}. \\qquad (8)$$\n\nSimilarly to Theorem 2.1, we establish an oracle inequality for the penalized recovery estimator.\n\nTheorem 3.1. 
Let Assumption A hold for some $s \\in \\mathbb{Z}_+$ and $\\kappa < \\infty$, and let $\\varphi^o \\in C(\\mathbb{Z}_0^n)$ satisfy $\\|\\varphi^o\\|_{n,1}^* \\le \\varrho/\\sqrt{n+1}$ for some $\\varrho \\ge 1$.\n\n1\u00b0. Suppose that the regularization parameter of penalized recovery $\\widehat{x}_{\\mathrm{pen}}$ satisfies $\\lambda \\ge \\underline{\\lambda}$,\n\n$$\\underline{\\lambda} := 60\\sigma^2\\ln[63n/\\alpha].$$\n\nThen, for $0 < \\alpha \\le 1$, it holds with probability at least $1 - \\alpha$:\n\n$$\\|x - \\widehat{x}_{\\mathrm{pen}}\\|_{n,2} \\le \\|x - \\varphi^o * y\\|_{n,2} + C\\sqrt{\\varrho\\lambda} + C'\\sigma\\sqrt{s + (\\widehat\\varrho + 1)\\kappa\\sqrt{\\ln[1/\\alpha]}},$$\n\nwhere $\\widehat\\varrho := \\sqrt{n+1}\\,\\|\\widehat\\varphi\\|_{n,1}^*$.\n\n2\u00b0. Moreover, if $\\kappa \\le \\bar\\kappa$,\n\n$$\\bar\\kappa := \\frac{10\\ln[42n/\\alpha]}{\\sqrt{\\ln[16/\\alpha]}},$$\n\nand $\\lambda \\ge 2\\underline{\\lambda}$, one has\n\n$$\\|x - \\widehat{x}_{\\mathrm{pen}}\\|_{n,2} \\le \\|x - \\varphi^o * y\\|_{n,2} + C\\sqrt{\\varrho\\lambda} + C'\\sigma\\sqrt{s}.$$\n\nThe proof closely follows that of Theorem 2.1 and can also be found in [15, Appendix B].\n\n4 Discussion\n\nThere is some redundancy between \u201csimplicity\u201d of a signal, as defined by (3), and Assumption A. Usually a simple signal or image $x$ is also close to a low-dimensional subspace of $C(\\mathbb{Z})$ (see, e.g., [10, section 4]), so that Assumption A holds \u201cautomatically\u201d. Likewise, $x$ is \u201calmost\u201d simple when it is close to a low-dimensional time-invariant subspace. Indeed, if $x \\in C(\\mathbb{Z})$ belongs to $S$, i.e. Assumption A holds with $\\kappa = 0$, one can easily verify that for $n \\ge s$ there exists a filter $\\phi^o \\in C(\\mathbb{Z}_{-n}^{n})$ such that\n\n$$\\|\\phi^o\\|_2 \\le \\sqrt{s/(n+1)}, \\quad \\text{and} \\quad x_\\tau = [\\phi^o * x]_\\tau, \\; \\tau \\in \\mathbb{Z}. \\qquad (9)$$\n\nSee [15, Appendix C] for the proof. 
This implies that $x$ can be recovered efficiently from observations (1):\n\n$$\\left[ E\\left\\{ |x_\\tau - [\\phi^o * y]_\\tau|^2 \\right\\} \\right]^{1/2} \\le \\sigma\\sqrt{\\frac{s}{n+1}}.$$\n\nIn other words, if instead of the filtering problem we were interested in the interpolation problem of recovering $x_t$ given $2n+1$ observations $y_{t-n}, ..., y_{t+n}$ on the left and on the right of $t$, Assumption A would imply a kind of simplicity of $x$. On the other hand, it is clear that Assumption A is not sufficient to imply the simplicity of $x$ \u201cwith respect to the filtering\u201d, in the sense of the definition we use in this paper, when we are allowed to use only observations on the left of $t$ to compute the estimation of $x_t$. Indeed, one can see, for instance, that already signals from the parametric family $\\mathcal{X}_\\alpha = \\{x \\in C(\\mathbb{Z}) : x_\\tau = c\\alpha^\\tau, \\; c \\in \\mathbb{C}\\}$, with a given $|\\alpha| > 1$, which form a one-dimensional space of solutions of the equation $x_\\tau = \\alpha x_{\\tau-1}$, cannot be estimated with small risk at $t$ using only observations on the left of $t$ (unless $c = 0$), and thus are not simple in the sense of (3).\n\nOf course, in the above example, the \u201cdifficulty\u201d of the family $\\mathcal{X}_\\alpha$ is due to instability of solutions of the difference equation which explode when $\\tau \\to +\\infty$. Note that signals $x \\in \\mathcal{X}_\\alpha$ with $|\\alpha| \\le 1$ (linear functions, oscillations, or damped oscillations) are simple. More generally, suppose that $x$ satisfies a difference equation of degree $s$:\n\n$$0 = p(\\Delta)x_\\tau \\left[ = \\sum_{i=0}^{s} p_i x_{\\tau-i} \\right], \\qquad (10)$$\n\nwhere $p(z) = \\sum_{i=0}^{s} p_i z^i$ is the corresponding characteristic polynomial and $\\Delta$ is the right shift operator. 
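To make the link between difference equations and harmonic oscillations concrete, here is a small numerical check in Python. It assumes NumPy, and the helper names, frequencies, and amplitudes are illustrative choices only. The sketch builds the coefficients of p(z) as the product of the factors (1 - exp(i w_k) z) and checks that a sum of harmonic oscillations with frequencies w_k satisfies a difference equation of the form (10) up to rounding error.

```python
import numpy as np

def annihilating_coeffs(omegas):
    # Coefficients p_0, ..., p_s (ascending powers of z) of
    # p(z) = prod_k (1 - exp(1j * omega_k) * z), so that p_0 = 1.
    # np.poly returns the coefficients of prod_k (w - a_k) in descending powers
    # of w; substituting w = 1/z shows they are the ascending coefficients of p.
    a = np.exp(1j * np.asarray(omegas, dtype=float))
    return np.poly(a)

def apply_difference_operator(p, x):
    # Computes sum_i p_i * x_{t-i} for every t at which all terms are available.
    return np.convolve(p, x, mode='valid')

# Illustrative signal: three harmonic oscillations with distinct frequencies.
omegas = [0.3, 1.1, 2.5]
amps = [1.0, -0.5 + 0.2j, 2.0]
t = np.arange(60)
x = sum(c * np.exp(1j * w * t) for c, w in zip(amps, omegas))

p = annihilating_coeffs(omegas)
residual = apply_difference_operator(p, x)
max_residual = float(np.max(np.abs(residual)))  # expected to be at rounding level
```

All roots of this p(z) lie on the unit circle, which is the case singled out in the discussion as the one where both filtering and interpolation remain simple.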
When $p(z)$ is unstable \u2013 has roots inside the unit circle \u2013 (depending on \u201cinitial conditions\u201d) the set of solutions to the equation (10) contains signals that are difficult to filter. Observe that stability of solutions is related to the direction of the time axis; when the characteristic polynomial $p(z)$ has roots outside the unit circle, the corresponding solutions may be \u201cleft unstable\u201d \u2013 increase exponentially when $\\tau \\to -\\infty$. In this case \u201cright filtering\u201d \u2013 estimating $x_\\tau$ using observations on the right of $\\tau$ \u2013 will be difficult. A special situation where interpolation and filtering are always simple arises when the characteristic polynomial of the difference equation has all its roots on the unit circle. In this case, solutions to (10) are \u201cgeneralized harmonic oscillations\u201d (harmonic oscillations modulated by polynomials), and such signals are known to be simple. Theorem 4.1 summarizes the properties of the solutions of (10) in this particular case; see [15, Appendix C] for the proof.\n\nTheorem 4.1. Let $s$ be a positive integer, and let $p = [p_0; ...; p_s] \\in \\mathbb{C}^{s+1}$ be such that the polynomial $p(z) = \\sum_{i=0}^{s} p_i z^i$ has all its roots on the unit circle. Then for every integer $m$ satisfying\n\n$$m \\ge m(s) := Cs^2\\ln(s+1),$$\n\none can point out $q \\in \\mathbb{C}^{m+1}$ such that any solution to (10) satisfies\n\n$$x_\\tau = [q * x]_\\tau, \\quad \\forall \\tau \\in \\mathbb{Z},$$\n\nand\n\n$$\\|q\\|_2 \\le \\rho(s, m)/\\sqrt{m} \\quad \\text{where} \\quad \\rho(s, m) = C' \\min\\left\\{ s^{3/2}\\sqrt{\\ln s}, \\; s\\sqrt{\\ln[ms]} \\right\\}. \\qquad (11)$$\n\n5 Numerical experiments\n\nWe present preliminary results on simulated data of the proposed adaptive signal recovery methods in several application scenarios. We compare the performance of the penalized $\\ell_2$-recovery of Sec. 
3 to that of the Lasso recovery of [1] in signal and image denoising problems. Implementation details for the penalized $\\ell_2$-recovery are given in Sec. 6. Discussion of the discretization approach underlying the competing Lasso method can be found in [1, Sec. 3.6].\n\nWe follow the same methodology in both signal and image denoising experiments. For each level of the signal-to-noise ratio $\\mathrm{SNR} \\in \\{1, 2, 4, 8, 16\\}$, we perform $N$ Monte-Carlo trials. In each trial, we generate a random signal $x$ on a regular grid with $n$ points, corrupted by the i.i.d. Gaussian noise of variance $\\sigma^2$. The signal is normalized: $\\|x\\|_2 = 1$ so $\\mathrm{SNR}^{-1} = \\sigma\\sqrt{n}$. We set the regularization penalty in each method as follows. For penalized $\\ell_2$-recovery (8), we use $\\lambda = 2\\sigma^2\\log[63n/\\alpha]$ with $\\alpha = 0.1$. For Lasso [1], we use the common setting $\\lambda = \\sigma\\sqrt{2\\log n}$. We report experimental results by plotting the $\\ell_2$-error $\\|\\widehat{x} - x\\|_2$, averaged over $N$ Monte-Carlo trials, versus the inverse of the signal-to-noise ratio $\\mathrm{SNR}^{-1}$.\n\nFigure 1: Signal and image denoising in different scenarios, left to right: RandomSpikes, CoherentSpikes, RandomSpikes-2D, and CoherentSpikes-2D. The steep parts of the curves on high noise levels correspond to observations being thresholded to zero.\n\nSignal denoising We consider denoising of a one-dimensional signal in two different scenarios, fixing $N = 100$ and $n = 100$. In the RandomSpikes scenario, the signal is a sum of 4 harmonic oscillations, each characterized by a spike of a random amplitude at a random position in the continuous frequency domain $[0, 2\\pi]$. In the CoherentSpikes scenario, the same number of spikes is sampled by pairs. Spikes in each pair have the same amplitude and are separated by only 0.1 of the DFT bin $2\\pi/n$ which could make recovery harder due to high signal coherency. 
However, in practice we found RandomSpikes to be slightly harder than CoherentSpikes for both methods, see Fig. 1. As Fig. 1 shows, the proposed penalized $\\ell_2$-recovery outperforms the Lasso method for all noise levels. The performance gain is particularly significant for high signal-to-noise ratios.\n\nImage Denoising We now consider recovery of an unknown regression function $f$ on the regular grid on $[0, 1]^2$ given the noisy observations:\n\n$$y_\\tau = x_\\tau + \\sigma\\zeta_\\tau, \\quad \\tau \\in \\{0, 1, ..., m-1\\}^2, \\qquad (12)$$\n\nwhere $x_\\tau = f(\\tau/m)$. We fix $N = 40$, and the grid dimension $m = 40$; the number of samples is then $n = m^2$. For the penalized $\\ell_2$-recovery, we implement the blockwise denoising strategy (see Appendix for the implementation details) with just one block for the entire image. We present additional numerical illustrations in the supplementary material.\n\nWe study three different scenarios for generating the ground-truth signal in this experiment. The first two scenarios, RandomSpikes-2D and CoherentSpikes-2D, are two-dimensional counterparts of those studied in the signal denoising experiment: the ground-truth signal is a sum of 4 harmonic oscillations in $\\mathbb{R}^2$ with random frequencies and amplitudes. The separation in the CoherentSpikes-2D scenario is $0.2\\pi/m$ in each dimension of the torus $[0, 2\\pi]^2$. The results for these scenarios are shown in Fig. 1. Again, the proposed penalized $\\ell_2$-recovery outperforms the Lasso method for all noise levels, especially for high signal-to-noise ratios.\n\nIn scenario DimensionReduction-2D we investigate the problem of estimating a function with a hidden low-dimensional structure. 
We consider the single-index model of the regression function:\n\n$$f(t) = g(\\theta^T t), \\quad g(\\cdot) \\in \\mathcal{S}_\\beta^1(1). \\qquad (13)$$\n\nHere, $\\mathcal{S}_\\beta^1(1) = \\{g : \\mathbb{R} \\to \\mathbb{R}, \\; \\|g^{(\\beta)}(\\cdot)\\|_2 \\le 1\\}$ is the Sobolev ball of smooth periodic functions on $[0, 1]$, and the unknown structure is formalized as the direction $\\theta$. In our experiments we sample the direction $\\theta$ uniformly at random and consider different values of the smoothness index $\\beta$. If it is known a priori that the regression function possesses the structure (13), and only the index is unknown, one can use estimators attaining \u201cone-dimensional\u201d rates of recovery; see e.g. [12] and references therein. In contrast, our recovery algorithms are not aware of the underlying structure but might still adapt to it.\n\nAs shown in Fig. 2, the $\\ell_2$-recovery performs well in this scenario despite the fact that the available theoretical bounds are pessimistic. For example, the signal (13) with a smooth $g$ can be approximated by a small number of harmonic oscillations in $\\mathbb{R}^2$. As follows from the proof of [9, Proposition 10] combined with Theorem 4.1, for a sum of $k$ harmonic oscillations in $\\mathbb{R}^d$ one can point out a reproducing linear filter with $\\varrho(k) = O(k^{2d})$ (neglecting the logarithmic factors), i.e. 
the theoretical guarantee is quite conservative for small values of $\\beta$.\n\nFigure 2: Image denoising in DimensionReduction scenario; smoothness decreases from left to right (panels: $\\beta = 2$, $\\beta = 1$, $\\beta = 0.5$).\n\n6 Details of algorithm implementation\n\nHere we give a brief account of some techniques and implementation tricks exploited in our codes.\n\nSolving the optimization problems Note that the optimization problems (2) and (8) underlying the proposed recovery algorithms are well structured Second-Order Conic Programs (SOCP) and can be solved using Interior-point methods (IPM). However, the computational complexity of IPM applied to SOCP with dense matrices grows rapidly with problem dimension, so that large problems of this type arising in signal and image processing are well beyond the reach of these techniques. On the other hand, these problems possess nice geometry associated with complex $\\ell_1$-norm. Moreover, their first-order information \u2013 the value of objective and its gradient at a given $\\varphi$ \u2013 can be computed using Fast Fourier Transform in time which is almost linear in problem size. Therefore, we used first-order optimization algorithms, such as Mirror-Prox and Nesterov\u2019s accelerated gradient algorithms (see [14] and references therein) in our recovery implementation. A complete description of the application of these optimization algorithms to our problem is beyond the scope of the paper; we shall present it elsewhere.\n\nInterpolating recovery In Sec. 
2-3 we considered only recoveries which estimated the value $x_t$ of the signal via the observations at $n+1$ points $t-n, ..., t$ \u201con the left\u201d (filtering problem). To recover the whole signal, one may consider a more flexible alternative \u2013 interpolating recovery \u2013 which estimates $x_t$ using observations on the left and on the right of $t$. In particular, if the objective is to recover a signal on the interval $\\{-n, ..., n\\}$, one can apply interpolating recoveries which use the same observations $y_{-n}, ..., y_n$ to estimate $x_\\tau$ at any $\\tau \\in \\{-n, ..., n\\}$, by altering the relative position of the filter and the current point.\n\nBlockwise recovery Ideally, when using pointwise recovery, a specific filter is constructed for each time instant $t$. This may require a tremendous amount of computation, for instance, when recovering a high-resolution image. Alternatively, one may split the signal into blocks, and process the points of each block using the same filter (cf. e.g. Theorem 2.1). For instance, a one-dimensional signal can be divided into blocks of length, say, $2m+1$, and to recover $x \\in C(\\mathbb{Z}_{-m}^{m})$ in each block one may fit one filter of length $m+1$ recovering the right \u201chalf-block\u201d $x_0^m$ and another filter recovering the left \u201chalf-block\u201d $x_{-m}^{-1}$.\n\n7 Conclusion\n\nWe introduced a new family of estimators for structure-blind signal recovery that can be computed using convex optimization. The proposed estimators enjoy oracle inequalities for the $\\ell_2$-risk and for the pointwise risk. Extensive theoretical discussions and numerical experiments will be presented in the follow-up journal paper.\n\nAcknowledgments\n\nWe would like to thank Arnak Dalalyan and Gabriel Peyr\u00e9 for fruitful discussions. DO, AJ, ZH were supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025) and the project Titan (CNRS-Mastodons). 
ZH was also supported by the project Macaron (ANR-14-CE23-0003-01), the MSR-\nInria joint centre, and the program \u201cLearning in Machines and Brains\u201d (CIFAR). Research of AN\nwas supported by NSF grants CMMI-1262063, CCF-1523768.\n\nReferences\n\n[1] B. N. Bhaskar, G. Tang, and B. Recht. Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Signal Processing, 61(23):5987\u20135999, 2013.\n\n[2] D. L. Donoho. Statistical estimation and optimal recovery. Ann. Statist., 22(1):238\u2013270, 1994.\n\n[3] D. L. Donoho and M. G. Low. Renormalization exponents and optimal pointwise rates of convergence. Ann. Statist., 20(2):944\u2013970, 1992.\n\n[4] Z. Harchaoui, A. Juditsky, A. Nemirovski, and D. Ostrovsky. Adaptive recovery of signals by convex optimization. In Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, pages 929\u2013955, 2015.\n\n[5] S. Haykin. Adaptive filter theory. Prentice Hall, 1991.\n\n[6] I. Ibragimov and R. Khasminskii. Nonparametric estimation of the value of a linear functional in Gaussian white noise. Theor. Probab. & Appl., 29(1):1\u201332, 1984.\n\n[7] I. Ibragimov and R. Khasminskii. Estimation of linear functionals in Gaussian noise. Theor. Probab. & Appl., 32(1):30\u201339, 1988.\n\n[8] A. Juditsky and A. Nemirovski. Nonparametric denoising of signals with unknown local structure, I: Oracle inequalities. Appl. & Comput. Harmon. Anal., 27(2):157\u2013179, 2009.\n\n[9] A. Juditsky and A. Nemirovski. Nonparametric estimation by convex programming. Ann. Statist., 37(5a):2278\u20132300, 2009.\n\n[10] A. Juditsky and A. Nemirovski. Nonparametric denoising signals of unknown local structure, II: Nonparametric function recovery. Appl. & Comput. 
Harmon. Anal., 29(3):354\u2013367, 2010.\n\n[11] T. Kailath, A. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall, 2000.\n\n[12] O. Lepski and N. Serdyukova. Adaptive estimation under single-index constraint in a regression model. Ann. Statist., 42(1):1\u201328, 2014.\n\n[13] S. Mallat. A wavelet tour of signal processing. Academic Press, 1999.\n\n[14] Y. Nesterov and A. Nemirovski. On first-order algorithms for \u2113_1/nuclear norm minimization. Acta Num., 22:509\u2013575, 2013.\n\n[15] D. Ostrovsky, Z. Harchaoui, A. Juditsky, and A. Nemirovski. Structure-Blind Signal Recovery. arXiv:1607.05712v2, Oct. 2016.\n\n[16] A. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.\n\n[17] L. Wasserman. All of Nonparametric Statistics. Springer, 2006.\n", "award": [], "sourceid": 2446, "authors": [{"given_name": "Dmitry", "family_name": "Ostrovsky", "institution": "Univ. Grenoble Alpes"}, {"given_name": "Zaid", "family_name": "Harchaoui", "institution": "NYU"}, {"given_name": "Anatoli", "family_name": "Juditsky", "institution": "UJF"}, {"given_name": "Arkadi", "family_name": "Nemirovski", "institution": "Georgia Institute of Technology"}]}