{"title": "A convex program for bilinear inversion of sparse vectors", "book": "Advances in Neural Information Processing Systems", "page_first": 8548, "page_last": 8558, "abstract": "We consider the bilinear inverse problem of recovering two vectors, x in R^L and w in R^L, from their entrywise product. We consider the case where x and w have known signs and are sparse with respect to known dictionaries of size K and N, respectively. Here, K and N may be larger than, smaller than, or equal to L. We introduce L1-BranchHull, which is a convex program posed in the natural parameter space and does not require an approximate solution or initialization in order to be stated or solved. We study the case where x and w are S1- and S2-sparse with respect to a random dictionary, with the sparse vectors satisfying an effective sparsity condition, and present a recovery guarantee that depends on the number of measurements as L > Omega(S1+S2)(log(K+N))^2. Numerical experiments verify that the scaling constant in the theorem is not too large. One application of this problem is the sweep distortion removal task in dielectric imaging, where one of the signals is a nonnegative reflectivity, and the other signal lives in a known subspace, for example that given by dominant wavelet coefficients. We also introduce a variants of L1-BranchHull for the purposes of tolerating noise and outliers, and for the purpose of recovering piecewise constant signals. We provide an ADMM implementation of these variants and show they can extract piecewise constant behavior from real images.", "full_text": "A convex program for bilinear inversion of sparse\n\nvectors\n\nAlireza Aghasi\n\nGeorgia State Business School\n\nGSU, GA\n\naaghasi@gsu.edu\n\nDept. of Electrical Engineering\n\nAli Ahmed\n\nITU, Lahore\n\nali.ahmed@itu.edu.pk\n\nDept. of Mathematics and College of Computer and Information Science\n\nPaul Hand\n\nNortheastern University, MA\np.hand@northeastern.edu\n\nDept. 
of Computational and Applied Mathematics\n\nBabhru Joshi\n\nRice University, TX\n\nbabhru.joshi@rice.edu\n\nAbstract\n\nWe consider the bilinear inverse problem of recovering two vectors, x 2 RL and\nw 2 RL, from their entrywise product. We consider the case where x and w have\nknown signs and are sparse with respect to known dictionaries of size K and N,\nrespectively. Here, K and N may be larger than, smaller than, or equal to L. We\nintroduce `1-BranchHull, which is a convex program posed in the natural parameter\nspace and does not require an approximate solution or initialization in order to\nbe stated or solved. We study the case where x and w are S1- and S2-sparse\nwith respect to a random dictionary, with the sparse vectors satisfying an effective\nsparsity condition, and present a recovery guarantee that depends on the number of\nmeasurements as L \u2326(S1 + S2) log2(K + N ). Numerical experiments verify\nthat the scaling constant in the theorem is not too large. One application of this\nproblem is the sweep distortion removal task in dielectric imaging, where one of the\nsignals is a nonnegative re\ufb02ectivity, and the other signal lives in a known subspace,\nfor example that given by dominant wavelet coef\ufb01cients. We also introduce a\nvariants of `1-BranchHull for the purposes of tolerating noise and outliers, and\nfor the purpose of recovering piecewise constant signals. We provide an ADMM\nimplementation of these variants and show they can extract piecewise constant\nbehavior from real images.\n\n1 Introduction\n\nWe study the problem of recovering two unknown signals x and w in RL from observations y =\nA(w, x), where A is a bilinear operator. Let B 2 RL\u21e5K and C 2 RL\u21e5N such that w = Bh and\nx = Cm with khk0 \uf8ff S1 and kmk0 \uf8ff S2. Let the bilinear operator A : RL \u21e5 RL ! RL satisfy\n(1)\nwhere denotes entrywise product. 
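To make the measurement model (1) concrete, the following is an illustrative pure-Python sketch, not the authors' code: small Gaussian matrices stand in for the dictionaries B and C, and `entrywise_product_measurements` is a hypothetical helper name.

```python
import random

def entrywise_product_measurements(B, C, h, m):
    """Model (1): y_l = (b_l . h) * (c_l . m), the entrywise product of Bh and Cm.

    B (L x K) and C (L x N) are given as lists of rows; h, m are coefficient vectors.
    """
    w = [sum(bk * hk for bk, hk in zip(bl, h)) for bl in B]  # w = Bh
    x = [sum(cn * mn for cn, mn in zip(cl, m)) for cl in C]  # x = Cm
    return [wl * xl for wl, xl in zip(w, x)]                 # y = w ⊙ x

random.seed(0)
L, K, N = 8, 5, 6
B = [[random.gauss(0, 1) for _ in range(K)] for _ in range(L)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(L)]
h = [1.0, 0.0, -2.0, 0.0, 0.0]       # S1 = 2 nonzero entries
m = [0.0, 3.0, 0.0, 0.0, 1.0, 0.0]   # S2 = 2 nonzero entries

y = entrywise_product_measurements(B, C, h, m)

# The scaling ambiguity: (c*h, m/c) produces the same measurements for any c != 0.
c = 2.5
y_scaled = entrywise_product_measurements(
    B, C, [c * hk for hk in h], [mk / c for mk in m])
assert all(abs(a - b) < 1e-9 for a, b in zip(y, y_scaled))
```

The final assertion illustrates why some extra convention (here, the ℓ1-norm-balanced point) is needed to pin down a unique solution.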
y = A(w, x) = w ⊙ x.\n\nThe bilinear inverse problem (BIP) we consider is to find w and x from y, B, C, and sign(w), up to the inherent scaling ambiguity.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nBIPs, in general, have many applications in signal processing and machine learning and include fundamental practical problems like phase retrieval (Fienup [1982], Candès and Li [2012], Candès et al. [2013]), blind deconvolution (Ahmed et al. [2014], Stockham et al. [1975], Kundur and Hatzinakos [1996], Aghasi et al. [2016a]), non-negative matrix factorization (Hoyer [2004], Lee and Seung [2001]), self-calibration (Ling and Strohmer [2015]), blind source separation (O'Grady et al. [2005]), dictionary learning (Tosic and Frossard [2011]), etc. These problems are in general challenging and suffer from identifiability issues that make the solution set non-unique and non-convex. A common identifiability issue, also shared by the BIP in (1), is the scaling ambiguity: if (w♮, x♮) solves a BIP, then so does (cw♮, c⁻¹x♮) for any nonzero c ∈ R. In this paper, we resolve this scaling ambiguity by finding the point in the solution set closest to the origin with respect to the ℓ1 norm.\n\nFigure 1: Panel (a) shows the convex hull of the relevant branch of a hyperbola given a measurement y_ℓ and the sign information sign(w_ℓ). Panel (b) shows the interaction of the ℓ1-ball in the objective of (3) with the feasibility set. The feasibility set is "pointy" along a hyperbola, which allows for signal recovery where the ℓ1-ball touches it.
The gray hyperplane segments correspond\nto linearizations of the hyperbolic measurements, which is an important component of our recovery\nproof.\nAnother identi\ufb01ability issue of the BIP in (1) is if (w\\, x\\) solves (1), then so does (1, w\\ x\\),\nwhere 1 is the vector of ones. In prior works like Ahmed et al. [2014], which studies the blind\ndeconvolution problem and is a BIP in the Fourier Domain, the identi\ufb01ability issue is resolved by\nassuming the signals live in a known subspace. In comparison to Ahmed et al. [2014], we resolve the\nidenti\ufb01ability issue with a much weaker structural assumption of sparsity in known bases at the cost\nof known signs; justi\ufb01ed in actual applications, especially, in imaging. Natural choices for such bases\ninclude the standard basis, the Discrete Cosine Transform (DCT) basis, and a wavelet basis.\nRecent work on sparse rank-1 matrix recovery problem in Lee et al. [2017], which is motivated by con-\nsidering the lifted version of the sparse blind deconvolution problem, provides an exact recovery guar-\nantee of the sparse vectors h and m that satisfy a \"peakiness\" condition, i.e. min{khk1,kmk1} c\nfor some absolute constant c 2 R. This result holds with high probability for random measurements\nif the number of measurement, up to a log factor, satisfy L \u2326(S1 + S2). For general vectors\nwithout the peakiness condition, the same work shows exact recovery is possible if the number of\nmeasurements, up to a log factor, satisfy L \u2326(S1S2).\nThe main contribution of this paper is to introduce an algorithm for the sparse BIP described in (1)\nwhich recovers sparse vectors that satisfy a comparable effective sparsity condition. Precisely, we say\nthe sparse vectors h\\ and m\\ have comparable effective sparsity if there exist an \u21b5 2 R such that\n(2)\n\n.\n\nC \uf8ff \u21b5 \uf8ff C for some C 2 R+. 
Intuitively, condition (2) states that ‖h♮‖1/‖h♮‖2 = α ‖m♮‖1/‖m♮‖2 with α satisfying 1/C ≤ α ≤ C for some C ∈ R+: the two ratios are about the same if the sparsity levels of h♮ and m♮ are close and the magnitudes of the nonzero entries of h♮ and m♮ are about the same. Under this assumption on the sparse signals, we present a convex program stated in the natural parameter space which, in the noiseless setting with random B and C, exactly recovers the sparse vectors with at most S1 + S2 combined nonzero entries with high probability if the number of measurements satisfies L ≥ Ω(S1 + S2) log²(K + N).\n\n1.1 Convex program and main results\n\nWe introduce a convex program written in the natural parameter space for the bilinear inverse problem described in (1). Let (h♮, m♮) ∈ R^K × R^N with ‖h♮‖0 ≤ S1 and ‖m♮‖0 ≤ S2. Let w_ℓ = b_ℓᵀh♮, x_ℓ = c_ℓᵀm♮ and y_ℓ = b_ℓᵀh♮ · c_ℓᵀm♮, where b_ℓᵀ and c_ℓᵀ are the ℓth rows of B and C. Also, let s = sign(y) and t = sign(Bh♮). The convex program we consider to recover (h♮, m♮) is the ℓ1-BranchHull program\n\nℓ1-BH:  minimize_{h ∈ R^K, m ∈ R^N}  ‖h‖1 + ‖m‖1  subject to  s_ℓ (b_ℓᵀh)(c_ℓᵀm) ≥ |y_ℓ|,  t_ℓ b_ℓᵀh ≥ 0,  ℓ = 1, 2, . . . , L.  (3)\n\nThe motivation for the feasible set in program (3) follows from the observation that each measurement y_ℓ = w_ℓ · x_ℓ defines a hyperbola in R². As shown in Figure 1a, the sign information t_ℓ = sign(w_ℓ) restricts (w_ℓ, x_ℓ) to one branch of the hyperbola. The feasible set in (3) corresponds to the convex hull of the selected branch of the hyperbola for each y_ℓ; it is convex as the intersection of L convex sets.\n\nThe objective function in (3) is an ℓ1 minimization over (h, m) that finds a sparse point (ĥ, m̂) with ‖ĥ‖1 = ‖m̂‖1.
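The constraints of (3) are easy to check numerically. Below is an illustrative pure-Python sketch (the helper names `sign` and `is_feasible` are ours, not the paper's) verifying that the ground truth is feasible, that any positive rescaling (c h♮, m♮/c) stays feasible, and that the origin is not:

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def is_feasible(h, m, B, C, y, s, t, tol=1e-9):
    """Check the l1-BranchHull constraints of program (3):
    s_l (b_l.h)(c_l.m) >= |y_l| and t_l (b_l.h) >= 0 for every l."""
    for bl, cl, yl, sl, tl in zip(B, C, y, s, t):
        wh = sum(bi * hi for bi, hi in zip(bl, h))   # b_l . h
        xm = sum(ci * mi for ci, mi in zip(cl, m))   # c_l . m
        if sl * wh * xm < abs(yl) - tol or tl * wh < -tol:
            return False
    return True

random.seed(1)
L, K, N = 10, 4, 4
B = [[random.gauss(0, 1) for _ in range(K)] for _ in range(L)]
C = [[random.gauss(0, 1) for _ in range(N)] for _ in range(L)]
h = [2.0, 0.0, 0.0, -1.0]
m = [0.0, 1.0, 0.0, 3.0]
w = [sum(bi * hi for bi, hi in zip(bl, h)) for bl in B]
x = [sum(ci * mi for ci, mi in zip(cl, m)) for cl in C]
y = [wl * xl for wl, xl in zip(w, x)]
s = [sign(yl) for yl in y]
t = [sign(wl) for wl in w]

assert is_feasible(h, m, B, C, y, s, t)                       # truth is feasible
assert is_feasible([2 * hi for hi in h],
                   [mi / 2 for mi in m], B, C, y, s, t)       # so is (2h, m/2)
assert not is_feasible([0.0] * K, [0.0] * N, B, C, y, s, t)   # origin is not
```

Because the whole scaling ray is feasible, ℓ1 minimization over this "pointy" set selects the balanced point described next.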
Geometrically, this happens because the solution lies at the intersection of the ℓ1-ball and the hyperbolic constraint curve, as shown in Figures 1a and 1b. So, the minimizer of (3), under successful recovery, is (h♮ √(‖m♮‖1/‖h♮‖1), m♮ √(‖h♮‖1/‖m♮‖1)).\n\nOur main result is that under the structural assumptions that w and x live in random subspaces with (h♮, m♮) containing at most S1 + S2 nonzero entries and (h♮, m♮) satisfying the effective sparsity condition (2), the ℓ1-BranchHull program (3) recovers h♮ and m♮ (to within the scaling ambiguity) with high probability, provided the number of measurements, up to log factors, satisfies L ≥ Ω(S1 + S2) log²(K + N).\n\nTheorem 1. Suppose we observe the pointwise product of two vectors Bh♮ and Cm♮ through the bilinear measurement model in (1), where B and C are standard Gaussian random matrices. If (h♮, m♮) satisfies (2), then the ℓ1-BranchHull program (3) uniquely recovers (h♮ √(‖m♮‖1/‖h♮‖1), m♮ √(‖h♮‖1/‖m♮‖1)) whenever L ≥ C (√(S1 + S2) log(K + N) + t)² for any t ≥ 0, with probability at least 1 − e^(−2Lt²). Here C is an absolute constant.\n\n1.2 Prior art for bilinear inverse problems\n\nRecent approaches to solving bilinear inverse problems like blind deconvolution and phase retrieval have been to lift the problem into a low-rank matrix recovery task or to formulate an optimization program in the natural parameter space. Lifting transforms the problem of recovering h ∈ R^K and m ∈ R^N from bilinear measurements into the problem of recovering the low-rank matrix hmᵀ from linear measurements. The low-rank matrix can then be recovered using a semidefinite program. The result in Ahmed et al. [2014] for blind deconvolution showed that if h and m are representations of the target signals with respect to Fourier and Gaussian subspaces, respectively, then the lifting method successfully recovers the low rank matrix.
The recovery occurs with high probability under\nnear optimal sample complexity. Unfortunately, solving the semide\ufb01nite program is prohibitively\ncomputationally expensive because they operate in high-dimension space. Also, it is not clear how to\nenforce additional structure like sparsity of h and m in the lifted formulation in a way that allows\noptimal sample complexity (Li and Voroninski [2013], Oymak et al. [2015]).\nIn comparison to the lifting approach for blind deconvolution and phase retrieval, methods that\nformulate an algorithm in the natural parameter space like alternating minimization and gradient\ndescent based method are computationally ef\ufb01cient and also enjoy rigorous recovery guarantees\nunder optimal or near optimal sample complexity (Li et al. [2016], Cand\u00e8s et al. [2015], Netrapalli\net al. [2013], Sun et al. [2016]). In fact, the work in Lee et al. [2017] for sparse blind deconvolution\n\n3\n\n\fis based on alternating minimization. In the paper, the authors use an alternating minimization that\nsuccessively approximate the sparse vectors while enforcing the low rank property of the lifted matrix.\nHowever, because these methods are non-convex, convergence to the global optimal requires a good\ninitialization (Tu et al. [2015], Chen and Candes [2015], Li et al. [2016]).\nOther approaches that operate in the natural parameter space include PhaseMax (Bahmani and\nRomberg [2016], Goldstein and Studer [2016]) and BranchHull (Aghasi et al. [2016b]). PhaseMax\nis a linear program which has been proven to \ufb01nd the target signal in phase retrieval under optimal\nsample complexity if a good anchor vector is available. As with alternating minimization and\ngradient descent based approach, PhaseMax requires a good initialization. However, in PhaseMax\nthe initialization is part of the optimization program but in alternating minimization the initialization\nis part of the algorithmic implementation. 
BranchHull is a convex program which solves the BIP described in (1), excluding the sparsity assumption, under optimal sample complexity. Like the ℓ1-BranchHull presented in this paper, BranchHull does not require an initialization but requires the sign information of the signals.\n\nThe ℓ1-BranchHull program (3) combines strengths of both the lifting method and the gradient descent based methods. Specifically, the ℓ1-BranchHull program is a convex program that operates in the natural parameter space, without a need for an initialization and without restrictive assumptions on the class of recoverable signals. These strengths come at the cost of requiring the sign information of the target signals w and x. However, the sign assumption can be justified in imaging applications where the goal might be to recover pixel values of a target image, which are non-negative. Also, as in PhaseMax, the sign information can be thought of as an anchor vector which anchors the solution to one of the branches of the L hyperbolic measurements.\n\n1.3 Extension to noise and outliers\n\nExtending the theory of the ℓ1-BranchHull program (3) to the case with noise is important, as most real data contain significant noise. Formulation (3) may be particularly susceptible to noise that changes the sign of even a single measurement. For the bilinear inverse problem (1) with small dense noise and arbitrary outliers, we propose the following robust ℓ1-BranchHull program\n\nRBH:  minimize_{h ∈ R^K, m ∈ R^N, ξ ∈ R^L}  ‖h‖1 + ‖m‖1 + ‖ξ‖1  subject to  s_ℓ (c_ℓᵀm + ξ_ℓ) b_ℓᵀh ≥ |y_ℓ|,  t_ℓ b_ℓᵀh ≥ 0,  ℓ = 1, . . . , L.\n\nThe slack variable ξ controls the shape of the feasible set. For measurements y_ℓ with incorrect sign, the corresponding slack variable ξ_ℓ shifts the feasible set so that the target signal is feasible. In the outlier case, the ℓ1 penalty promotes sparsity of the slack variable ξ.
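To see how the slack variable acts on a single corrupted measurement, here is a scalar illustration (a hedged sketch with a hypothetical `min_slack` helper; it assumes w_ℓ ≠ 0 and computes the slack that makes the robust constraint hold with equality):

```python
def min_slack(w_l, x_l, y_l, s_l):
    """Smallest slack xi_l making the robust constraint (4) hold:
    s_l * (x_l + xi_l) * w_l >= |y_l|   (scalar case, assumes w_l != 0)."""
    if s_l * x_l * w_l >= abs(y_l):
        return 0.0                       # already feasible, no slack needed
    return s_l * abs(y_l) / w_l - x_l    # shift achieving equality

# Clean measurement: the truth is feasible with zero slack.
w_l, x_l = 2.0, 3.0
y_l = w_l * x_l
assert min_slack(w_l, x_l, y_l, 1.0) == 0.0

# An outlier flips the recorded sign s_l; a single nonzero xi_l
# shifts the feasible set so the target pair (w_l, x_l) is feasible again.
s_bad = -1.0
xi = min_slack(w_l, x_l, y_l, s_bad)
assert xi != 0.0
assert s_bad * (x_l + xi) * w_l >= abs(y_l) - 1e-9
```

Since only corrupted measurements need nonzero ξ_ℓ, penalizing ‖ξ‖1 keeps the slack sparse, matching the outlier model.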
We implement a slight variation of\nthe above program, detailed in Section 1.4, to remove distortions from real and synthetic images.\n\n(4)\n\n1.4 Total variation extension of `1-BranchHull\n\nThe robust `1-BranchHull program (4) is \ufb02exible and can be altered to remove distortions from an\notherwise piecewise constant signal. In the case where w = Bh\\ is a piecewise constant signal,\nx = Cm\\ is a distortion signal and y = w x is the distorted signal, the total variation version (5)\nof the robust BranchHull program (4), under successful recovery, produces the piecewise constant\nsignal Bh\\, up to a scaling.\n\nTV BH :\n\nminimize\n\nh2RK ,m2RN ,\u21e02RL\n\nTV (Bh) + kmk1 + k\u21e0k1\n\n(5)\nt`b>` h 0,` = 1, 2, . . . , L.\nIn (5), TV(\u00b7) is a total variation operator and is the `1 norm of the vector containing pairwise\ndifference of neighboring elements of the target signal Bh. We implement (5) to remove distortions\nfrom images in Section 3.2.\n\nsubject to s`(\u21e0` + c>` m)b>` h | y`|\n\n1.5 Notation\n\nVectors and matrices are written with boldface, while scalars and entries of vectors are written in\nplain font. For example, c` is the `the entry of the vector c. We write 1 as the vector of all ones with\n\n4\n\n\fdimensionality appropriate for the context. We write I N as the N \u21e5N identity matrix. For any x 2 R,\nlet (x) 2 Z such that x 1 < (x) \uf8ff x. For any matrix A, let kAkF be the Frobenius norm of A.\nFor any vector x, let kxk0 be the number of non-zero entries in x. For x 2 RK and y 2 RN, (x, y)\nis the corresponding vector in RK \u21e5 RN, and h(x1, y1), (x2, y2)i = hx1, x2i + hy1, y2i. For a set\nA\u21e2 Rm, and a vector a 2 Rm, we de\ufb01ne by a A , a set obtained by incrementing every element\nof A by a.\n2 Algorithm\n\nIn this section, we present an Alternating Direction Method of Multipliers (ADMM) implementation\nof an extension of the robust `1-BranchHull program (4). 
The ADMM implementation of the `1-\nBranchHull program (3) is similar to the ADMM implementation of (6) and we leave it to the readers.\nThe extension of the robust `1-BranchHull program we consider is\n\nminimize\n\nh2RK ,m2RN ,\u21e02RL kP hk1 + kmk1 + k\u21e0k1\n\nsubject to s`(\u21e0` + c>` m)b>` h | y`|\n\n(6)\n\nt`b>` h 0,` = 1, 2, . . . , L,\n\nwhere P 2 RJ\u21e5K for some J 2 Z. The above extension reduces to the robust `1-BranchHull\nprogram if P = I K. Recalling that w = Bh and x = Cm, we make use of the following notations\n\nu = x\n\n\u21e0! , v = m\n\nw\n\nh\n\n\u21e0! , E =0@\n\nC 0\n0 B\n0\n0\n\n0\n0\n\n1I L\n\n1A and Q = I N\n\n0\n0 P\n0\n0\n\n0\n0\n\nI L! .\n\nUsing this notation, our convex program can be compactly written as\n\nminimize\n\nv2RN +K+L,u2R3L kQvk1 subject to u = Ev, u 2C .\n\nHere C =(x, w, \u21e0) 2 R3L| s`(\u21e0` + x`)w` | y`|, t`w` 0,` = 1, . . . , L is the convex feasible\n\nset of (6). Introducing a new variable z the resulting convex program can be written as\n\nminimize\n\nv,u,z\n\nkzk1 subject to u = Ev, Qv = z, u 2C .\n\nWe may now form the scaled ADMM steps as follows\n\nuk+1 = arg min\n\nu\n\nzk+1 = arg min\n\nz\n\nvk+1 = arg min\n\nv\n\n\u21e2\n2 ku + \u21b5k Evkk2\nIC(u) +\n\u21e2\n2 kz + k Qvkk2\nkzk1 +\n\u21e2\n\u21e2\n2 kk + zk+1 Qvk2 ,\n2 k\u21b5k + uk+1 Evk2 +\n\n(7)\n\n(8)\n\n(9)\n\n\u21b5k+1 = \u21b5k + uk+1 Evk+1,\nk+1 = k + vk+1 Qvk+1.\n\nwhere IC(\u00b7) in (7) is the indicator function on C such that IC(u) = 0 if u 2C and in\ufb01nity otherwise.\nWe would like to note that the \ufb01rst three steps of the proposed ADMM scheme can be presented in\nclosed form. The update in (7) is the following projection\n\nwhere proj\nC(v) is the projection of v onto C. Details of computing the projection onto C are presented\nin the Supplementary material. 
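The ℓ1 proximal step of the scheme above, the z-update (8), reduces to entrywise soft-thresholding. A minimal pure-Python sketch (hypothetical function name; lists stand in for vectors):

```python
def soft_threshold(z, c):
    """Entrywise soft-thresholding S_c, the closed form of the z-update (8):
    (S_c(z))_i = z_i - c if z_i > c, 0 if |z_i| <= c, z_i + c if z_i < -c."""
    return [zi - c if zi > c else zi + c if zi < -c else 0.0 for zi in z]

z = [3.0, -0.2, 0.5, -4.0, 0.0]
out = soft_threshold(z, 0.5)
assert out == [2.5, 0.0, 0.0, -3.5, 0.0]

# S_c(z_i) is the minimizer of c*|x| + 0.5*(x - z_i)^2; spot-check optimality:
for zi, xi in zip(z, out):
    f = lambda x: 0.5 * abs(x) + 0.5 * (x - zi) ** 2   # here c = 0.5
    assert all(f(xi) <= f(xi + d) + 1e-12 for d in (-0.1, 0.1, -1.0, 1.0))
```

With step size ρ, the update applies this operator with threshold 1/ρ to Qv^k minus the scaled dual variable.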
The update in (7) is the projection u^(k+1) = proj_C(Ev^k − α^k). The update in (8) can be written in terms of the soft-thresholding operator: z^(k+1) = S_(1/ρ)(Qv^k − β^k), where (S_c(z))_i = z_i − c if z_i > c, 0 if |z_i| ≤ c, and z_i + c if z_i < −c, with c > 0 and (S_c(z))_i the ith entry of S_c(z). Finally, the update in (9) takes the closed form v^(k+1) = (EᵀE + QᵀQ)⁻¹ (Eᵀ(α^k + u^(k+1)) + Qᵀ(β^k + z^(k+1))). In our implementation of the ADMM scheme, we initialize the algorithm with v^0 = 0, α^0 = 0, β^0 = 0.\n\n3 Numerical Experiments\n\nIn this section, we provide numerical experiments on synthetic and real data where the signals follow the multiplicative model (1), which is compatible with the physics of lighting (Hold [1986]). This is in contrast to well-known methods for image de-illumination like He et al. [2011], where the external light has an additive contribution to the image. Other methods like Chen et al. [2006] handle additive models by working with the images in the log domain, while we work directly with the multiplicative model in a robust-to-noise way. The experiment on real data presented in this section shows that the total variation ℓ1-BranchHull program can be used to remove distortions from an image. The synthetic experiment numerically verifies Theorem 1 with a low scaling constant.\n\n3.1 Phase Portrait\n\nFigure 2: The empirical recovery probability from synthetic data with sparsity level S as a function of the total number of measurements L. Each block corresponds to the average of 10 independent trials. White blocks correspond to successful recovery and black blocks correspond to unsuccessful recovery. The area to the right of the line satisfies L > 0.25(S1 + S2) log²(N + K).\n\nWe first show a phase portrait that verifies Theorem 1. Consider the following measurements: fix N ∈ {20, 40, . . .
, 300}, L 2{ 4, 8, . . . , 140} and let K = N. Let the target signal (h\\, m\\) 2\nRK \u21e5 RN be such that both h\\ and m\\ have 0.05N non-zero entries with the nonzero indices\nrandomly selected and set to \u00b11. Let S1 and S2 be the number of nonzero entries in h\\ and m\\,\nrespectively. Let B 2 RL\u21e5K and C 2 RL\u21e5N such that Bij \u21e0 1pLN (0, 1) and Cij \u21e0 1pLN (0, 1).\nLastly, let y = Bh\\ Cm\\ and t = sign(Bh\\).\nFigure 2 shows the fraction of successful recoveries from 10 independent trials using (3) for the\nbilinear inverse problem (1) from data as described above. Let (\u02c6h, \u02c6m) be the output of (3) and let\n(\u02dch, \u02dcm) be the candidate minimizer. We solve (3) using an ADMM implementation similar to the\nADMM implementation detailed in Section 2 with the step size parameter \u21e2 = 1. For each trial,\nwe say (3) successfully recovers the target signal if k(\u02c6h, \u02c6m) (\u02dch, \u02dcm)k2 < 1010. Black squares\ncorrespond to no successful recovery and white squares correspond to 100% successful recovery.\nThe line corresponds to L = C(S1 + S2) log2(K + N ) with C = 0.25 and indicates that the sample\ncomplexity constant in Theorem 1 is not very large.\n\n3.2 Distortion removal from images\nWe use the total variation BranchHull program (5) to remove distortions from real images \u02dcy 2 Rp\u21e5q.\nIn the experiments, The observation y 2 RL is the column-wise vectorization of the image \u02dcy, the\ntarget signal w = Bh is the vectorization of the piecewise constant image and x = Cm corresponds\nto the distortions in the image. We use (5) to recover piecewise constant target images like in the\nDh in block form. 
Here,\nforeground of Figure 3a with TV(Bh) = kDBhk1, where D =\uf8ff Dv\nDv 2 R(Lq)\u21e5L and Dh 2 R(Lp)\u21e5L with\np1\u2318\nif j = i +\u21e3 i1\n, (Dh)ij =( 1\nif j = i + 1 +\u21e3 i1\np1\u2318\n\nif j = i\nif j = i + p\notherwise\n\notherwise\n\n(Dv)ij =8>><>>:\n\n1\n1\n0\n\n1\n0\n\n.\n\n6\n\n\fLastly, we solve (5) using the ADMM algorithm detailed in Section 2 with P = DB.\n\n(a) Distorted image\n\n(b) Recovered image\n\n(c) Distorted image\n\n(d) Recovered image\n\nFigure 3: Panel (a) shows an image of a mousepad with distortions and panel(b) is the piecewise\nconstant image recovered using total variation `1-BranchHull. Similarly, panel (d) shows an image\ncontaining rice grains and panel (e) is the recovered image.\n\nWe now show two experiments on real images. The \ufb01rst image, shown in Figure 3a, was captured\nusing a camera and resized to a 115 \u21e5 115 image. The measurement y 2 RL is the vectorization of\nthe image with L = 13225. Let B be the L \u21e5 L identity matrix. Let F be the L \u21e5 L inverse DCT\nmatrix. Let C 2 RL\u21e5300 with the \ufb01rst column set to 1 and remaining columns randomly selected\nfrom columns of F without replacement. The matrix C is scaled so that kCkF = kBkF = pL.\nThe vector of known sign t is set to 1. Let (\u02c6h, \u02c6m, \u02c6\u21e0) be the output of (5) with = 103 and \u21e2 = 104.\nFigure 3b corresponds to B \u02c6h and shows that the object in the center was successfully recovered.\nThe second real image, shown in Figure 3c, is an image of rice grains. The size of the image is\n128 \u21e5 128. The measurement y 2 RL is the vectorization of the image with L = 16384. Let B be\nthe L \u21e5 L identity matrix. Let C 2 RL\u21e550 with the \ufb01rst column set to 1. The remaining columns\nof C are sampled from Bessel function of the \ufb01rst kind J\u232b() with each column corresponding to\na \ufb01xed 2 R. Speci\ufb01cally, \ufb01x g 2 RL with gi = 9 + 14 i1\nL1. 
For each remaining column c\n+5|\u21e32|(0.1 + 10|\u21e33|). The matrix C is scaled so that\nof C, \ufb01x \u21e3 \u21e0N (0, I3) and let ci = J\nkCkF = kBkF = pL. The vector of known sign t is set to 1. Let (\u02c6h, \u02c6m, \u02c6\u21e0) be the output of (5)\nwith = 103 and \u21e2 = 107. Figure 3d corresponds to B \u02c6h.\n\n6+0.1|\u21e31|\n\ngi\n\n4 Proof Outline\n\n` m\\ and y` = b|\n\nIn this section, we provide a proof of Theorem 1 by considering a related linear program with larger\nfeasible set. Let (h\\, m\\) 2 RK \u21e5 RN with kh\\k0 \uf8ff S1 and km\\k0 \uf8ff S2. Let w` = b|\n` h\\,\nx` = c|\n` m\\. Also, let s = sign(y) and t = sign(Bh\\). We will shows that\nthe (3) recovers (\u02dch, \u02dcm) such that (\u02dch, \u02dcm) =\u2713h\\qkm\\k1\n\nkm\\k1\u25c6.\n, m\\q kh\\k1\n\nConsider program (10) which has a linear constraint set that contains the feasible set of the `1-\nBrachHull program (3).\n\n` h\\ \u00b7 c|\n\nkh\\k1\n\nminimize\n\nh2RK ,m2RN khk1 + kmk1subject to s`(b|\n\n` hc|\n\n` \u02dcm + b|\n` = 1, 2, . . . , L,\n\n`\n\n\u02dchc|\n\n` m) 2|y`|\n\n(10)\n\nLP :\n\nLet\n\nS :=n(h, m) 2 RK \u21e5 RN | (h, m) = \u21b5(\u02dch, \u02dcm), and \u21b5 2 [1, 1]o .\n\n(11)\n\nObserve that if (\u02dch, \u02dcm) is a minimizer of (10) then so are all the points in the set (\u02dch, \u02dcm) S .\nLemma 1. If the optimization program (10) recovers (h, m) 2 (\u02dch, \u02dcm) S , then the BranchHull\nprogram (3) recovers (\u02dch, \u02dcm).\n\n7\n\n\fA proof of Lemma 1, provided in Supplementary material, follows from the observations that the\nfeasible set of (10) contains the feasible set of (3) and (\u02dch, \u02dcm) is the only feasible point in (3) among\nall (h, m) 2 (\u02dch, \u02dcm) S .\nWe now show that the solution of (10) lies in the set (\u02dch, \u02dcm) S . Let a|\n` ) 2\nRK+N denote the `th row of a matrix A. The linear constraint in (10) are now simply sA(h, m) \n2|y|. 
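As a quick sanity check of the balanced point (h̃, m̃) used throughout this section, the following sketch (hypothetical helper, pure Python) confirms that the rescaling equalizes the two ℓ1 norms while staying on the scaling ray of the truth:

```python
import math

def balanced_rescaling(h, m):
    """The scale at which l1-BranchHull pins down the solution:
    (h * sqrt(||m||_1 / ||h||_1),  m * sqrt(||h||_1 / ||m||_1))."""
    nh = sum(abs(v) for v in h)
    nm = sum(abs(v) for v in m)
    a = math.sqrt(nm / nh)
    return [a * v for v in h], [v / a for v in m]

h = [3.0, 0.0, -1.0]          # ||h||_1 = 4
m = [0.0, 2.0, 0.0, 6.0]      # ||m||_1 = 8
ht, mt = balanced_rescaling(h, m)

# The rescaled pair has equal l1 norms ...
assert abs(sum(abs(v) for v in ht) - sum(abs(v) for v in mt)) < 1e-9
# ... and equals (a*h, m/a), i.e., the same signals up to the scaling ambiguity.
a = math.sqrt(8 / 4)
assert all(abs(a * hi - hti) < 1e-9 for hi, hti in zip(h, ht))
```

Any point on the ray {α(h̃, m̃)} generates the same measurements, which is exactly the set S defined in (11).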
Note that S\u21e2N := span(\u02dch, \u02dcm) \u2713 Null(A).\nOur strategy will be to show that for any feasible perturbation (h, m) 2N ? the objective of the\nlinear program (10) strictly increases, where N? is the orthogonal complement of the subspace N .\nThis will be equivalent to showing that the solution of (10) lies in the set (\u02dch, \u02dcm) S .\nThe subgradient of the `1-norm at the proposed solution (\u02dch, \u02dcm) is\n@k(\u02dch, \u02dcm)k1 := {g 2 RK+N : kgk1 \uf8ff 1 and gh = sign(h\\\n\n) , gm = sign(m\\\n\n` = (c|\n\n` \u02dcmb|\n\n` , b|\n\n\u02dchc|\n\n)},\n\nm\n\nh\n\n`\n\nwhere h, and m denote the support of non-zeros in h\\, and m\\, respectively. To show the linear\nprogram converges to a solution (\u02c6h, \u02c6m) 2 (\u02dch, \u02dcm) S , it suf\ufb01ces to show that the set of following\ndescent directions\n\nn(h, m) 2N ? :\u2326g, (h, m)\u21b5 \uf8ff 0, 8g 2 @k(\u02dch, \u02dcm)k1o\n\u2713(h, m) 2N ? : hgh, hhi + hgm, mmi + k(hc\n\u2713(h, m) 2N ? : kgh[mk2k(hh, mm)k2 + k(hc\n=n(h, m) 2N ? : k(hc\n\nm)k1 \uf8ff 0 \nm)k1 \uf8ff 0 \nm)k1 \uf8ffpS1 + S2k(hh, mm)k2o =: D\n\n, mc\n\n, mc\n\n, mc\n\nh\n\nh\n\nh\n\n(12)\n\ndoes not contain any vector (h, m) that is consistent with the constraints. We do this by quantifying\nthe \u201cwidth\" of the set D through a Rademacher complexity, and a probability that the gradients of the\nconstraint functions lie in a certain half space. This allows us to use small ball method developed in\nKoltchinskii and Mendelson [2015], Mendelson [2014] to ultimately show that it is highly unlikely to\nhave descent directions in D that meet the constraints in (10). We now concretely state the de\ufb01nitions\nof the Rademacher complexity, and probability term mentioned above.\nDe\ufb01ne linear functions\n\nf`(h, m) :=D(b|\n\n` \u02dcmb`), (h, m)E ,` = 1, 2, 3, . . . 
, L.\n(h, m) at (\u02dch, \u02dcm) are then simply rf` = ( @f`(\u02dch, \u02dcm)\n\nThe linear constraints in the LP (10) are de\ufb01ned these linear functions as s`f`(h, m)2|y`|.\nThe gradients of f` w.r.t.\n@m ) =\n(s`c|\n\n, @f`(\u02dch, \u02dcm)\n\n\u02dchc`, c|\n\n` \u02dcmb`, s`b|\n\n@h\n\n`\n\n`\n\n\u02dchc`). De\ufb01ne the Rademacher complexity of a set D\u21e2 RM as\nk(h,m)k2E ,\n\n\"`Drf`,\n\nC(D) := E sup\n\n(h,m)2D\n\n1pL\n\n(h,m)\n\nLX`=1\n\nwhere \"1,\" 2, . . . ,\" L are iid Rademacher random variables that are independent of everything else.\nFor a set D, the quantity C(D) is a measure of width of D around the origin in terms of the gradients\nof the constraint functions. For example, an equally distributed random set of gradient functions\nmight lead to a smaller value of C(D).\nOur results also depend on a probability p\u2327 (D), and a positive parameter \u2327 introduced below\n\n(13)\n\n(14)\n\np\u2327 (D) = inf\n\n(h,m)2D\n\nP\u21e3Drf`,\n\n(h,m)\n\nk(h,m)k2E \u2327\u2318 .\n\nIntuitively, p\u2327 (D) quanti\ufb01es the size of D through the gradient vectors. For a small enough \ufb01xed\nparameter, a small value of p\u2327 (D) means that the D is mainly invisible to to the gradient vectors.\nLemma 2. Let D be the set of descent directions, already characterized in (12), for which C(D), and\np\u2327 (D) can be determined using (13), and (14). Choose L \u21e3 2C(D)+t\u2327\n\u2327 p\u2327 (D) \u23182\nfor any t > 0. Then the\nsolution (\u02c6h, \u02c6m) of the LP in (10) lies in the set (\u02dch, \u02dcm) S with probability at least 1 e2Lt2.\n\n8\n\n\fProof of this lemma is based on small ball method developed in Koltchinskii and Mendelson [2015],\nMendelson [2014] and further studied in Lecu\u00e9 et al. [2018], Lecu\u00e9 and Mendelson [2017]. The\nproof is mainly repeated using the argument in Bahmani and Romberg [2017], and is provided in\nthe supplementary material for completeness. We now state the main theorem for linear program\n(10). 
The theorems states that if the sparse signals satisfy the effective sparsity condition (2) and\nL Ct(S1 +S2) log2(K +N ), then the minimizer of the linear program (10) is in the set (\u02dch, \u02dcm)S\nwith high probability.\nTheorem 2 (Exact recovery). Suppose we observe pointwise product of two vectors Bh\\, and Cm\\\nthrough a bilinear measurement model in (1), where B, and C are standard Gaussian random\nmatrices. If (h\\, m\\) satisfy (2), then the linear program (10) recovers (\u02c6h, \u02c6m) 2 (\u02dch, \u02dcm) S with\nprobability at least 1 e2Lt2 whenever L CpS1 + S2 log(K + N ) + t2, where C is an\nabsolute constant.\nIn light of Lemma 2, the proof of Theorem 2 reduces to computing the Rademacher complexity C(D)\nde\ufb01ned in (13), and the tail probability estimate p\u2327 (D) de\ufb01ned in (14) of the set of descent directions\nD de\ufb01ned in (12). The Rademacher complexity is bounded from above by\n2(S1 + S2) log2(K + N ).\nand for \u2327 = min{k\u02dchk2,k \u02dcmk2}, the tail probability is bounded by p\u2327 (D) 1\n8c4 , where both C and\nc are constants. These bounds are shown in the Supplementary material. The proof of Theorem 1\nfollows by applying Lemma 1 to Theorem 2.\n\nC(D) \uf8ff Cqk \u02dcmk2\n\n2 + k\u02dchk2\n\nAcknowledgements\n\nAli Ahmed would like to acknowledge the partial support through the grant for the National center of\ncyber security (NCCS) from HEC, Pakistan. Paul Hand would like to acknowledge funding by the\ngrant NSF DMS-1464525.\n\nReferences\nJames R Fienup. Phase retrieval algorithms: a comparison. Applied optics, 21(15):2758\u20132769, 1982.\n\nE. Cand\u00e8s and X. Li. Solving quadratic equations via phaselift when there are about as many\n\nequations as unknowns. Found. Comput. Math., pages 1\u201310, 2012.\n\nE. Cand\u00e8s, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery from magnitude\n\nmeasurements via convex programming. Commun. Pure Appl. 
Math., 66(8):1241–1274, 2013.

Ali Ahmed, Benjamin Recht, and Justin Romberg. Blind deconvolution using convex programming. IEEE Trans. Inform. Theory, 60(3):1711–1732, 2014.

Thomas G. Stockham, Thomas M. Cannon, and Robert B. Ingebretsen. Blind deconvolution through digital signal processing. Proceedings of the IEEE, 63(4):678–692, 1975.

Deepa Kundur and Dimitrios Hatzinakos. Blind image deconvolution. IEEE Signal Processing Magazine, 13(3):43–64, 1996.

Alireza Aghasi, Barmak Heshmat, Albert Redo-Sanchez, Justin Romberg, and Ramesh Raskar. Sweep distortion removal from terahertz images via blind demodulation. Optica, 3(7):754–762, 2016a.

Patrik O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5(Nov):1457–1469, 2004.

Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.

Shuyang Ling and Thomas Strohmer. Self-calibration and biconvex compressive sensing. Inverse Problems, 31(11):115002, 2015.

Paul D. O'Grady, Barak A. Pearlmutter, and Scott T. Rickard. Survey of sparse and non-sparse methods in source separation. International Journal of Imaging Systems and Technology, 15(1):18–33, 2005.

Ivana Tosic and Pascal Frossard. Dictionary learning. IEEE Signal Processing Magazine, 28(2):27–38, 2011.

Kiryung Lee, Yihong Wu, and Yoram Bresler. Near optimal compressed sensing of a class of sparse low-rank matrices via sparse power factorization. arXiv preprint arXiv:1702.04342, 2017.

Xiaodong Li and Vladislav Voroninski. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 45(5):3019–3033, 2013.

Samet Oymak, Amin Jalali, Maryam Fazel, Yonina C. Eldar, and Babak Hassibi.
Simultaneously structured models with application to sparse and low-rank matrices. IEEE Trans. Inform. Theory, 61(5):2886–2908, 2015.

Xiaodong Li, Shuyang Ling, Thomas Strohmer, and Ke Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. arXiv preprint arXiv:1606.04933, 2016.

Emmanuel Candès, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory, 61(4):1985–2007, 2015.

Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.

Ju Sun, Qing Qu, and John Wright. A geometric analysis of phase retrieval. In Information Theory (ISIT), 2016 IEEE International Symposium on, pages 2379–2383. IEEE, 2016.

Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Benjamin Recht. Low-rank solutions of linear matrix equations via Procrustes flow. arXiv preprint arXiv:1507.03566, 2015.

Yuxin Chen and Emmanuel Candès. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems, pages 739–747, 2015.

Sohail Bahmani and Justin Romberg. Phase retrieval meets statistical learning theory: A flexible convex relaxation. arXiv preprint arXiv:1610.04210, 2016.

Tom Goldstein and Christoph Studer. PhaseMax: Convex phase retrieval via basis pursuit. arXiv preprint arXiv:1610.07531, 2016.

Alireza Aghasi, Ali Ahmed, and Paul Hand. BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs. arXiv preprint arXiv:1312.0525v2, 2016b.

Berthold K. P. Horn. Robot Vision. The MIT Press, 1986.

K. He, J. Sun, and X. Tang. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2341–2353, Dec 2011.
ISSN 0162-8828. doi: 10.1109/TPAMI.2010.168.

T. Chen, Wotao Yin, Xiang Sean Zhou, D. Comaniciu, and T. S. Huang. Total variation models for variable lighting face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1519–1524, Sept 2006. ISSN 0162-8828. doi: 10.1109/TPAMI.2006.195.

Vladimir Koltchinskii and Shahar Mendelson. Bounding the smallest singular value of a random matrix without concentration. Int. Math. Research Notices, 2015(23):12991–13008, 2015.

Shahar Mendelson. Learning without concentration. In Conference on Learning Theory, pages 25–39, 2014.

Guillaume Lecué and Shahar Mendelson. Regularization and the small-ball method I: sparse recovery. The Annals of Statistics, 46(2):611–641, 2018.

Guillaume Lecué and Shahar Mendelson. Regularization and the small-ball method II: complexity dependent error rates. The Journal of Machine Learning Research, 18(1):5356–5403, 2017.

Sohail Bahmani and Justin Romberg. Anchored regression: Solving random convex equations via convex programming. arXiv preprint arXiv:1702.05327, 2017.

Colin McDiarmid. On the method of bounded differences. Surveys in Combinatorics, 141(1):148–188, 1989.

Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes with applications to statistics. Journal of the Royal Statistical Society, Series A, 160(3):596–608, 1997.

Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, 2013.

Michael G. Akritas, S. Lahiri, and Dimitris N. Politis. Topics in Nonparametric Statistics. Springer, 2016.

Sara van de Geer and Johannes Lederer.
The Bernstein–Orlicz norm and deviation inequalities. Probability Theory and Related Fields, 157(1-2):225–250, 2013.