{"title": "Reshaped Wirtinger Flow for Solving Quadratic System of Equations", "book": "Advances in Neural Information Processing Systems", "page_first": 2622, "page_last": 2630, "abstract": "We study the problem of recovering a vector $\\bx\\in \\bbR^n$ from its magnitude measurements $y_i=|\\langle \\ba_i, \\bx\\rangle|, i=1,..., m$. Our work is along the line of the Wirtinger flow (WF) approach \\citet{candes2015phase}, which solves the problem by minimizing a nonconvex loss function via a gradient algorithm and can be shown to converge to a global optimal point under good initialization. In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF). We show that for random Gaussian measurements, reshaped-WF enjoys geometric convergence to a global optimal point as long as the number $m$ of measurements is at the order of $\\cO(n)$, where $n$ is the dimension of the unknown $\\bx$. This improves the sample complexity of WF, and achieves the same sample complexity as truncated-WF \\citet{chen2015solving} but without truncation at gradient step. Furthermore, reshaped-WF costs less computationally than WF, and runs faster numerically than both WF and truncated-WF. Bypassing higher-order variables in the loss function and truncations in the gradient loop, analysis of reshaped-WF is simplified.", "full_text": "Reshaped Wirtinger Flow for\n\nSolving Quadratic System of Equations\n\nHuishuai Zhang\n\nDepartment of EECS\nSyracuse University\nSyracuse, NY 13244\nhzhan23@syr.edu\n\nYingbin Liang\n\nDepartment of EECS\nSyracuse University\nSyracuse, NY 13244\nyliang06@syr.edu\n\nAbstract\n\nWe study the problem of recovering a vector x \u2208 Rn from its magnitude measure-\nments yi = |(cid:104)ai, x(cid:105)|, i = 1, ..., m. Our work is along the line of the Wirtinger \ufb02ow\n(WF) approach Cand\u00e8s et al. 
[2015], which solves the problem by minimizing a nonconvex loss function via a gradient algorithm and can be shown to converge to a global optimal point under good initialization. In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF). We show that for random Gaussian measurements, reshaped-WF enjoys geometric convergence to a global optimal point as long as the number m of measurements is of the order O(n), where n is the dimension of the unknown x. This improves the sample complexity of WF, and achieves the same sample complexity as truncated-WF Chen and Candes [2015] but without truncation at the gradient step. Furthermore, reshaped-WF costs less computationally than WF, and runs faster numerically than both WF and truncated-WF. Bypassing higher-order variables in the loss function and truncations in the gradient loop, the analysis of reshaped-WF is simplified.\n\n1 Introduction\n\nRecovering a signal via a quadratic system of equations has gained intensive attention recently. More specifically, suppose a signal of interest x ∈ R^n/C^n is measured via random design vectors a_i ∈ R^n/C^n, with the measurements y_i given by\n\ny_i = |⟨a_i, x⟩|, for i = 1, · · · , m,    (1)\n\nwhich can also be written equivalently in quadratic form as y′_i = |⟨a_i, x⟩|². The goal is to recover the signal x based on the measurements y = {y_i}_{i=1}^m and the design vectors {a_i}_{i=1}^m. Such a problem arises naturally in phase retrieval applications, in which the sign/phase of the signal is to be recovered from measurements of magnitudes only. Various algorithms have been proposed to solve this problem since the 1970s. The error-reduction methods proposed in Gerchberg [1972], Fienup [1982] work well empirically but lack theoretical guarantees. 
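For concreteness, the measurement model (1) can be simulated as follows; this is an illustrative Python sketch of ours (the experiments in Section 3 are implemented in Matlab), and all variable names are our own.

```python
import numpy as np

# Gaussian measurement model (1): y_i = |<a_i, x>|.
rng = np.random.default_rng(0)
n, m = 100, 600                   # signal dimension and number of measurements
x = rng.standard_normal(n)        # ground-truth signal x in R^n
A = rng.standard_normal((m, n))   # row i of A is the design vector a_i
y = np.abs(A @ x)                 # magnitude measurements

# The sign of x is unidentifiable: x and -x produce identical measurements,
# which is why recovery error is measured up to a global sign.
```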
More recently, convex relaxations of the problem have been formulated, for example via phase lifting Chai et al. [2011], Candès et al. [2013], Gross et al. [2015] and via phase cut Waldspurger et al. [2015], and the correspondingly developed algorithms typically come with performance guarantees. The reader can refer to the review paper Shechtman et al. [2015] to learn more about applications and algorithms of the phase retrieval problem.\nWhile enjoying good theoretical guarantees, these convex methods often suffer from high computational complexity, particularly when the signal dimension is large. On the other hand, more efficient nonconvex approaches have been proposed and shown to recover the true signal as long as the initialization is good enough. Netrapalli et al. [2013] proposed the AltMinPhase algorithm, which alternately updates the phase and the signal, with each signal update solving a least-squares problem, and showed that AltMinPhase converges linearly and recovers the true signal with O(n log³ n) Gaussian measurements.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fMore recently, Candès et al. [2015] introduced the Wirtinger flow (WF) algorithm, which guarantees signal recovery via a simple gradient algorithm with only O(n log n) Gaussian measurements and attains ε-accuracy within O(mn² log(1/ε)) flops. 
More specifically, WF obtains a good initialization by the spectral method, and then minimizes the following nonconvex loss function via the gradient descent scheme:\n\nℓ_WF(z) := (1/4m) Σ_{i=1}^m (|a_i^T z|² − y_i²)².    (2)\n\nWF was further improved by the truncated Wirtinger flow (truncated-WF) algorithm proposed in Chen and Candes [2015], which adopts a Poisson loss function of |a_i^T z|², and keeps only well-behaved measurements based on carefully designed truncation thresholds for calculating the initial seed and at every gradient step. Such truncation helps to yield linear convergence with a certain fixed step size, and reduces both the sample complexity to O(n) and the convergence time to O(mn log(1/ε)).\nIt can be observed that WF uses the quadratic loss of |a_i^T z|², so that the optimization objective is a smooth function of a_i^T z and the gradient step becomes simple. But this comes at the cost of a quartic loss function. In this paper, we adopt the quadratic loss of |a_i^T z| instead. Although the resulting loss function is not smooth everywhere, it reduces the order of a_i^T z to two, and its general curvature can be more amenable to convergence of the gradient algorithm. The goal of this paper is to explore the potential advantages of such a nonsmooth lower-order loss function.\n\n1.1 Our Contribution\n\nThis paper adopts the following loss function¹:\n\nℓ(z) := (1/2m) Σ_{i=1}^m (|a_i^T z| − y_i)².    (3)\n\nCompared to the loss function (2) in WF, which involves |a_i^T z|², the loss function above involves the absolute value/magnitude |a_i^T z| and hence has lower-order variables. For such a nonconvex and nonsmooth loss function, we develop a gradient descent-like algorithm, which sets the “gradient” component corresponding to nonsmooth samples to zero. 
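The loss (3) and the gradient-like update just described can be rendered in a few lines of Python; the sketch below is ours (not the authors' code, which is in Matlab), the function names are hypothetical, and numpy's convention sign(0) = 0 conveniently implements the zeroing of nonsmooth components.

```python
import numpy as np

def reshaped_loss(z, A, y):
    # Loss (3): (1/2m) * sum_i (|a_i^T z| - y_i)^2, with rows of A as the a_i.
    return 0.5 * np.mean((np.abs(A @ z) - y) ** 2)

def reshaped_grad(z, A, y):
    # Update direction: (1/m) * sum_i (a_i^T z - y_i * sgn(a_i^T z)) * a_i;
    # samples with a_i^T z = 0 contribute nothing since np.sign(0) == 0.
    Az = A @ z
    return A.T @ (Az - y * np.sign(Az)) / len(y)

# Plain fixed-step iterations, started near the ground truth for illustration,
# converge geometrically (mu = 0.8 is the step size suggested in the paper).
rng = np.random.default_rng(1)
n, m, mu = 50, 400, 0.8
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
z = x + 0.1 * rng.standard_normal(n)   # a point in the basin of attraction
for _ in range(200):
    z = z - mu * reshaped_grad(z, A, y)
```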
We refer to such an algorithm, together with truncated initialization using the spectral method, as reshaped Wirtinger flow (reshaped-WF). We show that the lower-order loss function has a great advantage in both statistical and computational efficiency, although sacrificing smoothness. In fact, the curvature of such a loss function behaves similarly to that of a least-squares loss function in the neighborhood of global optimums (see Section 2.2), and hence reshaped-WF converges fast. The nonsmoothness does not significantly affect the convergence of the algorithm, because the algorithm encounters nonsmooth points for some samples only with negligible probability, and those samples are furthermore set not to contribute to the gradient direction by the algorithm. We summarize our main results as follows.\n\n• Statistically, we show that reshaped-WF recovers the true signal with O(n) samples when the design vectors consist of independently and identically distributed (i.i.d.) Gaussian entries, which is optimal in the order sense. Thus, even without truncation in gradient steps (truncation is applied only in the initialization stage), reshaped-WF improves the sample complexity O(n log n) of WF, and achieves the same sample complexity as truncated-WF. It is thus more robust to random measurements.\n• Computationally, reshaped-WF converges geometrically, requiring O(mn log(1/ε)) flops to reach ε-accuracy. Again, without truncation in gradient steps, reshaped-WF improves the computational cost O(mn² log(1/ε)) of WF and achieves the same computational cost as truncated-WF. 
Numerically, reshaped-WF is generally two times faster than truncated-WF and four to six times faster than WF in terms of the number of iterations and time cost.\n\nCompared to WF and truncated-WF, our technical proof of the performance guarantee is much simpler, because the lower-order loss function allows us to bypass higher-order moments of variables and truncation in gradient steps. We also anticipate that such analysis is more easily extendable. On the other hand, the new form of the gradient step, due to the nonsmoothness of the absolute value function, requires new developments of bounding techniques.\n\n¹The loss function (3) was also used in Fienup [1982] to derive a gradient-like update for the phase retrieval problem with Fourier magnitude measurements. However, our paper characterizes the global convergence guarantee for such an algorithm with appropriate initialization, which was not studied in Fienup [1982].\n\n\f1.2 Connection to Related Work\n\nAlong the line of developing nonconvex algorithms with global performance guarantees for the phase retrieval problem, Netrapalli et al. [2013] developed an alternating minimization algorithm; Candès et al. [2015], Chen and Candes [2015], Zhang et al. [2016], Cai et al. [2015] developed/studied first-order gradient-like algorithms; and a recent study Sun et al. [2016] characterized the geometric structure of the nonconvex objective and designed a second-order trust-region algorithm. Also notable is Wei [2015], which empirically demonstrated fast convergence of a so-called Kaczmarz stochastic algorithm. This paper is most closely related to Candès et al. [2015], Chen and Candes [2015], Zhang et al. 
[2016], but develops a new gradient-like algorithm based on a lower-order nonsmooth (as well as nonconvex) loss function that yields advantageous statistical/computational efficiency.\nVarious algorithms have been proposed for minimizing a general nonconvex nonsmooth objective, such as the gradient sampling algorithm Burke et al. [2005], Kiwiel [2007] and majorization-minimization methods Ochs et al. [2015]. These algorithms were often shown to converge to critical points, which may be local minimizers or saddle points, without an explicit characterization of the convergence rate. In contrast, our algorithm is specifically designed for the phase retrieval problem, and can be shown to converge linearly to the global optimum under appropriate initialization.\nThe advantage of the nonsmooth loss function exhibited in our study is analogous in spirit to that of the rectifier activation function (of the form max{0, ·}) in neural networks. It has been shown that the rectified linear unit (ReLU) enjoys superb advantages in reducing the training time Krizhevsky et al. [2012] and promoting sparsity Glorot et al. [2011] over its counterparts, the sigmoid and hyperbolic tangent functions, in spite of nonlinearity and non-differentiability at zero. Our result in fact also demonstrates that a nonsmooth but simpler loss function yields improved performance.\n\n1.3 Paper Organization and Notations\n\nThe rest of this paper is organized as follows. Section 2 describes the reshaped-WF algorithm in detail and establishes its performance guarantee. In particular, Section 2.2 provides intuition about why reshaped-WF is fast. Section 3 compares reshaped-WF with other competitive algorithms numerically. Finally, Section 4 concludes the paper with comments on future directions.\nThroughout the paper, boldface lowercase letters such as a_i, x, z denote vectors, and boldface capital letters such as A, Y denote matrices. 
For two matrices, A ⪯ B means that B − A is positive semidefinite. The indicator function 1_A = 1 if the event A is true, and 1_A = 0 otherwise. The Euclidean distance between two vectors up to a global sign difference is defined as dist(z, x) := min{‖z − x‖, ‖z + x‖}.\n\n2 Algorithm and Performance Guarantee\n\nIn this paper, we wish to recover a signal x ∈ R^n based on m measurements y_i given by\n\ny_i = |⟨a_i, x⟩|, for i = 1, · · · , m,    (4)\n\nwhere a_i ∈ R^n for i = 1, · · · , m are known measurement vectors generated by the Gaussian distribution N(0, I_{n×n}). We focus on the real-valued case in the analysis, but the algorithm designed below is applicable to the complex-valued case and the case with coded diffraction patterns (CDP), as we demonstrate via numerical results in Section 3.\nWe design reshaped-WF (see Algorithm 1) for solving the above problem, which contains two stages: spectral initialization and gradient loop. Suggested values for the parameters are α_l = 1, α_u = 5 and µ = 0.8. The scaling parameter in λ₀ and the conjugate transpose a_i^* make the algorithm readily applicable to the complex and CDP cases. We next describe the two stages of the algorithm in detail in Sections 2.1 and 2.2, respectively, and establish the convergence of the algorithm in Section 2.3.\n\n2.1 Initialization via Spectral Method\n\nWe first note that the initialization can adopt the spectral initialization method for WF in Candès et al. [2015] or that for truncated-WF in Chen and Candes [2015], both of which are based on |a_i^* x|². Here, we propose an alternative initialization in Algorithm 1 that uses the magnitude |a_i^* x| instead, and truncates samples with both lower and upper thresholds as in (5). 
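In Python, the initialization stage can be sketched as follows; this is our rendering of Algorithm 1's initialization for the real-valued case with the suggested thresholds α_l = 1, α_u = 5, using a plain power method in place of a full eigensolver (all function and variable names are ours).

```python
import numpy as np

def spectral_init(A, y, alpha_l=1.0, alpha_u=5.0, power_iters=50, seed=0):
    # Scaling factor lambda_0 = (m*n / sum_i ||a_i||_1) * mean(y), which for
    # Gaussian designs concentrates around the signal norm ||x||.
    m, n = A.shape
    lam0 = (m * n / np.abs(A).sum()) * y.mean()
    # Truncation: keep samples with alpha_l * lam0 < y_i < alpha_u * lam0.
    keep = (alpha_l * lam0 < y) & (y < alpha_u * lam0)
    # Y = (1/m) * sum over kept samples of y_i * a_i a_i^T.
    Y = (A[keep].T * y[keep]) @ A[keep] / m
    # Leading eigenvector of Y via the power method.
    v = np.random.default_rng(seed).standard_normal(n)
    for _ in range(power_iters):
        v = Y @ v
        v /= np.linalg.norm(v)
    return lam0 * v
```

Because λ₀ approximates ‖x‖, the returned point has roughly the right norm as well as the right direction, up to the global sign ambiguity.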
We show that such initialization achieves smaller sample complexity than WF and the same order-level sample complexity as truncated-WF, and furthermore, performs better than both WF and truncated-WF numerically.\n\nAlgorithm 1 Reshaped Wirtinger Flow\nInput: y = {y_i}_{i=1}^m, {a_i}_{i=1}^m;\nParameters: lower and upper thresholds α_l, α_u for truncation in initialization, step size µ;\nInitialization: Let z(0) = λ₀z̃, where λ₀ = (mn / Σ_{i=1}^m ‖a_i‖₁) · ((1/m) Σ_{i=1}^m y_i) and z̃ is the leading eigenvector of\n\nY := (1/m) Σ_{i=1}^m y_i a_i a_i^* 1{α_l λ₀ < y_i < α_u λ₀};    (5)\n\nGradient loop: for t = 0 : T − 1 do\n\nz(t+1) = z(t) − (µ/m) Σ_{i=1}^m (a_i^* z(t) − y_i · (a_i^* z(t))/|a_i^* z(t)|) a_i;    (6)\n\nOutput z(T).\n\nProposition 1. Fix δ > 0. The initialization step in Algorithm 1 yields z(0) satisfying ‖z(0) − x‖ ≤ δ‖x‖ with probability at least 1 − exp(−c′mε²), if m > C(δ, ε)n, where C is a positive number only affected by δ and ε, and c′ is some positive constant.\n\nFigure 1: Comparison of three initialization methods with m = 6n and 50 iterations using the power method.\n\nFinally, Figure 1 demonstrates that reshaped-WF achieves better initialization accuracy in terms of the relative error ‖z(0) − x‖/‖x‖ than WF and truncated-WF with Gaussian measurements.\n\n2.2 Gradient Loop and Why Reshaped-WF is Fast\n\nThe gradient loop of Algorithm 1 is based on the loss function (3), which is rewritten below:\n\nℓ(z) := (1/2m) Σ_{i=1}^m (|a_i^T z| − y_i)².    (7)\n\nWe define the update direction as\n\n∇ℓ(z) := (1/m) Σ_{i=1}^m (a_i^T z − y_i · sgn(a_i^T z)) a_i = (1/m) Σ_{i=1}^m (a_i^T z − y_i · (a_i^T z)/|a_i^T z|) a_i,    (8)\n\nwhere sgn(·) is the sign function for nonzero arguments. We further set sgn(0) = 0 and 0/|0| = 0. In fact, ∇ℓ(z) equals the gradient of the loss function (7) if a_i^T z ≠ 0 for all i = 1, . . . , m. For samples at a nonsmooth point, i.e., with a_i^T z = 0, we adopt the Fréchet superdifferential Kruger [2003] for nonconvex functions to set the corresponding gradient component to zero (as zero is an element of the Fréchet superdifferential). With an abuse of terminology, we still refer to ∇ℓ(z) in (8) as the “gradient” for simplicity; it rather represents the update direction in the gradient loop of Algorithm 1.\nWe next provide intuition about why reshaped-WF is fast. Suppose that the spectral method sets an initial point in the neighborhood of the ground truth x. We compare reshaped-WF with the problem of solving x from the linear equations y_i = ⟨a_i, x⟩, with y_i and a_i for i = 1, . . . , m given. In particular, we note that this problem has both magnitude and sign observations of the measurements. Further suppose that the least-squares loss is used and gradient descent is applied to solve this problem. Then the gradient is given by\n\nLeast-squares gradient: ∇ℓ_LS(z) = (1/m) Σ_{i=1}^m (a_i^T z − a_i^T x) a_i.    (9)\n\nWe now argue informally that the gradient (8) of reshaped-WF behaves similarly to the least-squares gradient (9). For each i, the two gradient components are close if |a_i^T x| · sgn(a_i^T z) is viewed as an estimate of a_i^T x. The following lemma (see Suppl. B.2 for the proof) shows that if dist(z, x) is small (as guaranteed by the initialization), then a_i^T z has the same sign as a_i^T x for large |a_i^T x|.\n\nLemma 1. Let a_i ∼ N(0, I_{n×n}). For any given x and z satisfying ‖x − z‖ < ((√2 − 1)/√2)‖x‖, we have\n\nP{(a_i^T x)(a_i^T z) < 0 | (a_i^T x)² = t‖x‖²} ≤ erfc(√t ‖x‖ / (√2 ‖h‖)),    (10)\n\nwhere h = z − x and erfc(u) := (2/√π) ∫_u^∞ exp(−τ²) dτ.\n\nIt is easy to observe in (10) that a large a_i^T x is likely to have the same sign as a_i^T z, so that the corresponding gradient components in (8) and (9) are likely equal, whereas a small a_i^T x may have a different sign from a_i^T z but contributes less to the gradient. Hence, overall the two gradients (8) and (9) should be close to each other with large probability.\nThis fact can be further verified numerically. Figure 2(a) illustrates that reshaped-WF takes almost the same number of iterations for recovering a signal (with only magnitude information) as the least-squares gradient descent method for recovering a signal (with both magnitude and sign information).\n\nFigure 2: Intuition for why reshaped-WF is fast. (a) Comparison of convergence behavior between reshaped-WF and least-squares gradient descent. Initialization and parameters are the same for the two methods: n = 1000, m = 6n, step size µ = 0.8. (b) Expected loss function of reshaped-WF for x = [1 −1]^T. (c) Expected loss function of WF for x = [1 −1]^T.\n\nFigure 2(b) further illustrates that the expected loss surface of reshaped-WF (see Suppl. B for the expression) behaves similarly to a quadratic surface around the global optimums, as compared to the expected loss surface of WF (see Suppl. B for the expression) in Figure 2(c).\n\n2.3 Geometric Convergence of Reshaped-WF\n\nWe characterize the convergence of reshaped-WF in the following theorem.\n\nTheorem 1. Consider the problem of solving any given x ∈ R^n from a system of equations (4) with Gaussian measurement vectors. There exist some universal constants µ₀ > 0 (µ₀ can be set as 0.8 in practice), 0 < ρ, ν < 1 and c₀, c₁, c₂ > 0 such that if m ≥ c₀n and µ ≤ µ₀, then with probability at least 1 − c₁ exp(−c₂m), Algorithm 1 yields\n\ndist(z(t), x) ≤ ν(1 − ρ)^t ‖x‖, ∀t ∈ N.    (11)\n\nOutline of the Proof. We outline the proof here, with details relegated to Suppl. C. Compared to WF and truncated-WF, our proof is much simpler due to the lower-order loss function that reshaped-WF relies on.\nThe central idea is to show that, within the neighborhood of the global optimums, reshaped-WF satisfies the Regularity Condition RC(µ, λ, c) Chen and Candes [2015], i.e.,\n\n⟨∇ℓ(z), h⟩ ≥ (µ/2)‖∇ℓ(z)‖² + (λ/2)‖h‖²    (12)\n\nfor all z and h = z − x obeying ‖h‖ ≤ c‖x‖, where 0 < c < 1 is some constant. Then, as shown in Chen and Candes [2015], once the initialization lands in this neighborhood, geometric convergence is guaranteed, i.e.,\n\ndist²(z − µ∇ℓ(z), x) ≤ (1 − µλ) dist²(z, x)    (13)\n\nfor any z with ‖z − x‖ ≤ ε‖x‖.\nLemmas 2 and 3 in Suppl. C yield\n\n⟨∇ℓ(z), h⟩ ≥ (1 − 0.26 − 2ε)‖h‖² = (0.74 − 2ε)‖h‖².\n\nAnd Lemma 4 in Suppl. C further yields\n\n‖∇ℓ(z)‖ ≤ (1 + δ) · 2‖h‖.    (14)\n\nTherefore, the above two bounds imply that the Regularity Condition (12) holds for µ and λ satisfying\n\n0.74 − 2ε ≥ (µ/2) · 4(1 + δ)² + λ/2.    (15)\n\nWe note that (15) implies the upper bound µ ≤ 0.74/2 = 0.37, by taking ε and δ to be sufficiently small. This suggests a range for setting the step size in Algorithm 1. However, in practice, µ can be set much larger than this bound, say 0.8, while still keeping the algorithm convergent, because the coefficients in the proof are set for convenience of proof rather than being tightly chosen.\nTheorem 1 indicates that reshaped-WF recovers the true signal with O(n) samples, which is order-level optimal. Such an algorithm improves the sample complexity O(n log n) of WF. Furthermore, reshaped-WF does not require truncation of weak samples in the gradient step to achieve the same sample complexity as truncated-WF. 
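The local similarity between the update direction (8) and the least-squares gradient (9), discussed in Section 2.2, can also be checked directly; the sketch below is ours, and the 5% perturbation level is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 600
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)
z = x + 0.05 * rng.standard_normal(n)      # a point near the ground truth

Az = A @ z
g_rwf = A.T @ (Az - y * np.sign(Az)) / m   # reshaped-WF direction (8)
g_ls = A.T @ (Az - A @ x) / m              # least-squares gradient (9)

# The two differ only on samples whose sign estimate sgn(a_i^T z) is wrong,
# and by Lemma 1 those samples mostly have small |a_i^T x|.
rel_diff = np.linalg.norm(g_rwf - g_ls) / np.linalg.norm(g_ls)
```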
This is mainly because reshaped-WF benefits from the lower-order loss function given in (7), the curvature of which behaves similarly to that of the least-squares loss function locally, as we explain in Section 2.2.\nTheorem 1 also suggests that reshaped-WF converges geometrically at a constant step size. To reach ε-accuracy, it requires a computational cost of O(mn log(1/ε)) flops, which is better than that of WF (O(mn² log(1/ε))). Furthermore, it does not require truncation in gradient steps to reach the same computational cost as truncated-WF. Numerically, as we demonstrate in Section 3, reshaped-WF is two times faster than truncated-WF and four to six times faster than WF in terms of both iteration count and time cost in various examples.\nAlthough our focus in this paper is on the noise-free model, reshaped-WF can be applied to noisy models as well. Suppose the measurements are corrupted by bounded noises {η_i}_{i=1}^m satisfying ‖η‖/√m ≤ c‖x‖. Then by adapting the proof of Theorem 1, it can be shown that the gradient loop of reshaped-WF is robust, such that\n\ndist(z(t), x) ≲ ‖η‖/√m + (1 − ρ)^t ‖x‖, ∀t ∈ N,    (16)\n\nfor some ρ ∈ (0, 1). The numerical results under the Poisson noise model in Section 3 further corroborate the stability of reshaped-WF.\n\nTable 1: Comparison of iteration count and time cost among algorithms (n = 1000, m = 8n)\n\nAlgorithms | reshaped-WF | truncated-WF | WF | AltMinPhase\nreal case: iterations | 72 | 182 | 319.2 | 5.8\nreal case: time cost (s) | 0.477 | 1.232 | 2.104 | 0.908\ncomplex case: iterations | 272.7 | 486.7 | 915.4 | 156\ncomplex case: time cost (s) | 6.956 | 12.815 | 23.306 | 93.22\n\n3 Numerical Comparison with Other Algorithms\n\nIn this section, we demonstrate the numerical efficiency of reshaped-WF by comparing its performance with other competitive algorithms. Our experiments are run not only for the real-valued case but also for the complex-valued and CDP cases. All the experiments are implemented in Matlab 2015b and carried out on a computer equipped with an Intel Core i7 3.4GHz CPU and 12GB RAM.\nWe first compare the sample complexity of reshaped-WF with those of truncated-WF and WF via the empirical successful recovery rate versus the number of measurements. For reshaped-WF, we follow Algorithm 1 with the suggested parameters. For truncated-WF and WF, we use the codes provided with the original papers, with the suggested parameters. We conduct the experiment for the real, complex and CDP cases respectively. For the real and complex cases, we set the signal dimension n to be 1000, and let the ratio m/n take values from 2 to 6 with a step size of 0.1. For each m, we run 100 trials and count the number of successful trials. For each trial, we run a fixed number of iterations T = 1000 for all algorithms. A trial is declared to be successful if z(T), the output of the algorithm, satisfies dist(z(T), x)/‖x‖ ≤ 10⁻⁵. For the real case, we generate the signal x ∼ N(0, I_{n×n}), and the measurement vectors a_i ∼ N(0, I_{n×n}) i.i.d. for i = 1, . . . , m. For the complex case, we generate the signal x ∼ N(0, I_{n×n}) + jN(0, I_{n×n}) and measurements a_i ∼ N(0, ½I_{n×n}) + jN(0, ½I_{n×n}) i.i.d. for i = 1, . . . , m. For the CDP case, we generate a signal x ∼ N(0, I_{n×n}) + jN(0, I_{n×n}) that yields measurements\n\ny^(l) = |F D^(l) x|, 1 ≤ l ≤ L,    (17)\n\nwhere F represents the discrete Fourier transform (DFT) matrix, and D^(l) is a diagonal matrix (mask). We set n = 1024 for convenience of the FFT and m/n = L = 1, 2, . . . , 8. 
All other settings are the same as those for the real case.\n\nFigure 3: Comparison of sample complexity among reshaped-WF, truncated-WF and WF. (a) Real case. (b) Complex case. (c) CDP case.\n\nFigure 3 plots the fraction of successful trials out of 100 trials for all algorithms, with respect to m. It can be seen that although reshaped-WF outperforms only WF (not truncated-WF) for the real case, it outperforms both WF and truncated-WF for the complex and CDP cases. An intuitive explanation for the real case is that a substantial number of samples with small |a_i^T z| can deviate the gradient, so that truncation indeed helps to stabilize the algorithm if the number of measurements is not large. Furthermore, reshaped-WF exhibits a sharper transition than truncated-WF and WF.\nWe next compare the convergence rate of reshaped-WF with those of truncated-WF, WF and AltMinPhase. We run all of the algorithms with the suggested parameter settings in the original codes. We generate the signal and measurements in the same way as in the first experiment, with n = 1000, m = 8n. All algorithms are seeded with the reshaped-WF initialization. In Table 1, we list the number of iterations and the time cost for these algorithms to achieve a relative error of 10⁻¹⁴, averaged over 10 trials. Clearly, reshaped-WF takes many fewer iterations and runs much faster than truncated-WF and WF. 
Although reshaped-WF takes more iterations than AltMinPhase, it runs much faster than AltMinPhase, due to the fact that each iteration of AltMinPhase needs to solve a least-squares problem, which takes much longer than a simple gradient update in reshaped-WF.\nWe also compare the performance of the above algorithms on the recovery of a real image from Fourier intensity measurements (2D CDP with the number of masks L = 16). The image (provided in Suppl. D) is the Milky Way Galaxy with resolution 1920×1080. Table 2 lists the number of iterations and the time cost of the above four algorithms to achieve a relative error of 10⁻¹⁵. It can be seen that reshaped-WF outperforms all three other algorithms in computational time cost. In particular, it is two times faster than truncated-WF and six times faster than WF in terms of both the number of iterations and the computational time cost.\n\nTable 2: Comparison of iterations and time cost among algorithms on the Galaxy image (L = 16)\n\nAlgorithms | reshaped-WF | truncated-WF | WF | AltMinPhase\niterations | 65 | 160 | 420 | 110\ntime cost (s) | 141 | 567 | 998 | 213\n\nWe next demonstrate the robustness of reshaped-WF to noise corruption and compare it with truncated-WF. We consider the phase retrieval problem in imaging applications, where random Poisson noise is often used to model the sensor and electronic noise Fogel et al. [2013]. Specifically, the noisy measurements of intensity can be expressed as\n\ny_i = √(α · Poisson(|a_i^T x|²/α)), for i = 1, 2, . . . , m,\n\nwhere α denotes the level of input noise, and Poisson(λ) denotes a random sample generated by the Poisson distribution with mean λ. It can be observed from Figure 4 that reshaped-WF performs better than truncated-WF in terms of recovery accuracy under different noise levels.\n\nFigure 4: Comparison of relative error under Poisson noise between reshaped-WF and truncated-WF (noise levels α = 1 and α = 0.001).\n\n4 Conclusion\n\nIn this paper, we proposed reshaped-WF to recover a signal from a quadratic system of equations, based on a nonconvex and nonsmooth quadratic loss function of the absolute values of the measurements. This loss function sacrifices smoothness but enjoys advantages in statistical and computational efficiency. It also has the potential to be extended to various scenarios. One interesting direction is to extend the algorithm to exploit signal structures (e.g., nonnegativity, sparsity, etc.) to assist the recovery. The lower-order loss function may offer great simplicity for proving performance guarantees in such cases. Another interesting topic is to study a stochastic version of reshaped-WF. We have observed in preliminary experiments that the stochastic version of reshaped-WF converges fast numerically. It will be of great interest to fully understand the theoretical performance of such an algorithm and to explore the reason behind its fast convergence.\n\nAcknowledgments\n\nThis work is supported in part by the grants AFOSR FA9550-16-1-0077 and NSF ECCS 16-09916.\n\nReferences\n\nJ. V. Burke, A. S. Lewis, and M. L. Overton. A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM Journal on Optimization, 15(3):751–779, 2005.\n\nT. T. Cai, X. Li, and Z. Ma. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint arXiv:1506.03382, 2015.\n\nE. J. Candès, T. Strohmer, and V. Voroninski. 
Phaselift: Exact and stable signal recovery from magnitude\nmeasurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241\u2013\n1274, 2013.\n\nE. J. Cand\u00e8s, X. Li, and M. Soltanolkotabi. Phase retrieval via wirtinger \ufb02ow: Theory and algorithms. IEEE\n\nTransactions on Information Theory, 61(4):1985\u20132007, 2015.\n\nA. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements.\n\nProblems, 27(1), 2011.\n\nInverse\n\nY. Chen and E. Candes. Solving random quadratic systems of equations is nearly as easy as solving linear\n\nsystems. In Advances in Neural Information Processing Systems (NIPS). 2015.\n\nJ. D. Donahue. Products and quotients of random variables and their applications. Technical report, DTIC\n\nDocument, 1964.\n\nJ. R. Fienup. Phase retrieval algorithms: a comparison. Applied Optics, 21(15):2758\u20132769, 1982.\n\nF. Fogel, I. Waldspurger, and A. d\u2019Aspremont. Phase retrieval for imaging problems. arXiv preprint\n\narXiv:1304.7735, 2013.\n\nR. W. Gerchberg. A practical algorithm for the determination of phase from image and diffraction plane pictures.\n\nOptik, 35:237, 1972.\n\nX. Glorot, A. Bordes, and Y. Bengio. Deep sparse recti\ufb01er neural networks. In International Conference on\n\nArti\ufb01cial Intelligence and Statistics (AISTATS), 2011.\n\nD. Gross, F. Krahmer, and R. Kueng. Improved recovery guarantees for phase retrieval from coded diffraction\n\npatterns. Applied and Computational Harmonic Analysis, 2015.\n\nK. C. Kiwiel. Convergence of the gradient sampling algorithm for nonsmooth nonconvex optimization. SIAM\n\nJournal on Optimization, 18(2):379\u2013388, 2007.\n\nA. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classi\ufb01cation with deep convolutional neural networks.\n\nIn Advances in neural information processing systems (NIPS), 2012.\n\nA. Y. Kruger. On fr\u00e9chet subdifferentials. 
Journal of Mathematical Sciences, 116(3):3325\u20133358, 2003.\n\nP. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. Advances in Neural\n\nInformation Processing Systems (NIPS), 2013.\n\nP. Ochs, A. Dosovitskiy, T. Brox, and T. Pock. On iteratively reweighted algorithms for nonsmooth nonconvex\n\noptimization in computer vision. SIAM Journal on Imaging Sciences, 8(1):331\u2013372, 2015.\n\nY. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev. Phase retrieval with application\n\nto optical imaging: a contemporary overview. IEEE Signal Processing Magazine, 32(3):87\u2013109, 2015.\n\nJ. Sun, Q. Qu, and J. Wright. A geometric analysis of phase retrieval. arXiv preprint arXiv:1602.06664, 2016.\n\nR. Vershynin. Introduction to the non-asymptotic analysis of random matrices. Compressed Sensing, Theory\n\nand Applications, pages 210 \u2013 268, 2012.\n\nI. Waldspurger, A. d\u2019Aspremont, and S. Mallat. Phase recovery, maxcut and complex semide\ufb01nite programming.\n\nMathematical Programming, 149(1-2):47\u201381, 2015.\n\nK. Wei. Solving systems of phaseless equations via kaczmarz methods: a proof of concept study. Inverse\n\nProblems, 31(12):125008, 2015.\n\nH. Zhang, Y. Chi, and Y. Liang. Provable non-convex phase retrieval with outliers: Median truncated wirtinger\n\n\ufb02ow. arXiv preprint arXiv:1603.03805, 2016.\n\n9\n\n\f", "award": [], "sourceid": 1358, "authors": [{"given_name": "Huishuai", "family_name": "Zhang", "institution": "Syracuse University"}, {"given_name": "Yingbin", "family_name": "Liang", "institution": "Syracuse University"}]}