{"title": "Block Coordinate Regularization by Denoising", "book": "Advances in Neural Information Processing Systems", "page_first": 382, "page_last": 392, "abstract": "We consider the problem of estimating a vector from its noisy measurements using a prior specified only through a denoising function. Recent work on plug-and-play priors (PnP) and regularization-by-denoising (RED) has shown the state-of-the-art performance of estimators under such priors in a range of imaging tasks. In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over a small subset of the unknown variables. We theoretically analyze the convergence of the algorithm and discuss its relationship to the traditional proximal optimization. Our analysis complements and extends recent theoretical results for RED-based estimation methods. We numerically validate our method using several denoiser priors, including those based on convolutional neural network (CNN) denoisers.", "full_text": "Block Coordinate Regularization by Denoising\n\nYu Sun\n\nWashington University in St. Louis\n\nsun.yu@wustl.edu\n\nJiaming Liu\n\nWashington University in St. Louis\n\njiaming.liu@wustl.edu\n\nUlugbek S. Kamilov\n\nWashington University in St. Louis\n\nkamilov@wustl.edu\n\nAbstract\n\nWe consider the problem of estimating a vector from its noisy measurements\nusing a prior speci\ufb01ed only through a denoising function. Recent work on plug-\nand-play priors (PnP) and regularization-by-denoising (RED) has shown the state-\nof-the-art performance of estimators under such priors in a range of imaging\ntasks.\nIn this work, we develop a new block coordinate RED algorithm that\ndecomposes a large-scale estimation problem into a sequence of updates over a\nsmall subset of the unknown variables. 
We theoretically analyze the convergence of\nthe algorithm and discuss its relationship to the traditional proximal optimization.\nOur analysis complements and extends recent theoretical results for RED-based\nestimation methods. We numerically validate our method using several denoiser\npriors, including those based on convolutional neural network (CNN) denoisers.\n\n1\n\nIntroduction\n\nProblems involving estimation of an unknown vector x \u2208 Rn from a set of noisy measurements\ny \u2208 Rm are important in many areas, including machine learning, image processing, and compressive\nsensing. Consider the scenario in Fig. 1, where a vector x \u223c px passes through the measurement\nchannel py|x to produce the measurement vector y. When the estimation problem is ill-posed, it\nbecomes essential to include the prior px in the estimation process. However, in high-dimensional\nsettings, it is dif\ufb01cult to directly obtain the true prior px for certain signals (such as natural images)\nand one is hence restricted to various indirect sources of prior information on x. This paper considers\nthe cases where the prior information on x is speci\ufb01ed only via a denoising function, D : Rn \u2192 Rn,\ndesigned for the removal of additive white Gaussian noise (AWGN).\nThere has been considerable recent interest in leveraging denoisers as priors for the recovery of\nx. One popular strategy, known as plug-and-play priors (PnP) [1], extends traditional proximal\noptimization [2] by replacing the proximal operator with a general off-the-shelf denoiser. It has been\nshown that the combination of proximal algorithms with advanced denoisers, such as BM3D [3]\nor DnCNN [4], leads to the state-of-the-art performance for various imaging problems [5\u201315]. A\nsimilar strategy has also been adopted in the context of a related class of algorithms known as\napproximate message passing (AMP) [16\u201319]. 
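In its PGM form, the PnP idea amounts to a gradient step on the data-fidelity followed by a denoiser call in place of the proximal operator. Below is a minimal numpy sketch under illustrative assumptions (least-squares data-fidelity; the identity map stands in for the off-the-shelf denoiser, so the toy run reduces to ordinary gradient descent and can be checked against the true solution):

```python
import numpy as np

def pnp_pgm(y, A, denoise, gamma, num_iter=1000):
    """PnP proximal-gradient sketch: the proximal operator of the regularizer
    is replaced by a call to an off-the-shelf denoiser."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        grad = A.T @ (A @ x - y)        # gradient of g(x) = 0.5*||y - A x||^2
        x = denoise(x - gamma * grad)   # denoiser in place of the prox step
    return x

# Toy check: with the identity "denoiser" this is plain gradient descent
# on least squares, so it recovers x_true from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
x_true = rng.standard_normal(5)
y = A @ x_true
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step-size 1/L
x_hat = pnp_pgm(y, A, denoise=lambda z: z, gamma=gamma)
```

Swapping the `denoise` argument for BM3D or a CNN denoiser gives the PnP methods discussed above; the convergence analysis, however, becomes nontrivial, which is part of the motivation for RED.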
Regularization-by-denoising (RED) [20], and the closely related deep mean-shift priors [21], represent an alternative, in which the denoiser is used to specify an explicit regularizer that has a simple gradient. More recent work has clarified the existence of explicit RED regularizers [22], demonstrated the excellent performance of RED on phase retrieval [23], and further boosted its performance in combination with a deep image prior [24]. In short, the use of advanced denoisers has proven to be essential for achieving state-of-the-art results in many contexts. However, solving the corresponding estimation problem is still a significant computational challenge, especially in the context of high-dimensional vectors x, typical in modern applications.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: The estimation problem considered in this work. The vector x ∈ R^n, with a prior px(x), passes through the measurement channel py|x(y|x) to result in the measurements y ∈ R^m. The estimation algorithm fD(y) does not have direct access to the prior, but can rely on a denoising function D : R^n → R^n, specifically designed for the removal of additive white Gaussian noise (AWGN). We propose block coordinate RED as a scalable algorithm for obtaining x given y and D.

In this work, we extend the current family of RED algorithms by introducing a new block coordinate RED (BC-RED) algorithm. The algorithm relies on random partial updates on x, which makes it scalable to vectors that would otherwise be prohibitively large for direct processing. Additionally, as we shall see, the overall computational complexity of BC-RED can sometimes be lower than that of corresponding methods operating on the full vector. 
This behavior is consistent with the traditional coordinate descent methods that can outperform their full gradient counterparts by being able to better reuse local updates and take larger steps [25–29]. We present two theoretical results related to BC-RED. We first theoretically characterize the convergence of the algorithm under a set of transparent assumptions on the data-fidelity and the denoiser. Our analysis complements the recent theoretical analysis of full-gradient RED algorithms in [22] by considering block-coordinate updates and establishing the explicit worst-case convergence rate. Our second result establishes backward compatibility of BC-RED with the traditional proximal optimization. We show that when the denoiser corresponds to a proximal operator, BC-RED can be interpreted as an approximate MAP estimator, whose approximation error can be made arbitrarily small. To the best of our knowledge, this explicit link with proximal optimization is missing in the current literature on RED. BC-RED thus provides a flexible, scalable, and theoretically sound algorithm applicable to a wide variety of large-scale estimation problems. We demonstrate BC-RED on image recovery from linear measurements using several denoising priors, including those based on convolutional neural network (CNN) denoisers. All proofs and some technical details that have been omitted for space appear in the full paper [30] that also provides more background and simulations.

2 Background

It is common to formulate the estimation in Figure 1 as an optimization problem

x̂ = argmin_{x∈R^n} f(x) with f(x) = g(x) + h(x), (1)

where g is the data-fidelity term and h is the regularizer. For example, the maximum a posteriori probability (MAP) estimator is obtained by setting

g(x) = −log(py|x(y|x)) and h(x) = −log(px(x)),

where py|x is the likelihood that depends on y and px is the prior. 
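To make the MAP splitting concrete: for a linear measurement model y = Ax + e with e ∼ N(0, σ²I), the data-fidelity g(x) = −log(py|x(y|x)) is quadratic in the residual y − Ax up to an additive constant. A quick numerical check (all sizes and values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(5)
sigma = 0.7
m = y.size

# g(x) = -log p(y|x) for y = A x + N(0, sigma^2 I): a quadratic residual
# term plus a constant that does not depend on x.
g = 0.5 * np.sum((y - A @ x) ** 2) / sigma**2 \
    + 0.5 * m * np.log(2 * np.pi * sigma**2)

# Direct evaluation of the Gaussian likelihood gives the same value.
p = np.prod(np.exp(-(y - A @ x) ** 2 / (2 * sigma**2))
            / np.sqrt(2 * np.pi * sigma**2))
```

Since the additive constant does not affect the minimizer, it is usually dropped, leaving the least-squares data-fidelity discussed next.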
One of the most popular data-fidelity terms is least-squares g(x) = (1/2)‖y − Ax‖₂², which assumes a linear measurement model under AWGN. Similarly, one of the most popular regularizers is based on a sparsity-promoting penalty h(x) = τ‖Dx‖₁, where D is a linear transform and τ > 0 is the regularization parameter [31–34]. Many widely used regularizers, including the ones based on the ℓ1-norm, are nondifferentiable. Proximal algorithms [2], such as the proximal-gradient method (PGM) [35–38] and alternating direction method of multipliers (ADMM) [39–42], are a class of optimization methods that can circumvent the need to differentiate nonsmooth regularizers by using the proximal operator

prox_{μh}(z) := argmin_{x∈R^n} { (1/2)‖x − z‖₂² + μh(x) }, μ > 0, z ∈ R^n. (2)

The observation that the proximal operator can be interpreted as the MAP denoiser for AWGN has prompted the development of PnP [1], where the proximal operator prox_{μh}(·), within ADMM or PGM, is replaced with a more general denoising function
D(·).

Consider the following alternative to PnP that also relies on a denoising function [20, 21]

x^t ← x^{t−1} − γ(∇g(x^{t−1}) + H(x^{t−1})) where H(x) := τ(x − D(x)), τ > 0. (3)

Under some conditions on the denoiser, it is possible to relate H(·) in (3) to some explicit regularization function h. 
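Iteration (3) is a one-line update once ∇g and D are available. The following numpy sketch uses illustrative stand-ins (a least-squares data-fidelity and a simple moving-average filter playing the role of a linear, nonexpansive denoiser) and runs the iteration until the operator G(x) = ∇g(x) + τ(x − D(x)) essentially vanishes:

```python
import numpy as np

def red(y, A, denoise, tau, gamma, num_iter=2000):
    """Full-gradient RED: x <- x - gamma * (grad g(x) + tau * (x - D(x)))."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        G = A.T @ (A @ x - y) + tau * (x - denoise(x))
        x = x - gamma * G
    return x

def box_denoise(x):
    """Stand-in denoiser: 3-tap moving average (linear and nonexpansive)."""
    xp = np.pad(x, 1, mode="edge")
    return (xp[:-2] + xp[1:-1] + xp[2:]) / 3.0

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 8)) / np.sqrt(60)
y = A @ rng.standard_normal(8) + 0.01 * rng.standard_normal(60)
tau = 0.5
gamma = 1.0 / (np.linalg.norm(A, 2) ** 2 + 2 * tau)
x_star = red(y, A, box_denoise, tau, gamma)

# At a fixed point x*, G(x*) = grad g(x*) + tau * (x* - D(x*)) = 0.
G_fix = A.T @ (A @ x_star - y) + tau * (x_star - box_denoise(x_star))
```

The point of the formulation is that `denoise` never needs to be differentiated; only evaluations of it enter the update.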
For example, when the denoiser is locally homogeneous and has a symmetric Jacobian [20, 22], the operator H(·) corresponds to the gradient of the following function

h(x) = (τ/2) x^T (x − D(x)). (4)

On the other hand, when the denoiser corresponds to the minimum mean squared error (MMSE) estimator D(z) = E[x|z] for the AWGN denoising problem [21, 22], z = x + e, with x ∼ px(x) and e ∼ N(0, σ²I), the operator H(·) corresponds to the gradient of

h(x) = −τσ² log(pz(x)), where pz(x) = (px ∗ pe)(x) = ∫_{R^n} px(z) φσ(x − z) dz, (5)

where φσ is the Gaussian probability density function of variance σ² and ∗ denotes convolution. In this paper, we will use the term RED to denote all methods seeking the fixed points of (3). The key benefits of the RED methods [20–24] are their explicit separation of the forward model from the prior, their ability to accommodate powerful denoisers (such as the ones based on CNNs) without differentiating them, and their state-of-the-art performance on a number of imaging tasks. The next section further extends the scalability of RED by designing a new block coordinate RED algorithm.

3 Block Coordinate RED

All the current RED algorithms operate on vectors in R^n. We propose BC-RED, shown in Algorithm 1, to allow for partial randomized updates on x. Consider the decomposition of R^n into b ≥ 1 subspaces

R^n = R^{n_1} × R^{n_2} × ··· × R^{n_b} with n = n_1 + n_2 + ··· + n_b.

For each i ∈ {1, . . . , b}, we define the matrix U_i : R^{n_i} → R^n that injects a vector in R^{n_i} into R^n and its transpose U_i^T that extracts the ith block from a vector in R^n. Then, for any x = (x_1, . . . , x_b) ∈ R^n

x = Σ_{i=1}^{b} U_i x_i with x_i = U_i^T x ∈ R^{n_i}, i = 1, . . . , b ⇔ Σ_{i=1}^{b} U_i U_i^T = I. (6)

Note that (6) directly implies the norm preservation ‖x‖₂² = ‖x_1‖₂² + ··· + ‖x_b‖₂² for any x ∈ R^n. We are interested in a block-coordinate algorithm that uses only a subset of operator outputs corresponding to coordinates in some block i ∈ {1, . . . , b}. Hence, for an operator G : R^n → R^n, we define the block-coordinate operator G_i : R^n → R^{n_i} as

G_i(x) := [G(x)]_i = U_i^T G(x) ∈ R^{n_i}, x ∈ R^n. (7)

We introduce the following BC-RED algorithm.

Algorithm 1 Block Coordinate Regularization by Denoising (BC-RED)
1: input: initial value x^0 ∈ R^n, parameter τ > 0, and step-size γ > 0.
2: for k = 1, 2, 3, . . . do
3:   Choose an index i_k ∈ {1, . . . , b}
4:   x^k ← x^{k−1} − γ U_{i_k} G_{i_k}(x^{k−1}), where G_i(x) := U_i^T G(x) with G(x) := ∇g(x) + τ(x − D(x)).
5: end for

Note that when b = 1, we have n = n_1 and U_1 = U_1^T = I. Hence, the theoretical analysis in this paper is also applicable to the full-gradient RED algorithm in (3). As with traditional coordinate descent methods (see [28] for a review), BC-RED can be implemented using different block selection strategies. The strategy adopted for our theoretical analysis selects block indices i_k as i.i.d. random variables distributed uniformly over {1, . . . , b}. An alternative is to
When the\nalgorithm converges, it converges to the vectors in the zero set of G\n\nG(x\u2217) = \u2207g(x\u2217) + \u03c4 (x\u2217 \u2212 D(x\u2217)) = 0 \u21d4 x\u2217 \u2208 zer(G) := {x \u2208 Rn : G(x) = 0}.\n\nConsider the following two sets\n\nzer(\u2207g) := {x \u2208 Rn : \u2207g(x) = 0}\n\nand \ufb01x(D) := {x \u2208 Rn : x = D(x)},\n\nwhere zer(\u2207g) is the set of all critical points of the data-\ufb01delity and \ufb01x(D) is the set of all \ufb01xed\npoints of the denoiser. Intuitively, the \ufb01xed points of D correspond to all the vectors that are not\ndenoised, and therefore can be interpreted as vectors that are noise-free according to the denoiser.\nNote that if x\u2217 \u2208 zer(\u2207g) \u2229 \ufb01x(D), then G(x\u2217) = 0 and x\u2217 is one of the solutions of BC-RED.\nHence, any vector that is consistent with the data for a convex g and noiseless according to D is in\nthe solution set. On the other hand, when zer(\u2207g) \u2229 \ufb01x(D) = \u2205, then x\u2217 \u2208 zer(G) corresponds to a\ntradeoff between the two sets, explicitly controlled via \u03c4 > 0. This explicit control is one of the key\ndifferences between RED and PnP.\nBC-RED bene\ufb01ts from considerable \ufb02exibility compared to the full-gradient RED. Since each update\nis restricted to only one block of x, the algorithm is suitable for parallel implementations and can deal\nwith problems where the vector x is distributed in space and in time. However, the maximal bene\ufb01t\nof BC-RED is achieved when Gi is ef\ufb01cient to evaluate. Fortunately, it was systematically shown\nin [43] that many operators\u2014common in machine learning, image processing, and compressive\nsensing\u2014admit coordinate friendly updates.\nFor a speci\ufb01c example, consider the least-squares data-\ufb01delity g and a block-wise denoiser D. De\ufb01ne\nthe residual vector r(x) := Ax \u2212 y and consider a single iteration of BC-RED that produces x+ by\nupdating the ith block of x. 
Then, the update direction and the residual update can be computed as

G_i(x) = A_i^T r(x) + τ(x_i − D(x_i)) and r(x^+) = r(x) − γ A_i G_i(x), (8)

where A_i ∈ R^{m×n_i} is a submatrix of A consisting of the columns corresponding to the ith block. In many problems of practical interest [43], the complexity of working with A_i is roughly b times lower than with A. Also, many advanced denoisers can be effectively applied on image patches rather than on the full image [44–46]. Therefore, in such settings, the speed of b iterations of BC-RED is expected to be at least comparable to a single iteration of the full-gradient RED.

4 Convergence Analysis and Compatibility with Proximal Optimization

In this section, we present two theoretical results related to BC-RED. We first establish its convergence to an element of zer(G) and then discuss its compatibility with the theory of proximal optimization.

4.1 Fixed Point Convergence of BC-RED

Our analysis requires three assumptions that together serve as sufficient conditions for convergence.

Assumption 1. The operator G is such that zer(G) ≠ ∅. There is a finite number R₀ such that the distance of the initial x^0 ∈ R^n to the farthest element of zer(G) is bounded, that is

max_{x* ∈ zer(G)} ‖x^0 − x*‖₂ ≤ R₀.

This assumption is necessary to guarantee convergence and is related to the existence of the minimizers in the literature on traditional coordinate minimization [25–28].

The next two assumptions rely on Lipschitz constants along directions specified by specific blocks. We say that G_i is block Lipschitz continuous with constant λ_i > 0 if

‖G_i(x) − G_i(y)‖₂ ≤ λ_i ‖h_i‖₂, x = y + U_i h_i, y ∈ R^n, h_i ∈ R^{n_i}.

When λ_i = 1, we say that G_i is block nonexpansive. 
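For the least-squares data-fidelity, Algorithm 1 combined with the residual update in (8) can be sketched in a few lines of numpy. The block partition, problem sizes, and identity stand-in denoiser below are illustrative; with the identity denoiser the τ-term vanishes and the sketch reduces to plain block coordinate descent on the least-squares problem, which makes it easy to sanity-check:

```python
import numpy as np

def bc_red(y, A, denoise_block, blocks, tau, gamma, num_epochs=2000, seed=0):
    """BC-RED sketch for g(x) = 0.5*||y - A x||^2. The residual r = A x - y
    is maintained so that each block update touches only the columns A_i."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    r = A @ x - y                                # residual r(x) = A x - y
    for _ in range(num_epochs):
        for i in rng.permutation(len(blocks)):   # epoch-style block selection
            idx = blocks[i]
            Ai = A[:, idx]
            Gi = Ai.T @ r + tau * (x[idx] - denoise_block(x[idx]))
            x[idx] -= gamma * Gi                 # partial update of block i
            r -= gamma * (Ai @ Gi)               # residual update, eq. (8)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((60, 16)) / np.sqrt(60)
y = A @ rng.standard_normal(16)
blocks = [np.arange(i, i + 4) for i in range(0, 16, 4)]   # b = 4 blocks
tau = 0.5
L_max = max(np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks)
x_hat = bc_red(y, A, denoise_block=lambda z: z, blocks=blocks,
               tau=tau, gamma=1.0 / (L_max + 2 * tau))
```

Note that each inner update forms `Ai @ Gi` and `Ai.T @ r` with only the ith column block, which is the coordinate-friendly structure exploited in the complexity discussion above.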
Note that if an operator G is globally λ-Lipschitz continuous, then it is straightforward to see that each G_i = U_i^T G is also block λ-Lipschitz continuous.

Assumption 2. The function g is continuously differentiable and convex. Additionally, for each i ∈ {1, . . . , b} the block gradient ∇_i g is block Lipschitz continuous with constant L_i > 0. We define the largest block Lipschitz constant as L_max := max{L_1, . . . , L_b}.

Let L > 0 denote the global Lipschitz constant of ∇g. We always have L_max ≤ L and, for some g, it may even happen that L_max = L/b [28]. As we shall see, the largest possible step-size γ of BC-RED depends on L_max, while that of the full-gradient RED depends on L. Hence, one natural advantage of BC-RED is that it can often take more aggressive steps than the full-gradient RED.

Assumption 3. The denoiser D is such that each block denoiser D_i is block nonexpansive.

Since the proximal operator is nonexpansive [2], it automatically satisfies this assumption. We revisit this scenario in greater depth in Section 4.2. We can now establish the following result for BC-RED.

Theorem 1. Run BC-RED for t ≥ 1 iterations with random i.i.d. block selection under Assumptions 1-3 using a fixed step-size 0 < γ ≤ 1/(L_max + 2τ). Then, we have

E[ min_{k∈{1,...,t}} ‖G(x^{k−1})‖₂² ] ≤ E[ (1/t) Σ_{k=1}^{t} ‖G(x^{k−1})‖₂² ] ≤ (b(L_max + 2τ)/(γt)) R₀². (9)

A proof of the theorem is provided in the extended version of this paper [30]. Theorem 1 establishes the fixed-point convergence of BC-RED in expectation to zer(G) with O(1/t) rate. The proof relies on monotone operator theory [47, 48], widely used in the context of convex optimization [2], including in the unified analysis of various traditional coordinate descent algorithms [49, 50]. 
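The relation L_max ≤ L behind the larger admissible step-size is easy to observe numerically. For the least-squares data-fidelity of Section 3, the block constant L_i equals ‖A_i‖₂², and each A_i is a column submatrix of A, so its spectral norm can only be smaller (sizes and partition below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((64, 32)) / 8.0
blocks = [np.arange(i, i + 8) for i in range(0, 32, 8)]   # b = 4 blocks

# Global Lipschitz constant of grad g for g(x) = 0.5*||y - A x||^2 ...
L = np.linalg.norm(A, 2) ** 2
# ... versus the largest block constant over the column submatrices A_i.
L_max = max(np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks)

# Since L_max <= L, the BC-RED step-size 1/(L_max + 2*tau) can be taken
# at least as large as the full-gradient step-size 1/(L + 2*tau).
```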
Note that the theorem does not assume the existence of any regularizer h, which makes it applicable to denoisers beyond those characterized with the explicit functions in (4) and (5).

Since L_max ≤ L, one important implication of Theorem 1 is that the worst-case convergence rate (in expectation) of b iterations of BC-RED is better than that of a single iteration of the full-gradient RED (to see this, note that the full-gradient rate is obtained by setting b = 1, L_max = L, and removing the expectation in (9)). This implies that in coordinate friendly settings (as discussed at the end of Section 3), the overall computational complexity of BC-RED can be lower than that of the full-gradient RED. This gain is primarily due to two factors: (a) the possibility to pick a larger step-size γ = 1/(L_max + 2τ); (b) the immediate reuse of each local block-update when computing the next iterate (the full-gradient RED updates the full vector before computing the next iterate).

In the special case of D(x) = x − (1/τ)∇h(x), for some convex function h, BC-RED reduces to the traditional coordinate descent method applied to (1). Hence, under the assumptions of Theorem 1, one can rely on the analysis of traditional randomized coordinate descent methods in [28] to obtain

E[f(x^t)] − f* ≤ (2b/(γt)) R₀², (10)

where f* is the minimum value in (1). However, as discussed in Section 4.2, when the denoiser is a proximal operator of some convex h, BC-RED is not directly solving (1), but rather its approximation.

Finally, note that the analysis in Theorem 1 only provides sufficient conditions for the convergence of BC-RED. As corroborated by our numerical studies in Section 5, the actual convergence of BC-RED is more general and often holds beyond nonexpansive denoisers. One plausible explanation is that such denoisers are locally nonexpansive over the set of input vectors used in testing. 
On the other hand, the recent techniques for spectral normalization of CNNs [51–53] provide a convenient tool for building globally nonexpansive neural denoisers that result in provable convergence of BC-RED.

4.2 Convergence for Proximal Operators

One of the limitations of the current RED theory is its limited backward compatibility with the theory of proximal optimization. For example, as discussed in [20] (see the section "Can we mimic any prior?"), the popular total variation (TV) denoiser [31] cannot be justified with the original RED regularization function (4). In this section, we show that BC-RED (and hence also the full-gradient RED) can be used to solve (1) for any convex, closed, and proper function h. We do this by establishing a formal link between RED and the concept of Moreau smoothing, widely used in nonsmooth optimization [54–56]. In particular, we consider the following proximal-operator denoiser

D(z) = prox_{(1/τ)h}(z) = argmin_{x∈R^n} { (1/2)‖x − z‖₂² + (1/τ)h(x) }, τ > 0, z ∈ R^n, (11)

where h is a closed, proper, and convex function [2]. Since the proximal operator is nonexpansive, it is also block nonexpansive, which means that Assumption 3 is automatically satisfied. Our analysis, however, requires an additional assumption using the constant R₀ defined in Assumption 1.

Assumption 4. There is a finite number G₀ that bounds the largest subgradient of h, that is

max{ ‖g(x)‖₂ : g(x) ∈ ∂h(x), x ∈ B(x^0, R₀) } ≤ G₀,

where B(x^0, R₀) := {x ∈ R^n : ‖x − x^0‖₂ ≤ R₀} denotes a ball of radius R₀, centered at x^0.

This assumption on the boundedness of the subgradients holds for a large number of regularizers used in practice, including both the TV and ℓ1-norm penalties. We can now establish the following result.

Theorem 2. 
Run BC-RED for t ≥ 1 iterations with random i.i.d. block selection and the denoiser (11) under Assumptions 1-4 using a fixed step-size 0 < γ ≤ 1/(L_max + 2τ). Then, we have

E[f(x^t)] − f* ≤ (2b/(γt)) R₀² + G₀²/(2τ), (12)

where the function f is defined in (1) and f* is its minimum.

The theorem is proved in the extended version of this paper [30]. It establishes that BC-RED in expectation approximates the solution of (1) with an error bounded by G₀²/(2τ). For example, by setting τ = √t and γ = 1/(L_max + 2√t), one obtains the following bound

E[f(x^t)] − f* ≤ (1/√t)(2b(L_max + 2)R₀² + G₀²). (13)

When h(x) = −log(px(x)), the proximal operator corresponds to the MAP denoiser, and the solution of BC-RED corresponds to an approximate MAP estimator. This approximation can be made as precise as desired by considering larger values of the parameter τ > 0. Note that this further justifies the RED framework by establishing that it can be used to compute a minimizer of any proper, closed, and convex (but not necessarily differentiable) h. Therefore, our analysis strengthens RED by showing that it can accommodate a much larger class of explicit regularization functions, beyond those characterized in (4) and (5).

5 Numerical Validation

There is considerable recent interest in using advanced priors in the context of image recovery from underdetermined (m < n) and noisy measurements. Recent work [20–24] suggests significant performance improvements due to advanced denoisers (such as BM3D [3] or DnCNN [4]) over traditional sparsity-driven priors (such as TV [31]). 
Our goal is to complement these studies with several simulations validating our theoretical analysis and providing additional insights into BC-RED. The code for our implementation of BC-RED is available at https://github.com/wustl-cig/bcred.

We consider inverse problems of the form y = Ax + e, where e ∈ R^m is an AWGN vector and A ∈ R^{m×n} is a matrix corresponding to either a sparse-view Radon transform, an i.i.d. zero-mean Gaussian random matrix of variance 1/m, or a radially subsampled two-dimensional Fourier transform. Such matrices are commonly used in the context of computerized tomography (CT) [57], compressive sensing [33, 34], and magnetic resonance imaging (MRI) [58], respectively. In all simulations, we set the measurement ratio to approximately m/n = 0.5 with AWGN corresponding to input signal-to-noise ratio (SNR) of 30 dB and 40 dB. The images used correspond to 10 images randomly selected from the NYU fastMRI dataset [59], resized to 160 × 160 pixels. BC-RED is set to work with 16 blocks, each of size 40 × 40 pixels. The reconstruction quality is quantified using SNR averaged over all ten test images.

Figure 2: Left: Illustration of the convergence of BC-RED under a nonexpansive DnCNN∗ prior. Average normalized distance to zer(G) is plotted against the iteration number, with the shaded areas representing the range of values attained over all test images. Right: Illustration of the influence of the parameter τ > 0 for solving the TV regularized least-squares problem using BC-RED. 
As τ increases, BC-RED provides an increasingly accurate approximation to the TV optimization problem.

Table 1: Average SNRs obtained for different measurement matrices and image priors.

Methods         | Radon 30 dB | Radon 40 dB | Random 30 dB | Random 40 dB | Fourier 30 dB | Fourier 40 dB
----------------|-------------|-------------|--------------|--------------|---------------|--------------
PGM (TV)        |    20.66    |    24.40    |    26.07     |    28.42     |     28.74     |     29.99
U-Net           |    21.90    |    21.72    |    16.37     |    16.40     |     22.11     |     22.11
RED (TV)        |    20.79    |    24.46    |    25.64     |    28.30     |     28.67     |     29.97
BC-RED (TV)     |    20.78    |    24.42    |    25.70     |    28.39     |     28.71     |     29.99
RED (BM3D)      |    21.55    |    25.24    |    26.46     |    27.82     |     28.89     |     29.79
BC-RED (BM3D)   |    21.56    |    25.16    |    26.50     |    27.88     |     28.85     |     29.80
RED (DnCNN∗)    |    20.89    |    24.38    |    26.53     |    28.05     |     29.33     |     30.32
BC-RED (DnCNN∗) |    20.88    |    24.42    |    26.60     |    28.12     |     29.40     |     30.39

In addition to well-studied denoisers, such as TV and BM3D, we design our own CNN denoiser denoted DnCNN∗, which is a simplified version of the popular DnCNN denoiser (see [30] for details). This simplification reduces the computational complexity of denoising, which is important when running many iterations of BC-RED. Additionally, it makes it easier to control the global Lipschitz constant of the CNN via spectral normalization [52]. We train DnCNN∗ for the removal of AWGN at four noise levels corresponding to σ ∈ {5, 10, 15, 20}. For each experiment, we select the denoiser achieving the highest SNR value. Note that the σ parameter of BM3D is also fine-tuned for each experiment from the same set {5, 10, 15, 20}.

Theorem 1 establishes the convergence of BC-RED in expectation to an element of zer(G). This is illustrated in Fig. 2 (left) for the Radon matrix with 30 dB noise and a nonexpansive DnCNN∗ denoiser (see also [30] for additional convergence plots). 
The average value of ‖G(x^k)‖₂²/‖G(x^0)‖₂² is plotted against the iteration number for the full-gradient RED and BC-RED, with b updates of BC-RED (each modifying a single block) represented as one iteration. We numerically tested two block selection rules for BC-RED (i.i.d. and epoch) and observed that processing in randomized epochs leads to a faster convergence. For reference, the figure also plots the normalized squared norm of the gradient mapping vectors produced by the traditional PGM with TV [60]. The shaded areas indicate the range of values taken over 10 runs corresponding to each test image. The results highlight the potential of BC-RED to enjoy a better convergence rate compared to the full-gradient RED, with BC-RED (epoch) achieving the accuracy of 10⁻¹⁰ in 104 iterations, while the full-gradient RED achieves the same accuracy in 190 iterations.

Theorem 2 establishes that for proximal-operator denoisers, BC-RED computes an approximate solution to (1) with an accuracy controlled by the parameter τ. This is illustrated in Fig. 2 (right) for the Fourier matrix with 40 dB noise and the TV regularized least-squares problem. The average value of (f(x^k) − f*)/(f(x^0) − f*) is plotted against the iteration number for BC-RED with τ ∈ {0.01, 0.1, 1}. The optimal value f* is obtained by running the traditional PGM until convergence. As before, the figure groups b updates of BC-RED as a single iteration. 
The results are consistent with our theoretical analysis and show that as τ increases BC-RED provides an increasingly accurate solution to TV. On the other hand, since the range of admissible values for the step-size γ depends on τ, the speed of convergence to f∗ is also influenced by τ.

[Figure 2: Left: ‖G(x^k)‖²/‖G(x^0)‖² against the iteration number for PGM, RED, BC-RED (i.i.d.), and BC-RED (epoch) on the Radon matrix (30 dB). Right: (f(x^k) − f∗)/(f(x^0) − f∗) against the iteration number for BC-RED with τ ∈ {1, 0.1, 0.01} on the Fourier matrix (40 dB).]

Figure 3: Recovery of an 8292 × 8364 pixel galaxy image degraded by a spatially variant blur and a high amount of AWGN. The efficacy of BC-RED is due to the natural sparsity in this large-scale problem, with all of the information contained in a small part of the full image.

The benefits of the full-gradient RED algorithms have been well discussed in prior work [20–24]. Table 1 summarizes the average SNR performance of BC-RED in comparison to the full-gradient RED for all three matrix types and several priors. Unlike the full-gradient RED, BC-RED is implemented using block-wise denoisers that operate on image patches rather than on the full images. We empirically found that a 40-pixel padding on the denoiser input is sufficient for BC-RED to match the performance of the full-gradient RED. The table also includes the results for the traditional PGM with TV [60] and the widely used end-to-end U-Net approach [61, 62]. The latter first backprojects the measurements into the image domain and then denoises the result using U-Net [63]. The model was specifically trained end-to-end for the Radon matrix with 30 dB noise and applied as such to the other measurement settings. All the algorithms were run until convergence with hyperparameters optimized for SNR. The DnCNN∗ denoiser in the table corresponds to the residual network with the Lipschitz constant of two (see [30] for details). The overall best SNR in the table is highlighted in bold-italic, while the best RED prior is highlighted in light-green. First, note the excellent agreement between BC-RED and the full-gradient RED. This close agreement between the two methods is encouraging: BC-RED relies on block-wise denoising and our analysis does not establish uniqueness of the solution, yet, in practice, both methods seem to yield solutions of nearly identical quality.
Second, note that BC-RED and RED provide excellent approximations to PGM-TV solutions. Third, note how (unlike U-Net) BC-RED and RED with DnCNN∗ generalize to different measurement models. Finally, no prior seems to be universally good on all measurement settings, which points to the potential benefit of tailoring specific priors to specific measurement models.

Coordinate descent methods are known to be highly beneficial in problems where both m and n are very large, but each measurement depends only on a small subset of the unknowns [64]. Fig. 3 demonstrates BC-RED in such a large-scale setting by adopting the experimental setup from a recent work [65] (see also [30] for additional simulations). Specifically, we consider the recovery of an 8292 × 8364 pixel galaxy image degraded by 597 known point spread functions (PSFs) corresponding to different spatial locations. The natural sparsity of the problem makes it ideal for BC-RED, which is implemented to update 41 × 41 pixel blocks in a randomized fashion by picking only areas containing galaxies. The computational complexity of BC-RED is further reduced by considering a simpler variant of DnCNN∗ that has only four convolutional layers (see [30] for additional details). For comparison, we additionally show the result obtained by using the low-rank recovery method from [65] with all the parameters kept at the values set by the authors. Note that our intent here is not to justify DnCNN∗ as a prior for image deblurring, but to demonstrate that BC-RED can indeed be applied to a realistic, nontrivial image recovery task on a large image.

6 Conclusion and Future Work

Coordinate descent methods have become increasingly important in optimization for solving large-scale problems arising in data analysis.
We have introduced BC-RED as a coordinate descent extension to the current family of RED algorithms and theoretically analyzed its convergence. Preliminary experiments suggest that BC-RED can be an effective tool in large-scale estimation problems arising in image recovery. More experiments are certainly needed to better assess the promise of this approach in various estimation tasks. For future work, we would like to explore accelerated and asynchronous variants of BC-RED to further enhance its performance in parallel settings.

Acknowledgments

This material is based upon work supported in part by NSF award CCF-1813910 and by NVIDIA Corporation with the donation of the Titan Xp GPU for research. The authors thank B. Wohlberg as well as anonymous reviewers for insightful comments.

References

[1] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. IEEE Global Conf. Signal Process. and Inf. Process. (GlobalSIP), 2013.

[2] N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends in Optimization, vol. 1, no. 3, pp. 123–231, 2014.

[3] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, August 2007.

[4] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, July 2017.

[5] A. Danielyan, V. Katkovnik, and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1715–1728, April 2012.

[6] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Trans. Comp. Imag., vol.
3, no. 1, pp. 84\u201398, March 2017.\n\n[7] Sreehari et al., \u201cPlug-and-play priors for bright \ufb01eld electron tomography and sparse interpolation,\u201d IEEE\n\nTrans. Comp. Imag., vol. 2, no. 4, pp. 408\u2013423, December 2016.\n\n[8] S. Ono, \u201cPrimal-dual plug-and-play image restoration,\u201d IEEE Signal Process. Lett., vol. 24, no. 8, pp.\n\n1108\u20131112, 2017.\n\n[9] U. S. Kamilov, H. Mansour, and B. Wohlberg, \u201cA plug-and-play priors approach for solving nonlinear\nimaging inverse problems,\u201d IEEE Signal Process. Lett., vol. 24, no. 12, pp. 1872\u20131876, December 2017.\n[10] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers, \u201cLearning proximal operators: Using denoising\nnetworks for regularizing inverse imaging problems,\u201d in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2017.\n[11] K. Zhang, W. Zuo, S. Gu, and L. Zhang, \u201cLearning deep CNN denoiser prior for image restoration,\u201d in\n\nProc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017.\n\n[12] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, \u201cPlug-and-play unplugged: Optimization free\nreconstruction using consensus equilibrium,\u201d SIAM J. Imaging Sci., vol. 11, no. 3, pp. 2001\u20132020, 2018.\n[13] Y. Sun, B. Wohlberg, and U. S. Kamilov, \u201cAn online plug-and-play algorithm for regularized image\n\nreconstruction,\u201d IEEE Trans. Comput. Imaging, 2019.\n\n[14] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, \u201cA convergent image fusion algorithm\nusing scene-adapted Gaussian-mixture-based denoising,\u201d IEEE Trans. Image Process., vol. 28, no. 1, pp.\n451\u2013463, Jan. 2019.\n\n[15] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, \u201cPlug-and-play methods provably converge\n\nwith properly trained denoisers,\u201d in Proc. 36th Int. Conf. Machine Learning (ICML), 2019.\n\n[16] J. Tan, Y. Ma, and D. 
Baron, “Compressive imaging via approximate message passing with image denoising,” IEEE Trans. Signal Process., vol. 63, no. 8, pp. 2085–2092, Apr. 2015.

[17] C. A. Metzler, A. Maleki, and R. G. Baraniuk, “From denoising to compressed sensing,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 5117–5144, September 2016.

[18] C. A. Metzler, A. Maleki, and R. Baraniuk, “BM3D-PRGAMP: Compressive phase retrieval based on BM3D denoising,” in Proc. IEEE Int. Conf. Image Proc., 2016.

[19] A. Fletcher, S. Rangan, S. Sarkar, and P. Schniter, “Plug-in estimation in high-dimensional linear inverse problems: A rigorous analysis,” in Proc. Advances in Neural Information Processing Systems 32, 2018.

[20] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imaging Sci., vol. 10, no. 4, pp. 1804–1844, 2017.

[21] S. A. Bigdeli, M. Jin, P. Favaro, and M. Zwicker, “Deep mean-shift priors for image restoration,” in Proc. Advances in Neural Information Processing Systems 31, 2017.

[22] E. T. Reehorst and P. Schniter, “Regularization by denoising: Clarifications and new interpretations,” IEEE Trans. Comput. Imag., vol. 5, no. 1, pp. 52–67, Mar. 2019.

[23] C. A. Metzler, P. Schniter, A. Veeraraghavan, and R. G. Baraniuk, “prDeep: Robust phase retrieval with a flexible deep network,” in Proc. 35th Int. Conf. Machine Learning (ICML), 2018.

[24] G. Mataev, M. Elad, and P. Milanfar, “DeepRED: Deep image prior powered by RED,” in Proc. IEEE Int. Conf. Comp. Vis. Workshops (ICCVW), 2019.

[25] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” J. Optimiz. Theory App., vol. 109, no. 3, pp. 475–494, June 2001.

[26] Y.
Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM J. Optim., vol. 22, no. 2, pp. 341–362, 2012.

[27] A. Beck and L. Tetruashvili, “On the convergence of block coordinate descent type methods,” SIAM J. Optim., vol. 23, no. 4, pp. 2037–2060, Oct. 2013.

[28] S. J. Wright, “Coordinate descent algorithms,” Math. Program., vol. 151, no. 1, pp. 3–34, June 2015.

[29] O. Fercoq and A. Gramfort, “Coordinate descent methods,” Lecture notes, Optimization for Data Science, École polytechnique, 2018.

[30] Y. Sun, J. Liu, and U. S. Kamilov, “Block coordinate regularization by denoising,” May 2019, arXiv:1905.05113 [cs.CV].

[31] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, no. 1–4, pp. 259–268, November 1992.

[32] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. R. Stat. Soc. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.

[33] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, February 2006.

[34] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.

[35] M. A. T. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based image restoration,” IEEE Trans. Image Process., vol. 12, no. 8, pp. 906–916, August 2003.

[36] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, November 2004.

[37] J. Bect, L. Blanc-Feraud, G. Aubert, and A.
Chambolle, “A ℓ1-unified variational framework for image restoration,” in Proc. ECCV, New York: Springer, 2004, vol. 3024, pp. 1–13.

[38] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[39] J. Eckstein and D. P. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, vol. 55, pp. 293–318, 1992.

[40] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Fast image recovery using variable splitting and constrained optimization,” IEEE Trans. Image Process., vol. 19, no. 9, pp. 2345–2356, September 2010.

[41] M. K. Ng, P. Weiss, and X. Yuan, “Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods,” SIAM J. Sci. Comput., vol. 32, no. 5, pp. 2710–2736, August 2010.

[42] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[43] Z. Peng, T. Wu, Y. Xu, M. Yan, and W. Yin, “Coordinate-friendly structures, algorithms and applications,” Adv. Math. Sci. Appl., vol. 1, no. 1, pp. 57–119, Apr. 2016.

[44] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, December 2006.

[45] A. Buades, B. Coll, and J. M. Morel, “Image denoising methods. A new nonlocal principle,” SIAM Rev., vol. 52, no. 1, pp. 113–147, 2010.

[46] D. Zoran and Y.
Weiss, “From learning models of natural image patches to whole image restoration,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), 2011.

[47] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2nd edition, 2017.

[48] E. K. Ryu and S. Boyd, “A primer on monotone operator methods,” Appl. Comput. Math., vol. 15, no. 1, pp. 3–43, 2016.

[49] Z. Peng, Y. Xu, M. Yan, and W. Yin, “ARock: An algorithmic framework for asynchronous parallel coordinate updates,” SIAM J. Sci. Comput., vol. 38, no. 5, pp. A2851–A2879, 2016.

[50] Y. T. Chow, T. Wu, and W. Yin, “Cyclic coordinate-update algorithms for fixed-point problems: Analysis and applications,” SIAM J. Sci. Comput., vol. 39, no. 4, pp. A1280–A1300, 2017.

[51] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in International Conference on Learning Representations (ICLR), 2018.

[52] H. Sedghi, V. Gupta, and P. M. Long, “The singular values of convolutional layers,” in International Conference on Learning Representations (ICLR), 2019.

[53] H. Gouk, E. Frank, B. Pfahringer, and M. Cree, “Regularisation of neural networks by enforcing Lipschitz continuity,” 2018, arXiv:1804.04368.

[54] J. J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bull. Soc. Math. France, vol. 93, pp. 273–299, 1965.

[55] R. T. Rockafellar and R. J-B Wets, Variational Analysis, Springer, 1998.

[56] Y.-L. Yu, “Better approximation and faster algorithm using the proximal average,” in Proc. Advances in Neural Information Processing Systems 26, 2013.

[57] A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE, 1988.

[58] F. Knoll, K. Bredies, T. Pock, and R.
Stollberger, “Second order total generalized variation (TGV) for MRI,” Magn. Reson. Med., vol. 65, no. 2, pp. 480–491, February 2011.

[59] J. Zbontar et al., “fastMRI: An open dataset and benchmarks for accelerated MRI,” 2018, arXiv:1811.08839.

[60] A. Beck and M. Teboulle, “Fast gradient-based algorithm for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process., vol. 18, no. 11, pp. 2419–2434, November 2009.

[61] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, Sept. 2017.

[62] Y. S. Han, J. Yoo, and J. C. Ye, “Deep learning with domain adaptation for accelerated projection reconstruction MR,” Magn. Reson. Med., vol. 80, no. 3, pp. 1189–1205, Sept. 2017.

[63] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.

[64] F. Niu, B. Recht, C. Ré, and S. J. Wright, “Hogwild!: A lock-free approach to parallelizing stochastic gradient descent,” in Proc. Advances in Neural Information Processing Systems 24, 2011.

[65] S. Farrens, F. M. Ngolè Mboula, and J.-L. Starck, “Space variant deconvolution of galaxy survey images,” A&A, vol. 601, p. A66, 2017.