{"title": "Adversarial Regularizers in Inverse Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 8507, "page_last": 8516, "abstract": "Inverse Problems in medical imaging and computer vision are traditionally solved using purely model-based methods. Among those variational regularization models are one of the most popular approaches. We propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional. The network learns to discriminate between the distribution of ground truth images and the distribution of unregularized reconstructions. Once trained, the network is applied to the inverse problem by solving the corresponding variational problem. Unlike other data-based approaches for inverse problems, the algorithm can be applied even if only unsupervised training data is available. Experiments demonstrate the potential of the framework for denoising on the BSDS dataset and for computer tomography reconstruction on the LIDC dataset.", "full_text": "Adversarial Regularizers in Inverse Problems\n\nSebastian Lunz\n\nDAMTP\n\nUniversity of Cambridge\nCambridge CB3 0WA\nlunz@math.cam.ac.uk\n\nOzan \u00d6ktem\n\nDepartment of Mathematics\n\nKTH - Royal Institute of Technology\n\n100 44 Stockholm\n\nozan@kth.se\n\nAbstract\n\nCarola-Bibiane Sch\u00f6nlieb\n\nDAMTP\n\nUniversity of Cambridge\nCambridge CB3 0WA\ncbs31@cam.ac.uk\n\nInverse Problems in medical imaging and computer vision are traditionally solved\nusing purely model-based methods. Among those variational regularization models\nare one of the most popular approaches. We propose a new framework for applying\ndata-driven approaches to inverse problems, using a neural network as a regular-\nization functional. The network learns to discriminate between the distribution of\nground truth images and the distribution of unregularized reconstructions. 
Once\ntrained, the network is applied to the inverse problem by solving the corresponding\nvariational problem. Unlike other data-based approaches for inverse problems,\nthe algorithm can be applied even if only unsupervised training data is available.\nExperiments demonstrate the potential of the framework for denoising on the BSDS\ndataset and for computed tomography reconstruction on the LIDC dataset.\n\n1\n\nIntroduction\n\nInverse problems naturally occur in many applications in computer vision and medical imaging. A\nsuccessful classical approach relies on the concept of variational regularization [11, 24]. It combines\nknowledge about how data is generated in the forward operator with a regularization functional that\nencodes prior knowledge about the image to be reconstructed.\nThe success of neural networks in many computer vision tasks has motivated attempts at using deep\nlearning to achieve better performance in solving inverse problems [15, 2, 25]. A major dif\ufb01culty\nis the ef\ufb01cient usage of knowledge about the forward operator and noise model in such data driven\napproaches, avoiding the necessity to relearn the physical model structure.\nThe framework considered here aims to solve this by using neural networks as part of variational\nregularization, replacing the typically hand-crafted regularization functional with a neural network.\nAs classical learning methods for regularization functionals do not scale to the high dimensional\nparameter spaces needed for neural networks, we propose a new training algorithm for regularization\nfunctionals. It is based on the ideas in Wasserstein generative adversarial models [5], training the\nnetwork as a critic to tell apart ground truth images from unregularized reconstructions.\nOur contributions are as follows:\n\n1. 
We introduce the idea of learning a regularization functional given by a neural network, combining the advantages of the variational formulation for inverse problems with data-driven approaches.

2. We propose a training algorithm for regularization functionals which scales to high-dimensional parameter spaces.

3. We show desirable theoretical properties of the regularization functionals obtained this way.

4. We demonstrate the performance of the algorithm for denoising and computed tomography.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

2 Background

2.1 Inverse Problems in Imaging

Let X and Y be reflexive Banach spaces. In a generic inverse problem in imaging, the image x \in X is recovered from the measurement y \in Y, where

y = Ax + e.   (1)

A : X \to Y denotes the linear forward operator and e \in Y is a random noise term. Typical tasks in computer vision that can be phrased as inverse problems include denoising, where A is the identity operator, or inpainting, where A is given by a projection operator onto the complement of the inpainting domain. In medical imaging, common forward operators are the Fourier transform in magnetic resonance imaging (MRI) and the ray transform in computed tomography (CT).

2.2 Deep Learning in Inverse Problems

One approach to solving (1) using deep learning is to directly learn the mapping y \to x with a neural network. While this has been observed to work well for denoising and inpainting [28], the approach can become infeasible in inverse problems involving forward operators with a more complicated structure [4] and when only very limited training data is available. This is typically the case in applications in medical imaging.
Other approaches have been developed to tackle inverse problems with complex forward operators. 
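As a concrete illustration of the measurement model (1), the following NumPy sketch simulates measurements for the two computer vision examples mentioned above; the signal size, noise level and inpainting mask are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth image x, flattened to a vector for simplicity (size is arbitrary).
x = rng.random(64)

# Denoising: the forward operator A is the identity.
A_denoise = np.eye(64)

# Inpainting: A projects onto the complement of the inpainting domain,
# i.e. it zeroes out the pixels that are to be inpainted.
mask = np.ones(64)
mask[20:30] = 0.0
A_inpaint = np.diag(mask)

# Measurement y = A x + e with Gaussian noise e, as in (1).
sigma = 0.1
y = A_denoise @ x + sigma * rng.standard_normal(64)
```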
In [15] an algorithm has been suggested that first applies a pseudo-inverse to the operator A, leading to a noisy reconstruction. This result is then denoised using deep learning techniques. Other approaches [1, 14, 25] propose applying a neural network iteratively. Learning proximal operators for solving inverse problems is a further direction of research [2, 19].

2.3 Variational regularization

Variational regularization is a well-established model-based method for solving inverse problems. Given a single measurement y, the image x is recovered by solving

argmin_x \|Ax - y\|_2^2 + \lambda f(x),   (2)

where the data term \|Ax - y\|_2^2 ensures consistency of the reconstruction x with the measurement y and the regularization functional f : X \to \mathbb{R} allows us to insert prior knowledge about the solution x. The functional f is usually hand-crafted, with typical choices including total variation (TV) [23], which leads to piecewise constant images, and total generalized variation (TGV) [16], generating piecewise linear images.

3 Learning a regularization functional

In this paper, we design a regularization functional based on training data. We fix a priori a class of admissible regularization functionals F and then learn the choice of f \in F from data. Existing approaches to learning regularization functionals are based on the idea that f should be chosen such that a solution to the variational problem

argmin_x \|Ax - y\|_2^2 + \lambda f(x)   (3)

best approximates the true solution. Given training samples (x_j, y_j), identifying f using this method requires one to solve the bilevel optimization problem [17, 9]

argmin_{f \in F} \sum_j \|\tilde{x}_j - x_j\|^2, subject to \tilde{x}_j \in argmin_x \|Ax - y_j\|_2^2 + f(x).   (4)

But this is computationally feasible only for small sets of admissible functions F. 
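The scaling problem can be made concrete on a toy example. For denoising (A = I) and the assumed one-parameter family f_\theta(x) = \theta \|x\|^2, the inner problem of (4) has the closed form \tilde{x}_j = y_j / (1 + \theta), so the bilevel problem reduces to a one-dimensional search; with more than a handful of parameters this brute-force approach already breaks down. All constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised pairs (x_j, y_j) for denoising, y_j = x_j + noise.
n, d, sigma = 50, 32, 0.3
xs = rng.standard_normal((n, d))
ys = xs + sigma * rng.standard_normal((n, d))

# One-parameter family f_theta(x) = theta * ||x||^2. The inner problem
# argmin_x ||x - y||^2 + theta ||x||^2 has the closed form x = y / (1 + theta),
# so the outer (bilevel) objective can be evaluated directly.
def outer_loss(theta):
    x_tilde = ys / (1.0 + theta)
    return np.sum((x_tilde - xs) ** 2)

# Brute-force search over the single parameter -- feasible here, hopeless
# for the millions of parameters of a neural network.
thetas = np.linspace(0.0, 2.0, 201)
best = min(thetas, key=outer_loss)
```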
In particular, it does not scale to sets F parametrized by some high-dimensional space \Theta.
We hence apply a novel technique for learning the regularization functional f \in F that scales to high-dimensional parameter spaces. It is based on the idea of learning to discriminate between noisy and ground truth images.
In particular, we consider approaches where the regularization functional is given by a neural network \Psi_\Theta with network parameters \Theta. In this setting, the class F is given by the functions that can be parametrized by the network architecture of \Psi for some choice of parameters \Theta. Once \Theta is fixed, the inverse problem (1) is solved by

argmin_x \|Ax - y\|_2^2 + \lambda \Psi_\Theta(x).   (5)

3.1 Regularization functionals as critics

Denote by x_i \in X independent samples from the distribution of ground truth images P_r and by y_i \in Y independent samples from the distribution of measurements P_Y. Note that we only use samples from both marginals of the joint distribution P_{X \times Y} of images and measurements, i.e. we are in the setting of unsupervised learning.
The distribution P_Y on measurement space can be mapped to a distribution on image space by applying a (potentially regularized) pseudo-inverse A_\delta^\dagger. In [15] it has been shown that such an inverse can in fact be computed efficiently for a large class of forward operators. This in particular includes Fourier and ray transforms occurring in MRI and CT. Let

P_n = (A_\delta^\dagger)_\# P_Y

be the distribution obtained this way. Here, \# denotes the push-forward of measures, i.e. A_\delta^\dagger Y \sim (A_\delta^\dagger)_\# P_Y for Y \sim P_Y. 
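In code, samples from P_n are obtained by pushing measurement samples through the regularized pseudo-inverse. A sketch with an assumed random toy forward matrix and a Tikhonov-type inverse A_\delta^+ = (A^T A + \delta I)^{-1} A^T (all dimensions and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy forward operator A : R^16 -> R^12 (undersampled, like sparse CT).
d_x, d_y, delta, sigma = 16, 12, 1e-2, 0.05
A = rng.standard_normal((d_y, d_x)) / np.sqrt(d_x)

# Regularized pseudo-inverse A_delta^+ = (A^T A + delta I)^{-1} A^T.
A_pinv = np.linalg.solve(A.T @ A + delta * np.eye(d_x), A.T)

# Push-forward: applying A_delta^+ to samples of P_Y yields samples of P_n.
x_true = rng.standard_normal((100, d_x))                    # samples from P_r
y = x_true @ A.T + sigma * rng.standard_normal((100, d_y))  # samples from P_Y
x_noisy = y @ A_pinv.T                                      # samples from P_n
```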
Samples drawn from P_n will be corrupted with noise that depends both on the noise model e and on the operator A.
A good regularization functional \Psi_\Theta is able to tell apart the distributions P_r and P_n, taking high values on typical samples of P_n and low values on typical samples of P_r [7]. It is thus clear that

E_{X \sim P_r}[\Psi_\Theta(X)] - E_{X \sim P_n}[\Psi_\Theta(X)]

being small is desirable. With this in mind, we choose the loss functional for learning the regularizer to be

E_{X \sim P_r}[\Psi_\Theta(X)] - E_{X \sim P_n}[\Psi_\Theta(X)] + \lambda \cdot E\left[ \left( \|\nabla_x \Psi_\Theta(X)\| - 1 \right)_+^2 \right].   (6)

The last term in the loss functional serves to enforce Lipschitz continuity of the trained network \Psi_\Theta with constant one [13]. The expected value in this term is taken over all lines connecting samples in P_n and P_r.
Training a neural network as a critic was first proposed in the context of generative modeling in [12]. The particular choice of loss functional has been introduced in [5] to train a critic that captures the Wasserstein distance between the distributions P_r and P_n. 
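A Monte Carlo estimate of the loss (6) can be written down directly. The sketch below substitutes an assumed tiny one-hidden-layer network for \Psi_\Theta so that its gradient with respect to the input can be written analytically; the one-sided gradient penalty is evaluated at points on the lines connecting paired samples, as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in critic Psi(x) = <v, tanh(W x)> with analytic input gradient.
d, h = 4, 16
W = rng.standard_normal((h, d)) / np.sqrt(d)
v = rng.standard_normal(h) / np.sqrt(h)

def psi(x):                      # x: (n, d) -> (n,)
    return np.tanh(x @ W.T) @ v

def grad_psi(x):                 # gradient of psi w.r.t. each row of x: (n, d)
    s = 1.0 - np.tanh(x @ W.T) ** 2
    return (s * v) @ W

def critic_loss(x_real, x_noisy, mu=10.0):
    """Monte Carlo estimate of (6): E_Pr[Psi] - E_Pn[Psi] + gradient penalty."""
    eps = rng.uniform(size=(len(x_real), 1))
    x_i = eps * x_real + (1.0 - eps) * x_noisy   # points on connecting lines
    grad_norms = np.linalg.norm(grad_psi(x_i), axis=1)
    penalty = np.maximum(grad_norms - 1.0, 0.0) ** 2   # one-sided penalty
    return np.mean(psi(x_real) - psi(x_noisy) + mu * penalty)

x_real = rng.standard_normal((32, d))
x_noisy = x_real + 0.5 * rng.standard_normal((32, d))
loss = critic_loss(x_real, x_noisy)
```

Training then consists of minimizing this loss over the network parameters with a stochastic optimizer such as Adam, as in Algorithm 1.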
A minimizer to (6) approximates a maximizer f of the Kantorovich formulation of optimal transport [26],

Wass(P_r, P_n) = sup_{f \in 1\text{-Lip}} E_{X \sim P_n}[f(X)] - E_{X \sim P_r}[f(X)].   (7)

Relaxing the hard Lipschitz constraint in (7) into a penalty term as in (6) was proposed in [13]. Tracking the gradients of \Psi_\Theta in our experiments demonstrates that this way the Lipschitz constraint can in fact be enforced up to a small error.

Algorithm 1 Learning a regularization functional
Require: Gradient penalty coefficient \mu, batch size m, Adam hyperparameters \alpha, inverse A_\delta^+
  while \Theta has not converged do
    for i \in 1, ..., m do
      Sample ground truth image x_r \sim P_r, measurement y \sim P_Y and random number \epsilon \sim U[0, 1]
      x_n \leftarrow A_\delta^+ y
      x_i \leftarrow \epsilon x_r + (1 - \epsilon) x_n
      L_i \leftarrow \Psi_\Theta(x_r) - \Psi_\Theta(x_n) + \mu (\|\nabla_{x_i} \Psi_\Theta(x_i)\| - 1)_+^2
    end for
    \Theta \leftarrow Adam(\nabla_\Theta \sum_{i=1}^m L_i, \alpha)
  end while

Algorithm 2 Applying a learned regularization functional with gradient descent
Require: Learned regularization functional \Psi_\Theta, measurements y, regularization weight \lambda, step size \epsilon, operator A, inverse A_\delta^+, stopping criterion S
  x \leftarrow A_\delta^+ y
  while S not satisfied do
    x \leftarrow x - \epsilon \nabla_x [\|Ax - y\|_2^2 + \lambda \Psi_\Theta(x)]
  end while
  return x

In the proposed algorithm, gradient descent is used to solve (5). As the neural network is in general non-convex, convergence to a global optimum cannot be guaranteed. However, stable convergence to a critical point has been observed in practice. 
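A minimal sketch of this gradient-descent reconstruction for denoising (A the identity): since no trained network is available here, a smoothed one-dimensional total variation stands in for \Psi_\Theta, and the step size, regularization weight and test signal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Smooth stand-in for the trained regularizer Psi_Theta: smoothed 1-D TV.
beta = 1e-2

def psi(x):
    return np.sum(np.sqrt(np.diff(x) ** 2 + beta))

def grad_psi(x):
    d = np.diff(x)                   # differences d_i = x_{i+1} - x_i
    w = d / np.sqrt(d ** 2 + beta)
    g = np.zeros_like(x)
    g[:-1] -= w
    g[1:] += w
    return g

def reconstruct(y, A, lam=0.5, step=0.05, n_iter=500):
    """Gradient descent on ||Ax - y||^2 + lam * psi(x), as in Algorithm 2."""
    x = y.copy()                     # initialization x <- A_delta^+ y (A = I here)
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y) + lam * grad_psi(x)
        x = x - step * grad          # x <- x - eps * grad of the objective
    return x

x_true = np.repeat([0.0, 1.0], 32)   # piecewise constant test signal
y = x_true + 0.2 * rng.standard_normal(64)
x_rec = reconstruct(y, np.eye(64))
```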
More sophisticated algorithms like momentum methods or a forward-backward splitting of data term and regularization functional can be applied [10].

3.2 Distributional Analysis

Here we analyze the impact of the learned regularization functional on the induced image distribution. More precisely, given a noisy image x drawn from P_n, consider the image obtained by performing a step of gradient descent of size \eta over the regularization functional \Psi_\Theta,

g_\eta(x) := x - \eta \cdot \nabla_x \Psi_\Theta(x).   (8)

This yields a distribution P_\eta := (g_\eta)_\# P_n of noisy images that have undergone one step of gradient descent. We show that this distribution is closer in Wasserstein distance to the distribution of ground truth images P_r than the noisy image distribution P_n. The regularization functional hence introduces the highly desirable incentive to align the distribution of minimizers of the regularization problem (5) with the distribution of ground truth images.
Henceforth, assume the network \Psi_\Theta has been trained to perfection, i.e. that it is a 1-Lipschitz function which achieves the supremum in (7). Furthermore, assume \Psi_\Theta is almost everywhere differentiable with respect to the measure P_n.

Theorem 1. Assume that \eta \mapsto Wass(P_r, P_\eta) admits a left and a right derivative at \eta = 0, and that they are equal. Then,

\frac{d}{d\eta} Wass(P_r, P_\eta)\big|_{\eta=0} = -E_{X \sim P_n}\left[ \|\nabla_x \Psi_\Theta(X)\|^2 \right].

Proof. The proof follows [5, Theorem 3]. By an envelope theorem [20, Theorem 1], the existence of the derivative at \eta = 0 implies

\frac{d}{d\eta} Wass(P_r, P_\eta)\big|_{\eta=0} = \frac{d}{d\eta} E_{X \sim P_n}[\Psi_\Theta(g_\eta(X))]\big|_{\eta=0}.   (9)

On the other hand, for a.e. x \in X one can bound

\left| \frac{d}{d\eta} \Psi_\Theta(g_\eta(x)) \right| = |\langle \nabla_x \Psi_\Theta(g_\eta(x)), \nabla_x \Psi_\Theta(x) \rangle| \le \|\nabla_x \Psi_\Theta(g_\eta(x))\| \cdot \|\nabla_x \Psi_\Theta(x)\| \le 1,   (10)

for any \eta \in \mathbb{R}. Hence, in particular the difference quotient is bounded,

\left| \frac{1}{\eta} [\Psi_\Theta(g_\eta(x)) - \Psi_\Theta(x)] \right| \le 1   (11)

for any x and \eta. By dominated convergence, this allows us to conclude

\frac{d}{d\eta} E_{X \sim P_n}[\Psi_\Theta(g_\eta(X))]\big|_{\eta=0} = E_{X \sim P_n} \frac{d}{d\eta}[\Psi_\Theta(g_\eta(X))]\big|_{\eta=0}.   (12)

Finally,

\frac{d}{d\eta}[\Psi_\Theta(g_\eta(X))]\big|_{\eta=0} = -\|\nabla_x \Psi_\Theta(X)\|^2.   (13)

Remark 1. Under the weak assumptions in [13, Corollary 1], we have \|\nabla_x \Psi_\Theta(x)\| = 1 for P_n-a.e. x \in X. This allows us to compute the rate of decay of the Wasserstein distance explicitly as

\frac{d}{d\eta}[\Psi_\Theta(g_\eta(X))]\big|_{\eta=0} = -1.   (14)

Note that the above calculations also show that the particular choice of loss functional is optimal in terms of decay rates of the Wasserstein distance, introducing the strongest incentive to align the distribution of reconstructions with the ground truth distribution amongst all regularization functionals. To make this more precise, consider any other regularization functional f : X \to \mathbb{R} with norm-bounded gradients, i.e. \|\nabla f(x)\| \le 1.

Corollary 1. Denote by \tilde{g}_\eta(x) = x - \eta \cdot \nabla f(x) the flow associated to f. 
Set \tilde{P}_\eta := (\tilde{g}_\eta)_\# P_n. Then

\frac{d}{d\eta} Wass(P_r, \tilde{P}_\eta)\big|_{\eta=0} \ge -1 = \frac{d}{d\eta} Wass(P_r, P_\eta)\big|_{\eta=0}.

Proof. An analogous computation as above shows

\frac{d}{d\eta} Wass(P_r, \tilde{P}_\eta)\big|_{\eta=0} = -E_{X \sim P_n}[\langle \nabla_x \Psi_\Theta(X), \nabla_x f(X) \rangle] \ge -1 = -E_{X \sim P_n}\left[ \|\nabla_x \Psi_\Theta(X)\|^2 \right].   (15)

3.3 Analysis under data manifold assumption

Here we discuss which form of regularization functional is desirable under the data manifold assumption and show that the loss function (6) in fact gives rise to a regularization functional of this particular form.

Assumption 1 (Weak Data Manifold Assumption). Assume the measure P_r is supported on the weakly compact set M, i.e. P_r(M^c) = 0.

This assumption captures the intuition that real data lies in a curved lower-dimensional subspace of X.
If we consider the regularization functional as encoding prior knowledge about the image distribution, it follows that we would like the regularizer to penalize images which are far away from M. An extreme way of doing this would be to set the regularization functional to be the characteristic function of M. However, this choice of functional comes with two major disadvantages: First, solving (5) with methods based on gradient descent becomes impossible when using such a regularization functional. Second, the functional effectively leads to a projection onto the data manifold, possibly causing artifacts due to imperfect knowledge of M [8].
An alternative to consider is the distance function to the data manifold d(x, M), since such a choice provides meaningful gradients everywhere. This is implicitly done in [21]. 
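The distance function d(x, M) is easy to explore numerically when M is approximated by a finite point cloud; the unit circle below is an assumed stand-in for a curved low-dimensional data manifold, and the check at the end verifies the 1-Lipschitz property numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate the data manifold M by a finite point cloud: here an assumed
# unit circle in R^2 standing in for a curved low-dimensional manifold.
angles = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
manifold = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def d_M(x):
    """Distance to the manifold, d_M(x) = min_{y in M} ||x - y||."""
    return np.min(np.linalg.norm(manifold - x, axis=1))

# Numerical check of the 1-Lipschitz property |d_M(a) - d_M(b)| <= ||a - b||.
a_pts = 2.0 * rng.standard_normal((100, 2))
b_pts = 2.0 * rng.standard_normal((100, 2))
is_lipschitz = all(
    abs(d_M(a) - d_M(b)) <= np.linalg.norm(a - b) + 1e-12
    for a, b in zip(a_pts, b_pts)
)
```

Since d_M is a minimum over the 1-Lipschitz functions x -> ||x - y||, the check passes for any point cloud, not just this one.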
In Theorem 2, we show that our chosen loss function in fact does give rise to a regularization functional \Psi_\Theta taking the desirable form of the \ell^2 distance function to M.
Denote by

P_M : D \to M, \quad x \mapsto argmin_{y \in M} \|x - y\|,   (16)

the data manifold projection, where D denotes the set of points for which such a projection exists. We assume P_n(D) = 1. This can be guaranteed under weak assumptions on M and P_n.

Assumption 2. Assume the measures P_r and P_n satisfy

(P_M)_\# P_n = P_r,   (17)

i.e. for every measurable set A \subset X, we have P_n(P_M^{-1}(A)) = P_r(A).

We hence assume that the distortions of the true data present in the distribution of pseudo-inverses P_n are well-behaved enough to recover the distribution of true images from noisy ones by projecting back onto the manifold. Note that this is much weaker than assuming that any given single image can be recovered by projecting its pseudo-inverse back onto the data manifold. Heuristically, Assumption 2 corresponds to a low-noise assumption.

Theorem 2. Under Assumptions 1 and 2, a maximizer of the functional

sup_{f \in 1\text{-Lip}} E_{X \sim P_n}[f(X)] - E_{X \sim P_r}[f(X)]   (18)

is given by the distance function to the data manifold,

d_M(x) := min_{y \in M} \|x - y\|.   (19)

Proof. We first show that d_M is Lipschitz continuous with Lipschitz constant 1. Let x_1, x_2 \in X be arbitrary and denote by \tilde{y} a minimizer of min_{y \in M} \|x_2 - y\|. Indeed,

d_M(x_1) - d_M(x_2) = min_{y \in M} \|x_1 - y\| - min_{y \in M} \|x_2 - y\| = min_{y \in M} \|x_1 - y\| - \|x_2 - \tilde{y}\| \le \|x_1 - \tilde{y}\| - \|x_2 - \tilde{y}\| \le \|x_1 - x_2\|,

where we used the triangle inequality in the last step. 
This proves Lipschitz continuity by exchanging the roles of x_1 and x_2.
Now, we prove that d_M attains the supremum in (18). Let h be any 1-Lipschitz function. By Assumption 2, one can rewrite

E_{X \sim P_n}[h(X)] - E_{X \sim P_r}[h(X)] = E_{X \sim P_n}[h(X) - h(P_M(X))].   (20)

As h is 1-Lipschitz, this can be bounded via

E_{X \sim P_n}[h(X) - h(P_M(X))] \le E_{X \sim P_n}[\|X - P_M(X)\|].   (21)

The distance between x and P_M(x) is by definition given by d_M(x). This allows us to conclude via

E_{X \sim P_n}[\|X - P_M(X)\|] = E_{X \sim P_n}[d_M(X)] = E_{X \sim P_n}[d_M(X) - d_M(P_M(X))] = E_{X \sim P_n}[d_M(X)] - E_{X \sim P_r}[d_M(X)].

Remark 2 (Non-uniqueness). The functional (18) does not necessarily have a unique maximizer. For example, f can be changed to an arbitrary 1-Lipschitz function outside the convex hull of supp(P_r) \cap supp(P_n).

4 Stability

Following the well-developed stability theory for classical variational problems [11], we derive a stability estimate for the adversarial regularizer algorithm. The key difference to existing theory is that we do not assume the regularization functional f is bounded from below. Instead, this is replaced by a 1-Lipschitz assumption on f.

Theorem 3 (Weak Stability in Data Term). We make Assumption 3. Let y_n be a sequence in Y with y_n \to y in the norm topology and denote by x_n a sequence of minimizers of the functional

argmin_{x \in X} \|Ax - y_n\|^2 + \lambda f(x).

Then x_n has a weakly convergent subsequence and the limit x is a minimizer of \|Ax - y\|^2 + \lambda f(x). The assumptions and the proof are contained in Appendix A.

5 Computational Results

5.1 Parameter estimation

Applying the algorithm to new data requires choosing a regularization parameter \lambda. 
Making the assumption that the ground truth images are critical points of the variational problem (5), \lambda can be estimated efficiently from the noise level, using the fact that the regularization functional has gradients of unit norm. This leads to the formula

\lambda = 2 \, E_{e \sim p_n} \|A^* e\|_2,

where A^* denotes the adjoint of A and p_n the noise distribution. In all experiments, the regularization parameter has been chosen according to this formula without further tuning.

Table 1: Denoising results on the BSDS dataset

Method                                         PSNR (dB)   SSIM
Noisy Image                                    20.3        .534
Total Variation [23] (model-based)             26.3        .836
Denoising N.N. [28] (supervised)               28.8        .908
Adversarial Regularizer (ours, unsupervised)   28.2        .892

Figure 1: Denoising results on BSDS. (a) Ground Truth, (b) Noisy Image, (c) TV, (d) Denoising N.N., (e) Adversarial Reg.

5.2 Denoising

As a toy problem, we compare the performance of total variation denoising [23], a supervised denoising neural network [28] based on the UNet [22] architecture, and our proposed algorithm on images of size 128 × 128 cut out of images taken from the BSDS500 dataset [3]. The images have been corrupted with Gaussian white noise. We report the average peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [27] in Table 1.
The results in Figure 1 show that the adversarial regularizer algorithm is able to outperform classical variational methods in all quality measures. It achieves results of visual quality comparable to supervised data-driven algorithms, without relying on supervised training data.

5.3 Computed Tomography

Computed tomography reconstruction is an application in which the variational approach is very widely used in practice. Here, it serves as a prototype inverse problem with a non-trivial forward operator. 
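The parameter rule from Section 5.1, \lambda = 2 E_{e \sim p_n} \|A^* e\|_2, is straightforward to estimate by Monte Carlo; a sketch for the denoising case (A = A^* = identity) with an assumed Gaussian noise level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of lambda = 2 E ||A* e||_2 for denoising (A* = I)
# with Gaussian noise of assumed level sigma on a 64 x 64 image.
d, sigma, n_samples = 64 * 64, 0.1, 1000
e = sigma * rng.standard_normal((n_samples, d))
lam = 2.0 * np.mean(np.linalg.norm(e, axis=1))   # A* e = e since A = I
```

For a general forward operator, the noise samples e would be drawn from the measurement noise model and mapped through the adjoint A^* before taking norms.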
We compare the performance of total variation [18, 23], post-processing [15], Regularization by Denoising (RED) [21] and our proposed regularizers on the LIDC/IDRI database [6] of lung scans. The denoising algorithm underlying RED has been chosen to be the denoising neural network previously trained for post-processing. Measurements have been simulated by taking the ray transform, corrupted with Gaussian white noise. With 30 different angles taken for the ray transform, the forward operator is undersampled. The code is available online at https://github.com/lunz-s/DeepAdverserialRegulariser.
The results on different noise levels can be found in Table 2 and Figure 2, with further examples in Appendix C. Note in Table 2 that Post-Processing has been trained with PSNR as target loss function. Again, total variation is outperformed by a large margin in all categories. Our reconstructions are of the same or superior visual quality as the ones obtained with supervised machine learning methods, despite having used unsupervised data only.

Table 2: CT reconstruction on the LIDC dataset

(a) High noise
Method                                    PSNR (dB)   SSIM
Filtered Backprojection (model-based)     14.9        .227
Total Variation [18] (model-based)        27.7        .890
Post-Processing [15] (supervised)         31.2        .936
RED [21] (supervised)                     29.9        .904
Adversarial Reg. (ours, unsupervised)     30.5        .927

(b) Low noise
Method                                    PSNR (dB)   SSIM
Filtered Backprojection (model-based)     23.3        .604
Total Variation [18] (model-based)        30.0        .924
Post-Processing [15] (supervised)         33.6        .955
RED [21] (supervised)                     32.8        .947
Adversarial Reg. (ours, unsupervised)     32.5        .946

Figure 2: Reconstruction from simulated CT measurements on the LIDC dataset. (a) Ground Truth, (b) FBP, (c) TV, (d) Post-Processing, (e) Adversarial Reg.

6 Conclusion

We have proposed an algorithm for solving inverse problems, using a neural network as regularization functional. We have introduced a novel training algorithm for regularization functionals and showed that the resulting regularizers have desirable theoretical properties. Unlike other data-based approaches in inverse problems, the proposed algorithm can be trained even if only unsupervised training data is available. This makes it possible to apply the algorithm in situations where, due to a lack of appropriate training data, machine learning methods have not been used yet.
The variational framework enables us to effectively insert knowledge about the forward operator and the noise model into the reconstruction, allowing the algorithm to be trained on little training data. It also comes with the advantages of a well-developed stability theory and the possibility of adapting the algorithm to different noise levels by changing the regularization parameter \lambda, without having to retrain the model from scratch.
The computational results demonstrate the potential of the algorithm, producing reconstructions of the same or even superior visual quality as the ones obtained with supervised approaches on the LIDC dataset, despite the fact that only unsupervised data has been used for training. Classical methods like total variation are outperformed by a large margin.
Our approach is particularly well-suited for applications in medical imaging, where usually very few training samples are available and ground truth images for a particular measurement are hard to obtain, making supervised algorithms impossible to train.

7 Extensions

The algorithm admits some extensions and modifications.

• Local Regularizers. The regularizer is restricted to act on small patches of pixels only, giving the value of the regularization functional by averaging over all patches. This allows us to harvest many training samples from a single image, making the algorithm trainable on even less training data. 
Local Adversarial Regularizers can be implemented by choosing a\nneural network architecture consisting of convolutional layers followed by a global average\npooling.\n\n\u2022 Recursive Training. When applying the regularization functional, the variational problem\nhas to be solved. In this process, the regularization functional is confronted with partially\nreconstructed images, which are neither ground truth images nor exhibit the typical noise\ndistribution the regularization functional has been trained on. By adding these images to the\n\n8\n\n\fsamples the regularization functional is trained on, the neural network is enabled to learn\nfrom its own outputs. First implementations show that this can lead to an additional boost in\nperformance, but that the choice of which images to add is very delicate.\n\n8 Acknowledgments\n\nWe thank Sam Power, Robert Tovey, Matthew Thorpe, Jonas Adler, Erich Kobler, Jo Schlemper,\nChristoph Kehle and Moritz Scham for helpful discussions and advice.\nThe authors acknowledge the National Cancer Institute and the Foundation for the National Institutes\nof Health, and their critical role in the creation of the free publicly available LIDC/IDRI Database\nused in this study. The work by Sebastian Lunz was supported by the EPSRC grant EP/L016516/1\nfor the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis\nand by the Cantab Capital Institute for the Mathematics of Information. The work by Ozan \u00d6ktem\nwas supported by the Swedish Foundation for Strategic Research grant AM13-0049. Carola-Bibiane\nSch\u00f6nlieb acknowledges support from the Leverhulme Trust project on \u2018Breaking the non-convexity\nbarrier\u2019, EPSRC grant Nr. EP/M00483X/1, the EPSRC Centre Nr. EP/N014588/1, the RISE projects\nCHiPS and NoMADS, the Cantab Capital Institute for the Mathematics of Information and the Alan\nTuring Institute.\n\nReferences\n[1] Jonas Adler and Ozan \u00d6ktem. 
Solving ill-posed inverse problems using iterative deep neural\n\nnetworks. Inverse Problems, 33(12), 2017.\n\n[2] Jonas Adler and Ozan \u00d6ktem. Learned primal-dual reconstruction. IEEE Transactions on\n\nMedical Imaging, 37(6):1322\u20131332, 2018.\n\n[3] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and\n\nhierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5).\n\n[4] Maria Argyrou, Dimitris Maintas, Charalampos Tsoumpas, and Efstathios Stiliaris. Tomo-\ngraphic image reconstruction based on arti\ufb01cial neural network (ANN) techniques. In Nuclear\nScience Symposium and Medical Imaging Conference (NSS/MIC). IEEE, 2012.\n\n[5] Mart\u00edn Arjovsky, Soumith Chintala, and L\u00e9on Bottou. Wasserstein generative adversarial\n\nnetworks. International Conference on Machine Learning, ICML, 2017.\n\n[6] Samuel Armato, Geoffrey McLennan, Luc Bidaut, Michael McNitt-Gray, Charles Meyer,\nAnthony Reeves, Binsheng Zhao, Denise Aberle, Claudia Henschke, Eric Hoffman, et al.\nThe lung image database consortium (LIDC) and image database resource initiative (IDRI): a\ncompleted reference database of lung nodules on ct scans. Medical physics, 38(2), 2011.\n\n[7] Martin Benning, Guy Gilboa, Joana Sarah Grah, and Carola-Bibiane Sch\u00f6nlieb. Learning \ufb01lter\nfunctions in regularisers by minimising quotients. In International Conference on Scale Space\nand Variational Methods in Computer Vision. Springer, 2017.\n\n[8] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros Dimakis. Compressed sensing using\n\ngenerative models. arXiv preprint arXiv:1703.03208, 2017.\n\n[9] Luca Calatroni, Chung Cao, Juan Carlos De Los Reyes, Carola-Bibiane Sch\u00f6nlieb, and Tuomo\nValkonen. Bilevel approaches for learning of variational imaging models. RADON book series,\n8, 2012.\n\n[10] Antonin Chambolle and Thomas Pock. 
An introduction to continuous optimization for imaging.\n\nActa Numerica, 25, 2016.\n\n[11] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems,\n\nvolume 375. Springer Science & Business Media, 1996.\n\n[12] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil\nOzair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural\ninformation processing systems (NIPS), 2014.\n\n[13] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville.\nImproved training of wasserstein GANs. Advances in Neural Information Processing Systems\n(NIPS), 2017.\n\n9\n\n\f[14] Kerstin Hammernik, Teresa Klatzer, Erich Kobler, Michael Recht, Daniel Sodickson, Thomas\nPock, and Florian Knoll. Learning a variational network for reconstruction of accelerated MRI\ndata. Magnetic resonance in medicine, 79(6), 2018.\n\n[15] Kyong Hwan Jin, Michael McCann, Emmanuel Froustey, and Michael Unser. Deep convolu-\ntional neural network for inverse problems in imaging. IEEE Transactions on Image Processing,\n26(9), 2017.\n\n[16] Florian Knoll, Kristian Bredies, Thomas Pock, and Rudolf Stollberger. Second order total\n\ngeneralized variation (TGV) for MRI. Magnetic Resonance in Medicine, 65(2), 2011.\n\n[17] Karl Kunisch and Thomas Pock. A bilevel optimization approach for parameter learning\n\niniational models. SIAM Journal on Imaging Sciences, 6(2), 2013.\n\n[18] Rowan Leary, Zineb Saghi, Paul Midgley, and Daniel Holland. Compressed sensing electron\n\ntomography. Ultramicroscopy, 131, 2013.\n\n[19] Tim Meinhardt, Michael Moeller, Caner Hazirbas, and Daniel Cremers. Learning proximal\noperators: Using denoising networks for regularizing inverse imaging problems. In International\nConference on Computer Vision (ICCV), 2017.\n\n[20] Paul Milgrom and Ilya Segal. Envelope theorems for arbitrary choice sets. 
Econometrica, 70(2),\n\n2002.\n\n[21] Yaniv Romano, Michael Elad, and Peyman Milanfar. The little engine that could: Regularization\n\nby denoising (RED). SIAM Journal on Imaging Sciences, 10(4), 2017.\n\n[22] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for\nbiomedical image segmentation. In International Conference on Medical Image Computing\nand Computer-Assisted Intervention. Springer, 2015.\n\n[23] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal\n\nalgorithms. Physica D: nonlinear phenomena, 60(1-4), 1992.\n\n[24] Otmar Scherzer, Markus Grasmair, Harald Grossauer, Markus Haltmeier, and Frank Lenzen.\n\nVariational methods in imaging. Springer, 2009.\n\n[25] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony Price, and Daniel Rueckert. A\ndeep cascade of convolutional neural networks for mr image reconstruction. In International\nConference on Information Processing in Medical Imaging. Springer, 2017.\n\n[26] C\u00e9dric Villani. Optimal transport: old and new, volume 338. Springer Science & Business\n\nMedia, 2008.\n\n[27] Zhou Wang, Alan Bovik, Hamid Sheikh, and Eero Simoncelli. Image quality assessment: from\n\nerror visibility to structural similarity. IEEE transactions on image processing, 13(4), 2004.\n\n[28] Junyuan Xie, Linli Xu, and Enhong Chen. Image denoising and inpainting with deep neural\n\nnetworks. In Advances in Neural Information Processing Systems (NIPS), 2012.\n\n10\n\n\f", "award": [], "sourceid": 5137, "authors": [{"given_name": "Sebastian", "family_name": "Lunz", "institution": "University of Cambridge"}, {"given_name": "Ozan", "family_name": "\u00d6ktem", "institution": "KTH - Royal Institute of Technology"}, {"given_name": "Carola-Bibiane", "family_name": "Sch\u00f6nlieb", "institution": "Cambridge University"}]}