{"title": "Variational Denoising Network: Toward Blind Noise Modeling and Removal", "book": "Advances in Neural Information Processing Systems", "page_first": 1690, "page_last": 1701, "abstract": "Blind image denoising is an important yet very challenging problem in computer\nvision due to the complicated acquisition process of real images. In this work we\npropose a new variational inference method, which integrates both noise estimation and image denoising into a unique Bayesian framework, for blind image denoising. Specifically, an approximate posterior, parameterized by deep neural networks, is presented by taking the intrinsic clean image and noise variances as latent variables conditioned on the input noisy image. This posterior provides explicit parametric forms for all its involved hyper-parameters, and thus can be easily implemented for blind image denoising with automatic noise estimation for the test noisy image. On one hand, as other data-driven deep learning methods, our method, namely variational denoising network (VDN), can perform denoising efficiently due to its explicit form of posterior expression. On the other hand, VDN inherits the advantages of traditional model-driven approaches, especially the good generalization capability of generative models. VDN has good interpretability and can be flexibly utilized to estimate and remove complicated non-i.i.d. noise collected in real scenarios. 
Comprehensive experiments are performed to substantiate the superiority of our method in blind image denoising.", "full_text": "Variational Denoising Network: Toward Blind Noise\n\nModeling and Removal\n\nZongsheng Yue1,2, Hongwei Yong2, Qian Zhao1, Lei Zhang2,3, Deyu Meng4,1,*\n\n1 School of Mathematics and Statistics, Xi\u2019an Jiaotong University, Shaanxi, China\n\n2Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong\n\n4Faculty of Information Technology, The Macau University of Science and Technology, Macau, China\n\n3DAMO Academy, Alibaba Group, Shenzhen, China\n\n*Corresponding author: dymeng@mail.xjtu.edu.cn\n\nAbstract\n\nBlind image denoising is an important yet very challenging problem in computer\nvision due to the complicated acquisition process of real images. In this work we\npropose a new variational inference method, which integrates both noise estimation\nand image denoising into a unique Bayesian framework, for blind image denoising.\nSpeci\ufb01cally, an approximate posterior, parameterized by deep neural networks, is\npresented by taking the intrinsic clean image and noise variances as latent variables\nconditioned on the input noisy image. This posterior provides explicit parametric\nforms for all its involved hyper-parameters, and thus can be easily implemented\nfor blind image denoising with automatic noise estimation for the test noisy image.\nOn one hand, as other data-driven deep learning methods, our method, namely\nvariational denoising network (VDN), can perform denoising ef\ufb01ciently due to\nits explicit form of posterior expression. On the other hand, VDN inherits the\nadvantages of traditional model-driven approaches, especially the good general-\nization capability of generative models. VDN has good interpretability and can\nbe \ufb02exibly utilized to estimate and remove complicated non-i.i.d. noise collected\nin real scenarios. 
Comprehensive experiments are performed to substantiate the superiority of our method in blind image denoising.

1 Introduction

Image denoising is an important research topic in computer vision, aiming at recovering the underlying clean image from an observed noisy one. The noise contained in a real noisy image is generally accumulated from multiple different sources, e.g., capturing instruments, data transmission media, image quantization, etc. [40]. Such a complicated generation process makes it fairly difficult to access the noise information accurately and recover the underlying clean image from the noisy one. This constitutes the main aim of blind image denoising.

There are two main categories of image denoising methods. Most classical methods belong to the first category, mainly focusing on constructing a rational maximum a posteriori (MAP) model, involving the fidelity (loss) and regularization terms, from a Bayesian perspective [6]. An understanding of the data generation mechanism is required for designing a rational MAP objective, especially better image priors like sparsity [3], low-rankness [16, 50, 42], and non-local similarity [9, 27]. These methods are superior mainly in the interpretability naturally led by the Bayesian framework. They, however, still suffer from critical limitations due to their assumptions on both the image prior and the noise (generally i.i.d. Gaussian), possibly deviating from real spatially variant (i.e., non-i.i.d.) noise, and their relatively low implementation speed, since the algorithm needs to be re-run for each new image.

Recently, deep learning approaches represent a new trend along this research line. The main idea is to first collect a large amount of noisy-clean image pairs and then train a deep neural network denoiser on these training data in an end-to-end learning manner.
This approach is especially superior in its effective accumulation of knowledge from large datasets and its fast denoising speed on test images. Such methods, however, easily overfit to the training data with certain noise types, and still cannot generalize well to test images with unknown but complicated noises.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Thus, blind image denoising, especially for real images, is still a challenging task, since the real noise distribution is difficult to know in advance (for model-driven MAP approaches) and hard to comprehensively simulate by training data (for data-driven deep learning approaches).

Against this issue, this paper proposes a new variational inference method, aiming at directly inferring both the underlying clean image and the noise distribution from an observed noisy image in a unique Bayesian framework. Specifically, an approximate posterior is presented by taking the intrinsic clean image and noise variances as latent variables conditioned on the input noisy image. This posterior provides explicit parametric forms for all its involved hyper-parameters, and thus can be efficiently implemented for blind image denoising with automatic noise estimation for test noisy images.

In summary, this paper mainly makes the following contributions: 1) The proposed method is capable of simultaneously implementing both the noise estimation and blind image denoising tasks in a unique Bayesian framework. The noise distribution is modeled as a general non-i.i.d. configuration with spatial relevance across the image, which evidently better complies with heterogeneous real noise than the conventional i.i.d. noise assumption. 2) Inheriting the fine generalization capability of the generative model, the proposed method is verified to be able to effectively estimate and remove complicated non-i.i.d.
noises in test images, even though such noise types have never appeared in the training data, as clearly shown in Fig. 3. 3) The proposed method is a generative approach outputting a complete distribution revealing how the noisy image is generated. This not only endows the result with more comprehensive interpretability, beyond traditional methods purely aiming at obtaining a clean image, but also naturally leads to a learnable likelihood (fidelity) term determined by the data itself. 4) The most commonly utilized deep learning paradigm, i.e., taking MSE as the loss function and training on large noisy-clean image pairs, can be understood as a degenerated form of the proposed generative approach. Its overfitting issue can then be easily explained under this variational inference perspective: such methods intrinsically put dominant emphasis on fitting the priors of the latent clean image, while almost neglecting the effect of noise variations. This makes them inclined to overfit the noise bias in the training data and sensitive to the distinct noises in test noisy images.

The paper is organized as follows: Section 2 introduces related work. Section 3 presents the proposed full Bayesian model, the deep variational inference algorithm, the network architecture and some discussions. Section 4 demonstrates experimental results and the paper is finally concluded.

2 Related Work

We present a brief review of the two main categories of image denoising methods, i.e., model-driven MAP based methods and data-driven deep learning based methods.

Model-driven MAP based Methods: Most classical image denoising methods belong to this category, through designing a MAP model with a fidelity/loss term and a regularization one delivering the pre-known image prior. Along this line, total variation denoising [37], anisotropic diffusion [31] and wavelet coring [38] use the statistical regularities of images to remove the image noise.
Later, the nonlocal similarity prior, meaning that many small patches in a non-local image area possess similar configurations, was widely used in image denoising. Typical ones include CBM3D [11] and non-local means [9]. Some dictionary learning methods [16, 13, 42] and Field-of-Experts (FoE) [36], also revealing certain prior knowledge of image patches, have also been attempted for the task. Several other approaches focus on the fidelity term, which is mainly determined by the noise assumption on the data. E.g., Multiscale [23] assumed the noise of each patch and its similar patches in the same image to follow a correlated Gaussian distribution, and LR-MoG [30, 48, 50], DP-GMM [44] and DDPT [49] fitted the image noise by using a Mixture of Gaussians (MoG) as a noise approximator.

Data-driven Deep Learning based Methods: Instead of pre-setting the image prior, deep learning methods directly learn a denoiser (formed as a deep neural network) mapping noisy images to clean ones on a large collection of noisy-clean image pairs. Jain and Seung [19] first adopted a five-layer convolutional neural network (CNN) for the task. Then some auto-encoder based methods [41, 2] were applied. Meanwhile, Burger et al. [10] achieved performance comparable with BM3D using a plain multi-layer perceptron (MLP). Zhang et al. [45] further proposed the denoising convolutional network (DnCNN) and achieved state-of-the-art performance on Gaussian denoising tasks. Mao et al. [29] proposed a deep fully convolutional encoding-decoding network with symmetric skip connections. Tai et al. [39] proposed a very deep persistent memory network (MemNet) to explicitly mine persistent memory through an adaptive learning process. Recently, NLRN [25], N3Net [33] and UDNet [24] all embedded the non-local property of images into DNNs to facilitate the denoising task.
In order to boost the flexibility against spatially variant noise, FFDNet [46] was proposed by pre-evaluating the noise level and inputting it to the network together with the noisy image. Guo et al. [17] and Brooks et al. [8] both attempted to simulate the in-camera generation process of the images.

3 Variational Denoising Network for Blind Noise Modeling

Given a training set D = {y_j, x_j}_{j=1}^n, where y_j, x_j denote the j-th training pair of the noisy and the expected clean images and n represents the number of training images, our aim is to construct a variational parametric approximation to the posterior of the latent variables, including the latent clean image and the noise variances, conditioned on the noisy image. Note that for the noisy image y, its training pair x is generally a simulated "clean" one obtained as the average of many noisy ones taken under similar camera conditions [4, 1], and is thus never exactly the latent clean image z. This explicit parametric posterior can then be used to directly infer the clean image and the noise distribution from any test noisy image. To this aim, we first need to formulate a rational full Bayesian model of the problem based on the knowledge delivered by the training image pairs.

3.1 Constructing Full Bayesian Model Based on Training Data

Denote y = [y_1, ..., y_d]^T and x = [x_1, ..., x_d]^T as any training pair in D, where d (width × height) is the size of a training image1. We can then construct the following model to express the generation process of the noisy image y:

    y_i ~ N(y_i | z_i, σ_i²),  i = 1, 2, ..., d,    (1)

where z ∈ R^d is the latent clean image underlying y, and N(·|μ, σ²) denotes the Gaussian distribution with mean μ and variance σ². Instead of assuming an i.i.d.
distribution for the noise as is conventional [28, 13, 16, 42], which largely deviates from the spatially variant and signal-dependent characteristics of real noise [46, 8], we model the noise as a non-i.i.d. and pixel-wise Gaussian distribution in Eq. (1).

The simulated "clean" image x evidently provides a strong prior for the latent variable z. Accordingly we impose the following conjugate Gaussian prior on z:

    z_i ~ N(z_i | x_i, ε_0²),  i = 1, 2, ..., d,    (2)

where ε_0 is a hyper-parameter and can be easily set as a small value.

Besides, for σ² = {σ_1², σ_2², ..., σ_d²}, we also introduce a rational conjugate prior as follows:

    σ_i² ~ IG(σ_i² | p²/2 − 1, p²ξ_i/2),  i = 1, 2, ..., d,    (3)

where IG(·|α, β) is the inverse Gamma distribution with parameters α and β, ξ = G((ŷ − x̂)²; p) represents the filtering output of the variance map (ŷ − x̂)² by a Gaussian filter with a p × p window, and ŷ, x̂ ∈ R^{h×w} are the matrix (image) forms of y, x ∈ R^d, respectively. Note that the mode of the above IG distribution is ξ_i [6, 43], which is an approximate evaluation of σ_i² in the p × p window.

Combining Eqs. (1)-(3), a full Bayesian model for the problem can be obtained. The goal then turns to inferring the posterior of the latent variables z and σ² from the noisy image y, i.e., p(z, σ²|y).

3.2 Variational Form of Posterior

We first construct a variational distribution q(z, σ²|y) to approximate the posterior p(z, σ²|y) led by Eqs. (1)-(3). Similar to the commonly used mean-field variational inference techniques, we assume conditional independence between the variables z and σ², i.e.,

    q(z, σ²|y) = q(z|y) q(σ²|y).    (4)

Based on the conjugate priors in Eqs.
(2) and (3), it is natural to formulate the variational posterior forms of z and σ² as follows:

    q(z|y) = ∏_{i=1}^d N(z_i | μ_i(y; W_D), m_i²(y; W_D)),   q(σ²|y) = ∏_{i=1}^d IG(σ_i² | α_i(y; W_S), β_i(y; W_S)),    (5)

1We use j (= 1, ..., n) and i (= 1, ..., d) to express the indexes of training data and data dimension, respectively, throughout the entire paper.

Figure 1: The architecture of the proposed deep variational inference network. The red solid lines denote the forward process, and the blue dotted lines mark the gradient flow direction in the BP algorithm.

where μ_i(y; W_D) and m_i²(y; W_D) are designed as the prediction functions for getting the posterior parameters of the latent variable z directly from y. These functions are represented as a network, called the denoising network or D-Net, with parameters W_D. Similarly, α_i(y; W_S) and β_i(y; W_S) denote the prediction functions for evaluating the posterior parameters of σ² from y, where W_S represents the parameters of the network, called the Sigma network or S-Net. The aforementioned is illustrated in Fig. 1. Our aim is then to optimize these network parameters W_D and W_S so as to get explicit functions for predicting the clean image z as well as the noise knowledge σ² from any test noisy image.
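As a concrete illustration, the factorized posterior in Eq. (5) can be evaluated with off-the-shelf distributions. The sketch below uses random arrays as stand-ins for the D-Net outputs (μ, m²) and the S-Net outputs (α, β); all sizes and values are purely illustrative, not the networks' actual predictions:

```python
import numpy as np
from scipy.stats import norm, invgamma

# Hypothetical per-pixel posterior parameters, standing in for the outputs
# of D-Net (mu, m2) and S-Net (alpha, beta) on a toy d-pixel image.
rng = np.random.default_rng(0)
d = 16
mu = rng.normal(size=d)      # posterior mean of the clean image z
m2 = np.full(d, 0.01)        # posterior variance of z
alpha = np.full(d, 5.0)      # IG shape parameters for the noise variances
beta = np.full(d, 0.2)       # IG scale parameters for the noise variances

# Sample from the factorized posterior q(z, sigma^2 | y) = q(z|y) q(sigma^2|y)
z = rng.normal(mu, np.sqrt(m2))
sigma2 = invgamma.rvs(a=alpha, scale=beta, random_state=rng)

# Its log-density is just the sum of the two factors' log-densities
log_q = (norm.logpdf(z, mu, np.sqrt(m2)).sum()
         + invgamma.logpdf(sigma2, a=alpha, scale=beta).sum())

# Point estimates used later at test time: mu is the denoised image, and the
# per-pixel noise variance is the inverse Gamma mode, beta / (alpha + 1).
sigma2_mode = beta / (alpha + 1.0)
```

The inverse Gamma mode β/(α+1) is the same quantity the paper extracts from the S-Net output at the test stage.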
A rational objective function with respect to W_D and W_S is thus necessary to train both networks. Note that the network parameters W_D and W_S are shared by the posteriors calculated on all training data, and thus if we train them on the entire training set, the method is expected to induce the general statistical inference insight from a noisy image to its underlying clean image and noise level.

3.3 Variational Lower Bound of Marginal Data Likelihood

For notational convenience, we simply write μ_i(y; W_D), m_i²(y; W_D), α_i(y; W_S), β_i(y; W_S) as μ_i, m_i², α_i, β_i in the following calculations. For any noisy image y and its simulated "clean" image x in the training set, we can decompose its marginal likelihood in the following form [7]:

    log p(y; z, σ²) = L(z, σ²; y) + D_KL(q(z, σ²|y) || p(z, σ²|y)),    (6)

where

    L(z, σ²; y) = E_{q(z,σ²|y)}[log p(y|z, σ²) p(z) p(σ²) − log q(z, σ²|y)].    (7)

Here E_{p(x)}[f(x)] represents the expectation of f(x) w.r.t. the stochastic variable x with probability density function p(x). The second term of Eq. (6) is the KL divergence between the variational approximate posterior q(z, σ²|y) and the true posterior p(z, σ²|y), which is non-negative. Thus the first term L(z, σ²; y) constitutes a variational lower bound on the marginal likelihood, i.e.,

    log p(y; z, σ²) ≥ L(z, σ²; y).    (8)

According to Eqs. (4), (5) and (7), the lower bound can then be rewritten as:

    L(z, σ²; y) = E_{q(z,σ²|y)}[log p(y|z, σ²)] − D_KL(q(z|y) || p(z)) − D_KL(q(σ²|y) || p(σ²)).    (9)

Pleasingly, all three terms in Eq. (9) can be integrated analytically as follows:

    E_{q(z,σ²|y)}[log p(y|z, σ²)] = Σ_{i=1}^d { −(1/2) log 2π − (1/2)(log β_i − ψ(α_i)) − (α_i/(2β_i))[(y_i − μ_i)² + m_i²] },    (10)

    D_KL(q(z|y) || p(z)) = Σ_{i=1}^d { (μ_i − x_i)²/(2ε_0²) + (1/2)[ m_i²/ε_0² − log(m_i²/ε_0²) − 1 ] },    (11)

    D_KL(q(σ²|y) || p(σ²)) = Σ_{i=1}^d { (α_i − p²/2 + 1) ψ(α_i) + [log Γ(p²/2 − 1) − log Γ(α_i)] + (p²/2 − 1)(log β_i − log(p²ξ_i/2)) + α_i (p²ξ_i/(2β_i) − 1) },    (12)

where ψ(·) denotes the digamma function. Calculation details are listed in the supplementary material. We can then easily get the expected objective function (i.e., the negative lower bound of the marginal likelihood on the entire training set) for optimizing the network parameters of D-Net and S-Net as follows:

    min_{W_D, W_S} − Σ_{j=1}^n L(z_j, σ_j²; y_j).    (13)

3.4 Network Learning

As aforementioned, we use the D-Net and S-Net together to infer the variational parameters μ, m² and α, β from the input noisy image y, respectively, as shown in Fig. 1. It is critical to consider how to calculate the derivatives of this objective with respect to W_D, W_S, involved in μ, m², α and β, to facilitate an easy use of stochastic gradient variational inference. Fortunately, different from other related variational inference techniques like VAE [22], all three terms of Eqs. (10)-(12) in the lower bound Eq. (9) are differentiable and their derivatives can be calculated analytically without the need of any reparameterization trick, largely reducing the difficulty of network training.

At the training stage of our method, the network parameters can be easily updated with the backpropagation (BP) algorithm [15] through Eq. (13). The function of each term in this objective can be intuitively explained: the first term represents the likelihood of the observed noisy images in the training set, and the last two terms control the discrepancy between the variational posterior and the corresponding prior. During the BP training process, the gradient information from the likelihood term of Eq. (10) is used for updating the parameters of both the D-Net and S-Net simultaneously, implying that the inferences for the latent clean image z and for σ² are guided to learn from each other.

At the test stage, for any test noisy image, by feeding it into the D-Net, the final denoising result can be directly obtained as μ.
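Since every term of the objective has a closed form, the negative lower bound of Eqs. (10)-(12) can be written down directly. Below is a minimal NumPy/SciPy sketch for a single image, assuming the network outputs (μ, m², α, β) are supplied as flattened arrays; it illustrates the formulas, not the authors' implementation:

```python
import numpy as np
from scipy.special import psi, gammaln  # digamma and log-Gamma functions

def neg_lower_bound(y, x, mu, m2, alpha, beta, xi, p=7, eps0_sq=5e-5):
    """Negative variational lower bound -L(z, sigma^2; y), Eqs. (10)-(12).
    All per-pixel quantities are flattened 1-D arrays of equal length."""
    # Eq. (10): expected log-likelihood under q
    lik = (-0.5 * np.log(2 * np.pi)
           - 0.5 * (np.log(beta) - psi(alpha))
           - alpha / (2 * beta) * ((y - mu) ** 2 + m2)).sum()
    # Eq. (11): KL(q(z|y) || p(z)) against the prior N(x_i, eps0_sq)
    kl_z = ((mu - x) ** 2 / (2 * eps0_sq)
            + 0.5 * (m2 / eps0_sq - np.log(m2 / eps0_sq) - 1)).sum()
    # Eq. (12): KL(q(sigma^2|y) || p(sigma^2)) against IG(p^2/2 - 1, p^2*xi/2)
    a0 = p ** 2 / 2 - 1
    b0 = p ** 2 * xi / 2
    kl_s = ((alpha - a0) * psi(alpha) + gammaln(a0) - gammaln(alpha)
            + a0 * (np.log(beta) - np.log(b0))
            + alpha * (b0 / beta - 1)).sum()
    return -(lik - kl_z - kl_s)
```

Both KL terms vanish exactly when the posterior parameters match the priors (μ = x, m² = ε_0², α = p²/2 − 1, β = p²ξ/2), which is a convenient sanity check on the formulas.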
Additionally, by inputting the noisy image to the S-Net, the noise distribution knowledge (i.e., σ²) is easily inferred. Specifically, the noise variance at each pixel can be directly obtained as the mode of the inferred inverse Gamma distribution: σ_i² = β_i / (α_i + 1).

3.5 Network Architecture

The D-Net in Fig. 1 takes the noisy image y as input to infer the variational parameters μ and m² in q(z|y) of Eq. (5), and performs the denoising task in the proposed variational inference algorithm. In order to capture multi-scale information of the image, we use a U-Net [35] of depth 4 as the D-Net, which contains 4 encoder blocks ([Conv+ReLU]×2 + average pooling), 3 decoder blocks (transposed Conv + [Conv+ReLU]×2) and symmetric skip connections under each scale. For the parameter μ, the residual learning strategy is adopted as in [45], i.e., μ = y + f(y; W_D), where f(·; W_D) denotes the D-Net with parameters W_D. As for the S-Net, which takes the noisy image y as input and outputs the predicted variational parameters α and β in q(σ²|y) of Eq. (5), we use the DnCNN [45] architecture with five layers, and the number of feature channels in each layer is set as 64. It should be noted that our proposed method is a general framework, and most of the commonly used network architectures [46, 34, 24, 47] in image restoration can also be easily substituted in.

3.6 Some Discussions

It can be seen that the proposed method inherits the advantages of both model-driven MAP and data-driven deep learning methods. On one hand, our method is a generative approach and possesses fine interpretability with respect to the data generation mechanism; on the other hand, it provides an explicit prediction function, facilitating efficient image denoising as well as noise estimation directly from an input noisy image. Furthermore, beyond current methods, our method can finely evaluate and remove non-i.i.d.
noises embedded in images, and has a good generalization capability to images with complicated noises, as evaluated in our experiments. This complies with the main requirement of the blind image denoising task.

If we set the hyper-parameter ε_0² in Eq. (2) to an extremely small value close to 0, it is easy to see that the objective of the proposed method is dominated by the second term of Eq. (9), i.e., Eq. (11), which makes the objective degenerate to the MSE loss generally used in traditional deep learning methods (i.e., minimizing Σ_{j=1}^n ||μ(y_j; W_D) − x_j||²). This provides a new understanding of why such methods incline to overfit the noise bias in the training data.

Figure 2: (a) The spatially variant map M for noise generation in training data. (b1)-(d1): Three different Ms on testing data in Cases 1-3. (b2)-(d2): Correspondingly predicted Ms by our method on the testing data.

Table 1: The PSNR(dB) results of all competing methods on the three groups of test datasets. The best and second best results are highlighted in bold and italic, respectively. (Rows per method: Case 1, Case 2, Case 3, each over Set5, LIVE1, BSD68.)
CBM3D WNNM: 26.53 27.76 26.58 25.27 25.13 26.51 24.61 26.34 23.52 25.18 23.52 25.28 26.07 27.88 24.67 26.50 26.44 24.60
NCSR: 26.62 24.96 24.96 25.76 24.08 24.27 26.84 24.96 24.95
MLP: 27.26 25.71 25.58 25.73 24.31 24.30 26.88 25.26 25.10
DnCNN-B: 29.85 28.81 28.73 29.04 28.18 28.15 29.13 28.17 28.11
MemNet: 30.10 28.96 28.74 29.55 28.56 28.36 29.51 28.37 28.20
FFDNet: 30.16 28.99 28.78 29.60 28.58 28.43 29.54 28.39 28.22
FFDNetv: 30.15 28.96 28.77 29.56 28.56 28.42 29.49 28.38 28.20
UDNet: 28.13 27.19 27.13 26.01 25.25 25.13 27.54 26.48 26.44
VDN: 30.39 29.22 29.02 29.80 28.82 28.67 29.74 28.65 28.46
The posterior inference process puts dominant emphasis on fitting the priors imposed on the latent clean image, while almost neglecting the effect of noise variations. This naturally leads to sensitivity to unseen complicated noises contained in test images.

Very recently, both CBDNet [17] and FFDNet [46] were presented for the denoising task by feeding the noisy image, integrated with a pre-estimated noise level, into the deep network to make it generalize better to distinct noise types at the training stage. Albeit more or less improving the generalization capability of the network, such a strategy is still heuristic, and it is not easy to interpret how the input noise level intrinsically influences the final denoising result. Comparatively, our method is constructed in a sound Bayesian manner to estimate the clean image and the noise distribution together from the input noisy image, and its generalization can be easily explained from the perspective of generative models.

4 Experimental Results

We evaluate the performance of our method on synthetic and real datasets in this section. All experiments are evaluated in the sRGB space. We briefly denote our method as VDN in the following. The training and testing code of our VDN is available at https://github.com/zsyOAOA/VDNet.

4.1 Experimental Setting

Network training and parameter setting: The weights of the D-Net and S-Net in our variational algorithm were initialized according to [18]. In each epoch, we randomly crop N = 64 × 5000 patches of size 128 × 128 from the images for training. The Adam algorithm [21] is adopted to optimize the network parameters by minimizing the proposed negative lower bound objective. The initial learning rate is set as 2e-4 and decayed by half every 10 epochs until reaching 1e-6. The window size p in Eq. (3) is set as 7.
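The learning-rate policy above can be read as halving every 10 epochs with a floor of 1e-6; this step-wise reading of the paper's wording is our interpretation, sketched as:

```python
def vdn_lr(epoch, init_lr=2e-4, floor=1e-6, step=10):
    """Halve the learning rate every `step` epochs, clipped below at `floor`.
    One plausible reading of the schedule described in the paper."""
    return max(init_lr * 0.5 ** (epoch // step), floor)
```

Such a schedule would be passed to the optimizer at the start of each epoch; the exact decay rule used by the authors may differ.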
The hyper-parameter ε_0² is set as 5e-5 and 1e-6 in the following synthetic and real-world image denoising experiments, respectively.

Comparison methods: Several state-of-the-art denoising methods are adopted for performance comparison, including CBM3D [11], WNNM [16], NCSR [14], MLP [10], DnCNN-B [45], MemNet [39], FFDNet [46], UDNet [24] and CBDNet [17]. Note that CBDNet is mainly designed for the blind denoising task, and thus we only compare with CBDNet in the real noise removal experiments.

Figure 3: Image denoising results of a typical test image in Case 2. (a) Noisy image, (b) Groundtruth, (c) CBM3D (24.63dB), (d) DnCNN-B (27.83dB), (e) FFDNet (28.06dB), (f) VDN (28.32dB).

Table 2: The PSNR(dB) results of all competing methods on AWGN noise cases of three test datasets. (Rows per method: σ = 15, σ = 25, σ = 50, each over Set5, LIVE1, BSD68.)
CBM3D WNNM: 32.92 33.42 31.70 32.85 31.27 32.67 30.61 30.92 29.15 30.05 28.62 29.83 28.16 27.58 26.07 26.98 26.81 25.86
NCSR: 32.57 31.46 30.84 30.33 29.05 28.35 27.20 26.06 25.75
MLP: - - - 30.55 29.16 28.93 27.59 26.12 26.01
DnCNN-B: 34.04 33.72 33.87 31.88 31.23 31.22 28.95 27.95 27.91
MemNet: 34.18 33.84 33.76 31.98 31.26 31.17 29.10 27.99 27.91
FFDNet: 34.30 33.96 33.85 32.10 31.37 31.21 29.25 28.10 27.95
FFDNete: 34.31 33.96 33.68 32.09 31.37 31.20 29.25 28.10 27.95
UDNet: 34.19 33.74 33.76 31.82 31.09 31.02 28.87 27.82 27.76
VDN: 34.34 33.94 33.90 32.24 31.50 31.35 29.47 28.36 28.19

4.2 Experiments on Synthetic Non-I.I.D. Gaussian Noise Cases

Similar to [46], we collected a set of source images to train the network, including 432 images from BSD [5], 400 images from the validation set of ImageNet [12] and 4744 images from the Waterloo Exploration Database [26].
Three commonly used datasets in image restoration (Set5, LIVE1 and BSD68 in [20]) were adopted as test datasets to evaluate the performance of the different methods. In order to evaluate the effectiveness and robustness of VDN under the non-i.i.d. noise configuration, we simulated the non-i.i.d. Gaussian noise as follows:

    n = n1 ⊙ M,   n1_{ij} ~ N(0, 1),    (14)

where M is a spatially variant map with the same size as the source image. We generated four kinds of Ms in total, as shown in Fig. 2. The first (Fig. 2 (a)) is used for generating the noisy images of the training data, and the others (Fig. 2 (b)-(d)) for generating three groups of testing data (denoted as Cases 1-3). Under this noise generation manner, the noises in the training data and testing data evidently differ, which is suitable for verifying the robustness and generalization capability of the competing methods.

Comparison with the State-of-the-art: Table 1 lists the average PSNR results of all competing methods on the three groups of testing data. From Table 1, it can be easily observed that: 1) VDN outperforms the other competing methods in all cases, indicating that VDN is able to handle such complicated non-i.i.d. noise; 2) VDN surpasses FFDNet by about 0.25dB on average, even though FFDNet depends on the true noise level information instead of automatically inferring the noise distribution as our method does; 3) the discriminative methods MLP, DnCNN-B and UDNet seem to evidently overfit on the training noise bias; 4) the classical model-driven method CBM3D performs more stably than WNNM and NCSR, possibly due to the latter's improper i.i.d. Gaussian noise assumption. Fig. 3 shows the denoising results of the different competing methods on one typical image in the testing set of Case 2, and more denoising results can be found in the supplementary material. Note that we only display the top four best results due to the page limitation.
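The noise simulation of Eq. (14) is straightforward to reproduce: pixel-wise standard Gaussian noise is scaled element-wise by the map M. The sketch below uses an illustrative smooth Gaussian-shaped M, since the paper's actual maps are only shown in Fig. 2:

```python
import numpy as np

def non_iid_gaussian_noise(h, w, M, rng):
    """Simulate the non-i.i.d. Gaussian noise of Eq. (14): n = n1 * M,
    with n1_ij ~ N(0, 1) and M the spatially variant per-pixel std map."""
    return rng.standard_normal((h, w)) * M

# An illustrative smoothly varying map M (the paper's maps are in Fig. 2).
h = w = 32
yy, xx = np.mgrid[0:h, 0:w]
M = 0.05 + 0.25 * np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
                         / (2 * (w / 4) ** 2))
noise = non_iid_gaussian_noise(h, w, M, np.random.default_rng(0))
```

By construction, the empirical standard deviation of the noise at each pixel converges to M as more samples are drawn, which is exactly the spatially variant behavior the test cases probe.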
It can be seen that the denoised images by CBM3D and DnCNN-B still contain obvious noise, and FFDNet over-smoothes the image and loses some edge information, while our proposed VDN removes most of the noise and preserves more details.

Even though our VDN is designed based on the non-i.i.d. noise assumption and trained on non-i.i.d. noise data, it also performs well on the additive white Gaussian noise (AWGN) removal task. Table 2 lists the average PSNR results under three noise levels (σ = 15, 25, 50) of AWGN. It is easy to see that our method obtains the best, or at least comparable, performance with the state-of-the-art method FFDNet. Combining Tables 1 and 2, it is rational to say that our VDN is robust and able to handle a wide range of noise types, due to its better noise modeling manner.

Noise Variance Prediction: The S-Net plays the role of noise modeling and is able to infer the noise distribution from the noisy image. To verify the fitting capability of S-Net, we provided the M

Figure 4: Denoising results on one typical image in the validation set of SIDD.
(a) Noisy image, (b) Simulated "clean" image, (c) WNNM (21.80dB), (d) DnCNN (34.48dB), (e) CBDNet (34.84dB), (f) VDN (35.50dB).

Table 3: The comparison results of different methods on the SIDD benchmark and validation datasets.

SIDD Benchmark:
Methods | CBM3D | WNNM  | MLP   | DnCNN-B | CBDNet | VDN
PSNR    | 25.65 | 25.78 | 24.71 | 23.66   | 33.28  | 39.23
SSIM    | 0.685 | 0.809 | 0.641 | 0.583   | 0.868  | 0.971

SIDD Validation:
Methods | DnCNN-B | CBDNet | VDN
PSNR    | 38.41   | 38.68  | 39.28
SSIM    | 0.909   | 0.901  | 0.909

Table 4: The comparison results of all competing methods on the DND benchmark dataset.

Methods | CBM3D  | WNNM   | NCSR   | MLP    | DnCNN-B | FFDNet | CBDNet | VDN
PSNR    | 34.51  | 34.67  | 34.05  | 34.23  | 37.90   | 37.61  | 38.06  | 39.38
SSIM    | 0.8507 | 0.8646 | 0.8351 | 0.8331 | 0.9430  | 0.9415 | 0.9421 | 0.9518

predicted by S-Net as the input of FFDNet, and the denoising results are listed in Table 1 (denoted as FFDNetv). It is obvious that FFDNet under the real noise level and FFDNetv achieve almost the same performance, indicating that the S-Net effectively captures the proper noise information. The predicted noise variance maps on the three groups of testing data are shown in Fig. 2 (b2-d2) for easy observation.

4.3 Experiments on Real-World Noise

In this part, we evaluate the performance of VDN on the real blind denoising task, using two benchmark datasets: DND [32] and SIDD [1]. DND consists of 50 high-resolution images with realistic noise from 50 scenes taken by 4 consumer cameras. However, it does not provide any additional noisy and clean image pairs to train the network. SIDD [1] is another real-world denoising benchmark, containing 30,000 real noisy images captured by 5 cameras under 10 scenes. For each noisy image, it estimates one simulated "clean" image through some statistical methods [1]. About 80% (∼24,000 pairs) of this dataset are provided for training purposes, and the rest are held out for benchmarking.
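The PSNR numbers reported in Tables 3 and 4 follow the standard definition, 10·log10(peak²/MSE). A minimal sketch, assuming 8-bit images with peak value 255 (the arrays `x` and `y` below are hypothetical, just to exercise the formula):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64)
                   - estimate.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical example: a constant error of 5 gives MSE = 25,
# so PSNR = 10 * log10(255^2 / 25) ≈ 34.15 dB.
x = np.full((32, 32), 100.0)
y = x + 5.0
value = psnr(x, y)
```

SSIM is more involved (local means, variances, and covariances); as noted in the footnotes, the paper uses the scikit-image implementation for it.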
A subset of 320 image pairs is packaged together as a medium version of SIDD, called the SIDD Medium Dataset2, for fast training of a denoiser. We employed this medium version dataset to train a real-world image denoiser, and tested the performance on the two benchmarks.

Table 3 lists the PSNR results of different methods on the SIDD benchmark3. Note that we only list the results of the competing methods that are available on the official benchmark website2. It is evident that VDN outperforms the other methods. Note that neither DnCNN-B nor CBDNet performs well, possibly because they were trained on other datasets, whose noise type differs from SIDD's. For a fair comparison, we retrained DnCNN-B and CBDNet on the SIDD dataset. The performance on the SIDD validation set is also listed in Table 3. Under the same training conditions, VDN still outperforms DnCNN-B by 0.87dB PSNR and CBDNet by 0.60dB PSNR, indicating the effectiveness and significance of our non-i.i.d. noise modeling manner.

Table 4 lists the performance of all competing methods on the DND benchmark4. From the table, it is easy to see that our proposed VDN surpasses all the competing methods. It is worth noting that CBDNet adopts the same optimized network as ours, containing an S-Net designed for estimating the noise distribution and a D-Net for denoising. The superiority of VDN over CBDNet mainly benefits from the deep variational inference optimization.

For easy visualization, the results of the best four competing methods on one typical denoising example are displayed in Fig. 4. Obviously, WNNM is unable to remove the complex real noise, possibly because the low-rankness prior is insufficient to describe all the image information and the i.i.d. Gaussian noise assumption conflicts with the real noise.
2 https://www.eecs.yorku.ca/ kamel/sidd/index.php
3 We employed the function 'compare_ssim' in the scikit-image library to calculate the SSIM values, which differ slightly from the SIDD official results.
4 https://noise.visinf.tu-darmstadt.de/

Figure 5: The noise variance maps predicted by our proposed VDN on the SIDD and DND benchmarks. (a1-a3): The noisy image, real noise (|y − x|) and noise variance map of one typical image from the SIDD validation dataset. (b1-b2): The noisy image and predicted noise variance map of one typical image from the DND dataset.

Table 5: Performance of VDN under different ε₀² values on the SIDD validation dataset (p = 7).

ε₀²  | 1e-4   | 1e-5   | 1e-6   | 1e-7   | 1e-8   | MSE
PSNR | 38.89  | 39.20  | 39.28  | 39.05  | 39.03  | 39.01
SSIM | 0.9046 | 0.9079 | 0.9086 | 0.9064 | 0.9063 | 0.9061

Table 6: Performance of VDN under different p values on the SIDD validation dataset (ε₀² = 1e-6).

p    | 5      | 7      | 11     | 15     | 19
PSNR | 39.26  | 39.28  | 39.26  | 39.24  | 39.24
SSIM | 0.9089 | 0.9086 | 0.9086 | 0.9079 | 0.9079

With the powerful feature extraction ability of CNNs, DnCNN and CBDNet obtain much better denoising results than WNNM, but their results still contain a little noise. In contrast, the denoising result of our proposed VDN contains almost no noise and is very close to the groundtruth.

In Fig. 5, we display the noise variance maps predicted by S-Net on the two real benchmarks. The variance maps have been enlarged several times for easy visualization. It is easy to see that the predicted noise variance map relates to the image content, which is consistent with the well-known signal-dependent property of real noise.

4.4 Hyper-parameters Analysis

The hyper-parameter ε₀ in Eq. (2) determines how much the desired latent clean image z depends on the simulated groundtruth x.
As discussed in Section 3.6, the negative variational lower bound degenerates into the MSE loss when ε₀ is set to an extremely small value close to 0. The performance of VDN under different ε₀ values on the SIDD validation dataset is listed in Table 5. For explicit comparison, we also directly trained the D-Net under the MSE loss as a baseline. From Table 5, we can see that: 1) when ε₀ is too large, the proposed VDN obtains relatively worse results, since the prior constraint on z by the simulated groundtruth x becomes weak; 2) as ε₀ decreases, the performance of VDN tends toward that of the MSE loss, as analyzed in theory; 3) the results of VDN surpass the MSE loss by about 0.3dB PSNR when ε₀² = 1e-6, which verifies the importance of noise modeling in our method. Therefore, we suggest setting ε₀² to 1e-5 or 1e-6 in the real-world denoising task.

In Eq. (3), we introduced a conjugate inverse gamma distribution as the prior for σ². The mode ξᵢ of this inverse gamma distribution provides a rational approximate evaluation of σᵢ², which is a local estimation in a p × p window centered at the ith pixel. We compared the performance of VDN under different p values on the SIDD validation dataset in Table 6. Empirically, VDN performs consistently well across values of the hyper-parameter p.

5 Conclusion

We have proposed a new variational inference algorithm, namely the variational denoising network (VDN), for blind image denoising. The main idea is to learn an approximate posterior to the true posterior, with the latent variables (including the clean image and noise variances) conditioned on the input noisy image. Using this variational posterior expression, both tasks of blind image denoising and noise estimation can be naturally attained in a unique Bayesian framework. The proposed VDN is a generative method, which can easily estimate the noise distribution from the input data.
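The local estimation discussed in Section 4.4 above (the inverse gamma mode ξᵢ acting as a variance estimate in a p × p window around pixel i) can be roughly illustrated as a window-wise average of squared noise residuals. This sketch is our own box-filter approximation, not the paper's exact construction; the function name and the check below are hypothetical.

```python
import numpy as np

def local_variance_map(residual, p=7):
    """Per-pixel noise variance estimated as the mean of squared
    residual values in a p x p window centered at each pixel
    (a box-filter stand-in for the paper's local estimation)."""
    sq = residual.astype(np.float64) ** 2
    pad = p // 2
    padded = np.pad(sq, pad, mode='reflect')
    # Summed-area table with a zero row/column prepended, so that
    # each p x p window sum is four lookups.
    c = padded.cumsum(axis=0).cumsum(axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    h, w = sq.shape
    window_sums = (c[p:p + h, p:p + w] - c[:h, p:p + w]
                   - c[p:p + h, :w] + c[:h, :w])
    return window_sums / (p * p)

# Hypothetical sanity check: for i.i.d. N(0, 3^2) noise, the local
# estimates should concentrate around the true variance 9.
rng = np.random.default_rng(0)
noise = 3.0 * rng.standard_normal((128, 128))
sigma2 = local_variance_map(noise, p=7)
```

For spatially variant noise such as Eq. (14), the same computation would track the local noise strength, which is what makes the window size p a meaningful hyper-parameter.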
Comprehensive experiments have demonstrated the superiority of VDN over previous works on blind image denoising. Our method can also facilitate the study of other low-level vision tasks, such as super-resolution and deblurring. Specifically, the fidelity term in these tasks can be set more faithfully under the non-i.i.d. noise distribution estimated by VDN, instead of the traditional i.i.d. Gaussian noise assumption.

Acknowledgements: This research was supported by the National Key R&D Program of China (2018YFB1004300), the China NSFC projects under contracts 61661166011, 11690011, 61603292, 61721002 and U1811461, and the Hong Kong RGC General Research Fund (PolyU 152216/18E).

References

[1] Abdelrahman Abdelhamed, Stephen Lin, and Michael S. Brown. A high-quality denoising dataset for smartphone cameras. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[2] Forest Agostinelli, Michael R Anderson, and Honglak Lee. Adaptive multi-column deep neural networks with application to robust image denoising. In Advances in Neural Information Processing Systems, pages 1493–1501, 2013.

[3] Michal Aharon, Michael Elad, Alfred Bruckstein, et al. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311, 2006.

[4] Josue Anaya and Adrian Barbu. RENOIR - a dataset for real low-light noise image reduction. arXiv preprint arXiv:1409.8230, 2014.

[5] Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):898–916, May 2011.

[6] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.

[7] David M Blei, Michael I Jordan, et al.
Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–143, 2006.

[8] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T Barron. Unprocessing images for learned raw denoising. arXiv preprint arXiv:1811.11127, 2018.

[9] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 60–65. IEEE, 2005.

[10] Harold C Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399. IEEE, 2012.

[11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, Aug 2007.

[12] Jia Deng, Olga Russakovsky, Jonathan Krause, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Scalable multi-label annotation. In ACM Conference on Human Factors in Computing Systems (CHI), 2014.

[13] Weisheng Dong, Guangming Shi, and Xin Li. Nonlocal image restoration with bilateral variance estimation: a low-rank approach. IEEE Transactions on Image Processing, 22(2):700–711, 2013.

[14] Weisheng Dong, Lei Zhang, Guangming Shi, and Xin Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2012.

[15] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

[16] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.

[17] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang.
Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686, 2018.

[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.

[19] Viren Jain and Sebastian Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems, pages 769–776, 2009.

[20] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.

[21] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[22] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

[23] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. Multiscale image blind denoising. IEEE Transactions on Image Processing, 24(10):3149–3161, 2015.

[24] Stamatios Lefkimmiatis. Universal denoising networks: a novel CNN architecture for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3204–3213, 2018.

[25] Ding Liu, Bihan Wen, Yuchen Fan, Chen Change Loy, and Thomas S Huang. Non-local recurrent network for image restoration. In Advances in Neural Information Processing Systems, pages 1673–1682, 2018.

[26] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang. Waterloo Exploration Database: New challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2):1004–1016, Feb. 2017.

[27] Matteo Maggioni, Vladimir Katkovnik, Karen Egiazarian, and Alessandro Foi.
Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Transactions on Image Processing, 22(1):119–133, 2013.

[28] Julien Mairal, Michael Elad, and Guillermo Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.

[29] Xiaojiao Mao, Chunhua Shen, and Yu-Bin Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems, pages 2802–2810, 2016.

[30] Deyu Meng and Fernando De La Torre. Robust matrix factorization with unknown noise. In IEEE International Conference on Computer Vision, 2013.

[31] Pietro Perona and Jitendra Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, 1990.

[32] Tobias Plotz and Stefan Roth. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1586–1595, 2017.

[33] Tobias Plötz and Stefan Roth. Neural nearest neighbors networks. In Advances in Neural Information Processing Systems, pages 1087–1098, 2018.

[34] Tobias Plötz and Stefan Roth. Neural nearest neighbors networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 1087–1098. Curran Associates, Inc., 2018.

[35] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.

[36] Stefan Roth and Michael J Black. Fields of experts.
International Journal of Computer Vision, 82(2):205, 2009.

[37] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.

[38] Eero P Simoncelli and Edward H Adelson. Noise removal via Bayesian wavelet coring. In Proceedings of 3rd IEEE International Conference on Image Processing, volume 1, pages 379–382. IEEE, 1996.

[39] Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu. MemNet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, pages 4539–4547, 2017.

[40] Yanghai Tsin, Visvanathan Ramesh, and Takeo Kanade. Statistical calibration of CCD imaging process. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, volume 1, pages 480–487. IEEE, 2001.

[41] Junyuan Xie, Linli Xu, and Enhong Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems, pages 341–349, 2012.

[42] Jun Xu, Lei Zhang, and David Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In The European Conference on Computer Vision (ECCV), September 2018.

[43] Hongwei Yong, Deyu Meng, Wangmeng Zuo, and Lei Zhang. Robust online matrix factorization for dynamic background subtraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(7):1726–1740, 2017.

[44] Zongsheng Yue, Deyu Meng, Yongqing Sun, and Qian Zhao. Hyperspectral image restoration under complex multi-band noises. Remote Sensing, 10(10):1631, 2018.

[45] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.

[46] Kai Zhang, Wangmeng Zuo, and Lei Zhang.
FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.

[47] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2472–2481, 2018.

[48] Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, and Lei Zhang. Robust principal component analysis with complex noise. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.

[49] Fengyuan Zhu, Guangyong Chen, Jianye Hao, and Pheng-Ann Heng. Blind image denoising via dependent Dirichlet process tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8):1518–1531, 2017.

[50] Fengyuan Zhu, Guangyong Chen, and Pheng-Ann Heng. From noise modeling to blind image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 420–429, 2016.