{"title": "Image Denoising and Inpainting with Deep Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 341, "page_last": 349, "abstract": "We present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA). We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. Our method's performance in the image denoising task is comparable to that of KSVD, a widely used sparse coding technique. More importantly, in the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. Moreover, the proposed method does not need the information regarding the region that requires inpainting to be given a priori. Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. We also show that our new training scheme for DA is more effective and can improve the performance of unsupervised feature learning.", "full_text": "Image Denoising and Inpainting with Deep Neural Networks\n\nJunyuan Xie, Linli Xu, Enhong Chen¹\n\nSchool of Computer Science and Technology\nUniversity of Science and Technology of China\n\neric.jy.xie@gmail.com, linlixu@ustc.edu.cn, cheneh@ustc.edu.cn\n\nAbstract\n\nWe present a novel approach to low-level vision problems that combines sparse coding and deep networks pre-trained with denoising auto-encoder (DA). We propose an alternative training scheme that successfully adapts DA, originally designed for unsupervised feature learning, to the tasks of image denoising and blind inpainting. 
Our method's performance in the image denoising task is comparable to that of KSVD, which is a widely used sparse coding technique. More importantly, in the blind image inpainting task, the proposed method provides solutions to some complex problems that have not been tackled before. Specifically, we can automatically remove complex patterns like superimposed text from an image, rather than simple patterns like pixels missing at random. Moreover, the proposed method does not need the region that requires inpainting to be specified a priori. Experimental results demonstrate the effectiveness of the proposed method in the tasks of image denoising and blind inpainting. We also show that our new training scheme for DA is more effective and can improve the performance of unsupervised feature learning.\n\n1 Introduction\n\nObserved image signals are often corrupted by the acquisition channel or by artificial editing. The goal of image restoration techniques is to restore the original image from a noisy observation of it. Image denoising and inpainting are common image restoration problems that are both useful by themselves and important preprocessing steps for many other applications. Image denoising problems arise when an image is corrupted by additive white Gaussian noise, which is a common result of many acquisition channels, whereas image inpainting problems occur when some pixel values are missing or when we want to remove more sophisticated patterns, like superimposed text or other objects, from the image. This paper focuses on image denoising and blind inpainting.\n\nVarious methods have been proposed for image denoising. One approach is to transform image signals to an alternative domain where they can be more easily separated from the noise [1, 2, 3]. 
For example, Bayes Least Squares with a Gaussian Scale-Mixture (BLS-GSM), which was proposed by Portilla et al., is based on a transformation to the wavelet domain [2].\n\nAnother approach is to capture image statistics directly in the image domain. Following this strategy, a family of models exploiting the (linear) sparse coding technique has drawn increasing attention recently [4, 5, 6, 7, 8, 9]. Sparse coding methods reconstruct images from a sparse linear combination of an over-complete dictionary. In recent research, the dictionary is learned from data instead of hand-crafted as before. This learning step improves the performance of sparse coding significantly. One example of these methods is the KSVD sparse coding algorithm proposed in [6].\n\n¹ Corresponding author.\n\nImage inpainting methods can be divided into two categories: non-blind inpainting and blind inpainting. In non-blind inpainting, the regions that need to be filled in are provided to the algorithm a priori, whereas in blind inpainting, no information about the locations of the corrupted pixels is given and the algorithm must automatically identify the pixels that require inpainting. The state-of-the-art non-blind inpainting algorithms can perform very well on removing text, doodles, or even very large objects [10, 11, 12]. Some image denoising methods, after modification, can also be applied to non-blind image inpainting with state-of-the-art results [7]. Blind inpainting, however, is a much harder problem. To the best of our knowledge, existing algorithms can only address i.i.d. or simply structured impulse noise [13, 14, 15].\n\nAlthough sparse coding models perform well in practice, they share a shallow linear structure. Recent research suggests, however, that non-linear, deep models can achieve superior performance in various real-world problems. One typical category of deep models is multi-layer neural networks. In [16], Jain et al. 
proposed to denoise images with convolutional neural networks. In this paper, we propose to combine the advantageous "sparse" and "deep" principles of sparse coding and deep networks to solve the image denoising and blind inpainting problems. Sparse variants of deep neural networks are expected to perform especially well in vision problems because they have a structure similar to that of the human visual cortex [17].\n\nDeep neural networks with many hidden layers were generally considered hard to train before a new training scheme was proposed: greedy layer-wise pre-training, which gives a better initialization of the network parameters before traditional back-propagation training [18, 19]. Several methods exist for pre-training, including the Restricted Boltzmann Machine (RBM) and the Denoising Auto-encoder (DA) [20, 21].\n\nWe employ DA to perform pre-training in our method because it naturally lends itself to denoising and inpainting tasks. DA is a two-layer neural network that tries to reconstruct the original input from a noisy version of it. The structure of a DA is shown in Fig.1a. A series of DAs can be stacked to form a deep network called Stacked Denoising Auto-encoders (SDA) by using the hidden layer activation of the previous layer as the input of the next layer.\n\nSDA is widely used for unsupervised pre-training and feature learning [21]. In these settings, only the clean data is provided, while its noisy version is generated during training by adding random Gaussian or Salt-and-Pepper noise to the clean data. After training of one layer, only the clean data is passed on to the network to produce the clean training data for the next layer, while the noisy data is discarded. 
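The conventional layer-wise scheme just described can be sketched as follows. This is a hypothetical sketch, not code from the paper: `train_da` stands in for any single-layer DA trainer, and masking noise is just one common corruption choice.

```python
import numpy as np

def corrupt(data, rate=0.3, seed=None):
    """Generate the noisy copy on the fly, here by randomly zeroing entries."""
    rng = np.random.default_rng(seed)
    return data * (rng.random(data.shape) >= rate)

def pretrain_sda(clean, n_layers, train_da):
    """Conventional SDA pre-training: each layer corrupts the current clean
    data, trains a DA to undo the corruption, then passes only the *clean*
    hidden activation on to the next layer (the noisy copy is discarded)."""
    layers = []
    for _ in range(n_layers):
        noisy = corrupt(clean)
        da = train_da(noisy, clean)
        layers.append(da)
        clean = da.hidden(clean)   # clean data propagated; noisy data dropped
    return layers
```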
The noisy training data for the next layer is similarly constructed by randomly corrupting the generated clean training data.\n\nFor the image denoising and inpainting tasks, however, the choices of clean and noisy input are natural: they are set to be the desired image after denoising or inpainting and the observed noisy image, respectively. Therefore, we propose a new training scheme that trains the DA to reconstruct the clean image from the corresponding noisy observation. After training of the first layer, the hidden layer activations of both the noisy input and the clean input are calculated to serve as the training data of the second layer. Our experiments on the image denoising and inpainting tasks demonstrate that SDA is able to learn features that adapt to specific noise patterns, from white Gaussian noise to superimposed text.\n\nInspired by SDA's ability to learn noise-specific features in denoising tasks, we argue that in unsupervised feature learning problems the type of noise used can also affect the performance. Specifically, instead of corrupting the input with arbitrarily chosen noise, a more sophisticated corruption process that agrees with the true noise distribution in the data can improve the quality of the learned features. For example, when learning audio features, the variations of noise at different frequencies are usually different and sometimes correlated. Hence, instead of corrupting the training data with simple i.i.d. Gaussian noise, Gaussian noise with more realistic parameters that are either estimated from data or suggested by theory should be a better choice.\n\n2 Model Description\n\nIn this section, we first introduce the problem formulation and some basic notations. 
Then we briefly give preliminaries about the Denoising Auto-encoder (DA), which is a fundamental building block of our proposed method.\n\nFigure 1: Model architectures. (a) Denoising auto-encoder (DA) architecture; (b) stacked sparse denoising auto-encoder architecture.\n\n2.1 Problem Formulation\n\nAssuming x is the observed noisy image and y is the original noise-free image, we can formulate the image corruption process as:\n\nx = η(y),  (1)\n\nwhere η : R^n → R^n is an arbitrary stochastic corrupting process. The denoising task's learning objective then becomes:\n\nf = argmin_f E_y ‖f(x) − y‖²₂.  (2)\n\nFrom this formulation, we can see that the task is to find a function f that best approximates η⁻¹. We can now treat the image denoising and inpainting problems in a unified framework by choosing an appropriate η for each situation.\n\n2.2 Denoising Auto-encoder\n\nLet y_i be the original data for i = 1, 2, ..., N and x_i be the corrupted version of the corresponding y_i. The DA is defined as shown in Fig.1a:\n\nh(x_i) = σ(W x_i + b),  (3)\nŷ(x_i) = σ(W′ h(x_i) + b′),  (4)\n\nwhere σ(x) = (1 + exp(−x))⁻¹ is the sigmoid activation function, applied element-wise to vectors, h_i is the hidden layer activation, ŷ(x_i) is an approximation of y_i, and Θ = {W, b, W′, b′} represents the weights and biases. The DA can be trained with various optimization methods to minimize the reconstruction loss:\n\nθ = argmin_θ Σ_{i=1}^N ‖y_i − ŷ(x_i)‖.  (5)\n\nAfter finishing training a DA, we can move on to training the next layer by using the hidden layer activation of the first layer as the input of the next layer. 
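As a concrete illustration, the DA defined in (3)-(5) can be sketched in NumPy. This is a minimal sketch with hypothetical helper names; `W_p` and `b_p` play the roles of W′ and b′.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function sigma(z) = (1 + exp(-z))^-1."""
    return 1.0 / (1.0 + np.exp(-z))

def da_forward(x, W, b, W_p, b_p):
    """Eqs. (3)-(4): hidden activation and reconstruction of one DA."""
    h = sigmoid(W @ x + b)          # h(x) = sigma(W x + b)
    y_hat = sigmoid(W_p @ h + b_p)  # y_hat(x) = sigma(W' h + b')
    return h, y_hat

def reconstruction_loss(ys, xs, W, b, W_p, b_p):
    """Eq. (5): reconstruction error summed over the training pairs."""
    return sum(np.linalg.norm(y - da_forward(x, W, b, W_p, b_p)[1])
               for x, y in zip(xs, ys))
```

Stacking simply feeds each layer's `h` in as the next layer's input.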
This is called a Stacked Denoising Auto-encoder (SDA) [21].\n\nMethod    | σ = 25 (input PSNR 20.17) | σ = 50 (input PSNR 14.16) | σ = 100 (input PSNR 8.13)\nSSDA      | 30.52 ± 1.02              | 27.37 ± 1.10              | 24.18 ± 1.39\nBLS-GSM   | 30.49 ± 1.17              | 27.28 ± 1.44              | 24.37 ± 1.36\nKSVD      | 30.96 ± 0.77              | 27.34 ± 1.11              | 23.50 ± 1.15\n\nTable 1: Comparison of denoising performance for each noise standard deviation σ. Performance is measured by Peak Signal to Noise Ratio (PSNR). Results are averaged over the testing set.\n\n2.3 Stacked Sparse Denoising Auto-encoders\n\nIn this section, we describe the structure and optimization objective of the proposed model, Stacked Sparse Denoising Auto-encoders (SSDA). Because directly processing the entire image is intractable, we instead draw overlapping patches from the image as our data objects. In the training phase, the model is supplied with both the corrupted noisy image patches x_i, for i = 1, 2, ..., N, and the original patches y_i. After training, SSDA will be able to reconstruct the corresponding clean image given any noisy observation.\n\nTo combine the virtues of sparse coding and neural networks and to avoid over-fitting, we train a DA to minimize the reconstruction loss regularized by a sparsity-inducing term:\n\nL₁(X, Y; θ) = (1/N) Σ_{i=1}^N (1/2) ‖y_i − ŷ(x_i)‖²₂ + β KL(ρ̂ ‖ ρ) + (λ/2) (‖W‖²_F + ‖W′‖²_F),  (6)\n\nwhere\n\nKL(ρ̂ ‖ ρ) = Σ_{j=1}^{|ρ̂|} [ ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) ],  ρ̂ = (1/N) Σ_{i=1}^N h(x_i),\n\nand h(·) and ŷ(·) are defined in (3) and (4) respectively. Here ρ̂ is the average activation of the hidden layer. 
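Objective (6) is straightforward to evaluate directly. The sketch below is illustrative rather than the authors' code: it assumes patches are stored as rows, and its defaults use the hyper-parameter values reported later in the paper (λ = 10⁻⁴, β = 10⁻², ρ = 0.05).

```python
import numpy as np

def kl_sparsity(rho_hat, rho):
    """KL(rho_hat || rho) summed over hidden units, as defined above."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def ssda_layer_loss(Y, Y_hat, H, W, W_p, rho=0.05, beta=1e-2, lam=1e-4):
    """Eq. (6): average squared reconstruction error, plus the sparsity
    penalty on the mean hidden activation, plus Frobenius weight decay."""
    N = Y.shape[0]
    recon = np.sum((Y - Y_hat) ** 2) / (2.0 * N)
    rho_hat = H.mean(axis=0)              # average activation per hidden unit
    decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_p ** 2))
    return recon + beta * kl_sparsity(rho_hat, rho) + decay
```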
We regularize the hidden layer representation to be sparse by choosing a small ρ, so that the KL-divergence term encourages the mean activation of the hidden units to be small. Hence the hidden units will be zero most of the time, achieving sparsity.\n\nAfter training of the first DA, we use h(y_i) and h(x_i) as the clean and noisy input, respectively, for the second DA. This is different from the approach described in [21], where x_i is discarded and η(h(y_i)) is used as the noisy input. We point out that our method is more natural in that, since h(y_i) lies in a different space from y_i, the meaning of applying η(·) to h(y_i) is not clear.\n\nWe then initialize a deep network with the weights obtained from K stacked DAs. The network has one input layer, one output layer, and 2K − 1 hidden layers, as shown in Fig.1b. The entire network is then trained using the standard back-propagation algorithm to minimize the following objective:\n\nL₂(X, Y; θ) = (1/N) Σ_{i=1}^N (1/2) ‖y_i − ŷ(x_i)‖²₂ + (λ/2) Σ_{j=1}^{2K} ‖W_j‖²_F.  (7)\n\nHere we removed the sparsity regularization because the pre-trained weights serve as regularization for the network [18].\n\nIn both the pre-training and fine-tuning stages, the loss functions are optimized with the L-BFGS algorithm (a Quasi-Newton method) which, according to [22], achieves the fastest convergence in our setting.\n\n3 Experiments\n\nWe narrow our focus to denoising and inpainting of grey-scale images, but there is no difficulty in generalizing to color images. We use a set of natural images collected from the web¹ as our training set and standard testing images² as the testing set. We create noisy images from clean training and testing images by applying the corruption process (1) to them. Image patches are then extracted from both clean and noisy images to train SSDAs. 
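The patch-extraction step can be sketched as follows. The patch size and stride here are illustrative choices, not values from the paper; the same grid is applied to a clean image and its noisy counterpart so that patch pairs stay aligned.

```python
import numpy as np

def extract_patches(img, size=8, stride=4):
    """Draw overlapping size-by-size patches from a 2-D grey-scale image
    and flatten each one into a row vector."""
    H, W = img.shape
    rows = [img[r:r + size, c:c + size].ravel()
            for r in range(0, H - size + 1, stride)
            for c in range(0, W - size + 1, stride)]
    return np.stack(rows)
```

Training pairs are then `(extract_patches(noisy), extract_patches(clean))` for each image.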
We employ Peak Signal to Noise Ratio (PSNR) to quantify denoising results: PSNR = 10 log₁₀(255²/σ_e²), where σ_e² is the mean squared error. PSNR is one of the standard indicators used for evaluating image denoising results.\n\n¹ http://decsai.ugr.es/cvg/dbimagenes/\n² Widely used images commonly referred to as Lena, Barbara, Boat, Pepper, etc. in the image processing community.\n\nFigure 2: Visual comparison of denoising results. Results for images corrupted by white Gaussian noise with standard deviation σ = 50 are shown. The last row zooms in on the outlined region of the original image.\n\n3.1 Denoising White Gaussian Noise\n\nWe first corrupt images with additive white Gaussian noise of various standard deviations. For the proposed method, one SSDA model is trained for each noise level. We evaluate different hyper-parameter combinations and report the best result. We set K to 2 for all cases because adding more layers may slightly improve the performance but requires much more training time. In the meantime, we try different patch sizes and find that a higher noise level generally requires a larger patch size. The dimension of the hidden layers is generally set to be a constant factor times the dimension of the input³. SSDA is not very sensitive to the weights of the regularization terms. For the Bayes Least Squares-Gaussian Scale Mixture (BLS-GSM) and KSVD methods, we use the fully trained and optimized toolboxes obtained from the corresponding authors [2, 7]. All three models are tuned to the specific noise level of each input. The comparison of quantitative results is shown in Tab.1. Numerical results show that the differences between the three algorithms are statistically insignificant. A visual comparison is shown in Fig.2. We find that SSDA gives clearer boundaries and restores more texture details than KSVD and BLS-GSM, although the PSNR scores are close. 
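For reference, the PSNR measure used throughout this section is simple to compute; a minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(clean, restored, peak=255.0):
    """Peak Signal to Noise Ratio in dB: 10 log10(peak^2 / MSE)."""
    err = np.asarray(clean, dtype=float) - np.asarray(restored, dtype=float)
    mse = np.mean(err ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the error is averaged over every pixel, two results with nearly identical PSNR can still differ markedly in local regions such as edges and textures.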
This indicates that although the reconstruction errors averaged over all pixels are nearly the same, SSDA is better at denoising complex regions.\n\n3.2 Image Inpainting\n\nFigure 3: Visual comparison of inpainting results.\n\nFor the image inpainting task, we test our model on the text removal problem. Both the training and testing sets consist of images with superimposed text of various fonts, with sizes from 18-pixel to 36-pixel. Due to the lack of comparable blind inpainting algorithms, we compare our method to the non-blind KSVD inpainting algorithm [7], which significantly simplifies the problem by requiring knowledge of which pixels are corrupted and require inpainting. A visual comparison is shown in Fig.3. We find that SSDA is able to eliminate text in small fonts completely, while text in larger fonts is dimmed. The proposed method, being blind, generates results comparable to KSVD's even though KSVD is a non-blind algorithm. Non-blind inpainting is a well-developed technology that works decently for the removal of small objects. Blind inpainting, however, is much harder, since it demands automatic identification of the patterns that require inpainting, which is by itself a very challenging problem. To the best of our knowledge, former methods are only capable of removing i.i.d. or simply structured impulse noise [13, 14, 15]. SSDA's capability of blind inpainting of complex patterns is one of this paper's major contributions.\n\nTraining noise \\ Testing noise | Gaussian | Salt-and-Pepper | Image background\nGaussian                       | 91.42%   | 82.95%          | 86.45%\nSalt-and-Pepper                | 90.05%   | 90.14%          | 81.77%\nImage background               | 84.88%   | 74.47%          | 86.87%\n\nTable 2: Comparison of classification results. 
Highest accuracy in each column is shown in bold font.\n\n3.3 Hidden Layer Feature Analysis\n\nTraditionally, when training denoising auto-encoders, the noisy training data is generated with an arbitrarily selected simple noise distribution, regardless of the characteristics of the specific training data [21]. However, we propose that this process deserves more attention. In real-world problems, the clean training data is in fact usually subject to noise. Hence, if we estimate the distribution of the noise and exaggerate it to generate noisy training data, the resulting DA will learn to be more robust to noise in the input data and produce better features.\n\nInspired by SSDA's ability to learn different features when trained to remove different noise patterns, we argue that training denoising auto-encoders with noise patterns that fit specific situations can also improve the performance of unsupervised feature learning. We demonstrate this by comparing classification performance with different sets of features learned on the MNIST dataset. We train DAs with different types of noise and then apply them to handwritten digits corrupted by the type of noise they are trained on as well as by other types of noise. We compare the quality of the learned features by feeding them to SVMs and comparing the corresponding classification accuracy. The results are shown in Tab.2. We find that the highest classification accuracy on each type of noise is achieved by the DA trained to remove that type of noise. This is not surprising, since more information is utilized; nevertheless, it indicates that instead of arbitrarily corrupting the input with noise that follows a simple distribution, more sophisticated methods that corrupt the input in more realistic ways can achieve better performance.\n\n4 Discussion\n\n4.1 Prior vs. 
Learned Structure\n\nUnlike models relying on structural priors, our method's denoising ability comes from learning. Some models, for example BLS-GSM, have carefully designed structures that can give surprisingly good results even with random parameter settings [23]. A randomly initialized SSDA, however, obviously cannot produce any meaningful results; SSDA's ability to denoise and inpaint images is therefore mostly the result of training. Whereas models that rely on structural priors usually have a very limited scope of application, our model can be adapted to other tasks more conveniently. With some modifications, it is possible to denoise audio signals or complete missing data (as a data preprocessing step) with SSDA.\n\n4.2 Advantages and Limitations\n\nTraditionally, for complicated inpainting tasks, an inpainting mask that tells the algorithm which pixels correspond to noise and require inpainting is supplied a priori. In various situations, however, this is time-consuming or sometimes even impossible. Our approach, being blind, has significant advantages in such circumstances. This makes our method a suitable choice for fully automatic and noise-pattern-specific image processing.\n\nThe limitation of our method is also obvious: SSDA relies strongly on supervised training. In our experiments, we find that SSDA can generalize to unseen but similar noise patterns. Generally speaking, however, SSDA can remove only the noise patterns it has seen in the training data. Therefore, SSDA would only be suitable in circumstances where the scope of the denoising task is narrow, such as reconstructing images corrupted by a certain procedure.\n\n³ We set this factor to 5. The other hyper-parameters are: λ = 10⁻⁴, β = 10⁻², ρ = 0.05.\n\n5 Conclusion\n\nIn this paper, we present a novel approach to image denoising and blind inpainting that combines sparse coding and deep neural networks pre-trained with denoising auto-encoders. We propose a new training scheme for DA that makes it possible to denoise and inpaint images within a unified framework. In the experiments, our method achieves performance comparable to traditional linear sparse coding algorithms on the simple task of denoising additive white Gaussian noise. Moreover, our non-linear approach successfully tackles the much harder problem of blind inpainting of complex patterns which, to the best of our knowledge, has not been addressed before. We also show that the proposed training scheme is able to improve DA's performance in the task of unsupervised feature learning.\n\nIn our future work, we would like to explore the possibility of adapting the proposed approach to various other applications such as denoising and inpainting of audio and video, image super-resolution, and missing data completion. It is also meaningful to investigate the effects of different hyper-parameter settings on the learned features.\n\n6 Acknowledgements\n\nResearch supported by grants from the National Natural Science Foundation of China (No. 61003135 & No. 61073110), NSFC Major Program (No. 71090401/71090400), the Fundamental Research Funds for the Central Universities (WK0110000022), the National Major Special Science & Technology Projects (No. 2011ZX04016-071), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).\n\nReferences\n\n[1] J. Xu, K. Zhang, M. Xu, and Z. Zhou. 
An adaptive threshold method for image denoising based on wavelet domain. Proceedings of SPIE, the International Society for Optical Engineering, 7495:165, 2009.\n\n[2] J. Portilla, V. Strela, M.J. Wainwright, and E.P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11):1338–1351, 2003.\n\n[3] F. Luisier, T. Blu, and M. Unser. A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on Image Processing, 16(3):593–606, 2007.\n\n[4] B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.\n\n[5] K. Kreutz-Delgado, J.F. Murray, B.D. Rao, K. Engan, T.W. Lee, and T.J. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15(2):349–396, 2003.\n\n[6] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.\n\n[7] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.\n\n[8] X. Lu, H. Yuan, P. Yan, Y. Yuan, L. Li, and X. Li. Image denoising via improved sparse coding. Proceedings of the British Machine Vision Conference, 2011.\n\n[9] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. Proceedings of the 26th Annual International Conference on Machine Learning, pages 689–696, 2009.\n\n[10] A. Criminisi, P. Pérez, and K. Toyama. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing, 13(9):1200–1212, 2004.\n\n[11] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. 
Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 417–424, 2000.\n\n[12] A. Telea. An image inpainting technique based on the fast marching method. Journal of Graphics Tools, 9(1):23–34, 2004.\n\n[13] B. Dong, H. Ji, J. Li, Z. Shen, and Y. Xu. Wavelet frame based blind image inpainting. Applied and Computational Harmonic Analysis, 2011.\n\n[14] Y. Wang, A. Szlam, and G. Lerman. Robust locally linear analysis with applications to image denoising and blind inpainting. Preprint, 2011.\n\n[15] M. Yan. Restoration of images corrupted by impulse noise using blind inpainting and l0 norm. Preprint, 2011.\n\n[16] V. Jain and H.S. Seung. Natural image denoising with convolutional networks. Advances in Neural Information Processing Systems, 21:769–776, 2008.\n\n[17] H. Lee, C. Ekanadham, and A. Ng. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems 20, pages 873–880, 2008.\n\n[18] D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11:625–660, 2010.\n\n[19] Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.\n\n[20] R. Salakhutdinov and G.E. Hinton. Deep Boltzmann machines. Proceedings of the International Conference on Artificial Intelligence and Statistics, 5(2):448–455, 2009.\n\n[21] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research, 11:3371–3408, 2010.\n\n[22] Q.V. Le, A. Coates, B. Prochnow, and A.Y. Ng. On optimization methods for deep learning. Proceedings of the 28th International Conference on Machine Learning, pages 265–272, 2011.\n\n[23] S. Roth. High-order Markov random fields for low-level vision. PhD thesis, Brown University, 2007.", "award": [], "sourceid": 184, "authors": [{"given_name": "Junyuan", "family_name": "Xie", "institution": null}, {"given_name": "Linli", "family_name": "Xu", "institution": null}, {"given_name": "Enhong", "family_name": "Chen", "institution": null}]}