{"title": "Non-Local Recurrent Network for Image Restoration", "book": "Advances in Neural Information Processing Systems", "page_first": 1673, "page_last": 1682, "abstract": "Many classic methods have shown non-local self-similarity in natural images to be an effective prior for image restoration. However, it remains unclear and challenging to make use of this intrinsic property via deep networks. In this paper, we propose a non-local recurrent network (NLRN) as the first attempt to incorporate non-local operations into a recurrent neural network (RNN) for image restoration. The main contributions of this work are: (1) Unlike existing methods that measure self-similarity in an isolated manner, the proposed non-local module can be flexibly integrated into existing deep networks for end-to-end training to capture deep feature correlation between each location and its neighborhood. (2) We fully employ the RNN structure for its parameter efficiency and allow deep feature correlation to be propagated along adjacent recurrent states. This new design boosts robustness against inaccurate correlation estimation due to severely degraded images. (3) We show that it is essential to maintain a confined neighborhood for computing deep feature correlation given degraded images. This is in contrast to existing practice that deploys the whole image. Extensive experiments on both image denoising and super-resolution tasks are conducted. Thanks to the recurrent non-local operations and correlation propagation, the proposed NLRN achieves superior results to state-of-the-art methods with many fewer parameters.", "full_text": "Non-Local Recurrent Network for Image Restoration\n\nDing Liu1, Bihan Wen1, Yuchen Fan1, Chen Change Loy2, Thomas S. 
Huang1\n\n1University of Illinois at Urbana-Champaign 2Nanyang Technological University\n\n{dingliu2, bwen3, yuchenf4, t-huang1}@illinois.edu ccloy@ntu.edu.sg\n\nAbstract\n\nMany classic methods have shown non-local self-similarity in natural images\nto be an effective prior for image restoration. However, it remains unclear and\nchallenging to make use of this intrinsic property via deep networks.\nIn this\npaper, we propose a non-local recurrent network (NLRN) as the \ufb01rst attempt to\nincorporate non-local operations into a recurrent neural network (RNN) for image\nrestoration. The main contributions of this work are: (1) Unlike existing methods\nthat measure self-similarity in an isolated manner, the proposed non-local module\ncan be \ufb02exibly integrated into existing deep networks for end-to-end training to\ncapture deep feature correlation between each location and its neighborhood. (2)\nWe fully employ the RNN structure for its parameter ef\ufb01ciency and allow deep\nfeature correlation to be propagated along adjacent recurrent states. This new design\nboosts robustness against inaccurate correlation estimation due to severely degraded\nimages. (3) We show that it is essential to maintain a con\ufb01ned neighborhood for\ncomputing deep feature correlation given degraded images. This is in contrast to\nexisting practice [41] that deploys the whole image. Extensive experiments on both\nimage denoising and super-resolution tasks are conducted. Thanks to the recurrent\nnon-local operations and correlation propagation, the proposed NLRN achieves\nsuperior results to state-of-the-art methods with many fewer parameters. The code\nis available at https://github.com/Ding-Liu/NLRN.\n\n1\n\nIntroduction\n\nImage restoration is an ill-posed inverse problem that aims at estimating the underlying image from its\ndegraded measurements. 
Depending on the type of degradation, image restoration can be categorized\ninto different sub-problems, e.g., image denoising and image super-resolution (SR). The key to\nsuccessful restoration typically relies on the design of an effective regularizer based on image priors.\nBoth local and non-local image priors have been extensively exploited in the past. Considering\nimage denoising as an example, local image properties such as Gaussian \ufb01ltering and total variation\nbased methods [31] are widely used in early studies. Later on, the notion of self-similarity in natural\nimages draws more attention and it has been exploited by non-local-based methods, e.g., non-local\nmeans [2], collaborative \ufb01ltering [8], joint sparsity [27], and low-rank modeling [15]. These non-local\nmethods are shown to be effective in capturing the correlation among non-local patches to improve\nthe restoration quality.\nWhile non-local self-similarity has been extensively studied in the literature, approaches for capturing\nthis intrinsic property with deep networks are little explored. Recent convolutional neural networks\n(CNNs) for image restoration [10, 20, 28, 49] achieve impressive performance over conventional\napproaches but do not explicitly use self-similarity properties in images. To rectify this weakness, a\nfew studies [23, 30] apply block matching to patches before feeding them into CNNs. Nevertheless,\nthe block matching step is isolated and thus not jointly trained with image restoration networks.\nIn this paper, we present the \ufb01rst attempt to incorporate non-local operations in CNN for image\nrestoration, and propose a non-local recurrent network (NLRN) as an ef\ufb01cient yet effective network\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fwith non-local module. 
First, we design a non-local module to produce reliable feature correlation\nfor self-similarity measurement given severely degraded images, which can be \ufb02exibly integrated\ninto existing deep networks while embracing the bene\ufb01t of end-to-end learning. For high parameter\nef\ufb01ciency without compromising restoration quality, we deploy a recurrent neural network (RNN)\nframework similar to [21, 35, 36] such that operations with shared weights are applied recursively.\nSecond, we carefully study the behavior of non-local operation in deep feature space and \ufb01nd that\nlimiting the neighborhood of correlation computation improves its robustness to degraded images. The\ncon\ufb01ned neighborhood helps concentrate the computation on relevant features in the spatial vicinity\nand disregard noisy features, which is in line with conventional image restoration approaches [8, 15].\nIn addition, we allow message passing of non-local operations between adjacent recurrent states of\nRNN. Such inter-state \ufb02ow of feature correlation facilitates more robust correlation estimation. By\ncombining the non-local operation with typical convolutions, our NLRN can effectively capture and\nemploy both local and non-local image properties for image restoration.\nIt is noteworthy that recent work has adopted similar ideas on video classi\ufb01cation [41]. However,\nour method signi\ufb01cantly differs from it in the following aspects. For each location, we measure the\nfeature correlation of each location only in its neighborhood, rather than throughout the whole image\nas in [41]. In our experiments, we show that deep features useful for computing non-local priors\nare more likely to reside in neighboring regions. A larger neighborhood (the whole image as one\nextreme) can lead to inaccurate correlation estimation over degraded measurements. 
In addition, our\nmethod fully exploits the advantage of RNN architecture - the correlation information is propagated\namong adjacent recurrent states to increase the robustness of correlation estimation to degradations of\nvarious degrees. Moreover, our non-local module is \ufb02exible to handle inputs of various sizes, while\nthe module in [41] handles inputs of \ufb01xed sizes only.\nWe introduce NLRN by \ufb01rst relating our proposed model to other classic and existing non-local\nimage restoration approaches in a uni\ufb01ed framework. We thoroughly analyze the non-local module\nand recurrent architecture in our NLRN via extensive ablation studies. We provide a comprehensive\ncomparison with recent competitors, in which our NLRN achieves state-of-the-art performance\nin image denoising and SR over several benchmark datasets, demonstrating the superiority of the\nnon-local operation with recurrent architecture for image restoration.\n2 Related Work\nImage self-similarity as an important image characteristic has been used in a number of non-local-\nbased image restoration approaches. The early works include bilateral \ufb01ltering [38] and non-local\nmeans [2] for image denoising. Recent approaches exploit image self-similarity by imposing spar-\nsity [27, 44]. Alternatively, similar image patches are modeled with low-rankness [15], or by\ncollaborative Wiener \ufb01ltering [8, 47]. Neighborhood embedding is a common approach for image\nSR [5, 37], in which each image patch is approximated by multiple similar patches in a manifold.\nSelf-example based image SR approaches [14, 12] exploit the local self-similarity assumption, and\nextract LR-HR exemplar pairs merely from the low-resolution image across different scales to predict\nthe high-resolution image. Similar ideas are adopted for image deblurring [9].\nDeep neural networks have been prevalent for image restoration. 
The pioneering works include a\nmultilayer perceptron for image denoising [3] and a three-layer CNN for image SR [10]. Deconvolu-\ntion is adopted to save computation cost and accelerate inference speed [34, 11]. Very deep CNNs are\ndesigned to boost SR accuracy in [20, 22, 24]. Dense connections among various residual blocks are\nincluded in [39]. Similarly CNN based methods are developed for image denoising in [28, 49, 50, 26].\nBlock matching as a preprocessing step is cascaded with CNNs for image denoising [23, 30]. Be-\nsides CNNs, RNNs have also been applied for image restoration while enjoying the high parameter\nef\ufb01ciency [21, 35, 36].\nIn addition to image restoration, feature correlations are widely exploited along with neural networks\nin many other areas, including graphical models [51, 4, 17], relational reasoning [32], machine\ntranslation [13, 40] and so on. We do not elaborate on them here due to the limitation of space.\n3 Non-Local Operations for Image Restoration\nIn this section, we \ufb01rst present a uni\ufb01ed framework of non-local operations used for image restoration\nmethods, e.g., collaborative \ufb01ltering [8], non-local means [2], and low-rank modeling [15], and we\ndiscuss the relations between them. We then present the proposed non-local operation module.\n\n2\n\n\f3.1 A General Framework\nIn general, a non-local operation takes a multi-channel input X \u2208 RN\u00d7m as the image feature, and\ngenerates output feature Z \u2208 RN\u00d7k. Here N and m denote the number of image pixels and data\nchannels, respectively. We propose a general framework with the following formulation:\n\nZ = diag{\u03b4(X)}\u22121 \u03a6(X) G(X) .\n\n(1)\nHere, \u03a6(X) \u2208 RN\u00d7N is the non-local correlation matrix, and G(X) \u2208 RN\u00d7k is the multi-channel\nnon-local transform. Each row vector X i denotes the local features in location i. 
Φ(X)_i^j represents the relationship between X_i and X_j, and each row vector G(X)_j is the embedding of X_j.¹ The diagonal matrix diag{δ(X)} ∈ R^{N×N} normalizes the output at the i-th pixel with the normalization factor δ_i(X).

3.2 Classic Methods

The proposed framework works with various classic non-local methods for image restoration, including methods based on low-rankness [15], collaborative filtering [8], joint sparsity [27], as well as non-local mean filtering [2].
Block matching (BM) is a commonly used approach for exploiting non-local image structures in conventional methods [15, 8, 27]. A q × q spatial neighborhood is set to be centered at each location i, and X_i reduces to the image patch centered at i. BM selects the K_i most similar patches (K_i ≪ q²) from this neighborhood, which are used jointly to restore X_i. Under the proposed non-local framework, these methods can be represented as

Z_i = (1 / δ_i(X)) Σ_{j∈C_i} Φ(X)_i^j G(X)_j , ∀i . (2)

Here δ_i(X) = Σ_{j∈C_i} Φ(X)_i^j and C_i denotes the set of indices of the K_i selected patches. Thus, each row Φ(X)_i has only K_i non-zero entries. The embedding G(X) and the non-zero elements vary for non-local methods based on different models. For example, in WNNM [15], Σ_{j∈C_i} Φ(X)_i^j G(X)_j corresponds to the projection of X_i onto the group-specific subspace as a function of the selected patches. Specifically, the subspace for calculating Z_i is spanned by the eigenvectors U_i of X_{C_i}^T X_{C_i}. Thus Z_i = X_{C_i} U_i diag{σ} U_i^T, where diag{σ} is obtained by applying the shrinkage function associated with the weighted nuclear norm [15] to the eigenvalues of X_{C_i}^T X_{C_i}.
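In code, the general formulation in (1), restricted to per-location neighborhoods as in (2), can be sketched as follows (a minimal NumPy illustration; the function and variable names are ours and not from the released implementation):

```python
import numpy as np

def non_local(X, phi, g, neighborhoods):
    """Sketch of Eq. (1)/(2): Z = diag{delta(X)}^-1 Phi(X) G(X),
    restricted to a per-location index set (the C_i of Eq. (2)).

    X: (N, m) array, one feature row per pixel location.
    phi: callable phi(X_i, X_j) -> scalar correlation.
    g:   callable g(X) -> (N, k) embedding G(X).
    neighborhoods: list where neighborhoods[i] holds the indices C_i.
    """
    G = g(X)
    Z = np.zeros((X.shape[0], G.shape[1]))
    for i, idx in enumerate(neighborhoods):
        w = np.array([phi(X[i], X[j]) for j in idx])  # row i of Phi(X)
        Z[i] = (w / w.sum()) @ G[idx]  # delta_i(X) = sum_j Phi(X)_i^j
    return Z
```

Each classic method then corresponds to one choice of phi, g, and neighborhood: for instance, a Gaussian kernel with the identity embedding recovers the non-local means weighting discussed below.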
We generalize more classic non-local image restoration methods under this framework in the supplementary material.
Except for hard block matching, other methods, e.g., the non-local means algorithm [2], apply soft block matching by calculating the correlation between the reference patch and each patch in the neighborhood. Each element Φ(X)_i^j is determined only by the {X_i, X_j} pair, so Φ(X)_i^j = φ(X_i, X_j), where φ(·) is determined by the distance metric. In [2], weighted Euclidean distance with a Gaussian kernel is applied as the metric, such that φ(X_i, X_j) = exp{−‖X_i − X_j‖²_{2,a} / h²}. Besides, identity mapping is directly used as the embedding in [2], i.e., G(X)_j = X_j. In this case, the non-local framework in (1) reduces to

Z_i = (1 / δ_i(X)) Σ_{j∈S_i} exp{−‖X_i − X_j‖²_{2,a} / h²} X_j , ∀i , (3)

where δ_i(X) = Σ_{j∈S_i} exp{−‖X_i − X_j‖²_{2,a} / h²} and S_i is the set of indices in the neighborhood of X_i. Note that both a and h are constants, denoting the standard deviation of the Gaussian kernel and the degree of filtering, respectively [2].
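The soft block matching in (3) can be sketched as follows (a minimal NumPy illustration; for brevity a plain Euclidean distance replaces the Gaussian-weighted distance ‖·‖_{2,a} of [2], and the names are ours):

```python
import numpy as np

def nonlocal_means(X, S, h=1.0):
    """Sketch of Eq. (3): Gaussian-weighted average over each
    neighborhood with the identity embedding G(X)_j = X_j.

    X: (N, m) patch features; S[i]: neighbor indices of location i;
    h: degree of filtering.
    """
    Z = np.zeros_like(X, dtype=float)
    for i, idx in enumerate(S):
        d2 = np.sum((X[idx] - X[i]) ** 2, axis=1)
        w = np.exp(-d2 / h ** 2)       # phi(X_i, X_j)
        Z[i] = (w / w.sum()) @ X[idx]  # delta_i(X) normalization
    return Z
```

Features far from X_i receive exponentially small weights, so each output row is dominated by its most similar neighbors.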
It is noteworthy that the cardinality of S_i for soft BM is much larger than that of C_i for hard BM, which gives more flexibility in using feature correlations between neighboring locations.
The conventional non-local methods suffer from the drawback that their parameters are either fixed [2] or obtained by suboptimal approaches [8, 27, 15]; e.g., the parameters of WNNM are learned based on the low-rankness assumption, which is suboptimal as the ultimate objective is to minimize the image reconstruction error.

¹ In our analysis, if A is a matrix, A_i, A^j, and A_i^j denote its i-th row, j-th column, and the element at the i-th row and j-th column, respectively.

Figure 1: An illustration of our non-local module working on a single location. The white tensor denotes the deep feature representation of an entire image. The red fiber is the features of this location and the blue tensor denotes the features in its neighborhood. θ, ψ and g are implemented by 1 × 1 convolution followed by reshaping operations.

3.3 The Proposed Non-Local Module

Based on the general non-local framework in (1), we propose another soft block matching approach and apply the Euclidean distance with linearly embedded Gaussian kernel [41] as the distance metric. The linear embeddings are defined as follows:

Φ(X)_i^j = φ(X_i, X_j) = exp{θ(X_i) ψ(X_j)^T} , ∀i, j , (4)

θ(X_i) = X_i W_θ , ψ(X_i) = X_i W_ψ , G(X)_i = X_i W_g , ∀i . (5)

The embedding transforms W_θ, W_ψ, and W_g are all learnable and have the shapes m × l, m × l, and m × m, respectively. Thus, the proposed non-local operation can be written as

Z_i = (1 / δ_i(X)) Σ_{j∈S_i} exp{X_i W_θ W_ψ^T X_j^T} X_j W_g , ∀i , (6)

where δ_i(X) = Σ_{j∈S_i} φ(X_i, X_j).
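The operation in (6) can be sketched as follows (an illustrative NumPy sketch, not the released implementation; the exp together with the δ normalization amounts to a softmax over each confined neighborhood):

```python
import numpy as np

def nonlocal_module(X, W_theta, W_psi, W_g, S):
    """Sketch of Eq. (6): embedded-Gaussian soft block matching.

    Shapes follow the text: X is (N, m), W_theta and W_psi are (m, l),
    W_g is (m, m); S[i] holds the neighborhood indices of location i.
    """
    theta, psi, G = X @ W_theta, X @ W_psi, X @ W_g
    Z = np.zeros_like(G)
    for i, idx in enumerate(S):
        logits = theta[i] @ psi[idx].T     # theta(X_i) psi(X_j)^T
        w = np.exp(logits - logits.max())  # numerically stable softmax
        Z[i] = (w / w.sum()) @ G[idx]
    return Z
```

In NLRN the module is wrapped with a skip connection, so its output is X + Z; as noted in the text, initializing W_g to zero makes Z vanish, which keeps a pre-trained host network's behavior unchanged at insertion.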
Similar to [2], to obtain Zi, we evaluate the correlation between\nX i and each X j in the neighborhood Si. More choices of \u03c6(X i, X j) are discussed in Section 5.\nThe proposed non-local operation can be implemented by common differentiable operations, and thus\ncan be jointly learned when incorporated into a neural network. We wrap it as a non-local module\nby adding a skip connection, as shown in Figure 1, since the skip connection enables us to insert a\nnon-local module into any pre-trained model, while maintaining its initial behavior by initializing\nW g as zero. Such a module introduces only a limited number of parameters since \u03b8, \u03c8 and g are\n1 \u00d7 1 convolutions and m = 128, l = 64 in practice. The output of this module on each location only\ndepends on its q \u00d7 q neighborhood, so this operation can work on inputs of various sizes.\nRelation to Other Methods: Recent works have combined non-local BM and neural networks\nfor image restoration [30, 23, 41]. Lefkimmiatis [23] proposed to \ufb01rst apply BM to noisy image\npatches. The hard BM results are used to group patch features, and a CNN conducts a trainable\ncollaborative \ufb01ltering over the matched patches. Qiao et al. [30] combined similar non-local BM\nwith TNRD networks [7] for image denoising. However, as conventional methods [8, 27, 15], these\nworks [23, 30] conduct hard BM directly over degraded input patches, which may be inaccurate over\nseverely degraded images. In contrast, our proposed non-local operation as soft BM is applied on\nlearned deep feature representations that are more robust to degradation. Furthermore, the matching\nresults in [23] are isolated from the neural network, similar to the conventional approaches, whereas\nthe proposed non-local module is trained jointly with the entire network in an end-to-end manner.\nWang et al. [41] used similar approaches to add non-local operations into neural networks for high-\nlevel vision tasks. 
However, unlike our approach, Wang et al. [41] calculated feature correlations throughout the whole image, which is equivalent to enlarging the neighborhood to the entire image in our approach. We empirically show that increasing the neighborhood size does not always improve image restoration performance, due to the inaccuracy of correlation estimation over degraded input images. Hence it is imperative to choose a neighborhood of a proper size to achieve the best performance for image restoration. In addition, the non-local operation in [41] can only handle input images of fixed size, while our module in (6) is flexible to various image sizes. Finally, our non-local module, when incorporated into an RNN framework, allows the flow of correlation information between adjacent states to enhance robustness against inaccurate correlation estimation. This is a new, unique formulation to deal with degraded images. More details are provided next.

Figure 2: An illustration of the transition function f_recurrent in the proposed NLRN.

Figure 3: The operations for a single location i in the non-local module used in NLRN.

4 Non-Local Recurrent Network
In this section, we describe the RNN architecture that incorporates the non-local module to form our NLRN. We adopt the common formulation of an RNN, which consists of a set of states, namely, input state, output state and recurrent state, as well as transition functions among the states. The input, output, and recurrent states are represented as x, y and s, respectively.
At each time step t, an RNN receives an input x_t, and the recurrent state and the output state of the RNN are updated recursively as follows:

s_t = f_input(x_t) + f_recurrent(s_{t−1}) , y_t = f_output(s_t) , (7)

where f_input, f_output, and f_recurrent are reused at every time step. In our NLRN, we set the following:

• s_0 is a function of the input image I.
• x_t = 0, ∀t ∈ {1, . . . , T}, and f_input(0) = 0.
• The output state y_t is calculated only at time T as the final output.

We add an identity path from the very first state, which helps gradient backpropagation during training [35], and a residual path for the deep feature correlation between each location and its neighborhood from the previous state. Hence, s_t = {s_t^feat, s_t^corr} and s_t = f_recurrent(s_{t−1}, s_0), ∀t ∈ {1, . . . , T}, where s_t^feat denotes the feature map at time t and s_t^corr is the collection of deep feature correlations. For the transition function f_recurrent, a non-local module is first adopted and is followed by two convolutional layers, before the feature s_0 is added from the identity path. The weights in the non-local module are shared across recurrent states just as those of the convolutional layers are, so our NLRN as a whole still keeps high parameter efficiency. An illustration is displayed in Figure 2.
It is noteworthy that inside the non-local module, the feature correlation for location i from the previous state, s_{t−1}^corr,i, is added to the estimated feature correlation in the current state before the softmax normalization, which enables the propagation of correlation information between adjacent states for more robust correlation estimation. The details can be found in Figure 3. The initial state s_0 is set as the feature after a convolutional layer on the input image. f_output is represented by another single convolutional layer.
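The recurrence with correlation propagation can be sketched as follows (an illustrative NumPy sketch, not the released implementation; for brevity the whole image serves as the neighborhood, and f_conv stands in for the two weight-shared convolutional layers of f_recurrent):

```python
import numpy as np

def nlrn_unroll(s0, W_theta, W_psi, W_g, f_conv, T):
    """Unrolled NLRN transition: non-local module with correlation
    propagation, two stand-in conv layers, and the identity path.

    s0: (N, m) initial feature state from one conv layer on the input.
    """
    s_feat = s0
    s_corr = np.zeros((s0.shape[0], s0.shape[0]))  # correlation state
    for _ in range(T):
        # correlation logits of the current state plus those propagated
        # from the previous state, added before softmax normalization
        logits = (s_feat @ W_theta) @ (s_feat @ W_psi).T + s_corr
        s_corr = logits                            # passed to the next state
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        z = (e / e.sum(axis=1, keepdims=True)) @ (s_feat @ W_g)
        s_feat = f_conv(s_feat + z) + s0           # identity path from s_0
    return s_feat
```

With W_g initialized to zero and f_conv taken as the identity, the non-local term vanishes and each step only re-adds the identity path, so the unrolled network starts from a near-identity mapping.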
All layers have 128 filters with 3 × 3 kernel size, except for the non-local module. Batch normalization and ReLU activation are performed ahead of each convolutional layer, following [18]. We adopt residual learning: the output of NLRN is the residual image Î = f_output(s_T) when NLRN is unfolded T times. During training, the objective is to minimize the mean squared error L(Î, Ĩ) = (1/2) ‖Î + I − Ĩ‖², where Ĩ denotes the ground truth image.
Relation to Other RNN Methods: Although RNNs have been adopted for image restoration before, our NLRN is the first to incorporate non-local operations into an RNN framework with correlation propagation. DRCN [21] recursively applies a single convolutional layer to the input feature map multiple times, without the identity path from the first state. DRRN [35] applies both the identity path and the residual path in each state, but without non-local operations, so no correlation information flows across adjacent states. MemNet [36] builds dense connections among several types of memory blocks; weights are shared within the same type of memory block but differ across types.
Compared with MemNet, our NLRN has an efficient yet effective RNN structure with shallower effective depth and fewer parameters, but obtains better restoration performance, as shown in detail in Section 5.
5 Experiments
Dataset: For image denoising, we adopt two different settings to fairly and comprehensively compare with recent deep learning based methods [28, 23, 49, 36]: (1) As in [7, 49, 23], we choose as the training set the combination of 200 images from the train set and 200 images from the test set in the Berkeley Segmentation Dataset (BSD) [29], and test on two popular benchmarks: Set12 and Set68 with σ = 15, 25, 50 following [49]. (2) As in [28, 36], we use as the training set the combination of 200 images from the train set and 100 images from the val set in BSD, and test on Set14 and the BSD test set of 200 images with σ = 30, 50, 70 following [28, 36]. In addition, we evaluate our NLRN on the Urban100 dataset [19], which contains abundant structural patterns and textures, to further demonstrate the capability of our NLRN to exploit image self-similarity.
The training set and test set are strictly disjoint, and all the images are converted to gray-scale in each experiment setup.
For image SR, we follow [20, 35, 36] and use a training set of 291 images, where 91 images are proposed in [46] and the other 200 are from the BSD train set. We adopt four benchmark sets: Set5 [1], Set14 [48], BSD100 [29] and Urban100 [19] for testing with three upscaling factors: ×2, ×3 and ×4. The low-resolution images are synthesized by bicubic downsampling.
Training Settings: During training, we randomly sample patches whose size equals the neighborhood of the non-local operation. We use flipping, rotation and scaling for augmenting training data. For image denoising, we add independent and identically distributed Gaussian noise with zero mean to the original image as the noisy input during training. We train a different model for each noise level. For image SR, only the luminance channel of images is super-resolved, and the other two color channels are upscaled by bicubic interpolation, following [20, 21, 35]. Moreover, the training images for all three upscaling factors, ×2, ×3 and ×4, are upscaled by bicubic interpolation into the desired spatial size and are combined into one training set. We use this set to train one single model for all these three upscaling factors, as in [20, 35, 36].
We use the Adam optimizer to minimize the loss function. We set the initial learning rate to 1e-3 and reduce it by half five times during training. We use Xavier initialization for the weights. We clip gradients at a norm of 0.5 to prevent gradient explosion, which empirically accelerates training convergence, and we adopt a minibatch size of 16 during training. Training a model takes about 3 days with a Titan Xp GPU. For the non-local module, we use circular padding for the neighborhood outside input patches.
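The circular padding used for the non-local neighborhood can be illustrated with NumPy's pad modes (a minimal sketch on a toy 2 × 2 patch; zero padding, which we use for the convolutions, is shown for contrast):

```python
import numpy as np

patch = np.array([[1, 2],
                  [3, 4]])

# circular padding: values wrap around the patch boundary,
# as used for the non-local neighborhood outside input patches
circular = np.pad(patch, 1, mode="wrap")

# zero padding: boundaries filled with zeros, as used for convolutions
zero = np.pad(patch, 1, mode="constant")
```

Circular padding keeps every padded entry a valid feature drawn from the patch itself, so correlation computation near borders never involves artificial zero features.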
For convolution, we pad the boundaries of feature maps with zeros to preserve their spatial size.

5.1 Model Analysis

In this section, we analyze our model in the following aspects. First, we conduct an ablation study of different distance metrics in the non-local module. Table 1 compares instantiations including Euclidean distance, dot product, embedded dot product, Gaussian, symmetric embedded Gaussian and embedded Gaussian when used in NLRN with 12 unfolded steps. Embedded Gaussian achieves the best performance and is adopted in the following experiments.
We compare NLRN with its variants in terms of PSNR in Table 2 and make a few observations. First, the same model with untied weights performs worse than its weight-sharing counterpart. We speculate that the model with untied weights is prone to over-fitting and suffers much slower training convergence, both of which undermine its performance. To investigate the function of the non-local modules, we implement a baseline RNN with the same number of parameters as NLRN, and find it is worse than NLRN by about 0.2 dB, showing the advantage of using non-local image properties for image restoration. Besides, we implement NLRNs where the non-local module is used in every other state or every three states, and observe that if the frequency of using non-local modules in NLRN is reduced, the performance decreases accordingly. We also show the benefit of propagating correlation information among adjacent states by comparing with the counterpart without it in terms of restoration accuracy.
To further analyze the non-local module, we visualize the feature correlation maps of the non-local operations in Figure 4.
It can be seen that as the number of recurrent states increases, the locations with similar features progressively show higher correlations in the map, which demonstrates the effectiveness of the non-local module for exploiting image self-similarity.
Figure 5 investigates the influence of the neighborhood size in the non-local module on image denoising results. The performance peaks at q = 45. This shows that limiting the neighborhood helps concentrate the correlation calculation on relevant features in the spatial vicinity and enhances correlation estimation. Therefore, it is necessary to choose a proper neighborhood size (rather than the whole image) for image restoration. We select q = 45 for the rest of this paper unless stated otherwise.
The unrolling length T determines the maximum effective depth (i.e., maximum number of convolutional layers) of NLRN. The influence of the unrolling length on image denoising results is shown in Figure 6. The performance increases as the unrolling length rises, but gets saturated after T = 12. Given the tradeoff between restoration accuracy and inference time, we adopt T = 12 for NLRN in all the experiments.

Table 1: Image denoising comparison of our proposed model with various distance metrics on Set12 with noise level of 25.

Distance metric | φ(X_i, X_j) | PSNR
Euclidean distance | exp{−‖X_i − X_j‖²₂ / h²} | 30.74
Dot product | X_i X_j^T | 30.68
Embedded dot product | θ(X_i) ψ(X_j)^T | 30.75
Gaussian | exp{X_i X_j^T} | 30.69
Symmetric embedded Gaussian | exp{θ(X_i) θ(X_j)^T} | 30.76
Embedded Gaussian | exp{θ(X_i) ψ(X_j)^T} | 30.80

Table 2: Image denoising comparison of our NLRN with its variants on Set12 with noise level of 25.

Model | PSNR
NLRN | 30.80
NLRN w/o parameter sharing | 30.65
RNN with same parameter no. | 30.61
Non-local module in every other state | 30.76
Non-local module in every 3 states | 30.72
NLRN w/o propagating correlations | 30.78

Table 3: Image denoising comparison of our proposed model with state-of-the-art network models on Set12 with noise level of 50. Model complexities are also compared.

Model | Max effective depth | Parameter sharing | Parameter no. | Multi-view testing | Training images | PSNR
DnCNN | 17 | No | 554k | No | 400 | 27.18
RED | 30 | No | 4,131k | Yes | 300 | 27.33
MemNet | 80 | Yes | 667k | No | 300 | 27.38
NLRN | 38 | Yes | 330k | No | 300 | 27.60
NLRN | 38 | Yes | 330k | No | 400 | 27.64
NLRN | 38 | Yes | 330k | Yes | 300 | 27.66

Figure 4: Examples of correlation maps of non-local operations for image denoising. Noisy patch/ground truth patch: the neighborhood of the red center pixel used in non-local operations. (1)-(6): the correlation maps for recurrent states 1-6 from NLRN with unrolling length of 6.

Figure 5: Neighborhood size vs. image denoising performance of our proposed model on Set12 with noise level of 25.

Figure 6: Unrolling length vs. image denoising performance of our proposed model on Set12 with noise level of 25.

5.2 Comparisons with State-of-the-Art Methods
We compare our proposed model with a number of recent competitors for image denoising and image SR, respectively. PSNR and SSIM [42] are adopted for measuring quantitative restoration performance.
Image Denoising: For a fair comparison with other methods based on deep networks, we train our model under two settings: (1) We use the training data as in TNRD [7], DnCNN [49] and NLNet [23], and the result is shown in Table 4. We cite the result of NLNet from the original paper [23], since no public code or model is available. (2) We use the training data as in RED [28] and MemNet [36], and the result is shown in Table 5.
We note that RED uses multi-view testing [43] to boost the restoration accuracy, i.e., RED processes each test image as well as its rotated and flipped versions, and all the outputs are then averaged to form the final denoised image. Accordingly, we perform the same procedure for NLRN and find that its performance, termed NLRN-MV, is consistently improved. In addition, we include recent non-deep-learning based methods, BM3D [8] and WNNM [15], in our comparison. We do not list other methods [52, 3, 45, 6, 50] whose average performances are worse than those of DnCNN or MemNet. Our NLRN significantly outperforms all the competitors on Urban100 and yields the best results across almost all the noise levels and datasets.

To further show the advantage of the network design of NLRN, we compare different versions of NLRN with several state-of-the-art network models, i.e., DnCNN, RED and MemNet, in Table 3. NLRN uses the fewest parameters but outperforms all the competitors. Specifically, NLRN benefits

Table 4: Benchmark image denoising results. Training and testing protocols are followed as in [49]. Average PSNR/SSIM for various noise levels on Set12, BSD68 and Urban100.
The best performance is in bold.

Dataset    Noise  BM3D          WNNM          TNRD          NLNet    DnCNN         NLRN
Set12      15     32.37/0.8952  32.70/0.8982  32.50/0.8958  -/-      32.86/0.9031  33.16/0.9070
Set12      25     29.97/0.8504  30.28/0.8557  30.06/0.8512  -/-      30.44/0.8622  30.80/0.8689
Set12      50     26.72/0.7676  27.05/0.7775  26.81/0.7680  -/-      27.18/0.7829  27.64/0.7980
BSD68      15     31.07/0.8717  31.37/0.8766  31.42/0.8769  31.52/-  31.73/0.8907  31.88/0.8932
BSD68      25     28.57/0.8013  28.83/0.8087  28.92/0.8093  29.03/-  29.23/0.8278  29.41/0.8331
BSD68      50     25.62/0.6864  25.87/0.6982  25.97/0.6994  26.07/-  26.23/0.7189  26.47/0.7298
Urban100   15     32.35/0.9220  32.97/0.9271  31.86/0.9031  -/-      32.68/0.9255  33.45/0.9354
Urban100   25     29.70/0.8777  30.39/0.8885  29.25/0.8473  -/-      29.97/0.8797  30.94/0.9018
Urban100   50     25.95/0.7791  26.83/0.8047  25.88/0.7563  -/-      26.28/0.7874  27.49/0.8279

Table 5: Benchmark image denoising results. Training and testing protocols are followed as in [36]. Average PSNR/SSIM for various noise levels on 14 images, BSD200 and Urban100.
Red is the best and blue is the second best performance.

Dataset    Noise  BM3D          WNNM          RED           MemNet        NLRN          NLRN-MV
14 images  30     28.49/0.8204  28.74/0.8273  29.17/0.8423  29.22/0.8444  29.37/0.8460  29.41/0.8472
14 images  50     26.08/0.7427  26.32/0.7517  26.81/0.7733  26.91/0.7775  27.00/0.7777  27.05/0.7791
14 images  70     24.65/0.6882  24.80/0.6975  25.31/0.7206  25.43/0.7260  25.49/0.7255  25.54/0.7273
BSD200     30     27.31/0.7755  27.48/0.7807  27.95/0.8056  28.04/0.8053  28.15/0.8423  28.20/0.8436
BSD200     50     25.06/0.6831  25.26/0.6928  25.75/0.7167  25.86/0.7202  25.93/0.7214  25.97/0.8429
BSD200     70     23.82/0.6240  23.95/0.6346  24.37/0.6551  24.53/0.6608  24.58/0.6614  24.62/0.6634
Urban100   30     28.75/0.8567  29.47/0.8697  29.12/0.8674  29.10/0.8631  29.94/0.8830  29.99/0.8842
Urban100   50     25.95/0.7791  26.83/0.8047  26.44/0.7977  26.65/0.8030  27.38/0.8241  27.43/0.8256
Urban100   70     24.27/0.7163  25.11/0.7501  24.75/0.7415  25.01/0.7496  25.66/0.7707  25.71/0.7724

Table 6: Benchmark SISR results. Average PSNR/SSIM for scale factor ×2, ×3 and ×4 on datasets Set5, Set14, BSD100 and Urban100.
The best performance is in bold.

Dataset    Scale  SRCNN         VDSR          DRCN          LapSRN       DRRN          MemNet        NLRN
Set5       ×2     36.66/0.9542  37.53/0.9587  37.63/0.9588  37.52/0.959  37.74/0.9591  37.78/0.9597  38.00/0.9603
Set5       ×3     32.75/0.9090  33.66/0.9213  33.82/0.9226  33.82/0.923  34.03/0.9244  34.09/0.9248  34.27/0.9266
Set5       ×4     30.48/0.8628  31.35/0.8838  31.53/0.8854  31.54/0.885  31.68/0.8888  31.74/0.8893  31.92/0.8916
Set14      ×2     32.45/0.9067  33.03/0.9124  33.04/0.9118  33.08/0.913  33.23/0.9136  33.28/0.9142  33.46/0.9159
Set14      ×3     29.30/0.8215  29.77/0.8314  29.76/0.8311  29.79/0.832  29.96/0.8349  30.00/0.8350  30.16/0.8374
Set14      ×4     27.50/0.7513  28.01/0.7674  28.02/0.7670  28.19/0.772  28.21/0.7721  28.26/0.7723  28.36/0.7745
BSD100     ×2     31.36/0.8879  31.90/0.8960  31.85/0.8942  31.80/0.895  32.05/0.8973  32.08/0.8978  32.19/0.8992
BSD100     ×3     28.41/0.7863  28.82/0.7976  28.80/0.7963  28.82/0.797  28.95/0.8004  28.96/0.8001  29.06/0.8026
BSD100     ×4     26.90/0.7101  27.29/0.7251  27.23/0.7233  27.32/0.728  27.38/0.7284  27.40/0.7281  27.48/0.7306
Urban100   ×2     29.50/0.8946  30.76/0.9140  30.75/0.9133  30.41/0.910  31.23/0.9188  31.31/0.9195  31.81/0.9249
Urban100   ×3     26.24/0.7989  27.14/0.8279  27.15/0.8276  27.07/0.827  27.53/0.8378  27.56/0.8376  27.93/0.8453
Urban100   ×4     24.52/0.7221  25.18/0.7524  25.14/0.7510  25.21/0.756  25.44/0.7638  25.50/0.7630  25.79/0.7729

from inherent parameter sharing and uses fewer than 1/10 the parameters of RED. Compared with the RNN competitor MemNet, NLRN uses only half the parameters and a much shallower depth to obtain better performance, which shows the superiority of our non-local recurrent architecture.

Image Super-Resolution: We compare our model with several recent SISR approaches, including SRCNN [10], VDSR [20], DRCN [21], LapSRN [22], DRRN [35] and MemNet [36], in Table 6. We crop pixels near image borders before calculating PSNR and SSIM as in [10, 33, 20, 21].
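The border-cropped PSNR evaluation described above can be sketched as follows. This is an illustrative helper, not the exact protocol of [10, 33, 20, 21]: the border width and the 8-bit data range are assumptions.

```python
import numpy as np

def psnr_cropped(reference, estimate, border):
    """PSNR between two grayscale images after cropping `border` pixels on
    each side, as is common in SISR evaluation (illustrative sketch; assumes
    border > 0 and an 8-bit intensity range of [0, 255])."""
    ref = reference[border:-border, border:-border].astype(np.float64)
    est = estimate[border:-border, border:-border].astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")   # identical images: PSNR is unbounded
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

Cropping the borders excludes pixels that upscaling methods cannot reconstruct reliably, which keeps the comparison consistent across methods that pad or crop differently.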
We do not list other methods [19, 33, 25, 34, 16], since their performances are worse than those of DRRN or MemNet. Besides, we do not include SRDenseNet [39] and EDSR [24] in the comparison, because the number of parameters in these two network models is over two orders of magnitude larger than that of our NLRN and their training datasets are significantly larger than ours. It can be seen that NLRN yields the best result across all the upscaling factors and datasets. Visual results are provided in the supplementary material.

6 Conclusion

We have presented a new and effective recurrent network that incorporates non-local operations for image restoration. The proposed non-local module can be trained end-to-end with the recurrent network. We have studied the importance of computing reliable feature correlations within a confined neighborhood rather than over the whole image, and have shown the benefits of passing feature correlation messages between adjacent recurrent stages. Comprehensive evaluations over benchmarks for image denoising and super-resolution demonstrate the superiority of NLRN over existing methods.

References

[1] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2012.
[2] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In CVPR, 2005.
[3] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, 2012.
[4] S. Chandra, N. Usunier, and I. Kokkinos. Dense and low-rank Gaussian CRFs using deep embeddings. In ICCV, 2017.
[5] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In CVPR, 2004.
[6] F. Chen, L. Zhang, and H. Yu. External patch prior guided internal clustering for image denoising. In ICCV, 2015.
[7] Y. Chen and T. Pock.
Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE TPAMI, 2017.
[8] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE TIP, 2007.
[9] A. Danielyan, V. Katkovnik, and K. Egiazarian. BM3D frames and variational image deblurring. IEEE TIP, 21(4):1715–1728, 2012.
[10] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[11] C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In ECCV, 2016.
[12] G. Freedman and R. Fattal. Image and video upscaling from local self-examples. ACM Transactions on Graphics (TOG), 2011.
[13] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin. Convolutional sequence to sequence learning. In ICML, 2017.
[14] D. Glasner, S. Bagon, and M. Irani. Super-resolution from a single image. In ICCV, 2009.
[15] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In CVPR, pages 2862–2869, 2014.
[16] W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock, and T. S. Huang. Image super-resolution via dual-state recurrent networks. In CVPR, June 2018.
[17] A. W. Harley, K. G. Derpanis, and I. Kokkinos. Segmentation-aware convolutional networks using local attention masks. In ICCV, 2017.
[18] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
[19] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In CVPR, 2015.
[20] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[21] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
[22] W.-S.
Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep Laplacian pyramid networks for fast and accurate super-resolution. In CVPR, 2017.
[23] S. Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In CVPR, 2017.
[24] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In CVPR Workshops, 2017.
[25] D. Liu, Z. Wang, B. Wen, J. Yang, W. Han, and T. S. Huang. Robust single image super-resolution via deep networks with sparse prior. IEEE TIP, 25(7):3194–3207, 2016.
[26] D. Liu, B. Wen, X. Liu, Z. Wang, and T. S. Huang. When image denoising meets high-level vision tasks: A deep learning approach. In IJCAI, 2018.
[27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In ICCV, 2009.
[28] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In NIPS, 2016.
[29] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[30] P. Qiao, Y. Dou, W. Feng, R. Li, and Y. Chen. Learning non-local image diffusion for image denoising. In ACM Multimedia Conference, 2017.
[31] L. I. Rudin and S. Osher. Total variation based image restoration with free local constraints. In ICIP, 1994.
[32] A. Santoro, D. Raposo, D. G. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap. A simple neural network module for relational reasoning. In NIPS, 2017.
[33] S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In CVPR, 2015.
[34] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang.
Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR, 2016.
[35] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In CVPR, 2017.
[36] Y. Tai, J. Yang, X. Liu, and C. Xu. MemNet: A persistent memory network for image restoration. In ICCV, 2017.
[37] R. Timofte, V. De, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In ICCV, 2013.
[38] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In ICCV, 1998.
[39] T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In ICCV, 2017.
[40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, 2017.
[41] X. Wang, R. Girshick, A. Gupta, and K. He. Non-local neural networks. arXiv preprint arXiv:1711.07971, 2017.
[42] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE TIP, 2004.
[43] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image super-resolution with sparse prior. In ICCV, 2015.
[44] B. Wen, S. Ravishankar, and Y. Bresler. Structured overcomplete sparsifying transform learning with convergence guarantees and applications. IJCV, 2015.
[45] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In ICCV, 2015.
[46] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE TIP, 2010.
[47] R. Yin, T. Gao, Y. M. Lu, and I. Daubechies. A tale of two bases: Local-nonlocal regularization on image patches with convolution framelets. SIAM Journal on Imaging Sciences, 10(2):711–750, 2017.
[48] R. Zeyde, M. Elad, and M. Protter.
On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, 2010.
[49] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE TIP, 2017.
[50] K. Zhang, W. Zuo, S. Gu, and L. Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, 2017.
[51] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.
[52] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In ICCV, 2011.