| Paper ID | 2330 |
|---|---|
| Title | Learning Parametric Sparse Models for Image Super-Resolution |

This paper proposes a new sparse coding approach to image upsampling, using a parametric model of the sparse code distribution (mean and standard deviation) and a PCA dictionary. The method gives good results, outperforming SRCNN, even if the recently proposed "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" by Kim et al. (CVPR'16) appears to do a bit better. That paper was probably accepted after this submission, so this should not be counted as too much of a weakness.

The idea is simple but very reasonable; the new twist compared to most methods I am familiar with is the estimation of the mean and standard deviation of the sparse codes (though this idea was apparently used in [14] before). I was confused by Section 3.1. The authors first say that the sparse codes alpha_i are obtained from the data and a given dictionary by using some sparse coding algorithm. But then they decide to take a non-redundant PCA-based dictionary (why? an appropriate dictionary could have been learned; this choice should be justified), and also to learn a linear mapping between the LR image features and the sparse codes (again, why? how is this justified?). I did not see a definition of S_lambda, or a justification of why its use should give sparse codes. My low grade for technical quality comes from this part of the paper, which should be clarified and better justified. Likewise, W_k does not appear in Algorithm 1. All of this part of the paper should be clarified; it is difficult to assess as is. The rest of the method (Algorithm 2) sounds reasonable, as does the idea of looking for similar patches using a first estimate of the upsampled patches. As noted before, the results are good, even if a bit inferior to Kim et al. (CVPR'16).
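For context on the undefined S_lambda: in sparse coding papers this notation usually denotes the element-wise soft-thresholding operator, which does produce exact zeros and hence sparse codes, and which gives a closed-form sparse code when the dictionary is orthonormal (as a non-redundant PCA basis is). A minimal sketch under that reading follows; the dictionary `D` and threshold `lam` here are purely illustrative, not the paper's actual choices.

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise soft-thresholding S_lambda: the proximal operator of
    lam*||.||_1. Shrinks each coefficient toward zero and zeroes out small
    ones, which is what makes the resulting codes sparse."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_code_orthonormal(patch, D, lam):
    """For an orthonormal (e.g. non-redundant PCA) dictionary D, the
    l1-regularized sparse code argmin_a 0.5*||patch - D a||^2 + lam*||a||_1
    has a closed form: project onto the basis, then shrink."""
    return soft_threshold(D.T @ patch, lam)

rng = np.random.default_rng(0)
# Toy orthonormal dictionary from the Q factor of a random matrix.
D, _ = np.linalg.qr(rng.standard_normal((64, 64)))
patch = rng.standard_normal(64)
alpha = sparse_code_orthonormal(patch, D, lam=0.5)
print("nonzeros:", np.count_nonzero(alpha), "of", alpha.size)
```

If this reading is correct, it would also explain the choice of a non-redundant PCA dictionary: orthonormality turns the sparse coding step into a single projection plus shrinkage instead of an iterative solve. That rationale, however, is speculation and is exactly what the paper should state explicitly.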

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

This paper proposes a new image super-resolution (SR) method based on two major steps: learning a sparsity model, and solving a sparsity-promoting optimization problem based on the learned model. The sparsity model is learned using both local patch information between LR and HR patches and nonlocal information, i.e. patches in the similarity domain. The optimization problem in the second step is solved using an alternating optimization strategy. Numerical experiments showed the advantage of the proposed method over state-of-the-art methods.
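The alternating strategy mentioned above can be illustrated on a generic sparsity-regularized toy objective (not the paper's exact model): split the variables into a reconstruction `x` and a sparse code `z`, then alternate an exact linear solve with an exact soft-thresholding step. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(y, A, x, z, lam, mu):
    return (0.5 * np.sum((y - A @ x) ** 2)
            + 0.5 * mu * np.sum((x - z) ** 2)
            + lam * np.sum(np.abs(z)))

def alternating_sr(y, A, lam=0.05, mu=1.0, iters=30):
    """Alternating minimization of
        0.5||y - A x||^2 + 0.5 mu ||x - z||^2 + lam ||z||_1.
    x-step: ridge-like linear solve; z-step: soft-thresholding (prox of l1).
    Each step solves its subproblem exactly, so the objective never increases."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    lhs = A.T @ A + mu * np.eye(n)
    Aty = A.T @ y
    history = []
    for _ in range(iters):
        x = np.linalg.solve(lhs, Aty + mu * z)  # data-fidelity / coupling step
        z = soft(x, lam / mu)                   # sparsity step
        history.append(objective(y, A, x, z, lam, mu))
    return x, z, np.array(history)

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 20)) / np.sqrt(40)     # toy degradation operator
x_true = np.zeros(20)
x_true[[3, 7]] = [2.0, -1.5]                        # sparse ground truth
y = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat, z_hat, hist = alternating_sr(y, A)
```

The monotone decrease of the objective is the standard argument for why such alternating schemes are well-behaved, which is presumably what the authors rely on as well.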

The only comment I would like to make is to add the following reference that I found: K. Egiazarian and V. Katkovnik, "Single image super-resolution via BM3D sparse coding," in Proc. 23rd European Signal Processing Conference (EUSIPCO), IEEE, 2015. That paper uses the same dataset; comparing its Table 2 with Table 1 of the current manuscript, we can see that the two methods have comparable results in terms of PSNR values.

2-Confident (read it all; understood it all reasonably well)

The proposed solution to single image super-resolution starts from and is inspired by [14] and [4]: it designs a method that employs sparse priors learned from both training images and the input LR image, together with learned mapping functions between LR patches and the sparse codes of the preferred HR patches.

The method is novel, the performance is impressive on 3 standard datasets, and the paper reads well. My concerns are expressed below and hopefully will be considered for improving the paper.

1) The parameters are not evaluated or discussed; at least some of them should be discussed, if not supported by experiments.

2) In Table 1, I think that the SCSR method is the one from [2] and not [3]. Why are there no results for x2 and x4?

3) In Table 1, the PSNR/SSIM values for A+ [7] on BSD100 are computed after saving to JPEG-compressed images, while for the other methods (SRCNN, KK, NCSR) they are computed before. Better numbers for A+ are reported in the following papers (which first appeared on arXiv in 2015): [Ref1] Timofte et al., "Seven ways to improve example-based single image super resolution," CVPR 2016; [Ref2] Kim et al., "Accurate image super-resolution using very deep convolutional networks," CVPR 2016.

4) Combining internal and external information for superior restoration performance is not new; it was shown, among others, in [Ref1] for super-resolution and by (Mosseri et al., ICCP 2013) for denoising.

5) The experiment from Section 5.2, Table 2, is questionable. For a fair comparison it is necessary to also report the results for the example-based methods [3,7,8,10] after retraining under the test conditions.

6) I find a discussion and comparison of run-times and/or time complexity of the methods necessary. How much slower is the proposed method compared with NCSR, A+, or SRCNN?

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

This paper presents a way to learn a sparse model for image SR. It uses two existing ideas: 1. learning mapping functions between the LR image and the sparse codes corresponding to the desired HR patch; 2. estimating sparse models from the LR image and the sparse codes. Using the estimated sparse models, the sparse codes corresponding to the HR image can be computed within a sparse coding framework.
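The first of these ingredients, a mapping from LR features to HR sparse codes, is commonly fit offline by regularized least squares over training pairs; a minimal sketch under that assumption follows, with all names, shapes, and the regression setup purely illustrative rather than the paper's actual pipeline.

```python
import numpy as np

def learn_mapping(F_lr, A_hr, reg=1e-3):
    """Ridge regression fit of W minimizing
        ||A_hr - W F_lr||_F^2 + reg * ||W||_F^2,
    where columns of F_lr are LR patch feature vectors and columns of A_hr
    are the sparse codes of the corresponding HR patches.
    Normal equations: W (F F^T + reg I) = A F^T."""
    d = F_lr.shape[0]
    return np.linalg.solve(F_lr @ F_lr.T + reg * np.eye(d),
                           F_lr @ A_hr.T).T

rng = np.random.default_rng(2)
W_true = rng.standard_normal((32, 16))            # synthetic ground-truth map
F_lr = rng.standard_normal((16, 500))             # 500 training LR features
A_hr = W_true @ F_lr + 0.01 * rng.standard_normal((32, 500))  # target codes
W = learn_mapping(F_lr, A_hr)

# At test time, predicted HR sparse codes for a new LR feature vector:
alpha_pred = W @ rng.standard_normal(16)
```

Since the mapping is a single matrix multiply at test time, this kind of regression is cheap; the interesting (and in this paper, combined) part is feeding its prediction into the sparse-model estimation rather than using it directly.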

The paper is well written and the ideas are explained nicely; the math and the proofs seem right as well. However, I find the paper not novel enough for NIPS: the idea of combining two working ideas into a single technique can barely be considered novel. The authors state in the abstract that the class of techniques that learn priors from LR/HR patch pairs suffers when dealing with blurred LR images, but the paper does not address this issue in its formulation or in the experiments; the same holds for the frequency aliasing attributed to the latter class of techniques (learning prior models from the LR image). Furthermore, the qualitative and quantitative improvements are very marginal and could be achieved just by parameter tuning of existing approaches. I would urge the authors to investigate novel ways to create an SR image from either natural image priors or only an LR image using sparse coding techniques. Further, there is a strong need for extensive experimentation and quantification of results that surpass the state of the art.

2-Confident (read it all; understood it all reasonably well)

This paper presented a super-resolution approach that combines the mapping-learning-based approach and the sparse-coding-based approach. The variational model is likewise a combination of the two, obtained by adding them together. The experiments showed a quite good performance gain over existing methods. The main concern I have is its relevance to NIPS: there is no theoretical contribution on the learning side, nor a new application of learning. It is a minor improvement on the implementation of a vision problem, as shown in its references. Thus, I do not think this paper is appropriate for acceptance at NIPS.

My main concern is about its relevance to NIPS.

3-Expert (read the paper in detail, know the area, quite certain of my opinion)