NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 3776
Title: High-Quality Self-Supervised Deep Image Denoising

Reviewer 1

Originality: Good
Quality: Technically sound
Clarity: Could use improvement
Significance: Moderately significant

Weaknesses:
- Trains with a large number of images (50k); this seems to preclude the method's use in fields such as biomedical imaging, where you would like to train without ground-truth data.
- Only tests on synthetic data.
- Missing related work [A, B, C] that uses unbiased risk estimators to train denoisers without ground-truth data. These methods already "reach similar quality as comparable models trained using reference clean data".

[A] Soltanayev, Shakarim, and Se Young Chun. "Training deep learning based denoisers without ground truth data." Advances in Neural Information Processing Systems (2018).
[B] Metzler, Christopher A., et al. "Unsupervised learning with Stein's unbiased risk estimator." arXiv preprint arXiv:1805.10531 (2018).
[C] Cha, Sungmin, and Taesup Moon. "Fully convolutional pixel adaptive image denoiser." arXiv preprint arXiv:1807.07569 (2018).

Minor: On first reading, it was unclear that the mean and covariance matrix are taken across the three color channels. This could be stated explicitly.
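To make the shapes in the minor point concrete: per pixel, the mean is a 3-vector and the covariance a 3x3 matrix over the RGB channels. The sketch below assumes, hypothetically, a Gaussian prior N(mu_x, Sigma_x) predicted at each pixel and additive white Gaussian noise with a known standard deviation, and computes the standard conjugate-Gaussian posterior mean E[x | y] = mu_x + Sigma_x (Sigma_x + sigma^2 I)^-1 (y - mu_x). The function name `posterior_mean` and its argument layout are illustrative only, not the paper's code.

```python
import numpy as np


def posterior_mean(y, mu_x, sigma_x, noise_std):
    """Per-pixel posterior mean E[x | y] for an RGB image (illustrative sketch).

    y, mu_x  : (H, W, 3) noisy observation and prior mean
    sigma_x  : (H, W, 3, 3) prior covariance across the three color channels
    noise_std: known std of additive white Gaussian noise (assumption)
    """
    noise_cov = noise_std ** 2 * np.eye(3)  # broadcasts over (H, W)
    # Gain Sigma_x (Sigma_x + sigma^2 I)^-1, one 3x3 matrix per pixel.
    gain = sigma_x @ np.linalg.inv(sigma_x + noise_cov)
    # Move the prior mean toward the observation by the per-pixel gain.
    return mu_x + np.einsum("hwij,hwj->hwi", gain, y - mu_x)


rng = np.random.default_rng(0)
H, W = 4, 5
y = rng.normal(size=(H, W, 3))
mu_x = rng.normal(size=(H, W, 3))
A = rng.normal(size=(H, W, 3, 3))
sigma_x = A @ np.swapaxes(A, -1, -2) + 0.1 * np.eye(3)  # symmetric positive definite
print(posterior_mean(y, mu_x, sigma_x, noise_std=0.5).shape)  # (4, 5, 3)
```

With a very small `noise_std` the result approaches `y`; with a very large one it falls back to `mu_x`. The gain form is algebraically equivalent to (Sigma_x^-1 + sigma^-2 I)^-1 (Sigma_x^-1 mu_x + sigma^-2 y) but avoids inverting Sigma_x directly.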

Reviewer 2

1. In the comparison, which method is used as the baseline? Which one is N2C?
2. Why not compare with the Noise2Void method?

Reviewer 3

Pros:
- The Bayesian analysis with different noise models is interesting. The ablation study is carefully done and confirms the importance of integrating the central pixel at test time. This is an important result that may be used in future work. I also find it interesting that performance does not degrade much when the noise level is unknown.
- The experimental results show that the method performs almost as well as the authors' Noise2Clean and Noise2Noise baselines on three datasets with different types of noise. This suggests the potential of image denoising using only single instances of corrupted images as training data.
- The new convolutional architecture with receptive fields restricted to a half-plane is also a nice contribution. The four rotated branches with kernels shared between branches, followed by 1x1 convolutions, make sense.

Limitations:
- The authors compare their model only to their own baselines (Noise2Clean, Noise2Noise) and to BM3D, which is training-free, whereas their method and baselines are trained on a subset of the ImageNet validation set. I think it is important to compare their results to existing state-of-the-art N2C methods.
- In addition, previous N2C models tend to be trained on much smaller datasets. Could the authors comment on that? Does their method perform well only when trained on a very large dataset? Would it be possible to compare their method and baselines to existing N2C methods on smaller datasets?
- Finally, it is not clear to me why the comparison to the masking-based strategy is deferred to the supplement, given that the new architecture is a major contribution of the paper. The authors claim that their model is 10-100x faster to train than Noise2Void, yet there are no quantitative experiments in the main text demonstrating that their architecture is superior to Noise2Void.
Looking at Section B of the supplement, it seems that even after convergence the masking strategy gives lower performance on the Kodak dataset (with similar posterior mean estimation). Does their method therefore provide better results than the masking strategy? Could the authors explain why this is the case?
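The blind-spot idea behind the architecture described in Reviewer 3's third pro (half-plane receptive fields, four rotated branches with shared kernels, a 1x1 combination) can be illustrated with a minimal NumPy sketch. Simplifying assumptions that are not taken from the paper: a single-channel image, one linear layer per branch (`half_plane_conv`, hypothetical) whose window lies strictly above each output pixel, and a per-branch weighted sum standing in for the learned 1x1 convolution. Rotating the input, applying the same kernel, and rotating back four times gives a combined receptive field that covers the neighborhood except the center pixel, so the output at a pixel never depends on that pixel's own noisy value.

```python
import numpy as np


def half_plane_conv(x, kernel):
    """Linear filter whose window lies strictly above each output pixel (hypothetical).

    kernel has shape (r, 2r + 1): row dy - 1 weights the input row dy pixels
    above the output pixel; columns cover horizontal offsets -r..r. Zero padding.
    """
    r = kernel.shape[0]
    assert kernel.shape == (r, 2 * r + 1)
    h, w = x.shape
    padded = np.pad(x, ((r, 0), (r, r)))  # pad top, left and right only
    out = np.zeros((h, w))
    for dy in range(1, r + 1):
        for dx in range(-r, r + 1):
            # padded[i + r - dy, j + r + dx] == x[i - dy, j + dx]
            out += kernel[dy - 1, dx + r] * padded[r - dy : r - dy + h, r + dx : r + dx + w]
    return out


def blind_spot_filter(x, kernel, mix):
    """Four rotated branches sharing one kernel, combined with per-branch weights."""
    branches = []
    for k in range(4):
        rotated = np.rot90(x, k)              # point the half-plane in a new direction
        y = half_plane_conv(rotated, kernel)  # same (shared) kernel in every branch
        branches.append(np.rot90(y, -k))      # undo the rotation
    # Stand-in for a 1x1 convolution over the stacked branch outputs.
    return sum(wt * b for wt, b in zip(mix, branches))


rng = np.random.default_rng(0)
x = rng.normal(size=(9, 9))
kernel = rng.uniform(0.1, 1.0, size=(2, 5))
mix = [1.0, 1.0, 1.0, 1.0]
base = blind_spot_filter(x, kernel, mix)

x_center = x.copy()
x_center[4, 4] += 10.0  # perturb the center pixel itself
print(np.isclose(blind_spot_filter(x_center, kernel, mix)[4, 4], base[4, 4]))  # True

x_above = x.copy()
x_above[3, 4] += 10.0  # perturb the pixel directly above it
print(np.isclose(blind_spot_filter(x_above, kernel, mix)[4, 4], base[4, 4]))  # False
```

In this toy version the blind spot is exact because no branch's window ever contains its own output position; a real network would stack many such layers with nonlinearities while preserving the same property.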