NeurIPS 2020

An Unsupervised Information-Theoretic Perceptual Quality Metric


Review 1

Summary and Contributions: The authors propose an advanced perceptual quality metric, which is learned from adjacent video frames in an unsupervised manner. This learning scheme is well motivated by observations of the human visual system. Experiments on the BAPPS and ImageNet-C datasets demonstrate the effectiveness of the method.

Strengths:
- The proposed method is well motivated and backed up by solid theory.
- Experiments are comprehensive and the overall performance is promising.
- The paper is well organized and easy to follow.

Weaknesses: A good quality metric should stand the test of time. Yet it seems that the authors have not prepared to make this project publicly available. Therefore, I encourage the authors to provide the code or executable files to ensure reproducibility.

Correctness: Technically sound, but not carefully checked.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: No

Additional Feedback:


Review 2

Summary and Contributions: An unsupervised, information-theoretic image quality metric using deep learning is proposed. Some experiments were conducted and showed competitive performance.

Strengths: NOT CLEAR. Basically, I could not find the strengths of the work. Nothing could be gleaned from reading the abstract several times. How this work is inspired by the physiology of the human visual system is not clear in the abstract. I do not think that using words such as "our model is informed by the physiology of the human visual system" in the abstract means this work makes new contributions. The key is how the human visual system is modeled in your work. Unfortunately, the paper does not describe this in detail. There are too many DL-based IQA methods. What is your new contribution?

Weaknesses: This paper is not easy to follow. It is difficult to judge the novelty. The authors claim that their model is inspired by visual physiology, such as efficient coding and slowness. However, these principles are not introduced in any detail. Moreover, I did not see where the two biological mechanisms are imitated in the model. I only see a DL-based IQA method without clear contributions. Finally, there are no scientific contributions in this manuscript comparable to "perceptual similarity is an emergent property shared across deep visual representations", as discovered in [35]. [35] Richard Zhang et al. “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric” (2018). arXiv:1801.03924. Code and data available at https://www.github.com/richzhang/PerceptualSimilarity. URL: http://arxiv.org/abs/1801.03924. After response: Although some confusion may have been clarified, the authors did not adequately address the key comment: they claim that their model is inspired by visual physiology, such as efficient coding and slowness, yet these principles are not introduced in any detail. Moreover, it is still not clear where in the model the two biological mechanisms are imitated. Furthermore, the paper lacks scientific contributions, considering that it is only built on previous work.

Correctness: No

Clarity: This paper is very difficult to follow.

Relation to Prior Work: Maybe

Reproducibility: No

Additional Feedback: 1. How this work is inspired by the physiology of the human visual system is not clear in the paper. I do not think that using words such as "our model is informed by the physiology of the human visual system" in the abstract means this work makes new contributions. The key is how the human visual system is modeled in detail in your work in order to improve on current IQA. Unfortunately, this paper does not do so. 2. Refer to [35] for ways to improve, such as strengthening the scientific contributions. [35] Richard Zhang et al. “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric” (2018). arXiv:1801.03924. Code and data available at https://www.github.com/richzhang/PerceptualSimilarity. URL: http://arxiv.org/abs/1801.03924.


Review 3

Summary and Contributions: This paper proposes a new unsupervised, information-based perceptual quality metric, PIM. The method is based on optimizing a lower bound on the multivariate mutual information. The proposal has roots in two prominent ideas in neuroscience, efficient coding and slowness. The authors implemented the proposal in deep neural networks and tested it on BAPPS and ImageNet-C. They report performance competitive with supervised methods.

Strengths: I very much enjoyed reading this manuscript. The method is, to my knowledge, novel and principled. It is well founded in information theory and broadly inspired by a couple of core principles of neural information processing. The method is purely unsupervised and does not need human psychophysical judgements to train the model. Yet the method shows competitive performance with the fully supervised approach in ref [35]. I found this to be quite remarkable and potentially quite significant. The problem studied in the paper is also highly relevant to the NeurIPS community. ** added after rebuttal: After reading through the other reviewers' comments and the authors' feedback, I remain very positive about this paper. I think the idea in the paper is novel, the experiments are reasonable, and the results are very promising.

Weaknesses: My main concern is that the results are a bit preliminary (lacking a more comprehensive comparison to some of the previous methods, such as ref [29], as the authors acknowledged), although the results certainly look very promising. It is also not entirely clear whether the improvement in performance mainly comes from a generic advantage of the objective function, from the choice of hyper-parameters, or from the model architecture.

Correctness: The claims and method are sound.

Clarity: The paper is well written. The presentation in Section 3.4 could perhaps be improved. Right now, it is a bit difficult to get the key message.

Relation to Prior Work: The relation to prior work is generally well discussed.

Reproducibility: Yes

Additional Feedback: Fig 2 and Fig 3 could benefit from somewhat more detailed figure legends.


Review 4

Summary and Contributions: The paper proposes PIM (Perceptual Information Metric), an image quality metric learned in an unsupervised manner by enforcing two loss functions: 1. compression and 2. consistency across time. The authors compare PIM to other proposed metrics on multiple datasets and show improvements. They also perform ablation studies to show how each of the choices made in the paper yields various improvements.

Strengths: Competitive results on multiple benchmarks, ablation studies, and additional qualitative experiments on ImageNet-C.

Weaknesses: The paper is built on previous work, and one of the main new directions proposed is the notion of consistency (the perceptual metric not changing between immediate or nearby frames in a video). While this intuition seems reasonable, I worry that artifacts such as motion blur, face blur, and aliasing could change the perceptual metric significantly. It would be great to hear from the authors whether they worked on uncompressed video and, if not, to hear a little more about the importance of using the consistency metric. Since this is one of the main novelties of the paper, I would like to make sure this part is well justified.

Correctness: Yes, I did not see anything wrong in the setup and the claims made by the authors.

Clarity: Yes, for the most part. The results section can be improved further, for example by contrasting cases where PIM performs better qualitatively and discussing why.

Relation to Prior Work: Yes. A recent reference that might be related is: From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. Z. Ying, H. Niu, P. Gupta, D. Mahajan, D. Ghadiyaram, A. Bovik.

Reproducibility: Yes

Additional Feedback: