NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:1963
Title:Adaptive GNN for Image Analysis and Editing

Reviewer 1

* summary: This paper connects graph neural networks (GNNs) to several computer vision tasks. The authors introduce an adaptive GNN formulated as a label propagation system, which can be related to two CV operations: filtering and propagation diffusion. The adaptive GNN is built from a guided map, a graph Laplacian, and node weights. The guided map and node weights are associated with the filtering and propagation diffusion tasks in computer vision, and the kernel of the graph Laplacian is related to the diffusion pattern. They apply the model to quotient image analysis (QIA) and design various illumination editing tasks for faces and scenes.
* strengths:
- The main idea of relating GNNs to CV tasks is really interesting.
- I like the way they narrate their work. After introducing their framework, they discuss how this model generalizes several models introduced for propagation diffusion and filtering.
* Notes:
- The paper lacks visualizations and diagrams. A few pictures would make the main idea more understandable and easier to follow.
- It would be very informative if they could explain the intuition for why and how “the guided map and node weight determines whether a GNN leads to filtering or propagation diffusion, and the kernel of graph Laplacian controls diffusion pattern.”
- My main concern about the paper is the lack of sufficient experiments to show the efficacy of the proposed model. They performed qualitative experiments for the tasks of “face relighting,” “Illumination-Aware Face Swapping,” “Transfiguring,” and “low-light image enhancement,” and showed results on only a handful of images. More importantly, there were no quantitative experiments in the paper. One option is a post-hoc study in which crowdsourced workers rank the enhanced images, so that the various methods can be compared.
- In the experiments, I would explain in a few sentences how their images are semi-supervised.
- It would be very helpful if they could discuss the computational cost of their framework.
- There is a minor grammatical mistake in the sentence “199 in in background, eyes and eyebrows, while preserve the information in facial region. The setting of”
- There is an unresolved figure reference, shown as a question mark, that needs to be corrected (I think it should be Fig. 2). It is on page 7, last paragraph, second line.
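To make the intuition requested in the notes above concrete, here is a minimal sketch. It assumes a generic Laplacian-regularized objective, $\sum_i w_i (x_i - g_i)^2 + \lambda x^\top L x$, whose minimizer satisfies $(W + \lambda L)x = Wg$. This is a standard formulation chosen for illustration and is not taken from the paper, so it may differ from the authors' model. The names `diffuse`, `guide`, `node_weight`, and `lam` are hypothetical. The graph is a toy 5-node chain with unit edge weights.

```python
def diffuse(guide, node_weight, edges, lam=1.0, iters=2000):
    """Gauss-Seidel solve of (W + lam * L) x = W g on an undirected graph.

    guide       -- list g of per-node guide values (assumed name)
    node_weight -- list w of non-negative per-node data weights (assumed name)
    edges       -- list of (i, j, a) tuples with edge weight a > 0
    """
    n = len(guide)
    neighbors = [[] for _ in range(n)]
    for i, j, a in edges:
        neighbors[i].append((j, a))
        neighbors[j].append((i, a))

    x = list(guide)  # start from the guide values
    for _ in range(iters):
        for i in range(n):
            degree = sum(a for _, a in neighbors[i])
            denom = node_weight[i] + lam * degree
            if denom == 0:
                continue  # isolated node with no data term: leave unchanged
            pulled = sum(a * x[j] for j, a in neighbors[i])
            x[i] = (node_weight[i] * guide[i] + lam * pulled) / denom
    return x


# Toy chain graph 0 - 1 - 2 - 3 - 4 with unit edge weights.
edges = [(i, i + 1, 1.0) for i in range(4)]

# Filtering: every node has a data term, so the guide is smoothed.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
filtered = diffuse(noisy, [1.0] * 5, edges)

# Propagation: only the two end nodes carry labels (0 and 1);
# interior nodes have zero weight and are filled in by diffusion.
seeds = [0.0, 0.0, 0.0, 0.0, 1.0]
propagated = diffuse(seeds, [1.0, 0.0, 0.0, 0.0, 1.0], edges)
```

Under these assumptions, the same solver behaves as a smoothing filter when all node weights are positive, and as label propagation when only a few seed nodes are weighted. In the second case the interior values form a linear ramp between the two seeds. Changing the edge weights, and hence $L$, changes how the values spread.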

Reviewer 2

The paper shows that the GNN model of Scarselli et al. (2009) can be adjusted (with many parameters) to relate to other filtering and diffusion methods that are based on a graph Laplacian $L$. However, no significant, concrete, novel insight is derived from these connections. At the end of Section 2, only obvious statements such as "Based on the mathematical analysis, the diffusion pattern is controlled by the kernel of $L$" are made. Important questions like "how to quantify the effectiveness of a kernel of $L$, and what is the best kernel of $L$ for a given task?" are not addressed. In fact, the kernel for QIA in Eq. (12) was chosen rather heuristically, without any further analysis. Therefore, the paper does not really advance our understanding of GNNs. The QIA-GNN framework in Section 3 was also constructed quite heuristically and is too complicated, with three layers L1/L2/L3. There is no solid analysis or justification for these layers. The QIA task is subjective, with no clear objective metric for comparison. Therefore, it is difficult to assess the merit of the proposed GNN model.

***** POST REBUTTAL
I have read the author response, which provides some positive comparisons, both qualitative and quantitative. However, these experiments are small and could be cherry-picked. I slightly increase my score from 4 to 5.

Reviewer 3

The paper addresses quotient-image-based image editing with a single GNN framework. After describing the formulation, the experiments apply the system to several image editing applications, including face relighting, swapping, transfiguring, and image enhancement. However, the paper is missing many experimental details. The most critical missing components of the experiments:
(1) Are there any learning-based components in the system? If not, how much handcrafting is needed for different tasks? If yes, how are they trained?
(2) What are the roles of each module (FQIA-GNN-L1, L2, L3, etc.) in the various experiments? It might be helpful to show the illumination map along with the results.
(3) What are the outputs of each module in each experiment? How are they obtained by these modules in different tasks?
Some other questions/comments:
(4) How does this method derive the two relighting results in Figure 2? What are the experimental settings that generate these two results?
(5) There are weaknesses in the experiments. For example, in relighting, it would be beneficial to show relighting of multiple subjects with a single reference to demonstrate consistency.
(6) On image enhancement, it would be better to show a quantitative evaluation, similar to the baseline [28].

In conclusion, although the paper has an interesting and ambitious goal, a unified GNN framework for image editing, the delivery is not convincing. The experiments are weak and many details are missing.

POST REBUTTAL comments: The authors seem to have tried to address most of my comments in the rebuttal regarding the level of detail in the experiments and additional quantitative evaluations. The authors also included a small user study. I would increase the score from 4 to 6 after the rebuttal, and suggest that the authors include all additional experiments and visualizations in the final manuscript.

Reviewer 4

Originality: The paper proposes a novel and principled way of using and constructing GNNs for a wide variety of tasks. The authors claim their technique generalizes to multiple tasks.

Clarity: The paper is fairly clear, well referenced, and well written. Some minor typos are noted below. Some details regarding the preprocessing of faces could also be clarified (see below).

Significance: The paper proposes a principled way of constructing GNNs that appear to generalize to several useful computer vision applications. Additional experimental results and/or a small-scale user study would be helpful to assess how the GNNs' outputs compare to other techniques and whether they generalize well. The authors should demonstrate their GNN is robust to both The main contribution here is providing some analysis of how these GNNs can be constructed in a principled way for this task. This way of parameterizing GNNs has the potential to be useful as part of future computer vision pipelines.

Typos:
- Line 78?: pre-deep-learning era
- Line 246?: generates
- Line 258: broken figure reference
- Line 273: We focus
- Line 283: rang -> range
- Consider adding a citation for "Retinex theory" (line 275).