NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 332
Title: ETNet: Error Transition Network for Arbitrary Style Transfer

Reviewer 1

-- I am impressed with the improvement in visual quality over previous approaches. The visual examples contain rich detail and better preserve high-level content structure.
-- The idea is intuitive, and I am not aware of similar prior approaches.

Reviewer 2

(1) Originality: This paper is the first to introduce an error-correction and diffusion mechanism into the style transfer literature, which separates it from existing works. That said, style transfer via iterative refinement itself is not a novel idea, as it has been applied by the WCT method [17].

(2) Quality: The paper provides both qualitative and quantitative experiments to show the superiority of the proposed style transfer method. However, there are several concerns:

a. My main concern is the meaning of equation (3). Equation (2) combines the style and content errors via a learnable weight W to obtain a full error feature. In equation (3), the affinity is calculated between this full error feature and the stylized image features. As far as I can see, these two features have different meanings and are not comparable to each other. What is the motivation for calculating the affinity between them? And how would a further multiplication between this affinity and the full error feature diffuse the error to the whole image?

b. In equation (4), why not simply concatenate the error feature of layer i with the error feature of layer i-1? Why use a fusion layer with learnable weights?

c. On performance: first, I do not consider 0.5680 seconds per image (Table 2) to be real-time (as stated in the introduction). Second, this work has to train each error transition network separately, which increases the training burden compared to methods such as AdaIN or WCT. That said, the proposed algorithm does achieve a lower style loss, and the results look better than those of other state-of-the-art methods.

(3) Clarity: In general, the paper is easy to follow but contains some spelling mistakes, e.g., Line 123: "inputted" -> "input"; Lines 109, 124: "outputted" -> "output". There is also some confusion in the equations: the '*' symbol in equation (3) appears to be an element-wise multiplication, which differs from the matrix multiplication in equation (2); the authors should make this clear.

(4) Significance: This paper aims to improve existing style transfer methods via iterative error correction. The framework is novel compared to other style transfer methods.

Post rebuttal: The rebuttal does not clearly answer my question. However, after revisiting the details of the paper and drawing on the comments of the other reviewers, I can see the insight: loosely speaking, the method is a feed-forward way of optimizing the transferred image, with an attention mechanism involved. I have therefore raised my rating.
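For concreteness, this is how I read equations (2)-(3): a non-local, attention-style diffusion step. The following is my own numpy sketch, not the authors' implementation; the shapes (C channels, N = H*W flattened positions) and variable names are made up for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: C channels, N = H*W spatial positions.
C, N = 4, 6
rng = np.random.default_rng(0)
E = rng.standard_normal((C, N))  # full error feature from eq. (2), flattened
F = rng.standard_normal((C, N))  # stylized image feature, flattened

# Affinity between every pair of spatial positions (my reading of eq. (3)):
# each row is a distribution over positions of the error feature.
A = softmax(F.T @ E, axis=-1)    # shape (N, N)

# Multiplying the affinity back into the error spreads each local error
# over all positions, weighted by feature similarity.
E_diffused = E @ A.T             # shape (C, N)
```

Under this reading, the affinity acts like a cross-attention map between "where the image is" (F) and "what needs fixing" (E), which would explain how a local error is propagated globally; whether this matches the authors' intent is exactly my question.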

Reviewer 3

The idea of self error correction is smart. It requires the computation of style/content features at multiple resolutions. It would be nice to discuss the relation to iterative back-propagation.
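The iterative baseline in question is Gatys-style optimization, which refines the stylized image by back-propagating content and style losses. A toy numpy sketch of that loop, with a small random linear map standing in for the VGG encoder; all shapes, names, and constants here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
C0, C, N = 3, 8, 16
W = rng.standard_normal((C, C0)) * 0.1   # fixed toy "encoder" (stand-in for VGG)

content = rng.standard_normal((C0, N))   # content image, flattened
style = rng.standard_normal((C0, N))     # style image, flattened

def feats(x):
    return W @ x                         # (C, N) feature map

def gram(F):
    return F @ F.T / N                   # (C, C) Gram matrix for style

Fc, Gs = feats(content), gram(feats(style))

def loss_and_grad(x, alpha=1.0, beta=1.0):
    """Content + style loss and its analytic gradient w.r.t. the image x."""
    F = feats(x)
    dG = gram(F) - Gs
    loss = alpha * np.sum((F - Fc) ** 2) + beta * np.sum(dG ** 2)
    # d/dF of the content term is 2(F - Fc); of the style term, (4/N) dG F.
    dF = alpha * 2.0 * (F - Fc) + beta * (4.0 / N) * dG @ F
    return loss, W.T @ dF                # chain rule through F = W x

# Iterative refinement by gradient descent, starting from the content image.
x = content.copy()
losses = []
for _ in range(200):
    l, g = loss_and_grad(x)
    x -= 0.05 * g
    losses.append(l)
```

The contrast I would like discussed: this loop recomputes gradients per image at test time, whereas the proposed error transition networks amortize the correction into a fixed number of feed-forward refinement passes.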