__ Summary and Contributions__: The objective here is to propose a neural network for spectral TV decomposition of images. The solution relies on a neural network (using an architecture from image denoising) that learns the mapping from image space to TV decomposition.

__ Strengths__: +The proposed solution demonstrates the capacity of convolutional neural networks to learn a complex transformation such as spectral TV decomposition.
+By using a neural network, the time needed to compute the decomposition is reduced by several orders of magnitude.
+Experiments are well designed and investigate several interesting aspects: the capacity of the trained NN to learn the underlying properties of the spectral TV transform, and the effect of different neural network architectures.

__ Weaknesses__: After reading the rebuttal and the other reviews I am changing my rating to 6. It is clear that there is an interest in seeing this paper presented at the conference.
I hope the authors will integrate the discussion presented in the rebuttal.
Weaknesses:
-The main weakness is probably the relatively limited contribution: this is a fully supervised setup, using an existing neural network architecture. In addition to this, although the results are compelling, the impact and interest don’t seem sufficient for NeurIPS.
-I think spectral TV decomposition is not well known, and one way the authors could address (and could have addressed) the issue is to demonstrate some application scenarios where spectral TV decomposition is the best available tool or represents an important preprocessing step.
-Another way to address the issue would be to provide a more accessible introduction and background knowledge on spectral TV decomposition. In its current form, Section 2 sets out the basic equations but goes too far into unneeded technical details, especially given the proposed neural network solution. It would be much more effective to present the high-level ideas and the potential of TV decomposition to motivate the reader to use the proposed solution.

__ Correctness__: The experiments seem to support the claim. I didn’t notice any particular issue.

__ Clarity__: The paper is well written, but I think the authors could make the background section more interesting for readers not familiar with spectral TV decomposition.

__ Relation to Prior Work__: References seem ok

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper presents a method for non-linear spectral decomposition of images by approximating the whole pipeline of spectral Total Variation (TV) decomposition with a deep neural network. The proposed approximation is able to capture key properties of the model based decomposition while offering a significant speedup of up to four orders of magnitude.

__ Strengths__: The paper is well written and easy to read. A succinct description of the spectral TV decomposition is given in Section 2, and the proposed method is briefly but clearly described in Section 3. Important design choices, for example the choice of DnCNN as the reference architecture over other well-known architectures such as U-Net and FFDNet, are theoretically justified and also supported by the evaluation. The experimental evaluation is quite comprehensive (considering also the supplemental material) and shows that the proposed TVspecNET approximates the spectral TV decomposition well, while the properties of the model-based decomposition (one-homogeneity, rotation and translation invariance) are also captured well. The proposed method potentially has a direct impact on numerous image processing applications, including denoising, image fusion and texture separation.

__ Weaknesses__: The main weakness is that the method falls in the category of methods that seek to approximate some function (in this case the PDE of spectral TV decomposition) using a deep neural network. No specially designed architecture is introduced; however, the choice of the reference architecture is well motivated, as discussed above.
Another more practical issue is that by construction the network can only approximate a decomposition into a fixed number of bands. The choice of using dyadically combined bands helps in the evaluation of the decomposition properties, however it would be also interesting to see how close the decomposition by TVspecNET is with respect to the model based one when a larger number of bands is considered. Have the authors considered results for a finer graded decomposition? Is the approximation equally faithful?
Finally, for completeness it would be important to provide some additional details regarding DnCNN which is used as the reference architecture for TVspecNET.

__ Correctness__: The claims and the method seem correct. All relevant choices are well justified and the evaluation is quite comprehensive highlighting the contributions of the proposed method.

__ Clarity__: The paper is clear and easy to read. Both the proposed method and the related background are clearly presented.

__ Relation to Prior Work__: Prior work is sufficiently mentioned and discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: Reference to Table 3 should be added in the last paragraph of Section 4.3
L.215: we we
# Comments after the rebuttal:
The authors addressed my concerns in their response. I think that this is a valuable submission as it proposes a way to perform the spectral TV decomposition much more efficiently than the classical methods. In my opinion the paper should be accepted and, given that the authors will include the clarifications provided in their rebuttal, the paper will be stronger overall.

__ Summary and Contributions__: The total variation decomposition of an image can be used to identify and enhance (resp. suppress)
structures at a particular scale. This decomposition is expensive to compute using standard methods.
The paper describes a method to learn a network that produces a set of 5 bands (and implicitly a residual) from
an image. Network and learning are straightforward, but produce an accurate and useful decomposition
very much faster than standard methods.

__ Strengths__: This method produces an approximate (but very good) solution to a standard image representation problem very
much faster than the standard methods. The approximate decomposition has the properties one expects from
the original representation (one-homogeneity; translational and rotational invariance). Curiously, the loss
simply requires the network to reproduce examples of bands, rather than (say) imposing some consistency constraint
between bands. There is good evidence that the approximate decomposition generalizes, in the sense that
the bands produced for out-of-training images "make sense" as spectral decompositions of the original image. There is overwhelming evidence that the
approximate decomposition is very fast.
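
The translational and rotational behaviour mentioned above can be probed numerically for any decomposition routine. The following is only a minimal sketch under stated assumptions: `toy_decompose` is a hypothetical stand-in (differences of periodic box blurs) and is neither the spectral TV transform nor TVspecNET, and the helper names `translation_error` and `rotation_error` are illustrative. It interprets "invariance" as: decomposing a shifted or 90-degree-rotated image should give the shifted or rotated bands.

```python
import numpy as np


def box_blur(image, radius):
    """Periodic (circular) box blur with a square window; toy helper."""
    acc = np.zeros(image.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            acc += np.roll(image, (dy, dx), axis=(0, 1))
    return acc / (2 * radius + 1) ** 2


def toy_decompose(image, radii=(1, 2, 4)):
    """Hypothetical stand-in decomposition, NOT spectral TV.

    Returns an array of shape (len(radii) + 1, H, W): one band per
    radius plus a residual, so that the bands sum back to the image.
    """
    previous = image.astype(float)
    bands = []
    for radius in radii:
        blurred = box_blur(image, radius)
        bands.append(previous - blurred)
        previous = blurred
    bands.append(previous)  # residual (coarsest scale)
    return np.stack(bands)


def translation_error(decompose, image, shift):
    """Max deviation between 'shift then decompose' and 'decompose then shift'."""
    a = decompose(np.roll(image, shift, axis=(0, 1)))
    b = np.roll(decompose(image), shift, axis=(1, 2))
    return float(np.max(np.abs(a - b)))


def rotation_error(decompose, image):
    """Same comparison for a 90-degree rotation (square images only)."""
    a = decompose(np.rot90(image))
    b = np.rot90(decompose(image), k=1, axes=(1, 2))
    return float(np.max(np.abs(a - b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    print("translation error:", translation_error(toy_decompose, img, (5, -3)))
    print("rotation error:", rotation_error(toy_decompose, img))
```

To test a trained network instead, one would pass a callable that maps an image to its stacked bands in place of `toy_decompose`. Periodic shifts are used here only to keep the toy example exact; a real evaluation would need to handle image boundaries.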

__ Weaknesses__: The main concern is that, from the point of view of learning, the paper is relatively straightforward. I
discount this concern, because the paper produces a representation known to be of interest very much faster than standard methods.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: I don't agree with the authors (ll274-5) that the main relevance of this paper is to neural network solutions of PDEs. To my mind, the key strength is producing a known useful representation accurately and very much faster than existing methods.
---- added post discussion process -----
I stand by the review above, after discussion with the other referees.

__ Summary and Contributions__: This paper proposes a neural network called TVspecNET that can reproduce the spectral TV decomposition of images while significantly (by more than three orders of magnitude) reducing the computation time to obtain the decomposition once the network is trained.

__ Strengths__: The theoretical explanation and experimental results support the soundness of the claims.
The authors are the first to propose a deep learning approach to approximate the non-linear spectral decomposition of images, and it significantly speeds up the computation compared with the classical, model-driven approach that is based on solving a gradient flow.
The method proposed is relevant to the NeurIPS community.

__ Weaknesses__: The description of the proposed method (Section 3) is somewhat insufficient, which may make it difficult for readers to understand the central idea of this paper.

__ Correctness__: The experiments are correct and support the claims and the method.

__ Clarity__: Proofreading is still needed; for example, a doubled "we" appears in line 215.

__ Relation to Prior Work__: In prior work, the decomposition of an image into its TV-spectral bands gives qualitatively highly desirable results, but its computational realisation is cumbersome, as it amounts to the solution of a series of non-smooth optimisation problems. This paper proposes a neural network approach for obtaining a spectral image decomposition.

__ Reproducibility__: Yes

__ Additional Feedback__: