Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors define generalized sliced Wasserstein distances by appealing to the generalized Radon transform. They provide some elementary properties (e.g., conditions under which it is a distance), argue that these distances are preferable to the vanilla sliced Wasserstein distance, and conclude with some empirical validation. Clarity: the paper is clear and well written. Originality: the material presented here is original and has not appeared elsewhere. Quality and significance: here are my concerns, which altogether prevent me from recommending publication. I recognize the hard work that has been put into this paper, and I think it is a nice idea, but I believe the results should be strengthened. My main concern is that it is not made clear why the generalized sliced distance is a significant improvement over the vanilla sliced distance. It is hard to tell from the current experiments: Figure 2 compares several generalized sliced distances with the linear one. In many cases the results are better for the linear distance, making me wonder whether the improvement seen in some cases might be an artifact of multiple comparisons. Because of this, I am not fully convinced that the improvement in Figure 3 is not a consequence of having chosen the best model. The authors should therefore make a strong point about, e.g., how to tune the generalized distance so that it consistently outperforms the linear one (without overfitting). A discussion of complexity is also encouraged. Currently, the first 5 pages of the paper are mostly definitions and a review of other results, but I would expect more substance there for a NeurIPS submission. If that section were shrunk a bit and replaced with more experimental validation (some of this is available in the supplement, but still needs some polishing), it would be a substantial improvement.
The paper establishes a connection between the sliced-Wasserstein distance and the Radon transform. From this viewpoint, a family of generalized SW distances is defined from the generalized Radon transform. This enables a number of contributions, in particular non-linear slicing and the use of only a single projection. Empirical evaluations show the advantages of the proposed distance family over the classical sliced-Wasserstein distance, such as far fewer iterations and better data generation quality. The problem is important. The analysis is new and convincing. The empirical evaluations also nicely demonstrate the proposed generalized distance scheme.
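To make the slicing idea summarized above concrete, here is a minimal sketch, not the authors' implementation. It estimates a sliced distance between two equal-size point clouds by Monte Carlo over random directions, and the slice is a pluggable function. The names `sliced_distance`, `linear_slice`, and `circular_slice`, the radius `r`, and the choice of a circular slice as the non-linear example are all assumptions made for illustration.

```python
import numpy as np


def linear_slice(points, theta):
    # Classical slicing: project each point onto the direction theta.
    return points @ theta


def circular_slice(points, theta, r=2.0):
    # Illustrative non-linear slice (an assumption, chosen for simplicity):
    # distance of each point to the centre r * theta.
    return np.linalg.norm(points - r * theta, axis=1)


def sliced_distance(x, y, slice_fn=linear_slice, n_projections=50, p=2, seed=0):
    """Monte Carlo estimate of a sliced p-Wasserstein distance.

    Assumes x and y are arrays of shape (n, d) with the same n, so the 1D
    optimal transport between slices reduces to matching sorted values.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        # Draw a direction uniformly on the unit sphere.
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        # Slice both samples and sort to get the 1D transport matching.
        sx = np.sort(slice_fn(x, theta))
        sy = np.sort(slice_fn(y, theta))
        total += np.mean(np.abs(sx - sy) ** p)
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 2))
    y = x + 3.0  # same cloud, shifted
    print("linear slices:  ", sliced_distance(x, y, linear_slice))
    print("circular slices:", sliced_distance(x, y, circular_slice))
```

Passing `linear_slice` gives the classical sliced-Wasserstein estimate; swapping in another slice function is the only change needed to try a non-linear variant, which is also where the tuning question raised by the first review would arise.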
***** After Author Response and Reviewer Discussions ***** I have gone through all the other reviews, the meta-reviewer's comment, and the authors' feedback. I will keep my evaluation unchanged, and I emphasize a point raised by the meta-reviewer, namely that the authors should not claim that ReLU or leaky ReLU satisfies the smoothness conditions. **************************************************************** Originality: The methodology provided in this paper is, to the best of my knowledge, new. It is an innovative combination of well-known techniques, namely the SW distance and the Radon transform. The paper is clearly different from previous contributions, and the related literature is sufficiently cited. Quality: The submission is technically sound. The claims are well supported by the analysis, although due to time constraints I have not gone through the proofs. This work is complete. The authors are honest in evaluating both the strengths and weaknesses of their work. Clarity: The submission is clearly and elegantly organized and written in good style. Since the proofs are provided in the supplementary materials, expert readers have what they need to validate the claims. Significance: The results are important. I believe that the method developed here will attract a lot of interest and push the frontier of the research forward. For suitable problems, the methodology would advance the state of the art in a demonstrable way. This paper provides a unique theoretical approach.