Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper connects a well-known result about the limits of diffeomorphisms to the recent neural ODE model. The authors also present experiments showing that adding extra channels reduces the computational cost of these models. R1 makes the valid point that the theoretical result was shown in 1955, and that the engineering trick of making layers wider in resnets existed previously. However, I'd say that the main contribution of this paper is in connecting these ideas to neural ODEs, and in giving a possible explanation of why wider layers help in resnets. The paper also pushes forward our practical understanding of training neural ODEs. It's a clear story, and a well-written paper.

However, the paper and rebuttal avoided reporting absolute (probably poor) classification results, and it's unclear whether investigating neural ODEs for the classification problems considered here is a pressing direction. There are other uses for neural ODEs besides classification, but it's not clear that extra dimensions can be added in those settings without losing other good properties. For example, the paper and rebuttal mention normalizing flows, and one of the positive reviewers thought that would make sense as an application. However, it's not immediately clear how to apply augmentation to flows: one would no longer immediately get the probability of the data, which is the main attraction of flows. We encourage the authors to update the camera-ready to be upfront about any limitations in the classification performance of the architectures explored.
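For readers unfamiliar with the augmentation trick under review, a minimal sketch follows. It only illustrates the core idea (appending zero-valued extra channels to the state before solving the ODE, so the flow acts in a higher-dimensional space than the data); the dynamics function, weights, and solver here are hypothetical stand-ins, not the paper's architecture:

```python
import numpy as np

def augment(x, extra_dims=2):
    # Append zero-valued channels, lifting the state to a higher dimension.
    return np.concatenate([x, np.zeros((x.shape[0], extra_dims))], axis=1)

def odefunc(h, W):
    # Toy dynamics dh/dt = tanh(h @ W); a stand-in for a learned network.
    return np.tanh(h @ W)

def euler_integrate(h0, W, t1=1.0, steps=10):
    # Fixed-step Euler solver; real neural ODEs use adaptive solvers.
    h, dt = h0, t1 / steps
    for _ in range(steps):
        h = h + dt * odefunc(h, W)
    return h

rng = np.random.default_rng(0)
x = np.ones((4, 3))              # batch of 4 points in R^3
h0 = augment(x, extra_dims=2)    # lift to R^5; the flow need not be a
                                 # diffeomorphism of the original R^3
W = 0.1 * rng.standard_normal((5, 5))
h1 = euler_integrate(h0, W)
print(h1.shape)                  # (4, 5)
```

Note the tension the review raises for flows: because the map acts on the augmented space, inverting it and applying the change-of-variables formula no longer directly yields a density over the original data dimensions.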