Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
Overall, the paper is very interesting and proposes a novel method. However, questions remain about how well the approach generalizes beyond the simple datasets and tasks tested in the paper.
Summary: The paper looks at the problem of modelling sequential data, specifically image sequences. It proposes to combine a (beta-)VAE with a Neural ODE: the VAE encodes the input image to a position and velocity, the Neural ODE computes the dynamics over time, and the VAE then decodes using the position component. To model the velocity, the authors extend the Neural ODE to be second order. The paper contains an extensive introduction to the method's components, including ODEs, variational inference, the beta-VAE, generative models, and ODE flows. The model used for the dynamics is a Bayesian neural network (BNN), i.e. a neural network with a distribution over every weight, so its output is also a distribution. The authors show impressive results on the CMU walking dataset, bouncing balls, and rotating MNIST, comparing against a variety of different methods.

Discussion: The paper is well written and introduces an interesting application of the Neural ODE. The approach of embedding the Neural ODE within other models is particularly appealing and seems like a strong way of encoding inductive bias. Sections 2 and 3 are an excellent introduction to the many components used in the model, and Figures 1 and 2 are both great illustrations. The experiments are sensible and described in detail, including the particular ways alternative methods were evaluated.

The usage and efficacy of the Bayesian neural network are not well explained. It is unclear whether using a BNN over a regular NN gave any advantage, and it is also unclear how the BNN was optimized: the authors mention variational inference but give no further detail (reparametrization trick? mean field?). Did the authors find any difference in the number of function evaluations (NFEs) for solving the ODE between a BNN and a regular NN?

The paper only briefly explains ODE2VAE-KL, and the difference between (16) and (17) could use more explanation. Do you run the encoder for every triple of input frames and compare that with the output of the ODE at those times, i.e. how should I interpret the third term of (17)? Given that ODE2VAE-KL is the best-performing model, it would help to explain why this setup is beneficial, and perhaps to analyze the difference with the regular ODE2VAE.

The paper compares with fairly dated methods (aside from the newer ODE-based methods). Could the authors comment on performance compared to a newer method such as "Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects" (Kosiorek et al.), especially on the more challenging Moving MNIST dataset?

-- Rebuttal --

The rebuttal clarified some important points. I have raised my score to a 7.
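To make the BNN question above concrete: the paper does not spell out its inference scheme, but a common choice is mean-field variational inference with the reparametrization trick, where each weight has its own Gaussian posterior. The sketch below is illustrative only (not the authors' implementation); all names and shapes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bnn_weights(mu, rho):
    """Mean-field Gaussian posterior: one (mu, sigma) pair per weight.
    sigma = softplus(rho) keeps the std positive; eps ~ N(0, I) is the
    reparametrization noise, so gradients could flow through mu and rho."""
    sigma = np.log1p(np.exp(rho))          # softplus
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps                # w = mu + sigma * eps

def bnn_layer(x, mu, rho):
    """A single stochastic linear layer: a fresh weight sample per call,
    so repeated forward passes yield a distribution over outputs."""
    w = sample_bnn_weights(mu, rho)
    return np.tanh(x @ w)

# Toy parameters: 3 inputs, 2 outputs, small initial posterior std.
mu = np.zeros((3, 2))
rho = np.full((3, 2), -3.0)
x = np.ones((1, 3))

# 100 forward passes give 100 weight samples, hence varying outputs.
outs = np.stack([bnn_layer(x, mu, rho) for _ in range(100)])
print(outs.std())
```

If this is roughly the setup used, one would expect each ODE solve to fix a single weight sample (or average over several), which is where a difference in NFEs between a BNN and a deterministic NN could show up.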
1. It is original that the paper proposes a second-order ODE, which allows the latent ODE state to be decomposed into position and momentum.
2. It is original to connect second-order ODEs and Bayesian neural networks with VAE models.
3. The paper is well written and organized.
4. The proposed method can be applied to high-dimensional sequential data and learns embeddings of high-dimensional trajectories.
5. The proposed methods are evaluated on a diverse set of datasets with state-of-the-art performance.
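The position/momentum decomposition in point 1 follows the standard reduction of a second-order ODE to a first-order system: the state is augmented to (position, velocity), with ds/dt = v and dv/dt = f(s, v). A minimal sketch with a forward-Euler integrator and a toy acceleration (a harmonic oscillator, not the paper's learned BNN dynamics):

```python
import numpy as np

def second_order_ode_step(s, v, f, dt):
    """One Euler step of the second-order system in first-order form:
    ds/dt = v (velocity), dv/dt = f(s, v) (acceleration)."""
    return s + dt * v, v + dt * f(s, v)

def integrate(s0, v0, f, dt, n_steps):
    """Roll the augmented state (position, velocity) forward in time,
    recording the position trajectory."""
    s, v = s0, v0
    traj = [s0]
    for _ in range(n_steps):
        s, v = second_order_ode_step(s, v, f, dt)
        traj.append(s)
    return np.array(traj)

# Toy dynamics f(s, v) = -s; in ODE2VAE this f would be the BNN.
traj = integrate(np.array([1.0]), np.array([0.0]),
                 lambda s, v: -s, dt=0.01, n_steps=628)
# After roughly one period (2*pi), the position returns near its start.
print(traj[0], traj[-1])
```

In the paper's model, the decoder reads only the position component of this augmented state, while the velocity component carries the momentum information that a first-order latent ODE would have to absorb into its state.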