Title: Neural Multisensory Scene Inference

An interesting extension of GQN that models multimodal representations of 3D scenes, using an amortized product of experts (PoE) so that training can proceed when some of the modalities are missing. The paper is not clear enough about prior work on PoEs for multimodal generative modelling, but the authors have promised to correct this. The authors are urged to include the PoE baseline results requested by the reviewers when finalizing the paper.
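
For readers less familiar with why a product of experts tolerates missing modalities, the following is a minimal sketch of a plain (non-amortized) Gaussian PoE. It is not the paper's method. The function name `product_of_experts`, the modality names `vision` and `haptic`, the standard-normal prior, and all numbers are assumptions chosen purely for illustration. Each modality contributes a 1-D Gaussian expert, and a missing modality is simply left out of the product.

```python
from typing import Dict, Optional, Tuple

Gaussian = Tuple[float, float]  # (mean, variance) of a 1-D Gaussian


def product_of_experts(
    experts: Dict[str, Optional[Gaussian]],
    prior: Gaussian = (0.0, 1.0),  # assumed standard-normal prior expert
) -> Gaussian:
    """Fuse 1-D Gaussian experts; a modality mapped to None is skipped."""
    prior_mean, prior_var = prior
    # A product of Gaussians is Gaussian: precisions (1 / variance) add up,
    # and the fused mean is the precision-weighted average of the means.
    precision = 1.0 / prior_var
    weighted_mean = prior_mean / prior_var
    for expert in experts.values():
        if expert is None:  # missing modality: contributes nothing
            continue
        mean, var = expert
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


# Hypothetical per-modality posteriors, for illustration only.
both = product_of_experts({"vision": (2.0, 0.5), "haptic": (1.0, 1.0)})
vision_only = product_of_experts({"vision": (2.0, 0.5), "haptic": None})
```

Dropping an expert leaves a valid, but wider, fused distribution, which is the property that lets such models train on examples where only a subset of modalities is observed.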