Four knowledgeable referees support acceptance, and I also recommend acceptance. We encourage and expect the authors to incorporate the reviewers' suggestions for improving the paper. In particular, please show how the rendering changes when 1-6 objects are added or removed using a model trained on scenes with 4 objects, and please address R2's concerns regarding the rendering method. NOTE FROM PROGRAM CHAIRS: The paper is accepted; however, please revise and expand the Broader Impact statement in the camera-ready version. The current statement is biased towards potential positive effects of the work and does not adequately address the risks of 'ill-intended image manipulation'.