Sun Dec 8th through Sat Dec 14th, 2019 at Vancouver Convention Center
I thank the authors for their submission. The paper presents a framework for analyzing behavioral videos by combining nonlinear autoencoders with an ARHMM. I also thank the reviewers for their detailed and thoughtful comments and suggestions. The reviewers agree that the paper is well motivated and well written; however, they also raise serious concerns about the quality and interpretability of the results. The reviewers make the following suggestions:

1. Please detail in the paper why using a nonlinear autoencoder is important or beneficial, since qualitatively it does not appear to make much of a difference in performance. What are the benefits of using the CVAE?

2. The inferred "behavioral syllables" do not appear to be interpretable. The paper would be greatly improved if the authors could show more explicitly how the method provides insight into the relationship between neural activity and behavior. At the very least, the authors should show that the behavioral syllables are stable and repeatably produced.

3. Please show or comment on how the system would behave in more complex settings.

I strongly encourage the authors to take the reviewers' comments and concerns into account for the final manuscript, most importantly those regarding the interpretability of the behavioral syllables.