The paper investigates the out-of-distribution (OOD) behavior of deep generative models, specifically the counterintuitive results reported in prior work, where deep generative models were shown to assign higher likelihood to out-of-distribution inputs than to in-distribution inputs. The authors propose a new white noise test (WN test), theoretically motivate it, and show that it outperforms likelihood and likelihood-ratio baselines. The reviewers raised concerns about the experimental setup (coverage of other datasets and models), the white noise assumption, and connections to related methods such as the typicality test.

This was a borderline paper. During the discussion, the majority of reviewers agreed that the author rebuttal addressed their major concerns; R2 remained unconvinced. After reading the paper carefully, I lean towards acceptance, as the paper presents an interesting idea. Even though there are some gaps in the current version, the proposed revisions should strengthen the paper. I recommend adding a comparison to the typicality test applied to the likelihood (e.g., with batch_size=1), as that would help situate this work in the wider literature and help readers understand which components contribute most to the improved performance.

Minor suggestion: previous work has already suggested a mismatch between high-likelihood regions and the typical set as a possible explanation. The authors cite these papers in the related work at the end of the paper, but some of the most closely related papers [25,29] should probably be cited earlier, in the introduction, when discussing the typicality explanation.
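For context, the typicality-test baseline recommended above can be sketched as follows. This is a minimal illustration, not the paper's method: the function names are hypothetical, and a standard Gaussian log-density stands in for a trained generative model's log-likelihood. At batch_size=1 the test flags an input when its negative log-likelihood deviates from an entropy estimate by more than a threshold calibrated on in-distribution data.

```python
import numpy as np

# Stand-in for a trained generative model's per-sample log p(x); here a
# standard Gaussian log-density, purely to make the sketch runnable.
def log_likelihood(x):
    return -0.5 * np.sum(x**2, axis=-1) - 0.5 * x.shape[-1] * np.log(2 * np.pi)

def typicality_threshold(val_loglik, alpha=0.99):
    """Calibrate on in-distribution (validation) log-likelihoods.

    Returns an entropy estimate H_hat = -E[log p(x)] and the alpha-quantile
    of the per-sample deviations |NLL - H_hat| (batch_size=1 case).
    """
    h_hat = -np.mean(val_loglik)
    dev = np.abs(-val_loglik - h_hat)
    return h_hat, np.quantile(dev, alpha)

def is_ood(x_loglik, h_hat, tau):
    """Flag a single input as OOD if its NLL falls outside the typical set."""
    return np.abs(-x_loglik - h_hat) > tau
```

Note that this one-sided use of likelihood alone would miss high-likelihood OOD inputs, whereas the deviation-based test flags them in both directions, which is why the comparison at batch_size=1 would isolate where the WN test's gains come from.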