NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2389
Title: Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections

This paper proposes BRAINet to combine Bayesian structure learning and Bayesian neural networks. In detail, the method assumes a confounder on the input features X and the discriminative network parameters \phi, where this confounder is defined as the generative graph structure on X, and the discriminative network shares the same structure as the generative one. Given observations X and Y, the approach first samples the generative graph structure from the posterior given X, then trains the parameters of the corresponding discriminative network to fit the posterior distribution of \phi given X and Y. Experiments are performed on calibration and out-of-distribution (OOD) tasks, with MC-dropout and Deep Ensembles as the main baselines for comparison.

Reviewers include experts in Bayesian structure learning and Bayesian neural networks. They read the author feedback carefully and engaged actively in the post-rebuttal discussion. The author feedback resolved some of the reviewers' confusion, and some reviewers raised their scores to vote for acceptance with moderate/low confidence. However, the main issues still need to be addressed:
1. Clarity: the current form of the paper is hard to follow for people from the (Bayesian) deep learning community, which appears to be the paper's target audience;
2. Novelty: the algorithm is an extension of the B2N/B-RAI approaches to deep neural networks;
3. Comparisons with other BNN methods are missing.

Prior design and OOD tasks are two important and trending topics in Bayesian deep learning, so this paper provides a timely and potentially important contribution to the field. However, after a brief read-through of the paper, I agree with the reviewers that the clarity of the presentation needs to be improved. In particular, I think the big picture of the whole pipeline is not discussed clearly, and an algorithm describing the whole process of sampling the graph and then training \phi would be very helpful.
The paper would also benefit from a clear description of its novel contribution compared to B2N/B-RAI. Finally, I hope the authors can add the extra experiments (some of which were provided in the author feedback) to the camera-ready version.
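
For reference, the two-stage procedure summarized at the start of this review can be written compactly as follows. This is only a sketch of how that summary reads, not the authors' stated formulation: the symbols G (the generative graph structure acting as the confounder), M (the number of sampled structures), and the index m are assumptions introduced here for illustration, as is the conditioning of \phi on the sampled structure.

```latex
% Assumed notation (not from the paper): G = generative graph structure,
% M = number of sampled structures, m = sample index.
\begin{align*}
  % Stage 1: sample structures from the posterior given the inputs only.
  G_m &\sim p(G \mid X), \qquad m = 1, \dots, M, \\
  % Stage 2: for each sampled structure, train the discriminative network
  % that shares this structure so that its parameters fit the posterior.
  \phi_m &\sim p(\phi \mid G_m, X, Y), \\
  % Assumed consequence: the posterior over \phi is approximated by
  % averaging over the sampled structures.
  p(\phi \mid X, Y) &\approx \frac{1}{M} \sum_{m=1}^{M} p(\phi \mid G_m, X, Y).
\end{align*}
```

An algorithm box in the paper along these lines, with the authors' own notation, would address the big-picture concern raised above.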