### Reviewer 1

This paper addresses an interesting problem and proposes a solution that is shown to have empirical advantages. However, the paper is poorly written. The notation is not clearly defined or explained. In line 75, $x^*$ and $\hat{\theta}$ are not defined, even though these variables are used throughout the rest of the paper; I had to read [18] to understand the notation. In lines 74-80, the definition of a prior network is unclear. In lines 196-199, the intuitive explanation of why prior networks are more robust to adversarial attacks is also unclear. These issues diminish the quality of the paper as a standalone piece of work.

The main contribution of this work is the improved training criterion. In previous work, prior networks were trained with the forward KL divergence, while this paper proposes to use the reverse KL divergence instead. This change yields empirical benefits in training. It is also shown empirically that these networks have better out-of-distribution detection performance and, in some cases, are more robust to adversarial attacks. However, on complex datasets like CIFAR-100 the improvement shown is only modest, so it would be nice to see the performance of these networks on more datasets (such as ImageNet).

---

In light of the author response, I tend to keep my overall score (6).