This paper presents an improved method for learning binary classifiers from positive and unlabeled data. Prior work has required specifying the class prior, i.e., the proportion of positive examples in the unlabeled data set. This parameter is difficult to estimate, and the resulting classifier is sensitive to it. This paper addresses that problem by minimizing the divergence between the classifier and an ideal Bayesian classifier using variational inference. While this is not the first paper to attempt to do away with the class prior estimation problem, it reports better empirical performance and provides theoretical results on consistency.

As noted by all of the reviewers, the paper is very clearly written and helpfully provides a summary table comparing and contrasting prior work with the current work. The reviewers noted that positive and unlabeled data problems are growing in prevalence and that the topic is timely. Some reviewers highlighted the novelty of the MixUp regularization approach, though there were concerns about how well this approach generalizes beyond image data.

There were other concerns about the applicability of the method: in many problems it may be reasonable to assume the class prior is well known, and if so, other methods may perform better. There were also concerns about the assumption that the labeled and unlabeled data are selected completely at random, though this assumption is common in the area. Overall, these weaknesses were deemed minor and outweighed by the strengths.

The discussion focused on the balance of strengths and weaknesses, and several reviewer comments were judged to be well addressed by the authors. Overall, the novelty of the approach, combined with the theoretical and experimental evidence of efficacy, leads me to recommend acceptance.