Hui Chen, Fangqing Liu, Yin Wang, Liyue Zhao, Hao Wu
Learning binary classifiers from only positive and unlabeled (PU) data is an important and challenging task in many real-world applications, including web text classification, disease gene identification, and fraud detection, where negative samples are difficult to verify experimentally. Most recent PU learning methods are built on the misclassification risk used in supervised learning, and they may suffer from inaccurate estimates of class prior probabilities. In this paper, we introduce a variational principle for PU learning that allows us to quantitatively evaluate the modeling error of the Bayesian classifier directly from the given data. This leads to a loss function that can be computed efficiently without class prior estimation or any other intermediate estimation problem, and the variational learning method can then be employed to optimize the classifier under general conditions. We illustrate the effectiveness of the proposed variational method on a number of benchmark examples.