Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)
Jing Gao, Feng Liang, Wei Fan, Yizhou Sun, Jiawei Han
Little work has been done to directly combine the outputs of multiple supervised and unsupervised models. However, it can increase the accuracy and applicability of ensemble methods. First, we can boost the diversity of classification ensemble by incorporating multiple clustering outputs, each of which provides grouping constraints for the joint label predictions of a set of related objects. Secondly, ensemble of supervised models is limited in applications which have no access to raw data but to the meta-level model outputs. In this paper, we aim at calculating a consolidated classification solution for a set of objects by maximizing the consensus among both supervised predictions and unsupervised grouping constraints. We seek a global optimal label assignment for the target objects, which is different from the result of traditional majority voting and model combination approaches. We cast the problem into an optimization problem on a bipartite graph, where the objective function favors smoothness in the conditional probability estimates over the graph, as well as penalizes deviation from initial labeling of supervised models. We solve the problem through iterative propagation of conditional probability estimates among neighboring nodes, and interpret the method as conducting a constrained embedding in a transformed space, as well as a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.