Probabilistic Multi-Task Feature Selection

Part of Advances in Neural Information Processing Systems 23 (NIPS 2010)

Yu Zhang, Dit-Yan Yeung, Qian Xu


Recently, some variants of the $l_1$ norm, particularly matrix norms such as the $l_{1,2}$ and $l_{1,\infty}$ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the $l_{1,2}$ and $l_{1,\infty}$ norms by considering a family of $l_{1,q}$ norms for $1 < q \le \infty$ and study the problem of determining the most appropriate sparsity-enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem based on the $l_{1,q}$ norm. Building on this interpretation, we develop a probabilistic model with the noninformative Jeffreys prior, and further extend it to learn and exploit more general pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms that learn all model parameters, including $q$, automatically. We report experiments on two cancer classification applications using microarray gene expression data.
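To make the norm family concrete, the following is a minimal sketch of the standard definitions involved; the notation here ($W$, $d$ features, $m$ tasks, scale $\alpha$) is our own illustrative convention rather than necessarily that of the paper. For a weight matrix $W \in \mathbb{R}^{d \times m}$ whose row $\mathbf{w}^j$ collects the weights of feature $j$ across all $m$ tasks, the $l_{1,q}$ norm is

$$\|W\|_{1,q} \;=\; \sum_{j=1}^{d} \big\|\mathbf{w}^{j}\big\|_{q} \;=\; \sum_{j=1}^{d} \Big( \sum_{i=1}^{m} |w_{ji}|^{q} \Big)^{1/q}, \qquad 1 < q \le \infty,$$

so the outer $l_1$ sum encourages entire rows, i.e., features shared across all tasks, to vanish; $q = 2$ and $q = \infty$ (where the inner norm becomes $\max_i |w_{ji}|$) recover the two special cases unified here. The generalized normal (exponential power) density with location $\mu$, scale $\alpha > 0$ and shape $q$ is

$$p(w \mid \mu, \alpha, q) \;=\; \frac{q}{2\,\alpha\,\Gamma(1/q)} \exp\!\Big( -\frac{|w - \mu|^{q}}{\alpha^{q}} \Big),$$

which reduces to the Gaussian at $q = 2$ and the Laplace distribution at $q = 1$; placing such a prior on the entries of $W$ and taking the MAP estimate yields an $l_q$-type penalty, which is the sense in which this family gives an $l_{1,q}$ regularizer a probabilistic interpretation.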