Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics

Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

Authors

Man Shun Ang, Jianzhu Ma, Nianjun Liu, Kun Huang, Yijie Wang

Abstract

We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hypercube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we find that a simple algorithm based on Newton's method solves the projection problem to high precision with a complexity of roughly O(n), a much lower computational cost than the existing sorting-based methods proposed in the literature. We provide a theory that partially explains and justifies the method. We demonstrate that the proposed algorithm produces high-precision solutions of the projection problem on large-scale datasets and significantly outperforms state-of-the-art methods in runtime (about 6-8 times faster in CPU time than a commercial software package for input vectors with 1 million or more variables). We further illustrate the effectiveness of the proposed algorithm on sparse regression in a bioinformatics problem. Empirical results on the GWAS dataset (1,500,000 single-nucleotide polymorphisms) show that, when the proposed method is used to accelerate the Projected Quasi-Newton (PQN) method, the accelerated PQN algorithm can handle huge-scale regression problems and is more efficient (about 3-6 times faster) than the current state-of-the-art methods.
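To make the projection concrete, write the k-capped simplex as {x : 0 <= x_i <= 1, sum_i x_i = k}. The Euclidean projection of y onto this set has the form x_i = clip(y_i - gamma, 0, 1), where the scalar gamma solves the piecewise-linear, nonincreasing equation phi(gamma) = sum_i clip(y_i - gamma, 0, 1) - k = 0. Each evaluation of phi costs O(n) and needs no sorting. The Python/NumPy code below is a minimal sketch under these assumptions, not the authors' implementation. It applies Newton's method to phi and falls back to bisection when a Newton step is undefined or leaves the current bracket. The function name `project_capped_simplex`, the initial bracket, the tolerance, and the bisection safeguard are all illustrative choices.

```python
import numpy as np


def project_capped_simplex(y, k, tol=1e-10, max_iter=100):
    """Project y onto {x : 0 <= x_i <= 1, sum(x) = k} (illustrative sketch).

    The projection is x = clip(y - gamma, 0, 1), where the scalar gamma
    solves phi(gamma) = sum(clip(y - gamma, 0, 1)) - k = 0. This sketch uses
    Newton's method on phi, with a bisection fallback added as a safeguard
    (an assumption, not necessarily the paper's exact scheme).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= len(y)")

    # phi is nonincreasing in gamma: phi(lo) = n - k >= 0 and phi(hi) = -k <= 0,
    # so [lo, hi] brackets a root.
    lo, hi = y.min() - 1.0, y.max()
    gamma = 0.5 * (lo + hi)
    for _ in range(max_iter):
        z = y - gamma
        f = np.clip(z, 0.0, 1.0).sum() - k  # phi(gamma): O(n), no sorting
        if abs(f) <= tol:
            break
        # Shrink the bracket using the sign of phi at the current gamma.
        if f > 0:
            lo = gamma
        else:
            hi = gamma
        # Slope of phi is -(number of coordinates strictly inside (0, 1)).
        m = np.count_nonzero((z > 0.0) & (z < 1.0))
        if m > 0 and lo < gamma + f / m < hi:
            gamma = gamma + f / m  # Newton step: gamma - phi / phi'
        else:
            gamma = 0.5 * (lo + hi)  # bisection fallback keeps the root bracketed
    return np.clip(y - gamma, 0.0, 1.0)


# Example: projecting a 4-vector onto the 2-capped simplex.
x = project_capped_simplex([0.5, 2.0, -1.0, 0.3], k=2)
print(x, x.sum())
```

For this example the root is gamma = -0.1, so the projection is x = [0.6, 1.0, 0.0, 0.4]. Because phi is linear between consecutive breakpoints y_i - 1 and y_i, a Newton step that lands on the same linear piece as the root reaches it exactly. The bisection fallback is only an added guarantee that the sketch terminates even if Newton steps misbehave.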
