Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

Bibtex Metadata Paper


Ben Blum, David Baker, Michael Jordan, Philip Bradley, Rhiju Das, David E. Kim


Rosetta is one of the leading algorithms for protein structure prediction today. It is a Monte Carlo energy minimization method requiring many random restarts to find structures with low energy. In this paper we present a resampling technique for structure prediction of small alpha/beta proteins using Rosetta. From an ini- tial round of Rosetta sampling, we learn properties of the energy landscape that guide a subsequent round of sampling toward lower-energy structures. Rather than attempt to fit the full energy landscape, we use feature selection methods—both L1-regularized linear regression and decision trees—to identify structural features that give rise to low energy. We then enrich these structural features in the second sampling round. Results are presented across a benchmark set of nine small al- pha/beta proteins demonstrating that our methods seldom impair, and frequently improve, Rosetta’s performance.