Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)


Authors

Sepp Hochreiter, Klaus Obermayer

Abstract

We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs, and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. By minimizing a bound on the generalization error of a linear classifier, obtained using covering numbers, we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case where row objects are interpreted as features. We suggest an additional constraint which imposes sparseness on the row objects, and show that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where "column" objects correspond to samples, "row" objects correspond to genes, and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification, as well as support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results.
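The abstract describes selecting a sparse set of row objects (genes) from a genes-by-samples matrix and classifying the column objects (samples). The sketch below is only a loose illustration of that setting, not the paper's covering-number objective: it trains an L1-regularized logistic classifier by proximal gradient descent on a synthetic expression matrix and reads off the genes with nonzero weights. All data, names, and parameters (K, lam, step) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a gene expression matrix:
# rows = genes (features), columns = samples, as in the microarray setting.
n_genes, n_samples = 200, 60
K = rng.normal(size=(n_genes, n_samples))
w_true = np.zeros(n_genes)
w_true[:5] = rng.normal(size=5)              # only 5 informative genes
y = np.sign(w_true @ K + 0.1 * rng.normal(size=n_samples))

X = K.T                                      # samples x genes design matrix

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# L1-regularized logistic regression via proximal gradient (ISTA);
# the L1 penalty plays the role of a sparseness constraint over genes.
lam, step = 0.05, 1e-2
w = np.zeros(n_genes)
for _ in range(2000):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin))
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n_samples
    w = soft_threshold(w - step * grad, step * lam)

selected = np.flatnonzero(w)
print(f"selected {selected.size} genes:", selected[:10])
```

Swapping the logistic loss for a hinge loss, or tightening the penalty weight, would move the sketch closer in spirit to a margin-based, sparsity-inducing objective, but the model-selection criterion the paper derives from covering numbers is a different quantity.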