Sepp Hochreiter, Klaus Obermayer
We investigate the problem of learning a classiﬂcation task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column ob- jects may belong to diﬁerent sets, and the entries in the matrix express the relationships between them. We interpret the matrix el- ements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these ker- nels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classi- ﬂer which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not pos- itive deﬂnite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microar- rays, where \column" objects correspond to samples, \row" objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classiﬂ- cation and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classiﬂcation results.