Result Analysis of the NIPS 2003 Feature Selection Challenge

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)


Isabelle Guyon, Steve Gunn, Asa Ben-Hur, Gideon Dror


The NIPS 2003 workshops included a feature selection competition organized by the authors. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. The competition took place over a period of 13 weeks and attracted 78 research groups. Participants were asked to make on-line submissions on the validation and test sets, with performance on the validation set being presented immediately to the participant and performance on the test set presented to the participants at the workshop. In total, 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, which combined filters and/or wrapper or embedded methods using Random Forests, kernel methods, or neural networks as a classification engine. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark remains available for post-challenge submissions to stimulate further research.

1 Introduction

Recently, the quality of research in Machine Learning has been raised by the sustained data sharing efforts of the community. Data repositories include the well known UCI Machine Learning repository [13], and dozens of other sites [10]. Yet, this has not diminished the importance of organized competitions. In fact, the proliferation of datasets combined with the creativity of researchers in designing experiments makes it hardly possible to compare one paper with another [12]. A number of large conferences have regularly organized competitions (e.g. KDD, CAMDA, ICDAR, TREC, ICPR, and CASP). The NIPS workshops offer an ideal forum for organizing such competitions. In 2003, we organized a competition on the theme of feature selection, the results of which were presented at a workshop on feature extraction, which attracted 98 participants. We are presently preparing a book combining tutorial chapters and papers from the proceedings of that workshop [9]. In this paper, we present to the NIPS community a concise summary of our challenge design and the findings of the result analysis.

2 Benchmark design

We formatted five datasets (Table 1) from various application domains. All datasets are two-class classification problems. The data were split into three subsets: a training set, a validation set, and a test set. All three subsets were made available at the beginning of the benchmark, on September 8, 2003. The class labels for the validation set and the test set were withheld. The identities of the datasets and of the features (some of which were artificially generated random features) were kept secret. For a period of 12 weeks, the participants could submit prediction results on the validation set and get their performance results and ranking on-line. By December 1st, 2003, which marked the end of the development period, the participants had to turn in their results on the test set. Immediately after that, the validation set labels were revealed. On December 8th, 2003, the participants could make submissions of test set predictions, after having trained on both the training and the validation set. Some details on the benchmark design are provided in this section.
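The data-handling side of this protocol can be illustrated with a minimal Python sketch. It is not the organizers' software: the names `Subset`, `three_way_split`, `participant_view`, and `final_training_data` are hypothetical, and the seeded random shuffle is an assumption, since the text does not say how examples were assigned to the three subsets.

```python
import random
from dataclasses import dataclass


@dataclass
class Subset:
    X: list  # feature vectors
    y: list  # class labels of the two-class problem, e.g. +1 / -1


def three_way_split(X, y, n_train, n_valid, seed=0):
    """Cut the data into training, validation and test subsets.

    Assumption: a seeded random shuffle; the actual assignment rule
    used in the challenge is not described in this section.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    parts = (
        idx[:n_train],
        idx[n_train:n_train + n_valid],
        idx[n_train + n_valid:],
    )
    return tuple(
        Subset([X[i] for i in part], [y[i] for i in part]) for part in parts
    )


def participant_view(train, valid, test, valid_labels_revealed=False):
    """Data visible to a participant.

    Test labels are never shown; validation labels are shown only
    after the development period has ended.
    """
    return {
        "train": (train.X, train.y),
        "valid": (valid.X, valid.y if valid_labels_revealed else None),
        "test": (test.X, None),
    }


def final_training_data(view):
    """Training data for the final submission: training plus validation."""
    X_train, y_train = view["train"]
    X_valid, y_valid = view["valid"]
    if y_valid is None:
        raise ValueError("validation labels have not been revealed yet")
    return X_train + X_valid, y_train + y_valid
```

During the development period a participant would work with `participant_view(train, valid, test)`; once the validation labels are revealed, `final_training_data(participant_view(train, valid, test, valid_labels_revealed=True))` returns the enlarged training set used for the final test set predictions.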