Martin Law, Anil Jain, Mário Figueiredo
There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difﬁcult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the ﬁrst one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-information- based feature relevance criterion to the unsupervised case. Feature selec- tion is then carried out by a backward search scheme. This scheme can be classiﬁed as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.