Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)
We introduce a novel adaptive non-parametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N their span recovers the entropy minimizing set that supports at least = K/N (100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance = 1 - . A method for implementing this non-parametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.