Kernel PCA and De-Noising in Feature Spaces

Part of Advances in Neural Information Processing Systems 11 (NIPS 1998)



Sebastian Mika, Bernhard Schölkopf, Alex Smola, Klaus-Robert Müller, Matthias Scholz, Gunnar Rätsch


Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real world data.
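The pipeline described in the abstract, extracting nonlinear features with a Gaussian kernel and mapping them back to approximate pre-images in input space, can be sketched with scikit-learn's KernelPCA. Note that scikit-learn's inverse map is a later, learned (kernel ridge regression) pre-image technique rather than the method developed in this paper; the dataset, kernel width, and component count below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Toy data: two noisy concentric circles in R^2 (a standard nonlinear example).
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.08, random_state=0)

# Kernel PCA with a Gaussian (RBF) kernel. fit_inverse_transform=True makes
# scikit-learn learn an approximate pre-image map back to input space.
kpca = KernelPCA(n_components=4, kernel="rbf", gamma=10.0,
                 fit_inverse_transform=True, alpha=0.1)
Z = kpca.fit_transform(X)               # nonlinear features in feature space
X_denoised = kpca.inverse_transform(Z)  # approximate pre-images in R^2
```

Projecting onto a few leading kernel principal components and then computing pre-images is exactly the de-noising use case the abstract refers to: directions of small variance in feature space, which tend to carry the noise, are dropped before reconstruction.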

1 PCA and Feature Spaces

Principal Component Analysis (PCA) (e.g. [3]) is an orthogonal basis transformation. The new basis is found by diagonalizing the centered covariance matrix of a data set {x_k ∈ R^N | k = 1, ..., ℓ}, defined by C = ⟨(x_i − ⟨x_k⟩)(x_i − ⟨x_k⟩)^T⟩. The coordinates in the Eigenvector basis are called principal components. The size of an Eigenvalue λ corresponding to an Eigenvector v of C equals the amount of variance in the direction of v. Furthermore, the directions of the first n Eigenvectors corresponding to the n largest Eigenvalues cover as much variance as possible with n orthogonal directions. In many applications they contain the most interesting information: for instance, in data compression, where we project onto the directions with largest variance to retain as much information as possible, or in de-noising, where we deliberately drop directions with small variance.
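The linear procedure just described, diagonalize the centered covariance matrix, sort Eigenvectors by Eigenvalue, and project onto the leading directions, can be sketched in a few lines of NumPy. The toy data and the choice of n = 2 retained components are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in R^3 with most variance along the first axis.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

# Centered covariance matrix C = <(x_i - <x_k>)(x_i - <x_k>)^T>.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / len(X)

# Diagonalize C: Eigenvalue lam[j] is the variance along Eigenvector V[:, j].
lam, V = np.linalg.eigh(C)        # eigh returns ascending order
lam, V = lam[::-1], V[:, ::-1]    # sort by decreasing variance

# Principal components = coordinates in the Eigenvector basis.
n = 2                             # keep the n directions of largest variance
Z = Xc @ V[:, :n]                 # compression / de-noising step
X_rec = Z @ V[:, :n].T + X.mean(axis=0)  # reconstruction in input space
```

Keeping all N Eigenvectors reproduces the data exactly, since the basis is orthogonal; dropping the small-variance directions, as above, gives the lossy reconstruction used for compression and de-noising.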