Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
David Hoyle, Magnus Rattray
We derive the limiting form of the eigenvalue spectrum for sample covariance matrices produced from non-isotropic data. For the analysis of standard PCA we study the case where the data has increased variance along a small number of symmetry-breaking directions. The spectrum depends on the strength of the symmetry-breaking signals and on a parameter α which is the ratio of sample size to data dimension. Results are derived in the limit of large data dimension while keeping α fixed. As α increases there are transitions in which delta functions emerge from the upper end of the bulk spectrum, corresponding to the symmetry-breaking directions in the data, and we calculate the bias in the corresponding eigenvalues. For kernel PCA the covariance matrix in feature space may contain symmetry-breaking structure even when the data components are independently distributed with equal variance. We show examples of phase-transition behaviour analogous to the PCA results in this case.
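The transition described above can be illustrated numerically. The following is a minimal sketch, not the authors' derivation: it assumes zero-mean Gaussian data with unit variance in every direction except one, where the variance is 1 + A. The closed-form expressions in `bulk_edge` and `predicted_outlier` are the standard large-dimension results for this single-spike model and are not quoted from the abstract. The function names and the symbol A (here `strength`) are illustrative choices.

```python
import numpy as np


def top_eigenvalues(n_dim, alpha, strength, seed=0):
    """Return the two largest eigenvalues of a sample covariance matrix.

    The data are Gaussian with unit variance, except along one random
    direction b where the variance is 1 + strength (assumed model).
    """
    rng = np.random.default_rng(seed)
    n_samples = int(alpha * n_dim)  # alpha = sample size / dimension

    # Random unit vector for the symmetry-breaking direction.
    b = rng.standard_normal(n_dim)
    b /= np.linalg.norm(b)

    # Isotropic data, then stretch the component along b so that its
    # standard deviation becomes sqrt(1 + strength).
    x = rng.standard_normal((n_samples, n_dim))
    x += (np.sqrt(1.0 + strength) - 1.0) * np.outer(x @ b, b)

    # Sample covariance (the mean is known to be zero in this sketch).
    cov = x.T @ x / n_samples
    eigvals = np.linalg.eigvalsh(cov)  # ascending order
    return eigvals[-1], eigvals[-2]


def bulk_edge(alpha):
    """Upper edge of the bulk spectrum for isotropic unit-variance data."""
    return (1.0 + 1.0 / np.sqrt(alpha)) ** 2


def predicted_outlier(alpha, strength):
    """Limiting position of the separated eigenvalue, or None below threshold.

    Separation occurs only when strength > 1 / sqrt(alpha).
    """
    if strength <= 1.0 / np.sqrt(alpha):
        return None
    return (1.0 + strength) * (1.0 + 1.0 / (alpha * strength))


if __name__ == "__main__":
    alpha, strength = 2.0, 2.0
    top, second = top_eigenvalues(n_dim=400, alpha=alpha, strength=strength)
    print("largest eigenvalue:       ", top)
    print("second largest eigenvalue:", second)
    print("bulk edge:                ", bulk_edge(alpha))
    print("predicted outlier:        ", predicted_outlier(alpha, strength))
```

Under these assumptions the predicted outlier exceeds the population value 1 + A by the factor 1 + 1/(αA), which is one way to read the eigenvalue bias mentioned in the abstract. The bias shrinks as α grows. At finite dimension the simulated values fluctuate around the limiting predictions.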