Revitalizing SVD for Global Covariance Pooling: Halley’s Method to Overcome Over-Flattening

Jiawei Gu, Ziyue Qiao, Xinming Li, Zechao Li

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Global Covariance Pooling (GCP) has garnered increasing attention in visual recognition, where second-order statistics frequently yield stronger representations than first-order pooling. However, the two main streams of GCP---Newton--Schulz-based iSQRT-COV and exact or near-exact SVD methods---struggle at opposite ends of the training spectrum. iSQRT-COV stabilizes early learning by avoiding gradient explosions, but it over-compresses the largest eigenvalues in later stages, causing an \emph{over-flattening} phenomenon that stalls final accuracy. SVD-based methods, in contrast, preserve the high-eigenvalue structure essential for deep networks, but their backward pass divides by eigenvalue gaps $(\lambda_i - \lambda_j)$, making them unstable early in training when the spectrum is nearly degenerate. We propose \textbf{Halley-SVD}, a high-order iterative method that unites the smooth-gradient robustness of iSQRT-COV with the late-stage spectral fidelity of SVD. Grounded in Halley's iteration, our approach obviates explicit divisions by $(\lambda_i - \lambda_j)$ and forgoes threshold- or polynomial-based heuristics, thereby preventing both early gradient explosions and the excessive compression of large eigenvalues. Extensive experiments on CNN and transformer architectures show that Halley-SVD consistently outperforms iSQRT-COV at large model scales and batch sizes, achieving higher overall accuracy without mid-training switches or custom truncations. This work offers a new resolution to the long-standing dichotomy in GCP, illustrating how high-order methods can balance robustness and spectral precision to fully harness the representational power of modern deep networks.
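To make the mechanism concrete, below is a minimal PyTorch sketch of a coupled Halley-type iteration for the square root of an SPD covariance matrix. It illustrates the general idea rather than the paper's released algorithm: the function name `halley_sqrt`, the iteration count, the trace pre-normalization, and the specific update $h(M) = (M + 3I)(3M + I)^{-1}$ (the cubically convergent member of the Padé family, of which Newton--Schulz is the quadratic member) are assumptions of this sketch. The key property it demonstrates is that both forward and backward passes use only matrix products and linear solves, so autograd never forms $1/(\lambda_i - \lambda_j)$ terms.

```python
import torch

def halley_sqrt(cov: torch.Tensor, num_iters: int = 5) -> torch.Tensor:
    """Square root of an SPD covariance matrix via a coupled Halley-type
    iteration (cubic convergence).

    Sketch only: the name, defaults, and normalization here are
    illustrative assumptions, not the paper's exact Halley-SVD procedure.
    No eigendecomposition is used, so the backward pass involves no
    1/(lambda_i - lambda_j) terms.
    """
    n = cov.shape[-1]
    I = torch.eye(n, dtype=cov.dtype, device=cov.device)
    # Trace-normalize so eigenvalues lie in (0, 1], inside the iteration's
    # convergence region (the same pre-scaling trick iSQRT-COV uses).
    tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
    Y = cov / tr          # Y_k -> (cov / tr)^{1/2}
    Z = I.clone()         # Z_k -> (cov / tr)^{-1/2}
    for _ in range(num_iters):
        M = Z @ Y         # M_k -> I at a cubic rate
        # Halley update h(M) = (M + 3I)(3M + I)^{-1}. All iterates are
        # rational functions of cov and therefore commute, so this
        # left-sided solve equals the right-sided product.
        H = torch.linalg.solve(3.0 * M + I, M + 3.0 * I)
        Y = Y @ H
        Z = H @ Z
    return Y * tr.sqrt()  # undo the trace normalization


if __name__ == "__main__":
    feats = torch.randn(64, 256)                # pooled local features
    cov = feats.T @ feats / feats.shape[0]      # 256 x 256 covariance
    cov = cov + 1e-5 * torch.eye(256)           # small ridge keeps it SPD
    root = halley_sqrt(cov)
    print(torch.dist(root @ root, cov))         # should be small
```

In a GCP head, such a routine would be applied to the $d \times d$ covariance of the final feature map before the classifier. The trace normalization is the standard way to place the spectrum inside the iteration's convergence region; the scalar recursion $m_{k+1} = m_k h(m_k)^2$ then satisfies $m_{k+1} - 1 \approx (m_k - 1)^3/16$ near the fixed point, which is the cubic convergence that lets a Halley-type update track the true square root more closely than the quadratic Newton--Schulz step at a comparable cost.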