Demystifying Spectral Feature Learning for Instrumental Variable Regression

Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir Kostic, Arthur Gretton

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs \emph{spectral features} - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method's performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In a \emph{good} scenario, the approach is optimal. This occurs with strong \emph{spectral alignment}, meaning the structural function is well-represented by the top eigenfunctions of the conditional operator, coupled with this operator's slow eigenvalue decay, indicating a strong instrument. Performance degrades in a \emph{bad} scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the \emph{ugly} scenario, weak spectral alignment causes the method to fail, regardless of the eigenvalues' characteristics. Our synthetic experiments empirically validate this taxonomy. We further introduce a practical procedure to estimate these spectral properties from data, allowing practitioners to diagnose which regime a given problem falls into. We apply this method to the dSprites dataset, demonstrating its utility.