Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Veeranjaneyulu Sadhanala, Yu-Xiang Wang, James L. Sharpnack, Ryan J. Tibshirani
We consider the problem of estimating the values of a function over $n$ nodes of a $d$-dimensional grid graph (having equal side lengths $n^{1/d}$) from noisy observations. The function is assumed to be smooth, but is allowed to exhibit different amounts of smoothness at different regions in the grid. Such heterogeneity eludes classical measures of smoothness from nonparametric statistics, such as Holder smoothness. Meanwhile, total variation (TV) smoothness classes allow for heterogeneity, but are restrictive in another sense: only constant functions count as perfectly smooth (achieve zero TV). To move past this, we define two new higher-order TV classes, based on two ways of compiling the discrete derivatives of a parameter across the nodes. We relate these two new classes to Holder classes, and derive lower bounds on their minimax errors. We also analyze two naturally associated trend filtering methods; when $d=2$, each is seen to be rate optimal over the appropriate class.