Near Optimal Sketching of Low-Rank Tensor Regression

Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)


Authors

Xingguo Li, Jarvis Haupt, David Woodruff

Abstract

We study the least squares regression problem $\min_{\Theta \in \mathbb{R}^{p_1 \times \cdots \times p_D}} \| \mathcal{A}(\Theta) - b \|_2^2$, where $\Theta$ is a low-rank tensor, defined as $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$, for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, where $R$ is small compared with $p_1, \ldots, p_D$. Here, $\circ$ denotes the outer product of vectors, and $\mathcal{A}(\Theta)$ is a linear function of $\Theta$. This problem is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on sparse random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi \mathcal{A}(\Theta) - \Phi b\|_2^2$, for which $\|\Phi \mathcal{A}(\Theta) - \Phi b\|_2^2 = (1 \pm \varepsilon) \| \mathcal{A}(\Theta) - b \|_2^2$ holds simultaneously for all $\Theta$. We obtain a significantly smaller dimension and sparsity in the randomized linear mapping $\Phi$ than is possible for ordinary least squares regression. Finally, we give a number of numerical simulations supporting our theory.
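
To make the sketched objective concrete, the following is a minimal illustrative sketch (not the authors' implementation): it forms a random rank-$R$ CP tensor $\Theta$, a generic linear map $\mathcal{A}(\Theta) = X \,\mathrm{vec}(\Theta)$, and a sparse CountSketch-style projection $\Phi$ with one nonzero per column, then compares $\|\Phi(\mathcal{A}(\Theta) - b)\|_2^2$ to $\|\mathcal{A}(\Theta) - b\|_2^2$. All sizes ($D$, $R$, $p_d$, $n$, $m$) and the choice of CountSketch are illustrative assumptions, not values or constructions taken from the paper.

```python
# Illustrative sketch only: a generic sparse-projection experiment for the
# low-rank tensor regression objective. Sizes and the CountSketch construction
# are assumptions for demonstration, not the paper's specific parameters.
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes (illustrative).
D, R = 3, 2                      # tensor order and CP rank
p = [8, 9, 10]                   # mode dimensions p_1, ..., p_D
n, m = 2000, 200                 # number of measurements and sketch size

# Random rank-R CP tensor: Theta = sum_r theta_1^(r) o ... o theta_D^(r).
factors = [rng.standard_normal((p[d], R)) for d in range(D)]
Theta = np.zeros(p)
for r in range(R):
    outer = factors[0][:, r]
    for d in range(1, D):
        outer = np.multiply.outer(outer, factors[d][:, r])
    Theta += outer

# Generic linear map A(Theta) = X @ vec(Theta) and noisy observations b.
X = rng.standard_normal((n, int(np.prod(p))))
b = X @ Theta.ravel() + 0.1 * rng.standard_normal(n)

# Sparse projection Phi (CountSketch style): each column has a single
# random +-1 entry, so Phi @ v costs O(n) and Phi is never materialized.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)

def apply_sketch(v):
    """Compute Phi @ v using the implicit sparse representation of Phi."""
    out = np.zeros(m)
    np.add.at(out, rows, signs * v)
    return out

# Compare the original and sketched residual norms for this Theta.
residual = X @ Theta.ravel() - b
print("||A(Theta) - b||^2       :", np.sum(residual ** 2))
print("||Phi(A(Theta) - b)||^2  :", np.sum(apply_sketch(residual) ** 2))
```

In this toy setup the sketched residual norm is close to the original one for the fixed $\Theta$ drawn above; the paper's guarantee is stronger, holding simultaneously for all low-rank $\Theta$ with sketch dimension and sparsity much smaller than what ordinary least squares would require.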