NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4344
Title: Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space

Reviewer 1


This paper proposes a new algorithm for the quadratic regression problem, Interaction Hard Thresholding (IntHT), a variant of Iterative Hard Thresholding that uses the special quadratic structure to devise a new way to (approximately) extract the top elements of the $p^2$-sized gradient in sub-$p^2$ time and space. The authors then show that IntHT converges linearly to a consistent estimate under standard high-dimensional sparse recovery assumptions. This appears to be the first algorithm that provably and accurately solves the quadratic regression problem in sub-quadratic time and space.

Overall, I find the results and the algorithm quite interesting, although the main idea is quite simple: it mainly combines an existing technique from [20] with IHT to compute the top elements of the full gradient (a toy sketch of this loop follows this review). The paper is well organized and presented.

Minor comments:
- As also noted in the paper, the quadratic regression problem has a strong background in real applications. Could you give a real example in which $\Theta$ is sparse?
- I am quite curious about the novelty of the proof. What is its key novel ingredient?

------ After rebuttal ------
The authors have addressed my concerns well.
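To make the structure of the method concrete, below is a minimal sketch of a plain IHT loop for the interaction-only model $y_i = x_i^\top \Theta x_i$ (the linear term is dropped for brevity; all names are hypothetical, and this is not the paper's implementation). Both the explicit gradient and the exact top-$b$ extraction here cost $p^2$ time and space; IntHT's contribution, per the summary above, is to replace exactly those steps with an approximate sub-$p^2$ extraction built on the technique from [20].

```python
import numpy as np

def hard_threshold(M, b):
    """Keep the b largest-magnitude entries of M; zero out the rest.

    Exact top-b selection touches all p^2 entries; IntHT replaces this
    (and the explicit gradient below) with an approximate sub-quadratic
    extraction, which is the step its analysis must accommodate.
    """
    flat = M.ravel()
    keep = np.argpartition(np.abs(flat), -b)[-b:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(M.shape)

def iht_quadratic(X, y, b, step=0.1, iters=300):
    """Plain IHT for y_i ~ x_i^T Theta x_i with ||Theta||_0 <= b."""
    n, p = X.shape
    Theta = np.zeros((p, p))
    for _ in range(iters):
        r = np.einsum('ij,jk,ik->i', X, Theta, X) - y  # residuals r_i
        grad = X.T @ (r[:, None] * X) / n              # = X^T diag(r) X / n
        Theta = hard_threshold(Theta - step * grad, b)
    return Theta

# Toy check: recover a symmetric 6-sparse interaction matrix (noiseless).
rng = np.random.default_rng(0)
n, p, b = 400, 30, 6
Theta_true = np.zeros((p, p))
for (j, k), v in zip([(0, 5), (3, 12), (7, 20)], (1.0, -1.5, 2.0)):
    Theta_true[j, k] = Theta_true[k, j] = v
X = rng.standard_normal((n, p))
y = np.einsum('ij,jk,ik->i', X, Theta_true, X)
print(np.linalg.norm(iht_quadratic(X, y, b) - Theta_true))  # should be near zero
```

The point the reviews make is that the convergence proof must tolerate the extractor returning only approximately the top entries, which is where the paper departs from standard IHT analyses.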

Reviewer 2


Originality: This paper applies approaches from Maximum Inner Product Search to the problem of quadratic regression in order to reduce its time and space complexity (the connection is sketched just after this review). It is also interesting that a convergence analysis can be established for such a method.

Quality: Although theoretical results have been established for the proposed method, the strength and optimality of these results are unclear. Furthermore, the numerical simulations do not show the trade-off between statistical and computational efficiency.

Clarity: The paper is basically well written.

Significance: The problem is not really well motivated, so it is hard to evaluate the significance of the proposed method.

------ After rebuttal ------
Given that my previous concerns have mostly been addressed, I have changed my rating from 5 to 7.
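To make the Maximum Inner Product connection concrete (a standard observation for the squared loss, sketched here rather than taken from the paper): the interaction block of the gradient factors as
$$
\nabla_\Theta L(\beta, \Theta) \,=\, \frac{1}{n}\sum_{i=1}^n r_i\, x_i x_i^\top \,=\, \frac{1}{n}\, X^\top \mathrm{diag}(r)\, X, \qquad r_i = \beta^\top x_i + x_i^\top \Theta x_i - y_i,
$$
so its $(j,k)$ entry is the inner product $\frac{1}{n}\langle r \circ X_{:,j},\, X_{:,k}\rangle$ of two length-$n$ vectors. Extracting its largest-magnitude entries is therefore a maximum inner product search over $p^2$ column pairs, which is what lets sampling techniques such as those in [20] find approximate top elements without materializing the full $p \times p$ matrix.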

Reviewer 3


In this paper, the authors study the problem of quadratic regression, whereby the response is modeled as a function of both linear and quadratic terms, rather than by the typical linear model (a standard form of the model is written out after this review). Naively, one would expect the dependence on the dimension of the problem to scale quadratically, so the objective of this paper is to approximately capture the top entries of a ($p^2$-sized) gradient in time and space that are sub-$p^2$, under certain assumptions. Concretely, the authors propose Interaction Hard Thresholding (IntHT), a variant of Iterative Hard Thresholding, which can recover a sparse solution (when such a solution models the data well) with sub-quadratic complexity. Experiments on synthetic data validate the theoretical conclusions.

====== Strengths ======
This work provides a new thresholding algorithm for the problem of quadratic regression that achieves sub-quadratic dependence on the dimension of the problem. It is also noted in the paper that the results do not follow immediately from previous analyses, due to the difficulty of finding the exact top elements of the gradient. Thus, one of the key contributions of the paper is to show that linear convergence holds even with only approximate top element extraction.

====== Weaknesses ======
The main weakness of the paper is that it requires the assumption that there exists some sufficiently good solution that is also sparse. A discussion of how realistic this assumption is would be welcome, since when it does not hold (i.e., when only dense solutions suffice), sub-quadratic rates are no longer achieved.

====== Minor Comments ======
- under "Linear Model" on the first page: extraneous ")"
- p. 7: obtian -> obtain
- p. 7: "despite the hard thresholding is not accurate?" -> "despite the hard thresholding not being accurate?"
- p. 7: different bs -> different b's

==========================================
The authors have addressed my concerns regarding the motivation for the setting.
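For reference, the model described in the summary above, written in a standard form (conventions may differ slightly from the paper's):
$$
y_i \,=\, \beta^\top x_i \,+\, x_i^\top \Theta\, x_i \,+\, \epsilon_i, \qquad x_i \in \mathbb{R}^p, \quad \Theta \in \mathbb{R}^{p \times p} \text{ sparse},
$$
so fitting it naively amounts to regressing on all $O(p^2)$ interaction features, which is the quadratic time and space cost the paper's guarantees avoid.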