Paper ID: 473
Title: Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Reviews

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors propose sequential and active formulations for low-rank matrix completion, including a formulation where the underlying matrix/tensor is observed with noise. The resulting formulations have strong performance bounds on the number of entries that need to be sampled as a function of the matrix rank.

Theorem 3 is, in my opinion, a main contribution, and it was good to see it experimentally verified. The lower bound for uniform sampling given in Theorem 4 is also useful, but it appears you did not experimentally verify how close your method is to this lower bound. Please correct me if I'm wrong.

The experimental results are solid, and I presume they are limited to simulations due to the limited availability of real data. The comparison to SVT shows a clear time improvement, though I don't think you can conclude that it is always an order of magnitude faster given your results.
Q2: Please summarize your review in 1-2 sentences
A very interesting problem with a novel problem setting and solid theoretical bounds that are empirically verified. The only downside to the paper is the lack of experimental results on real-world problems, but that is a minor issue in my opinion.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper provides algorithms and results for low-rank tensor completion under adaptive sampling. The authors propose algorithms with the following guarantees:
1. For completing a $T^{th}$-order tensor of rank $r$, $O(r^{2T-2} T^2 n \log r)$ samples suffice.
2. Specializing the above to matrix completion yields $O(r^2 n \log r)$ sample complexity.
3. For noisy matrix completion (under Gaussian noise), $O(r^2 n \,\mathrm{polylog}(n))$ samples suffice.

The authors further show that their approach (theoretically) outperforms any passive sampling scheme.

Quality: The main contribution of the paper is an algorithm that completes a rank-$r$, $T^{th}$-order tensor of dimension $n \times n \times \cdots \times n$ using adaptive sampling, with a provable sample complexity of $r^{O(T)} \cdot O(n)$. The key takeaway is that the authors obtain the optimal dependence on the tensor dimensions, though the dependence on the tensor order is exponential. The key insight in the paper is that inner products between two (incoherent) vectors can be estimated by sampling. However, this idea is not new and has been established in prior work by Balzano et al. (as the authors mention in the paper). The authors then establish incoherence of the tensor modes, and the analysis uses standard techniques.
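For concreteness, this primitive can be summarized in a few lines of NumPy (the function name and the uniform-sampling scheme here are illustrative, not necessarily the authors' exact estimator):

import numpy as np

def estimate_inner_product(x, y, m, rng=None):
    """Estimate <x, y> from m uniformly sampled coordinates.

    Rescaling by n/m makes the estimate unbiased; incoherence of x and y
    controls its variance.
    """
    rng = rng or np.random.default_rng(0)
    n = len(x)
    idx = rng.choice(n, size=m, replace=False)
    return (n / m) * np.dot(x[idx], y[idx])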

For the noisy case, the authors adapt the algorithm of Deshpande et al., which samples vectors using inner products. Since the inner products are not known in the present setting, the authors propose sampling to estimate them and then apply Deshpande et al.'s algorithm.

Clarity: The paper is well written and easy to follow.

Originality: The ideas used in the paper are by now standard in the literature. However, the authors use them in a new setting.

Significance: After reading the author feedback, I agree that obtaining the right dependence on the matrix dimensions is an important step. However, in my opinion, the other results in the paper (specialization to matrix completion and noisy matrix completion) are not very interesting.
Q2: Please summarize your review in 1-2 sentences
The main contribution of the paper is in applying known techniques to a new problem (tensor completion) and obtaining optimal scaling in terms of the tensor dimension. However, the other results in the paper, as well as the analysis, are not very interesting.

Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper suggests a reconstruction algorithm for matrices that are incoherent in the columns only, not necessarily in the rows. A similar result is given for tensors, where incoherence is required in all dimensions except the last one.


The idea is very simple. For matrices, it goes like this: recovering a column-incoherent rank-r matrix amounts to recovering (a basis of) the column space, because given the column space you can recover each column by sampling on the order of r log r entries and solving a system of equations. To get a column-space basis, you greedily "grow" a basis by, at each iteration, trying columns until you find one that is not spanned by the columns you already have. Checking whether a column is spanned can again be done by sampling O(r log r) entries of that column. Each time you find such a column, you add it to your basis by observing it completely; hence this is an adaptive sampling strategy. It seems like a reasonable idea, although I didn't check the proof. The same idea is applied to tensors in a recursive fashion.
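In code, the matrix version of this strategy looks roughly like the following NumPy sketch (the threshold, per-column sample budget, and function names are illustrative, not the authors' exact algorithm):

import numpy as np

def adaptive_complete(sample_column, n1, n2, m, tol=1e-8, seed=0):
    """Greedy column-space recovery with adaptive sampling (schematic).

    sample_column(j, idx) returns the entries idx of column j of the
    unknown matrix; idx=None returns the full column. m is the
    per-column sample budget (order r log r).
    """
    rng = np.random.default_rng(seed)
    basis = np.zeros((n1, 0))            # orthonormal estimate of the column space
    M_hat = np.zeros((n1, n2))
    for j in range(n2):
        idx = rng.choice(n1, size=m, replace=False)
        obs = sample_column(j, idx)
        U = basis[idx, :]
        if U.shape[1] > 0:
            coef, *_ = np.linalg.lstsq(U, obs, rcond=None)
            resid = obs - U @ coef
        else:
            coef = np.zeros(0)
            resid = obs
        if np.linalg.norm(resid) > tol * max(1.0, np.linalg.norm(obs)):
            # Not spanned by the current basis: observe the column fully and add it.
            col = sample_column(j, None)
            basis, _ = np.linalg.qr(np.column_stack([basis, col]))
            M_hat[:, j] = col
        else:
            # Spanned: reconstruct the column from its m sampled entries.
            M_hat[:, j] = basis @ coef
    return M_hat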

Compared to nuclear-norm minimization techniques, the paper claims that this algorithm gives better bounds in certain cases where the incoherence of the rows is poor (around sqrt(r)). The improvement is in the logarithmic and constant terms. Arguably, the algorithm here is also simpler to implement (compared to SVT technology), and the proof is simpler. I didn't check (in the appendix) whether the claimed improvement from log(n) to log(r) is correct, but I couldn't think of a counterexample either.

The paper also presents an algorithm and analysis for noisy matrix reconstruction, for matrices that are the sum of a low-rank matrix and Gaussian noise. The algorithm appears to be an adaptation of Deshpande et al.'s approach, in which columns are sampled with probability proportional to their (sampled) projection onto the orthogonal complement of the candidate column space. As before, in certain parameter regimes the approach here can be better in terms of the error/sample-size tradeoff.
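
Schematically, one pass of that norm-sampling step might look like this (an illustrative sketch; the residual-energy rescaling and the selection rule are my reading of the procedure, not the authors' exact algorithm):

import numpy as np

def noisy_subspace_pass(sample_column, basis, n1, n2, m, k, rng=None):
    """For each column, estimate the squared norm of its residual against the
    candidate column space `basis` from m sampled entries, then draw k columns
    with probability proportional to those estimates and observe them fully.
    """
    rng = rng or np.random.default_rng(0)
    est = np.zeros(n2)
    for j in range(n2):
        idx = rng.choice(n1, size=m, replace=False)
        obs = sample_column(j, idx)
        U = basis[idx, :]
        if U.shape[1] > 0:
            coef, *_ = np.linalg.lstsq(U, obs, rcond=None)
            resid = obs - U @ coef
        else:
            resid = obs
        est[j] = (n1 / m) * np.sum(resid ** 2)     # rescaled residual-energy estimate
    p = est / est.sum()                            # larger residuals are picked more often
    picked = rng.choice(n2, size=k, replace=False, p=p)
    full_cols = np.column_stack([sample_column(j, None) for j in picked])
    new_basis, _ = np.linalg.qr(np.column_stack([basis, full_cols]))
    return new_basis, picked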



Detailed comments
p. 2
"exact recover" -> "exact recovery"

"less well known" -> "less known"

p. 3
"One readily verifies that \langle x,y \rangle = vec(x)^Tvec(y) "
You didn't define \langle x,y \rangle for tensors, so there is nothing to verify. Are you trying to define \langle x,y \rangle here?


p. 5
Theorem 4: are you just "adapting" the proof strategy, or are you quoting the theorem from [6]?


page 6
The last paragraph before Section 5 is extremely hand-wavy and oversimplified.

Theorem 5 is a guarantee for the algorithm described at the beginning of Section 5.
Why is that algorithm not written in pseudocode like Algorithm 1?

p. 7
"set up" --> setup

p. 8
Figure 1: which curve is SVT and which is your algorithm?

It would have been nice to see a specialization of Algorithm 1 to matrices (which is the most interesting case).


The authors should be aware of recent work by Ailon, Chen, and Huan, the second part of which concerns adaptive sampling for pairwise clustering matrices (although the approach is completely different).
Q2: Please summarize your review in 1-2 sentences
Strength
--------
Simple matrix recovery with adaptive sampling, with logarithmically better bounds than other approaches in certain extreme cases. The paper also handles tensors, for which much less is known. A recovery algorithm is also given for the (Gaussian) noise case.

Weakness
--------
In other parameter regimes, the bounds of the algorithms here are worse. It would have been nice to compare the algorithms on real matrix data. Also, regarding the experiments for exact recovery, it is not mentioned what the row incoherence is; it would have been nice to see in practice how row incoherence affects the comparison between this algorithm and SVT.

Submitted by Assigned_Reviewer_8

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors develop algorithms for performing adaptive matrix (and tensor) completion. They also provide upper bounds for their method as well as lower bounds for the non-adaptive matrix completion problem. I think that both results are relevant. The tensor problem seems more interesting than the matrix version. The writing of the paper could be improved.

A comparison is made to work by Gittens, where the authors of this submission state that Gittens requires row and column incoherence. To be fair, that result is for symmetric matrices, so the statement is vacuous. Further, the sample complexity of his result is $r n \log r$, which is better than the $r^2$ dependency presented here.

The experiments do not fully explore the range of behavior of the presented method. One regime is when the rank $r$ increases and the mutual incoherence is bounded for both the row space and the column space. How would the presented method behave in that setting? The theoretical results state that there will be an $r^2$ dependency, while most results in matrix completion establish a linear dependency on $r$.

In the computational comparison the authors state that SVT takes 2 hours. What algorithm did the authors use? For example, MATLAB's built-in sparse SVD method can find the top $100$ left singular vectors of a 10k-by-10k matrix in five minutes on an old laptop.

I am not 100% sure the lower bounds are correct, but the other bounds seem to hold.

Minor comments:

#The probability statement in Theorem 2 should be restated with $\delta$ redefined as $\delta = \delta'/(r^T T)$.

#The usage of $T$ changes between the noisy and noiseless settings, and $T$ is also used for the transpose in the noisy section, which makes reading that section more difficult.

#In Theorem 4 there is a typo indicating that $T$ is the probability of
sampling error.
Q2: Please summarize your review in 1-2 sentences
The authors present an interesting algorithm for adaptive tensor completion and also present theoretical results to justify it. A more thorough experimental analysis should be conducted to verify the behavior for higher-order tensors.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their comments and suggestions and take this opportunity to address their most pressing concerns.

Reviewers 4, 7, and 8 all commented that more experimental results would strengthen the paper by exploring various parameter dependencies in practice. Given the page restrictions, we were obliged to present the experiments we found most relevant to our primary goal of verifying our theoretical results. We can add a deeper empirical study to the appendix of the paper, where we thoroughly explore our algorithms' performance over a variety of parameter regimes (including on tensors).

Reviewers 7 and 8 mention that for some parameter regimes our matrix completion bound is worse than existing results. Without any assumption on $\|UV^T\|_{\infty}$ this is actually not the case, as we discuss in the paragraph following Theorem 3. Moreover, such an assumption, while prevalent in the matrix completion literature, has never been established as necessary, and we wanted to demonstrate what is possible without placing such assumptions on our model.

Reviewer 6 expressed concerns about the originality of the results and the justification for adaptive sampling in tensor completion. Theoretically, the insight lies in how one can use a simple primitive, estimating inner products from missing data, to develop highly effective algorithms for complex tasks like matrix and tensor completion. To our knowledge, these insights have not been articulated in the machine learning community. With regard to adaptivity, while our upper bounds (as well as existing work [20]) are loose by a power of 2 in their dependence on the matrix/tensor rank, our bound is significantly better than all existing results for tensor completion in its dependence on the tensor size: we require sample complexity proportional to the sum of the tensor dimensions, whereas all other work (see the paragraph after Theorem 2) depends on the product of the tensor dimensions. Comparing with our lower bound (Theorem 4), we demonstrate an improvement by a logarithmic factor in the tensor dimensions over any algorithm that uses random sampling, which justifies investigating adaptive sampling schemes. Nevertheless, we do agree that there is a gap between our tensor completion bound and the number of parameters in the tensor, and closing this gap remains an open question, as we mention in the last section. It is also worth noting that tensor completion is just emerging as a problem of interest, and our community's understanding is still fairly limited. We believe our results can serve as a baseline for future work on tensor completion, in particular because we establish the correct dependence on the tensor dimensions (if not the tensor rank).

Minor comments:
Reviewer 4 commented that we did not experimentally study the gap between our algorithm's performance and the lower bound. We can add the lower bound to the plots in Figure 1, which would demonstrate this gap.

In the plots in Figure 1, the right subspace is generated by orthogonalizing a random Gaussian matrix.
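Concretely, this construction amounts to the following (the dimensions here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, r = 1000, 10                     # illustrative dimensions
G = rng.standard_normal((n, r))     # random Gaussian matrix
V, _ = np.linalg.qr(G)              # orthonormal basis for the right subspace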

In the computational comparison, we used the iterative singular value thresholding (SVT) algorithm of Cai, Candes, and Shen. The algorithm involves repeated computations of the SVD, which results in poor computational performance. All algorithms were implemented in Python using the NumPy and SciPy libraries.
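For reference, each SVT iteration has roughly the following form (a schematic sketch of the Cai, Candes, and Shen update with the step-size schedule and stopping rule omitted), which shows why a full SVD is computed at every pass:

import numpy as np

def svt_step(Y, M_obs, mask, tau, delta):
    """One schematic iteration of singular value thresholding."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # full SVD every iteration
    X = (U * np.maximum(s - tau, 0.0)) @ Vt            # shrink the singular values
    Y = Y + delta * mask * (M_obs - X)                 # step on the observed entries
    return X, Y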

Theorem 4 is not a restatement of the theorem from [6]; it is an adaptation to the tensor setting. Moreover, their proof has a minor flaw, which is corrected in ours.