
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes a subspace exploration algorithm for the problems of orthogonal NMF and nonnegative PCA (which are tightly related). According to the authors, this is the first algorithm that comes with performance guarantees, characterized by a tradeoff between approximation error and computational complexity. Although the complexity scales exponentially with some user-controlled parameters (the order of the low-rank approximation and the target rank), simulations indicate good performance in practice.
Quality: The paper is technically sound, with proofs of the main theorems, a discussion of their shortcomings (exponential complexity), and simulations establishing state-of-the-art results in practice.
Clarity: The paper is mostly clearly written. I think one clarification that is important but missing is the following: Algorithm 3 (LocalOptW) runs with complexity O(2^k*k*m). Naively, for Algorithm 1, Algorithm 3 has to be run for every choice of C. But the number of choices for C is also exponentially large (the authors should state explicitly how large it is). This would make the whole algorithm even more computationally demanding. However, the authors argue (footnote 2) that when the two algorithms are combined (as in the application shown in this paper), Alg 3 can be simplified into an O(k*m) procedure. While intuitive, the authors should make explicit how this is done, and perhaps rewrite Algs 1 and 3 to demonstrate it.
Originality: To the best of my knowledge the approach is original.
Significance: The paper establishes both theoretical bounds (for the first time) and state-of-the-art results through simulations. I'd like to see a comparison (or a discussion) of the various methods, also with respect to their runtimes. The complexity bound O((1/\epsilon)^(1/\epsilon)) can grow rapidly, so it would be nice to see that the gained performance does not come at a terrible computational cost.
Since ONMF and NNPCA are largely equivalent, why does the proposed method perform better than EM for NNPCA (table 1) but worse than EM for ONMF (table 2) for the same dataset MFEAT PIX? Is there something that I miss here?
Typo: last paragraph of the results section, Table 1 should be Table 2 (I think)
Q2: Please summarize your review in 1-2 sentences
The paper establishes for the first time performance bounds for the problems of ONMF and NNPCA through an algorithm that has exponential complexity. Some additional clarifications are needed before possible publication.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes an algorithm for solving nonnegative matrix factorization subject to orthogonality of the factors. The algorithm comes with a theoretical guarantee on the approximation and with bounds on the polynomial computation time. The motivation for imposing orthogonality on NMF (separation of components) is well emphasized, and the theoretical findings are illustrated on numerical experiments (synthetic and real data). The paper contains 3 algorithms: (A1) LowRankNNPCA, (A2) ONMFS, (A3) LocalOptW. (A2) uses (A1) to solve the problem in Eq. (1). (A1) can be used independently for solving the problem in Eq. (3). (A1) calls (A3) as a subroutine multiple times, after having denoised the input matrix using a rank-r approximation. (A3) performs a greedy search over all sign configurations (orthants) to optimize the sum-of-squares objective over the nonconvex domain W_k. The cost is exponential in k, but k is supposed to be small.
I overall enjoyed reading the paper and think its results are interesting.
There is a key element in A1 which is the weakness of this work: the search over the epsilon-net N_eps^k(S_2^r) can be prohibitively expensive. The authors admit that the difficulty of NMF is hidden here. The difficulty of this step also limits the reproducibility and extensions of the approach.
Minor remarks.
The notation T_{SVD} (computation time of the SVD) is introduced without definition. There is an ambiguity: I guess it means the complexity of computing a rank-(r+1) truncated SVD, not a full SVD; the former can be computed faster when r is small. This is a point in favor of the approach that could be made clearer there.
The notation for this N_.. also needs to be defined earlier in the text; the calligraphic N is usually reserved for the normal distribution.
Q2: Please summarize your review in 1-2 sentences
The paper studies NMF with orthogonality constraints. The algorithms come with theoretical guarantees and numerical experiments. Overall well-written.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes a new algorithm for NNPCA and ONMF, with some theoretical guarantees on the given approximation. It is the first time that such guarantees are provided in this context. The theoretical results are nice.
The first part of the paper is very well written, up to Sec. 2.2. The problem is nicely presented, as well as the state of the art. The beginning of Sec. 2.2 is difficult to read. The authors tried to give a "simple" overview of their algorithm in order to present the theoretical results, but this overview is not very clear: some important notations appear only 3 pages later (I think in particular of the \mathcal{N}_{\epsilon/2} notation).
The theoretical results are clear, but at this stage we only understand that the method relies on a first SVD approximation of the data matrix M; the main loop of Alg. 1 remains unintuitive.
Sec. 3 goes back to the algorithm. Unfortunately, the algorithm remains very difficult to understand, particularly without the supplementary material. I did not understand the construction of the epsilon-net, which seems very important for obtaining an efficient algorithm.
The experimental section is clear, but suffers from important drawbacks. The only results shown are the explained variance on various datasets, compared to various approaches. As ONMF seems adequate for classification, some results on classification performance would be very welcome! Moreover, a computational-time comparison would also be nice.
Finally, I think it would be interesting to mention "Projective NMF" [a], which aims to give a "simple" approximation of ONMF (without guaranteeing the orthogonality of W) but which is very easy to optimize.
[a] Yuan, Z. and Oja, E., Projective nonnegative matrix factorization for image compression and feature extraction. In: Image Analysis, Springer, Berlin, Germany, pp. 333-342, 2005.
Q2: Please summarize your review in 1-2 sentences
An interesting paper with a new algorithm for NNPCA and ONMF, but sometimes very difficult to read, and with some gaps in the experimental section.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 5000 characters. Note
however, that reviewers and area chairs are busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We are grateful that the reviewers enjoyed reading
our paper and appreciate their positive feedback. We now address the
remarks of Reviewers 1, 2 and 3.
REVIEWER 1
* [...]
Algorithm 3 LocalOptW runs with complexity O(2^k*k*m). [...] algorithm 3
has to be run for every choice of C. But the number of choices for C is
also exponentially large [...] the authors argue [...] when the two
algorithms are combined [...] Alg 3 can be simplified into a O(k*m)
procedure. While intuitive, the authors should make explicitly clear how
this is done [...].
Indeed, Alg 3 is called as a subroutine by Alg
1 for each choice of C. That is explicitly reflected in the complexity of
Alg 1 (Thm 1); the latter captures the size of the net. (Similarly for Alg
2 in Thm 2). The footnote points out that an additional factor of 2^k in
the complexity of Alg 3 can be saved when Alg 3 is used as a subroutine of
Alg 1 instead of as a standalone procedure. That is a minor remark, since
it does not affect the asymptotic complexity expression for Alg 1. We will
make this clearer.
* I'd like to see a comparison [...] of the
various different methods, also with respect to the runtime [...]. The
complexity bound [...] can grow rapidly, so it'd be nice to see that the
gained performance does not come at a terrible computational cost.
Indeed, the complexity scales unfavorably in epsilon and is
prohibitive from a practical point of view. In theory, it is interesting
because it yields a novel guarantee for the ONMF problem. In practice, for
small epsilon our algorithm is slower than the other methods. In our
experiments, we terminate our algorithm after a few samples have been
examined, and still get competitive results. All methods run for a few
minutes. Also, note that our algorithm allows for a certain degree of
parallelization, not exploited here.
* Since ONMF and NNPCA are
largely equivalent, why does the proposed method perform better than EM
for NNPCA [...] but worse than EM for ONMF [...] for the same dataset
[...]? Is there something that I miss here?
This point is indeed
confusing. These are two different algorithms. The EM algorithm for NNPCA
is from [26], while the one for ONMF is from [11]. Both rely on an
EM-based approach. Given that our algorithm outperformed the other
approaches in the NNPCA objective, we did not use them for the ONMF
experiments.
REVIEWER 2
* There is a key element in A1
which I wish to understand better. The search over the epsilon-net
N_eps^k(S_2^r) can be prohibitively expensive. The notation for this N_..
also needs to be defined earlier in the text.
Indeed, the size of
the epsilon-net is the computational bottleneck of our approach.
Determining whether it can be improved or showing it cannot is an open
problem. The notation for the epsilon-net is admittedly not ideal! We
will move the definition earlier in the text.
* The notation
T_{SVD} (computation time of SVD) is introduced without definition. There
is an ambiguity: I guess it means the complexity of computing a rank-(r+1)
truncated SVD, [...]. This is a point in favor of the approach that could
be made clearer there.
That is correct and will be addressed in the final
manuscript.
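(As an aside illustrating why this distinction matters in practice: when r is small, the leading r+1 singular triplets can be obtained far more cheaply than a full decomposition, for instance via randomized sketching. The snippet below is a generic illustrative sketch of that idea, not code from the paper; the function name and the oversampling parameter are our own assumptions.)

```python
import numpy as np

def truncated_svd(M, r, oversample=10, seed=0):
    # Randomized rank-r truncated SVD: two passes over M plus an SVD of a
    # small (r + oversample) x n matrix, instead of a full SVD of M.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((M.shape[1], r + oversample))
    Q, _ = np.linalg.qr(M @ G)   # orthonormal basis for the sketched range
    Uh, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ Uh)[:, :r], s[:r], Vt[:r]
```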
REVIEWER 3
* The beginning of Sec. 2.2 is
difficult to read. The authors tried to give a "simple" overview of their
algorithm to present the theoretical results, but this overview is not
very clear: some important notations appear only 3 pages later (I think
in particular on the \mathcal{N}_{\epsilon/2}...). The theoretical
results are clear, but [...] the main loop of Alg. 1 remains
unintuitive.
The reviewer is right. We felt that a more "intuitive"
description of Alg 1 would also require a more technical description.
Given the available space, we decided to postpone the deeper technical
description of the main algorithm until Sec 3 and present the main
theoretical results first. We will at least ensure that all notation is
properly introduced and try to improve that section.
* I did not
understand the construction of the epsilon-net, which seems very important
for obtaining an efficient algorithm.
To obtain our results, it
suffices to consider a simple net obtained by sampling r-dimensional points
i.i.d. from a Gaussian distribution and normalizing their length. The size
of the net is indeed the computational bottleneck of our approach.
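As a purely illustrative aside (the function and variable names below are our own, not from the paper), the sampling step described above can be sketched in a few lines:

```python
import numpy as np

def gaussian_net(r, n_points, seed=0):
    # Candidate net: draw r-dimensional Gaussian vectors i.i.d. and
    # normalize each to unit length, i.e. project onto the sphere S_2^r.
    rng = np.random.default_rng(seed)
    pts = rng.standard_normal((n_points, r))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

# A denser net (more points) gives a finer covering radius epsilon, at a
# cost that grows with the number of points examined.
net = gaussian_net(r=4, n_points=2000)
```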
* The
experimental section is clear, but suffers from important drawbacks. [...]
As ONMF seems adequate for classification, some results on
classification performance would be very welcome! Moreover, some
computational time comparison would also be nice.
It would indeed
be interesting to include such results in the supplemental material. Most
importantly, we should have included a discussion on the time comparison
among various methods. We will try to address that in the final
manuscript. 
