Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
Summary: Analytic shrinkage means using a known
formula for the optimal strength of regularization for an estimated
covariance matrix. This paper shows that the known formula is valid under
weaker conditions than in the original proof. An algorithm for better
practical shrinkage is provided also.
Quality: High.
Clarity: Good.
Originality: Good, assuming that the
central result really is new.
Significance: Mathematically
significant. Of limited practical importance, because the recommended
regularization strength remains the same, and experiments show limited
benefit for the new AOC method.
DETAILED COMMENTS
19:
"implies" should be "requires" as on line 45.
90: Intuitively,
from where do the very high order moments arise? 8 here and 16 later.
Should we take these seriously in practice?
291: This iterated
process is the main practical contribution. Please provide more detail,
maybe pseudocode. Figure 3 can be smaller to make space.
302:
"percentual" is not a word and in any case should be "fraction."
373: What factor model is used here? Explain. Is it significant
that AOC does better than the factor model?
386 to 388: Discuss
why QDA accuracy is so variable.
396: Discuss why LDA (shrinkage)
is worse than plain LDA. Q2: Please summarize your review
in 1-2 sentences
Good theoretical paper. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors focus on the problem of the estimation of
covariance matrices using empirical estimators. To avoid overfitting in
high-dimensional cases they shrink the covariance matrix towards a
multiple of the identity matrix. The optimal amount of shrinkage \lambda
can be estimated empirically by minimizing the expected error of the
shrinkage estimate. Then they show that existing results about the
consistency of the estimate of \lambda depend on an assumption which does
not hold in practice for most real-world datasets. The authors then
introduce a new wider set of assumptions and proof that they guarantee
consistency. Also, they show that in certain cases there will be no
asymptotic shrinkage of the empirical estimate of the covariance matrix.
As a solution that they propose is then to perform the shrinkage in the
orthogonal complement to the eigen vector with largest eigen value. Since
a dataset may have multiple large eigenvalues they iterate the procedure
and automatically select the number of retained eigen directions. A series
of experiments with synthetic and real-world data validate the proposed
approach.
Quality:
The paper seems to be sound, with the
experiments validating the theoretical results. I did not verified the
proofs of the main theorems since these are in the supplementary material,
which I am not required to examine. The experimental evaluation seems to
be thorough.
Clarity:
The paper is clearly written and
well organized. However, the main document does not describe how the
orthogonal complement shrinkage is actually performed and the motivation
for this approach is also not strong enough in the paper. The authors
should clarify how this technique is implemented in practice.
Originality:
I find interesting and worth on its own the
contribution of the authors to show that the existing shrinkage techniques
will not produce enough shrinkage in some cases. The proposed orthogonal
complement shrinkage is novel up to my knowledge. However, it is a bit
incremental since it represents only a modification of already existing
approaches.
Significance:
In my opinion, the most
significant contribution is to point out the failure of the shrinkage
method in several relevant cases, as well as the theoretical contributions
contained in the paper. The orthogonal complement shrinkage seems to be
less significant than these other elements. Q2: Please
summarize your review in 1-2 sentences
A technically sound and well written paper that
presents some interesting theoretical results and proposes an incremental
contribution to improve existing techniques. Submitted by
Assigned_Reviewer_7
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper makes several contributions in the area of
analytic shrinkage for regularized covariance matrix estimation.
First the authors show that an assumption on the covariance
structure (bound on the 8th moments in the eigenbasis) that is used to
prove the consistency of analytic shrinkage does not hold in several real
datasets from different fields by deriving two checkable consequences of
this result: (i) the largest eigenvalue grows with the fourth root
(ii) the eigenvalue dispersion is bounded
The authors then
provide weaker assumptions that do not place any constraints on the
covariance structure and show that analytic shrinkage is still consistent
under these weaker assumptions. They also provide an expected result
showing a problem with the limit behavior of analytic shrinkage that
probably applies to many real world datasets.
Next the authors
provide an alternative shrinkage based procedure for regularized
covariance matrix estimation they call oc-shrinkage. In this method
shrinkage is only applied to the orthogonal complement of the first (or
first few) eigenvectors. They show this procedure still results in a
consistent estimator and empirically compare it to analytic shrinkage in
several different domains. Oc-shrinkage outperforms analytic shrinkage in
each case.
This paper contains novel and theoretically sensible
ideas for regularized covariance matrix estimation which have very clear
important real world applications. The results appear to be correct and
rigorous (I was not able to check all of the proofs) and validated by the
empirical results.
The paper is well written and clear for the
most part. I think section 5 could be improved to provide more details.
While the basic idea of oc-shrinkage is clear, I'm not sure how all the
details in the procedure work out in practice. Since this is probably the
most important contribution (at least most important deliverable) of the
paper, a step by step procedure for implementing oc-shrinkage in practice
would be useful. Q2: Please summarize your review in 1-2
sentences
The authors provide new results which generalize
analytic shrinkage for regularized covariance matrix estimation to
arbitrary covariance structures and provide a novel alternative procedure
that appears to outperform the standard analytic shrinkage procedure while
retaining consistency. The ideas presented are sensible and rigorously
justified and the novel alternative procedure could lead to significant
improvements in covariance estimation in many real world
applications.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
First of all we would like to thank the reviewers for
their helpful and encouraging comments to our manuscript.
We see
as a recurring comment raised by the reviewers, that our novel shrinkage
proceedure should be described in a bit more detail so that it can be more
easily implemented in practice. We think that this is a really helpful
point: in the revision we will include a more detailed description in the
main text and add pseudo-code.
For the information of the
reviewers, we provide the pseudocode below.
The ingredients of
aoc-shrinkage are - calc_lambda(X): returns the standard shrinkage
strength - calc_Delta_R(X,lambda_1,nu_1,lambda_2,nu_2): returns an
estimate of EMSE (cmp. eq. 3) improvement of lambda_1, nu_1 over
lambda_2,nu_2 (( C_shr = (1-lambda)*S + lambda*nu*eye(p) )) -
calc_sigma(X,lambda_1,nu_1,lambda_2,nu_2): returns an estimate of the
std. dev. of Delta_R We agree that the formulas for Delta_R and sigma
have to be put explicitly in the text/supplemental.
Let S be the
covariance matrix, V the matrix of eigenvectors sorted with descending
eigenvalues and l the vector of eigenvalues: 1. r* = 0, r = 1 2.
lambda_old = calc_lambda(X) 3. nu_old = mean(l) 4. while true
______# shrinkage on the orthogonal complement ______# of r
directions: 5. ____X_oc = V(:,r+1:p)'*X 6. ____lambda_new =
calc_lambda( X_oc ) 7. ____nu_new = mean( l(r+1:p) ) ______#
calculate estimate and std. dev. of ______#exp. mean squared error
difference: 8. ____Delta_R = calc_Delta_R(X_oc,
________lambda_new,nu_new,lambda_old,nu_old) 9. ____sigma_R =
calc_sigma(X_oc, ________lambda_new,nu_new,lambda_old,nu_old) ______#
if there is an improvement, ______#accept r and continue. else stop.
10. ___if Delta_R - m * sigma_R > 0 11. ________r* = r* + 1
12. ________r = r + 1 13. ________lambda_old = lambda_new 14.
________nu_old = nu_new 15. ___else 16. ________break 17.
___end 18. end 19. P_pca = V(:,1:r*)*(V(:,1:r*)' 20. P_oc =
V(:,r*+1:p)*V(:,r*+1:p)' 21. C_shr = (1-lambda_old)*S +
lambda_old*eye(p)*nu_old 22. return P_pca*S*P_pca' + P_oc*C_shr*P_oc'
|