
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper studies the bandit convex optimization (BCO) problem in the adversarial setting. Loss functions are assumed to be Lipschitz and smooth. The state-of-the-art result for this setting is the algorithm of Saha and Tewari, with a regret bound of O(T^{2/3}). The main contribution of the current paper is an algorithm that enjoys an improved bound of O(T^{5/8}). Similar to the earlier work, the algorithm is based on a reduction to online convex optimization. To achieve the improved bound, the sequence of gradient estimates is weighted non-uniformly, such that the influence of the most recent estimates is decreased.
I enjoyed reading this paper. It addresses an important question and the proposed algorithm is simple and intuitive. The paper is very well-written and provides an excellent overview of existing techniques and the main ideas.
Q2: Please summarize your review in 1-2 sentences
I enjoyed reading this paper. It addresses an important question and the proposed algorithm is simple and intuitive. The paper is very well-written and provides an excellent overview of existing techniques and the main ideas.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
As I have stated in the review summary, the significance of the paper, at least in my opinion, cannot be questioned. It is a simple question of whether the proofs are correct. I will highlight a couple of points which I believe are mistakes. Other than that, I have a few minor questions regarding the presentation of the paper (which are not important in terms of the accept/reject decision, but if the paper gets in, will be important in terms of clarity).
Lines 322-323: The authors state that by the definitions of \bar{g}_t and the \alpha variables, it is true that \sum_{s=1}^t \alpha_{s,t} \hat{g}_s = \sum_{s=1}^t \bar{g}_s. In fact, this is an important claim, because then the dual averaging step reduces to an operation with the gradient estimates \bar{g}_1, ..., \bar{g}_T.
I simply cannot see how this is true. I feel this is basic algebra. First of all, even for the equality to hold, the right-hand side must be \eta \sum_{s=1}^t \bar{g}_s (I can understand the missing \eta is a typo). However, even with that, I don't think the equality holds. From the definition, it can be seen that \eta \sum_{s=1}^t \bar{g}_s = \sum_{s=1}^t \eta \frac{t-s+1}{k+1} \hat{g}_s. Thus, for each \hat{g}_s, the coefficient associated with it is \eta \frac{t-s+1}{k+1}. However, for all s, the coefficient is not equal to \alpha_{s,t}. By definition, \alpha_{s,t} = \eta \frac{t-s+1}{k+1} only for s > t-k. So, how does the equality hold when s \le t-k?
Lemma 11: (8c) \le 0. When I looked at the proof of this statement, I once again felt there was a mistake. It is stated in the proof that the sum of the \bar{f}_t(x^*) terms is equal to the sum of the \hat{f}_t(x^*) terms. I don't see how that is true. This can be simply tested by taking T=2 and k=1 (allowed by the condition of Thm. 9). By the definitions, for T=2 and k=1, \sum_{t=1}^T \bar{f}_t(x^*) = \hat{f}_1(x^*) + (1/2) \hat{f}_2(x^*), which is obviously not equal to \hat{f}_1(x^*) + \hat{f}_2(x^*).
I did not even look through the rest of the proofs. I find the two mistakes very surprising since they seem to be basic algebra. It is of course possible that the authors made some typos which have completely changed some important definitions. Otherwise, I will be happy to be corrected.
In terms of presentation, I have a few small questions:
1. In the abstract, in line 026, it should be stated that the regret guarantee is for smooth convex functions, not all convex functions. For the general Lipschitz BCO problem, the lower bound might still be \Omega(T^{2/3}) (or even higher, since the best known result is Flaxman's O(T^{3/4})).
2. I am not sure what Thm. 5 means (or rather, the definition of K_{y,\epsilon}). In the definition of K_{y,\epsilon}, there is an \alpha term present and no \epsilon term. Is \alpha = \epsilon? It seems to be a typo.
3. Thm. 9: In the regret bound, the term \epsilon appears. Nowhere in the algorithm or in the statement of Thm. 9 do we have \epsilon. I understand that \epsilon probably appears while using an upper bound on the self-concordant R(), but this is quite poor in terms of presentation. Please modify the statement of Thm. 9 to explicitly mention \epsilon. Or don't mention \epsilon at all and use T (like in Saha and Tewari).
4. In Saha and Tewari's paper, the authors don't analyze regret directly in terms of x^*. They instead consider a point \bar{x} which is 1/\sqrt{T} away from the boundary of K. This allows R(\bar{x}) \le 2\nu \log T. How do you use x^* directly?
5. Please provide the definition of the Minkowski functional in the supplement to make the paper self-contained. Specifically, what does the Minkowski functional associated with a Dikin ellipsoid look like?
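For reference, the Minkowski functional (gauge) of a convex body K with respect to a center x is \pi_x(y) = inf{t > 0 : x + (y - x)/t \in K}; for an ellipsoid {y : (y-x)^T H (y-x) \le 1}, such as the Dikin ellipsoid with H = \nabla^2 R(x), the gauge is simply the local norm \|y - x\|_x. A minimal numeric sketch (the log-barrier Hessian and the specific points below are illustrative, not taken from the paper):

```python
import numpy as np

def minkowski_gauge_ellipsoid(y, x, H):
    """Minkowski functional (gauge) of the ellipsoid
    {z : (z - x)^T H (z - x) <= 1}, evaluated at y with center x.
    For a Dikin ellipsoid, H is the Hessian of the self-concordant
    barrier R at x, so the gauge is the local norm ||y - x||_x.
    """
    d = y - x
    return float(np.sqrt(d @ H @ d))

# Illustration: log-barrier R(x) = -sum(log(x_i)) on the positive orthant,
# whose Hessian at x is diag(1 / x_i^2).
x = np.array([0.5, 2.0])
H = np.diag(1.0 / x**2)
print(minkowski_gauge_ellipsoid(np.array([0.75, 2.0]), x, H))  # 0.5: halfway to the boundary
print(minkowski_gauge_ellipsoid(np.array([1.0, 2.0]), x, H))   # 1.0: exactly on the boundary
```

Points with gauge strictly less than 1 lie in the interior of the ellipsoid, which is the condition the review mentions for line 188.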
Update: It appears that the proofs are correct. I accept the paper.
Q2: Please summarize your review in 12 sentences
The paper is straightforward in terms of the accept/reject decision. It is a purely technical paper, promising a better regret bound for the well-studied BCO problem with smooth functions. So, if the proofs are correct, the paper should be interesting for people interested in the BCO problem. I have found what I believe are important mistakes in the paper, and that is why I am rejecting it. I will definitely accept the paper if it is actually my mistake rather than mistakes in the paper.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper improves on the state-of-the-art upper bound on the regret of adversarial smooth function minimization in the bandit setting. In particular, the authors show that by averaging past gradients in some fixed-size window, one can lower the variance of the gradient estimates generated by the algorithm of Saha & Tewari. Thus, the regret bound is improved from O(T^{2/3}) to O(T^{5/8}).
This result is interesting since it refutes the conjecture that the minimax regret rate in this setting is O(T^{2/3}), and strengthens the possibility that the lower bound of Omega(T^{1/2}) is tight.
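The variance-reduction effect behind the result can be illustrated with a small simulation: at a fixed point, i.i.d. one-point spherical gradient estimates averaged over disjoint windows of size k+1 have their total variance cut by roughly a factor of k+1. This is only a caricature (the test function, the parameters, and the i.i.d. simplification are assumptions; in the paper's sequential setting the bias from the moving iterate must also be controlled):

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta, k = 5, 0.1, 7           # dimension, smoothing radius, window size k

def f(x):
    # A smooth convex test function (illustrative choice).
    return 0.5 * np.dot(x, x)

def one_point_estimate(x):
    """One-point spherical gradient estimate (d/delta) * f(x + delta*v) * v."""
    v = rng.normal(size=d)
    v /= np.linalg.norm(v)        # v uniform on the unit sphere
    return (d / delta) * f(x + delta * v) * v

x = np.ones(d)
n = 20000
single = np.array([one_point_estimate(x) for _ in range(n)])

# Average each disjoint window of k+1 estimates, mimicking \bar{g}_t.
windowed = single[: (n // (k + 1)) * (k + 1)].reshape(-1, k + 1, d).mean(axis=1)

var_single = single.var(axis=0).sum()
var_window = windowed.var(axis=0).sum()
print(var_single / var_window)    # roughly k + 1 = 8
```

The single-estimate variance scales like d^2/delta^2, which is what forces the large exploration radius in prior analyses; averaging trades a factor of k+1 in variance against the extra bias of stale gradients.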
The writing is very good.
The results are novel and well-presented. Although the improvement over past results is small, its impact is significant. Thus, in my opinion, this paper should be accepted.
Additional comments
===================
- Line 148: v should be sampled from the unit ball and not from the sphere. v is sampled from the unit sphere for the calculation of the gradient, but not for the perturbed function value (Stokes' theorem).
- Line 188: \|z\|_x must be strictly less than 1.
- Line 322: Missing \eta in the RHS of the equation.
Q2: Please summarize your review in 1-2 sentences
The paper improves on the state-of-the-art upper bound on the regret of adversarial smooth function minimization in the bandit setting. An excellent paper that should definitely be accepted.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their helpful comments.
We address specific points raised in the reviews:
Reviewer 2:
===========
* Possible mistakes in the equation on lines 322-323, and in the proof of Lemma 11.
We rechecked our derivation and verified its correctness. As you pointed out, we are missing \eta in lines 322-323, but otherwise everything seems to be correct; see details below. In the final version, we will elaborate on why the equation is correct; the computation might be confusing indeed.
Before we address the concrete concern, we point out that even if the equation were wrong, our main result would still be intact. This equation simply establishes that dual averaging with the alpha-weighted \hat{g}_t vectors is equivalent to dual averaging with uniformly weighted \bar{g}_t vectors. Namely, it merely gives us a way to unify the pseudocodes of our algorithm and the previous algorithms. The algorithm that we analyze is (unweighted) dual averaging with the \bar{g}_t vectors, and the definition of \alpha does not play a critical role in our analysis.
* "From the definition, it can be seen that \eta \sum_{s=1}^t \bar{g}_s = \sum_{s=1}^t \eta \frac{t-s+1}{k+1} \hat{g}_s."
This is incorrect: notice that in the sum \sum_{s=1}^t \bar{g}_s, each \frac{1}{k+1} \hat{g}_s term is summed up exactly k+1 times, except for the last k terms (corresponding to s = t-k+1, ..., t-1, t). Thus, the coefficient of each \hat{g}_s term is precisely \eta, except for the last k terms, whose coefficients are \eta \frac{t-s+1}{k+1}. This is exactly the definition of the \alpha_{s,t} coefficients in Eq. 5.
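This coefficient bookkeeping can be checked numerically. The sketch below reconstructs \bar{g}_t and \alpha_{s,t} from the definitions as described here (the dimension, horizon, and values are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
k, t, eta, d = 3, 10, 0.05, 4
g_hat = rng.normal(size=(t + 1, d))    # rows 1..t hold \hat{g}_1..\hat{g}_t (row 0 unused)

# \bar{g}_tau = (1/(k+1)) * sum_{s=max(1, tau-k)}^{tau} \hat{g}_s
g_bar = np.array([g_hat[max(1, tau - k): tau + 1].sum(axis=0) / (k + 1)
                  for tau in range(1, t + 1)])

# \alpha_{s,t} = eta*(t-s+1)/(k+1) for s > t-k, and eta otherwise
alpha = np.array([eta * (t - s + 1) / (k + 1) if s > t - k else eta
                  for s in range(1, t + 1)])

lhs = (alpha[:, None] * g_hat[1:]).sum(axis=0)   # \sum_s \alpha_{s,t} \hat{g}_s
rhs = eta * g_bar.sum(axis=0)                    # \eta \sum_s \bar{g}_s
print(np.allclose(lhs, rhs))                     # True
```

Each \hat{g}_s with s \le t-k contributes to exactly k+1 of the \bar{g}_tau averages, so its coefficient on the right-hand side telescopes to \eta, matching \alpha_{s,t}.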
* "It is stated in the proof that the sum of the \bar{f}_t(x^*) terms is equal to the sum of the \hat{f}_t(x^*) terms."
In fact, in the proof of Lemma 11 (see lines 731-734) it is stated that the first sum can only be smaller than the second, and is not necessarily equal to it. This is indeed the case in your test example with T=2 and k=1.
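For nonnegative function values, the inequality (and the reviewer's T=2, k=1 computation) can be verified directly. The definition of \bar{f}_t below is reconstructed from the reviewer's example, and the nonnegativity assumption is ours, made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 2, 1
f_hat = rng.uniform(0.0, 1.0, size=T + 1)    # nonnegative \hat{f}_t(x^*); index 0 unused

# \bar{f}_t(x^*) = (1/(k+1)) * sum_{s=max(1, t-k)}^{t} \hat{f}_s(x^*)
f_bar = [f_hat[max(1, t - k): t + 1].sum() / (k + 1) for t in range(1, T + 1)]

# The averaged sum is only smaller, never equal in general:
print(sum(f_bar) <= f_hat[1:].sum())                           # True
# And it matches the reviewer's closed form for T=2, k=1:
print(np.isclose(sum(f_bar), f_hat[1] + 0.5 * f_hat[2]))       # True
```

In general, \sum_t \bar{f}_t(x^*) = \sum_s \hat{f}_s(x^*) \cdot \min(T-s+1, k+1)/(k+1), so the last k terms are down-weighted, which is exactly the slack used in bounding (8c).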
* Other comments: thank you for spotting these typos/inconsistencies; they will all be fixed in the final version.
Reviewer 5:
===========
* "The paper marginally improves a known regret bound"; "techniques do not seem convincing for fully closing the gap between known lower and upper bounds".
The BCO problem is a difficult and fundamental open problem in online learning. Progress on such problems is often achieved one step at a time. The previous papers on this topic were also incremental steps, but we are glad that they were published, because we were able to build our result on top of them. Similarly, we hope that others will build on our result and take additional steps towards the resolution of the open problem. We think we did more than just marginally improve the bound: we proposed a new algorithmic ingredient and presented its non-trivial analysis. We agree that closing the gap will probably require heavier machinery, but at the same time we feel that our results are a non-trivial step that the NIPS audience will be interested to hear about.

