
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Good paper on an interesting and up-to-date topic, albeit not so easy to read.
Q2: Please summarize your review in 1-2 sentences
The authors address the dueling bandits problem without assuming the existence of a Condorcet winner. They propose new algorithms and derive regret bounds.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper tackles the dueling bandits problem, focusing on scenarios where a Condorcet winner does not exist. In particular, it focuses on finding the Copeland winner (i.e., the arm that beats the maximal number of other arms) and proposes two algorithms for this task. The paper provides theoretical bounds for these algorithms which indicate better asymptotic regret performance than previous approaches for identifying the Copeland winner.
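For concreteness, the Copeland winner mentioned in the summary is easy to state computationally. The sketch below is an illustrative reconstruction, not the paper's code: it takes a hypothetical preference matrix `P` where `P[i, j] > 0.5` means arm `i` beats arm `j` in expectation, and returns the arms that beat the largest number of others.

```python
import numpy as np

def copeland_winners(P):
    """Return the Copeland winners of a preference matrix P.

    P[i, j] is the probability that arm i beats arm j in a duel;
    arm i "beats" arm j if P[i, j] > 0.5. A Copeland winner is an
    arm that beats the largest number of other arms.
    """
    K = P.shape[0]
    # Copeland score: number of other arms each arm beats
    # (the 0.5 diagonal is excluded by the strict inequality).
    scores = (P > 0.5).sum(axis=1)
    best = scores.max()
    return [i for i in range(K) if scores[i] == best]

# Toy 4-arm example (illustrative numbers) with NO Condorcet winner:
# arm 0 loses to arm 3, yet arms 0 and 1 each beat two others.
P = np.array([
    [0.5, 0.6, 0.7, 0.4],
    [0.4, 0.5, 0.6, 0.6],
    [0.3, 0.4, 0.5, 0.6],
    [0.6, 0.4, 0.4, 0.5],
])
print(copeland_winners(P))  # -> [0, 1]
```

Note that in this example no single arm beats all others, so Condorcet-based methods have no well-defined target, while the Copeland winners `{0, 1}` still exist.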
Overall, the paper (together with the supplementary material) is quite solid in terms of material. However, I did have some significant concerns.
I wanted to start my critique with the informativeness and "quality" of the actual 9-page material. I recognize that this is a matter of principle to which the authors may object, but in my honest opinion I would like to reward authors and NIPS submissions that took the extra effort of making a paper self-contained and informative within the 9-page limit, even though that may have meant leaving out certain interesting results/analyses/experiments. At the end of the day, what gets published is the 9-pager, which I believe should be the determining factor of a paper's quality. After all, the NIPS guidelines clearly state that I reserve the right to do so:
"Note that the reviewers and the program committee reserve the right to judge the paper solely on the basis of the 9 pages of the paper; looking at any extra material is up to the discretion of the reviewers and is not required."
With this in mind, I actually find the overall quality and clarity of the 9-pager itself fairly lacking. Most of the interesting analysis and all of the other empirical evidence are hidden away in the 20+ page appendix. In fact, the overall analysis of the algorithms is fairly hard to follow, as the 9-pager presents it in a rather hard-to-understand manner. The paper constantly makes forward references to the supplementary material to complete the story, even though its purpose is to "supplement" the main content, not serve as a continuation of it.
Some of the things I would have liked to see in the actual 9-pager itself include:
a) Significance: Why is this work significant? Lately the work on dueling bandits has become increasingly incremental, with different approaches improving marginally over previous ones on different metrics. So why is the Copeland winner what we should really be interested in, as compared to the Borda winner, the von Neumann winner, or some yet-to-be-published winner? At the end of the day, what a practitioner cares about is the method that works best in practice, not the specific metric that was used to get to that method. Toward this, I don't find this paper providing any compelling evidence that the Copeland winner is the right one to aim for.
b) (Quality+Significance) Related to the above: why would Condorcet methods not work well in cases such as this (even though the Condorcet winner may not exist)? The authors mention that these methods may fail for these problems, but the 9-pager itself does not contain any evidence (theoretical or empirical) to that effect.
c) (Quality+Clarity) Providing empirical evidence in the main material (even a small study whose setup can be detailed in the appendix) could convince a practitioner to switch to such a method. A set of involved, hard-to-gauge theoretical results alone does not constitute compelling enough evidence in my opinion.
Again, while I understand that the authors have addressed some of these criticisms in the supplementary material, I am basing this review on the 9-page material to be fair to all the other papers I review, which relied on me reading just the 9 pages and not a 32-pager.
Q2: Please summarize your review in 1-2 sentences
This paper proposes theoretically sound solutions to a well-established problem (dueling bandits). While the methods and accompanying analysis (in the appendix) are sound, what concerns me here is that the actual 9-page content of the paper is not informative enough and instead relies on a 20+ page appendix to make the paper self-contained.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper considers the dueling bandit problem where the Condorcet winner does not exist and the Copeland winners are instead considered to be the optimal arms. The authors come up with two algorithms, called CCB and SCB, whose regret bounds are of order K^2 + K log K and K log K log T respectively, which is a striking result in the case of Copeland ranking.
The CCB algorithm is based on the principle of ``optimism followed by pessimism''. In more detail, it first selects an arm to be compared based on an optimistic estimate of the preference matrix; that is, it selects one of the possible Copeland winners. The other arm to be compared is chosen based on a pessimistic estimate of the preference matrix; that is, it selects an arm that is likely to reject the hypothesis that the first arm already chosen is indeed a Copeland winner.
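The ``optimism followed by pessimism'' selection described above can be sketched with standard upper/lower confidence bounds on pairwise win rates. This is an illustrative sketch under assumed names (`confidence_bounds`, `select_pair`, the `alpha` exploration parameter), not the paper's exact CCB pseudocode:

```python
import numpy as np

def confidence_bounds(wins, t, alpha=0.5):
    """Optimistic (U) and pessimistic (L) estimates of the preference
    matrix from duel counts. wins[i, j] = duels between arms i and j
    won by i. Illustrative sketch, not the paper's exact CCB code."""
    n = wins + wins.T                       # duels played per pair
    p_hat = np.where(n > 0, wins / np.maximum(n, 1), 0.5)
    # Untried pairs get full uncertainty (radius 1).
    radius = np.where(n > 0, np.sqrt(alpha * np.log(t) / np.maximum(n, 1)), 1.0)
    U = np.clip(p_hat + radius, 0.0, 1.0)
    L = np.clip(p_hat - radius, 0.0, 1.0)
    np.fill_diagonal(U, 0.5)
    np.fill_diagonal(L, 0.5)
    return U, L

def select_pair(wins, t):
    """Optimism followed by pessimism: pick the first arm as a potential
    Copeland winner under the optimistic estimate U, then pick as its
    opponent the arm against which its pessimistic win probability L is
    lowest, i.e. the opponent most likely to refute the hypothesis."""
    K = wins.shape[0]
    U, L = confidence_bounds(wins, t)
    opt_scores = (U > 0.5).sum(axis=1)      # optimistic Copeland scores
    c = int(np.argmax(opt_scores))          # candidate Copeland winner
    d = int(np.argmin(np.where(np.arange(K) != c, L[c], np.inf)))
    return c, d

# Toy example: 3 arms, 10 duels per pair (illustrative counts).
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
print(select_pair(wins, t=30))  # -> (0, 1)
```

In the toy example all three arms still have optimistic score 2, so arm 0 is kept as the candidate, and arm 1 is chosen as the opponent because the candidate's pessimistic win rate against it is lowest.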
The SCB algorithm is an explore-then-exploit algorithm. The core of the proposed method is a PAC algorithm which identifies a Copeland winner with high probability. This PAC algorithm is called repeatedly with an exponentially increasing budget.
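The explore-then-exploit structure with a doubling budget, as summarized above, can be sketched as a control-flow skeleton. The callbacks `pac_find_winner` and `duel` are hypothetical stand-ins for the paper's PAC subroutine and the comparison oracle, not its actual API:

```python
def scb_style_loop(pac_find_winner, duel, horizon):
    """Explore-then-exploit skeleton: a PAC Copeland-winner identifier
    is called repeatedly with an exponentially growing budget, and the
    returned candidate is exploited between calls. Illustrative sketch
    only; the callbacks are assumed interfaces, not the paper's API."""
    t, budget, candidate = 0, 1, None
    while t < horizon:
        # Exploration: spend `budget` comparisons identifying a likely
        # Copeland winner (the PAC subroutine reports duels used).
        candidate, used = pac_find_winner(budget)
        t += used
        # Exploitation: play the candidate against itself until the
        # next, doubled exploration round is due.
        for _ in range(max(0, min(budget, horizon - t))):
            duel(candidate, candidate)
            t += 1
        budget *= 2  # exponentially increasing budget
    return candidate

# Stub callbacks just to show the control flow (hypothetical).
history = []
winner = scb_style_loop(
    pac_find_winner=lambda b: (0, b),          # pretend arm 0 is found
    duel=lambda a, b: history.append((a, b)),
    horizon=20,
)
print(winner)  # -> 0
```

The doubling trick keeps roughly half the time in exploitation while letting each exploration phase use twice the budget of the last, which is what yields the log T factor in bounds of this shape.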
The paper is very dense. I have to admit that I did not check the whole appendix, which consists of 20 pages! I read through the experimental study, which was enjoyable to read, but unfortunately the authors decided to defer it to the supplementary material. The analysis is very rigorous, and the main steps of the proof can be understood without the appendix. I liked the paper in general, and thus I have only some minor comments.
Comments:
In the pseudocode of CCB, I was not able to figure out the role of line 9A. Why is it necessary to reset these sets? Please make this issue clear in the text.
Appendix C.2, Table 1: the % sign should be placed after the number in the cell concerning Random Walk Borda.
Q2: Please summarize your review in 1-2 sentences
The paper is very dense, but at the same time it is well-written and technically sound.
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes two algorithms for the Copeland dueling bandits problem.
The two algorithms are analyzed theoretically and some experimental results are contained in the supplementary material.
From a theoretical perspective, this paper contains some nice work. The algorithms are intuitive, although I'm surprised SCB works so well.
Perhaps the authors can discuss some intuition in more depth?
My main critiques come from the writing style.
Most notably, the second half of the main paper is not very readable, stating complicated lemma after complicated lemma.
I think the authors should strongly consider moving these all to the appendix, and replacing them with much coarser statements that provide more intuition.
This paper is generally lacking in providing intuition.
The description of Algorithm 2 is a bit weird, since it calls an algorithm in the appendix.
I don't really see the necessity of this.
I think the authors should find space to move some of the empirical results to the main body of the paper.
Perhaps also smaller/simplified versions of Figures 7 & 8.
More broadly, the comparison between Copeland winner and von Neumann winner is interesting.
It's quite surprising that they match so often in your experiments, since I believe (correct me if I'm wrong) the von Neumann winner is often not deterministic.
Perhaps the authors could comment on this further.
Citation 19 needs to be updated.
*** RESPONSE TO AUTHOR REBUTTAL *** I thank the authors for their rebuttal.
I strongly encourage the authors to restructure their paper to be more clear.
Q2: Please summarize your review in 1-2 sentences
The paper gives a solid theoretical result.
I think the paper could be better written.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note, however, that reviewers and area chairs are busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
General comments:
We would like to thank the reviewers for their insightful comments. We're pleased that the technical contribution was so well received. Regarding the issue of balancing content in the main paper vs. the appendix, we'd like to point out that it is straightforward for us to adjust this in the CRC. In particular, we could easily: (1) move most of the technical lemmas to the appendix and replace them with more intuitive explanations of the results; (2) move some of the experimental results to the main paper; (3) include in the main paper a compare-and-contrast of the various dueling bandit solution concepts.
Specific comments:
Reviewer 1:
a) The goal of the paper is not to show that the Copeland winner is a superior solution concept. On the contrary, many solution concepts have good arguments in their favor, and the tradeoffs between them are a subtle subject that has occupied the field of social choice theory for decades.
That said, there is a clear argument in favor of the Copeland winner over the Condorcet winner, namely that it does not require assuming that a Condorcet winner exists. Since this assumption was the main barrier to the practical application of Condorcet methods, and we provide the first efficient Copeland method, thereby removing this barrier, we feel that the contribution of the paper is not incremental at all.
We hope our "general comments" above address your comments (b) and (c), as the experimental results clearly demonstrate the hazard of using algorithms such as RUCB that rely on the Condorcet assumption.
Reviewer 2: Line 9A in Alg 1 is a cautionary measure that resets the sets B^i if at some point it turns out that one of the arms in B^i beats arm a_i by a wide margin instead of losing to it, which is what the arms in B^i are supposed to do. We will clarify this in the CRC. Also, thanks for pointing out the typo.
We hope our "general comments" above address your other concerns.
Reviewer 3: Regarding the Copeland winner vs. the von Neumann winner, note that the numbers in the second row of Table 1 in Appendix C.2 measure how often the support of the von Neumann winner intersects with the set of winners according to each definition. Moreover, the following paragraph states that in 94.2% of cases, the Copeland winners were completely contained in the support of the von Neumann winner.
We hope our "general comments" above address your other concerns.
Reviewer 4: We are unsure how to react to this review. In the absence of specific examples of problems with the writing, we cannot respond meaningfully. However, the other reviewers did not mention any problems with sentence construction. Furthermore, the paper was written and edited by native English speakers. While there may be a few inevitable typos, in general we are confident the paper is grammatically correct.
Moreover, regarding the claim that the results can be obtained "using the usual confidence bound approach for preference learning with a slight modification", note that the direct confidence bound approach leads to algorithms with regret O(K^2 log T). As pointed out repeatedly in the paper, including in the abstract, the main novelty of the algorithms and the analyses presented here is the fact that they have a linear dependence on K, and this is achieved using ideas that are orthogonal to the standard confidence bound approach. Indeed, people familiar with the problem often find it surprising that a linear dependence on K is even possible, so we believe that describing our results as "slight modifications" is simply not justified, and we urge Reviewer 4 to discuss the matter with the other reviewers and reconsider.
Reviewers 5 and 6: Thank you for your comments. Please let us know if you have any recommendations for the improvement of the exposition that are not addressed by the "general comments" above.
