NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 590 Online multiclass boosting

### Reviewer 1

This paper aims to develop an online version of the multiclass boosting algorithm. The authors combine the ideas in [ref] and [ref] in order to develop their framework. Overall, the paper is well written and easy to follow, and the framework appears technically sound. However, certain parts of the paper are not well justified or motivated. I discuss them below:

1. The title of Section 3 is "an optimal algorithm". It is not clear in what sense the algorithm developed in this section is optimal. There is no theory proving that the bounds obtained by this algorithm are optimal for online multiclass boosting. If mistake bounds of O(T) (for example, in the case of 0-1 loss) are optimal for online multiclass boosting, this should be proved, or there should be a reference to where it has already been shown.

2. Now, suppose it can be proved that such bounds are optimal for online boosting. What is the motivation for online boosting when its bounds are suboptimal compared to online learning with expert advice (O(log T))? Why is having multiple weak learners useful at all when it does not lead to better (mistake) bounds than standard online learning models?

3. Given that in online boosting all the weak learners are already available, why is the potential function still computed sequentially, assuming no knowledge of the labels of any weak learner after the current weak learner i? This is reflected in the design of the potential function in Equation (3), which results in computing the weights w^i(t) and cost matrices D_t^i sequentially. It seems to me that computing the potential function from the outputs of all weak learners would be more precise, and I am not sure why such an approach is not taken here. Would that result in better bounds?

Some minor problems:

- Section 3.1: what do you mean by "For optimal algorithm, we assume that \alpha_t^{i}=1, \forall i,t"?
- The notation is sometimes difficult. For example, it took me some time to understand that s(r) in Section 3.1 refers to element r of vector s. I would have used s[r] instead of s(r) to prevent confusion with a function.
- Section 3.2: "Under 0-1 loss" is repeated in the 3rd sentence.
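To make point 3 concrete, here is a minimal sketch of the sequential scheme I am referring to (all names are hypothetical; `potential` stands in for the paper's potential function in Equation (3)). The cost vector for weak learner i is built using only the cumulative votes of learners 1..i-1, which is exactly the sequential dependence I am questioning:

```python
import numpy as np

def one_hot(l, k):
    """Standard basis vector e_l in R^k."""
    v = np.zeros(k)
    v[l] = 1.0
    return v

def sequential_cost_vectors(votes, y, num_classes, potential):
    """Sketch of the sequential scheme: the cost vector for weak
    learner i uses only the cumulative score of learners 1..i-1,
    never the outputs of later learners.

    votes: list of length N; votes[i] is the class predicted by
           weak learner i on the current example.
    y: true label of the current example.
    potential: callable potential(s, y, remaining) -> float,
               a stand-in for the paper's potential function.
    """
    N = len(votes)
    s = np.zeros(num_classes)   # cumulative score of learners seen so far
    costs = []
    for i in range(N):
        # cost of learner i predicting class l, given only earlier votes
        c = np.array([potential(s + one_hot(l, num_classes), y, N - i - 1)
                      for l in range(num_classes)])
        costs.append(c)
        s += one_hot(votes[i], num_classes)  # now include learner i's vote
    return costs
```

The alternative I am suggesting would compute the full score sum over all N learners first and derive the costs from it, which breaks this one-directional dependence.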

### Reviewer 2

This paper develops appropriate theory and proposes two algorithms for online multiclass boosting. One particularly notable result is a weak learning condition for online multiclass boosting. This paper would be a notable addition to this recently growing literature; therefore, I recommend acceptance. A few comments:

1. On page 3, in Definition 1, it is worth commenting on whether there are any theoretical developments describing how S, the excess loss, decreases with the number of training examples. This would seem a basic requirement for weak learning in an online setting.

2. In the experimental section, it is worth noting that the comparison to the original online boosting algorithm is not quite fair, since it was developed as a two-class algorithm. The experimental results on the online boosting algorithm are worth retaining because, as the authors point out, it is a good baseline.
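To illustrate the role the excess loss S plays in comment 1: a weak learning condition of this flavor asks that the learner's cumulative cost beat uniform random guessing by an edge gamma, up to an additive slack S. This is a simplified toy check, not the paper's exact Definition 1 (the precise form of the condition and all names here are my own):

```python
import numpy as np

def satisfies_weak_learning(costs, preds, gamma, S):
    """Toy check of a simplified online weak learning condition:
    the learner's cumulative cost must not exceed (1 - gamma) times
    the expected cost of uniform random guessing, plus an excess
    loss budget S.

    costs: list of (k,) cost vectors, one per round (the cost of
           predicting each of the k classes on that example).
    preds: list of the learner's predicted class indices per round.
    """
    k = len(costs[0])
    learner_cost = sum(c[p] for c, p in zip(costs, preds))
    random_cost = sum(c.sum() / k for c in costs)  # uniform guess
    return bool(learner_cost <= (1 - gamma) * random_cost + S)
```

The point of the comment is that if S does not shrink (relative to T) as more examples arrive, the slack can dominate the condition, so some statement about its growth seems needed.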