Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors of this submission present an algorithm for online prediction of graph labelings (in fact switching labelings). Their algorithm is based on using standard techniques from the literature to obtain a mistakes bound whose main component is a non symmetric cost of switching that takes into account only experts entering the comparator sequence. The main original result of the paper seems to be in section 3.1., where the construction of the specialists and their updates and runtime is analyzed and the binary tree basis is introduced. It appears this is the main idea behind the improvements claimed in the paper.
This paper studies the problem of online prediction of binary labels in a graph. Nature first chooses a sequence of labelings of the graph and at each round, the learner receives a vertex, predicts its label and receives the true label. The labels may vary over time and the goal of the learner is to minimize his number of mistakes. The paper proposes an algorithm based on sampling a random spanning tree, linearizing it and combining a well-chosen basis of experts that predict constant predictions over clusters of vertices. An upper-bound on the expected number of mistakes is provided which depends on the regularity of the labelings over the graph and over time. Experiments on real data are finally provided with convincing results. Strengths: - The clarity and quality of the submission are ok. The results are interesting and the experiments are convincing. - I liked the fact that the upper-bounds smoothly depend on the variation of the labelings and that the labelings of the entire graph could be arbitrary given that they match the observed labels. Weaknesses: - To my opinion, the setting and the algorithm lack a bit of originality and might seem as incremental combinations of methods of graph labelings prediction and online learning in a switching environment. Yet, the algorithm for graph labelings is efficient, new and seem different from the existing ones. - Lower bounds and optimality of the results are not discussed. In the conclusion section, it is asked whether the loglog(T) can be removed. Does this mean that up to this term the bounds are tight? I would like more discussions on this. More comparison with existing upper-bounds and lower-bound without switches could be made for instance. In addition, this could be interesting to plot the upper-bound on the experiments, to see how tight is the analysis. Other comments: - Only bounds in expectation are provided. Would it be possible to get high-probability bounds? For instance by using ensemble methods as performed in the experiments. Some measure about the robustness could be added to the experiments (such as error bars or standard deviation) in addition to the mean error. - When reading the introduction, I thought that the labels were adversarially chosen by an adaptive adversary. It seems that the analysis is only valid when all labels are chosen in advance by an oblivious adversary. Am I right? This should maybe be clarified. - This paper deals with many graph notions and it is a bit hard to get into it but the writing is generally good though more details could sometimes be provided (definition of the resistance distance, more explanations on Alg. 1 with brief sentences defining A_t, Y_t,...). - How was alpha tuned in the experiments (as 1/(t+1) or optimally)? - Some possible extensions could be discussed (are they straightforward?): directed or weighted graph, regression problem (e.g, to predict the number of bikes in your experiment)... Typo: l 268: the sum should start at 1
The authors introduce a practical O(log n) per step online algorithm for predicting graph labelings by a novel combination/appropriate adaptation of online learning techniques and provide a mistake bound analysis and some experimental support. The paper is well written and clearly presented, and does a good job of surveying related work and putting their method into context. A clear accept. Minor comments: - line 239: earlier work [Partition Tree Weighting, Veness, White, Bowling, Gyogy, DCC, 2013] also used a similar binary temporal discretization to provide a O(log n) algorithm for non-stationary time series modelling.