Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Line 218: missing superscript for Hessian The idea of using algorithmic stability to characterize learnability is a standard one on learning theory and there have been other definitions of stability which have governed online learnability in the following works: Stability Conditions for Online Learnability by Stephane Ross, J. Andrew Bagnell, The Interplay Between Stability and Regret in Online Learning by Ankan Saha, Prateek Jain, and Ambuj Tewari, Policy Regret in Repeated Games by Raman Arora, Michael Dinitz, Teodor V. Marinov, Mehryar Mohri. Maybe it would be beneficial for the paper if the authors briefly discussed the different forms of stability proposed by the above works and how they can be related to Definition 2.1 and Definition 2.2. Overall the paper is clearly written and the intuition and sketches of proofs in the main paper give enough clarity so that the appendix is easy to read. The proofs seem novel and are really a great combination of simple but insightful ideas. I think the attempt at unifying proofs for first order regret bounds for many different types of online learning algorithms is an important one and can give more insights about novel regret minimizing algorithms.
This paper reconsidered online learning and bandits from the stability viewpoint by means of differential privacy tools. Though differential privacy is a concept about data security, its main idea is about the output stability, which itself is an important topic in many learning problems including online learning. Back to this paper, authors defined two types of stability based on original \delta approximation max divergence, which were then used to derive first order bounds under full information and partial information feedback respectively. In detail, for online smooth generalized linear loss function, authors obtained first order bound through the first type of stability. As a special case, first order bound for OLO is new. Besides, authors found a close relation between stability defined here with differential consistency, which is also a key concept in the analysis of online learning and bandits. For MAB, authors re-derived some zero order and first order bounds via a unified analysis. Further, a new perturbation based algorithm was proposed to achieve first order bound for bandits with expert advice. Though its theoretical guarantee is not optimal, the algorithm looks interesting and computationally efficient compared with previous optimal but sophisticated algorithm MYGA. The paper is well-written. The idea and analysis in this paper are new, and provide a new approach to analyze general online learning and bandits. Besides, considering some new results are obtained, I tend to accept this paper. However, in my personal view, results in this paper are not so strong in the following sense: 1. First order bounds under full information feedback (mainly Theorem 3.2 and Corollary 3.3) depend on dimension d. Dependency over dimension is a notorious phenomenon in differential privacy literature, so it doesn’t look so surprising that most results in this paper depend on dimension d. However, since first order bound in  (see reference below) doesn’t have a such dependency, and we do not need to protect privacy here, I wonder whether it is possible to get rid of the dimensional dependency here; 2. Algorithm 1 is actually computationally inefficient, since we have to solve an ERM at each round, is it possible to provide an elegant update (for example, like perturbed OGD) here? 3. In partial information setting, since we can only calculate the probability p_t approximately via Geometric Resampling, it would be better if authors could derive results under the consideration of approximation error. Overall, I tend to a weak accept about this paper.
Originality: The results of this paper unifies and generalizes previous works on this topic. Through drawing connections between online learning and differential privacy, this work makes it possible for scholars in either of these two communities to exploit readily available results in the other community for their research. The paper does a good job comparing and contrasting its results with previous works which highlight the significance of its contributions. Quality: All the claims of the paper are well supported by sound theoretical analyses. This paper is indeed a complete piece of work and it opens up a lot of future research directions to build upon results of this work. I made a high-level check of the proofs in the supplementary file and they seemed to be technically sound. Clarity: This paper is well-written overall. However, I think it's necessary to provide more explanation and intuition for the Obj-Pert and GBPA algorithms in the paper. Currently, it might be a bit hard for someone not familiar with the literature to fully understand the algorithms. Significance: The main contribution of this paper is to draw connections between online learning and differential privacy. Researchers could build upon the results of this work and pursue further research directions in either of these topics. Additionally, the paper provides a unifying framework to analyze a number of different problems that were studied separately in the earlier works.