NeurIPS 2020

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms


Review 1

Summary and Contributions: This paper considers online convex optimization with switching costs and studies how to reduce the impact of prediction errors on performance. A new algorithm, Receding Horizon Inexact Gradient (RHIG), is presented, and regret bounds are derived under a general model of prediction errors. Additionally, numerical examples in the context of a quadrotor tracking problem are included.
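For context, the objective I have in mind for this setting (my notation; the paper's constants and norm may differ) is the standard SOCO cost with a quadratic switching penalty:

    \min_{x_1,\dots,x_T} \; \sum_{t=1}^{T} \Big( f_t(x_t) + \frac{\beta}{2}\,\lVert x_t - x_{t-1}\rVert^2 \Big),

where the learner must commit to x_t while the costs of later rounds are known only through noisy predictions.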

Strengths: This is a theoretical paper that makes a clear contribution to a growing literature. It is relevant to NeurIPS and bridges control and online learning. The topic of predictions in online optimization has been studied a lot in the past few years at the intersection of control and learning, and typical papers use highly idealized prediction models without realistic assumptions about errors. This paper considers a general error model and is still able to derive strong analytic results.

Weaknesses: The main contribution of the paper is the theoretical results and analysis, and the paper provides more general results in a more realistic model than the prior literature. However, it was difficult for me to understand the technical novelty in the proofs. In particular, the algorithmic idea is related to prior work that mixes receding horizon control and gradient algorithms, e.g., [12]. I would like the authors to clarify which new ideas/techniques in the proof enable the new results.

Correctness: I have verified the provided proofs.

Clarity: The paper is clear, but some details in the presentation can be improved.
-- It would be useful to highlight more clearly the novelty of the techniques used in the analysis.
-- More discussion differentiating RHIG from other algorithms in the literature would be helpful. There is a growing zoo of related algorithms, and understanding which ingredients are crucial for handling noisy predictions is valuable.

Relation to Prior Work: The relationship to prior work is discussed clearly, albeit concisely. If space allows, I would suggest including additional context about the related work.

Reproducibility: Yes

Additional Feedback: Thank you for your author response. I have read it and it clarified a few issues for me around the prediction model.


Review 2

Summary and Contributions: The paper studies strongly convex online optimization with switching costs, where the learner has access to multi-step-ahead predictions. Namely, at each round, the learner receives inaccurate predictions of all future losses. The paper gives an algorithm (RHIG) with a bounded policy regret (against any non-stationary benchmark) that scales with the variation of the losses. The algorithm has a parameter W controlling how many lookahead steps it uses, and the regret bound shows that W trades off the variation of the losses against the accuracy of the predictions. In particular, the regret bound decays exponentially in W. The authors conclude by evaluating their algorithm on a synthetic experiment.
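For my own understanding (and to make the exponential-in-W behavior concrete), here is a minimal numerical sketch of how I read the algorithmic idea: W inexact gradient passes on the coupled SOCO objective with quadratic hitting costs, using noisy predictions of the targets. The cost model, step size, and variable names below are my own choices, not the paper's, and in the actual algorithm the passes are interleaved in a receding-horizon fashion (each partial gradient using only information available at that time) rather than run in batch as written here.

    import numpy as np

    # Toy sketch (not the authors' code): W gradient-descent passes on
    #   C(x_1..x_T) = sum_t 0.5*||x_t - theta_t||^2 + 0.5*beta*||x_t - x_{t-1}||^2,
    # where each stage gradient is computed from a noisy prediction of theta_t.
    # Since C is smooth and strongly convex, the distance to its minimizer
    # contracts geometrically in the number of passes W.
    def rhig_like_sketch(theta_pred, W, beta=1.0, eta=0.2, x0=None):
        """theta_pred: (T, d) array of noisy predictions of the true targets theta_t."""
        T, d = theta_pred.shape
        x0 = np.zeros(d) if x0 is None else x0
        x = np.tile(x0, (T, 1))                  # initialize every decision at x_0
        for _ in range(W):                       # W inexact gradient passes
            x_new = x.copy()
            for t in range(T):
                prev = x0 if t == 0 else x[t - 1]
                grad = (x[t] - theta_pred[t]) + beta * (x[t] - prev)  # hitting + backward switching
                if t + 1 < T:
                    grad -= beta * (x[t + 1] - x[t])                  # forward switching coupling
                x_new[t] = x[t] - eta * grad
            x = x_new
        return x                                 # decisions to commit, one per round

    # Example: true targets theta_t plus Gaussian prediction noise.
    rng = np.random.default_rng(0)
    theta = np.cumsum(rng.normal(size=(50, 2)), axis=0)
    preds = theta + rng.normal(scale=0.5, size=theta.shape)
    decisions = rhig_like_sketch(preds, W=5)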

Strengths: The paper introduces a new model for online optimization.

Weaknesses: The main body of the paper does not include any explanation of the regret bound. The authors never explain why the regret bound decays exponentially in W, nor do they give any other intuition for the algorithm.

Correctness: Since there is no explanation of the analysis in the main body of the paper, it is difficult to verify the correctness of the proofs.

Clarity: I found the paper to be written reasonably well for the most part, though it obscures many of the technical details (see 'weaknesses').

Relation to Prior Work: This work extends the RHGD algorithm of Li et al. to the case where future predictions are inaccurate.

Reproducibility: Yes

Additional Feedback: Here are some references that I think were missed:
1. "Online Learning for Adversaries with Memory: Price of Past Mistakes", Anava et al. (see Section 6). This work addresses a model similar to the authors' stochastic model in the context of multi-step prediction.
2. "Online Learning with Predictable Sequences", Rakhlin and Sridharan. This work addresses the case where the learner has information about the next time step and gives a regret bound that scales with the accuracy of the predictions.
-- After rebuttal -- The rebuttal did not alleviate the concerns I had with the paper's writing and the reader's ability to validate the main results, specifically the explanation of the proof, the statement of the main theorem, and the lacking comparison to previous work. I am therefore keeping my score.


Review 3

Summary and Contributions: This paper considers smoothed online convex optimization (SOCO) with multi-step-ahead predictions, where in each round t the learner has access to a series of noisy predictions of the stage cost functions in the following rounds, and the prediction errors can be either fixed or stochastic. The authors propose a variant of the RHGD algorithm to solve this problem. Theoretical analysis shows that the dynamic regret of the proposed method depends on the function variations and prediction errors. Numerical experiments demonstrate that the proposed method outperforms previous algorithms when the prediction errors are large.

Strengths:
1. The problem is well-motivated. Previous work on SOCO [14] assumes that in each round t the stage cost functions in the following W rounds are precisely revealed to the learner, which is unrealistic and hinders application to many domains. In this paper, the authors successfully relax this assumption by proposing a new model in which the learner only has access to a noisy version of the future stage cost functions.
2. The authors propose a variant of RHGD to solve this problem and provide a dynamic regret bound for it. Moreover, the numerical experiments show that the proposed method outperforms previous algorithms when the prediction errors are large.

Weaknesses:
Significance:
1. This paper assumes that all the stage cost functions share the same form, and that this form is known to the learner before the whole iteration process begins, which seems to be a very limited setting. In previous work (e.g., [14]), the type of loss function can differ across rounds, and in round t only the types of the stage cost functions in the following W rounds are revealed to the learner.
2. This paper and [14] assume the stage cost functions are both smooth and strongly convex, which is also a very restrictive assumption. It would be nice if the authors could give more examples of loss functions (other than quadratic functions) that satisfy these assumptions.
3. I think the theoretical results are not very clear. For example, in Corollary 1, it is hard to tell the exact order of the regret bound. It would be better if the authors could give more specific examples of delta for which the regret bounds are sublinear (with respect to V_T and T).
Novelty: It seems to me that both the proposed algorithm and its theoretical analysis are largely based on those in [14]. The main theoretical challenge is how to deal with the prediction errors, and according to Lemma 4 and Lemma 5, the error terms can apparently be handled fairly straightforwardly. It would be nice if the authors could highlight the main theoretical difficulties in the main paper.
-----------------------Post Rebuttal--------------------------- The authors cleared most of my concerns in the rebuttal. I am happy to raise my score.

Correctness: I have read the main paper and made high-level checks of the proofs, and I did not find any significant errors.

Clarity: The paper is generally well-written and structured clearly. However, as mentioned above, I think the theoretical results (dynamic regret bounds) are not very clear. Perhaps the authors can add more discussion and give more examples on this point.

Relation to Prior Work: The relation to prior work is clearly discussed in general. However, it seems that the proof of Theorem 1 is largely based on the proof of Theorem 1 in [14]. It would be better if the authors could make this clearer in both the main paper and the appendix.

Reproducibility: Yes

Additional Feedback: In Corollary 3, for small W (say W=1), is the second term linear with respect to T?


Review 4

Summary and Contributions: The paper introduces a general online convex optimization framework that can handle additional switching costs. The algorithm leverages long-term predictions to improve online performance. The tradeoff between the information that longer-term predictions provide and their larger errors is crucial, and the paper incorporates it as a tunable parameter. Both theoretical regret bounds and an empirical demonstration are provided to support the algorithm.

Strengths: Online convex optimization is extensively studied in the literature. Adding a switching cost is less studied but has a wide range of applications. I think this paper provides a general algorithmic framework that helps us understand this problem better.

Weaknesses: The key weakness of this paper is the lack of comparison to other possible approaches. The first baseline the paper needs to compare against is the RHGD algorithm, yet the authors do not explicitly compare their regret bound or empirical results with it. Since RHIG is the natural extension of RHGD, I think a direct comparison showing the benefit is important; otherwise the novelty is not sufficient. Another relevant line of research is reinforcement learning, where the switching cost can naturally be folded into the reward received after applying an action to the current state. I do not find a detailed discussion of this line of work in the paper.

Correctness: I took a quick look at the theory, and the claims and proofs seem sound. The numerical experiment methodology is also standard.

Clarity: I enjoyed reading most of the sections. The only confusing part for me is the difference between RHIG and RHGD, which is described in only a few lines (126-128). I am not fully convinced that their difference is significant without an exact comparison, both theoretical and empirical.

Relation to Prior Work: As I mentioned in the weaknesses section, I think the paper needs a more detailed discussion of RHGD and reinforcement learning.

Reproducibility: Yes

Additional Feedback: