Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This work proposes algorithms for the online-within-online (OWO) meta-learning setting, as opposed to the more prevalent statistical setting. In this setting, tasks arrive sequentially (outer loop), and learning within each task also happens in an online fashion (inner loop). The aim is to achieve low average regret over tasks. The inner-loop optimization is done via Online Mirror Descent (OMD). The inner algorithm is designed to provide good approximations of the (sub)gradients of the outer meta-objective. Specifically, two nested online primal-dual algorithms are used. This algorithmic framework is then extended from the adversarial to the statistical setting by two nested online-to-batch conversions. Experiments are presented on synthetic data and on the MovieLens-100k dataset, which contains the ratings given by different users to different movies. In MovieLens-100k, each of the 943 users' ratings is treated as a separate task. The proposed algorithms perform significantly better, as a function of the number of training tasks, than treating the tasks independently (ITL).
Comments: While I will leave the judgement of theoretical novelty to more qualified reviewers, I have a number of more practical questions:
- Does this framework extend trivially to meta-reinforcement learning settings (where the online feedback is an episodic, sparse reward), or does that require a fundamental change in viewpoint?
- Minor nitpick: While the experiments on the MovieLens dataset are good ones, and I understand that the scope of the paper is to study the OWO setting theoretically, are there fundamental challenges to scaling up to, say, supervised datasets in rich observation spaces (e.g., the meta-datasets of images recently proposed in "Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples" by Triantafillou et al.)?
- Minor nitpick: The relegation of related work to the supplementary material is a bit odd.
It actually served to better situate the paper with respect to related work, especially the batch statistical-setting papers and other online meta-learning settings. Update: Thanks to the authors for their comments on extensions to reinforcement learning. I have maintained my good scores.
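To make the online-within-online protocol summarized above concrete, here is a minimal Python sketch. It is not the authors' algorithm: it swaps in the simplest Euclidean instance and does not implement the primal-dual machinery or the online-to-batch conversions. The inner loop runs online subgradient descent (OMD with the squared Euclidean mirror map) on the absolute loss plus a biased regularizer (lam/2)*||w - h||^2, and the outer loop takes a gradient step on the bias h, using lam*(h - w_bar) with the average inner iterate w_bar as an assumed stand-in for the meta-gradient. All function names, step sizes, and the synthetic data generator are hypothetical.

```python
import numpy as np


def inner_ogd(X, y, h, lam, eta):
    """Inner loop (hypothetical): online subgradient descent on one task.

    At each step the learner suffers |<x_t, w> - y_t|, then steps on that loss
    plus the biased regularizer (lam/2)*||w - h||^2.
    Returns the average loss suffered on the task and the average iterate.
    """
    w = h.copy()
    total_loss = 0.0
    w_sum = np.zeros_like(h)
    for x_t, y_t in zip(X, y):
        residual = x_t @ w - y_t
        total_loss += abs(residual)  # loss is charged before the update (online protocol)
        w_sum += w
        grad = np.sign(residual) * x_t + lam * (w - h)
        w = w - eta * grad
    return total_loss / len(y), w_sum / len(y)


def online_within_online(tasks, dim, lam=1.0, eta_in=0.1, eta_out=0.5, meta=True):
    """Outer loop (hypothetical): tasks arrive one at a time, h is updated after each.

    With meta=False, h stays at zero, which mimics independent task learning (ITL).
    """
    h = np.zeros(dim)
    task_losses = []
    for X, y in tasks:
        loss, w_bar = inner_ogd(X, y, h, lam, eta_in)
        task_losses.append(loss)
        if meta:
            # Assumed surrogate for the meta-gradient: lam * (h - w_bar)
            h = h - eta_out * lam * (h - w_bar)
    return float(np.mean(task_losses))


def make_tasks(n_tasks=300, n_points=20, dim=10, seed=0):
    """Synthetic tasks whose target vectors all lie close to a shared mean."""
    rng = np.random.default_rng(seed)
    mean = np.ones(dim)
    tasks = []
    for _ in range(n_tasks):
        w_task = mean + 0.1 * rng.standard_normal(dim)
        X = rng.standard_normal((n_points, dim)) / np.sqrt(dim)
        y = X @ w_task + 0.05 * rng.standard_normal(n_points)
        tasks.append((X, y))
    return tasks


tasks = make_tasks()
meta_loss = online_within_online(tasks, dim=10, meta=True)
itl_loss = online_within_online(tasks, dim=10, meta=False)
print(f"average loss, meta-learned bias: {meta_loss:.3f}")
print(f"average loss, ITL (h = 0):       {itl_loss:.3f}")
```

When tasks are drawn around a shared mean, the meta-learned bias should yield a lower average loss than the ITL baseline, which mirrors the qualitative comparison described in the review; the exact numbers depend on the seed and step sizes.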
In this paper, a primal-dual online learning framework is proposed for the online-within-online meta-learning problem. This setting is important in practice since it naturally models the lifelong learning scenario. Compared to recent advances on this topic, this paper proposes an alternative primal-dual view, providing a more general algorithmic framework and more refined theoretical results. The contribution is somewhat significant in this respect, as it provides novel tools for future research. On the other hand, since the primal-dual approach is classical in online learning, I expected deeper insights into meta-learning from this perspective rather than a direct application of the theoretical tools. Unfortunately, the discussion in this direction is limited. The proposed approach seems to be more a theoretically guaranteed solver for classical objectives than a novel view of the original problem. Overall, however, I still think the paper offers a good alternative to previous work, so I lean towards acceptance.
Further comments: In the current version, the discussion of related work is placed in the supplementary material. I strongly suggest moving it back into the main paper, especially the overview of other works in the online-within-online scenario, for better comparison.
G. Denevi, C. Ciliberto, R. Grazzi, and M. Pontil. Learning-to-learn stochastic gradient descent with biased regularization. ICML 2019.
C. Finn, A. Rajeswaran, S. Kakade, and S. Levine. Online meta-learning. ICML 2019.
M. Khodak, M.-F. Balcan, and A. Talwalkar. Provable guarantees for gradient-based meta-learning. ICML 2019.
------ after rebuttal
The author response does not change my evaluation much. The main contribution lies in the theoretical analysis of the online-within-online meta-learning setting through the online primal-dual framework, which is a good alternative compared to related work.
In terms of clarity, this submission is well written: it is not hard to follow, and the notation is clear. However, I am not sure about its significance, since I only checked the proofs and am not very familiar with the learning theory literature.