Reviews: Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

The paper formulates the problem of optimizing a strategy for crawling remote contents to track their changes as an optimization problem called the freshness crawl scheduling problem. This problem is an obviously important problem in applications like Internet search engine, and the presented formulation seems to give a practical solution to those applications. The paper presents an algorithm for solving the freshness crawl scheduling problem to optimality, assuming that the contents change rates are known. The idea behind the algorithm is based on the deep understanding of statistics and continuous optimization, and it seems to me that the contribution is solid (although I could not very all the technical details). For the case where the contents change rates are not known, a reinforcement learning algorithm is presented. Basically it estimates the rates as their MLEs. As a learning algorithm, the approach is not non-trivial, but I think it is reasonable. The convergence of the algorithm is guaranteed, and it is also valuable. The authors provided experimental results as well. They crawled 18,532,326 URLs daily over 14 weeks. It shows that the presented algorithm scales. This is an important feature because the web crawling requires processing a large number of sites. To conclude, the paper presents reasonable formulations and algorithms for the important practical problem. The technical contributions are solid from aspects of both theoretical optimization study and practical engineering work. The paper is written carefully and is easy to follow. I slightly worry that this paper is not very suitable for presentation at machine learning conferences (to my opinion, this paper is more suitable for data mining or web conferences), but I think that the paper is a good contribution even for Neurips. Update: After reading the author's rebuttal, I still maintain my score.

Reviewer 2

The paper presents the problem meticulously by defining the different components affecting the freshness of the crawler. In order to maximize the freshness, the problem is modeled as minimizing the cost of tracking defined through a harmonic staleness penalty function, where the authors elaborate on the choice of this harmonic function with sound reasons. The cost function is optimized according to how the change is observed in the source, whether complete or incomplete. The authors present two algorithms for each change type, supplemented with prepositions and their proof. These algorithms are cast into a model-based Reinforcement learning problem that tries to automatically learn the change/update rate of each source. The paper lacks organization and proper presentation. It relies on several details that are only presented in the supplement (extra ~15 pages). As is, the paper is only a problem description and presentation of a proposed approach. Hence, it is missing a conclusion, description of the dataset, details of the experiment and proper analysis.

Reviewer 3

In this article, the authors propose methodology for scheduling a web crawler. They frame their scheduler as characterized by the solution to an optimization problem (that characterizes the cost of staleness). They show how to determine parameters of the scheduler when parameters of the environment are known. In addition, they extend their algorithm to learn parameters of the environment, and optimize against them when they are unknown. The authors additionally provide optimality guarantees for their methodology. The article is generally very clear. The introduction and problem formulation do a great job of clearly presenting the problem. Near the end of the paper, I struggled with the exposition a bit more in the reinforcement learning section, though perhaps this section is necessarily dense (given space constraints of the paper, and the additional complexity of the section). It wasn't clear to me in the case of complete change observations if the method proposed was optimal, or if was only that the optimal p_w were selected for strategies within that class. Some clarification here would be great. It is noted that the first two problems are convex. The algorithms given solve each problem in O(|W|^2) [and it is stated that practically the run time is closer to linear]. It would be worth noting the run-time of a general interior point solver --- I imagine this is O(|W|^3). The empirical experiments seemed well thought out, and fairly illustrative.

Paper ID:	315
Title:	Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Reviewer 1

Reviewer 2

Reviewer 3