Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors present an algorithm for leveraging noisy "weak" labels at multiple resolutions (e.g., frames, scenes, or full videos). The algorithm takes into account both the sequential nature of the labels and the correlations between them to produce a combined label, which is then used to train a downstream model. The authors provide theoretical guarantees that bound the convergence rate of the combining model's parameters under reasonable conditions. This convergence rate can then be used in a standard way to guarantee the generalization of the downstream model. Finally, the authors present extensive experiments on real-world datasets where multi-resolution labeling could reasonably be used, and show superior performance over three baseline methods, including Data Programming, which the authors claim is state of the art. I, along with the reviewers, agree that the design and analysis of the algorithm provide a significant contribution and that the empirical results demonstrate its effectiveness. I recommend acceptance.