NeurIPS 2019

**Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center**

### Reviewer 1

The paper is sound and technically correct. There are contributions, but I would not call them significant. I'm not familiar with standard data sets in weak supervision for sequential data, so I cannot assess the relevance and soundness of the computational study. The improvement is definitely significant, to the extent that it leans towards "too good to be true." I think algorithms based on graphical models (variational inference, Gibbs) should be added as benchmarks. The paper is very well written: easy to understand and, as far as I can tell, without mistakes. Based on the title, I was expecting a stronger connection with classification. After all, the end goal is classification. The computational experiments are definitely about classification, and there is the loss function bound stated in 'End Model Generalization.' However, this is a bound on the loss and not on generalization to unseen data. Besides, the bound is a direct consequence of Theorem. In summary, a true generalization bound would be much more significant.

### Reviewer 2

The paper proposes a novel multi-resolution weak supervision problem for sequential data, which is interesting for both research and practical applications. The problem formulation and key challenges are clearly presented. The authors provide convincing explanations for the proposed approach. The experimental results are also promising. For weaknesses, see the following suggestions for improvement.

### Reviewer 3

Overall, the paper introduces an interesting algorithm with detailed theoretical and experimental analysis. My only comments are minor:

- The introduction of the full set of untied parameters in 2.2, then tying them in 2.3, seems cumbersome and a bit confusing on the first read. Could the tied parameters be introduced directly instead?
- Line 147 references section 3.1.4, but there is no such section in the paper.

Update: The paper discusses the advantages of the proposed algorithm over Gibbs sampling approaches in terms of speed and convergence, and compares against a Gibbs baseline without sequential correlations, but it seems it should also be possible to create a Gibbs-based model that does consider sequential correlations. It would be interesting to see a comparison against this stronger baseline in terms of empirical task performance.