NeurIPS 2019

**Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center**

### Reviewer 1

This paper studies the problem of learning soft state aggregation of a Markov model, where there are r hidden meta states, each corresponds to a distribution over the observed state of the Markov model. Under the anchor state assumption, the authors propose an algorithm that provably learns the state aggregation model from the Markov chainâ€™s trajectory. They evaluated their algorithm on a Manhattan taxi-trip dataset which yields interesting discoveries. There has been lots of work on estimating the low rank transition matrix itself and on matrix factorization in the topic modelling setting, and this work seems to be connecting the two problems. The paper is presented well and easy to follow. I have the following questions regarding the novelty and impact of this paper. Questions/concerns: 1. As mentioned in line 97, there are lots of work on estimating the transition matrix of a Markov model. Will similar result as in this paper hold if one first applies such an algorithm, e.g [28], and then simply runs the algorithm for topic modeling/non-negative matrix factorization, e.g. [4]? If no theoretical guarantee can be show, maybe this can serve as a benchmark in the experiment? 2. The authors emphasize in line 273 that the major difference between this word and topic modeling is that the observations from a Markov chain trajectory are correlated while these are independent in the topic modelling setting. However, one could instead sample from the trajectory once in a while (only when the chain mixes). It seems to me that the problem reduces to a topic modelling problem while the sample size only shrink by a factor of tau_*. Can the authors comment on that? 3. Certain minimax lower bound is known for the factorization problem in the topic modelling setting [22], is it possible to obtain similar result here? I appreciate the author's clarification. My questions are addressed. I've updated my score to 6.

### Reviewer 2

Overall the paper has strong and interesting results to share. However, I would be happier if the authors improved the exposition of the material.

### Reviewer 3

The authors could do a better job pointing out the novelty of their techniques -- i.e., where these depart from standard spectral methods.