The paper proposes a new benchmark, gSCAN, for learning compositional and grounded natural language understanding. It argues that evaluating compositional generalization requires situated language understanding (grounding). The benchmark is used to evaluate eight types of compositional generalization. The paper concludes that gSCAN is a useful benchmark.

Strengths
• A new benchmark dataset is created, and the work is novel.
• The arguments appear to be sound.
• Experiments are conducted.

Weaknesses
• The paper is generally clearly written, but the writing could be improved in places.
• The setting might be too narrow and artificial from a practical point of view.

The reviewers discussed the paper and concluded that it can be accepted. The authors are strongly encouraged to address the writing issues the reviewers pointed out, so that readers in other fields can better understand the content of the paper.