Reviews: Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

This paper investigates the problem of multi-round natural language image retrieval, using annotations from the Visual Genome dataset for training and evaluation. After the author feedback and reviewer discussion, this paper received final ratings of 6, 6 and 7. Despite some concerns about the use of non-sequential annotation data for a sequential task, the reviewers found the proposed model to be generally sound and the experimental evaluation convincing, and the AC agrees. However, we would encourage the authors to pay close attention to the reviewer feedback when preparing the final version of the paper. In particular, the author feedback committed to including the additional baselines requested by R1, so these should be included in the final version as promised.

Paper ID:	1524
Title:	Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries