Reviews: Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

This paper proposes a mechanism for conditioning temporal convolutions on a sentence embedding, applied to the task of aligning sentences with video segments. The reviewers agree that this is solid work with good experimental results. The novelty appears limited to the sentence grounding task, so the contribution is somewhat incremental. However, the reviewers highlight the efficiency of the approach in terms of memory and computation, and feel the results will be of interest to vision and language researchers.

Paper ID:	294
Title:	Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos