NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper presents research on resource-efficient video analysis. The reviewers appreciate the frame-gating approach and the solid methodology for deciding dynamically, at inference time, how much computation to spend on classifying an input video. The model is fully differentiable (via Gumbel-softmax), in contrast to RL-based approaches for learning similar frame-skipping methods. The reviewers also note that the empirical evaluation is solid, with good baseline comparisons and ablation studies. While there are some concerns regarding the magnitude of the contributions (e.g., relevance to LSTM-based models vs. other temporal deep learning architectures) and novelty, on balance it is a solid, well-written paper that makes a clear contribution to efficient video analysis.