Mr. HiSum: A Large-scale Dataset for Video Highlight Detection and Summarization

Sul, Jinhwan; Han, Jihoon; Lee, Joonseok

doi:10.52202/075280-1764

Mr. HiSum: A Large-scale Dataset for Video Highlight Detection and Summarization

Jinhwan Sul, Jihoon Han, Joonseok Lee

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Datasets and Benchmarks Track

Bibtex Paper Supplemental

Abstract

Video highlight detection is a task to automatically select the most engaging moments from a long video. This problem is highly challenging since it aims to learn a general way of finding highlights from a variety of videos in the real world.The task has an innate subjectivity because the definition of a highlight differs across individuals. Therefore, to detect consistent and meaningful highlights, prior benchmark datasets have been labeled by multiple (5-20) raters. Due to the high cost of manual labeling, most existing public benchmarks are in extremely small scale, containing only a few tens or hundreds of videos. This insufficient benchmark scale causes multiple issues such as unstable evaluation or high sensitivity in traintest splits. We present Mr. HiSum, a large-scale dataset for video highlight detection and summarization, containing 31,892 videos and reliable labels aggregated over 50,000+ users per video. We empirically prove reliability of the labels as frame importance by cross-dataset transfer and user study.

DOI

10.52202/075280-1764

Abstract

DOI

Name Change Policy