COHESIV: Contrastive Object and Hand Embedding Segmentation In Video

Shan, Dandan; Higgins, Richard; Fouhey, David

Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

Abstract

In this paper, we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and a hand location as input and segments the hand and the hand-held object. For learning, we generate responsibility maps that show how well a hand's motion explains other pixels' motion in video. We use these responsibility maps as pseudo-labels to train a weakly-supervised neural network using an attention-based similarity loss and a contrastive loss. Our system outperforms alternative methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.
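
To make the idea of a responsibility map concrete, the following is a minimal toy sketch and not the formulation used in the paper. It assumes a per-pixel motion field and a hand mask are already available, and it scores each pixel by how closely its motion matches the hand's mean motion using a Gaussian kernel. The function name `responsibility_map`, the bandwidth `sigma`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def responsibility_map(flow, hand_mask, sigma=1.0):
    """Toy responsibility map (illustrative assumption, not the paper's method).

    flow:      (H, W, 2) per-pixel motion vectors, e.g. from optical flow.
    hand_mask: (H, W) boolean mask marking hand pixels.
    sigma:     kernel bandwidth in pixels (hypothetical parameter).

    Returns an (H, W) array in (0, 1]; values near 1 mean the pixel moves
    like the hand, values near 0 mean its motion is not explained by the hand.
    """
    # Summarize the hand's motion by the mean motion vector over hand pixels.
    hand_motion = flow[hand_mask].mean(axis=0)
    # Distance between each pixel's motion and the hand's mean motion.
    residual = np.linalg.norm(flow - hand_motion, axis=-1)
    # Gaussian kernel: small residual -> high responsibility.
    return np.exp(-(residual ** 2) / (2.0 * sigma ** 2))


# Tiny synthetic example: the top half of the frame moves right by 3 pixels
# (hand plus held object), while the bottom half is static background.
flow = np.zeros((4, 4, 2))
flow[:2, :, 0] = 3.0
hand_mask = np.zeros((4, 4), dtype=bool)
hand_mask[0, :2] = True

resp = responsibility_map(flow, hand_mask)
pseudo_label = resp > 0.5  # thresholded map used as a toy pseudo-label
```

In this toy setup, pixels that move with the hand receive high responsibility and the static background receives low responsibility, which is the kind of signal the abstract describes using as pseudo-labels for training.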
