Unsupervised learning of object structure and dynamics from videos
SUPPLEMENTAL VIDEOS
Video generation quality across models (Human3.6M)
Comparison of video generation quality across models. Marker on the left is green for observed frames and red for predicted frames. Columns show different examples.
Sample diversity (Human3.6M)
Videos in the same row were conditioned on the same observed frames.
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
Example 7
Example 8
Example 9
Example 10
Keypoint manipulation (Human3.6M)
Keypoints for each limb were manually identified based on the left-most image. Keypoints for a single limb were then manipulated by rotating them around the joint of the limb, while holding the other keypoints static. Columns show different examples.
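The rotation described above can be illustrated with a minimal sketch. It assumes keypoints are stored as a NumPy array of 2D (x, y) coordinates and that the limb's keypoint indices and joint position have already been chosen by hand. The function name, argument names, and array layout are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def rotate_limb_keypoints(keypoints, limb_indices, joint_xy, angle_rad):
    """Rotate selected keypoints around a joint, leaving all others unchanged.

    keypoints:    array of shape (num_keypoints, 2) holding (x, y) coordinates
                  (assumed layout).
    limb_indices: list of row indices belonging to the limb being moved.
    joint_xy:     (x, y) position of the joint used as the center of rotation.
    angle_rad:    rotation angle in radians, counter-clockwise in a standard
                  x-right, y-up frame (appears clockwise if y points down,
                  as in image coordinates).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    joint = np.asarray(joint_xy, dtype=float)

    cos_a, sin_a = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[cos_a, -sin_a],
                         [sin_a, cos_a]])

    manipulated = keypoints.copy()
    # Shift the limb so the joint is at the origin, rotate, then shift back.
    centered = keypoints[limb_indices] - joint
    manipulated[limb_indices] = centered @ rotation.T + joint
    return manipulated


# Toy example: keypoints 1 and 2 form a "limb" attached at the origin.
example_keypoints = np.array([[0.0, 0.0],
                              [1.0, 0.0],
                              [2.0, 0.0],
                              [0.0, 1.0]])
rotated = rotate_limb_keypoints(example_keypoints, [1, 2], (0.0, 0.0), np.pi / 2)
```

In this sketch the manipulated keypoints would then be passed to the image decoder in place of the original ones. Sweeping `angle_rad` over a range of values would give a sequence of poses like those shown in the videos.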
Video generation quality across models (Basketball)
Comparison of video generation quality across models. Marker on the left is green for observed frames and red for predicted frames. Each column shows a different example.
Action-conditional video generation quality (DMCS)
Video generation quality for the DeepMind Control Suite dataset. A single model was trained on data from all tasks. Columns show different examples.