We provide supplementary videos showing our outputs on several cases from our test set.
Each video shows, on the left, the input video frames overlaid with the pre-extracted point tracks that serve as input to our network, each track drawn in its own color, and, on the right, the predicted cameras and dynamic 3D structure.
The predicted camera trajectory is shown as gray frustums, with the camera of the current frame marked in red.
The reconstructed 3D scene points are drawn in the same colors as their corresponding input tracks.
Note that all outputs shown in the videos were obtained on unseen test cases at inference time, from a single feed-forward prediction, without any optimization or fine-tuning.