This submission proposes a new benchmark for visual understanding of spatial relationships between objects in 3D. It initially received three reviews, all with positive scores (8, 6, 7). The reviewers appreciated the scale of the dataset and the effort put into balancing it, but also noted that it is not completely realistic (only two objects in each scene). The rebuttal addressed the reviewers' other concerns. For these reasons, the AC recommends accepting this submission as a spotlight.