Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
The authors present a joint framework that learns visual concepts of objects together with linguistic metaconcepts describing relationships between those concepts, in visual reasoning tasks where images are paired with question-answer pairs, and they demonstrate the approach on both synthetic and real-world image datasets. Part of the novelty of this work lies in incorporating metaconcepts into visual concept learning, and the proposed model loosely mirrors how humans learn. This approach to concept learning also aids zero-shot generalization. Reviewers would like to see more careful and thorough experimental validation, and are concerned that the metaconcepts are not realistic.