NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Overall, the paper is clearly written, with the issues and limitations encountered during data collection clearly described. The issue of dataset bias is an important one, and it is good to see efforts to tackle it. The main significance of the paper is in pointing out the importance of controlling for biases in the data used for training machine learning models, and it may inspire others to do more controlled data collection. The framework/data is of some interest to researchers working on object recognition.

Strengths:
- Clearly written and well organized.
- The data collection was thorough and careful, with manual verification of the final data.

Weaknesses:
- The controls are limited and miss important variations such as occlusion and clutter. The collection also introduces biases of its own (for instance, to achieve the different rotations, some of the objects are unnaturally positioned or held by a person).
- The experiments conducted on the dataset do not reveal any special insight into how to improve the models, or whether any of the detectors is more robust than another.

---- Post rebuttal: I found the rebuttal to be thoughtful. The dataset, and more importantly, the careful thinking about biases, is of benefit to the community. I would be happy to see the paper accepted.
Reviewer 2
Possible typo on page 7, section 4.2: "Object detector performance" - did you mean to say "object recognition"?

This is an interesting paper. While the observation that ImageNet images are not representative of many task-specific computer vision problems is fairly well known (Torralba & Efros, 2011), this work does an important job of evaluating generalization to a specifically controlled set of image manipulations. By gathering a specifically annotated dataset, this paper also helps reduce overfitting to confounders that we might not want our object recognition models to pay attention to (e.g. background context). One nice experiment in this paper was revealing the breakdown of the performance gap induced by background, object rotation, and viewpoint, respectively, and, crucially, showing that if the right conditions are chosen, object recognition performance is restored (thus removing the possible explanation that the drop in performance is primarily caused by some other variable such as lighting conditions).

ImageNet was constructed in an era when "in-distribution" generalization was quite difficult, i.e. models were not very good even when they had access to the "brittle priors"/contextual confounders. Thus, it does serve its purpose of validating in-distribution "generalization" to a wide variety of classes, even though the evaluation set still comes from "aesthetic images" (Recht et al.). Until evidence shows otherwise, I don't consider ObjectNet to be a *better* evaluation than ImageNet, unless the authors demonstrate that ObjectNet evaluation scores are more accurate by the metrics that ImageNet test accuracy cares about too. One experiment would be measuring ImageNet eval accuracy for a model trained using ObjectNet. Even if this is not a major motivation of the paper, I am still curious what the numbers look like, to get a qualitative understanding of the diversity in this dataset. Finally, it remains to be seen whether performing model selection/optimization against this metric results in models that generalize well to object rotations in, say, a factory setting.

Software like ARKit (iOS) or ARCore (Android) could be used to assist the crowdsourced human operators in collecting much more accurate object poses/camera poses than relying on the human to position the camera and object accurately.
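[Editor's sketch] The cross-evaluation Reviewer 2 asks about amounts to scoring a classifier trained on one dataset against the other dataset's images via a mapping between the two label spaces. The following is a minimal, hypothetical sketch of such an evaluation in PyTorch; the model choice, the directory name "other_dataset/", and the layout of "class_map.json" are illustrative assumptions, not details taken from the paper or the review.

```python
# Illustrative sketch only: cross-dataset evaluation of an ImageNet-pretrained
# classifier on images labeled in another dataset's class space, using a
# class-mapping file. "other_dataset/" and "class_map.json" are hypothetical.
import json

import torch
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

model = models.resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical mapping: each class-folder name in the evaluation set maps to
# the list of ImageNet class indices counted as correct for that class.
with open("class_map.json") as f:
    class_map = json.load(f)

dataset = ImageFolder("other_dataset/", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

correct, total = 0, 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for pred, label in zip(preds.tolist(), labels.tolist()):
            valid = class_map[dataset.classes[label]]  # accepted ImageNet indices
            correct += int(pred in valid)
            total += 1

print(f"Top-1 accuracy under the class mapping: {correct / total:.3f}")
```

The same loop works in either direction (ImageNet-trained model scored on ObjectNet-style images, or vice versa), provided the mapping file is built for the corresponding pair of label spaces.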
Reviewer 3
The paper makes an interesting and timely contribution in investigating controlled dataset collection and the impact of different axes of variation on object detection. In general, the community agrees on the importance of these questions, but very little work has been done to provide answers. As such, the originality and significance of the work are high. The clarity of the paper is also good, and the release of the dataset and code should help with reproducibility.

*** Post-rebuttal comments
After reading the other reviews and the rebuttal, I am more convinced that this paper should be accepted. The rebuttal addressed concerns in a thoughtful and concrete manner.