NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
------------------------------------------------
Comments after reading rebuttal: I've read the rebuttal and appreciate that the authors created a new experiment demonstrating their technique, which addresses my concern about there being only one demonstration of what is obviously a highly general method. Still, if you can do some version of the Mayan hieroglyphics, or work that example into the introduction, it would improve the paper even more. My score has been raised from 6 to 7.
------------------------------------------------

The paper proposes jointly learning a "perception model" (a neural network), which outputs discrete symbols, together with a "logic model" whose job is to explain those discrete symbols. They restrict themselves to classification problems, i.e., a mapping from perceptual input to {0,1}; the discrete symbols output by the perception model act as latent variables sitting between the input and the binary decision. Their approach is to alternate between (1) inferring a logic program consistent with the training examples, conditioned on the output of the perception model, and (2) training the perception model to predict the latent discrete symbols. Because the perception model may be unreliable, particularly early in training, the logic program is allowed to revise or abduce the outputs of perception.

The problem they pose -- integrating learned perception with learned symbolic reasoning -- is eminently important. The approach is novel and intuitive, while still having enough technical ingenuity to be nonobvious. The paper is mostly well-written (although see the end of this review for a list of recommended revisions). However, they demonstrate their approach on only one very small and artificial-feeling problem, when the framework is obviously so generically applicable (for example, could you teach the system to count how many ducks vs. geese are in a picture? How about detecting whether a pair of images contains the same object? Or decoding a sound wave into phonemes and then a word, playing tic-tac-toe from pixels, inferring a program from a hand-drawn diagram, or inferring a motor program from a hand-drawn character?). Despite the small-scale nature of the experimental domain, the experiments themselves are excellently chosen: for example, they investigate a transfer-learning regime where either the perceptual model or the logical knowledge model is transferred between two variations of their task, and they compare against strong baselines. For these reasons, I weakly recommend that this paper be accepted.

Although this paper gets many things right, the one problem is the experimental domain, which feels simple and "toy", and should ideally be supplemented with either another simple domain or a more "real" problem. The supplement does an above-and-beyond job of motivating your domain using Mayan hieroglyphics; definitely put this in the main paper. Can your system solve a version of those Mayan hieroglyphics? Figure 6 suggests that the learned logical theories could be human-interpretable. This is great. Can you show some examples of the learned logical rules?

Questions for the authors that were unclear in the paper (these should be addressed in a revision): How much of the logical theory can you learn? I browsed through your supplement, and it seems to show that you give a lot, but not all, of the symbolic concepts needed.
The paper (lines 200-206) actually seems to imply that even more prior knowledge is given, but the supplement (and the source code, to the extent that I browsed it) shows that what you learned was more impressive (for example, you learned logical definitions of xor, while the text says you give it "a definition of bitwise operations", which I had incorrectly assumed included AND/OR/etc.).

What if you were to replace the logical theory learner with something more like Metagol? Would you get an even more powerful symbolic learner by piggybacking on a system like that?

Why do you need the MLP on top of the relational features? Why can't you just use the logical theory and p() at the final iteration?

How is equation (3) not equivalent to maximizing the number of examples covered by $B \cup \Delta_C \cup p$?

Line 148 makes it sound like not satisfying equation 1 means that Con=0, but if I understand it correctly you can both not satisfy equation 1 *and* have Con>0.

In equation 5, you have a hard constraint on $|\delta|$. Why not use a soft constraint, and instead maximize $Con - \lambda |\delta|$, where $\lambda$ would be a hyperparameter? This would get rid of the $M$ hyperparameter (see the sketch at the end of this review for what I have in mind). Also, what do you set $M$ to?

Can you say something about RACOS? I and many readers will not be familiar with it; just one qualitative sentence would suffice.

Why is RBA so much harder than DBA? Is it only because of the perceptual front-end, i.e., they have the exact same logical structure but different images?

Small comments: Figure 7 would benefit from titles on the graphs. I wouldn't call your new approach "abductive learning", because "abductive reasoning" is already a widespread term and is easily confused with what you have named your approach. Something like Neurosymbolic Abduction seems more accurate and has less of a namespace collision with existing terms.
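To make the soft-constraint suggestion above concrete, here is a rough sketch of the objective I have in mind. This is my own Python-style illustration, not the authors' implementation; the names (abduction_score, the toy con_fn) and the toy consistency measure are placeholders for whatever Con the paper actually computes:

```python
from typing import Callable, Dict, Hashable

def abduction_score(delta: Dict[Hashable, int],
                    pseudo_labels: Dict[Hashable, int],
                    consistency: Callable[[Dict[Hashable, int]], int],
                    lam: float = 0.1) -> float:
    """Score a candidate revision `delta` of the perception model's pseudo-labels.

    Instead of a hard constraint |delta| <= M, maximize Con - lambda * |delta|:
    consistency minus a soft penalty on how many pseudo-labels get revised.
    """
    revised = {**pseudo_labels, **delta}   # apply the abduced revisions
    con = consistency(revised)             # Con: how many examples the revised labels explain
    return con - lam * len(delta)          # soft penalty replaces the hard M cutoff

# Toy usage: pretend three examples are explained when all revised labels agree,
# and only one example is explained otherwise.
pseudo = {"x1": 0, "x2": 1, "x3": 0}
con_fn = lambda labels: 3 if len(set(labels.values())) == 1 else 1
candidates = [{}, {"x2": 0}, {"x1": 1, "x3": 1}]
best = max(candidates, key=lambda d: abduction_score(d, pseudo, con_fn))
print(best)  # {'x2': 0}: the single-label revision wins under the soft penalty
```

A derivative-free optimizer such as the one the paper already uses could then search over candidate revisions to maximize this score, with lambda as the only knob in place of M.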
Reviewer 2
* Approaches such as DeepProbLog are referenced in the Related Works section, where it is stated that "the structures of first-order logical clauses exploited by these systems are assumed to have already existed" and "they could not obtain new symbolic knowledge during their learning process". I think this point would have deserved more attention, since the image-equation experiment was also conducted for DeepProbLog. The difference between the two approaches is not entirely clear, since the proposed setting also relies on a given background logic (though this point is shown only in the supplementary material). Moreover, the experimental section does not properly describe which kind of new symbolic knowledge could be obtained from the proposed approach.
* It is clear that fitting the paper within the page limit is not easy, but, in general, many interesting details seem to be in the supplementary material.
* The comparison with the results achieved by humans is explained too briefly in the paper, and it is not clear what the aim of such an experiment is.
* Some references about the combination of machine learning and symbolic reasoning in the abductive reasoning scenario are missing. For example, see:
-- d'Avila Garcez, Gabbay, Ray, Woods, "Abductive reasoning in neural-symbolic systems", Topoi, 2007
-- Hölldobler, Philipp, Wernhard, "An Abductive Model for Human Reasoning", AAAI Spring Symposium, 2011

Minor corrections:
- Page 2, "Probabilistic Logic Programming[4]" -> space missing before reference [4]
- Page 4, "which is consist with" -> "which is consistent with"
- Page 4, "can consistent with" -> "can be consistent with"
- Page 7, "For each length it containing" -> "For each length it contains"

Originality: moderate, as the novelty with respect to recent neural-symbolic approaches is not entirely clear.
Quality: high, sound methodology and experiments.
Clarity: high, but could be improved in the experimental section.
Significance: moderate, but the research area is very interesting, and this research direction is worth investigating.
Reviewer 3
The overall problem of integrating neural and symbolic methods by combining deep learning with deductive, inductive, and abductive logical reasoning is an interesting and important problem, as the authors discuss. The framework that is introduced is interesting and novel: it combines deep learning for perception with abductive logical reasoning that provides weakly-labelled training data for the deep-learning perception component. The technique is fairly precisely defined, but it was a little hard to follow all of the notation and equations. It would have been nice to have a concrete sample problem as an example to illustrate the description of the notation and algorithm as it was being discussed in Section 3. Waiting until Section 4 to see how this abstract formalism was grounded in a concrete problem was a bit frustrating.

I find the authors' use of the name "abductive learning" for their framework overly broad and insufficiently precise; there has been a range of prior work on using abduction in learning. You should give your work a more descriptive and specific title focusing on the issue of integrating deep learning for perception with abduction and deduction for providing its training data. A lot of other existing work could be called "abductive learning"; this term is too general. In particular, although the paper reviews a number of related works combining machine learning and abduction, there is a range of work from the mid-90's on this topic that is not mentioned, including: "Inductive Learning For Abductive Diagnosis", Cynthia A. Thompson and Raymond J. Mooney, in Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pp. 664-669, Seattle, WA, August 1994. A collection of relevant papers on the topic from this era is the book: P. A. Flach and A. C. Kakas, editors, Abduction and Induction, Kluwer Academic, 2000. This older work could be cited and discussed.

I found some details of the method confusing and/or limiting. When introduced, the abduced knowledge Delta_C is described as a "set of first-order logical clauses", but eventually it became clear that these can only be ground literals, not general clauses. The system for abductive logic programming can only abduce a set of ground literals as assumptions, not general clauses. Therefore, the only symbolic knowledge it can learn is specific ground literals, not general quantified clauses (e.g., Horn rules). The fixed knowledge B, which apparently must be prespecified and cannot be learned or modified by learning, must be carefully crafted to allow the abducibles to represent the requisite symbolic knowledge to be learned as a set of ground literals (a toy sketch of what I mean appears below). This seems very limiting and requires carefully hand-crafting the actual symbolic knowledge B, which cannot be learned. How could B be automatically learned or revised? The parameter M seems like a fairly ad hoc hyperparameter that must be manually tuned for a particular problem. How is this parameter set?

The test problem of visual equation classification seems very contrived and very toy. It seems the knowledge base B had to be carefully crafted just for this specific problem, and this knowledge cannot be learned or modified. It would be good to show the actual clauses in B. I would have liked to have seen an application of the approach to a more realistic, real-world problem rather than this single, highly artificial problem.
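To spell out the restriction I am describing, here is a deliberately tiny sketch of my own (Python pseudocode for an abductive search over ground literals; the toy rule and facts are invented for illustration and have nothing to do with the authors' actual knowledge base B):

```python
from itertools import combinations

# B: a fixed, hand-written background rule; the rule itself is never learned or revised.
def explains(facts):
    # "tweety flies" holds iff tweety is a bird and not abnormal (toy stand-in for B)
    return ("bird", "tweety") in facts and ("abnormal", "tweety") not in facts

# The abducibles are a finite set of candidate *ground literals*.
abducibles = [("bird", "tweety"), ("abnormal", "tweety")]

# Abduction = search over subsets of ground literals under which B explains the observation.
explanations = [set(subset)
                for r in range(len(abducibles) + 1)
                for subset in combinations(abducibles, r)
                if explains(set(subset))]
print(min(explanations, key=len))  # {('bird', 'tweety')} -- a set of ground facts;
                                   # a new quantified clause (e.g. a Horn rule) can never
                                   # appear here, because it is not in the search space.
```

The point is only the shape of the search space: everything rule-like lives in the fixed, hand-written B, and abduction can add nothing but ground facts of the kinds B already anticipates.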
Overall, I mildly recommend acceptance, since I think integrating neural and symbolic learning and reasoning is a very important problem and, despite its limitations, the paper presents an interesting new approach to doing so.