Review for NeurIPS paper: Inductive Quantum Embedding

NeurIPS 2020

Inductive Quantum Embedding

Review 1

Summary and Contributions: The paper extends a technique for geometrical embedding of logical hierarchies, known as 'quantum embedding' introduced by [Garg 2019]. Quantum Embedding encodes entities as vectors in a high-dimensional space, and concepts as linear subspaces, where the probability that an entity belongs to a subspace is proportional to the squared norm of it's projection. This principle mimics the principle of measurement in quantum mechanics, which leads to the name of the procedure (Note that this is a completely classical algorithm and does not have any relationship to quantum computing). Furthermore, it is desired that the subspaces obey certain relationships that would encode their logical structure, for example the negation of a subspace is it's orthogonal complement, the AND is it's intersection and the OR is it's vector sum. The main question that is addressed in the paper is how one can construct embeddings that approximately satisfy these constraints. In the original work of [Garg et. al 2019] the desired properties are encoded in a non-convex loss function that is minimized by SGD. The current paper reformulates this optimization problem and claims 2 main benefits over previous work. 1) The procedure is inductive and provides and allows the embedding of a new test point to be computed incrementally. 2) An alternating minimization method is developed for optimizing the loss function, that is empirically substantially faster than the scheme developed in [Garg 2019]. The scheme is empirically tested on a task for Fine-Grained Entity Type Classification and achieves state of the art accuracy on benchmark test sets, and close to the state of the art in F1 and F2, while running much faster than the previous procedure of Garg et. al.

Strengths: Quantum Embedding seems like a promising approach and thus improvements to the pipeline for computing it are quite valuable, especially given the non-convex regime of optimization. The ability to incrementally compute test embeddings is also a massive improvement at inference time.

Weaknesses: The state of the art results accuracy presented seen to be a feature more of the quantum embedding approach than the exact improvements presented here. It would be nice to have an accuracy comparison with the original method of Garg et. al. (EDIT: In the author feedback it is pointed out that such a comparison does exist in the paper. I have increased my score to reflect this.) The observations made regarding the geometry of the computed embeddings are not very surprising or informative. Rotational invariance of the embedding is almost necessitated since every quantity of interest in the embeddings is rotationally invariant. The discussion at the end of Section 3, mostly contributes to understanding the role of various optimization terms and hyperparameters rather than the embedding itself.

Correctness: I did not find any issues with correctness.

Clarity: The paper is mostly clear but can be a little difficult to parse for someone new to the fields of Knowledge Representation, and the associated tasks in machine learning and NLP. A gentler introduction to these would have been appreciated.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This submission is continued research on the use of quantum logic inspired embedding for knowledge representation (KR). It provides two improvements over the prior-art [Garg et al, NeurIPS 19]: (1) allowing inductive reasoning of quantum embedding; (2) a faster training method (via empirical comparison).

Strengths: The new inductive reasoning functionality is highly desirable for KR. The formulation of the Inductive Quantum Embedding problem is nice and intuitive. The empirical evaluation part is also well designed.

Weaknesses: Although one can intuitively understand a few nice properties of the Inductive Quantum Embedding (IQE) problem, it is not clear whether the current formulation is the only choice. Since the evaluation of this optimization formulation, especially its efficiency, is fully empirical, it would be more convincing if one can show other simple alternatives of IQE is less promising.

Correctness: The theoretical part looks sound although the reviewer hasn’t checked all the details. The empirical part is solid.

Clarity: Mostly well written.

Relation to Prior Work: It would be better if authors could elaborate more about the tasks implemented in [Garg et al 19]. The review had to read the original paper to see why inductive reasoning was not implemented in [Garg 19]. Post-Author-Response: thanks for promising to include more details about previous work.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: The paper proposes two advancements over Quantum Embedding: making it inductive instead of transductive and making it faster. The authors show an application to fine-grained entity type classification. In this case, the authors compute the quantum embeddings of the entities in the knowledge base and then train a NN to predict the embeddings. This NN is then used at test time to produce the embedding of the test entity that is used to determine class membership. Differently from the original QE paper, they do not use SGD to compute the embeddings but an alternating optimization algorithm. The author show that their algorithm is 9 times faster than SGD. Application to the FIGER dataset shows that the approach can achieve the best accuracy, even though the second best F1.

Strengths: The authors tackle an interesting problem, that of embedding entities of knowledge bases in a way such that the logical structure of the kb is preserved. Quantum embedding is shown to be more accurate that for example Glove embedding (figure 3 and 4). The proposal makes it inductive permitting its use to classify unseen entities. Moreover, the proposal is faster than the previous algorithm.

Weaknesses: While the algorithm is shown to be faster than QE, the comparison does not take into account the time for training the NN. The performance on FIGER are second best for F1, even though the method is more generally applicable than the best one for F1.

Correctness: The theoretical claims seem sound, the main proofs are in supplementary material The proofs are quite involved, I could follow the general reasoning but sometimes not the details.

Clarity: The paper is well written, the concept are clearly explained.

Relation to Prior Work: Prior work seems to be taken correctly into account.

Reproducibility: Yes

Additional Feedback: Please report also on NN training time. --------------------- After reading the other reviews and the feedback, I believe the authors have satisfactorily addressed the comments so I will keep my score