NeurIPS 2020

Interstellar: Searching Recurrent Architecture for Knowledge Graph Embedding

Review 1

Summary and Contributions: The paper proposes SRAP (Search Recurrent Architectures as the Path-Interstellar), which formulates the path Interstellar as a recurrent neural architecture search problem for learning short-term and long-term information from relational paths.

Strengths: 1. The paper analyzes the difficulty and importance of using relational paths to learn short-term and long-term information in KGs. 2. The paper proposes a domain-specific search space for SRAP. 3. The paper designs a hybrid-search algorithm to search efficiently.

Weaknesses: 1. Section 3.2.2 "Hybird-search Algorithm". Should this be "Hybrid-search Algorithm"? 2. As the proposed method needs to search through paths for long-term information, it is likely to encounter conflicting facts or relations. How such conflicts are handled is not mentioned in the paper. 3. It would be interesting to include two extra methods, SRAP with only the macro level and SRAP with only the micro level, to illustrate the effectiveness of the macro/micro split.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Some relevant work was not compared.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper proposes to search for the "path Interstellar" (a recurrent architecture) for learning representations in knowledge graphs. It addresses the problem of learning short- and long-term information on a relational path, which is extremely important for representing KGs. Thanks to the well-defined search space and the novel hybrid search algorithm, the path Interstellar adapts well to different tasks, as indicated by the thorough experiments.

Strengths:
- Clear motivation. The motivation for defining the "path Interstellar" is strong and clearly stated. By comparing the learning ability of triplet-based, path-based, and GCN-based methods, the path Interstellar (Definition 1) is proposed as the basic model to learn from KGs. This motivation is also verified by a case study on synthetic data (experiments in Section 4.2).
- Domain-specific and well-defined search space. The authors propose a novel recurrent search space specific to the path learning problem. The searched components are motivated either by models in the literature (combinators, activations) or by the learning problem (connections). The short- and long-term information can then be controlled by searching in this space.
- Novel search algorithm. The hybrid search algorithm is a novel and attractive solution to the efficiency problem of stand-alone search and the robustness problem of one-shot search. The analysis of parameter sharing and the hybrid solution may have a broader impact on the neural architecture search area.
- Extensive experiments. Apart from the case study in Section 4.2, the authors show the superiority of the models searched by SRAP on entity alignment and link prediction tasks in Section 4.3. The efficiency of the hybrid-search algorithm is shown in Section 4.4, along with the time analysis in Section 4.5. Additional experiments to understand the problem of parameter sharing are also clearly presented in Appendix B.2.

Weaknesses:
- Even though the efficiency of the search algorithm is improved, the search still takes tens of hours. It seems that the search is conducted independently on each dataset. However, the data statistics indicate similarities among some of them, e.g., DBP-WD and DBP-YG. Exploiting these similarities could be helpful for this problem.
- In the setup, the paths are sampled from the graph and then learned by the searched models. It is unclear whether the search results would be affected if the sampled paths changed.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes.
- Theory: The difference from previous knowledge graph embedding models is shown in Table 1, and the difference from previous NAS problems is discussed in Section 3.
- Practice: The empirical results are compared with both embedding models in Section 4.3 and NAS algorithms in Section 4.4.

Reproducibility: Yes

Additional Feedback: Overall, this is a clearly written, well-motivated, and novel work on learning from knowledge graphs. Here are some points and questions that could lead to further improvements. 1. It is well known that the natural policy gradient is costly. The authors could add some discussion of the efficiency of the natural gradient algorithm. 2. The path ranking algorithm in "Random Walk Inference and Learning in A Large Scale Knowledge Base" is a well-known method for learning paths in knowledge graphs. It would be better to include this paper as a reference. 3. In the experimental setup, why not sample the paths and train the model simultaneously? 4. If the graph evolves, e.g., by adding edges, entities, or relations, can the algorithm adapt to the new graph? ----------------------------- I have read the authors' rebuttal and decided not to change my score.

Review 3

Summary and Contributions: The paper presents a neural architecture search (NAS) model for path-based KG embedding. The authors also propose a hybrid-search algorithm to search for optimal connection patterns from the input path embeddings to the output representations. The novelty of this model lies in its ability to adaptively use different network architectures to model short-term or long-term dependencies in KG paths. The experiments demonstrate the effectiveness of the proposed KG embedding model and search algorithm.

Strengths: To me, the proposed neural architecture search (NAS) model for path-based KG embedding is interesting and novel. It is also relevant to NeurIPS. The experiments are extensive, including two popular KG embedding tasks on the corresponding benchmark datasets. The results show that the proposed model outperforms the SOTA path-based KG embedding models on both tasks.

Weaknesses: It seems that the proposed model does not do well in terms of efficiency. It costs a lot of time for the entity alignment and link prediction tasks even on the small benchmark datasets. How does it scale to large datasets? I think this is the biggest weakness of the proposed model.

Correctness: Overall, the claims and the proposed method are reasonable. The experiments are conducted on two popular KG embedding tasks (link prediction and entity alignment) using benchmark datasets. One concern I have regarding the experiments is that some newer models for link prediction or entity alignment, e.g., CompGCN (although it is not a path-based model), have been proposed in recent years, and it would be better to add them for comparison to make the paper more convincing.

Clarity: Overall, this paper is well written. But some sentences are somewhat difficult for me to understand. For example, lines 31-32 say "based on the relational paths, the triplet-based models work on every triplet (s_i, r_i, o_i) individually". What does it mean? It would be better to clarify such sentences.

Relation to Prior Work: Yes. The authors claim that the paper is the first work applying neural architecture search methods to KG embedding tasks, and the related work section provides a detailed introduction to both sides. Table 1 also gives a clear comparison of how the proposed method differs from existing KG embedding models. Good job! By the way, the citation numbers of R-GCN and GCN-Align in Table 1 are missing. Besides, I also know a related work, AutoSF, which should be added for discussion or comparison: Yongqi Zhang, Quanming Yao, Wenyuan Dai, Lei Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. ICDE 2020: 433-444.

Reproducibility: Yes

Additional Feedback: Please see the detailed comments. === after rebuttal === I have read the authors' response. Thanks for addressing my concerns.

Review 4

Summary and Contributions: This research applied neural architecture search (NAS) to the path-based knowledge graph embedding problem, designed the NAS search space and search algorithm for the problem, and demonstrated the necessity and effectiveness of the algorithm through experiments.

Strengths: Due to the unique input characteristics of entities and relations in the path-based knowledge graph embedding problem, a specific recurrent function is necessary, such as the manually designed RSN. The authors claim that a specific structure needs to be designed for each specific KG problem, so the NAS method is suitable in this scenario. The performance achieved by the NAS algorithm on the entity alignment and link prediction tasks exceeds most baselines. It seems that the paper is the first study to apply NAS methods to path-based knowledge graph embedding problems.

Weaknesses: The weakness of one-shot NAS methods in complex search spaces is well known, and a common solution is to trade off computational complexity against performance. The algorithm proposed in this paper seeks this balance by allocating different resources to different subspaces of the search space, and claims that it can be generalized to other complex NAS problems. However, the trade-off between computational complexity and performance has already been explored extensively, so the NAS algorithm in this paper is not actually very novel.

Correctness: The authors' claim that the NAS algorithm can achieve better results on the path-based knowledge graph embedding problem makes sense, and the study supports this claim empirically in Sections 4.2/4.3. What is the difference between S1 and S2 in Section 4.2? Why do p2 and p3 behave differently on S1 but the same on S2? In the comparison of Section 4.4, for DARTS, the gradient is obtained from the loss on the training set, since the validation metric is not differentiable. Is it fair to compare it with an algorithm optimized based on the validation metric?

Clarity: The writing of the article is good. There are a few errors, such as "uptimize" on line 179 and "figure 2" on line 206, which should be "figure 3", and so on.

Relation to Prior Work: The article compares its search space with previous works such as PTransE, Chains, RSN, etc., and shows the benefits of the NAS method. If the article claims that the proposed NAS algorithm is sufficiently innovative, it needs to be compared more elaborately with additional baselines. In addition, the content of this paper is almost completely consistent with

Reproducibility: Yes

Additional Feedback: