{"title": "Embedding Logical Queries on Knowledge Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 2026, "page_last": 2037, "abstract": "Learning low-dimensional embeddings of knowledge graphs is a powerful approach used to predict unobserved or missing edges between entities. However, an open challenge in this area is developing techniques that can go beyond simple edge prediction and handle more complex logical queries, which might involve multiple unobserved edges, entities, and variables. For instance, given an incomplete biological knowledge graph, we might want to predict \"em what drugs are likely to target proteins involved with both diseases X and Y?\" -- a query that requires reasoning about all possible proteins that might interact with diseases X and Y. Here we introduce a framework to efficiently make predictions about conjunctive logical queries -- a flexible but tractable subset of first-order logic -- on incomplete knowledge graphs. In our approach, we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. By performing logical operations within a low-dimensional embedding space, our approach achieves a time complexity that is linear in the number of query variables, compared to the exponential complexity required by a naive enumeration-based approach. We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.", "full_text": "Embedding Logical Queries on Knowledge Graphs\n\nWilliam L. 
Hamilton\n\nPayal Bajaj Marinka Zitnik Dan Jurafsky\u2020\n\nJure Leskovec\n\n{wleif, pbajaj, jurafsky}@stanford.edu, {jure, marinka}@cs.stanford.edu\n\nStanford University, Department of Computer Science, \u2020Department of Linguistics\n\nAbstract\n\nLearning low-dimensional embeddings of knowledge graphs is a powerful ap-\nproach used to predict unobserved or missing edges between entities. However,\nan open challenge in this area is developing techniques that can go beyond simple\nedge prediction and handle more complex logical queries, which might involve\nmultiple unobserved edges, entities, and variables. For instance, given an incom-\nplete biological knowledge graph, we might want to predict what drugs are likely\nto target proteins involved with both diseases X and Y?\u2014a query that requires\nreasoning about all possible proteins that might interact with diseases X and Y.\nHere we introduce a framework to ef\ufb01ciently make predictions about conjunctive\nlogical queries\u2014a \ufb02exible but tractable subset of \ufb01rst-order logic\u2014on incomplete\nknowledge graphs. In our approach, we embed graph nodes in a low-dimensional\nspace and represent logical operators as learned geometric operations (e.g., transla-\ntion, rotation) in this embedding space. By performing logical operations within a\nlow-dimensional embedding space, our approach achieves a time complexity that\nis linear in the number of query variables, compared to the exponential complexity\nrequired by a naive enumeration-based approach. 
We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum.

1 Introduction

A wide variety of heterogeneous data can be naturally represented as networks of interactions between typed entities, and a fundamental task in machine learning is developing techniques to discover or predict unobserved edges using this graph-structured data. Link prediction [25], recommender systems [48], and knowledge base completion [28] are all instances of this common task, where the goal is to predict unobserved edges between nodes in a graph using an observed set of training edges. However, an open challenge in this domain is developing techniques to make predictions about more complex graph queries that involve multiple unobserved edges, nodes, and even variables—rather than just single edges.

One particularly useful set of such graph queries, and the focus of this work, are conjunctive queries, which correspond to the subset of first-order logic using only the conjunction and existential quantification operators [1]. In terms of graph structure, conjunctive queries allow one to reason about the existence of subgraph relationships between sets of nodes, which makes conjunctive queries a natural focus for knowledge graph applications. For example, given an incomplete biological knowledge graph—containing known interactions between drugs, diseases, and proteins—one could pose the conjunctive query: "what protein nodes are likely to be associated with diseases that have both symptoms X and Y?" In this query, the disease node is an existentially quantified variable—i.e., we only care that some disease connects the protein node to these symptom nodes X and Y.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Two example conjunctive graph queries. In the boxes we show the query, its natural language interpretation, and the DAG that specifies this query's structure. Below these boxes we show subgraphs that satisfy the query (solid lines), but note that in practice, some of these edges might be missing, and we need to predict these missing edges in order for the query to be answered. Dashed lines denote edges that are irrelevant to the query. The example on the left shows a path query on the Reddit data; note that there are multiple nodes that satisfy this query, as well as multiple paths that reach the same node. The example on the right shows a more complex query with a polytree structure on the biological interaction data.

Valid answers to such a query correspond to subgraphs. However, since any edge in this biological interaction network might be unobserved, naively answering this query would require enumeration over all possible diseases.

In general, the query prediction task—where we want to predict likely answers to queries that can involve unobserved edges—is difficult because there are a combinatorial number of possible queries of interest, and any given conjunctive query can be satisfied by many (unobserved) subgraphs (Figure 1).
For instance, a naive approach to make predictions about conjunctive queries would be the following: First, one would run an edge prediction model on all possible pairs of nodes, and—after obtaining these edge likelihoods—one would enumerate and score all candidate subgraphs that might satisfy a query. However, this naive enumeration approach could require computation time that is exponential in the number of existentially quantified (i.e., bound) variables in the query [12].

Here we address this challenge and develop graph query embeddings (GQEs), an embedding-based framework that can efficiently make predictions about conjunctive queries on incomplete knowledge graphs. The key idea behind GQEs is that we embed graph nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. After training, we can use the model to predict which nodes are likely to satisfy any valid conjunctive query, even if the query involves unobserved edges. Moreover, we can make this prediction efficiently, in time complexity that is linear in the number of edges in the query and constant with respect to the size of the input network. We demonstrate the utility of GQEs in two application studies involving networks with millions of edges: discovering new interactions in a biomedical drug interaction network (e.g., "predict drugs that might treat diseases associated with protein X") and predicting social interactions on the website Reddit (e.g., "recommend posts that user A is likely to downvote, but user B is likely to upvote").

2 Related Work

Our framework builds upon a wealth of previous research at the intersection of embedding methods, knowledge graph completion, and logical reasoning.

Logical reasoning and knowledge graphs.
Recent years have seen significant progress in using machine learning to reason with relational data [16], especially within the context of knowledge graph embeddings [6, 23, 18, 28, 29, 45], probabilistic soft logic [3], and differentiable tensor-based logic [11, 33]. However, existing work in this area primarily focuses on using logical reasoning to improve edge prediction in knowledge graphs [14, 13, 27], for example, by using logical rules as regularization [15, 20, 35, 37]. In contrast, we seek to directly make predictions about conjunctive logical queries. Another well-studied thread in this space involves leveraging knowledge graphs to improve natural language question answering (QA) [4, 5, 47]. However, the focus of these QA approaches is understanding natural language, whereas we focus on queries that are in logical form.

[Figure 1 graphic: the left box shows the query C?. ∃P : upvote(u, P) ∧ belong(P, C?), "Predict communities C? in which user u is likely to upvote a post"; the right box shows C?. ∃P : assoc(d1, P) ∧ assoc(d2, P) ∧ target(P, C?), "Predict drugs C? that might target proteins that are associated with the given disease nodes d1 and d2".]

Figure 2: Schema diagrams for the biological interaction network and the Reddit data. Note that in the Reddit data words are only used as features for posts and are not used in any logical queries. Note also that for directed relationships, we add the inverses of these relationships to allow for a richer query space.

Probabilistic databases. Our research also draws inspiration from work on probabilistic databases [9, 12]. The primary distinction between our work and probabilistic databases is the following: Whereas probabilistic databases take a database containing probabilistic facts and score queries, we seek to predict unobserved logical relationships in a knowledge graph.
Concretely, a distinguishing challenge in our setting is that while we are given a set of known edge relationships (i.e., facts), all missing edge relationships could possibly be true.

Neural theorem proving. Lastly, our work builds closely upon recent advancements in neural theorem proving [34, 43], which have demonstrated how neural networks can prove first-order logic statements in toy knowledge bases [36]. Our main contribution in this space is providing an efficient approach to embed a useful subset of first-order logic, demonstrating scalability to real-world network data with millions of edges.

3 Background and Preliminaries

We consider knowledge graphs (or heterogeneous networks) G = (V, E) that consist of nodes v ∈ V and directed edges e ∈ E of various types. We will usually denote edges e ∈ E as binary predicates e = τ(u, v), τ ∈ R, where u, v ∈ V are nodes with types γ1, γ2 ∈ Γ, respectively, and τ : γ1 × γ2 → {true, false} is the edge relation. When referring generically to nodes we use the letters u and v (with varying subscripts); however, in cases where type information is salient we will use distinct letters to denote nodes of different types (e.g., d for a disease node in a biological network), and we omit subscripts whenever possible. Finally, we use lower-case script (e.g., vi) for the actual graph nodes and upper-case script for variables whose domain is the set of graph nodes (e.g., Vi). Throughout this paper we use two real-world networks as running examples:

Example 1: Drug interactions (Figure 2.a). A knowledge graph derived from a number of public biomedical databases (Appendix B). It consists of nodes corresponding to drugs, diseases, proteins, side effects, and biological processes.
There are 42 different edge types, including multiple edge types between proteins (e.g., co-expression, binding interactions), edges denoting known drug-disease treatment pairs, and edges denoting experimentally documented side effects of drugs. In total this dataset contains over 8 million edges between 97,000 nodes.

Example 2: Reddit dynamics (Figure 2.b). We also consider a graph-based representation of Reddit, one of the most popular websites in the world. Reddit allows users to form topical communities, within which users can create and comment on posts (e.g., images, or links to news stories). We analyze all activity in 105 videogame-related communities from May 1-5th, 2017 (Appendix B). In total this dataset contains over 4 million edges denoting interactions between users, communities, and posts, with over 700,000 nodes in total (see Figure 2.b for the full schema). Edges exist to denote that a user created, "upvoted", or "downvoted" a post, as well as edges that indicate whether a user subscribes to a community.

[Figure 2 graphic: (a) the biological schema over node types DRUG, DISEASE, PROTEIN, PROCESS, and SIDE EFFECT, with edge types CAUSE, INTERACT_X, ASSOC, TARGET, HAS_FUNCTION, TREAT, and IS_A; one INTERACT_X family has 7 types (physical binding, co-expression, catalysis, activation, inhibition, etc.) and another has 29 types (cardiovascular, reproductive, cognition, etc.); (b) the Reddit schema over node types USER, COMMUNITY, WORD, and POST, with edge types SUBSCRIBE, BELONG, CONTAIN, COMMENT, CREATE, DOWNVOTE, and UPVOTE.]

3.1 Conjunctive graph queries

In this work we seek to make predictions about conjunctive graph queries (Figure 1). Specifically, the queries q ∈ Q(G) that we consider can be written as:

q = V? . ∃V1, ..., Vm : e1 ∧ e2 ∧ ... ∧ en,
where ei = τ(vj, Vk), Vk ∈ {V?, V1, ..., Vm}, vj ∈ V, τ ∈ R
or ei = τ(Vj, Vk), Vj, Vk ∈ {V?, V1, ..., Vm}, j ≠ k, τ ∈ R.    (1)

In Equation (1), V? denotes the target variable of the query, i.e., the node that we want the query to return, while V1, ..., Vm are existentially quantified bound variable nodes. The edges ei in the query can involve these variable nodes as well as anchor nodes, i.e., non-variable/constant nodes that form the input to the query, denoted in lower-case as vj.

To give a concrete example using the biological interaction network (Figure 2.a), consider the query "return all drug nodes that are likely to target proteins that are associated with a given disease node d." We would write this query as:

q = C? . ∃P : ASSOC(d, P) ∧ TARGET(P, C?),    (2)

and we say that the answer or denotation of this query, ⟦q⟧, is the set of all drug nodes that are likely to be connected to node d on a length-two path following edges that have types TARGET and ASSOC, respectively. Note that d is an anchor node of the query: it is the input that we provide. In contrast, the upper-case nodes C? and P are variables defined within the query, with the P variable being existentially quantified. In terms of graph structure, Equation (2) corresponds to a path. Figure 1 contains a visual illustration of this idea.

Beyond paths, queries of the form in Equation (1) can also represent more complex relationships. For example, the query "return all drug nodes that are likely to target proteins that are associated with the given disease nodes d1 and d2" would be written as:

C? . ∃P : ASSOC(d1, P) ∧ ASSOC(d2, P) ∧ TARGET(P, C?).

In this query we have two anchor nodes d1 and d2, and the query corresponds to a polytree (Figure 1). In general, we define the dependency graph of a query q as the graph with edges Eq = {e1, ..., en} formed between the anchor nodes v1, ..., vk and the variable nodes V?, V1, ..., Vm (Figure 1).
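For concreteness, a dependency graph of this kind can be captured with a small data structure. The sketch below (names are illustrative, not from the paper's released code) encodes the two-anchor polytree query above as its edge list and recovers its anchors (DAG sources) and target (the unique sink):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryEdge:
    relation: str   # edge type tau
    head: str       # source node/variable in the dependency DAG
    tail: str       # target node/variable

# C?. ∃P : assoc(d1, P) ∧ assoc(d2, P) ∧ target(P, C?)
query = [
    QueryEdge("assoc", "d1", "P"),
    QueryEdge("assoc", "d2", "P"),
    QueryEdge("target", "P", "C?"),
]

def anchors_and_target(edges):
    """Anchors are sources of the DAG (never a tail);
    the query target is the unique sink (never a head)."""
    heads = {e.head for e in edges}
    tails = {e.tail for e in edges}
    anchors = sorted(heads - tails)
    sinks = tails - heads
    assert len(sinks) == 1, "a valid query DAG has exactly one sink"
    return anchors, sinks.pop()

print(anchors_and_target(query))  # -> (['d1', 'd2'], 'C?')
```

This representation mirrors the validity condition stated in the text: every anchor is a source and the target variable is the single sink of the DAG.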
For a query to be valid, its dependency graph must be a directed acyclic graph (DAG), with the anchor nodes as the source nodes of the DAG and the query target as the unique sink node. The DAG structure ensures that there are no contradictions or redundancies.

Note that there is an important distinction between the query DAG, which contains variables, and a subgraph structure in the knowledge graph that satisfies this query, i.e., a concrete assignment of the query variables (see Figure 1). For instance, it is possible for a query DAG to be satisfied by a subgraph that contains cycles, e.g., by having two bound variables evaluate to the same node.

Observed vs. unobserved denotation sets. If we view edge relations as binary predicates, the graph queries defined by Equation (1) correspond to a standard conjunctive query language [1], with the restriction that we allow at most one free variable. However, unlike standard queries on relational databases, we seek to discover or predict unobserved relationships and not just answer queries that exactly satisfy a set of observed edges. Formally, we assume that every query q ∈ Q(G) has some unobserved denotation set ⟦q⟧ that we are trying to predict, and we assume that ⟦q⟧ is not fully observed in our training data. To avoid confusion on this point, we also introduce the notion of the observed denotation set of a query, denoted ⟦q⟧train, which corresponds to the set of nodes that exactly satisfy q according to our observed, training edges. Thus, our goal is to train using example query-answer pairs that are known in the training data, i.e., (q, v∗), v∗ ∈ ⟦q⟧train, so that we can generalize to parts of the graph that involve missing edges, i.e., so that we can make predictions for query-answer pairs that rely on edges which are unobserved in the training data (q, v∗), v∗ ∈ ⟦q⟧ \ ⟦q⟧train.

4 Proposed Approach

The key idea behind our approach is that we learn how to embed any conjunctive graph query into a low-dimensional space. This is achieved by representing logical query operations as geometric operators that are jointly optimized on a low-dimensional embedding space along with a set of node embeddings.

Figure 3: Overview of GQE framework. Given an input query q, we represent this query according to its DAG structure, then we use Algorithm 1 to generate an embedding of the query based on this DAG. Algorithm 1 starts with the embeddings of the query's anchor nodes and iteratively applies geometric operations P and I to generate an embedding q that corresponds to the query. Finally, we can use the generated query embedding to predict the likelihood that a node satisfies the query, e.g., by nearest neighbor search in the embedding space.

[Figure 3 graphic: an input query and its DAG, the corresponding sequence of P and I operations in the embedding space, and a final nearest neighbor lookup to find nodes that satisfy the query.]

The core of our framework is Algorithm 1, which maps any conjunctive input query q to an embedding q ∈ R^d using two differentiable operators, P and I, described below. The goal is to optimize these operators—along with embeddings for all graph nodes zv ∈ R^d, ∀v ∈ V—so that the embedding q for any query q can be generated and used to predict the likelihood that a node v satisfies the query q. In particular, we want to generate query embeddings q and node embeddings zv so that the likelihood or "score" that v ∈ ⟦q⟧ is given by the distance between their respective embeddings:¹

score(q, zv) = q · zv / (‖q‖‖zv‖).    (3)

¹We use the cosine distance, but in general other distance measures could be used.

Thus, our goal is to generate an embedding q of a query that implicitly represents its denotation ⟦q⟧; i.e., we want to generate query embeddings so that score(q, zv) = 1, ∀v ∈ ⟦q⟧ and score(q, zv) = 0, ∀v ∉ ⟦q⟧. At inference time, we take a query q, generate its corresponding embedding q, and then perform nearest neighbor search—e.g., via efficient locality sensitive hashing [21]—in the embedding space to find nodes likely to satisfy this query (Figure 3).

To generate the embedding q for a query q using Algorithm 1, we (i) represent the query using its DAG dependency graph, (ii) start with the embeddings zv1, ..., zvn of its anchor nodes, and then (iii) apply geometric operators, P and I (defined below), to these embeddings to obtain an embedding q of the query. In particular, we introduce two key geometric operators, both of which can be interpreted as manipulating the denotation set associated with a query in the embedding space.

Geometric projection operator, P. Given a query embedding q and an edge type τ, the projection operator P outputs a new query embedding q′ = P(q, τ) whose corresponding denotation is ⟦q′⟧ = ∪v∈⟦q⟧ N(v, τ), where N(v, τ) denotes the set of nodes connected to v by edges of type τ. Thus, P takes an embedding corresponding to a set of nodes ⟦q⟧ and produces a new embedding that corresponds to the union of all the neighbors of nodes in ⟦q⟧ by edges of type τ. Following a long line of successful work on encoding edge and path relationships in knowledge graphs [23, 18, 28, 29], we implement P as follows:

P(q, τ) = Rτ q,    (4)

where Rτ ∈ R^{d×d} is a trainable parameter matrix for edge type τ. In the base case, if P is given a node embedding zv and edge type τ as input, then it returns an embedding of the neighbor set N(v, τ).

Geometric intersection operator, I. Suppose we are given a set of query embeddings q1, ..., qn, all of which correspond to queries with the same output node type γ. The geometric intersection operator I takes this set of query embeddings and produces a new embedding q′ whose denotation corresponds to ⟦q′⟧ = ∩i=1,...,n ⟦qi⟧, i.e., it performs set intersection in the embedding space. While path projections of the form in Equation (4) have been considered in previous work on edge and path prediction, no previous work has considered such a geometric intersection operation. Motivated by recent advancements in deep learning on sets [32, 46], we implement I as:

I({q1, ..., qn}) = Wγ Ψ(NNk(qi), ∀i = 1, ..., n),    (5)

where NNk is a k-layer feedforward neural network, Ψ is a symmetric vector function (e.g., an elementwise mean or min over a set of vectors), Wγ, Bγ are trainable transformation matrices for each node type γ ∈ Γ, and ReLU denotes a rectified linear unit. In principle, any sufficiently expressive neural network that operates on sets could also be employed as the intersection operator (e.g., a variant of Equation 5 with more hidden layers), as long as this network is permutation invariant on its inputs [46].

Query inference using P and I.
Given the geometric projection operator P (Equation 4) and the geometric intersection operator I (Equation 5), we can use Algorithm 1 to efficiently generate an embedding q that corresponds to any DAG-structured conjunctive query q on the network. To generate a query embedding, we start by projecting the anchor node embeddings according to their outgoing edges; then if a node has more than one incoming edge in the query DAG, we use the intersection operation to aggregate the incoming information, and we repeat this process as necessary until we reach the target variable of the query. In the end, Algorithm 1 generates an embedding q of a query in O(d^2 E) operations, where d is the embedding dimension and E is the number of edges in the query DAG. Using the generated embedding q we can predict nodes that are likely to satisfy this query by doing a nearest neighbor search in the embedding space. Moreover, since the set of nodes is known in advance, this nearest neighbor search can be made highly efficient (i.e., sublinear in |V|) using locality sensitive hashing, at a small approximation cost [21].

4.1 Theoretical analysis

Formally, we can show that in an ideal setting Algorithm 1 can exactly answer any conjunctive query on a network. This provides an equivalence between conjunctive queries on a network and sequences of geometric projection and intersection operations in an embedding space.

Theorem 1.
Given a network G = (V, E), there exists a set of node embeddings zv ∈ R^d, ∀v ∈ V, geometric projection parameters Rτ ∈ R^{d×d}, ∀τ ∈ R, and geometric intersection parameters Wγ, Bγ ∈ R^{d×d}, ∀γ ∈ Γ with d = O(|V|) such that for all DAG-structured queries q ∈ Q(G) containing E edges the following holds: Algorithm 1 can compute an embedding q of q using O(E) applications of the geometric operators P and I such that

score(q, zv) = { 0       if v ∉ ⟦q⟧train
               { α > 0   if v ∈ ⟦q⟧train,

i.e., the observed denotation set of the query ⟦q⟧train can be exactly computed in the embedding space by Algorithm 1 using O(E) applications of the geometric operators P and I.

Theorem 1 is a consequence of the correspondence between tensor algebra and logic [11] combined with the efficiency of DAG-structured conjunctive queries [1], and the full proof is in Appendix A.

4.2 Node embeddings

In principle any efficient differentiable algorithm that generates node embeddings can be used as the base of our query embeddings. Here we use a standard "bag-of-features" approach [44]. We assume that every node of type γ has an associated binary feature vector xu ∈ Z^mγ, and we compute the node embedding as

zu = Zγ xu / |xu|,    (6)

where Zγ ∈ R^{d×mγ} is a trainable embedding matrix. In our experiments, the xu vectors are one-hot indicator vectors (e.g., each node gets its own embedding) except for posts in Reddit, where the features are binary indicators of what words occur in the post.

4.3 Other variants of our framework

Above we outlined one concrete implementation of our GQE framework.
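Restating the operators as code may help make this implementation concrete. The following NumPy sketch is illustrative only: it uses random, untrained parameters, omits the per-branch network NNk, and uses an elementwise min as the symmetric function Ψ. It chains Equation (4) projections with an Equation (5)-style intersection, then scores candidate nodes with the cosine score of Equation (3):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy parameters: one projection matrix per edge type (Eq. 4) and a
# single transformation matrix standing in for W_gamma (Eq. 5).
R = {"assoc": rng.normal(size=(d, d)), "target": rng.normal(size=(d, d))}
W = rng.normal(size=(d, d))

def project(q, tau):
    # P(q, tau) = R_tau q  (Eq. 4)
    return R[tau] @ q

def intersect(qs):
    # I({q_1..q_n}) ~= W_gamma Psi(...), with Psi = elementwise min
    return W @ np.min(np.stack(qs), axis=0)

def score(q, z):
    # cosine score of Eq. 3
    return float(q @ z / (np.linalg.norm(q) * np.linalg.norm(z)))

# Embed the query C?. ∃P: assoc(d1,P) ∧ assoc(d2,P) ∧ target(P,C?):
z_d1, z_d2 = rng.normal(size=d), rng.normal(size=d)
q_P = intersect([project(z_d1, "assoc"), project(z_d2, "assoc")])
q = project(q_P, "target")

candidates = rng.normal(size=(5, d))  # toy embeddings of drug nodes
best = int(np.argmax([score(q, z) for z in candidates]))
```

With trained parameters, the final step would be replaced by an (approximate) nearest neighbor lookup over all node embeddings of the correct type, as described in Section 4.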
However, in principle, our framework can be implemented with alternative geometric projection P and intersection I operators. In particular, the projection operator can be implemented using any composable, embedding-based edge prediction model, as defined in Guu et al., 2015 [18]. For instance, we also consider variants of the geometric projection operator based on DistMult [45] and TransE [6]. In the DistMult model the matrices in Equation (4) are restricted to be diagonal, whereas in the TransE variant we replace Equation (4) with a translation operation, PTransE(q, τ) = q + rτ. Note, however, that our proof of Theorem 1 relies on specific properties of the projection operator described in Equation (4).

4.4 Model training

The geometric projection operator P, intersection operator I, and node embedding parameters can be trained using stochastic gradient descent on a max-margin loss. To compute this loss given a training query q, we uniformly sample a positive example node v∗ ∈ ⟦q⟧train and negative example node vN ∉ ⟦q⟧train from the training data and compute:

L(q) = max(0, 1 − score(q, zv∗) + score(q, zvN)).

For queries involving intersection operations, we use two types of negative samples: "standard" negative samples are randomly sampled from the subset of nodes that have the correct type for a query; in contrast, "hard" negative samples correspond to nodes that satisfy the query if a logical conjunction is relaxed to a disjunction. For example, for the query "return all drugs that are likely to treat diseases d1 and d2", a hard negative example would be drugs that treat d1 but not d2.

5 Experiments

We run experiments on the biological interaction (Bio) and Reddit datasets (Figure 2).
Code and data are available at https://github.com/williamleif/graphqembed.

5.1 Baselines and model variants

We consider variants of our framework using the projection operator in Equation 4 (termed Bilinear), as well as variants using TransE and DistMult as the projection operators (see Section 4.3). All variants use a single-layer neural network in Equation (5). As a baseline, we consider an enumeration approach that is trained end-to-end to perform edge prediction (using Bilinear, TransE, or DistMult) and scores possible subgraphs that could satisfy a query by taking the product (i.e., a soft-AND) of their individual edge likelihoods (using a sigmoid with a learned scaling factor to compute the edge likelihoods). However, this enumeration approach has exponential time complexity w.r.t. the number of bound variables in a query and is intractable in many cases, so we only include it as a comparison point on the subset of queries with no bound variables. (A slightly less naive baseline variant where we simply use one-hot embeddings for nodes is similarly intractable due to having quadratic complexity w.r.t. the number of nodes.) As additional ablations, we also consider simplified variants of our approach where we only train the projection operator P on edge prediction and where the intersection operator I is just an elementwise mean or min. This tests how well Algorithm 1 can answer conjunctive queries using standard node embeddings that are only trained to perform edge prediction. For all baselines and variants, we used PyTorch [30], the Adam optimizer, an embedding dimension d = 128, a batch size of 256, and tested learning rates {0.1, 0.01, 0.001}.

5.2 Dataset of train and test queries

To test our approach, we sample sets of train/test queries from a knowledge graph, i.e., pairs (q, v∗), where q is a query and v∗ is a node that satisfies this query.
In our sampling scheme, we sample a fixed number of example queries for each possible query DAG structure (Figure 4, bottom). For each possible DAG structure, we sampled queries uniformly at random using a simple rejection sampling approach (described below).

To sample training queries, we first remove 10% of the edges uniformly at random from the graph and then perform sampling on this downsampled training graph.

Table 1: Performance on test queries for different variants of our framework. Results are macro-averaged across queries with different DAG structures (Figure 4, bottom). For queries involving intersections, we evaluate both using standard negative examples as well as "hard" negative examples (Section 4.4), giving both measures equal weight in the macro-average. Figure 4 breaks down the performance of the best model by query type.

                            Bio data                     Reddit data
                    Bilinear  DistMult  TransE   Bilinear  DistMult  TransE
GQE training   AUC    91.0      90.7     88.7      76.4      73.3     75.9
               APR    91.5      91.3     89.9      78.7      74.7     78.4
Edge training  AUC    79.2      86.7     78.3      59.8      72.2     73.0
               APR    78.6      87.5     81.6      60.1      73.5     75.5

Figure 4: AUC of the Bilinear GQE model on both datasets, broken down according to test queries with different dependency graph structures, as well as test queries using standard or hard negative examples.

To sample test queries, we sample from the original graph (i.e., the complete graph without any removed edges), but we ensure that the test query examples are not directly answerable in the training graph. In other words, we ensure that every test query relies on at least one deleted edge (i.e., that for every test query example (q, v∗), v∗ ∉ ⟦q⟧train).
This train/test setup ensures that a trivial baseline—which simply tries to answer a query by template matching on the observed training edges—will have an accuracy that is no better than random guessing on the test set, i.e., that every test query can only be answered by inferring unobserved relationships.

Sampling details. In our sampling scheme, we sample a fixed number of example queries for each possible query DAG structure. In particular, given a DAG structure with E edges—specified by a vector d = [d1, d2, ..., dE] of node out-degrees, which are sorted in topological order [42]—we sample edges using the following procedure: first we sample the query target node (i.e., the root of the DAG); next, we sample d1 out-edges from this node and add each of the sampled nodes to a queue; we then iteratively pop nodes from the queue, sampling di+1 neighbors from the ith node popped from the queue, and so on. If a node has di = 0, then it corresponds to an anchor node in the query. We use simple rejection sampling to cope with cases where the sampled nodes cannot satisfy a particular DAG structure, i.e., we repeatedly sample until we obtain S example queries satisfying a particular query DAG structure.

Training, validation, and test set details. For training, we sampled 10^6 queries with two edges and 10^6 queries with three edges, with equal numbers of samples for each different type of query DAG structure. For testing, we sampled 10,000 test queries for each DAG structure with two or three edges and ensured that these test queries involved missing edges (see above). We further sampled 1,000 test queries for each possible DAG structure to use for validation (e.g., for early stopping).
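The queue-based sampling procedure described above can be sketched as follows. This is a minimal sketch with hypothetical names: `graph` maps each node to its list of (relation, neighbor) out-edges, and a single failed attempt returns `None` so that the caller can retry, i.e., rejection sampling.

```python
import random
from collections import deque

def sample_query(graph, out_degrees):
    """Sample one query matching the DAG structure given by `out_degrees`,
    a topologically sorted list [d1, ..., dE] of node out-degrees.
    Returns (target, edges) on success, or None when the sampled nodes
    cannot satisfy the structure (caller retries: rejection sampling)."""
    target = random.choice(list(graph))   # root of the query DAG
    queue = deque([target])
    edges, i = [], 0
    while queue and i < len(out_degrees):
        node = queue.popleft()
        d = out_degrees[i]                # out-degree for the ith popped node
        i += 1
        if d == 0:
            continue                      # anchor node: no further out-edges
        if len(graph[node]) < d:
            return None                   # reject: structure cannot be satisfied here
        for rel, nbr in random.sample(graph[node], d):
            edges.append((node, rel, nbr))
            queue.append(nbr)
    return target, edges
```

In this sketch, repeatedly calling `sample_query` until it succeeds corresponds to sampling until S example queries satisfying the desired DAG structure are obtained.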
We used all edges in the training graph as training examples for size-1 queries (i.e., edge prediction), and we used a 90/10 split of the deleted edges to form the test and validation sets for size-1 queries.

5.3 Evaluation metrics

For a test query q we evaluate how well the model ranks a node v∗ that does satisfy this query, v∗ ∈ ⟦q⟧, compared to negative example nodes that do not satisfy it, i.e., vN ∉ ⟦q⟧. We quantify this performance using the ROC AUC score and average percentile rank (APR). For the APR computation, we rank the true node against min(1000, |{v : v ∉ ⟦q⟧}|) negative examples (that have the correct type for the query) and compute the percentile rank of the true node within this set.

Table 2: Comparing GQE to an enumeration baseline that performs edge prediction and then computes logical conjunctions as products of edge likelihoods. AUC values are reported (with analogous results holding for the APR metric). Bio-H and Reddit-H denote evaluations where hard negative examples are used (see Section 5.3).

                  Bio    Bio-H   Reddit  Reddit-H
Enum. Baseline   0.985   0.731   0.910    0.643
GQE              0.989   0.743   0.948    0.645
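The per-query percentile-rank computation described above can be sketched as follows (a minimal illustration with hypothetical names; the APR metric averages this quantity over all test queries, and sampling negatives of the correct type is omitted):

```python
def percentile_rank(true_score, negative_scores, max_negatives=1000):
    # Rank the true node's score against at most `max_negatives` negative
    # examples; 100.0 means the true node outranks every negative.
    negs = negative_scores[:max_negatives]
    beaten = sum(1 for s in negs if true_score > s)
    return 100.0 * beaten / len(negs)
```

For example, a true node scoring 0.9 against negatives scoring [0.1, 0.5, 0.95, 0.2] outranks three of the four, giving a percentile rank of 75.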
For queries containing intersections, we compute both of these metrics using standard negative examples as well as "hard" negative examples, where "hard" negative examples are nodes that would satisfy the query if a logical conjunction were relaxed to a disjunction.

5.4 Results and discussion

Table 1 contains the performance results for three variants of GQEs based on bilinear transformations (i.e., Equation 4), DistMult, and TransE, as well as the ablated models that are only trained on edge prediction (denoted Edge Training).² Overall, we can see that the full Bilinear model performs the best, with an AUC of 91.0 on the Bio data and an AUC of 76.4 on the Reddit data (macro-averaged across all query DAG structures of size 1-3). In Figure 4 we break down performance across different types of query dependency graph structures, and we can see that the model's performance on complex queries is very strong (relative to its performance on simple edge prediction), with long paths being the most difficult type of query.

Table 2 compares the best-performing GQE model to the best-performing enumeration-based baseline. The enumeration baseline is computationally intractable on queries with bound variables, so this comparison is restricted to the subset of queries with no bound variables. Even in this restricted setting, we see that GQE consistently outperforms the baseline. This demonstrates that performing logical operations in the embedding space is not only more efficient but also an effective alternative to enumerating the product of edge likelihoods, even in cases where the latter is feasible.

The importance of training on complex queries. We found that explicitly training the model to predict complex queries was necessary to achieve strong performance (Table 1).
Averaging across all model variants, we observed an average AUC improvement of 13.3% on the Bio data and 13.9% on the Reddit data (both p < 0.001, Wilcoxon signed-rank test) when using full GQE training compared to Edge Training. This shows that training on complex queries is a useful way to impose a meaningful logical structure on an embedding space and that optimizing for edge prediction alone does not necessarily lead to embeddings that are useful for more complex logical queries.

6 Conclusion

We proposed a framework to embed conjunctive graph queries, demonstrating how to map a practical subset of logic to efficient geometric operations in an embedding space. Our experiments showed that our approach can make accurate predictions on real-world data with millions of relations. Of course, there are limitations of our framework: for instance, it cannot handle logical negation or disjunction, and we also do not consider features on edges. Natural future directions include generalizing the space of logical queries—for example, by learning a geometric negation operator—and using graph neural networks [7, 17, 19] to incorporate richer feature information on nodes and edges.

Acknowledgements

The authors thank Alex Ratner, Stephen Bach, and Michele Catasta for their helpful discussions and comments on early drafts. This research has been supported in part by NSF IIS-1149837, DARPA SIMPLEX, the Stanford Data Science Initiative, Huawei, and the Chan Zuckerberg Biohub. WLH was also supported by the SAP Stanford Graduate Fellowship and an NSERC PGS-D grant.

²We selected the best Ψ function and learning rate for each variant on the validation set.

References

[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases: The Logical Level. Addison-Wesley, 1995.

[2] M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, J. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, et al.
Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25, 2000.

[3] S. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. JMLR, 18(109):1–67, 2017.

[4] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on Freebase from question-answer pairs. In EMNLP, 2013.

[5] A. Bordes, S. Chopra, and J. Weston. Question answering with subgraph embeddings. In EMNLP, 2014.

[6] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, 2013.

[7] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

[8] A. Brown and C. Patel. A standard database for drug repositioning. Scientific Data, 4:170029, 2017.

[9] R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In VLDB, 1987.

[10] A. Chatr-Aryamontri et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res., 43(D1):D470–D478, 2015.

[11] W. Cohen. TensorLog: A differentiable deductive database. arXiv:1605.06523, 2016.

[12] N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2007.

[13] R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. In ICLR, 2018.

[14] R. Das, A. Neelakantan, D. Belanger, and A. McCallum. Chains of reasoning over entities, relations, and text using recurrent neural networks. In EACL, 2016.

[15] T. Demeester, T. Rocktäschel, and S. Riedel. Lifted rule injection for relation embeddings. In EMNLP, 2016.

[16] L. Getoor and B. Taskar. Introduction to Statistical Relational Learning. MIT Press, 2007.

[17] J.
Gilmer, S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.

[18] K. Guu, J. Miller, and P. Liang. Traversing knowledge graphs in vector space. In EMNLP, 2015.

[19] W. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 2017.

[20] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing. Harnessing deep neural networks with logic rules. In ACL, 2016.

[21] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM Symp. Theory Comput., 1998.

[22] A. Kahn. Topological sorting of large networks. Communications of the ACM, 5(11):558–562, 1962.

[23] D. Krompaß, M. Nickel, and V. Tresp. Querying factorized probabilistic triple databases. In International Semantic Web Conference, pages 114–129, 2014.

[24] M. Kuhn et al. The SIDER database of drugs and side effects. Nucleic Acids Res., 44(D1):D1075–D1079, 2015.

[25] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Assoc. Inform. Sci. and Technol., 58(7):1019–1031, 2007.

[26] J. Menche et al. Uncovering disease-disease relationships through the incomplete interactome. Science, 347(6224):1257601, 2015.

[27] A. Neelakantan, B. Roth, and A. McCallum. Compositional vector space models for knowledge base inference. In AAAI, 2015.

[28] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proc. IEEE, 104(1):11–33, 2016.

[29] M. Nickel, V. Tresp, and H. Kriegel. A three-way model for collective learning on multi-relational data. In ICML, 2011.

[30] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.

[31] J.
Pi\u00f1ero, N. Queralt-Rosinach, \u00c0. Bravo, J. Deu-Pons, A. Bauer-Mehren, M. Baron, F. Sanz,\nand L. Furlong. DisGeNET: a discovery platform for the dynamical exploration of human\ndiseases and their genes. Database, 2015, 2015.\n\n[32] C. Qi, H. Su, K. Mo, and L. Guibas. Pointnet: Deep learning on point sets for 3d classi\ufb01cation\n\nand segmentation. In CVPR, 2017.\n\n[33] G. Ramanathan. Towards a model theory for distributed representations. In AAAI Spring\n\nSymposium Series, 2015.\n\n[34] T. Rockt\u00e4schel. Combining representation learning with logic for language processing.\n\narXiv:1712.09687, 2017.\n\n[35] T. Rockt\u00e4schel, M. Bo\u0161njak, S. Singh, and S. Riedel. Low-dimensional embeddings of logic. In\n\nACL Semantic Parsing, pages 45\u201349, 2014.\n\n[36] T. Rockt\u00e4schel and S. Riedel. End-to-end differentiable proving. In NIPS, 2017.\n\n[37] T. Rockt\u00e4schel, S. Singh, and S. Riedel. Injecting logical background knowledge into embed-\n\ndings for relation extraction. In NAACL HLT, pages 1119\u20131129, 2015.\n\n[38] T. Rolland et al. A proteome-scale map of the human interactome network. Cell, 159(5):1212\u2013\n\n1226, 2014.\n\n[39] D. Szklarczyk et al. STITCH 5: augmenting protein\u2013chemical interaction networks with tissue\n\nand af\ufb01nity data. Nucleic Acids Res., 44(D1):D380\u2013D384, 2015.\n\n[40] D. Szklarczyk et al. The STRING database in 2017: quality-controlled protein\u2013protein associa-\n\ntion networks, made broadly accessible. Nucleic Acids Res., 45(D1):D362\u2013D368, 2017.\n\n[41] N. Tatonetti et al. Data-driven prediction of drug effects and interactions. Science Translational\n\nMedicine, 4(125):12531, 2012.\n\n[42] K. Thulasiraman and Madisetti N. Swamy. Graphs: theory and algorithms. John Wiley & Sons,\n\n2011.\n\n[43] M. Wang, Y. Tang, J. Wang, and J. Deng. Premise selection for theorem proving by deep graph\n\nembedding. In NIPS, 2017.\n\n[44] L. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, and J. 
Weston. StarSpace: Embed all the things! In AAAI, 2017.

[45] B. Yang, W. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. In ICLR, 2015.

[46] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. Salakhutdinov, and A. J. Smola. Deep sets. In NIPS, 2017.

[47] Y. Zhang, H. Dai, Z. Kozareva, A. Smola, and L. Song. Variational reasoning for question answering with knowledge graph. In AAAI, 2018.

[48] T. Zhou, J. Ren, M. Medo, and Y. Zhang. Bipartite network projection and personal recommendation. Phys. Rev. E, 76(4):046115, 2007.

[49] M. Zitnik, M. Agrawal, and J. Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 2018.