{"title": "Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 10197, "page_last": 10207, "abstract": "Vulnerability identification is crucial to protect the software systems from attacks\nfor cyber security. It is especially important to localize the vulnerable functions\namong the source code to facilitate the fix. However, it is a challenging and tedious\nprocess, and also requires specialized security expertise. Inspired by the work\non manually-defined patterns of vulnerabilities from various code representation\ngraphs and the recent advance on graph neural networks, we propose Devign, a\ngeneral graph neural network based model for graph-level classification through\nlearning on a rich set of code semantic representations. It includes a novel Conv\nmodule to efficiently extract useful features in the learned rich node representations for graph-level classification. The model is trained over manually labeled datasets built on 4 diversified large-scale open-source C projects that incorporate high complexity and variety of real source code instead of synthesis code used in previous works. 
The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with on average 10.51% higher accuracy and 8.68% higher F1 score, of which an average of 4.66% accuracy and 6.37% F1 is contributed by the Conv module.", "full_text": "Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks\n\nYaqin Zhou1, Shangqing Liu1,∗, Jingkai Siow1, Xiaoning Du1,∗, and Yang Liu1\n\n1{yqzhou, shangqin001, jingkai001, xiaoning.du, yangliu}@ntu.edu.sg\n\n1Nanyang Technological University\n\n*Co-corresponding author\n\nAbstract\n\nVulnerability identification is crucial for protecting software systems from attacks. It is especially important to localize vulnerable functions in source code to facilitate fixes. However, this is a challenging and tedious process that also requires specialized security expertise. Inspired by work on manually defined vulnerability patterns over various code representation graphs and by recent advances in graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification that learns over a rich set of code semantic representations. It includes a novel Conv module that efficiently extracts useful features from the learned rich node representations for graph-level classification. The model is trained on manually labeled datasets built from 4 diversified large-scale open-source C projects that incorporate the high complexity and variety of real source code, instead of the synthetic code used in previous works. 
The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with on average 10.51% higher accuracy and 8.68% higher F1 score, of which an average of 4.66% accuracy and 6.37% F1 is contributed by the Conv module.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n1 Introduction\n\nThe number of software vulnerabilities has been increasing rapidly, whether reported publicly through CVE (Common Vulnerabilities and Exposures) or discovered internally in proprietary code. In particular, the prevalence of open-source libraries not only accounts for part of this increase, but also propagates the impact of each vulnerability. These vulnerabilities, mostly caused by insecure code, can be exploited to attack software systems and cause substantial financial and social damage.\nVulnerability identification is a crucial yet challenging problem in security. Besides classic approaches such as static analysis [1, 2], dynamic analysis [3–8] and symbolic execution, a number of advances have been made in applying machine learning as a complementary approach. In early methods [9–11], features or patterns hand-crafted by human experts are taken as inputs by machine learning algorithms to detect vulnerabilities. However, the root causes of vulnerabilities vary by type of weakness [12] and by library, making it impractical to characterize all vulnerabilities in numerous libraries with hand-crafted features.\nTo improve the usability of existing approaches and avoid the intense labor of human experts on feature extraction, recent works investigate the potential of deep neural networks for more automated vulnerability identification [13–15]. However, all of these works have major limitations in learning comprehensive program semantics to characterize vulnerabilities of high diversity and 
complexity in real source code. First, in terms of learning approach, they either treat the source code as a flat sequence, as if it were natural language, or represent it with only partial information. However, source code is more structured and logical than natural language, and it has heterogeneous aspects of representation such as the Abstract Syntax Tree (AST), data flow, and control flow. Moreover, vulnerabilities are sometimes subtle flaws that require comprehensive investigation from multiple dimensions of semantics. These design drawbacks limit the ability of previous works to cover various vulnerabilities. Second, in terms of training data, part of the data in [14] is labeled by static analyzers, which introduces a high percentage of false positives that are not real vulnerabilities. Other data, as in [13], consist of simple artificial code (even with “good” or “bad” inside the code to distinguish vulnerable from non-vulnerable code) that is far from the complexity of real code [16].\nTo this end, we propose a novel graph neural network based model with a composite program representation built from real vulnerability data. This allows us to encode a full set of classical program semantics to capture various vulnerability characteristics. A key innovation is a new Conv module which takes as input a graph’s heterogeneous node features produced by gated recurrent units. The Conv module hierarchically selects coarser features via traditional convolutional and dense layers for graph-level classification. Moreover, to test both the potential of the composite program embedding for source code and the proposed graph neural network model on the challenging task of vulnerability identification, we compiled manually labeled datasets from 4 popular and diversified libraries written in the C programming language. 
We name this model Devign (Deep Vulnerability Identification via Graph Neural Networks). Our contributions are as follows:\n• In the composite code representation, with ASTs as the backbone, we explicitly encode program control and data dependency at different levels into a joint graph with heterogeneous edges, where each edge type denotes a connection in the corresponding representation. This comprehensive representation, not considered in previous works, helps capture as many types and patterns of vulnerabilities as possible, and enables learning better node representations through graph neural networks.\n• We propose a gated graph neural network model with the Conv module for graph-level classification. The Conv module learns hierarchically from the node features to capture higher-level representations for graph-level classification tasks.\n• We implement Devign and evaluate its effectiveness on manually labeled datasets (costing around 600 man-hours) collected from the 4 popular C libraries. We make two datasets public together with more details (https://sites.google.com/view/devign). The results show that Devign achieves on average 10.51% higher accuracy and 8.68% higher F1 score than baseline methods, while the Conv module brings an average 4.66% accuracy and 6.37% F1 gain. We also compare Devign with well-known static analyzers, which it outperforms significantly, with a 27.99% higher average F1 score over all analyzers and all datasets. Finally, we apply Devign to 40 recent CVEs collected from the 4 projects and obtain 74.11% accuracy, demonstrating its usability in discovering new vulnerabilities.\n\n2 The Devign Model\n\nVulnerability patterns manually crafted from code property graphs, which integrate all syntax and dependency semantics, have been shown to be one of the most effective approaches [17] to detecting software vulnerabilities. 
Inspired by this, we designed Devign to automate this process, learning vulnerability patterns on code property graphs with graph neural networks [18]. The Devign architecture is shown in Figure 1. It includes three sequential components: 1) the Graph Embedding Layer of Composite Code Semantics, which encodes the raw source code of a function into a joint graph structure with comprehensive program semantics; 2) the Gated Graph Recurrent Layers, which learn the features of nodes by aggregating and passing information among neighboring nodes in the graph; and 3) the Conv module, which extracts meaningful node representations for graph-level prediction.\n\n2.1 Problem Formulation\n\nMost machine learning or pattern based approaches predict vulnerability at the coarse granularity of a source file or an application, i.e., whether a source file or an application is potentially vulnerable [10, 17, 13, 15]. Here we analyze vulnerable code at the function level, a finer granularity in the overall flow of vulnerability analysis. We formalize the identification of vulnerable functions as a binary classification problem, i.e., learning to decide whether a given function in raw source code is vulnerable or not.\n\nFigure 1: The Architecture of Devign\n\nLet a sample of data be defined as ((c_i, y_i) | c_i ∈ C, y_i ∈ Y), i ∈ {1, 2, . . . , n}, where C denotes the set of functions in code, Y = {0, 1}^n represents the label set with 1 for vulnerable and 0 otherwise, and n is the number of instances. Since c_i is a function, we assume it is encoded as a multi-edged graph g_i(V, X, A) ∈ G (see Section 2.2 for the embedding details). 
Let m be the total number of nodes in V. X ∈ R^{m×d} is the initial node feature matrix, where each vertex v_j in V is represented by a d-dimensional real-valued vector x_j ∈ R^d. A ∈ {0, 1}^{k×m×m} is the adjacency tensor, where k is the total number of edge types. An element e^p_{s,t} ∈ A equal to 1 indicates that nodes v_s and v_t are connected by an edge of type p, and 0 otherwise. The goal of Devign is to learn a mapping from G to Y, f : G → Y, to predict whether a function is vulnerable or not. The prediction function f can be learned by minimizing the loss function below:\n\nmin Σ_{i=1}^{n} L(f(g_i(V, X, A)), y_i | c_i) + λω(f)    (1)\n\nwhere L(·) is the cross-entropy loss function, ω(·) is a regularization term, and λ is an adjustable weight.\n\n2.2 Graph Embedding Layer of Composite Code Semantics\n\nAs illustrated in Figure 1, the graph embedding layer EMB is a mapping from the function code c_i to the graph data structure used as input to the model, i.e.,\n\ng_i(V, X, A) = EMB(c_i), ∀i ∈ {1, . . . , n}    (2)\n\nIn this section, we describe why and how we utilize classical code representations to embed the code into a composite graph for feature learning.\n\n2.2.1 Classical Code Graph Representations and Vulnerability Identification\n\nIn program analysis, various representations of a program are utilized to manifest the deeper semantics behind the textual code. Classic concepts include ASTs, control flow graphs, and data flow graphs, which capture the syntactic and semantic relationships among the different tokens of the source code. The majority of vulnerabilities, such as memory leaks, are too subtle to be spotted without jointly considering these composite code semantics [17]. For example, it is reported that ASTs alone can be used to find only insecure arguments [17]. 
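The graph encoding g_i(V, X, A) from Section 2.1 can be made concrete with a small sketch; all dimensions and edge placements below are illustrative stand-ins, not the values used by Devign:

```python
import numpy as np

# Hypothetical sizes: m nodes, d-dimensional initial features, k edge types
# (e.g., AST, CFG, DFG, NCS in the joint graph).
m, d, k = 4, 8, 4

X = np.zeros((m, d))                    # initial node feature matrix
A = np.zeros((k, m, m), dtype=np.int8)  # one adjacency matrix per edge type

# Example edges: an AST edge (type 0) from node 0 to node 1,
# and an NCS edge (type 3) linking consecutive leaf tokens 1 -> 2.
A[0, 0, 1] = 1
A[3, 1, 2] = 1

assert X.shape == (m, d) and A.shape == (k, m, m)
```

Slicing `A[p]` then yields exactly the per-edge-type adjacency matrix A_p used during message passing.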
By combining ASTs with control flow graphs, two more types of vulnerability can be covered, i.e., resource leaks and some use-after-free vulnerabilities. By further integrating all three code graphs, it is possible to describe most vulnerability types except two that need extra external information (i.e., race conditions, which depend on runtime properties, and design errors, which are hard to model without details of a program’s intended design).\nThough the vulnerability templates in [17] are manually crafted in the form of graph traversals, that work conveyed the key insight, and proved the feasibility, of learning a broader range of vulnerability patterns by integrating the properties of ASTs, control flow graphs and data flow graphs into a joint data structure. Besides the three classical code structures, we also take the natural sequence of the source code into consideration, since recent advances in deep learning based vulnerability detection have demonstrated its effectiveness [13, 14]. It complements the classical representations because its unique flat structure captures the relationships of code tokens in a ‘human-readable’ fashion.\n\n2.2.2 Graph Embedding of Code\n\nNext we briefly introduce each type of code representation and how we combine the various subgraphs into one joint graph, following a code example of integer overflow in Figure 2(a) and its graph representation in Figure 2(b).\n\nFigure 2: Graph Representation of Code Snippet with Integer Overflow\n\nAbstract Syntax Tree (AST) An AST is an ordered tree representation of source code. Usually, it is the first-step representation used by code parsers to understand the fundamental structure of the program and to examine syntactic errors. 
Hence, it forms the basis for generating many other code representations, and the AST node set V^{ast} includes all the nodes of the other three code representations used in this paper. Starting from the root node, the code is broken down into code blocks, statements, declarations, expressions and so on, and finally into the primary tokens that form the leaf nodes. The major AST nodes are shown in Figure 2. All the boxes are AST nodes, with the specific code in the first line and the node type annotated. The blue boxes are leaf nodes of the AST, and purple arrows represent the child-parent AST relations.\nControl Flow Graph (CFG) A CFG describes all paths that might be traversed through a program during its execution. The path alternatives are determined by conditional statements, e.g., if, for, and switch statements. In CFGs, nodes denote statements and conditions, and they are connected by directed edges to indicate the transfer of control. The CFG edges are highlighted with green dashed arrows in Figure 2. In particular, the flow starts from the entry and ends at the exit, and two different paths diverge at the if statement.\nData Flow Graph (DFG) A DFG tracks the usage of variables throughout the CFG. Data flow is variable oriented, and any data flow involves the access or modification of certain variables. A DFG edge represents the subsequent access or modification of the same variable. It is illustrated by orange double arrows in Figure 2, with the involved variable annotated on the edge. For example, the parameter b is used in both the if condition and the assignment statement.\nNatural Code Sequence (NCS) In order to encode the natural sequential order of the source code, we use NCS edges to connect neighboring code tokens in the AST. The main benefit of this encoding is that it preserves the programming logic reflected by the sequence of the source code. 
The NCS edges are denoted by red arrows in Figure 2 and connect all the leaf nodes of the AST.\nConsequently, a function c_i can be denoted by a joint graph g with four types of subgraphs (or 4 types of edges) sharing the same set of nodes V = V^{ast}. As shown in Figure 2, every node v ∈ V has two attributes, Code and Type. Code contains the source code represented by v, and Type denotes its type. The initial node representation x_v should reflect both attributes. Hence, we encode Code using a word2vec model pre-trained on the code corpus built from all source code files in the projects, and Type by label encoding. We concatenate the two encodings as the initial node representation x_v.\n\n2.3 Gated Graph Recurrent Layers\n\nThe key idea of graph neural networks is to embed node representations from local neighborhoods through neighborhood aggregation. Based on the different techniques for aggregating neighborhood information, there are graph convolutional networks [19], GraphSAGE [20], gated graph recurrent networks [18] and their variants. We chose the gated graph recurrent network to learn the node embeddings, because it can go deeper than the other two and is more suitable for our data, which combines semantics with graph structure [21].\nGiven an embedded graph g_i(V, X, A), for each node v_j ∈ V we initialize the node state vector h^{(1)}_j ∈ R^z, z ≥ d, using the initial annotation, by copying x_j into the first dimensions and padding with extra 0’s to allow hidden states larger than the annotation size, i.e., h^{(1)}_j = [x_j^⊤, 0]^⊤. Let T be the total number of time steps of neighborhood aggregation. 
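A minimal sketch of this zero-padded state initialization, with illustrative sizes d and z:

```python
import numpy as np

d, z = 8, 12          # annotation size d, hidden size z >= d (illustrative)
x_j = np.arange(d, dtype=float)

# h^(1)_j = [x_j^T, 0]^T: copy x_j into the first d dimensions,
# pad the remaining z - d dimensions with zeros.
h1_j = np.concatenate([x_j, np.zeros(z - d)])

assert h1_j.shape == (z,)
assert np.allclose(h1_j[:d], x_j) and np.allclose(h1_j[d:], 0.0)
```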
[Figure 2(a): code example of integer overflow — short add(short b) { short a = 32767; if (b > 0) { a = a + b; } return a; }; Figure 2(b): its joint graph representation.]\nTo propagate information through the graph, at each time step t ≤ T all nodes communicate with each other by passing information via edges, dependent on the edge type and direction, as described by the p-th adjacency matrix A_p of A (from the definition, the number of adjacency matrices equals the number of edge types), i.e.,\n\na^{(t-1)}_{j,p} = A_p^⊤ [h^{(t-1)⊤}_1, . . . , h^{(t-1)⊤}_m] W_p + b    (3)\n\nwhere W_p ∈ R^{z×z} is a weight to learn and b is the bias. In particular, a new state a_{j,p} of node v_j is calculated by aggregating the information of all neighboring nodes defined by the adjacency matrix A_p of edge type p. 
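A minimal numpy sketch of this per-edge-type aggregation of Eq. (3), with illustrative shapes and random stand-ins for the learned parameters W_p and b:

```python
import numpy as np

rng = np.random.default_rng(0)
m, z = 4, 6                        # nodes, hidden size (illustrative)
H = rng.normal(size=(m, z))        # h^(t-1) states stacked row-wise
A_p = np.zeros((m, m))
A_p[0, 1] = A_p[1, 2] = 1.0        # two directed edges of type p: 0->1, 1->2
W_p = rng.normal(size=(z, z))      # stand-in for the learned weight
b = rng.normal(size=z)             # stand-in for the bias

# a^(t-1)_{j,p}: row j sums the transformed states of j's type-p neighbors.
a_p = A_p.T @ (H @ W_p) + b        # shape (m, z)

assert a_p.shape == (m, z)
# node 3 has no incoming type-p edge, so its message is just the bias
assert np.allclose(a_p[3], b)
```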
The remaining step is a gated recurrent unit (GRU) that incorporates the information from all edge types and from the previous time step to obtain the current hidden state h^{(t)}_j of node v_j, i.e.,\n\nh^{(t)}_j = GRU(h^{(t-1)}_j, AGG({a^{(t-1)}_{j,p}}^k_{p=1}))    (4)\n\nwhere AGG(·) denotes an aggregation function, which could be one of {MEAN, MAX, SUM, CONCAT}, used to aggregate the information from the different edge types to compute the next-time-step node embedding h^{(t)}. We use the SUM function in our implementation. The above propagation procedure iterates over T time steps, and the state vectors at the last time step, H^{(T)}_i = {h^{(T)}_j}^m_{j=1}, form the final node representation matrix for the node set V.\n\n2.4 The Conv Layer\n\nThe node features generated by the gated graph recurrent layers can be used as input to any prediction layer, e.g., for node-, link-, or graph-level prediction, and then the whole model can be trained in an end-to-end fashion. In our problem, we need to perform graph-level classification to determine whether a function c_i is vulnerable or not. The standard approach to graph classification is to gather all the generated node embeddings globally, e.g., using a linear weighted summation to flatly add up all the embeddings [18, 22], as shown in Eq. (5),\n\nỹ_i = Sigmoid(Σ MLP([H^{(T)}_i, x_i]))    (5)\n\nwhere the sigmoid function is used for classification and MLP denotes a Multilayer Perceptron that maps the concatenation of H^{(T)}_i and x_i to an R^m vector. This kind of approach hinders effective classification over entire graphs [23, 24].\nThus, we design the Conv module to select the sets of nodes and features that are relevant to the current graph-level task. 
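As a point of contrast for the Conv module, the flat summation readout of Eq. (5) can be sketched as follows; a single linear layer stands in for the full MLP, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, z, d = 4, 6, 3                      # nodes, hidden size, annotation size
H_T = rng.normal(size=(m, z))          # final node states H^(T)
X = rng.normal(size=(m, d))            # initial annotations x_i
W = rng.normal(size=(z + d, 1))        # one-layer stand-in for the MLP

# Flat readout: transform each node, then sum over all nodes equally.
logits = float((np.concatenate([H_T, X], axis=1) @ W).sum())
y_hat = 1.0 / (1.0 + np.exp(-logits))  # Sigmoid

assert 0.0 < y_hat < 1.0
```

Because every node contributes equally to the sum, nodes irrelevant to the vulnerability dilute the graph-level signal, which is what the Conv module is designed to avoid.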
Previous work [24] proposed to use a SortPooling layer after graph convolution layers to sort the node features into a consistent node order for graphs without a fixed ordering, so that traditional neural networks can be added after it and trained to extract useful features characterizing the rich information encoded in the graph. In our problem, each code representation graph has its own predefined order and connection of nodes encoded in the adjacency matrix, and the node features are learned through gated graph recurrent layers rather than graph convolution networks, which require sorting the node features from different channels. Therefore, we directly apply 1-D convolutional and dense neural network layers to learn the features relevant to the graph-level task for more effective prediction.1 We define σ(·) as a 1-D convolutional layer with maxpooling:\n\nσ(·) = MAXPOOL(ReLU(CONV(·)))    (6)\n\nLet l be the number of convolutional layers applied. The Conv module can then be expressed as\n\nZ^{(1)}_i = σ([H^{(T)}_i, x_i]), . . . , Z^{(l)}_i = σ(Z^{(l-1)}_i)    (7)\nY^{(1)}_i = σ(H^{(T)}_i), . . . , Y^{(l)}_i = σ(Y^{(l-1)}_i)    (8)\nỹ_i = Sigmoid(AVG(MLP(Z^{(l)}_i) ⊙ MLP(Y^{(l)}_i)))    (9)\n\nwhere we first apply traditional 1-D convolutional and dense layers respectively to the concatenation [H^{(T)}_i, x_i] and to the final node features H^{(T)}_i, followed by a pairwise multiplication of the two outputs, then an average aggregation of the resulting vector, and at last make a prediction.\n\n1We also tried LSTMs and BiLSTMs (with and without attention mechanisms) on the nodes sorted in AST order; however, the convolutional networks work best overall.\n\n3 Evaluation\n\nWe evaluate the benefits of Devign against a number of state-of-the-art vulnerability discovery methods, with the goal of understanding the following questions:\n\nTable 1: Data Sets Overview\n\nProject | Sec. Rel. Commits | VFCs | Non-VFCs | Graphs | Vul Graphs | Non-Vul Graphs\nLinux Kernel | 12811 | 8647 | 4164 | 16583 | 11198 | 5385\nQEMU | 11910 | 4932 | 6978 | 15645 | 6648 | 8997\nWireshark | 10004 | 3814 | 6190 | 20021 | 6386 | 13635\nFFmpeg | 13962 | 5962 | 8000 | 6716 | 3420 | 3296\nTotal | 48687 | 23355 | 25332 | 58965 | 27652 | 31313\n\nQ1 How does Devign compare to other learning based vulnerability identification methods?\nQ2 How does the Conv module powered Devign compare to Ggrn with the flat summation of Eq. (5) for the graph-level classification task?\nQ3 Can Devign learn from each single type of code representation (i.e., a single-edged graph with one type of information)? 
And how do the Devign models with the composite graph (i.e., all types of code representations) compare to those with each single-edged graph?\nQ4 Does Devign perform better than static analyzers in the realistic scenario where the dataset is imbalanced, with an extremely low percentage of vulnerable functions?\nQ5 How does Devign perform on the latest vulnerabilities reported publicly through CVEs?\n\n3.1 Data Preparation\n\nObtaining high-quality datasets of vulnerable functions is far from trivial because of the qualified expertise required. We noticed that although [15] released datasets of vulnerable functions, the labels are generated by static analyzers and are not accurate. Other potential datasets used in [25] are not available. In this work, supported by our industrial partners, a team of security researchers collected and labeled the data from scratch. Besides collecting the raw functions, we need to generate a graph representation for each function and an initial representation for each node in a graph. We describe the detailed procedures below.\nRaw Data Gathering To test the capability of Devign in learning vulnerability patterns, we evaluate on manually-labeled functions collected from 4 large C-language open-source projects that are popular among developers and diversified in functionality, i.e., Linux Kernel, QEMU, Wireshark, and FFmpeg.\nTo facilitate and ensure the quality of data labelling, we started by collecting security-related commits, which we labeled as vulnerability-fix commits or non-vulnerability-fix commits, and then extracted vulnerable or non-vulnerable functions directly from the labeled commits. 
Vulnerability-fix commits (VFCs) are commits that fix potential vulnerabilities; from them we can extract vulnerable functions from the source code of the version preceding the revision made in the commit. Non-vulnerability-fix commits (non-VFCs) are commits that do not fix any vulnerability; similarly, from them we can extract non-vulnerable functions from the source code before the modification. We adopted the approach proposed in [26] to collect the commits. It consists of the following two steps. 1) Commits Filtering. Since only a tiny fraction of commits are vulnerability related, we exclude the security-unrelated commits whose messages do not match a set of security-related keywords such as DoS and injection. The rest, which are more likely security-related, are left for manual labelling. 2) Manual Labelling. A team of four professional security researchers spent a total of 600 man-hours performing two rounds of data labelling and cross-verification.\nGiven a VFC or non-VFC, based on the modified functions, we extract the source code of these functions before the commit is applied, and assign the labels accordingly.\nGraph Generation We use Joern [17], the open-source code analysis platform for C/C++ based on code property graphs, to extract ASTs and CFGs for all functions in our datasets. Due to some internal compile errors and exceptions in Joern, we could obtain ASTs and CFGs only for a subset of the functions. We filter out the functions without ASTs and CFGs or with obvious errors in them. Since the original DFG edges are labeled with the variables involved, which tremendously increases the number of edge types and complicates the embedded graphs, we substitute the DFGs with three other relations, LastRead (DFG_R), LastWrite (DFG_W), and ComputedFrom (DFG_C) [27], to make them more adaptive for graph embedding. 
DFG_R represents the immediate last read of each occurrence of a variable. Each occurrence can be recognized directly from the leaf nodes of the AST. DFG_W represents the immediate last write of each occurrence of a variable; similarly, we attach these annotations to the leaf-node variables. DFG_C determines the sources of a variable. In an assignment statement, the left-hand-side (lhs) variable is assigned a new value by the right-hand-side (rhs) expression. DFG_C captures such relations between the lhs variable and each of the rhs variables. Further, we remove functions with a node size greater than 500 for computational efficiency, which accounts for 15% of the functions. We summarize the statistics of the datasets in Table 1.\n\nTable 2: Classification accuracies and F1 scores in percentages (ACC/F1 per dataset). The two far-right column groups give the maximum and average difference in accuracy/F1 compared to the Devign model with the composite code representation, i.e., Devign (Composite).\n\nMethod | Linux Kernel | QEMU | Wireshark | FFmpeg | Combined | Max Diff | Avg Diff\nMetrics + Xgboost | 67.17/79.14 | 59.49/61.27 | 70.39/61.31 | 67.17/63.76 | 61.36/63.76 | 14.84/11.80 | 10.30/8.71\n3-layer BiLSTM | 67.25/80.41 | 57.85/57.75 | 69.08/55.61 | 53.27/69.51 | 59.40/65.62 | 16.48/15.32 | 14.04/8.78\n3-layer BiLSTM + Att | 75.63/82.66 | 65.79/59.92 | 74.50/58.52 | 61.71/66.01 | 69.57/68.65 | 8.54/13.15 | 5.97/7.41\nCNN | 70.72/79.55 | 60.47/59.29 | 70.48/58.15 | 53.42/66.58 | 63.36/60.13 | 16.16/13.78 | 11.72/9.82\nGgrn (AST) | 72.65/81.28 | 70.08/66.84 | 79.62/64.56 | 63.54/70.43 | 67.74/64.67 | 6.93/8.59 | 4.69/5.01\nGgrn (CFG) | 78.79/82.35 | 71.42/67.74 | 79.36/65.40 | 65.00/71.79 | 70.62/70.86 | 4.58/5.33 | 2.38/2.93\nGgrn (NCS) | 78.68/81.84 | 72.99/69.98 | 78.13/59.80 | 65.63/69.09 | 70.43/69.86 | 3.95/8.16 | 2.24/4.45\nGgrn (DFG_C) | 70.53/81.03 | 69.30/56.06 | 73.17/50.83 | 63.75/69.44 | 65.52/64.57 | 9.05/17.13 | 6.96/10.18\nGgrn (DFG_R) | 72.43/80.39 | 68.63/56.35 | 74.15/52.25 | 63.75/71.49 | 66.74/62.91 | 7.17/16.72 | 6.27/9.88\nGgrn (DFG_W) | 71.09/81.27 | 71.65/65.88 | 72.72/51.04 | 64.37/70.52 | 63.05/63.26 | 9.21/16.92 | 6.84/8.17\nGgrn (Composite) | 74.55/79.93 | 72.77/66.25 | 78.79/67.32 | 64.46/70.33 | 70.35/69.37 | 5.12/6.82 | 3.23/3.92\nDevign (AST) | 80.24/84.57 | 71.31/65.19 | 79.04/64.37 | 65.63/71.83 | 69.21/69.99 | 3.95/7.88 | 2.33/3.37\nDevign (CFG) | 80.03/82.91 | 74.22/70.73 | 79.62/66.05 | 66.89/70.22 | 71.32/71.27 | 2.69/3.33 | 1.00/2.33\nDevign (NCS) | 79.58/81.41 | 72.32/68.98 | 79.75/65.88 | 67.29/68.89 | 70.82/68.45 | 2.29/4.81 | 1.46/3.84\nDevign (DFG_C) | 78.81/83.87 | 72.30/70.62 | 79.95/66.47 | 65.83/70.12 | 69.88/70.21 | 3.75/3.43 | 2.06/2.30\nDevign (DFG_R) | 78.25/80.33 | 73.77/70.60 | 80.66/66.17 | 66.46/72.12 | 71.49/70.92 | 3.12/4.64 | 1.29/2.53\nDevign (DFG_W) | 78.70/84.21 | 72.54/71.08 | 80.59/66.68 | 67.50/70.86 | 71.41/71.14 | 2.08/2.69 | 1.27/1.77\nDevign (Composite) | 79.58/84.97 | 74.33/73.07 | 81.32/67.96 | 69.58/73.55 | 72.26/73.26 | - | -\n\nTable 3: Classification accuracies and F1 scores in percentages under the real imbalanced setting\n\nMethod Cppcheck F1 ACC Flawfinder ACC F1 CXXX ACC F1 3-layer BiLSTM 3-layer BiLSTM + Att ACC ACC F1 F1\nLinux QEMU 75.11 89.21 0 0 Wireshark 89.19 10.17 89.92 78.46 12.57 19.44 5.07 18.25 86.24 7.61 9.46 33.64 9.29 29.07 33.26 3.95 91.39 FFmpeg 87.72 0 80.34 12.86 36.04 2.45 11.17 Combined 85.41 2.27 85.65 10.41 29.57 4.01 9.65 13.12 15.54 10.75 18.71 16.59 8.79 78.43 84.90 8.98 15.58 16.16 10.50 28.35 16.48 16.24 CNN ACC F1 Devign (Composite) ACC F1 29.03 15.38 69.41 75.88 18.80 89.27 86.09 8.69 89.37 70.07 31.25 69.06 72.47 17.94 75.56 24.64 41.12 42.05 34.92 27.25\n\n3.2 Baseline Methods\n\nFor the performance comparison, we compare Devign with state-of-the-art machine-learning-based vulnerability prediction methods, as well as with the gated graph recurrent network (Ggrn) that uses the linearly weighted summation for classification.\nMetrics + Xgboost [25]: We collect a total of 4 complexity metrics and 11 vulnerability metrics for each function using Joern, and utilize Xgboost for classification. Here we did not use the proposed binning and ranking method, because it is not learning based but a heuristic designed to rank the likelihood of being vulnerable over the full set of functions in a project. We search for the best parameters via Bayesian optimization [28].\n3-layer BiLSTM [13]: It treats the source code as natural language and inputs the tokenized code into bidirectional LSTMs, with initial embeddings trained via word2vec. 
Here we implemented a 3-layer bidirectional LSTM, which gave the best performance.
3-layer BiLSTM + Att: An improved version of [13] with the attention mechanism [29].
CNN [14]: Similar to [13], it treats source code as natural language, uses a bag of words to obtain the initial embeddings of code tokens, and then feeds them to CNNs for learning.

3.3 Performance Evaluation

Devign Configuration In the embedding layer, the dimension of the word2vec vectors for the initial node representation is 100. In the gated graph recurrent layer, we set the dimension of the hidden states to 200 and the number of time steps to 6. For the Conv parameters of Devign, we apply a (1, 3) filter with ReLU activation in the first convolution layer, followed by a max pooling layer with a (1, 3) filter and (1, 2) stride, and a (1, 1) filter in the second convolution layer, followed by a max pooling layer with a (2, 2) filter and (1, 2) stride. We use the Adam optimizer with learning rate 0.0001 and batch size 128, and L2 regularization to avoid overfitting. We randomly shuffle each dataset and split it into 75% for training and the remaining 25% for validation. We train our model on Nvidia Tesla M40 and P40 GPUs, with a 100-epoch patience for early stopping.
Results Analysis We use accuracy and F1 score to measure performance. Table 2 summarizes all the experimental results. First, we analyze the results regarding Q1, the performance of Devign against the other learning-based methods. From the results for the baseline methods, Ggrn, and Devign with composite code representations, we can see that both Ggrn and Devign significantly outperform the baselines on all the datasets. In particular, compared with all the baseline methods, the relative accuracy gain of Devign is 10.51% on average, and at least 8.54% on the QEMU dataset.
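The Conv pipeline described in the configuration above can be sketched as follows. This is a minimal PyTorch sketch of the shape arithmetic only, assuming node states of dimension 200 over at most 500 nodes (the function-size cap from Section 3.1); the class name and tensor layout are our assumptions, and the published module is more elaborate (it pairs convolutions over the learned node states and the raw inputs).

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Hedged sketch of Devign's Conv readout: 1-D convolutions and
    max-pooling over the node dimension, then a linear layer for
    graph-level classification. Hyper-parameters follow Section 3.3;
    everything else here is an illustrative assumption."""

    def __init__(self, feat_dim: int = 200, max_nodes: int = 500):
        super().__init__()
        self.conv1 = nn.Conv1d(feat_dim, feat_dim, kernel_size=3)  # the (1, 3) filter
        self.pool1 = nn.MaxPool1d(kernel_size=3, stride=2)         # (1, 3) pool, (1, 2) stride
        self.conv2 = nn.Conv1d(feat_dim, feat_dim, kernel_size=1)  # the (1, 1) filter
        self.pool2 = nn.MaxPool1d(kernel_size=2, stride=2)         # (2, 2) pool, (1, 2) stride
        # remaining sequence length after the two conv/pool stages
        out_len = ((max_nodes - 2 - 3) // 2 + 1 - 2) // 2 + 1
        self.mlp = nn.Linear(feat_dim * out_len, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, num_nodes, feat_dim) node states from the gated graph layer
        z = h.transpose(1, 2)                      # -> (batch, feat_dim, num_nodes)
        z = self.pool1(torch.relu(self.conv1(z)))
        z = self.pool2(torch.relu(self.conv2(z)))
        return torch.sigmoid(self.mlp(z.flatten(1)))  # graph-level probability
```

Treating the 200-dimensional node features as channels and convolving along the node axis is one plausible reading of the (1, 3)/(1, 1) filter notation; the point of the sketch is how the conv/pool stages compress per-node states into a fixed-size graph-level feature.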
Devign (Composite) also outperforms the 4 baseline methods in terms of F1 score: the relative F1 gain is 8.68% on average, and the minimum relative gains on the individual datasets (Linux Kernel, QEMU, Wireshark, FFmpeg, and Combined) are 2.31%, 11.80%, 6.65%, 4.04%, and 4.61% respectively. As Linux follows best practices in coding style, the F1 score of 84.97 achieved by Devign there is the highest among all datasets. Hence, Devign, with comprehensive semantics encoded in graphs, performs significantly better than the state-of-the-art vulnerability identification methods.
Next, we investigate Q2, the performance gain of Devign over Ggrn. We first look at the scores with the composite code representation. On all the datasets, Devign reaches higher accuracy than Ggrn (by 3.23% on average), with the highest accuracy gain of 5.12% on the FFmpeg dataset. Devign also achieves better F1 scores, 3.92% higher than Ggrn on average, with the highest F1 gain of 6.82% on the QEMU dataset. Looking at the scores for each single code representation, we reach a similar conclusion: Devign generally outperforms Ggrn significantly, with a maximum accuracy gain of 9.21% for the DFG_W edge and a maximum F1 gain of 17.13% for DFG_C. Overall, the average accuracy and F1 gains of Devign over Ggrn across all cases are 4.66% and 6.37%, which indicates that the Conv module extracts more relevant nodes and features for graph-level prediction.
Then we check the results for Q3 to see whether Devign can learn from different types of code representation and how it performs on composite graphs. Surprisingly, we find that the results learned from single-edged graphs are quite encouraging for both Ggrn and Devign.
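The accuracy and F1 figures compared throughout this section follow the standard binary-classification definitions (with label 1 = vulnerable); a minimal sketch, not the authors' evaluation code:

```python
def accuracy_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = vulnerable, 0 = clean).
    Standard definitions: F1 is the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1
```

Note that under the imbalanced setting of Q4 below, accuracy alone is misleading (a trivial "never vulnerable" classifier scores 90%), which is why F1 carries the comparison with static analyzers.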
For Ggrn, we find that the accuracy with some specific edge types is even slightly higher than with the composite graph; e.g., both the CFG and NCS graphs yield better results on the FFmpeg and Combined datasets. For Devign, in terms of accuracy, except on the Linux dataset, the composite graph representation is overall superior to any single-edged graph, with gains ranging from 0.11% to 3.75%. In terms of F1 score, the improvement brought by the composite graph over the single-edged graphs is 2.69% on average, ranging from 0.4% to 7.88%, for Devign across all tests. In summary, composite graphs help Devign learn better prediction models than single-edged graphs.
To answer Q4, the comparison with static analyzers on a realistically imbalanced dataset, we randomly sampled the test data to create imbalanced datasets with 10% vulnerable functions, following a large industrial analysis [26]. We compare with the well-known open-source static analyzers Cppcheck and Flawfinder, and a commercial tool CXXX, whose name we withhold for legal reasons. The results are shown in Table 3: our approach significantly outperforms all the analyzers, with a 27.99% higher average F1 score across all the datasets (individual and combined). Meanwhile, the static analyzers tend to miss most vulnerable functions and produce many false positives; e.g., Cppcheck found 0 vulnerabilities in 3 of the 4 single-project datasets.
Finally, to answer Q5, on the latest disclosed vulnerabilities, we scrape the latest 10 CVEs of each project to check whether Devign can potentially be applied to identify zero-day vulnerabilities. Based on the fix commits of the 40 CVEs, we obtain a total of 112 vulnerable functions.
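The imbalanced test construction used for Q4 above can be reproduced schematically as follows. The helper name and data layout are hypothetical; only the 10% vulnerable ratio comes from the text (via [26]).

```python
import random

def make_imbalanced(test_set, vuln_ratio=0.10, seed=0):
    """Subsample the test split so that roughly `vuln_ratio` of the
    functions are vulnerable, keeping all clean samples.
    `test_set` is assumed to be a list of (function, label) pairs
    with label 1 = vulnerable; this is a sketch, not the paper's code."""
    rng = random.Random(seed)
    vuln = [s for s in test_set if s[1] == 1]
    clean = [s for s in test_set if s[1] == 0]
    # n_v / (n_v + n_clean) == vuln_ratio  =>  n_v = r * n_clean / (1 - r)
    n_vuln = min(len(vuln), round(vuln_ratio * len(clean) / (1 - vuln_ratio)))
    sample = rng.sample(vuln, n_vuln) + clean
    rng.shuffle(sample)
    return sample
```

Downsampling only the vulnerable class (rather than oversampling the clean one) keeps the clean functions untouched, so the resulting set mirrors the roughly 1-in-10 ratio reported in the industrial analysis.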
We input these functions into the trained Devign model and achieve an average accuracy of 74.11%, which demonstrates Devign's potential for discovering new vulnerabilities in practical applications.

4 Related Work

The success of deep learning has inspired researchers to apply it to more automated solutions for vulnerability discovery in source code [15, 13, 14]. The recent works [13, 15, 14] treat source code as flat natural language sequences and explore the potential of natural language processing techniques for vulnerability detection. For instance, [15, 13] built models upon LSTM/BiLSTM neural networks, while [14] proposed to use CNNs instead.
To overcome the limitations of the aforementioned models in expressing the logic and structure of code, a number of works have probed more structural neural networks, such as tree structures [30] or graph structures [18, 31, 27], for various tasks. For instance, [18] proposed to generate logical formulas for program verification through gated graph recurrent networks, and [27] aimed at predicting variable names and variable misuse. [31] proposed Gemini for binary code similarity detection, where functions in binary code are represented by attributed control flow graphs and fed into Structure2vec [22] to learn graph embeddings. Different from all these works, our work targets vulnerability identification and incorporates comprehensive code representations to express as many types of vulnerabilities as possible. Besides, our work adopts the gated graph recurrent layers of [18] to consider the semantics of nodes (e.g., node annotations) as well as the structural features, both of which are important in vulnerability identification; Structure2vec, in contrast, focuses primarily on learning structural features.
Compared with [27], which applies a gated graph recurrent network for variable prediction, we explicitly incorporate the control flow graph into the composite graph and propose the Conv module for efficient graph-level classification.

5 Conclusion and Future Work

We introduce Devign, a novel vulnerability identification model that encodes a source-code function into a joint graph structure built from multiple syntactic and semantic representations, and then leverages this composite graph representation to effectively learn to discover vulnerable code. It achieves a new state of the art in machine-learning-based vulnerable function discovery on real open-source projects. Interesting future directions include efficient learning on large functions via program slicing, applying the learned model to detect vulnerabilities across projects, and generating human-readable or explainable vulnerability assessments.

Acknowledgements

This work was supported by the Alibaba-NTU JRI project (M4062640.J4A), Security: A Compositional Approach Of Building Security Verified System (M4192001.023.710079), and the National Research Foundation, Prime Minister's Office, Singapore, under its National Cybersecurity R&D Program (Award No. NRF2018NCR-NCR005-001).

References

[1] Z. Xu, B. Chen, M. Chandramohan, Y. Liu, and F. Song, “Spain: security patch analysis for binaries towards understanding the pain and pills,” in Proceedings of the 39th International Conference on Software Engineering. IEEE Press, 2017, pp. 462–472.

[2] M. Chandramohan, Y. Xue, Z. Xu, Y. Liu, C. Y. Cho, and H. B. K. Tan, “Bingo: Cross-architecture cross-os binary search,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2016, pp. 678–689.

[3] Y. Li, Y. Xue, H. Chen, X. Wu, C. Zhang, X. Xie, H. Wang, and Y.
Liu, “Cerebro: context-aware adaptive fuzzing for effective vulnerability detection,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 2019, pp. 533–544.

[4] H. Chen, Y. Xue, Y. Li, B. Chen, X. Xie, X. Wu, and Y. Liu, “Hawkeye: Towards a desired directed grey-box fuzzer,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2018, pp. 2095–2108.

[5] Y. Li, B. Chen, M. Chandramohan, S.-W. Lin, Y. Liu, and A. Tiu, “Steelix: program-state based binary fuzzing,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM, 2017, pp. 627–637.

[6] J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: grammar-aware greybox fuzzing,” in Proceedings of the 41st International Conference on Software Engineering. IEEE Press, 2019, pp. 724–735.

[7] ——, “Skyfire: Data-driven seed generation for fuzzing,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 579–594.

[8] Y. Xue, Z. Xu, M. Chandramohan, and Y. Liu, “Accurate and scalable cross-architecture cross-os binary code search with emulation,” IEEE Transactions on Software Engineering, 2018.

[9] S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, “Predicting vulnerable software components,” in Proceedings of the 14th ACM Conference on Computer and Communications Security, ser. CCS ’07. New York, NY, USA: ACM, 2007, pp. 529–540. [Online]. Available: http://doi.acm.org/10.1145/1315245.1315311

[10] V. H. Nguyen and L. M. S. Tran, “Predicting vulnerable software components with dependency graphs,” in Proceedings of the 6th International Workshop on Security Measurements and Metrics, ser. MetriSec ’10. New York, NY, USA: ACM, 2010, pp. 3:1–3:8.
[Online]. Available: http://doi.acm.org/10.1145/1853919.1853923

[11] Y. Shin, A. Meneely, L. Williams, and J. A. Osborne, “Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities,” IEEE Trans. Softw. Eng., vol. 37, no. 6, pp. 772–787, Nov. 2011. [Online]. Available: http://dx.doi.org/10.1109/TSE.2010.81

[12] “CWE List Version 3.1,” https://cwe.mitre.org/data/index.html, 2018.

[13] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, “Vuldeepecker: A deep learning-based system for vulnerability detection,” in 25th Annual Network and Distributed System Security Symposium (NDSS 2018), 2018.

[14] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, “Automated vulnerability detection in source code using deep representation learning,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 757–762.

[15] H. K. Dam, T. Tran, T. Pham, S. W. Ng, J. Grundy, and A. Ghose, “Automatic feature learning for vulnerability prediction,” arXiv preprint arXiv:1708.02368, 2017.

[16] Juliet test suite. [Online]. Available: https://samate.nist.gov/SRD/around.php

[17] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and discovering vulnerabilities with code property graphs,” in Proceedings of the 2014 IEEE Symposium on Security and Privacy, ser. SP ’14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 590–604. [Online]. Available: http://dx.doi.org/10.1109/SP.2014.44

[18] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph sequence neural networks,” arXiv preprint arXiv:1511.05493, 2015.

[19] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M.
Welling, “Modeling relational data with graph convolutional networks,” in European Semantic Web Conference. Springer, 2018, pp. 593–607.

[20] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.

[21] “Representation Learning on Networks,” http://snap.stanford.edu/proj/embeddings-www/, 2018.

[22] H. Dai, B. Dai, and L. Song, “Discriminative embeddings of latent variable models for structured data,” in International Conference on Machine Learning, 2016, pp. 2702–2711.

[23] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in Neural Information Processing Systems, 2018, pp. 4805–4815.

[24] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end deep learning architecture for graph classification,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[25] X. Du, B. Chen, Y. Li, J. Guo, Y. Zhou, Y. Liu, and Y. Jiang, “Leopard: Identifying vulnerable code for vulnerability assessment through program metrics,” in Proceedings of the 41st International Conference on Software Engineering, 2019, pp. 60–71.

[26] Y. Zhou and A. Sharma, “Automated identification of security issues from commit messages and bug reports,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2017. New York, NY, USA: ACM, 2017, pp. 914–919. [Online]. Available: http://doi.acm.org/10.1145/3106237.3117771

[27] M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to represent programs with graphs,” Nov. 2017.

[28] J. Snoek, H. Larochelle, and R. P.
Adams, “Practical Bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.

[29] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480–1489.

[30] L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin, “Convolutional neural networks over tree structures for programming language processing,” in AAAI, vol. 2, no. 3, 2016, p. 4.

[31] X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, and D. Song, “Neural network-based graph embedding for cross-platform binary code similarity detection,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 363–376.