{"title": "Certifiable Robustness to Graph Perturbations", "book": "Advances in Neural Information Processing Systems", "page_first": 8319, "page_last": 8330, "abstract": "Despite the exploding interest in graph neural networks there has been little effort to verify and improve their robustness. This is even more alarming given recent findings showing that they are extremely vulnerable to adversarial attacks on both the graph structure and the node attributes. We propose the first method for verifying certifiable (non-)robustness to graph perturbations for a general class of models that includes graph neural networks and label/feature propagation. By exploiting connections to PageRank and Markov decision processes our certificates can be efficiently (and under many threat models exactly) computed. Furthermore, we investigate robust training procedures that increase the number of certifiably robust nodes while maintaining or improving the clean predictive accuracy.", "full_text": "Certi\ufb01able Robustness to Graph Perturbations\n\nAleksandar Bojchevski\n\nTechnical University of Munich\na.bojchevski@in.tum.de\n\nStephan G\u00fcnnemann\n\nTechnical University of Munich\n\nguennemann@in.tum.de\n\nAbstract\n\nDespite the exploding interest in graph neural networks there has been little effort\nto verify and improve their robustness. This is even more alarming given recent\n\ufb01ndings showing that they are extremely vulnerable to adversarial attacks on\nboth the graph structure and the node attributes. We propose the \ufb01rst method for\nverifying certi\ufb01able (non-)robustness to graph perturbations for a general class\nof models that includes graph neural networks and label/feature propagation. By\nexploiting connections to PageRank and Markov decision processes our certi\ufb01cates\ncan be ef\ufb01ciently (and under many threat models exactly) computed. 
Furthermore,\nwe investigate robust training procedures that increase the number of certifiably\nrobust nodes while maintaining or improving the clean predictive accuracy.\n\n1\n\nIntroduction\n\nAs the number of machine learning models deployed in the real world grows, questions regarding\ntheir robustness become increasingly important. In particular, it is critical to assess their vulnerability\nto adversarial attacks – deliberate perturbations of the data designed to achieve a specific (malicious)\ngoal. Graph-based models suffer from poor adversarial robustness [13, 60], yet in domains where\nthey are often deployed (e.g. the Web) [50], adversaries are pervasive and attacks have a low cost\n[9, 26]. Even in scenarios where adversaries are not present, such analysis is important since it allows\nus to reason about the behavior of our models in the worst case (i.e. treating nature as an adversary).\nHere we focus on semi-supervised node classification – given a single large (attributed) graph and\nthe class labels of a few nodes, the goal is to predict the labels of the remaining unlabelled nodes.\nGraph Neural Networks (GNNs) have emerged as the de-facto way to tackle this task, significantly\nimproving performance over the previous state-of-the-art. They are used for various high-impact\napplications across many domains such as: protein interface prediction [20], classification of scientific\npapers [28], fraud detection [44], and breast cancer classification [36]. Therefore, it is crucial to assess\ntheir sensitivity to adversaries and ensure they behave as expected.\nHowever, despite their popularity there is scarcely any work on certifying or improving the robustness\nof GNNs. As shown in Zügner et al. [60], node classification with GNNs is not robust and can even\nbe attacked on multiple fronts – slight perturbations of either the node features or the graph structure\ncan lead to wrong predictions. 
Moreover, since we are dealing with non i.i.d. data by taking the\ngraph structure into account, robustifying GNNs is more dif\ufb01cult compared to traditional models \u2013\nperturbing only a few edges affects the predictions for all nodes. What can we do to fortify GNNs\nand make sure they produce reliable predictions in the presence of adversarial perturbations?\nWe propose the \ufb01rst method for provable robustness regarding perturbations of the graph structure.\nOur approach is applicable to a general family of models where the predictions are a linear function\nof (personalized) PageRank. This family includes GNNs [29] and other graph-based models such as\nlabel/feature propagation [7, 53]. Speci\ufb01cally, we provide: 1. Certi\ufb01cates: Given a trained model\nand a general set of admissible graph perturbations we can ef\ufb01ciently verify whether a node is\ncerti\ufb01ably robust \u2013 there exists no perturbation that can change its prediction. We also provide non-\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\frobustness certi\ufb01cates via adversarial examples. 2. Robust training: We investigate robust training\nschemes based on our certi\ufb01cates and show that they improve both robustness and clean accuracy.\nOur theoretical \ufb01ndings are empirically demonstrated and the code is provided for reproducibility1.\nInterestingly, in contrast to existing works on provable robustness [23, 46, 59] that derive bounds (by\nrelaxing the problem), we can ef\ufb01ciently compute exact certi\ufb01cates for some threat models.\n\n2 Related work\n\nNeural networks [41, 21], and recently graph neural networks [13, 60, 58] and node embeddings [5]\nwere shown to be highly sensitive to small adversarial perturbations. 
There exist many (heuristic)\napproaches aimed at robustifying these models, however, they have only limited usefulness since there\nis always a new attack able to break them, leading to a cat-and-mouse game between attackers and\ndefenders. A more promising line of research studies certi\ufb01able robustness [23, 35, 46]. Certi\ufb01cates\nprovide guarantees that no perturbation regarding a speci\ufb01c threat model will change the prediction\nof an instance. So far there has been almost no work on certifying graph-based models.\nDifferent heuristics have been explored in the literature to improve robustness of graph-based models:\n(virtual) adversarial training [10, 16, 40, 49], trainable edge weights [48], graph encoder re\ufb01ning and\nadversarial contrastive learning [45], transfer learning [42], smoothing distillation [10], decoupling\nstructure from attributes [31], measuring logit discrepancy [51], allocating reliable queries [56],\nrepresenting nodes as Gaussian distributions [57], and Bayesian graph neural networks [52]. Other\nrobustness aspects of graph-based models (e.g. noise or anomalies) have also been investigated\n[3, 6, 24]. However, none of these works provide provable guarantees or certi\ufb01cates.\nZ\u00fcgner & G\u00fcnnemann [59] is the only work that proposes robustness certi\ufb01cates for graph neural\nnetworks (GNNs). However, their approach can handle perturbations only to the node attributes. Our\napproach is completely orthogonal to theirs since we consider adversarial perturbations to the graph\nstructure instead. Furthermore, our certi\ufb01cates are also valid for other semi-supervised learning\napproaches such as label/feature propagation. Nonetheless, there is a critical need for both types\nof certi\ufb01cates given that GNNs are shown to be vulnerable to attacks on both the attributes and the\nstructure. 
As future work, we aim to consider perturbations of the node features and the graph jointly.\n\n3 Background and preliminaries\nLet G = (V,E) be an attributed graph with N = |V| nodes and edge set E ⊆ V × V. We denote with\nA ∈ {0, 1}^{N×N} the adjacency matrix and X ∈ R^{N×D} the matrix of D-dimensional node features.\nGiven a subset VL ⊆ V = {1, . . . , N} of labelled nodes, the goal of semi-supervised\nnode classification is to predict for each node v ∈ V one class in C = {1, . . . , K}. We focus\non deriving (exact) robustness certificates for graph neural networks via optimizing personalized\nPageRank. We also show (Appendix 8.1) how to apply our approach to label/feature propagation [7].\nTopic-sensitive PageRank. The topic-sensitive PageRank [22, 27] vector πG(z) for a graph G and\na probability distribution over nodes z is defined as πG,α(z) = (1 − α)(I_N − αD^{−1}A)^{−1}z, where\nD is the diagonal matrix of node out-degrees with D_ii = Σ_j A_ij. (In practice we do not invert the\nmatrix, but rather solve the associated sparse linear system of equations.) Intuitively, π(z)_u is the\nprobability that a random walker on the graph lands at node u when it follows edges at random with\nprobability α and teleports back to node v with probability (1 − α)z_v. Thus, we have π(z)_u ≥ 0\nand Σ_u π(z)_u = 1. For z = e_v, the v-th canonical basis vector, we get the personalized PageRank\nvector for node v. We drop the index on G, α and z in πG,α(z) when they are clear from the context.\nGraph neural networks. As an instance of graph neural network (GNN) methods we consider an\nadaptation of the recently proposed PPNP approach [29] since it shows superior performance on the\nsemi-supervised node classification task [19]. PPNP, unlike message-passing GNNs, decouples the\nfeature transformation from the propagation. We have:\n\nY = softmax(Π_sym H),  H_v,: = f_θ(X_v,:),  Π_sym = (1 − α)(I_N − αD^{−1/2}AD^{−1/2})^{−1}  (1)\n\nwhere I_N is the identity, Π_sym ∈ R^{N×N} is a symmetric propagation matrix, H ∈ R^{N×C} collects\nthe individual per-node logits, and Y ∈ R^{N×C} collects the final predictions after propagation. A\nneural network f_θ outputs the logits H_v,: by processing the features X_v,: of every node v independently.\nMultiplying them with Π_sym we obtain the diffused logits H^diff := Π_sym H which implicitly\nincorporate the graph structure and avoid the expensive multi-hop message-passing procedure.\n(Code, data, and supplementary material available at https://www.daml.in.tum.de/graph-cert)\nTo make PPNP more amenable to theoretical analysis we replace Π_sym with the personalized\nPageRank matrix Π = (1 − α)(I_N − αD^{−1}A)^{−1} which has a similar spectrum. Here each\nrow Π_v,: = π(e_v) equals the personalized PageRank vector of node v. This model, which we\ndenote as π-PPNP, has similar prediction performance to PPNP. We can see that the diffused logit\nafter propagation for class c of node v is a linear function of its personalized PageRank score:\nH^diff_v,c = π(e_v)^T H_:,c, i.e. a weighted combination of the logits of all nodes for class c. Similarly, the\nmargin m_c1,c2(v) = H^diff_v,c1 − H^diff_v,c2 = π(e_v)^T (H_:,c1 − H_:,c2), defined as the difference in logits of\nnode v for the two given classes c1 and c2, is also linear in π(e_v). 
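The linear dependence of the diffused logits and margins on personalized PageRank is easy to check numerically. Below is a minimal sketch on a toy 4-node graph with made-up logits (both are assumptions for illustration; at real scale one would solve a sparse linear system rather than invert the matrix, as the paper notes):

```python
import numpy as np

def ppr_matrix(A, alpha=0.85):
    """Personalized PageRank matrix Pi = (1 - alpha)(I - alpha * D^-1 A)^-1.
    Row v of Pi is the personalized PageRank vector pi(e_v)."""
    N = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)       # D^-1 A, row-stochastic
    return (1 - alpha) * np.linalg.inv(np.eye(N) - alpha * P)

# Toy 4-node cycle and made-up per-node logits H for K = 2 classes.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
H = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0], [2.0, 0.0]])

Pi = ppr_matrix(A)
H_diff = Pi @ H                                # diffused logits Pi @ H
m_01_v0 = Pi[0] @ (H[:, 0] - H[:, 1])          # margin m_{0,1}(v=0), linear in pi(e_0)
```

Each row of Pi is a valid distribution, and the margin computed from pi(e_0) coincides with the difference of diffused logits, which is exactly the linearity the certificates exploit.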
If min_c m_yv,c(v) < 0, where y_v is the\nground-truth label for v, the node is misclassified since the prediction equals arg max_c H^diff_v,c.\n\n4 Robustness certificates\n\n4.1 Threat model, fragile edges, global and local budget\n\nWe investigate the scenario in which a subset of edges in a directed graph are \"fragile\", i.e. an attacker\nhas control over them, or in general we are not certain whether these edges are present in the graph.\nFormally, we are given a set of fixed edges E_f ⊆ E that cannot be modified (assumed to be reliable),\nand a set of fragile edges F ⊆ (V × V) \\ E_f. For each fragile edge (i, j) ∈ F the attacker can decide\nwhether to include it in the graph or exclude it from the graph, i.e. set A_ij to 1 or 0 respectively. For\nany subset F+ ⊆ F of included edges we can form the perturbed graph G̃ = (V, Ẽ := E_f ∪ F+).\nAn excluded fragile edge (i, j) ∈ F \\ F+ is a non-edge in G̃. This formulation is general, since we\ncan set E_f and F arbitrarily. For example, for our certificate scenario, given an existing clean graph\nG = (V,E) we can set E_f = E and F ⊆ V × V, which implies the attacker can only add new edges\nto obtain perturbed graphs G̃. Or we can set E_f = ∅ and F = E so that the attacker can only remove\nedges, and so on. There are 2^|F| (exponentially many) valid configurations leading to different\nperturbed graphs, which highlights that certificates are challenging for graph perturbations.\nIn reality, perturbing an edge is likely to incur some cost for the attacker. To capture this we introduce\na global budget. The constraint |Ẽ \\ E| + |E \\ Ẽ| ≤ B implies that the attacker can make at most\nB perturbations. The first term equals the number of newly added edges, and the second the\nnumber of removed existing edges. 
Here, including an edge that already exists does not count towards\nthe budget. This is only a design choice that depends on the application, and our method works\nin general. Furthermore, perturbing many edges for a single node might not be desirable, thus we\nalso allow to limit the number of perturbations locally. Let E^v = {(v, j) ∈ E} be the set of edges\nthat share the same source node v. Then, the constraint |Ẽ^v \\ E^v| + |E^v \\ Ẽ^v| ≤ b_v enforces a local\nbudget b_v for the node v. By setting b_v = |F^v| and B = |F| we can model an unconstrained attacker.\nLetting P(F) be the power set of F, we define the set of admissible perturbed graphs:\n\nQ_F = {(V, Ẽ := E_f ∪ F+) | F+ ∈ P(F), |Ẽ \\ E| + |E \\ Ẽ| ≤ B, |Ẽ^v \\ E^v| + |E^v \\ Ẽ^v| ≤ b_v, ∀v}  (2)\n\n4.2 Robustness certificates\nProblem 1. Given a graph G, a set of fixed E_f and fragile F edges, global B and local b_v budgets, a\ntarget node t, and a model with logits H. Let y_t denote the class of node t (predicted or ground-truth).\nThe worst-case margin between class y_t and class c under any admissible perturbation G̃ ∈ Q_F is:\n\nm*_yt,c(t) = min_{G̃∈Q_F} m_yt,c(t) = min_{G̃∈Q_F} π_G̃(e_t)^T (H_:,yt − H_:,c)  (3)\n\nIf m*_yt,*(t) = min_{c≠yt} m*_yt,c(t) > 0, node t is certifiably robust w.r.t. the logits H and the set Q_F.\nOur goal is to verify whether no admissible G̃ ∈ Q_F can change the prediction for a target node t.\nFrom Problem 1 we see that if the worst margin over all classes m*_yt,*(t) is positive, then m*_yt,c(t) > 0\nfor all c ≠ y_t, which implies that there exists no adversarial example within Q_F that leads to a change\nin the prediction to some other class c, that is, the logit for the given class y_t is always largest.\n\nChallenges and core idea. From a cursory look at Eq. 3 it appears that finding the minimum\nis intractable. After all, our domain is discrete and we are optimizing over exponentially many\nconfigurations. Moreover, the margin is a function of the personalized PageRank, which has a non-trivial\ndependency on the perturbed graph. But there is hope: for a fixed H, the margin m_yt,c(t) is a\nlinear function of π(e_t). Thus, Problem 1 reduces to optimizing a linear function of personalized\nPageRank over a specific constraint set. This is the core idea of our approach. As we will show, if we\nconsider only local budget constraints, the exact certificate can be efficiently computed. This is in\ncontrast to most certificates for neural networks that rely on different relaxations to make the problem\ntractable. Including the global budget constraint, however, makes the problem hard. For this case we\nderive an efficient-to-compute lower bound on the worst-case margin. Thus, if the lower bound is\npositive we can still guarantee that our classifier is robust w.r.t. the set of admissible perturbations.\n\n4.3 Optimizing topic-sensitive PageRank with global and local constraints\n\nWe are interested in optimizing a linear function of the topic-sensitive PageRank vector of a graph by\nmodifying its structure. That is, we want to configure a set of fragile edges into included/excluded to\nobtain a perturbed graph G̃ maximizing the objective. Formally, we study the general problem:\nProblem 2. Given a graph G, a set of admissible perturbations Q_F as in Problem 1, and any fixed\nz, r ∈ R^N, α ∈ (0, 1), solve the following optimization problem: max_{G̃∈Q_F} r^T π_G̃,α(z).\nSetting r = −(H_:,yt − H_:,c) and z = e_t, we see that Problem 1 is a special case of Problem 2. We\ncan think of r as a reward/cost vector, i.e. r_v is the reward that a random walker obtains when visiting\nnode v. 
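As a sanity check of this reward interpretation, the sketch below (toy graph and arbitrary reward/teleport vectors, chosen only for illustration) evaluates r^T π(z) and verifies it equals (1 − α) z^T x, where x solves (I_N − αD^{−1}A)x = r and accumulates the reward collected between two teleports:

```python
import numpy as np

alpha = 0.85
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)           # random-walk matrix D^-1 A
N = A.shape[0]
Pi = (1 - alpha) * np.linalg.inv(np.eye(N) - alpha * P)

r = np.array([0.0, 1.0, -0.5])                 # per-node rewards (arbitrary example values)
z = np.array([1.0, 0.0, 0.0])                  # teleport distribution: always node 0

objective = z @ Pi @ r                         # r^T pi(z), pi(z) read off the rows of Pi
x = np.linalg.solve(np.eye(N) - alpha * P, r)  # reward accumulated between teleports
```

This identity is why a single linear solve per candidate graph suffices to evaluate the objective.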
The objective value r^T π(z) is proportional to the overall reward obtained during an infinite\nrandom walk with teleportation, since π(z)_v exactly equals the frequency of visits to node v.\nVariations and special cases of this problem have been previously studied [2, 11, 12, 15, 18, 25, 32].\nNotably, Fercoq et al. [18] cast the problem as an average cost infinite horizon Markov decision\nprocess (MDP), also called an ergodic control problem, where each node corresponds to a state and the\nactions correspond to choosing a subset of included fragile edges, i.e. we have 2^|F^v| actions at each\nstate v (see also Fig. 2a). They show that despite the exponential number of actions, the problem can\nbe efficiently solved in polynomial time, and they derive a value iteration algorithm with different\nlocal constraints. They enforce that the final perturbed graph has at most b_v total edges per\nnode, while we enforce that at most b_v edges per node are perturbed (see Sec. 4.1).\nOur approach for local budget only. Inspired by the MDP idea we derive a policy iteration (PI)\nalgorithm which also runs in polynomial time [25]. Intuitively, every policy corresponds to a perturbed\ngraph in Q_F, and each iteration improves the policy. The PI algorithm allows us to: incorporate our\nlocal constraints easily, take advantage of efficient solvers for sparse systems of linear equations (line\n3 in Alg. 1), and implement the policy improvement step in parallel (lines 4-6 in Alg. 1). It can easily\nhandle very large sets of fragile edges and it scales to large graphs.\nProposition 1. 
Algorithm 1, which greedily selects the fragile edges, finds an optimal solution for\nProblem 2 with only local constraints in a number of steps independent of the size of the graph.\n\nAlgorithm 1 POLICY ITERATION WITH LOCAL BUDGET\nRequire: Graph G = (V,E), reward r, set of fixed E_f and fragile F edges, local budgets b_v\n1: Initialization: W_0 ⊆ F as any arbitrary subset, A^G corresponding to G\n2: while W_k ≠ W_{k−1} do\n3:   Solve (I_N − αD^{−1}A)x = r for x, where A_ij = 1 − A^G_ij if (i, j) ∈ W_k   # flip the edges\n4:   Let l_ij ← (1 − 2A^G_ij)(x_j − (x_i − r_i)/α) for all (i, j) ∈ F   # calculate the improvement\n5:   Let L_v ← {(v, j) ∈ F | l_vj > 0 ∧ l_vj ≥ top-b_v largest l_vj}, ∀v ∈ V\n6:   W_k ← ∪_v L_v,  k ← k + 1\n7: end while\n8: return W_k   # optimal graph G̃ ∈ Q_F obtained by flipping all (i, j) ∈ W_k of G\n\nWe provide the proof in Sec. 8.3 in the appendix. The main idea of Alg. 1 is: starting from a random\npolicy, in each iteration we first compute the mean reward before teleportation x for the current policy\n(line 3), and then greedily select the top b_v edges that improve the policy (lines 4-6). This algorithm\nis guaranteed to converge to the optimal policy, and thus to the optimal configuration of fragile edges.\n\nFigure 1: The upper part outlines our approach for local budget only: the exact certificate is efficiently\ncomputed with policy iteration. The lower part outlines our 3-step approach for local and global\nbudget: (a) formulate an MDP on an auxiliary graph, (b) augment the corresponding LP with quadratic\nconstraints to enforce the global budget, and (c) apply the RLT relaxation to the resulting QCLP.\n\nCertificate for local budget only. Proposition 1 implies that for local constraints only, the optimal\nsolution does not depend on the teleport vector z. 
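A dense NumPy sketch of this policy iteration scheme (our illustrative reading of lines 3-6 of Alg. 1, not the authors' reference code; the toy graph, rewards, and budgets are made up):

```python
import numpy as np

def policy_iteration_local(A, r, fragile, budgets, alpha=0.85, max_iter=100):
    """Policy iteration with local budgets (a dense sketch of Alg. 1).

    A: clean adjacency matrix A^G; r: reward vector; fragile: list of
    fragile edges (i, j); budgets: {node: b_v} local budgets.
    """
    N = A.shape[0]
    W = set()                                  # current policy = set of flipped fragile edges
    x = None
    for _ in range(max_iter):
        A_pert = A.copy()
        for i, j in W:
            A_pert[i, j] = 1 - A[i, j]         # flip the edges in W (line 3)
        P = A_pert / A_pert.sum(axis=1, keepdims=True)
        x = np.linalg.solve(np.eye(N) - alpha * P, r)   # mean reward before teleportation
        # improvement of flipping each fragile edge (line 4)
        l = {(i, j): (1 - 2 * A[i, j]) * (x[j] - (x[i] - r[i]) / alpha)
             for i, j in fragile}
        W_new = set()
        for v in range(N):                     # keep the top-b_v improving flips per node (lines 5-6)
            cand = sorted(((lv, e) for e, lv in l.items() if e[0] == v and lv > 1e-12),
                          reverse=True)
            W_new.update(e for _, e in cand[:budgets.get(v, 0)])
        if W_new == W:
            break                              # policy stable: configuration found
        W = W_new
    return W, x

# Toy instance: fixed edges 0->1, 1->0, 2->0; one fragile (addable) edge (0, 2);
# all reward sits on node 2, so turning (0, 2) on is profitable.
A = np.array([[0, 1, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
r = np.array([0.0, 0.0, 1.0])
W_opt, x_opt = policy_iteration_local(A, r, fragile=[(0, 2)], budgets={0: 1})
```

On this toy instance the algorithm turns the single fragile edge on and stabilizes after two iterations; with a zero local budget it must return the empty configuration.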
Regardless of the node t (i.e. which z = e_t in\nEq. 3), the optimal edges to perturb are the same if the admissible set Q_F and the reward r are the\nsame. This means that for a fixed Q_F we only need to run the algorithm K × K times to obtain\nthe certificates for all N nodes: for each pair of classes c1, c2 we have a different reward vector\nr = −(H_:,c1 − H_:,c2), and we can recover the exact worst-case margins m*_yt,*(·) for all N nodes\nby just computing Π on the resulting K × K many perturbed graphs G̃. Now, m*_yt,*(·) > 0 implies\ncertifiable robustness, while m*_yt,*(·) < 0 implies certifiable non-robustness due to the exactness of\nour certificate, i.e. we have found an adversarial example for node t.\nOur approach for both local and global budget. Algorithm 1 cannot handle a global budget\nconstraint, and in general solving Problem 2 with a global budget is NP-hard. More specifically, it\ngeneralizes the Link Building problem [32] – find the set of k optimal edges that point to a given\nnode such that its PageRank score is maximized – which is W[1]-hard and for which there exists no\nfully-polynomial time approximation scheme (FPTAS). It follows that Problem 2 is also W[1]-hard\nand allows no FPTAS. We provide the proof and more details in Sec. 8.5 in the appendix. Therefore,\nwe develop an alternative approach that consists of three steps and is outlined in the lower part of\nFig. 
1: (a) We propose an alternative unconstrained MDP based on an auxiliary graph which reduces the\naction set from exponential to binary by adding only |F| auxiliary nodes; (b) We reformulate the\nproblem as a non-convex Quadratically Constrained Linear Program (QCLP) to be able to handle the\nglobal budget; (c) We utilize the Reformulation Linearization Technique (RLT) to construct a convex\nrelaxation of the QCLP, enabling us to efficiently compute a lower bound on the worst-case margin.\n(a) Auxiliary graph. Given an input graph we add one auxiliary node v_ij for each fragile edge\n(i, j) ∈ F. We define a total cost infinite horizon MDP on this auxiliary graph (Fig. 2b) that\nsolves Problem 2 without constraints. The MDP is defined by the 4-tuple (S, (A_i)_{i∈S}, p, r), where\nS is the state space (preexisting and auxiliary nodes), and A_i is the set of admissible actions\nin state i. Given action a ∈ A_i, p(j|i, a) is the probability to go to state j from state i and\nr(i, a) the instantaneous reward. Each preexisting node i has a single action A_i = {a}, reward\nr(i, a) = r_i, and uniform transitions p(v_ij|i, a) = d_i^{−1}, ∀v_ij ∈ F^i, discounted by α for the fixed\nedges: p(j|i, a) = α · d_i^{−1}, ∀(i, j) ∈ E_f, where d_i = |E_f^i ∪ F^i| is the degree. For each auxiliary\nnode we allow two actions A_vij = {on, off}. For action \"off\" node v_ij goes back to node i\nwith probability 1 and obtains reward −r_i: p(i|v_ij, off) = 1, r(v_ij, off) = −r_i. For action \"on\"\nnode v_ij goes only to node j with probability α (the model is substochastic) and obtains 0 reward:\np(j|v_ij, on) = α, r(v_ij, on) = 0. We introduce fewer auxiliary nodes compared to previous work [11, 17].\n(b) Global and local budgets QCLP. Based on this unconstrained MDP, we can derive a corresponding\nlinear program (LP) solving the same problem [34]. Since the MDP on the auxiliary graph has\n(at most) binary action sets, the LP has only 2|V| + 3|F| constraints and variables. This is in strong\ncontrast to the LP corresponding to the previous average cost MDP [18] operating directly on the\noriginal graph, which has an exponential number of constraints and variables. Lastly, we enrich the\nLP for the auxiliary graph MDP with additional constraints enforcing the local and global budgets. The\nconstraints for the local budget are linear; however, the global budget requires quadratic constraints,\nresulting in a quadratically constrained linear program (QCLP) that exactly solves Problem 2.\n\nFigure 2: Construction of the auxiliary graph. For each fragile edge (i, j), marked with a red dashed\nline, we add one node v_ij and two actions {on, off} to the auxiliary graph. If the edge is configured\nas \"on\", v_ij goes only to node j with prob. α. If configured as \"off\", it goes back to node i with prob. 1.\n(a) A_i = {∅, {j}, {k}, {j, k}}  (b) A_i = {a} and A_vij = A_vik = {off, on}\n\nProposition 2. Solving the following QCLP (with decision variables x_v, x0_ij, x1_ij, β0_ij, β1_ij) is\nequivalent to solving Problem 2 with local and global constraints, i.e. the value of the objective\nfunction is the same in the optimal solution. We can recover π(z)_v from x_v via π(z)_v = (1 − k_v d_v^{−1}) x_v. 
Here\nk_v is the number of \"off\" fragile edges (the ones where x0_ij > 0) in the optimal solution.\n\nmax  Σ_{v∈V} x_v r_v − Σ_{(i,j)∈F} x0_ij r_i  (4a)\n\nsubject to\n\nx_v − α Σ_{(i,v)∈E_f} x_i/d_i − α Σ_{(j,v)∈F} x1_jv − Σ_{(v,k)∈F} x0_vk = (1 − α) z_v,  ∀v ∈ V  (4b)\n(the three sums collect incoming fixed edges, incoming \"on\" edges, and returning \"off\" edges)\n\nx0_ij + x1_ij = x_i/d_i,  x0_ij ≥ 0,  x1_ij ≥ 0,  ∀(i, j) ∈ F;  x_v ≥ 0,  ∀v ∈ V  (4c)\n\nΣ_{(v,i)∈F} ([(v, i) ∈ E] x0_vi + [(v, i) ∉ E] x1_vi) ≤ (x_v/d_v) b_v,  ∀v ∈ V  (4d)\n(removed existing edges plus newly added edges)\n\nx0_ij β1_ij = 0,  x1_ij β0_ij = 0,  β1_ij = 1 − β0_ij,  0 ≤ β0_ij ≤ 1,  ∀(i, j) ∈ F  (4e)\n\nΣ_{(i,j)∈F} ([(i, j) ∈ E] β0_ij + [(i, j) ∉ E] β1_ij) ≤ B  (4f)\n\nKey idea and insights. Eqs. 4b and 4c correspond to the LP of the unconstrained MDP. Intuitively,\nthe variable x_v maps to the PageRank score of node v, and from the variables x0_ij/x1_ij we can recover\nthe optimal policy: if the variable x0_ij (respectively x1_ij) is non-zero then in the optimal policy the\nfragile edge (i, j) is turned off (respectively on). Since there exists a deterministic optimal policy,\nonly one of them is non-zero but never both. Eq. 4d corresponds to the local budget. Remarkably,\ndespite the variables x0_ij/x1_ij not being integral, since they share the factor x_i d_i^{−1} from Eq. 
4c we\ncan exactly count the number of edges that are turned off or on using only linear constraints. Eqs. 4e\nand 4f enforce the global budget. From Eq. 4e we have that whenever x0_ij is nonzero it follows that\nβ1_ij = 0 and β0_ij = 1, since that is the only configuration that satisfies the constraints (similarly for\nx1_ij). Intuitively, this effectively makes the β0_ij variables \"counters\" and we can utilize them in\nEq. 4f to enforce that the total number of perturbed edges does not exceed B. See the detailed proof in Sec. 8.3.\n(c) Efficient Reformulation Linearization Technique (RLT). The quadratic constraints in our\nQCLP make the problem non-convex and difficult to solve. We relax the problem using the Reformulation\nLinearization Technique (RLT) [38], which gives us an upper bound on the objective. The\nalternative SDP relaxation [43] based on semidefinite programming is not suitable for our problem\nsince the constraints are trivially satisfied (see Appendix 8.4 for details). While in general the RLT\nintroduces many new variables (replacing each product term m_i m_j with a variable M_ij) along with\nmultiple new linear inequality constraints, it turns out that in our case the solution is highly compact:\nProposition 3. Given a fixed upper bound x̄_v for x_v and using the RLT relaxation, the quadratic\nconstraints in Eqs. 4e and 4f transform into the following single linear constraint:\n\nΣ_{(i,j)∈F} ([(i, j) ∈ E] x0_ij d_i (x̄_i)^{−1} + [(i, j) ∉ E] x1_ij d_i (x̄_i)^{−1}) ≤ B  (5)\n\nProof provided in Sec. 8.3 in the appendix. By replacing Eqs. 4e and 4f with Eq. 5 in Proposition 2,\nwe obtain a linear program which can be efficiently solved. Remarkably, we only have x_v, x0_ij, x1_ij as\ndecision variables since we were able to eliminate all other variables. The solution is an upper bound\non the solution for Problem 2 and a lower bound on the solution for Problem 1. 
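The MDP-to-LP step behind this construction is the standard occupation-measure formulation [34]. The sketch below illustrates it on a hypothetical two-state discounted MDP (all numbers invented; this is not the paper's auxiliary-graph LP, which additionally carries the budget constraints):

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a discounted MDP (maximize reward):
#   max_x  sum_{s,a} x_{s,a} r(s,a)
#   s.t.   sum_a x_{s,a} - alpha * sum_{s',a'} p(s|s',a') x_{s',a'} = (1-alpha) z_s,  x >= 0
# Two states; state 0 has a binary action set {stay, go}, state 1 a single self-loop.
alpha = 0.9
z = np.array([1.0, 0.0])                       # start in state 0
rewards = np.array([0.5, 0.0, 1.0])            # r(0,stay), r(0,go), r(1,loop)
A_eq = np.array([
    [1 - alpha, 1.0, 0.0],                     # flow balance at state 0 ("stay" loops back)
    [0.0, -alpha, 1 - alpha],                  # flow balance at state 1 (fed by "go" + loop)
])
b_eq = (1 - alpha) * z
res = linprog(-rewards, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3, method="highs")
best_value = -res.fun                          # optimal deterministic policy chooses "go"
```

The nonzero occupation variables reveal the optimal deterministic policy, mirroring how x0_ij/x1_ij encode off/on decisions in the paper's LP.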
The final relaxed\nQCLP can also be interpreted as a constrained MDP with a single additional constraint (Eq. 5), which\nadmits a possibly randomized optimal policy with at most one randomized state [1].\nCertificate for local and global budget. To solve the relaxed QCLP and compute the final certificate\nwe need to provide the upper bounds x̄_v for the constraint in Eq. 5. Since the quality of the RLT\nrelaxation depends on the tightness of these upper bounds, we have to carefully select them. We\nprovide here one solution (see Sec. 8.6 in the appendix for a faster to compute, but less tight,\nalternative): given an instance of Problem 2, we can set the reward to r = e_v and invoke Algorithm\n1, which is highly efficient, using the same fragile set and the same local budget. Since this explicitly\nmaximizes x_v, the objective value of the problem is guaranteed to give a valid upper bound x̄_v.\nInvoking this procedure for every node leads to the required upper bounds.\nNow, to compute the certificate with local and global budget for a target node t, we solve the relaxed\nproblem for all c ≠ y_t, leading to objective function values L_ct ≥ −m*_yt,c(t) (minus due to the\nchange from min to max). Thus, L_*,t = min_{c≠yt} −L_ct is a lower bound on the worst-case margin\nm*_yt,*(t). If the lower bound is positive then node t is guaranteed to be certifiably robust – there\nexists no adversarial attack (among all graphs in Q_F) that can change the prediction for node t.\nFor our policy iteration approach, if m*_yt,*(t) < 0 we are guaranteed to have found an adversarial\nexample since the certificate is exact, i.e. we also have a non-robustness certificate. However, in this\ncase, if the lower bound L_*,t is negative we do not necessarily have an adversarial example. 
Instead,\nwe can perturb the graph with the optimal configuration of fragile edges for the relaxed problem, and\ninspect whether the predictions change. See Fig. 1 for an overview of both approaches.\n\n5 Robust training for graph neural networks\n\nIn Sec. 4 we introduced two methods to efficiently compute certificates given a trained π-PPNP model.\nWe now show that these can naturally be used to go one step further – to improve the robustness\nof the model. The main idea is to utilize the worst-case margin during training to encourage the\nmodel to learn more robust weights. Optimizing some robust loss L_θ with respect to the model\nparameters θ (e.g. for π-PPNP θ are the neural network parameters) that depends on the worst-case\nmargin m*_yv,*(v) is generally hard since it involves an inner optimization problem, namely finding\nthe worst-case margin. This prevents us from easily taking the gradient of m*_yv,c(v) (and, thus, L_θ)\nw.r.t. the parameters θ. Previous approaches tackle this challenge by using the dual [46].\nInspecting our problem, however, we see that we can directly compute the gradient. Since m*_yv,c(v)\n(respectively the corresponding lower bound) is a linear function of H = f_θ(X) and π_G, and\nfurthermore the admissible set Q_F over which we are optimizing is compact, it follows from\nDanskin's theorem [14] that we can simply compute the gradient of the loss at the optimal point. We\nhave ∂m*_yv,c(v)/∂H_i,yv = π*(e_v)_i and ∂m*_yv,c(v)/∂H_i,c = −π*(e_v)_i, i.e. the gradient equals the\noptimal (±) PageRank scores computed in our certification approaches.\nRobust training. To improve robustness, Wong & Kolter [46] proposed to optimize the robust\ncross-entropy loss: L_RCE = L_CE(y*_v, −m*_y*v(v)), where L_CE is the standard cross-entropy loss\noperating on the logits, and m*_y*v(v) is a vector such that at index c we have m*_y*v,c(v). Previous\nwork has shown that if the model is overconfident there is a potential issue when using L_RCE,\nsince it encourages high certainty under the worst-case perturbations [58]. Therefore, we also\nstudy an alternative robust hinge loss. Since the attacker wants to minimize the worst-case margin\nm*_yt,*(t) (or its lower bound), a straightforward idea is to try to maximize it during training. To\nachieve this we add a hinge loss penalty term to the standard cross-entropy loss. Specifically:\nL_CEM = Σ_{v∈V_L} [ L_CE(y*_v, H^diff_v,:) + Σ_{c∈C, c≠y*_v} max(0, M − m*_y*v,c(v)) ]. The second\nterm for a single node v is positive if m*_y*v,c(v) < M and zero otherwise – the node v is certifiably\nrobust with a margin of at least M. Effectively, if all training nodes are robust, the second term\nbecomes zero, thus reducing L_CEM to the standard cross-entropy loss with robustness guarantees.\nNote again that we can easily compute the gradient of these losses w.r.t. the (neural network) parameters θ.\n\n[Figure 3 plots the % of certifiably robust nodes vs. the local attack strength s: (a) Cora-ML, local\nbudget, comparing π-PPNP, F-Prop. and L-Prop. (clean F1: 0.83, 0.82, 0.73) under \"both\" and \"rem.\"\nperturbations; (b) Citeseer, impact of α ∈ {0.6, 0.7, 0.8, 0.9}; (c) Cora-ML, worst-case margin vs.\nneighborhood purity.]\nFigure 3: Increasing local attack strength s (local budget b_v = max(d_v − 11 + s, 0)) decreases the ratio\nof certified nodes. 
(a) The graph is more robust to removing edges; π-PPNP is the most robust overall. (b) Lowering α improves robustness. (c) Nodes with higher neighborhood purity are more robust. [Panel (c) shows Cora-ML, purity vs. m*_{y_t,*}(t).]

6 Experimental results

Setup. We focus on evaluating the robustness of π-PPNP without robust training and of label/feature propagation using our two certification methods. We also verify that robust training improves the robustness of π-PPNP while maintaining high predictive accuracy. We demonstrate our claims on two publicly available datasets: Cora-ML (N = 2,995, |E| = 8,416, D = 2,879, K = 7) [4, 30] and Citeseer (N = 3,312, |E| = 4,715, D = 3,703, K = 6) [37], with further experiments on Pubmed (N = 19,717, |E| = 44,324, D = 500, K = 3) [37] in the appendix. We configure π-PPNP with one hidden layer of size 64 and set α = 0.85. We select 20 nodes per class for the train/validation set and use the rest for the test set. We compute the certificates w.r.t. the predictions, i.e. we set y_t in m*_{y_t,*}(t) to the predicted class for node t on the clean graph. See Sec. 8.2 in the appendix for further experiments and Sec. 8.7 for more details. Note that we do not need to compare to any previously introduced adversarial attacks on graphs [13, 58, 60], since by the definition of a certificate, for a certifiably robust node w.r.t. a given admissible set Q_F there exists no successful attack within that set.
We construct several different configurations of fixed and fragile edges to gain a better understanding of the robustness of the methods to different kinds of adversarial perturbations. Namely, "both" refers to the scenario where F = V × V, i.e. the attacker is allowed to add or remove any edge in the graph, while "rem." refers to the scenario where F = E for a given graph G = (V, E), i.e. the attacker can only remove existing edges.
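The two threat models can be made concrete with a small sketch. The ordered-pair edge representation and the function name are our own illustrative choices (self-loops are excluded here for simplicity); this is not the paper's code.

```python
from itertools import product

def fragile_set(V, E, mode):
    """Fragile edges the attacker may flip: 'both' allows adding or removing
    any (directed) edge, 'rem' only allows removing existing edges."""
    E = set(E)
    if mode == "both":
        # F = V x V (self-loops excluded here as an illustrative assumption)
        return {(i, j) for i, j in product(V, repeat=2) if i != j}
    if mode == "rem":
        # F = E: only existing edges are attackable
        return E
    raise ValueError(f"unknown threat model: {mode}")
```

In the experiments the fixed edges described next would additionally be excluded from the fragile set; this is omitted here for brevity.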
In addition, for all scenarios we specify the fixed set as E_f = E_mst, where (i, j) ∈ E_mst if (i, j) belongs to the minimum spanning tree (MST) of the graph G.³
Robustness certificates: Local budget only. We investigate the robustness of different graphs and semi-supervised node classification methods when the attacker has only local budget constraints. We set the local budget b_v = max(d_v − 11 + s, 0) relative to the degree d_v of node v in the clean graph, and we vary the local attack strength s, with lower s leading to a more restrictive budget. Such a relative budget is justified since higher-degree nodes tend to be more robust in general [59, 60]. We then apply our policy iteration algorithm to compute the (exact) worst-case margin for each node.
In Fig. 3a we see that the number of certifiably robust nodes when the attacker can only remove edges is significantly higher compared to when they can also add edges, which is consistent with previous work on adversarial attacks [60]. As expected, the share of robust nodes decreases with higher budget, and π-PPNP is significantly more robust than label propagation since, besides the graph, it also takes advantage of the node attributes. Feature propagation has similar performance (F1 score) but it is less robust. Note that since our certificate is exact, the remaining nodes are certifiably non-robust! In Sec. 8.2 in the appendix we also investigate certifiable accuracy – the ratio of nodes that are both certifiably robust and at the same time have a correct prediction. We find that the certifiable accuracy is relatively close to the clean accuracy, and it decreases gracefully as we increase the budget.
Analyzing influence on robustness. In Fig.
3b we see that decreasing the damping factor α is an effective strategy to significantly increase the robustness with no noticeable loss in accuracy (at most 0.5% for any α, not shown). Thus, α provides a useful trade-off between robustness and the size of the effective neighborhood: higher α implies higher PageRank scores (i.e. higher influence) for the neighbors. In general we recommend setting the value as low as the accuracy allows.

³ Fixing the MST edges ensures that every node is reachable by every other node for any policy. This is only to simplify our earlier exposition regarding the MDPs and can be relaxed to e.g. reachable at the optimal policy.

[Figure 4 plots: (a) % certifiably robust nodes vs. global budget on Cora-ML for local attack strength s ∈ {6, 10, 12}, with local-only baselines dashed; (b) fragile-set size vs. number of policy iterations for Cora-ML and Citeseer, with wall-clock runtimes annotated (roughly 0.4 s after one iteration up to 3–5 s after five); (c) % certifiably robust nodes vs. local attack strength s on Citeseer for L_CE, L_CEM, and L_RCE (F1 scores 0.70, 0.73, 0.72).]
(a) Cora-ML, global budget   (b) Certificate efficiency   (c) Citeseer, robust training
Figure 4: (a) The global budget can significantly restrict the attacker compared to having only local constraints. (b) Even for large fragile sets, Algorithm 1 only needs a few iterations to find the optimal PageRank. (c) Our robust training successfully increases the percentage of certifiably robust nodes. The increase is largest for the local attack strength that we used during training (s = 10, dashed line).

In Fig. 3c we investigate what contributes to certain nodes being more robust than others.
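The statistic examined in Fig. 3c – the share of same-class nodes in a node's two-hop neighborhood – could be computed along the following lines. This is a sketch over an adjacency-list graph; the representation and function name are ours, not the paper's.

```python
def two_hop_purity(adj, labels, v):
    """Share of nodes in v's two-hop neighborhood (excluding v itself)
    that carry the same class label as v."""
    hop1 = set(adj[v])                         # direct neighbors of v
    hop2 = {w for u in hop1 for w in adj[u]}   # neighbors of neighbors
    nbrs = (hop1 | hop2) - {v}
    if not nbrs:
        return 0.0
    return sum(labels[u] == labels[v] for u in nbrs) / len(nbrs)
```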
We see that neighborhood purity – the share of nodes with the same class in a respective node's two-hop neighborhood – plays an important role. High purity leads to a high worst-case margin, which translates to certifiable robustness.
Robustness certificates: Local and global budget. We demonstrate our second approach based on the relaxed QCLP problem by analyzing the robustness as we increase the global budget. We set F = E, i.e. the attacker can only remove edges, and vary the local attack strength s corresponding to the local budget b_v = max(d_v − 11 + s, 0). We see in Fig. 4a that by additionally enforcing a global budget we can significantly restrict the success of the attacker compared to having only a local budget (dashed lines). The global constraint increases the number of robust nodes, validating our approach.
Efficiency. Fig. 4b demonstrates the efficiency of our approach: even for fragile sets as large as 10⁴, Algorithm 1 finds the optimal solution in just a few iterations. Since each iteration is itself efficient by utilizing sparse matrix operations, the overall wall-clock runtime (shown as text annotations) is on the order of a few seconds. In Sec. 8.2 in the appendix, we further investigate the runtime as we increase the number of nodes in the graph, as well as the runtime of our relaxed QCLP.
Robust training. While not our core focus, we investigate whether robust training improves the certifiable robustness of GNNs. We set the fragile set F = E and vary the local budget. The vertical line in Fig. 4c indicates the local budget used to train the robust models with losses L_RCE and L_CEM. We see that both of our approaches are able to improve the percentage of certifiably robust nodes, with the largest improvement (around 13% increase) for the local attack strength we trained on (s = 10).
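The hinge penalty of the L_CEM loss from Sec. 5 can be sketched on precomputed worst-case margins as follows. The array shapes and names are our assumption; the cross-entropy term would be added separately.

```python
import numpy as np

def cem_penalty(margins, y, M=1.0):
    """Hinge term of L_CEM: sum over labeled nodes v and classes c != y_v of
    max(0, M - m*_{y_v,c}(v)). It vanishes once every node is certifiably
    robust with margin at least M.

    margins: (n_labeled, n_classes) array of worst-case margins (or lower
    bounds); y: (n_labeled,) integer labels."""
    n = margins.shape[0]
    mask = np.ones_like(margins, dtype=bool)
    mask[np.arange(n), y] = False          # skip the true class c = y_v
    return np.maximum(0.0, M - margins)[mask].sum()
```

The full robust loss would then be the standard cross-entropy plus this penalty, with gradients flowing through the margins via the fixed optimal PageRank scores, as discussed in Sec. 5.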
Furthermore, the F1 scores on the test split for Citeseer are as follows: 0.70 for L_CE, 0.72 for L_RCE, and 0.73 for L_CEM, i.e. besides improving the ratio of certified nodes, robust training also improves the clean predictive accuracy of the model. L_RCE achieves higher certifiable robustness, but L_CEM achieves a higher F1 score. There is room for improvement in how we approach the robust training: e.g., similar to Zügner & Günnemann [59], we can optimize over the worst-case margin for the unlabeled nodes in addition to the labeled ones. We leave this as a future research direction.

7 Conclusion

We derive the first (non-)robustness certificate for graph neural networks regarding perturbations of the graph structure, and the first certificate overall for label/feature propagation. Our certificates are flexible w.r.t. the threat model, can handle both local (per-node) and global budgets, and can be efficiently computed. We also propose a robust training procedure that increases the number of certifiably robust nodes while improving the predictive accuracy. As future work, we aim to consider perturbations and robustification of the node features and the graph structure jointly.

Acknowledgments

This research was supported by the German Research Foundation, Emmy Noether grant GU 1409/2-1, and the German Federal Ministry of Education and Research (BMBF), grant no. 01IS18036B. The authors of this work take full responsibility for its content.

References
[1] Altman, E. Constrained Markov decision processes, volume 7. CRC Press, 1999.
[2] Avrachenkov, K. and Litvak, N. The effect of new links on google pagerank. Stochastic Models, 22(2), 2006.
[3] Bojchevski, A. and Günnemann, S. Bayesian robust attributed graph clustering: Joint learning of partial anomalies and group structure. In AAAI Conference on Artificial Intelligence, 2018.
[4] Bojchevski, A.
and G\u00fcnnemann, S. Deep gaussian embedding of graphs: Unsupervised\ninductive learning via ranking. In International Conference on Learning Representations, ICLR,\n2018.\n\n[5] Bojchevski, A. and G\u00fcnnemann, S. Adversarial attacks on node embeddings via graph poisoning.\n\nIn International Conference on Machine Learning, ICML, 2019.\n\n[6] Bojchevski, A., Matkovic, Y., and G\u00fcnnemann, S. Robust spectral clustering for noisy data:\nModeling sparse corruptions improves latent embeddings. In International Conference on\nKnowledge Discovery and Data Mining, KDD, pp. 737\u2013746, 2017.\n\n[7] Buchnik, E. and Cohen, E. Bootstrapped graph diffusions: Exposing the power of nonlinearity.\nIn Abstracts of the 2018 ACM International Conference on Measurement and Modeling of\nComputer Systems, SIGMETRICS, 2018.\n\n[8] Cai, L. Parameterized complexity of cardinality constrained optimization problems. The\n\nComputer Journal, 51(1):102\u2013121, 2008.\n\n[9] Castillo, C. and Davison, B. D. Adversarial web search. Foundations and Trends in Information\n\nRetrieval, 4(5), 2010.\n\n[10] Chen, J., Wu, Y., Lin, X., and Xuan, Q. Can adversarial network attack be defended? arXiv\n\npreprint arXiv:1903.05994, 2019.\n\n[11] Cs\u00e1ji, B. C., Jungers, R. M., and Blondel, V. D. Pagerank optimization in polynomial time by\n\nstochastic shortest path reformulation. In ALT, 21st International Conference, 2010.\n\n[12] Cs\u00e1ji, B. C., Jungers, R. M., and Blondel, V. D. Pagerank optimization by edge selection.\n\nDiscrete Applied Mathematics, 169, 2014.\n\n[13] Dai, H., Li, H., Tian, T., Huang, X., Wang, L., Zhu, J., and Song, L. Adversarial attack on graph\n\nstructured data. In International Conference on Machine Learning, ICML, 2018.\n\n[14] Danskin, J. M. The theory of max-min and its application to weapons allocation problems.\n\n1967.\n\n[15] de Kerchove, C., Ninove, L., and Van Dooren, P. Maximizing pagerank via outlinks. 
Linear Algebra and its Applications, 429(5-6), 2008.
[16] Feng, F., He, X., Tang, J., and Chua, T.-S. Graph adversarial training: Dynamically regularizing based on graph structure. arXiv preprint arXiv:1902.08226, 2019.
[17] Fercoq, O. Optimization of Perron eigenvectors and applications: from web ranking to chronotherapeutics. PhD thesis, Ecole Polytechnique X, 2012.
[18] Fercoq, O., Akian, M., Bouhtou, M., and Gaubert, S. Ergodic control and polyhedral approaches to pagerank optimization. IEEE Trans. Automat. Contr., 58(1), 2013.
[19] Fey, M. and Lenssen, J. E. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019.
[20] Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Neural Information Processing Systems, NIPS, 2017.
[21] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, ICLR, 2015.
[22] Haveliwala, T. H. Topic-sensitive pagerank. In Eleventh International World Wide Web Conference, WWW, 2002.
[23] Hein, M. and Andriushchenko, M. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Neural Information Processing Systems, NIPS, 2017.
[24] Hoang, N., Choong, J. J., and Murata, T. Learning graph neural networks with noisy labels. arXiv preprint arXiv:1905.01591, 2019.
[25] Hollanders, R., Delvenne, J.-C., and Jungers, R. Policy iteration is well suited to optimize pagerank. arXiv preprint arXiv:1108.3779, 2011.
[26] Hooi, B., Shah, N., Beutel, A., Günnemann, S., Akoglu, L., Kumar, M., Makhija, D., and Faloutsos, C. BIRDNEST: bayesian inference for ratings-fraud detection. In SIAM International Conference on Data Mining, 2016.
[27] Jeh, G. and Widom, J. Scaling personalized web search.
In Twelfth International World Wide Web Conference, WWW, 2003.
[28] Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, ICLR, 2017.
[29] Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In International Conference on Learning Representations, ICLR, 2019.
[30] McCallum, A., Nigam, K., Rennie, J., and Seymore, K. Automating the construction of internet portals with machine learning. Inf. Retr., 3(2), 2000.
[31] Miller, B. A., Çamurcu, M., Gomez, A. J., Chan, K., and Eliassi-Rad, T. Improving robustness to attacks against vertex classification. 2018.
[32] Olsen, M. Maximizing pagerank with new backlinks. In International Conference on Algorithms and Complexity, pp. 37–48, 2010.
[33] Olsen, M., Viglas, A., and Zvedeniouk, I. A constant-factor approximation algorithm for the link building problem. In International Conference on Combinatorial Optimization and Applications, pp. 87–96, 2010.
[34] Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[35] Raghunathan, A., Steinhardt, J., and Liang, P. S. Semidefinite relaxations for certifying robustness to adversarial examples. In Neural Information Processing Systems, NeurIPS, 2018.
[36] Rhee, S., Seo, S., and Kim, S. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In International Joint Conference on Artificial Intelligence, IJCAI, 2018.
[37] Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3), 2008.
[38] Sherali, H. D. and Tuncbilek, C. H.
A reformulation-convexification approach for solving nonconvex quadratic programming problems. Journal of Global Optimization, 7(1), 1995.
[39] Sokol, M., Avrachenkov, K., Gonçalves, P., and Mishenin, A. Generalized optimization framework for graph-based semi-supervised learning. In SIAM International Conference on Data Mining, 2012.
[40] Sun, K., Guo, H., Zhu, Z., and Lin, Z. Virtual adversarial training on graph convolutional networks in node classification. arXiv preprint arXiv:1902.11045, 2019.
[41] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, ICLR, 2014.
[42] Tang, X., Li, Y., Sun, Y., Yao, H., Mitra, P., and Wang, S. Robust graph neural network against poisoning attacks via transfer learning. arXiv preprint arXiv:1908.07558, 2019.
[43] Vandenberghe, L. and Boyd, S. Semidefinite programming. SIAM Review, 38(1), 1996.
[44] Wang, J., Wen, R., Wu, C., Huang, Y., and Xion, J. Fdgars: Fraudster detection via graph convolutional networks in online app review system. In Companion of The 2019 World Wide Web Conference, WWW, 2019.
[45] Wang, S., Chen, Z., Ni, J., Yu, X., Li, Z., Chen, H., and Yu, P. S. Adversarial defense framework for graph neural network. arXiv preprint arXiv:1905.03679, 2019.
[46] Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, ICML, 2018.
[47] Wu, F., Zhang, T., Souza Jr, A. H. d., Fifty, C., Yu, T., and Weinberger, K. Q. Simplifying graph convolutional networks. In International Conference on Machine Learning, ICML, 2019.
[48] Wu, H., Wang, C., Tyshetskiy, Y., Docherty, A., Lu, K., and Zhu, L. Adversarial examples for graph data: Deep insights into attack and defense.
In International Joint Conference on Artificial Intelligence, IJCAI, pp. 4816–4823, 2019.
[49] Xu, K., Chen, H., Liu, S., Chen, P., Weng, T., Hong, M., and Lin, X. Topology attack and defense for graph neural networks: An optimization perspective. In International Joint Conference on Artificial Intelligence, IJCAI.
[50] Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. Graph convolutional neural networks for web-scale recommender systems. In International Conference on Knowledge Discovery & Data Mining, KDD, 2018.
[51] Zhang, Y., Khan, S., and Coates, M. Comparing and detecting adversarial attacks for graph deep learning. In Proc. Representation Learning on Graphs and Manifolds Workshop, Int. Conf. Learning Representations, New Orleans, LA, USA, 2019.
[52] Zhang, Y., Pal, S., Coates, M., and Üstebay, D. Bayesian graph convolutional neural networks for semi-supervised classification. In AAAI Conference on Artificial Intelligence, 2019.
[53] Zhou, D. and Burges, C. J. C. Spectral clustering and transductive learning with multiple views. In International Conference on Machine Learning, ICML, 2007.
[54] Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. Learning with local and global consistency. In Neural Information Processing Systems, NIPS, 2003.
[55] Zhou, D., Huang, J., and Schölkopf, B. Learning from labeled and unlabeled data on a directed graph. In International Conference on Machine Learning, ICML, 2005.
[56] Zhou, K., Michalak, T. P., and Vorobeychik, Y. Adversarial robustness of similarity-based link prediction. International Conference on Data Mining, ICDM, 2019.
[57] Zhu, D., Zhang, Z., Cui, P., and Zhu, W. Robust graph convolutional networks against adversarial attacks. In International Conference on Knowledge Discovery & Data Mining, KDD, 2019.
[58] Zügner, D. and Günnemann, S.
Adversarial attacks on graph neural networks via meta learning. In International Conference on Learning Representations, ICLR, 2019.
[59] Zügner, D. and Günnemann, S. Certifiable robustness and robust training for graph convolutional networks. In International Conference on Knowledge Discovery & Data Mining, KDD, 2019.
[60] Zügner, D., Akbarnejad, A., and Günnemann, S. Adversarial attacks on neural networks for graph data. In International Conference on Knowledge Discovery & Data Mining, KDD, 2018.