{"title": "REM: From Structural Entropy to Community Structure Deception", "book": "Advances in Neural Information Processing Systems", "page_first": 12938, "page_last": 12948, "abstract": "This paper focuses on the privacy risks of disclosing the community structure in an online social network. By exploiting the community affiliations of user accounts, an attacker may infer sensitive user attributes. This raises the problem of community structure deception (CSD), which asks for ways to minimally modify the network so that a given community structure maximally hides itself from community detection algorithms. We investigate CSD through an information-theoretic lens. To this end, we propose a community-based structural entropy to express the amount of information revealed by a community structure. This notion allows us to devise residual entropy minimization (REM) as an efficient procedure to solve CSD. Experimental results over 9 real-world networks and 6 community detection algorithms show that REM is very effective in obfuscating the community structure as compared to other benchmark methods.", "full_text": "REM: From Structural Entropy To Community\n\nStructure Deception\n\nYiwei Liu1,2, Jiamou Liu3, Zijian Zhang1,3, Liehuang Zhu1\u2217, Angsheng Li4\n\n1School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China\n\n2Institute of Cyberspace Research, Zhejiang University, Zhejiang 310027, China\n\n3School of Computer Science, The University of Auckland, Auckland 1142, New Zealand\n\n4School of Computer Science, Beihang University, Beijing 100083, China\n\n{yiweiliu, zhangzijian, liehuangz}@bit.edu.cn, jiamou.liu@auckland.ac.nz, angsheng@buaa.edu.cn\n\nAbstract\n\nThis paper focuses on the privacy risks of disclosing the community structure\nin an online social network. By exploiting the community af\ufb01liations of user\naccounts, an attacker may infer sensitive user attributes. 
This raises the problem of\ncommunity structure deception (CSD), which asks for ways to minimally modify\nthe network so that a given community structure maximally hides itself from\ncommunity detection algorithms. We investigate CSD through an information-theoretic\nlens. To this end, we propose a community-based structural entropy to\nexpress the amount of information revealed by a community structure. This notion\nallows us to devise residual entropy minimization (REM) as an efficient procedure\nto solve CSD. Experimental results over 9 real-world networks and 6 community\ndetection algorithms show that REM is very effective in obfuscating the community\nstructure as compared to other benchmark methods.\n\n1 Introduction\n\nSocial networking sites facilitate effective communication through the means of Web feeds, discussion\ngroups, timelines and more. Such a platform is characterized by a structure that consists of user\naccounts and their links. Discovering hidden patterns in this network structure is a compelling\napplication of graph data mining algorithms. In particular, community detection stands out as one of\nthe most important graph mining methods [11, 16, 23, 26]. Communities emerge as people naturally\nbond with those within the same working environment, family, or those who share similar tastes,\ninterests and political viewpoints. By exploiting users\u2019 community affiliations, an attacker may infer\ncertain personal \u2013 and sometimes sensitive \u2013 features of the users in a social network. For example,\nsuppose the attacker has some background information asserting that several members of a community\nall work for the same organization. It is easy in this case to infer that other members of the same\ncommunity also have ties with the organization. 
[29] showed that information about the community\nmemberships of a user (i.e., the groups of a social network to which a user belongs) is sufficient to\nuniquely identify this person, or, at least, to significantly reduce the set of possible candidates. In\n[25], communities are used to re-identify multiple addresses belonging to the same user in Bitcoin\ntrading networks. Therefore, there is a need to hide the community affiliations in order to preserve\nthe privacy of online users.\n\n\u2217is a professor in the School of Computer Science and Technology, Beijing Institute of Technology. He is\nselected into the Program for New Century Excellent Talents in University from Ministry of Education, China.\nHis research interests include Internet of things, cloud computing security, and blockchain.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nThis paper addresses the privacy risks due to community detection. Our goal is to minimally modify\nthe network structure so that the community affiliations maximally hide themselves from community\ndetection algorithms. Despite growing interest in the privacy issues of social networks, very few\nworks target community-level anonymization over a social network. Developing effective\nmeans for this problem faces numerous challenges: The first is the lack of a formal and universally\nagreeable definition of communities. It is thus difficult to propose notions such as k-anonymity that\nare based on counting substructures and are independent from the community detection algorithms\n[32]. The second is the diversity of techniques used for community detection. An attacker may use\nmany methods to identify communities in a network, which makes it impossible to pinpoint a single\nobjective metric that guides the deception of communities. 
The third is the desire to obfuscate not\njust a single community, but rather multiple communities or even all communities in the network. As\nopposed to existing work, e.g., [28, 10], which focuses on the deception of a single targeted community,\nwe are interested in nullifying the community detection algorithms so that they are ineffective at\nidentifying any original communities in the obfuscated network.\nThis paper studies the community structure deception problem (CSD) that seeks a way to obfuscate\na given community structure of a network through adding a fixed number of edges. (1) To solve\nthis problem, we adopt an information-theoretic perspective. This involves\ndefining community-based structural entropy that captures the amount of information revealed by the\ncommunity structure of a social network. (2) We propose a method to effectively nullify community\ndetection algorithms based on the principle of residual entropy minimization (REM). REM clearly\noutperforms other schemes with the same goal, which include a benchmark based on modularity\nminimization. (3) Our work derives new insights regarding structural entropy of a graph. These\ninsights enable a highly efficient implementation of our algorithm. (4) We experimentally validate the\nperformance of our algorithm over 9 real-world networks and 6 community detection algorithms.\n\nRelated work. [2] showed that simply removing user identities is not sufficient to protect users\u2019\nprivacy in an online social network. [31] systematically examined privacy threats in the online\nspace. 
Efforts have been made to mitigate such risks on an individual level, such as identity leak\n[18, 13, 21, 33, 30, 7, 19], user attribute leak [5], social link disclosure [18, 13, 21, 33, 5], etc.\nCommon structural obfuscation techniques include adding/removing edges, adding random noise,\nand contracting edges/nodes.\nCommunity structure represents a way to partition vertices of a complex network into dense subgraphs\nthat are sparsely connected among each other [11]. Many community detection algorithms exist; e.g.,\nthe Louvain method, which utilizes modularity, is commonly used [3]. In recent years, several works that\naddress the problem of hiding a given community in a network have emerged. E.g., [20] aimed to\nhide a community by adding edges. They only considered a modularity-based community detection\nalgorithm. [28, 10] studied this problem by rewiring edges.\nQuantifying structural information is an important challenge in information theory. [24] proposed\nthe first entropy measure for graphs. This is followed by several other notions such as parametric\ngraph entropy [8], Gibbs entropy [1], Shannon entropy and Von Neumann entropy [4]. All of these\nmeasures are simply the Shannon entropy applied to different types of distributions. Based on the\nidea of random walks, the entropy defined in [26] determines the average number of bits per step by\nusing the ergodic node visit frequencies on a network. After that, [17] defined the structure entropy\nof a graph as the minimum number of bits to encode the vertex that is accessible from a step of\nrandom walk. In this paper, we follow these ideas and utilize similar notions for community structure\ndeception.\n\n2 Problem Formulation\n\nWe model a social network as an undirected connected graph G = (V, E), where V is a set of\nvertices which represent user accounts and E is a set of edges of the form {u, v} \u2286 V (u \u2260 v) which\nrepresent social ties. 
The volume of any U \u2286 V is the sum of the degrees dv of all v \u2208 U. The\ncommunity structure of G refers to a partition P of V . More formally, P is an equivalence relation\nover the set of vertices V whose equivalence classes are called communities. If i and j are in the same\ncommunity, we write (i, j) \u2208 P. We assume that the input to our problem consists of a social network\nG and a community structure P to be obfuscated. This community structure P is characterized by\nhigh internal density and low external density. For convenience, we sometimes abuse the notation\nrepresenting P also as the collection of its equivalence classes {X1, X2, . . . , XL} where L \u2208 N\nand each Xi is a community; \u03bdi denotes the volume of Xi and gi denotes the number of edges with\nexactly one end point in Xi. The following hypothesis lays down the fundamental assumption of the\ncommunity deception problem:\nHypothesis 1 (Community deception hypothesis). The disclosure of the community structure P of a\ngraph G = (V, E) leads to privacy leak and should be avoided.\n\nGiven G = (V, E), a community detector F is a procedure that reveals an equivalence relation F (G)\nover V to resemble the ground truth community structure P. Hypothesis 1 asserts the necessity of\ndistorting the network data G, so that no community detector F will truthfully report the original\ncommunity structure P. In this paper, we focus on network distortions as a result of adding a number\nof \u201cdummy edges\u201d between unconnected vertices in the network.\nDefinition 1. For G = (V, E) and a set E\u2032, an edge expansion is a graph G \u2295 E\u2032 := (V, E \u222a E\u2032).2\nGiven G = (V, E) and a community structure P on G, a community structure deceptor produces an\nedge expansion G\u2032 of G so that any community detection algorithm F is nullified on G\u2032. 
The precise\ndefinition relies on what it means for the algorithm F to be \u201cnullified on G\u2032\u201d. Several narratives\nexist for this phrase. Suppose P\u2032 is the community structure F (G\u2032) output by F . The first narrative\nasserts that P\u2032 is dissimilar with P. The second asserts that very little information is revealed about\nP from P\u2032. The third states that F is ineffective in answering same-community queries.\nNarrative 1: Partition similarity. One may apply a standard set-based metric, Jaccard index, to\ncompare P and P\u2032: Set J(P,P\u2032) := |P \u2229 P\u2032|/|P \u222a P\u2032| (treating P and P\u2032 as relations); we adopt\nJ(P,P\u2032) for its simplicity and correlation with other measures, e.g., transfer distance of P & P\u2032 [9].\nA good community structure deceptor should return P\u2032 with small J(P,P\u2032).\nNarrative 2: Mutual information. Normalized mutual information (NMI) measures the amount of\ncommon information between two random variables. Take community structures P = {X1, . . . , Xp}\nand P\u2032 = {X\u20321, . . . , X\u2032q}. Define\n\n$$H(\\mathcal{P}) = -\\sum_{i=1}^{p} \\frac{|X_i|}{|V|} \\log \\frac{|X_i|}{|V|}, \\quad \\text{and} \\quad H(\\mathcal{P} \\mid \\mathcal{P}') = -\\sum_{i=1}^{p} \\sum_{j=1}^{q} \\frac{|X_i \\cap X'_j|}{|V|} \\log \\frac{|X_i \\cap X'_j| / |V|}{|X'_j| / |V|}.$$\n\nMutual information is then defined as I(P,P\u2032) := H(P) \u2212 H(P|P\u2032). NMI is thus\n\n$$D(\\mathcal{P}, \\mathcal{P}') = \\frac{I(\\mathcal{P}, \\mathcal{P}')}{\\max\\{H(\\mathcal{P}), H(\\mathcal{P}')\\}}$$\n\n[14]. D satisfies both the normalization and the metric properties, and utilizes the\nrange [0, 1] well [27]. A community structure deceptor should return P\u2032 with small D(P,P\u2032).\nNarrative 3: Query accuracy. One may imagine that the community detection algorithm F\nfacilitates an adversary who aims to perform same-community queries about user accounts. 
This\nquery returns true for any distinct i, j \u2208 V if (i, j) \u2208 P\u2032 and false otherwise. The recall of this query\nis\n\n$$R(\\mathcal{P}, \\mathcal{P}') = \\frac{|\\{(i, j) \\in \\mathcal{P} \\mid i \\neq j,\\ (i, j) \\in \\mathcal{P}'\\}|}{|\\{(i, j) \\in \\mathcal{P} \\mid i \\neq j\\}|}.$$\n\nA procedure that returns true for any pair of vertices (i, j) with probability 1/2 has an expected recall\nof 50%. Hence F can be considered nullified when R(P,P\u2032) \u2264 50%.\nThe community structure deception (CSD) problem is defined as demanding a community structure\ndeceptor for a given network G with its community structure P. Furthermore, the deceptor should\nadd a bounded budget k \u2208 N of edges to G in the hope to get the best deception effect. One initial\nidea to solve CSD is to fix a community detector F and S \u2208 {J, D, R}, and to solve the problem\n\nminimize S(P, F (G \u2295 E\u2032)) subject to |E\u2032| \u2264 k.\n\nThere are several reasons why this would not be a good approach: (1) The functions J, D and R all\ndepend on the output of the algorithm F ; however the CSD problem demands the obfuscation of the\ncommunity structure P regardless of how communities are detected. (2) Choosing any one of J, D\nand R leads only to optimizing a single criterion. (3) Solving the optimization problem may require\nexamining all k-tuples of potential edges which leads to prohibitive time cost.\n\n2 If E\u2032 is {e}, we abuse the notation writing G \u2295 e for G \u2295 {e}.\n\nA more reasonable approach is to identify a uniform criterion which is independent of how communities\nare detected. One natural candidate for such a metric is modularity. Modularity of P measures\nthe difference between the density of its communities and the expected density of a null model [22]:\n\n$$M_{\\mathcal{P}}(G) = \\sum_{i=1}^{L} \\left[ \\frac{\\nu_i - g_i}{2|E|} - \\left( \\frac{\\nu_i}{2|E|} \\right)^2 \\right]. \\qquad (1)$$\n\nModularity maximization has been a widely-used principle for community detection. In general, a\nlarge max MP (G) implies the existence of a prominent community structure in G. To obfuscate the\ncommunity structure, it thus makes sense \u2013 at least in principle \u2013 to minimize the modularity MP (G).\nDefinition 2. A modularity minimizing (MOM) deceptor is an algorithm that outputs an edge e such\nthat the modularity MP (G \u2295 e) is minimized.\nIn actual fact, however, the MOM deceptor is not a good choice for CSD: Firstly, it is not hard to prove\nthat the MOM deceptor will always try to create edges between two communities Xi, Xj in P with\nthe largest combined volume \u03bdi + \u03bdj. Therefore k edges created by iterations of MOM will most\nlikely affect only two communities, and the obfuscated network will not hide P effectively. Secondly,\nmodularity\u2019s significance primarily lies in identifying the most prominent community structure, i.e.,\nthe one that maximizes modularity. The MOM deceptor, on the other hand, is concerned with the modularity\nof a given partition P which may not be modularity maximizing. Thirdly, modularity sometimes fails\nfor its purpose since a random graph \u2013 a structure that does not exhibit a clear community structure \u2013\nmay also have partitions with large modularity [12]. These limitations call for a new method for\nCSD.\n\n3 REM: Residual Entropy-based CSD\n\nTo derive a solution for CSD that is independent of the community detector, it makes sense to inquire\ninto the information content of a community structure P in G. Imagine G as a network where vertices\nare able to pass messages through edges. The delivery of a message from a sender u to a receiver v,\nwhere {u, v} \u2208 E, is named a call. Intuitively, a call is one directed flow of message. Therefore an\nundirected edge {u, v} allows messages to be passed in both directions. Now imagine, to explore G,\nan exogenous process continuously collects such calls uniformly at random. This differs from the\nrandom walk in [26] where the receiver of a call is the sender of the next call. Hence, at any moment,\nthe probability that v is a call\u2019s receiver is dv/(2|E|). We are interested in an encoding of vertices of\nthe network based on this probability distribution [17].\nDefinition 3. Structural entropy H(G) captures the average number of bits needed to encode\nthe receivers of the calls in a lossless way: H(G) equals Shannon\u2019s entropy of the distribution\n(di/2|E|)i\u2208V , i.e.,\n\n$$H(G) := -\\sum_{i=1}^{|V|} \\frac{d_i}{2|E|} \\log_2 \\frac{d_i}{2|E|}. \\qquad (2)$$\n\nH(G) merely expresses the average information of a call in G without assuming any community\nstructure. Now assume the presence of P = {X1, X2, . . . , XL}. The structural information of a\ncommunity Xj consists of two levels: (a) looking from a vertex level, the information of any vertex\ni \u2208 Xj as a receiver of messages, and (b) looking from a community level, the information of the\nentire community Xj as a receiver of messages. These two levels of meanings can be reflected in\n(2) through the equation\n\n$$-\\log_2 \\frac{d_i}{2|E|} = -\\log_2 \\frac{d_i}{\\nu_j} - \\log_2 \\frac{\\nu_j}{2|E|}.$$\n\nThe first term on the right-hand side is the average\nnumber of bits necessary to describe i in Xj and the second term is the average number of bits\nto describe the community Xj. We once again assume an exogenous process that continuously\ncollects calls between vertices in G uniformly at random, but with the following difference: Since we are given\nthe community structure P = {X1, X2, . . . , XL}, we can omit the community level codeword if\nthe sender u and receiver v belong to the same community. Thus, when encoding a vertex i \u2208 Xj,\njust like above, we encode i at the vertex level as $-\\log_2 \\frac{d_i}{\\nu_j}$ and then encode Xj at the community\nlevel as $-\\log_2 \\frac{\\nu_j}{2|E|}$. For (a), the information for all vertices in Xj as receivers is H(G \u21be Xj) with\nthe probability \u03bdj/2|E|, where\n\n$$H(G \\upharpoonright X_j) := -\\sum_{i \\in X_j} \\frac{d_i}{\\nu_j} \\log_2 \\frac{d_i}{\\nu_j}.$$\n\nFor (b), the information for Xj as\na receiver is $-\\log_2 \\frac{\\nu_j}{2|E|}$ with the probability gj/2|E| since in this case we only consider calls whose\nsenders are not in Xj. The expected information gives us the following structural entropy measure\n[17]:\n\nDefinition 4. The structural entropy of G relative to P is\n\n$$H_{\\mathcal{P}}(G) := \\sum_{j=1}^{L} \\left[ \\frac{\\nu_j}{2|E|} H(G \\upharpoonright X_j) - \\frac{g_j}{2|E|} \\log_2 \\frac{\\nu_j}{2|E|} \\right]. \\qquad (3)$$\n\nNote that H(G) = HP (G) when P is either the trivial partition that puts all vertices in the same\ncommunity, or the partition where each community is a singleton. We thus view both H(G) and\nHP (G) as expressing states of the community structure. H(G) expresses the entropy of G in a basic\n\u201creference partition\u201d, and HP reflects the effect of enforcing partition P on G. Their difference thus\nmeasures the gained amount of certainty as the communities in P take shape.\nDefinition 5. The normalized residual entropy of P is\n\n\u03c1P (G) := (H(G) \u2212 HP (G))/H(G). (4)\n\nIn principle, a smaller \u03c1P (G) means that P contains less information about G and thus is harder to\ndetect. To hide the communities in P, it thus makes sense to reduce the residual entropy through\nmodifying the network structure.\nDefinition 6. A residual entropy minimizing (REM) deceptor is an algorithm that outputs an edge e\nsuch that the normalized residual entropy \u03c1P (G \u2295 e) is minimized.\nA crude implementation of an REM deceptor examines each potential edge e that is missing from the\ncurrent graph and compares \u03c1P (G \u2295 e). This implementation runs in \u2126(|V|^2) time, rendering itself\ninapplicable for large graphs. 
We instead present an O(L|V|)-implementation where L is the number\nof communities X1, . . . , XL in P. This is a much more efficient implementation assuming L \u226a |V|.\nTake s, t \u2208 {1, . . . , L}. A non-edge is a pair {u, v} \u2209 E with (u, v) \u2208 Xs \u00d7 Xt; the volume of this\nnon-edge is du + dv. Assume Xs \u00d7 Xt contains a non-edge. Let \u03b4s,t be the smallest degree of any\nvertex v \u2208 Xs \u222a Xt that is in a non-edge. Let \u03b2s,t be the smallest volume of any non-edge {u, v} with\nmin{du, dv} = \u03b4s,t. A non-edge {u, v} is called critical if its volume d \u2264 \u03b2s,t and min{du, dv} is\nthe smallest among all non-edges with volume d. Algorithm 1 presents our REM deceptor; its correctness\nfollows from the lemmas below.\n\nAlgorithm 1: An efficient REM deceptor\nInput: Graph G = (V, E), P = {X1, X2, . . . , XL}\nOutput: A non-edge {u\u2217, v\u2217}\nInitialize \u03c1\u2217 \u2190 1;\nfor s \u2190 1 to L and t \u2190 s to L do\n    if Xs \u00d7 Xt contains no non-edge then\n        continue;\n    for all critical non-edges {u, v} in Xs \u00d7 Xt do\n        Set \u03c1u,v \u2190 (H(G \u2295 {u, v}) \u2212 HP (G \u2295 {u, v}))/H(G \u2295 {u, v});\n        if \u03c1u,v < \u03c1\u2217 then\n            Set \u03c1\u2217 \u2190 \u03c1u,v, u\u2217 \u2190 u, and v\u2217 \u2190 v;\nreturn {u\u2217, v\u2217};\n\nLemma 1. For non-edges {u, v},{x, y}, min{du, dv} \u2264 min{dx, dy} and du + dv \u2264 dx + dy\nimplies that H(G \u2295 {u, v}) \u2265 H(G \u2295 {x, y}).\nProof. Define the function F : R \u2192 R by\n\n$$F(x) = \\frac{(x+1)\\log_2(x+1) - x\\log_2(x)}{2(|E|+1)}.$$\n\nWe remark that the function F is monotonic, as\n\n$$F'(x) = \\frac{\\log_2(x+1) - \\log_2(x)}{2(|E|+1)} > 0,$$\n\nand is concave for x > 0, as\n\n$$F''(x) = -\\frac{1}{2\\ln 2 \\cdot (|E|+1)(x+1)x} < 0.$$\n\nMoreover, the following hold:\n\n\u2200u, v, w \u2208 V : H(G \u2295 {u, v}) \u2212 H(G \u2295 {u, w}) = F (dw) \u2212 F (dv),\n\nand thus\n\n\u2200u, v, x, y \u2208 V : H(G \u2295 {u, v}) \u2212 H(G \u2295 {x, y})\n= (H(G \u2295 {u, v}) \u2212 H(G \u2295 {v, y})) + (H(G \u2295 {v, y}) \u2212 H(G \u2295 {x, y}))\n= (F (dy) \u2212 F (dv)) + (F (dx) \u2212 F (du)). (5)\n\nAssume w.l.o.g. dv \u2264 du, dy \u2264 dx and dv \u2264 dy. If du \u2264 dx, then by (5) and monotonicity of F ,\nwe have H(G \u2295 {u, v}) \u2265 H(G \u2295 {x, y}). Now suppose dx < du. Then dv < dy \u2264 dx < du. By\nLagrange\u2019s mean value theorem, there exist dx < \u03be < du and dv < \u03b6 < dy such that\n\n$$\\frac{F(d_u) - F(d_x)}{d_u - d_x} = F'(\\xi) < F'(\\zeta) = \\frac{F(d_y) - F(d_v)}{d_y - d_v}, \\qquad (6)$$\n\nwhere the inequality holds because F is concave, so F\u2032 is decreasing. Since du \u2212 dx \u2264 dy \u2212 dv, we have F (du) \u2212 F (dx) <\nF (dy) \u2212 F (dv). The lemma then follows from (5).\nLemma 2. For any two communities Xi, Xj in P, any non-edges e1, e2 whose endpoints link Xi\nand Xj, we have H(G \u2295 e1) \u2212 HP (G \u2295 e1) = H(G \u2295 e2) \u2212 HP (G \u2295 e2).\nProof. Define distributions\n\n$$Y \\sim \\left( \\frac{\\nu_1}{2|E|}, \\ldots, \\frac{\\nu_L}{2|E|} \\right) \\quad \\text{and} \\quad Z \\sim \\left( \\frac{c_{1,1}}{\\nu_1}, \\ldots, \\frac{c_{1,L}}{\\nu_L}, \\ldots, \\frac{c_{n,1}}{\\nu_1}, \\ldots, \\frac{c_{n,L}}{\\nu_L} \\right),$$\n\nwhere ci,j = di if i \u2208 Xj, and ci,j = 0 otherwise. The entropy of the joint probability is\n\n$$H(Y, Z) = -\\sum_{i=1}^{n} \\frac{d_i}{2|E|} \\log_2 \\frac{d_i}{2|E|} = H(G). \\qquad (7)$$\n\nBy (7) and the chain rule (see e.g. [6]),\n\n$$H(Y, Z) = H(Z \\mid Y) + H(Y) = \\sum_{j=1}^{L} \\frac{\\nu_j}{2|E|} H(Z \\mid Y = j) + H(Y) = \\sum_{j=1}^{L} \\left[ \\frac{\\nu_j}{2|E|} H(G \\upharpoonright X_j) - \\frac{\\nu_j}{2|E|} \\log_2 \\frac{\\nu_j}{2|E|} \\right].$$\n\nThe following can then be obtained from (3):\n\n$$H(G) - H_{\\mathcal{P}}(G) = -\\sum_{j=1}^{L} \\frac{\\nu_j - g_j}{2|E|} \\log_2 \\frac{\\nu_j}{2|E|}. \\qquad (8)$$\n\nThe lemma follows from (8) as G \u2295 e1 and G \u2295 e2 have the same values of \u03bdj and gj.\nA non-edge e is RE-minimizing if \u03c1P (G \u2295 e) is the smallest among all non-edges. The next lemma\nstates that, to find an RE-minimizing non-edge, an REM deceptor only needs to consider critical\nnon-edges.\nLemma 3. There exists a critical non-edge {u, v} \u2208 Xi \u00d7 Xj for some i, j that is RE-minimizing.\nProof. Take an RE-minimizing non-edge e1 = {x, y} and say x \u2208 Xs, y \u2208 Xt. Suppose e1 is not\ncritical. There are two cases: Firstly, if dx + dy > \u03b2s,t, let e2 = {u, v} be the critical non-edge with\n(u, v) \u2208 Xs \u00d7 Xt and min{du, dv} = \u03b4s,t \u2264 min{dx, dy}. By Lem. 1, H(G \u2295 e2) \u2265 H(G \u2295 e1).\nSecondly, if dx + dy \u2264 \u03b2s,t, then min{du, dv} < min{dx, dy} for some critical non-edge e2 = {u, v}\nbetween Xs and Xt with the same volume dx + dy. In this case, we still have H(G \u2295 e2) \u2265 H(G \u2295 e1).\nIn either case, Lem. 2 asserts that H(G \u2295 e1) \u2212 HP (G \u2295 e1) = H(G \u2295 e2) \u2212 HP (G \u2295 e2). Thus\nby (4), \u03c1P (G \u2295 e2) \u2264 \u03c1P (G \u2295 e1) and e2 is critical.\nTheorem 2. Alg. 1 implements REM deceptor in O(L|V|).\nProof. Alg. 1 goes over all s, t \u2208 {1, . . . , L} and critical non-edges e = {u, v} in Xs \u00d7 Xt to\nfind a critical non-edge {u, v} that minimizes \u03c1P (G \u2295 e). 
By Lemma 3, {u, v} is RE-minimizing.\nFor communities Xs and Xt, suppose a data structure is used that assigns to each node x \u2208 Xs\na node x\u2032 \u2208 Xt such that no edge exists between x and x\u2032, and x\u2032 has minimum\ndegree among such nodes. To find the desired critical non-edge, the algorithm may scan over all such pairs (x, x\u2032), where\nx \u2208 Xs. This takes O(|Xs|). Similarly, the algorithm examines all pairs (y, y\u2032) where y \u2208 Xt\nand y\u2032 \u2208 Xs is defined analogously to x\u2032. This takes O(|Xt|). Hence, for Xs and Xt the algorithm\ntakes O(|Xs| + |Xt|). Thus, for any Xs, the algorithm will take O(L|Xs| + |X1| + \u00b7\u00b7\u00b7 + |XL|) =\nO(L|Xs| + |V|). The overall time is O(L|X1| + \u00b7\u00b7\u00b7 + L|XL| + L|V|) = O(L|V|). The\nimplementation of the required data structure would store for each node x \u2208 Xs, collections of nodes\nYd, Yd+1, . . . , Yg \u2282 Xt where Yd contains all nodes y \u2208 Xt such that {x, y} is a non-edge and y has\ndegree d, where d, g are the least and greatest integers for which Yd, Yg are non-empty. This makes sure that\nthe data structure can be built and updated in the required time complexity.\n\nIn real-world networks, the links between two communities are sparse and usually vertices with\nthe smallest degree in each community are not linked. In this case, any critical non-edge {u, v} in\nXs \u00d7 Xt satisfies du = min_{x\u2208Xs} dx and dv = min_{y\u2208Xt} dy. The algorithm takes only O(L^2) when\nthe vertices with the smallest degree in each community are given.\n\n4 Experiments\nDataset. We evaluate the performance of our algorithm over 9 real-world networks from\n[http://konect.uni-koblenz.de/]. 
The networks are chosen from a range of domains, including human\ncontacts: jazz (Jaz); animal network: dolphin (Dol); communication network: email (Eml), pretty\ngood privacy (PGP); infrastructure network: powergrid (Pow); computer network: CAIDA (CAI);\nand online networks: Facebook (Fbk), Brightkite (Bri), Livemocha (Liv). Due to the limitations\nof the efficiency of some community detectors F , we do not select a particularly large dataset. In\nfact, our REM deceptor can handle large datasets according to the complexity analysis in Theorem 2.\nTo validate the efficiency of REM, we list the running time of applying Alg. 1 for one edge. This\nis compared with a \u2018crude\u2019 implementation of REM deceptor which resembles Alg. 1 but, instead of\nexamining only the critical non-edges, goes over all non-edges in G to look for the RE-minimizing\none3. See details in Table 1.\n\nTable 1: Specifics of the datasets, the number of communities and running time. Columns btw to wal give the number of communities.\n\nDataset | |V| | |E| | btw | gre | inf | lou | spi | wal | \u2018crude\u2019 (ms) | REM (ms)\nDol | 62 | 159 | 4 | 4 | 5 | 5 | 5 | 4 | 13.7 | 0.305\nJaz | 198 | 2,741 | 12 | 4 | 7 | 4 | 4 | 11 | 74.2 | 0.296\nEml | 1,133 | 5,451 | 11 | 16 | 70 | 10 | 13 | 49 | 3960 | 2.40\nFbk | 2,888 | 2,981 | 8 | 11 | 11 | 8 | 8 | 6 | 26,300 | 5.67\nPow | 4,941 | 6,594 | 45 | 43 | 486 | 43 | 25 | 364 | 66,300 | 8.67\nPGP | 10,680 | 24,316 | - | 189 | 1066 | 96 | 25 | 1574 | 5 min | 29.8\nCAI | 26,475 | 53,381 | - | 44 | 1382 | 38 | - | 667 | 4 h | 28.6\nBri | 58,228 | 214,078 | - | 1682 | 4813 | 687 | - | 6892 | 5.5 h | 1010\nLiv | 104,103 | 2,193,083 | - | 189 | - | 14 | - | - | > 1 day | 179\n\nCommunity detectors. An adversary attacks by applying a community detector F . 
For this we use\nsix well-known algorithms [11]:\n\u2022 Edge-Betweenness (btw) is a hierarchical decomposition process\nwhere edges are removed in decreasing order of their edge betweenness scores and runs in O(|E|^2 |V|).\n\u2022 Greedy (gre) is a greedy modularity maximization strategy and runs in O(|V| log^2 |V|).\n\u2022 InfoMap (inf) detects communities that have the shortest description length for a random walk in O(|E|).\n\u2022 Louvain (lou) is a multi-level modularity optimization algorithm which runs in O(|V| log |V|).\n\u2022 SpinGlass (spi) finds communities by searching for the ground state of an infinite spin glass and runs\nin O(|V|^3).\n\u2022 WalkTrap (wal) detects communities using random walks and runs in O(|V|^2 log |V|).\nTable 1 shows the number of communities found by each algorithm. If the algorithm does not\nterminate in 2 hours on a dataset, a \u2018-\u2019 is written in the table.\n\n3 Trials are conducted on a Server Xeon (Skylake) Platinum 8163 CPU 2.5GHz (12 cores, non-parallel computing)\nand 16 GB RAM.\n\nCommunity structure deceptors. We compare REM with two other CS deceptors, including MOM\nand a benchmark RAN that adds randomly chosen non-edges. The experiments aim to: (1) check\nif the normalized residual entropy correlates with the indices J, D, R (in Sec. 2); (2) compare the\neffectiveness of the deceptors in hiding community structures; (3) discuss the preservation of data\u2019s\nkey indicators after applying REM. To this end, we ran the experiments in the worst-case scenario\nthat the initial community structure P is fully detected by community detector F . We then apply\nthe deceptors MOM, REM and RAN to obfuscate the network G and apply F on the obfuscated\nnetwork G \u2295 E\u2032. The indices {J, D, R} are calculated for P and the new structure P\u2032, where\nP\u2032 = F (G \u2295 E\u2032). 
Each value in the figures and tables is the average of 30 runs.\n\nFigure 1: The trend of J, D and R with the community normalized residual entropy in our datasets.\n\nTable 2: The J, D and R indices based on different deceptors for 6 kinds of community detectors.\n\nFigure 2: Compare the deception effect of REM, MOM and RAN for dataset Eml.\n\nResult set 1. Fig. 1 plots the values of scores J, D, R (for communities detected by Louvain) as\nthe normalized residual entropy increases. The three scores unanimously increase with \u03c1P, validating\nour intuition that \u03c1P can be used to obfuscate P. Moreover, the correlations are almost linear for\nFbk, Pow and PGP.\n\nResult set 2. We then examine the performance of the three deceptors over 9 data sets when adding\na budget of k edges. Due to varying graph sizes, we set k = 20 for Dol; k = 1000 for Jaz, Eml, Fbk,\nPow; k = 2000 for PGP; k = 10000 for CAI, Bri; and k = 20000 for Liv. Table 2 compares the\nJ, D, R scores for different algorithms. Clearly, REM performs better than MOM and RAN in almost\nall scenarios. For Louvain & SpinGlass, REM gives the unanimous best results across all datasets\nand scores. The recall R in most cases is less than or close to 0.5 for REM, which is not true\nfor the other two deceptors. On the other hand, with a small budget percentage k/|E| for larger\ngraphs, REM can achieve better community deception, which means that the advantage of REM\nbecomes more prominent for larger graphs. Fig. 2 shows the trend of J, D, R scores as the number of\nadded edges increases for 6 community detectors over the data set Eml. Among the three deceptors\n{MOM, REM, RAN}, MOM has the worst performance. The only case in which MOM performs better\nis for the detector gre, which is based on a greedy modularity maximization strategy. Overall,\nREM achieves the best anonymity in all the six detectors. 
In particular, under the three algorithms\n{inf, lou, spi}, the J, D, R scores consistently decrease. These results validate REM\u2019s effectiveness in\nhiding community structures.\n\nResult set 3. Finally, we check the preservation of the data after applying REM. First, by applying\nthe REM algorithm, we reduce the Jaccard index to below 0.5 for CSD. We then check the\nchanges in some key indicators, i.e., the clustering coefficient (CC), the mean shortest path length (MSPL),\nand the percentage of nodes with the top-10% PageRank and betweenness after applying REM. As\nshown in Table 3, in most cases these indicators show no significant change after applying REM. In general, a\nlarger network leads to less change. Two exceptions stand out. For Fbk, the 10%-PageRank and 10%-betweenness\nchange greatly, because the vertices in this data set tend to have very similar PageRank\nand betweenness scores. For Pow, which represents a large-scale power grid and therefore naturally has a large\nmean shortest path length, this value shifts greatly when more links are created between the\ncommunities.\n\nTable 3: The changes of some key indicators after applying REM.\n\nData  |E\u2032|   Jaccard    CC                MSPL            10%-PageRank  10%-Betweenness\nDol   10     1 \u2192 0.44   0.308 \u2192 0.298     3.357 \u2192 2.996   1 \u2192 0.833     1 \u2192 0.833\nJaz   250    1 \u2192 0.48   0.520 \u2192 0.498     2.230 \u2192 2.070   1 \u2192 0.895     1 \u2192 0.842\nEml   100    1 \u2192 0.39   0.166 \u2192 0.166     3.606 \u2192 3.577   1 \u2192 0.991     1 \u2192 0.982\nFbk   550    1 \u2192 0.48   0.0004 \u2192 0.0004   3.867 \u2192 3.539   1 \u2192 0.243     1 \u2192 0.118\nPow   200    1 \u2192 0.49   0.103 \u2192 0.101     18.99 \u2192 13.70   1 \u2192 0.953     1 \u2192 0.644\nPGP   400    1 \u2192 0.45   0.378 \u2192 0.377     7.485 \u2192 7.279   1 \u2192 0.979     1 \u2192 0.930\nCAI   1000   1 \u2192 0.49   0.007 \u2192 0.007     3.875 \u2192 3.869   1 \u2192 0.983     1 \u2192 0.977\nBri   1000   1 \u2192 0.44   0.111 \u2192 0.111     4.858 \u2192 4.854   1 \u2192 0.993     1 \u2192 0.971\n\n5 Conclusions and Future Work\n\nIn this paper, we introduce the community structure deception (CSD) problem, apply community-based\nstructural entropy to the CSD problem, and propose a residual entropy minimization (REM) algorithm.\nWe reduce the search space to critical edges to optimize REM, which allows our community structure\ndeceptor to run very efficiently. Experiments show that our algorithm REM performs better than\nRAN and MOM in almost all attack scenarios.\nSome potential directions of future work include (1) extending the method to hide communities\nin weighted and directed graphs; (2) investigating the problem of hiding overlapping community\nstructures; (3) hiding other structural properties, e.g., influential nodes and hierarchies; and (4)\nexploring the connection between structural entropy and community detection.\n\nAcknowledgments\n\nThis work is supported by the Provincial Key Research and Development Program of Zhejiang (Grant No.\n2019C03133) and the Major Scientific Research Project of Zhejiang Lab (Grant No. 2018FD0ZX01). The\nco-authors Angsheng Li and Jiamou Liu are supported by the National Natural Science Foundation\nof China (No. 61932002). We also thank our anonymous reviewers for their constructive comments.\n\nReferences\n\n[1] Kartik Anand and Ginestra Bianconi. Entropy measures for networks: Toward an information theory of complex topologies. Phys. Rev. E, 80(4):045102, 2009.\n\n[2] Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW, pages 181\u2013190, 2007.\n\n[3] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech., 2008(10):P10008, 2008.\n\n[4] Samuel L Braunstein, Sibasish Ghosh, and Simone Severini. 
The Laplacian of a graph as a density matrix: a basic combinatorial approach to separability of mixed states. Ann. Combinatorics, 10(3):291\u2013317, 2006.\n\n[5] Alina Campan and Traian Marius Truta. Data and structural k-anonymity in social networks. In PinKDD2009, pages 33\u201354. Springer, 2009.\n\n[6] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.\n\n[7] Sudipto Das, \u00d6mer E\u011fecio\u011flu, and Amr El Abbadi. Anonymizing weighted social network graphs. In ICDE2010, pages 904\u2013907. IEEE, 2010.\n\n[8] Matthias Dehmer. Information processing in complex networks: Graph entropy and information functionals. Appl. Math. Comput., 201(1-2):82\u201394, 2008.\n\n[9] Lucile Den\u0153ud and Alain Gu\u00e9noche. Comparison of distance indices between partitions. In Data Science and Classification, pages 21\u201328. Springer, 2006.\n\n[10] Valeria Fionda and Giuseppe Pirro. Community deception or: How to stop fearing community detection algorithms. IEEE Trans. Knowl. Data Eng., 30(4):660\u2013673, 2018.\n\n[11] Santo Fortunato. Community detection in graphs. Phys. Rep., 486(3-5):75\u2013174, 2010.\n\n[12] Roger Guimera, Marta Sales-Pardo, and Lu\u00eds A Nunes Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70(2):025101, 2004.\n\n[13] Michael Hay, Gerome Miklau, David Jensen, Don Towsley, and Philipp Weis. Resisting structural re-identification in anonymized social networks. VLDB2008, 1(1):102\u2013114, 2008.\n\n[14] Tarald O Kvalseth. Entropy and correlation: Some comments. IEEE Trans. Syst. Man Cybern., 17(3):517\u2013519, 1987.\n\n[15] Angsheng Li, Qifu Hu, Jun Liu, and Yicheng Pan. Resistance and security index of networks: Structural information perspective of network security. Scientific reports, 6:26810, 2016.\n\n[16] Angsheng Li, Jiankou Li, and Yicheng Pan. 
Discovering natural communities in networks. Physica A: Statistical Mechanics and its Applications, 436:878\u2013896, 2015.\n\n[17] Angsheng Li and Yicheng Pan. Structural information and dynamical complexity of networks. IEEE Trans. Inf. Theory, 62(6):3290\u20133339, 2016.\n\n[18] Kun Liu and Evimaria Terzi. Towards identity anonymization on graphs. In SIGMOD2008, pages 93\u2013106. ACM, 2008.\n\n[19] Lian Liu, Jie Wang, Jinze Liu, and Jun Zhang. Privacy preservation in social networks with sensitive edge weights. In SDM, pages 954\u2013965, 2009.\n\n[20] Shishir Nagaraja. The impact of unlinkability on adversarial community detection: effects and countermeasures. In PETS, pages 253\u2013272, 2010.\n\n[21] Arvind Narayanan and Vitaly Shmatikov. De-anonymizing social networks. In IEEE Security & Privacy\u201909, pages 173\u2013187. IEEE, 2009.\n\n[22] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, 2004.\n\n[23] Yulong Pei, Nilanjan Chakraborty, and Katia Sycara. Nonnegative matrix tri-factorization with graph regularization for community detection in social networks. In IJCAI2015, pages 2083\u20132089. AAAI Press, 2015.\n\n[24] Nicolas Rashevsky. Life, information theory, and topology. Bull. Math. Biophys., 17(3):229\u2013235, 1955.\n\n[25] Cazabet Remy, Baccour Rym, and Latapy Matthieu. Tracking bitcoin users activity using community detection on a network of weak signals. In International conference on complex networks and their applications, pages 166\u2013177. Springer, 2017.\n\n[26] Martin Rosvall and Carl T Bergstrom. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA, 105(4):1118\u20131123, 2008.\n\n[27] Nguyen Xuan Vinh, Julien Epps, and James Bailey. Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J. Mach. 
Learn. Res., 11(Oct):2837\u20132854, 2010.\n\n[28] Marcin Waniek, Tomasz P Michalak, Michael J Wooldridge, and Talal Rahwan. Hiding individuals and communities in a social network. Nature Human Behaviour, 2(2):139, 2018.\n\n[29] Gilbert Wondracek, Thorsten Holz, Engin Kirda, and Christopher Kruegel. A practical attack to de-anonymize social network users. In 2010 IEEE Symposium on Security and Privacy, pages 223\u2013238. IEEE, 2010.\n\n[30] Elena Zheleva and Lise Getoor. Preserving the privacy of sensitive relationships in graph data. In PinKDD2008, pages 153\u2013171. Springer, 2008.\n\n[31] Elena Zheleva, Evimaria Terzi, and Lise Getoor. Privacy in social networks. Synth. Lect. Data Mining Knowl. Discovery, 3(1):1\u201385, 2012.\n\n[32] Bin Zhou and Jian Pei. The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl. Inf. Syst., 28(1):47\u201377, 2011.\n\n[33] Lei Zou, Lei Chen, and M Tamer \u00d6zsu. K-automorphism: A general framework for privacy preserving network publication. PVLDB, 2(1):946\u2013957, 2009.", "award": [], "sourceid": 7091, "authors": [{"given_name": "Yiwei", "family_name": "Liu", "institution": "Beijing institute of technology"}, {"given_name": "Jiamou", "family_name": "Liu", "institution": "University of Auckland"}, {"given_name": "Zijian", "family_name": "Zhang", "institution": "Beijing Institute of Technology"}, {"given_name": "Liehuang", "family_name": "Zhu", "institution": "Beijing Institute of Technology"}, {"given_name": "Angsheng", "family_name": "Li", "institution": "Beihang University"}]}