NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:7091
Title:REM: From Structural Entropy to Community Structure Deception

Reviewer 1

The paper introduces a new interesting problem: to add a fixed number of edges to obfuscate the community structure of a given graph as much as possible. This problem is different from existing problems that focus on the deception of a single community, and it is related to mitigate privacy risks caused by community detection. While the proposed algorithm simply chooses edges greedily, it is novel in two aspects. First, it is based on a new and understandable measure for the strength of community structure. The paper shows experimentally that minimizing the new measure better obfuscates community structure than minimizing modularity, which is a widely-used measure. Second, the algorithm is carefully designed so that it takes only O(#communities \times #nodes) time. The experimental evaluation of the proposed algorithm is extensive. 9 datasets and 6 community-detection methods are used to show that the proposed algorithm outperforms baseline methods consistently regardless of datasets and community detection methods. However, the two baselines methods used are very weak. One (RAN) is simple random edge addition, and the other one (MOM), which greedily minimizes modularity, surprisingly underperforms RAN. Why MOM underperforms RAN is unclear. Greedy algorithms for minimizing variants of modularity or other measures can be considered as baselines. The paper is well written and easy to follow.

Reviewer 2

This paper considers the problem that information about the community structure of a network, together with information on some nodes, can be used to infer information about other nodes. Thus creating a potential privacy concern. To remedy this one would like to be able to hide the true community structure in a way that does not change the network too much. In this paper, the authors give a formal formulation for this problem. They then define a class of deception algorithms, that would hide the community, and ways to measure how well such an algorithm performs. They then develop a community deception algorithm, based on entropy measures of edges. Their main result is that this algorithm runs is time that, given the number of communities, is linear in the number of nodes. Experiments are conducted to show the algorithm runs fast. In addition, the authors test how well it can deceive six community detection algorithms. Their algorithm is shown to outperform both the base line algorithm, which randomly adds edges between communities, and one which adds edges that minimize modularity. I think the problem this paper tries to address is quite relevant and also interesting from a theoretical perspective. The solution proposed in this paper, adding edges that reduce the information contained in the community structure, seems to me like a very nice approach. The experiments do definitely show that this deception algorithm performs well. Unfortunately, I had some problems with the main theoretical results regarding the running time. In particular, both the standard and the efficient algorithm have a statement that checks, for every pair of communities, if there are nodes in different communities that do not have an edge (a non-edge). Unless some additional information is encoded in the data structure, finding all non-edges has running time |V|^2. The authors do mention that looping over all critical non-edges can be done efficiently, using specifically designed data structures. Unfortunately, I did not understand why this would imply that it can be done in linear time, since there might be a quadratic number of edges. Either I am missing something obvious, there are some details that are not explained, or the result is not true. I believe it is the second option, but I can’t extract the needed information from the text. I would have been nice if I could have looked at the code, but there was no code available in the submission package, nor a link where the code could be downloaded. Even though it is stated in the Reproducibility Response that it should be. I also found several pieces of text that I was unable to understand. This was especially true for Section 3, where the authors explain their algorithm. What also didn’t help was that some objects and expressions where introduced and used, without a proper definition being given. These things made it difficult for me to really understand the innovation in the deception algorithm and the intuition behind it. To summarize, the paper tries to tackle a relevant problem using an interesting idea, whose performance is positively verified by experiments. However, there are several parts of the paper that are very unclear, most notably the text on the new algorithm and the proof of the main theorem. In addition, the Supplementary Material felt lacking in conclusions and proper explanation. Based on this I can only place this submission marginally above the acceptance threshold. ----------------------------------------------------- After rebuttal: The authors addressed my concerns regarding the computational complexity but I still believe the result as stated is somewhat misleading since it assumes a very specific data structure. I really think this point should be made crystal clear when revising the manuscript. Given that the other recommended adjustments will be included in the revised version I will keep my recommendation. ----------------------------------------------------- Below is a more detailed list of comments: Main text Line 46 […schemes of the…] I think this should be “schemes with the” Line 57 […connected between them…] This should be “connected among each other” Line 66 [...random walk…] This should be plural “random walks” Line 68 […numbers bits…] This should be “number of bits”. Also, I believe an article should be included at the end “the random walk”. Line 94 [Suppose \mathcal{P}^\prime be…] This should be “Suppose \mathcal{P}^\prime is” Line 102 [Define…as [25]]. This was the reader has to first go and look at [25] before he/she can continue reading your paper. I would strongly recommend to include the definitions here, to make the main text self-contained. Line 103 [information is then…] I would suggest writing “information is then defined as” Line 104 […utilizes the range [0,1] well] What does this mean? I never encountered a statement about a measure that utilizes a given range well. Please explain. Line 129 [In actual fact, however,…] I would simply write “However,” Lines 142-149 [The behavior…receiver as follow:] I was unable to understand what is explained here. One reason for this could be the structure of the sentences, which made it hard to grasp their meaning. Another reason is the sudden injection of terminology. For example, I did not understand what a codeword is in the last sentence and how it relates to the rest of the text. Please have a good look at the text, including sentence construction and spelling, and rewrite it so that it is more clear to the reader. Line 151 […lossless way.] Please explain what this means. Lines 153-167 [\mathcal{H}(G) merely…entropy measure] Similar to my comment above, this piece of text was very difficult to understand. An important role here is played by the “codeword” which was introduced in the paragraph above but never properly explained. Again, grammar and sentence construction are also part of the confusions. An example of this is the sentence “This wil not…network G.”. This is crucial part of the paper since it explains the main intuition behind your approach. So please rewrite this part so that it is clear. Equation (3): The term \mathcal{H}(G|X_j) is never defined. I think it is the conditional entropy of G given X_j, but it would help the reader if this is mentioned in the text. Lines 175-176 […that \mathcal{P} brings…] This should be “that \mathcal{P} contains”. Also, [..harder to be detected] should be “harder to detect” Line 184 [A non-edge is…] I would suggest writing “non-edge is a pair”. Lines 185-187 […the least degree…] I think that technically this is correct. However, since we are talking about a measured quantity here, it feels that “smallest” is more appropriate. Therefore, I would suggest to write “the smallest degree” of “the minimum degree”. There are two other instances of the word least being used in the next two lines and one more on line 206. Line 192 […F is monotonic…] Shouldn’t you state that is monotonic increasing? Line 199 [For any communities…] I would write “For any two communities” Line 201 What do you mean with Y \sim (v_1/2|E|, …, v_L/2|E|)? I think it means that Y is selected uniformly at random from this set of values. However, the way it is written here is not standard notation in probability theory. I would suggest writing “Let Y be sampled uniformly at random from (v_1/2|E|, …, v_L/2|E|)”. The same of course goes for Z. Lines 217-226: As mentioned in my review above, this proof was very unclear to me. One obvious question I had is why it is L and not L^2? We are going over all pairs among L communities. Moreover, it was not clear at all to me why the second loop lines 5-8 can be done L|V| time. Please expand this proof to include all these details. Also, it would be helpful if you supplied the code where this algorithm (or its efficient version) is implemented. Lines 228-229: What does X_s & X_t mean? Lines 239-240 [To validate…resembles Alg 1….] Can you explain how this edge is selected? Is it at random or according to some selection criteria? Also, what does “crude” mean here? Can you really talk about a crude implementation when you are only computing it for a single given edge? Line 245 […in the decreasing…] This should be “in decreasing” Table 2: Overall, when the REM algorithm is not the best, it still seems to perform on par with the best one. However, this is not the case for the Dol network, where RAN is the best (0.18) and REM is much larger (0.38). Can you explain why this happened? Line 270 […is not held true…] This should be “is not true” Supplementary Material Lines 6-10: This text is difficult to understand. Please rewrite the text, carefully considering the sentence construction and grammar. Section II: It is completely unclear to me why this section is included, and this is never explained. Not in the main text and not here. Are you using a different algorithm for MOM then what is done in the literature? If so, then it would help to reader if you mention this. This would also add more weight to the proofs. Please add more explanation to this section, explaining why it is included and why you need to derive results for the algorithm. Line 66 […chooses a non-edges…] This should be “chooses non-edges” Line 70 […begin and… adding some certain mount…] This should be “beginning and…adding a certain amount” Line 71 […with the increasing of the number of adding edges…] I think this should be “with increasing number of added edges” Line 72 […MOM with adding…] This should be “MOM after adding” Line 92 [We have known that…] I think this should either be “We know that” or “We have shown that” (if you refer to the part in the main text where this is done) Line 103 [line 2, We…] This should be “line 2, we” Lines 107-108 [For example…datasets] I feel that some conclusion is missing here. What do figures 4 and 5 tells us about the deception algorithms? Line 110 […over community with…] This should be “on communities with” Line 124 […A generating small-world…] This should be “Generating a small-world” Line 132 [Fig 6 shows…scores] Why do you not include R in these figures? Line 134 [To LFR…] This should be “For LFR” Line 135 […to small world…] This should be “for the small-world” Line 136 […the small world structure…] What do you mean with small-world structure? Does this involve the scaling of the average shortest path length or the diameter or something else? This is not well-established terminology. Fig 6. Why aren’t these results compared with the baseline RAN?

Reviewer 3

This paper proposed a method REM for community structure deception, aiming to avoid privacy leaking in community detection. This work is motivated from an important application issue. However, advantages of their proposed method are not clear. Rather than proofs on running time, this paper should discuss more on (1) why and how much REM improves on hiding sensitive network structure, (2) how this avoids privacy leaking (an example in main paper rather than supplementary would help), and (3) how this would affect us to extract useful, non-sensitive information from network. One question should be discussed: Can this method hide block membership for all kinds of nodes including those high degree nodes in cluster centers? Or only peripheral nodes on cluster boundaries ( which could be a great amount)?