{"title": "Balancing information exposure in social networks", "book": "Advances in Neural Information Processing Systems", "page_first": 4663, "page_last": 4671, "abstract": "Social media has brought a revolution on how people are consuming news. Beyond the undoubtedly large number of advantages brought by social-media platforms, a point of criticism has been the creation of echo chambers and filter bubbles, caused by social homophily and algorithmic personalization. In this paper we address the problem of balancing the information exposure} in a social network. We assume that two opposing campaigns (or viewpoints) are present in the network, and that network nodes have different preferences towards these campaigns. Our goal is to find two sets of nodes to employ in the respective campaigns, so that the overall information exposure for the two campaigns is balanced. We formally define the problem, characterize its hardness, develop approximation algorithms, and present experimental evaluation results. Our model is inspired by the literature on influence maximization, but we offer significant novelties. First, balance of information exposure is modeled by a symmetric difference function, which is neither monotone nor submodular, and thus, not amenable to existing approaches. 
Second, while previous papers consider a setting with selfish agents and provide bounds on best-response strategies (i.e., the move of the last player), we consider a setting with a centralized agent and provide bounds for a global objective function.", "full_text": "Balancing information exposure in social networks

Kiran Garimella
Aalto University & HIIT, Helsinki, Finland
kiran.garimella@aalto.fi

Aristides Gionis
Aalto University & HIIT, Helsinki, Finland
aristides.gionis@aalto.fi

Nikos Parotsidis
University of Rome Tor Vergata, Rome, Italy
nikos.parotsidis@uniroma2.it

Nikolaj Tatti
Aalto University & HIIT, Helsinki, Finland
nikolaj.tatti@aalto.fi

Abstract

Social media has brought a revolution in how people consume news. Beyond the undoubtedly large number of advantages brought by social-media platforms, a point of criticism has been the creation of echo chambers and filter bubbles, caused by social homophily and algorithmic personalization.

In this paper we address the problem of balancing the information exposure in a social network. We assume that two opposing campaigns (or viewpoints) are present in the network, and that network nodes have different preferences towards these campaigns. Our goal is to find two sets of nodes to employ in the respective campaigns, so that the overall information exposure for the two campaigns is balanced. We formally define the problem, characterize its hardness, develop approximation algorithms, and present experimental evaluation results.

Our model is inspired by the literature on influence maximization, but there are significant differences from the standard model. First, balance of information exposure is modeled by a symmetric difference function, which is neither monotone nor submodular, and thus, not amenable to existing approaches. 
Second, while previous papers consider a setting with selfish agents and provide bounds on best-response strategies (i.e., the move of the last player), we consider a setting with a centralized agent and provide bounds for a global objective function.

1 Introduction

Social-media platforms have revolutionized many aspects of human culture, among others, the way people are exposed to information. A recent survey estimates that 62% of adults in the US get their news on social media [15]. Despite providing many desirable features, such as searching, personalization, and recommendations, one point of criticism is that social media amplify echo chambers and filter bubbles: users get less exposure to conflicting viewpoints and are isolated in their own informational bubble. This phenomenon is attributed to social homophily and algorithmic personalization, and is more acute for controversial topics [9, 12, 14].

In this paper we address the problem of reducing the filter-bubble effect by balancing information exposure among users. We consider social-media discussions around a topic that are characterized by two or more conflicting viewpoints. Let us refer to these viewpoints as campaigns. Our approach follows the popular paradigm of influence propagation [18]: we want to select a small number of seed users for each campaign so as to maximize the number of users who are exposed to both campaigns. In contrast to existing work on competitive viral marketing, we do not consider the problem of finding an optimal selfish strategy for each campaign separately. 

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Instead we consider a centralized agent responsible for balancing information exposure for the two campaigns. Consider the following motivating examples.

Example 1: Social-media companies have been called to act as arbiters so as to prevent ideological isolation and polarization in the society. The motivation for companies to assume this role could be to improve their public image or to comply with legislation.1 Consider a controversial topic being discussed on social-media platform X, which has led to polarization and filter bubbles. As part of a new filter-bubble-bursting service, platform X would like to disseminate two high-quality and thought-provoking dueling op-ed articles, one for each side, which present the arguments of the other side in a fair manner. Assume that X is interested in following a viral-marketing approach. Which users should X target, for each of the two articles, so that people in the network are informed in the most balanced way?

Example 2: Government organization Y is initiating a program to help assimilate foreigners who have newly arrived in the country. Part of the initiative focuses on bringing the communities of foreigners and locals closer in social media. Organization Y is interested in identifying individuals who can help spread news of one community into the other.

From the technical standpoint, we consider the following problem setting: We assume that information is propagated in the network according to the independent-cascade model [18]. We assume that there are two opposing campaigns, and for each one there is a set of initial seed nodes, I1 and I2, which are not necessarily distinct. Furthermore, we assume that the users in the network are exposed to information about campaign i via diffusion from the set of seed nodes Ii. 
The diffusion in the network occurs according to some information-propagation model.

The objective is to recruit two additional sets of seed nodes, S1 and S2, for the two campaigns, with |S1| + |S2| ≤ k, for a given budget k, so as to maximize the expected number of balanced users, i.e., the users who are exposed to information from both campaigns (or from none).

We show that the problem of balancing the information exposure is NP-hard. We develop different approximation algorithms for the different settings we consider, as well as heuristic variants of the proposed algorithms. We experimentally evaluate our methods on several real-world datasets.

Although our approach is inspired by the large body of work on information propagation, and resembles previous problem formulations for competitive viral marketing, there are significant differences. In particular:
• This is the first paper to address the problem of balancing information exposure and breaking filter bubbles, using the information-propagation methodology.
• The objective function that best suits our problem setting is related to the size of the symmetric difference of users exposed to the two campaigns. This is in contrast to previous settings that consider functions related to the size of the coverage of the campaigns.
• As a technical consequence of the previous point, our objective function is neither monotone nor submodular, making our problem more challenging. Yet we are able to analyze the problem structure and provide algorithms with approximation guarantees.
• While most previous papers consider selfish agents, and provide bounds on best-response strategies (i.e., the move of the last player), we consider a centralized setting and provide bounds for a global objective function.

Omitted proofs, figures, and tables are provided as supplementary material. 
Moreover, our datasets and implementations are publicly available.2

2 Related Work

Detecting and breaking filter bubbles. Several studies have observed that users in online social networks prefer to associate with like-minded individuals and consume agreeable content. This phenomenon leads to filter bubbles, echo chambers [25], and to online polarization [1, 9, 12, 22].

1For instance, Germany is now fining Facebook for the spread of fake news.
2https://users.ics.aalto.fi/kiran/BalanceExposure/

Once these filter bubbles are detected, the next step is to try to overcome them. One way to achieve this is by making recommendations of opposing viewpoints to individuals. This idea has been explored, in different ways, by a number of studies in the literature [13, 19]. However, previous studies address the problem of breaking filter bubbles by means of content recommendation. To the best of our knowledge, this is the first paper that considers an information-diffusion approach.

Information diffusion. Following a large body of work, we model diffusion using the independent-cascade model [18]. In the basic model a single item propagates in the network. An extension is when multiple items propagate simultaneously. All works that study optimization problems in the case of multiple items assume that the items compete for being adopted by users. In other words, every user adopts at most one of the existing items and participates in at most one cascade.

Myers and Leskovec [23] argue that spreading processes may either cooperate or compete. Competing contagions decrease each other's probability of diffusion, while cooperating ones help each other in being adopted. They propose a model that quantifies how different spreading cascades interact with each other. Carnes et al. [7] propose two models for competitive diffusion. 
Subsequently, several other models have been proposed [4, 10, 11, 17, 21, 27, 28].

Most of the work on competitive information diffusion considers the problem of selecting the best k seeds for one campaign, for a given objective, in the presence of competing campaigns [3, 6]. Bharathi et al. [3] show that, if all campaigns but one have fixed sets of seeds, the problem of selecting the seeds for the last player is submodular, and thus they obtain an approximation algorithm for the strategy of the last player. Game-theoretic aspects of competitive cascades in social networks, including the investigation of conditions for the existence of a Nash equilibrium, have also been studied [2, 16, 26].

The work that is most related to ours, in the sense of considering a centralized authority, is the one by Borodin et al. [5]. They study the problem where multiple campaigns wish to maximize their influence by selecting a set of seeds with bounded cardinality. They propose a centralized mechanism to allocate sets of seeds (possibly overlapping) to the campaigns so as to maximize the social welfare, defined as the sum of the individuals' selfish objective functions. One can choose any objective function as long as it is submodular and non-decreasing. Under this assumption they provide strategyproof (truthful) algorithms that offer guarantees on the social welfare. Their framework applies to several competitive influence models. In our case, the number of balanced users is not submodular, and so we do not have any approximation guarantees. Nevertheless, we can use this framework as a heuristic baseline, which we do in the experimental section.

3 Problem Definition

Preliminaries: We start with a directed graph G = (V, E, p1, p2) representing a social network. We assume that there are two distinct campaigns that propagate through the network. 
Each edge e = (u, v) ∈ E is assigned two probabilities, p1(e) and p2(e), representing the probability that a post from vertex u will propagate (e.g., it will be reposted) to vertex v in the respective campaigns.

Cascade model: We assume that information on the two campaigns propagates in the network following the independent-cascade model [18]. For instance, consider the first campaign (the process for the second campaign is analogous): we assume that there exists a set of seeds I1 from which the process begins. Propagation proceeds in rounds. At each round, there exists a set of active vertices A1 (initially, A1 = I1), where each vertex u ∈ A1 attempts to activate each vertex v ∉ A1, such that (u, v) ∈ E, with probability p1(u, v). If the propagation attempt from a vertex u to a vertex v is successful, we say that v propagates the first campaign. At the end of each round, A1 is set to be the set of vertices that propagated the campaign during the current round.

Given a seed set S, we write r1(S) and r2(S) for the vertices that are reached from S using the aforementioned cascade process, for the respective campaign. Note that since this process is random, both r1(S) and r2(S) are random variables. Computing the expected number of active vertices is a #P-hard problem [8]; however, we can approximate it within an arbitrarily small factor ε, with high probability, via Monte-Carlo simulations. Due to this obstacle, all approximation algorithms that evaluate an objective function over diffusion processes reduce their approximation guarantee by an additive ε. Throughout this work we avoid repeating this fact for the sake of simplicity of the notation.

Heterogeneous vs. correlated propagations: We also need to specify how the propagations of the two campaigns interact with each other. We consider two settings: In the first setting, we assume that the campaign messages propagate independently of each other. 
Given an edge e = (u, v), the vertex v is activated on the first campaign with probability p1(e), given that vertex u is activated on the first campaign. Similarly, v is activated on the second campaign with probability p2(e), given that u is activated on the second campaign. We refer to this setting as heterogeneous.3 In the second setting we assume that p1(e) = p2(e), for each edge e. We further assume that the coin flips for the propagation of the two campaigns are totally correlated. Namely, consider an edge e = (u, v), where u is reached by either or both campaigns. Then, with probability p1(e), any campaign that has reached u will also reach v. We refer to this second setting as correlated.

Note that in both settings, a vertex may be activated by neither, either, or both campaigns. This is in contrast to most existing work in competitive viral marketing, where it is assumed that a vertex can be activated by at most one campaign. The intuition is that in our setting activation means merely passing a message or posting an article, and it does not imply full commitment to the campaign. We also note that the heterogeneous setting is more realistic than the correlated one; however, we also study the correlated model as it is mathematically simpler.

Problem definition: We are now ready to state our problem of balancing information exposure (BALANCE). Given a directed graph, initial seed sets for both campaigns, and a budget, we ask to find additional seeds that balance the vertices. More formally:

Problem 3.1 (BALANCE). Let G = (V, E, p1, p2) be a directed graph, and let I1 and I2 be two sets of initial seeds of the two campaigns. Assume that we are given a budget k. 
Find two sets S1 and S2, where |S1| + |S2| ≤ k, maximizing

Φ(S1, S2) = E[|V \ (r1(I1 ∪ S1) △ r2(I2 ∪ S2))|].

The objective function Φ(S1, S2) is the expected number of vertices that are either reached by both campaigns or remain oblivious to both campaigns. Problem 3.1 is defined for both settings, heterogeneous and correlated. When we need to make explicit the underlying setting, we refer to the respective problems as BALANCE-H and BALANCE-C. When referring to BALANCE-H, we denote the objective by ΦH. Similarly, when referring to BALANCE-C, we write ΦC. We drop the indices when we are referring to both models simultaneously.

Computational complexity: As expected, the optimization problem BALANCE turns out to be NP-hard for both settings, heterogeneous and correlated. A straightforward way to prove it is by setting I2 = V, so that the problem reduces to standard influence maximization. However, we provide a stronger result. Note that instead of maximizing the balanced vertices we can equivalently minimize the imbalanced vertices. However, this turns out to be a more difficult problem.

Proposition 1. Assume a graph G = (V, E, p1, p2) with two sets I1 and I2 and a budget k. It is an NP-hard problem to decide whether there are sets S1 and S2 such that |S1| + |S2| ≤ k and E[|r1(I1 ∪ S1) △ r2(I2 ∪ S2)|] = 0.

This result holds for both models, even when p1 = p2 = 1. It implies that the minimization version of the problem is NP-hard, and that there is no algorithm with a multiplicative approximation guarantee. It also implies that BALANCE-H and BALANCE-C are NP-hard. However, we will see later that we can obtain approximation guarantees for these maximization problems.

4 Greedy algorithms yielding approximation guarantees

In this section we propose three greedy algorithms. 
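The cascade process and the objective Φ described above can be estimated by Monte-Carlo simulation. Below is a minimal sketch for the heterogeneous setting; the edge-list representation and all function names are ours, for illustration, and are not part of the paper's implementation.

```python
import random

def simulate_ic(edges, seeds, p):
    """One run of the independent-cascade model.
    edges: dict mapping u to a list of out-neighbors v;
    p: dict mapping (u, v) to an activation probability.
    Returns the set of vertices reached from `seeds`."""
    reached = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges.get(u, []):
                if v not in reached and random.random() < p[(u, v)]:
                    reached.add(v)
                    nxt.append(v)
        frontier = nxt
    return reached

def estimate_phi(vertices, edges, p1, p2, I1, I2, S1, S2, runs=1000):
    """Monte-Carlo estimate of Phi(S1, S2): the expected number of
    vertices reached by both campaigns or by neither.  In the
    heterogeneous setting the two cascades use independent coin flips."""
    total = 0
    for _ in range(runs):
        r1 = simulate_ic(edges, set(I1) | set(S1), p1)
        r2 = simulate_ic(edges, set(I2) | set(S2), p2)
        # balanced vertices = complement of the symmetric difference
        total += len(set(vertices) - (r1 ^ r2))
    return total / runs
```

In the correlated setting one would instead draw a single coin flip per edge and reuse it for both cascades; the sketch above covers only the independent case.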
The first algorithm yields an approximation guarantee of (1 − 1/e)/2 for both models. The remaining two algorithms yield a guarantee for the correlated model only.

Decomposing the objective: Recall that the objective function of the BALANCE problem is Φ(S1, S2). In order to show that this function admits an approximation guarantee, we decompose it into two components. To do that, assume that we are given initial seeds I1 and I2, and let us write X = r1(I1) ∪ r2(I2) and Y = V \ X. Here X are the vertices reached by any initial seed in the two campaigns and Y are the vertices that are not reached at all. Note that X and Y are random variables. Since X and Y partition V, we can decompose the score Φ(S1, S2) as

Φ(S1, S2) = Ω(S1, S2) + Ψ(S1, S2), where
Ω(S1, S2) = E[|X \ (r1(I1 ∪ S1) △ r2(I2 ∪ S2))|],
Ψ(S1, S2) = E[|Y \ (r1(I1 ∪ S1) △ r2(I2 ∪ S2))|].

3Although independent is probably a better term than heterogeneous, we adopt the latter to avoid any confusion with the independent-cascade model.

We first show that Ω(S1, S2) is monotone and submodular. It is well-known that for maximizing a function that has these two properties under a size constraint, the greedy algorithm computes a (1 − 1/e)-approximate solution [24].

Lemma 2. Ω(S1, S2) is monotone and submodular.

We are ready to discuss our algorithms.

Algorithm 0: ignore Ψ. Our first algorithm is very simple: instead of maximizing Φ, we maximize Ω, i.e., we ignore any vertices that are made imbalanced during the process. Since Ω is submodular and monotone we can use the greedy algorithm. If we then compare the obtained result with the empty solution, we get the promised approximation guarantee. We refer to this algorithm as Cover.

Proposition 3. Let ⟨S1*, S2*⟩ be the optimal solution maximizing Φ. Let ⟨S1, S2⟩ be the solution obtained via the greedy algorithm maximizing Ω. Then

max{Φ(S1, S2), Φ(∅, ∅)} ≥ ((1 − 1/e)/2) Φ(S1*, S2*).

Algorithm 1: force common seeds. Ignoring the Ψ term may prove costly, as it is possible to introduce a lot of new imbalanced vertices. The idea behind the second algorithm is to force Ψ = 0. We do this by either adding the same seeds to both campaigns, or adding a seed that is covered by the opposing campaign. This algorithm has guarantees only in the correlated setting with an even budget k, but in practice we can use the algorithm also for the heterogeneous setting. We refer to this algorithm as Common; the pseudo-code is given in Algorithm 1.

Algorithm 1: Common, greedy algorithm that only adds common seeds
1 S1 ← S2 ← ∅;
2 while |S1| + |S2| ≤ k do
3     c ← arg max_c Φ(S1 ∪ {c}, S2 ∪ {c});
4     s1 ← arg max_{s ∈ I1} Φ(S1, S2 ∪ {s});
5     s2 ← arg max_{s ∈ I2} Φ(S1 ∪ {s}, S2);
6     add the best option among ⟨c, c⟩, ⟨∅, s1⟩, ⟨s2, ∅⟩ to ⟨S1, S2⟩ while respecting the budget.

We first show in the following lemma that adding common seeds may halve the score in the worst case. Then, we use this lemma to prove the approximation guarantee.

Lemma 4. Let ⟨S1, S2⟩ be a solution to BALANCE-C, with an even budget k. There exists a solution ⟨S1′, S2′⟩ with S1′ = S2′ such that ΦC(S1′, S2′) ≥ ΦC(S1, S2)/2.

It is easy to see that the greedy algorithm satisfies the conditions of the following proposition.

Proposition 5. Assume an iterative algorithm where at each iteration we add one or two vertices to our solution until our constraints are met. Let S1^i, S2^i be the sets after the i-th iteration, with S1^0 = S2^0 = ∅. Let ηi = ΦC(S1^i, S2^i) be the cost after the i-th iteration. Assume that ηi ≥ ηi−1. Assume further that for i = 1, . . . , k/2 it holds that ηi ≥ ΦC(S1^{i−1} ∪ {c}, S2^{i−1} ∪ {c}). Then the algorithm yields a (1 − 1/e)/2 approximation.

Algorithm 2: common seeds as baseline. Not allowing new imbalanced vertices may prove to be too restrictive. We can relax this condition by allowing new imbalanced vertices as long as the gain is at least as good as adding a common seed. We refer to this algorithm as Hedge; the pseudo-code is given in Algorithm 2. The approximation guarantee for this algorithm, in the correlated setting and with an even budget, follows immediately from Proposition 5, as it also satisfies the conditions.

Algorithm 2: Hedge, greedy algorithm where each step is as good as adding the best common seed
1 S1 ← S2 ← ∅;
2 while |S1| + |S2| ≤ k do
3     c ← arg max_c Φ(S1 ∪ {c}, S2 ∪ {c});
4     s1 ← arg max_s Φ(S1, S2 ∪ {s});
5     s2 ← arg max_s Φ(S1 ∪ {s}, S2);
6     add the best option among ⟨c, c⟩, ⟨∅, s1⟩, ⟨s2, ∅⟩, ⟨s2, s1⟩ to ⟨S1, S2⟩ while respecting the budget.

5 Experimental evaluation

In this section, we evaluate the effectiveness of our algorithms on real-world datasets. We focus on (i) analyzing the quality of the seeds picked by our algorithms in comparison to other heuristic approaches and baselines; (ii) analyzing the efficiency and the scalability of our algorithms; and (iii) providing anecdotal examples of the obtained results. 
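The greedy selection step of Hedge (Algorithm 2 above) can be sketched as follows. This is an illustrative sketch only: `phi` is assumed to be some estimator of the objective Φ (e.g., a Monte-Carlo average), and the function and variable names are ours, not the paper's.

```python
def hedge(candidates, k, phi):
    """Sketch of Hedge: at each step, pick the best of (i) adding a
    common seed to both campaigns, (ii) adding the best single seed to
    one campaign, or (iii) adding the best pair of single seeds."""
    S1, S2 = set(), set()
    while len(S1) + len(S2) < k:
        c = max(candidates, key=lambda v: phi(S1 | {v}, S2 | {v}))
        s1 = max(candidates, key=lambda v: phi(S1, S2 | {v}))
        s2 = max(candidates, key=lambda v: phi(S1 | {v}, S2))
        options = [
            (({c}, {c}), phi(S1 | {c}, S2 | {c})),       # common seed
            ((set(), {s1}), phi(S1, S2 | {s1})),          # seed for S2 only
            (({s2}, set()), phi(S1 | {s2}, S2)),          # seed for S1 only
            (({s2}, {s1}), phi(S1 | {s2}, S2 | {s1})),    # pair of seeds
        ]
        # keep only options whose number of new seeds fits the budget
        remaining = k - len(S1) - len(S2)
        feasible = [(opt, val) for opt, val in options
                    if len(opt[0] - S1) + len(opt[1] - S2) <= remaining]
        (a1, a2), _ = max(feasible, key=lambda t: t[1])
        if not (a1 - S1) and not (a2 - S2):
            break  # no option adds a new seed; stop early
        S1 |= a1
        S2 |= a2
    return S1, S2
```

Dropping the pair option ⟨s2, s1⟩ and restricting s1, s2 to the opposing initial seed sets would turn this into a sketch of Common (Algorithm 1).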
Although we set up our experiments in order to mimic social behavior, we note that fully realistic experiments would entail the ability to intervene in the network, select seeds, and observe the resulting cascades. This, however, is well beyond our capacity and the scope of the paper.

In all experiments we set k to range between 5 and 50 with a step of 5. We report averages over 1 000 random simulations of the cascade process.

Datasets: To evaluate the effectiveness of our algorithms, we run experiments on real-world data collected from twitter. Let G = (V, E) be the twitter follower graph. A directed edge (u, v) ∈ E indicates that user v follows u; note that the edge direction indicates the "information flow" from a user to their followers. We define a cascade GX = (X, EX) as a graph over the set of users X ⊆ V who have retweeted at least one hashtag related to a topic (e.g., US elections). An edge (u, v) ∈ EX ⊆ E indicates that v retweeted u.

We use datasets from six topics with opposing viewpoints, covering politics (US-elections, Brexit, ObamaCare), policy (Abortion, Fracking), and lifestyle (iPhone, focusing on iPhone vs. Samsung). All datasets are collected by filtering the twitter streaming API (1% random sample of all tweets) for a set of keywords used in previous work [20]. For each dataset, we identify two sides (indicating the two viewpoints) on the retweet graph, which has been shown to best capture the two opposing sides of a controversy [12]. Details on the statistics of the datasets can be found in the supplementary material.

After building the graphs, we need to estimate the diffusion probabilities for the heterogeneous and correlated models. Note that the estimation of the diffusion probabilities is orthogonal to our contribution in this paper. 
For the sake of concreteness we have used the approach described below. One could use a different, more advanced, method; our methods are still applicable.

Let q1(v) and q2(v) be the a priori probabilities of a user v retweeting sides 1 and 2, respectively. These are measured from the data by looking at how often a user retweets content from users and keywords that are discriminative of each side. For example, for US-elections, the discriminative users and keywords for side Hillary would be @hillaryclinton and #imwithher, and for Trump, @realdonaldtrump and #makeamericagreatagain. The probability that user v retweets user u (cascade probability) is then defined as

pi(u, v) = α qi(v) + (1 − α) (R(u, v) + 1)/(R(v) + 2),    i = 1, 2,

where R(u, v) is the number of times v has retweeted u, and R(v) is the total number of retweets of user v. The cascade probabilities pi capture the fact that users retweet content if they see it from their friends (the term (R(u, v) + 1)/(R(v) + 2)) or based on their own biases (the term qi(v)). The additive terms in the numerator and denominator provide additive smoothing by Laplace's rule of succession.

We set the value of α to 0.8 for the heterogeneous setting. 
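The probability estimate above is a direct computation from retweet counts; a minimal sketch (the function and argument names are ours):

```python
def cascade_probability(q_v, retweets_u_to_v, retweets_v, alpha=0.8):
    """p_i(u, v) = alpha * q_i(v) + (1 - alpha) * (R(u, v) + 1) / (R(v) + 2).
    q_v: a-priori probability that v retweets side i;
    retweets_u_to_v: R(u, v), times v retweeted u;
    retweets_v: R(v), total retweets by v.
    The +1 and +2 terms are Laplace smoothing (rule of succession)."""
    return alpha * q_v + (1 - alpha) * (retweets_u_to_v + 1) / (retweets_v + 2)
```

With alpha = 0 the estimate depends only on the smoothed retweet ratio, which is why that choice yields equal edge probabilities for the two campaigns in the correlated setting.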
For α = 0 the edge probabilities become equal for the two campaigns, which is our assumption for the correlated setting.

[Figure: six panels of symmetric difference vs. budget k for the iPhone, ObamaCare, and US-elections datasets, comparing Cover, Hedge, Common, and Greedy.]

Figure 1: Expected symmetric difference n − ΦC as a function of the budget k. Top row: heterogeneous model; bottom row: correlated model. Low values are better.

Baselines. We use five different baselines. The first baseline, BBLO, is an adaptation of the framework by Borodin et al. [5]. This framework requires an objective function as input, and here we use our objective function Φ. The framework works as follows: The two campaigns are given a budget k/2 on the number of seeds that they can select. At each round, we select a vertex v for S1, optimizing Φ(S1 ∪ {v}, S2), and a vertex w for S2, optimizing Φ(S1, S2 ∪ {w}). We should stress that the theoretical guarantees by [5] do not apply because our objective is not submodular.

The next two heuristics add a set of common seeds to both campaigns. We run a greedy algorithm for campaign i = 1, 2 to select a set Si′ of ℓ ≫ k vertices that optimizes the function ri(Si′ ∪ Ii). We consider two heuristics: Union selects S1 and S2 to be equal to the k/2 first distinct vertices in S1′ ∪ S2′, while Intersection selects S1 and S2 to be equal to the k/2 first vertices in S1′ ∩ S2′. Here the vertices are ordered based on their discovery time.

Finally, HighDegree selects the vertices with the largest number of followers and assigns them alternately to the two cascades; and Random assigns k/2 random seeds to each campaign.

In addition to the baselines, we also consider a simple greedy algorithm Greedy. The difference between Cover and Greedy is that, in each iteration, Cover adds the seed that maximizes Ω, while Greedy adds the seed that maximizes Φ. We can only show an approximation guarantee for Cover, but Greedy is a more intuitive approach, and we use it as a heuristic.

Comparison of the algorithms. We start by evaluating the quality of the sets of seeds computed by our algorithms, i.e., the number of equally-informed vertices.

Heterogeneous setting. We consider first the case of heterogeneous networks. The results for the selected datasets are shown in Figure 1. Full results are shown in the supplementary material. Instead of plotting Φ, we plot the number of the remaining unbalanced vertices, n − Φ, as it makes the results easier to distinguish; i.e., an optimal solution achieves the value 0.

The first observation is that the approximation algorithm Cover performs, in general, worse than the other two heuristics. This is due to the fact that Cover does not optimize directly the objective function. Hedge performs better than Greedy, in general, since it examines additional choices to select. The only deviation from this picture is for the US-elections dataset, where Greedy outperforms Hedge by a small factor. This may be due to the fact that while Hedge has more options, it allocates seeds in batches of two.

Correlated setting. Next we consider correlated networks. 
We experiment with the three approximation algorithms Cover, Common, and Hedge, and the heuristic Greedy. The results are shown in Figure 1. Cover performs again the worst, since it is the only method that introduces new unbalanced vertices without caring about their cardinality. Its variant, Greedy, performs much better in practice even though it does not provide an approximation guarantee. The algorithms Common, Greedy, and Hedge perform very similarly to each other, without a clear winner.

[Figure: two panels (heterogeneous and correlated, values ×10³) of symmetric difference on Abortion, Brexit, Fracking, iPhone, ObamaCare, and US for HighDegree, Random, Intersection, Hedge, BBLO, and Union.]

Figure 2: Expected symm. diff. n − Φ of Hedge and the baselines. k = 20. Low values are better.

Comparison with baselines. Our next step is to compare against the baselines. For simplicity, we focus on k = 20; the overall conclusions hold for other budgets. The results for Hedge versus the five baselines are shown in Figure 2.

From the results we see that BBLO is the best competitor: its scores are the closest to Hedge, and it receives slightly better scores in 3 out of 12 cases. The competitiveness is not surprising because we specifically set the objective function in BBLO to be Φ(S1, S2). The Intersection and Union baselines also perform well but are always worse than Hedge. Random is unpredictable but always worse than Hedge. In the case of heterogeneous networks, Hedge selects seeds that leave fewer unbalanced vertices, by a factor of two on average, compared to the seeds selected by the HighDegree method. For correlated networks, our method outperforms the two baselines by an order of magnitude. 
The actual values of this experiment can be found in the supplementary material.
Running time. We proceed to evaluate the efficiency and the scalability of our algorithms. We observe that all algorithms have comparable running times and good scalability. More information can be found in the supplementary material.
Use case with Fracking. We present a qualitative case-study analysis of the seeds selected by our algorithm. We highlight the Fracking dataset, although we applied a similar analysis to the other datasets as well (the results are given in the supplementary material). Recall that for each dataset we identify two sides with opposing views, and a set of initial seeds for each side (I1 and I2). We consider the users in the initial seeds I1 (the side supporting fracking), and summarize the text of all their Twitter profile descriptions in a word cloud. The result contains words that are used to emphasize the benefits of fracking (energy, oil, gas, etc.). We then draw a similar word cloud for the users identified by the Hedge algorithm as seed nodes in the sets S1 and S2 (k = 50). The result contains a more balanced set of words, including many words used to underline the environmental dangers of fracking. We use word clouds as a qualitative case study to complement our quantitative results and to provide more intuition about our problem statement, rather than as an alternative quantitative measure.

6 Conclusion

We presented the first study of the problem of balancing information exposure in social networks using techniques from the area of information diffusion. Our approach has several novel aspects. In particular, we formulate our problem by seeking to optimize a symmetric difference function, which is neither monotone nor submodular, and thus, not amenable to existing approaches.
Additionally, while previous studies consider a setting with selfish agents and provide bounds on best-response strategies (i.e., the move of the last player), we consider a centralized setting and provide bounds for a global objective function.
Our work opens several directions for future research. One interesting problem is to improve the approximation guarantee for the problem we define. Second, we would like to extend the problem definition to more than two campaigns and design approximation algorithms for that case. Finally, we believe that it is worth studying the BALANCE problem under complex diffusion models that capture more realistic social behavior in the presence of multiple campaigns. One such extension is to consider propagation probabilities on the edges that depend on the past behavior of the nodes with respect to the two campaigns; e.g., one could consider Hawkes processes [28].
Acknowledgments. This work has been supported by the Academy of Finland projects "Nestor" (286211) and "Agra" (313927), and the EC H2020 RIA project "SoBigData" (654024).

References
[1] L. A. Adamic and N. Glance. The political blogosphere and the 2004 US election: divided they blog. In LinkKDD, pages 36–43, 2005.
[2] N. Alon, M. Feldman, A. D. Procaccia, and M. Tennenholtz. A note on competitive diffusion through social networks. IPL, 110(6):221–225, 2010.
[3] S. Bharathi, D. Kempe, and M. Salek. Competitive influence maximization in social networks. In WINE, 2007.
[4] A. Borodin, Y. Filmus, and J. Oren. Threshold models for competitive influence in social networks. In WINE, 2010.
[5] A. Borodin, M. Braverman, B. Lucier, and J. Oren. Strategyproof mechanisms for competitive influence in networks. In WWW, pages 141–150, 2013.
[6] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the spread of misinformation in social networks.
In WWW, pages 665–674, 2011.
[7] T. Carnes, C. Nagarajan, S. M. Wild, and A. Van Zuylen. Maximizing influence in a competitive social network: a follower's perspective. In EC, 2007.
[8] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In KDD, pages 1029–1038, 2010.
[9] M. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini. Political polarization on Twitter. In ICWSM, 2011.
[10] P. Dubey, R. Garg, and B. De Meyer. Competing for customers in a social network: the quasi-linear case. In WINE, 2006.
[11] M. Farajtabar, X. Ye, S. Harati, L. Song, and H. Zha. Multistage campaigning in social networks. In NIPS, pages 4718–4726, 2016.
[12] K. Garimella, G. De Francisci Morales, A. Gionis, and M. Mathioudakis. Quantifying controversy in social media. In WSDM, pages 33–42, 2016.
[13] K. Garimella, G. De Francisci Morales, A. Gionis, and M. Mathioudakis. Reducing controversy by connecting opposing views. In WSDM, 2017.
[14] R. K. Garrett. Echo chambers online?: politically motivated selective exposure among internet news users. JCMC, 14(2):265–285, 2009.
[15] J. Gottfried and E. Shearer. News use across social media platforms 2016. Pew Research Center, 2016.
[16] S. Goyal, H. Heidari, and M. Kearns. Competitive contagion in networks. Games and Economic Behavior, 2014.
[17] R. Jie, J. Qiao, G. Xu, and Y. Meng. A study on the interaction between two rumors in homogeneous complex networks under symmetric conditions. Physica A, 454:129–142, 2016.
[18] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146, 2003.
[19] Q. V. Liao and W.-T. Fu. Expert voices in echo chambers: effects of source expertise indicators on exposure to diverse opinions.
In CHI, pages 2745–2754, 2014.
[20] H. Lu, J. Caverlee, and W. Niu. BiasWatch: a lightweight system for discovering and tracking topic-sensitive opinion bias in social media. In CIKM, pages 213–222, 2015.
[21] W. Lu, W. Chen, and L. V. Lakshmanan. From competition to complementarity: comparative influence diffusion and maximization. PVLDB, 9(2):60–71, 2015.
[22] A. Morales, J. Borondo, J. Losada, and R. Benito. Measuring political polarization: Twitter shows the two sides of Venezuela. Chaos, 25(3), 2015.
[23] S. A. Myers and J. Leskovec. Clash of the contagions: cooperation and competition in information diffusion. In ICDM, pages 539–548, 2012.
[24] G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions – I. Mathematical Programming, 14(1):265–294, 1978.
[25] E. Pariser. The Filter Bubble: What the Internet Is Hiding from You. Penguin UK, 2011.
[26] V. Tzoumas, C. Amanatidis, and E. Markakis. A game-theoretic analysis of a competitive diffusion process over social networks. In WINE, 2012.
[27] I. Valera and M. Gomez-Rodriguez. Modeling adoption of competing products and conventions in social media. In ICDM, 2015.
[28] A. Zarezade, A. Khodadadi, M. Farajtabar, H. R. Rabiee, and H. Zha. Correlated cascades: compete or cooperate. In AAAI, pages 238–244, 2017.