{"title": "Group and Topic Discovery from Relations and Their Attributes", "book": "Advances in Neural Information Processing Systems", "page_first": 1449, "page_last": 1456, "abstract": "", "full_text": "Group and Topic Discovery\nfrom Relations and Their Attributes\n\nXuerui Wang, Natasha Mohanty, Andrew McCallum\n\nDepartment of Computer Science\nUniversity of Massachusetts\nAmherst, MA 01003\n\n{xuerui,nmohanty,mccallum}@cs.umass.edu\n\nAbstract\n\nWe present a probabilistic generative model of entity relationships and\ntheir attributes that simultaneously discovers groups among the entities\nand topics among the corresponding textual attributes. Block-models of\nrelationship data have been studied in social network analysis for some\ntime. Here we simultaneously cluster in several modalities at once, incorporating\nthe attributes (here, words) associated with certain relationships.\nSignificantly, joint inference allows the discovery of topics to be guided\nby the emerging groups, and vice-versa. We present experimental results\non two large data sets: sixteen years of bills put before the U.S. Senate,\ncomprising their corresponding text and voting records, and thirteen\nyears of similar data from the United Nations. We show that in comparison\nwith traditional, separate latent-variable models for words, or Blockstructures\nfor votes, the Group-Topic model\u2019s joint inference discovers\nmore cohesive groups and improved topics.\n\n1 Introduction\n\nThe field of social network analysis (SNA) has developed mathematical models that discover\npatterns in interactions among entities. One of the objectives of SNA is to detect\nsalient groups of entities. Group discovery has many applications, such as understanding\nthe social structure of organizations or native tribes, uncovering criminal organizations,\nand modeling large-scale social networks in Internet services such as Friendster.com or\nLinkedIn.com. 
Social scientists have conducted extensive research on group detection,\nespecially in fields such as anthropology and political science. Recently, statisticians and\ncomputer scientists have begun to develop models that specifically discover group memberships\n[5, 2, 7]. One such model is the stochastic Blockstructures model [7], which discovers\nthe latent groups or classes based on pair-wise relation data. A particular relation holds between\na pair of entities (people, countries, organizations, etc.) with some probability that\ndepends only on the class (group) assignments of the entities. This model is extended in\n[4] to support an arbitrary number of groups by using a Chinese Restaurant Process prior.\n\nThe aforementioned models discover latent groups by examining only whether one or more\nrelations exist between a pair of entities. The Group-Topic (GT) model presented in this paper,\non the other hand, considers both the relations between entities and also the attributes\nof the relations (e.g., the text associated with the relations) when assigning group memberships.\nThe GT model can be viewed as an extension of the stochastic Blockstructures\nmodel [7] with the key addition that group membership is conditioned on a latent variable,\nwhich in turn is also associated with the attributes of the relation. In our experiments, the\nattributes of relations are words, and the latent variable represents the topic responsible for\ngenerating those words. Our model captures the (language) attributes associated with interactions,\nand uses distinctions based on these attributes to better assign group memberships.\n\nConsider a legislative body and imagine its members forming coalitions (groups), and voting\naccordingly. 
However, different coalitions arise depending on the topic of the resolution\nup for a vote. In the GT model, the discovery of groups is guided by the emerging topics,\nand the forming of topics is shaped by emerging groups. Resolutions that would have been\nassigned the same topic in a model using words alone may be assigned to different topics\nif they exhibit distinct voting patterns. Topics may be merged if the entities vote very\nsimilarly on them. Likewise, multiple different divisions of entities into groups are made\npossible by conditioning them on the topics.\n\nThe importance of modeling the language associated with interactions between people has\nrecently been demonstrated in the Author-Recipient-Topic (ART) model [6]. It can measure\nrole similarity by comparing the topic distributions for two entities. However, the\nART model does not explicitly discover groups formed by entities. When forming latent\ngroups, the GT model simultaneously discovers salient topics relevant to relationships\nbetween entities, topics that models examining only words are unable to detect.\n\nWe demonstrate the capabilities of the GT model by applying it to two large sets of voting\ndata: one from the US Senate and the other from the General Assembly of the UN. The\nmodel clusters voting entities into coalitions and simultaneously discovers topics for word\nattributes describing the relations (bills or resolutions) between entities. We find that the\ngroups obtained from the GT model are significantly more cohesive (p-value < 0.01) than\nthose obtained from the Blockstructures model. The GT model also discovers new and\nmore salient topics that help better predict entities\u2019 behaviors.\n\n2 Group-Topic Model\n\nThe Group-Topic model is a directed graphical model that clusters entities with relations\nbetween them, as well as attributes of those relations. The relations may be either symmetric\nor asymmetric and have multiple attributes. 
In this paper, we focus on symmetric\nrelations and have words as the attributes on relations. The graphical model representation\nof the model and our notation are shown in Figure 1.\n\nWithout considering the topics of events, or by treating\nall events in a corpus as reflecting a single\ntopic, the simplified model becomes equivalent to\nthe stochastic Blockstructures model [7]. Here, each\nevent defines a relationship, e.g., whether in the event\ntwo entities\u2019 group(s) behave the same way or not.\nOn the other hand, in our model a relation may also\nhave multiple attributes. When we consider the complete\nmodel, the dataset is dynamically divided into\nT sub-blocks, each of which corresponds to a topic.\nThe generative process of the GT model is as follows:\n\n$$\\begin{aligned}\nt_b &\\sim \\mathrm{Uniform}(1/T) \\\\\nw_{it} \\mid \\phi_t &\\sim \\mathrm{Multinomial}(\\phi_t) \\\\\n\\phi_t \\mid \\eta &\\sim \\mathrm{Dirichlet}(\\eta) \\\\\ng_{it} \\mid \\theta_t &\\sim \\mathrm{Multinomial}(\\theta_t) \\\\\n\\theta_t \\mid \\alpha &\\sim \\mathrm{Dirichlet}(\\alpha) \\\\\nv_{ij}^{(b)} \\mid \\gamma_{g_i g_j}^{(b)} &\\sim \\mathrm{Binomial}(\\gamma_{g_i g_j}^{(b)}) \\\\\n\\gamma_{gh}^{(b)} \\mid \\beta &\\sim \\mathrm{Beta}(\\beta).\n\\end{aligned}$$\n\nSYMBOL        DESCRIPTION\ng_{st}        entity s\u2019s group assignment in topic t\nt_b           topic of an event b\nw_k^{(b)}     the kth token in the event b\nv_{ij}^{(b)}  entity i and j\u2019s group(s) behaved same (1) or differently (2) on the event b\nS             # of entities\nT             # of topics\nG             # of groups\nB             # of events\nV             # of unique words\nN_b           # of word tokens in the event b\nS_b           # of entities who participated in the event b\n\nFigure 1: The Group-Topic model and notations used in this paper\n\nWe want to perform joint inference on (text) attributes and relations to obtain topic-wise\ngroup memberships. We employ Gibbs sampling to conduct inference. Note that we adopt\nconjugate priors in our setting, and thus we can easily integrate out $\\theta$, $\\phi$ and $\\gamma$ to decrease\nthe uncertainty associated with them. 
In our case we need to compute the conditional distributions\n$P(g_{st} \\mid w, v, g_{-st}, t, \\alpha, \\beta, \\eta)$ and $P(t_b \\mid w, v, g, t_{-b}, \\alpha, \\beta, \\eta)$, where $g_{-st}$ denotes\nthe group assignments for all entities except entity $s$ in topic $t$, and $t_{-b}$ represents the topic\nassignments for all events except event $b$. Beginning with the joint probability of a dataset,\nand using the chain rule, we can obtain the conditional probabilities conveniently. In our\nsetting, the relationship we are investigating is always symmetric, so we do not distinguish\n$R_{ij}$ and $R_{ji}$ in our derivations (only $R_{ij}$ with $i \\le j$ remain). Thus\n\n$$P(g_{st} \\mid v, g_{-st}, w, t, \\alpha, \\beta, \\eta) \\propto \\frac{\\alpha_{g_{st}} + n_{t g_{st}} - 1}{\\sum_{g=1}^{G} (\\alpha_g + n_{tg}) - 1} \\prod_{b=1}^{B} \\left( I(t_b = t) \\prod_{h=1}^{G} \\frac{\\prod_{k=1}^{2} \\prod_{x=1}^{d^{(b)}_{g_{st} h k}} \\left( \\beta_k + m^{(b)}_{g_{st} h k} - x \\right)}{\\prod_{x=1}^{\\sum_{k=1}^{2} d^{(b)}_{g_{st} h k}} \\left( \\left( \\sum_{k=1}^{2} (\\beta_k + m^{(b)}_{g_{st} h k}) \\right) - x \\right)} \\right),$$\n\nwhere $n_{tg}$ represents how many entities are assigned into group $g$ in topic $t$, $c_{tv}$ represents\nhow many tokens of word $v$ are assigned to topic $t$, $m^{(b)}_{ghk}$ represents how many times\ngroup $g$ and $h$ vote same ($k = 1$) and differently ($k = 2$) on event $b$, $I(t_b = t)$ is an\nindicator function, and $d^{(b)}_{g_{st} h k}$ is the increase in $m^{(b)}_{g_{st} h k}$ if entity $s$ were assigned to group\n$g_{st}$, relative to not considering $s$ at all (if $I(t_b = t) = 0$, we ignore the increase in event $b$).\n\n$$P(t_b \\mid v, g, w, t_{-b}, \\alpha, \\beta, \\eta) \\propto \\frac{\\prod_{v=1}^{V} \\prod_{x=1}^{e_v^{(b)}} (\\eta_v + c_{t_b v} - x)}{\\prod_{x=1}^{\\sum_{v=1}^{V} e_v^{(b)}} \\left( \\left( \\sum_{v=1}^{V} (\\eta_v + c_{t_b v}) \\right) - x \\right)} \\prod_{g=1}^{G} \\prod_{h=g}^{G} \\frac{\\prod_{k=1}^{2} \\Gamma(\\beta_k + m^{(b)}_{ghk})}{\\Gamma\\left( \\sum_{k=1}^{2} (\\beta_k + m^{(b)}_{ghk}) \\right)},$$\n\nwhere $e_v^{(b)}$ is the number of tokens of word $v$ in event $b$.\n\nThe GT model uses information from two different modalities whose likelihoods are generally\nnot directly comparable, since the number of occurrences of 
each type may vary\ngreatly. Thus we raise the first term in the above formula to a power, as is common in\nspeech recognition when the acoustic and language models are combined.\n\n3 Related Work\n\nThere has been a surge of interest in models that describe relational data, or relations\nbetween entities viewed as links in a network, including recent work in group discovery\n[2, 5]. The GT model is an enhancement of the stochastic Blockstructures model [7] and\nthe extended model of Kemp et al. [4] as it takes advantage of information from different\nmodalities by conditioning group membership on topics. In this sense, the GT model draws\ninspiration from the Role-Author-Recipient-Topic (RART) model [6]. As an extension of\nthe ART model, RART clusters together entities with similar roles. In contrast, the GT model\npresented here clusters entities into groups based on their relations to other entities.\n\nDatasets  Avg. AI for GT  Avg. AI for Baseline  p-value\nSenate    0.8294          0.8198                < .01\nUN        0.8664          0.8548                < .01\n\nTable 1: Average AI for GT and Baseline for both Senate and UN datasets. The group\ncohesion in GT is significantly better than in baseline.\n\nThere has been a considerable amount of previous work in understanding voting patterns.\nExploring the notion that the behavior of an entity can be explained by its (hidden) group\nmembership, Jakulin and Buntine [3] develop a discrete PCA model for discovering groups,\nwhere each entity can belong to each of the k groups with a certain probability, and each\ngroup has its own specific pattern of behaviors. They apply this model to voting data in\nthe 108th US Senate where the behavior of an entity is its vote on a resolution. We apply\nour GT model also to voting data. 
However, unlike [3], since our goal is to cluster entities\nbased on the similarity of their voting patterns, we are only interested in whether a pair of\nentities voted the same or differently, not their actual yes/no votes. This \u201ccontent-ignorant\u201d\nfeature is similarly found in work on web log clustering [1].\n\n4 Experimental Results\n\nWe present experiments applying the GT model to the voting records of members of two\nlegislative bodies: the US Senate and the UN General Assembly. For comparison, we\npresent the results of a baseline method that first uses a mixture of unigrams to discover\ntopics and associate a topic with each resolution, and then runs the Blockstructures model\n[7] separately on the resolutions assigned to each topic. This baseline approach is similar\nto the GT model in that it discovers both groups and topics, and has different group assignments\non different topics. However, whereas the baseline model performs inference\nserially, GT performs joint inference simultaneously.\n\nWe are interested in the quality of both the groups and the topics. In the political science\nliterature, group cohesion is quantified by the Agreement Index (AI) [3], which, based on\nthe number of group members that vote Yes, No or Abstain, measures the similarity of\nvotes cast by members of a group during a particular roll call. Higher AI means better\ncohesion. The group cohesion using the GT model is found to be significantly greater than\nthe baseline group cohesion under a pairwise t-test, as shown in Table 1 for both datasets,\nwhich indicates that the GT model is better able to capture cohesive groups.\n\n4.1 The US Senate Dataset\n\nOur Senate dataset consists of the voting records of Senators in the 101st-109th US Senate\n(1989-2005) obtained from the Library of Congress THOMAS database. 
During a roll call\nfor a particular bill, a Senator may respond Yea or Nay to the question that has been put\nto vote; otherwise the vote is recorded as Not Voting. We do not consider Not Voting as a\nunique vote since most of the time it is a result of a Senator being absent from the session\nof the US Senate. The text associated with each resolution is composed of its index terms\nprovided in the database. There are 3423 resolutions in our experiments (we excluded\nroll calls that were not associated with resolutions). Since there are far fewer words than\npairs of votes, we raise the text likelihood to the 5th power (mentioned in Section 2) in the\nexperiments with this dataset so as to balance its influence during inference.\n\nEconomic    Education    Military Misc.  Energy\nfederal     education    government      energy\nlabor       school       military        power\ninsurance   aid          foreign         water\naid         children     tax             nuclear\ntax         drug         congress        gas\nbusiness    students     aid             petrol\nemployee    elementary   law             research\ncare        prevention   policy          pollution\n\nTable 2: Top words for topics generated with the mixture of unigrams model on the Senate\ndataset. The headers are our own summary of the topics.\n\nEconomic    Education + Domestic  Foreign       Social Security + Medicare\nlabor       education             foreign       social\ninsurance   school                trade         security\ntax         federal               chemicals     insurance\ncongress    energy                tariff        medical\nincome      research              congress      care\nminimum     government            drugs         medicare\nwage        aid                   communicable  disability\nbusiness    tax                   diseases      assistance\n\nTable 3: Top words for topics generated with the GT model on the Senate dataset. The\ntopics are influenced by both the words and votes on the bills.\n\nWe cluster the data into 4 topics and 4 groups (cluster sizes are chosen somewhat arbitrarily)\nand compare the results of GT with the baseline. 
The most likely words for each topic from\nthe traditional mixture of unigrams model are shown in Table 2, whereas the topics obtained\nusing GT are shown in Table 3. The GT model collapses the topics Education and Energy\ntogether into Education + Domestic, since the voting patterns on those topics are quite\nsimilar. The new topic Social Security + Medicare did not have strong enough word\ncoherence to appear in the baseline model, but it has a very distinct voting pattern, and thus\nis clearly found by the GT model. Thus, importantly, GT discovers topics that help predict\npeople\u2019s behavior and relations, not simply word co-occurrences.\n\nExamining the group distribution across topics in the GT model, we find that on the topic\nEconomic the Republicans form a single group whereas the Democrats split into 3 groups,\nindicating that Democrats have been somewhat divided on this topic. On the other hand,\nin Education + Domestic and Social Security + Medicare, Democrats are more unified\nwhereas the Republicans split into 3 groups. The group membership of Senators on Education\n+ Domestic issues is shown in Table 4. We see that the first group of Republicans\nincludes a Democratic Senator from Texas, a state that usually votes Republican. Group 2\n(majority Democrats) includes Sen. Chafee who has been involved in initiatives to improve\neducation, as well as Sen. Jeffords who left the Republican Party to become an Independent\nand has championed legislation to strengthen education and environmental protection.\n\nNearly all the Republican Senators in Group 4 (in Table 4) are advocates for education and\nmany of them have received awards for their efforts. For instance, Sen. 
Voinovich and Sen.\nSymms are strong supporters of early education and vocational education, respectively; and\nSen. Roth has voted for tax deductions for education. It is also interesting to see that Sen.\nMiller (D-GA) appears in a Republican group; although he is in favor of educational reforms,\nhe is a conservative Democrat and frequently criticizes his own party, even backing\nRepublican George W. Bush over Democrat John Kerry in the 2004 Presidential Election.\n\nGroup 1: 73 Republicans, Krueger (D-TX)\nGroup 2: 90 Democrats, Chafee (R-RI), Jeffords (I-VT)\nGroup 3: Cohen (R-ME), Danforth (R-MO), Durenberger (R-MN), Hatfield (R-OR), Heinz (R-PA),\n         Kassebaum (R-KS), Packwood (R-OR), Specter (R-PA), Snowe (R-ME), Collins (R-ME)\nGroup 4: Armstrong (R-CO), Garn (R-UT), Humphrey (R-NH), McCain (R-AZ), McClure (R-ID),\n         Roth (R-DE), Symms (R-ID), Wallop (R-WY), Brown (R-CO), DeWine (R-OH), Thompson (R-TN),\n         Fitzgerald (R-IL), Voinovich (R-OH), Miller (D-GA), Coleman (R-MN)\n\nTable 4: Senators in the four groups corresponding to Education + Domestic in Table 3.\n\nEverything Nuclear  Human Rights  Security in Middle East\nnuclear             rights        occupied\nweapons             human         israel\nuse                 palestine     syria\nimplementation      situation     security\ncountries           israel        calls\n\nTable 5: Top words for topics generated from the mixture of unigrams model with the UN\ndataset. Only text information is utilized to form the topics, as opposed to Table 6 where\nour GT model takes advantage of both text and voting information.\n\nMany of the Senators in Group 3 have also focused on education and other domestic issues\nsuch as energy; however, they often have a more liberal stance than those in Group 4, and\ncome from states that are historically less conservative. For example, Sen. Danforth has\npresented bills for a fairer distribution of energy resources. Sen. 
Kassebaum is known\nto be uncomfortable with many Republican views on domestic issues such as education,\nand has voted against voluntary prayer in school. Thus, both Groups 3 and 4 differ from\nthe Republican core (Group 1) on domestic issues, and also differ from each other.\n\nWe also inspect the Senators that switch groups the most across topics in the GT model. The\ntop 5 Senators are Shelby (D-AL), Heflin (D-AL), Voinovich (R-OH), Johnston (D-LA),\nand Armstrong (R-CO). Sen. Shelby (D-AL) votes with the Republicans on Economic,\nwith the Democrats on Education + Domestic and with a small group of maverick Republicans\non Foreign and Social Security + Medicare. Sen. Shelby and Sen.\nHeflin are Democrats from a fairly conservative state (Alabama) and are found to side with\nthe Republicans on many issues.\n\n4.2 The United Nations Dataset\n\nThe second dataset involves the voting record of the UN General Assembly (footnote 1). We focus\non the resolutions discussed from 1990-2003, which contain votes of 192 countries on 931\nresolutions. 
If a country is present during the roll call, it may choose to vote Yes, No or\nAbstain. Unlike the Senate dataset, a country\u2019s vote can have one of three possible values\ninstead of two. Because we parameterize agreement and not the votes themselves, this 3-value\nsetting does not require any change to our model. In experiments with this dataset,\nwe use a weighting factor 500 for text (adjusting the likelihood of text by a power of 500\nso as to make it comparable with the likelihood of pairs of votes for each resolution). We\ncluster this dataset into 3 topics and 5 groups (chosen somewhat arbitrarily).\n\n1 http://home.gwu.edu/~voeten/UNVoting.htm\n\nTopic: Nuclear Nonproliferation\nTop words: nuclear, states, united, weapons, nations\nGroup 1: Brazil, Columbia, Venezuela..., Chile, Peru\nGroup 2: USA, Japan, Germany, UK..., Russia...\nGroup 3: China, India, Mexico, Iran, Pakistan...\nGroup 4: Kazakhstan, Belarus, Yugoslavia, Azerbaijan, Cyprus...\nGroup 5: Thailand, Philippines, Malaysia, Nigeria, Tunisia...\n\nTopic: Nuclear Arms Race\nTop words: nuclear, arms, prevention, race, space\nGroup 1: UK, France, Spain, Monaco, East-Timor\nGroup 2: India, Russia, Micronesia\nGroup 3: Japan, Germany, Italy..., Poland, Hungary...\nGroup 4: China, Brazil, Mexico, Indonesia, Iran...\nGroup 5: USA, Israel, Palau\n\nTopic: Human Rights\nTop words: rights, human, palestine, occupied, israel\nGroup 1: Brazil, Mexico, Columbia, Chile, Peru...\nGroup 2: Nicaragua, Papua, Rwanda, Swaziland, Fiji...\nGroup 3: USA, Japan, Germany, UK..., Russia...\nGroup 4: China, India, Indonesia, Thailand, Philippines...\nGroup 5: Belarus, Turkmenistan, Azerbaijan, Uruguay, Kyrgyzstan...\n\nTable 6: Top words for topics generated from the GT model with the UN dataset as well as\nthe corresponding groups for each topic. The countries listed for each group are\nordered by their 2005 GDP (PPP).\n\nThe most probable words in each topic from the mixture of unigrams model are shown in\nTable 5. 
For example, Everything Nuclear constitutes all resolutions that have anything to\ndo with the use of nuclear technology, including nuclear weapons. Comparing these with\ntopics generated from the GT model shown in Table 6, we see that the GT model splits the\ndiscussion about nuclear technology into two separate topics, Nuclear Nonproliferation\n(generally about countries obtaining nuclear weapons and management of nuclear waste),\nand Nuclear Arms Race (focused on the historic arms race between Russia and the US, and\npreventing a nuclear arms race in outer space). These two issues had drastically different\nvoting patterns in the UN, as can be seen in the contrasting group structure for those topics\nin Table 6. Thus, again, the GT model is able to discover more salient topics, namely topics\nthat reflect the voting patterns and coalitions, not word co-occurrence alone. The\ncountries in Table 6 are ranked by their GDP in 2005 (footnote 2).\n\nAs seen in Table 6, groups formed in Nuclear Arms Race are unlike the groups formed\nin other topics. These groups map well to the global political situation of that time when,\ndespite the end of the Cold War, there was mutual distrust between Russia and the US with\nregard to the continued manufacture of nuclear weapons. For missions to outer space and\nnuclear arms, India was a staunch ally of Russia, while Israel was an ally of the US.\n\n5 Conclusions\n\nWe introduce the Group-Topic model that jointly discovers latent groups in a network as\nwell as clusters of attributes (or topics) of events that influence the interaction between\nentities. The model extends prior work on latent group discovery by capturing not only\npair-wise relations between entities but also multiple attributes of the relations (in particular,\nwords describing the relations). In this way the GT model obtains more cohesive groups\nas well as salient topics that influence the interaction between groups. 
This paper demonstrates\nthat the Group-Topic model is able to discover topics capturing the group-based\ninteractions between members of a legislative body. The model can be applied not just to\nvoting data, but to any data having relations with attributes. We are now using the model to\nanalyze the citations in academic papers, capturing the topics of research papers and discovering\nresearch groups. The model can be altered suitably to consider other categorical,\nmulti-dimensional, and continuous attributes characterizing relations.\n\nAcknowledgments\n\nThis work was supported in part by the CIIR, the Central Intelligence Agency, the National\nSecurity Agency, the National Science Foundation under NSF grant #IIS-0326249, and by\nthe Defense Advanced Research Projects Agency, through the Department of the Interior,\nNBC, Acquisition Services Division, under contract #NBCHD030010. We would also like\nto thank Prof. Vincent Moscardelli, Chris Pal and Aron Culotta for helpful discussions.\n\nReferences\n\n[1] Doug Beeferman and Adam Berger. Agglomerative clustering of a search engine query log. In\nThe 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2000.\n\n[2] Indrajit Bhattacharya and Lise Getoor. Deduplication and group detection using links. In The\n10th SIGKDD Conference Workshop on Link Analysis and Group Detection (LinkKDD), 2004.\n\n[3] Aleks Jakulin and Wray Buntine. Analyzing the US Senate in 2003: Similarities, networks,\nclusters and blocs, 2004. http://kt.ijs.si/aleks/Politics/us senate.pdf.\n\n[4] Charles Kemp, Thomas L. Griffiths, and Joshua Tenenbaum. Discovering latent classes in relational\ndata. Technical report, AI Memo 2004-019, MIT CSAIL, 2004.\n\n[5] Jeremy Kubica, Andrew Moore, Jeff Schneider, and Yiming Yang. Stochastic link and group\ndetection. In The 17th National Conference on Artificial Intelligence (AAAI), 2002.\n\n[6] Andrew McCallum, Andres Corrada-Emanuel, and Xuerui Wang. 
Topic and role discovery in\nsocial networks. In The 19th International Joint Conference on Artificial Intelligence, 2005.\n\n[7] Krzysztof Nowicki and Tom A.B. Snijders. Estimation and prediction for stochastic blockstructures.\nJournal of the American Statistical Association, 96(455):1077\u20131087, 2001.\n\n2 http://en.wikipedia.org/wiki/List of countries by GDP %28PPP%29. In Table 6, we omit some\ncountries (represented by ...) in order to show other interesting but relatively low-ranked countries\n(for example, Russia) in the GDP list.\n", "award": [], "sourceid": 2820, "authors": [{"given_name": "Xuerui", "family_name": "Wang", "institution": null}, {"given_name": "Natasha", "family_name": "Mohanty", "institution": null}, {"given_name": "Andrew", "family_name": "McCallum", "institution": null}]}