{"title": "Joint Analysis of Time-Evolving Binary Matrices and Associated Documents", "book": "Advances in Neural Information Processing Systems", "page_first": 2370, "page_last": 2378, "abstract": "We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives.", "full_text": "Joint Analysis of Time-Evolving Binary Matrices\n\nand Associated Documents\n\n1Eric Wang, 1Dehong Liu, 1Jorge Silva, 2David Dunson and 1Lawrence Carin\n\n1Electrical and Computer Engineering Department, Duke University\n\n{eric.wang,dehong.liu,jg.silva,lawrence.carin}@duke.edu\n\n2Statistics Department, Duke University\n\ndunson@stat.duke.edu\n\nAbstract\n\nWe consider problems for which one has incomplete binary matrices that evolve\nwith time (e.g., the votes of legislators on particular legislation, with each year\ncharacterized by a different such matrix). 
An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives.\n1 Introduction\nThere has been significant recent research on the analysis of incomplete matrices [10, 15, 1, 12, 13, 18]. Most analyses have been performed under the assumption that the matrix is real. There are interesting problems for which the matrices may be binary; for example, reflecting the presence/absence of links on nodes of a graph, or for analysis of data associated with a series of binary questions. One may connect an underlying real matrix to binary (or, more generally, integer) observations via a probit or logistic link function; for example, such analysis has been performed in the context of analyzing legislative roll-call data [6]. A problem that has received less attention concerns the analysis of time-evolving matrices.
The specific motivation of this paper involves binary questions in a legislative setting; we are interested in analyzing such data over many legislative sessions, and since the legislators change over time, it is undesirable to treat the entire set of votes as a single matrix. Each piece of legislation (question) is unique, but it is desirable to infer inter-relationships and commonalities over time. Similar latent groupings and relationships exist for the legislators. This general setting is also of interest for analysis of more-general social networks [8].\nA distinct line of research has focused on analysis of documents, with topic modeling constituting a popular framework [4, 2, 17, 3, 11]. Although the analysis of matrices and documents has heretofore been performed independently, there are many problems for which documents and matrices may be coupled. For example, in addition to a matrix of links between websites or email sender/recipient data, one also has access to the associated documents (website and email content). By analyzing the matrices and documents simultaneously, one may infer inter-relationships about each. For example, in a factor-based model of matrices [8], the associated documents may be used to relate matrix factors to topics/words, providing insight from the documents about the matrix, and vice versa.\nTo the authors\u2019 knowledge, this paper represents the first joint analysis of time-evolving matrices and associated documents. The analysis is performed using nonparametric Bayesian tools; for example, the truncated Dirichlet process [7] is used to jointly cluster latent topics and matrix features. The framework is demonstrated through analysis of large-scale data sets. Specifically, we consider binary vote matrices from the United States Senate and House of Representatives, from the First Congress in 1789 to the present.
Documents of the legislation are available for the most recent 20 years, and those are also analyzed jointly with the matrix data. The quantitative predictive performance of this framework is demonstrated, as is the power of this setting for making qualitative assessments of large-scale and complex joint matrix-document data.\n2 Modeling Framework\n2.1 Time-evolving binary matrices\nAssume we are given a set of binary matrices, {B_t}_{t=1,...,\u03c4}, with B_t \u2208 {0, 1}^{N_y^(t) \u00d7 N_x^(t)}. The number of rows and columns, respectively N_y^(t) and N_x^(t), may vary with time. For example, for the legislative roll-call data considered below, time index t corresponds to year, and the number of pieces of legislation and legislators changes with time (e.g., for the historical data considered for the United States Congress, the number of states and hence legislators has changed as the country has grown).\nUsing a modeling framework analogous to that in [6], the binary matrix has a probit-model generative process, with B_t(i, j) = 1 if X_t(i, j) > 0, and B_t(i, j) = 0 otherwise, and the latent real matrix is defined as\nX_t(i, j) = <y_i^(t), x_j^(t)> + \u03b2_i^(t) + \u03b1_j^(t) + \u03b5_{i,j}^(t)   (1)\nwhere <\u00b7,\u00b7> denotes a vector inner product, and \u03b5_{i,j}^(t) \u223c N(0, 1). The random effects are drawn \u03b2_i^(t) \u223c N(0, \u03bb_\u03b2^{-1}) and \u03b1_j^(t) \u223c N(0, \u03bb_\u03b1^{-1}), with \u03bb_\u03b1 \u223c \u00b5_\u03b1\u03b4_\u221e + (1 \u2212 \u00b5_\u03b1)Gamma(a, b) and \u03bb_\u03b2 \u223c \u00b5_\u03b2\u03b4_\u221e + (1 \u2212 \u00b5_\u03b2)Gamma(a, b); \u03b4_\u221e is a point measure at infinity, corresponding to there not being an associated random effect. The probability of whether there is a random effect is controlled by \u00b5_\u03b2 and \u00b5_\u03b1, each of which is drawn from a beta distribution.\nRandom effect \u03b1_j is motivated by our example application, for which the index j denotes a specific piece of legislation that is voted upon; this parameter reflects the \u201cdifficulty\u201d of the vote: if |\u03b1_j^(t)| is large, then all people are likely to vote one way or the other (an \u201ceasy\u201d vote), while if |\u03b1_j^(t)| is small, the details of the legislator (defined by y_i^(t)) and legislation (defined by x_j^(t)) strongly impact the vote. In previous political-science Bayesian analysis [6], researchers have simply set \u00b5_\u03b2 = 1 and \u00b5_\u03b1 = 0, but here we consider the model in a more general setting, and infer these relationships.\nAdditionally, in previous Bayesian analysis [6] the dimensionality of y_i^(t) and x_j^(t) has been set a priori (usually to one or two). In related probabilistic matrix factorization (PMF) applied to real matrices [15, 12], priors/regularizers are employed to constrain the dimensionality of the latent features. Here we employ the sparse binary vector b \u2208 {0, 1}^K, with b_k \u223c Bernoulli(\u03c0_k), and \u03c0_k \u223c Beta(c/K, d(K \u2212 1)/K), for K set to a large integer. By setting c and d appropriately, this favors that most of the components of b are zero (imposes sparseness). Specifically, by integrating out the {\u03c0_k}_{k=1,...,K}, one may readily show that the number of non-zero components in b is a random variable drawn from Binomial(K, c/(c + d(K \u2212 1))), and the expected number of ones in b is cK/[c + d(K \u2212 1)]. This is related to a draw from a truncated beta-Bernoulli process [16].\nWe consider two types of matrix axes. Specifically, we assume that each row corresponds to a person/entity that may be present for matrix t + 1 and matrix t.
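As a sketch of the generative process just described (not code from the paper), the probit model with random effects and the sparse binary vector b can be simulated in a few lines of NumPy; all sizes and hyperparameter values below are illustrative choices, not values used by the authors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not from the paper)
N_y, N_x, K = 20, 30, 50   # legislators, bills, feature truncation level
c, d = 2.0, 1.0

# Sparse binary vector b: pi_k ~ Beta(c/K, d(K-1)/K), b_k ~ Bernoulli(pi_k)
pi = rng.beta(c / K, d * (K - 1) / K, size=K)
b = rng.binomial(1, pi)

# Latent features masked by b, plus the two random effects
y = b * rng.normal(0.0, 1.0, size=(N_y, K))   # legislator features y_i
x = b * rng.normal(0.0, 1.0, size=(N_x, K))   # legislation features x_j
beta = rng.normal(0.0, 0.1, size=(N_y, 1))    # legislator random effect beta_i
alpha = rng.normal(0.0, 1.0, size=(1, N_x))   # legislation "difficulty" alpha_j

# Latent real matrix X as in (1), and observed binary votes via the probit link
X = y @ x.T + beta + alpha + rng.normal(0.0, 1.0, size=(N_y, N_x))
B = (X > 0).astype(int)

# Expected number of active features: cK / (c + d(K-1))
print(b.sum(), c * K / (c + d * (K - 1)))
```

Integrating out the π_k gives the Binomial(K, c/(c + d(K − 1))) count quoted in the text; with the values above the expected number of active features is roughly 2, illustrating the imposed sparseness.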
It is assumed here that each column corresponds to a question (in the examples, a piece of legislation), and each question is unique. Since the columns are each unique, we assume x_j^(t) = b \u25e6 \u02c6x_j^(t), with \u02c6x_j^(t) \u223c N(0, \u03b3_x^{-1} I_K), \u03b3_x \u223c Gamma(e, f), where \u25e6 denotes the pointwise/Hadamard vector product. If the person/entity associated with the ith row at time t is introduced for the first time, its associated feature vector is similarly drawn y_i^(t) = b \u25e6 \u02c6y_i^(t), with \u02c6y_i^(t) \u223c N(0, \u03b3_y^{-1} I_K), \u03b3_y \u223c Gamma(e, f). However, assuming \u02c6y_i^(t) is already drawn (person/entity i is active prior to time t + 1), then a simple auto-regressive model is used to draw y_i^(t+1) = b \u25e6 \u02c6y_i^(t+1), with \u02c6y_i^(t+1) \u223c N(\u02c6y_i^(t), \u03be^{-1} I_K), and \u03be \u223c Gamma(g, h). The prior on \u03be is set to favor small/smooth changes in the features of an individual on consecutive years.\nThis model constitutes a relatively direct extension of existing techniques for real matrices [15, 12]. Specifically, we have introduced a probit link function and a simple auto-regression construction to impose statistical correlation in the traits of a person/entity at consecutive times. The introduction of the random effects \u03b1_j and \u03b2_i has also not been considered within much of the machine-learning matrix-analysis literature, but the use of \u03b1_j is standard in political science Bayesian models [6]. The principal modeling contribution of this paper concerns how one may integrate such a time-evolving binary-matrix model with associated documents.\n2.2 Topic model\nThe manner in which the topic modeling is performed is a generalization of latent Dirichlet allocation (LDA) [4]. Assume that the documents of interest have words drawn from a vocabulary V = {w_1, ..., w_V}. The kth topic is characterized by a distribution p_k on words (\u201cbag-of-words\u201d assumption), where p_k \u223c Dir(\u03b1_V/V, ..., \u03b1_V/V). The generative model draws {p_k}_{k=1,...,T} once for each of the T possible topics.\nEach document is characterized by a probability distribution on topics, where c_l \u223c Dir(\u03b1_T/T, ..., \u03b1_T/T) corresponds to the distribution across T topics for document l. The generative process for drawing words for document l is to first (and once) draw c_l for document l. For word i in document l, we draw a topic z_il \u223c Mult(c_l), and then the specific word is drawn from a multinomial with probability vector p_{z_il}.\nThe above procedure is like the standard LDA [4], with the difference manifested in how we handle the Dirichlet distributions Dir(\u03b1_V/V, ..., \u03b1_V/V) and Dir(\u03b1_T/T, ..., \u03b1_T/T). The Dirichlet distribution draws are constituted via Sethuraman\u2019s construction [14]; this allows us to place gamma priors on \u03b1_V and \u03b1_T, while retaining conjugacy, permitting analytic Gibbs sampling (we therefore get a full posterior distribution for all model parameters, while most LDA implementations employ a point estimate for the document-dependent probabilities of topics). Specifically, the following hierarchical construction is used for draws from Dir(\u03b1_V/V, ..., \u03b1_V/V) (and similarly for Dir(\u03b1_T/T, ..., \u03b1_T/T)):\np_k = \u03a3_{h=1}^\u221e a_h \u03b4_{\u03b8_h} , a_h = U_h \u03a0_{n<h} (1 \u2212 U_n) , U_h \u223c Beta(1, \u03b1_V) , \u03b8_h \u223c \u03a3_{w=1}^V (1/V) \u03b4_w   (2)\n... and via the probit link function the probability of a \u201cyes\u201d vote is quantified, for Senator i on new legislation L_N. This is the model in (1), with \u03b2_i^(t) = 0 and \u03b1_j^(t) = 0. Based upon Figure 4 (last plot), the approximation \u03b2_i^(t) = 0 is reasonable.
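Returning to Sethuraman's construction in (2), a truncated stick-breaking draw of a Dirichlet-distributed topic vector can be sketched as follows; the truncation level H and the values of V and α_V are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Truncated Sethuraman draw of p_k ~ Dir(alpha_V/V, ..., alpha_V/V) as in (2)
# (a sketch; V, H, alpha_V are illustrative, not values from the paper)
V, H, alpha_V = 1000, 200, 5.0

# Stick-breaking weights: a_h = U_h * prod_{n<h} (1 - U_n), U_h ~ Beta(1, alpha_V)
U = rng.beta(1.0, alpha_V, size=H)
a = U * np.concatenate(([1.0], np.cumprod(1.0 - U[:-1])))
a[-1] = 1.0 - a[:-1].sum()          # fold the truncation remainder into the last stick

# Atom locations: theta_h ~ sum_w (1/V) delta_w, i.e. uniform over the vocabulary
theta = rng.integers(0, V, size=H)

# Assemble the implied probability vector over the V vocabulary words
p = np.zeros(V)
np.add.at(p, theta, a)              # accumulate weights of coinciding atoms

print(p.sum())
```

Because atoms can coincide, several sticks may land on the same word; accumulating them yields a valid probability vector over the vocabulary, which is what makes the gamma priors on α_V and α_T conjugate in the Gibbs sampler described in the text.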
The legislation-dependent random effect \u03b1_j^(t) is expected to be important\n[Figure: latent-space plots (Latent Dimension 1 vs. Latent Dimension 2) for the 101st and 110th Congresses; topic-usage histograms for clusters 1-5; top words per topic. Topic 1: annual, research, economy, doe, food, sale, motor, crop, county, employee. Topic 2: military, defense, this, product, expense, restore, public, annual, universal, independence. Topic 3: fuel, transport, public, research, agriculture, export, electrical, forest, foreign, water. Topic 4: military, defense, navy, air, guard, research, closure, naval, ndaa, bonus. Topic 5: public, research, transport, annual, children, train, expense, law, student, organization. Topic 6: law, violate, import, goal, bureau, commerce, registration, reform, risk, list. Topic 7: defense, civilian, iraq, train, health, cost, foreign, environment, air, depend. Topic 8: penalty, expense, health, drug, property, credit, public, work, medical, organization. Topic 9: employee, public, cost, defense, domestic, work, inspect, bureau, tax, build. Topic 10: foreign, law, terrorist, criminal, agriculture, justice, terror, engage, economy, crime. Topic 11: tax, budget, annual, debtor, bankruptcy, foreign, taxpayer, credit, property, product. Topic 12: tax, health, drug, medicaid, candidate, cost, children, aggregate, law, medical. Topic 13: military, transportation, safety, air, defense, health, guard, annual, foreign, waste. Topic 14: violence, victim, drug, alien, employee, visa, youth, penalty, criminal, minor. Topic 15: tax, health, annual, cost, law, this, mail, financial, liability, loan. Topic 16: medicare, tax, ssa, annual, deduct, hospital, parent, bankruptcy, debtor, male. Topic 17: loan, environment, train, property, science, annual, law, transportation, high, five. Topic 18: annual, health, this, public, defense, sea, product, fcc, carrier, columbia. Topic 19: immigration, juvenile, firearm, sentence, alien, crime, dh, train, convict, prison. Topic 20: alien, civil, parent, ha, immigrant, labor, criminal, free, term, petition.]\nFigure 4: First four plots: Predicted probability of voting \u201cYes\u201d given only the
legislation text for 2008, based upon the model learned using vote-legislation data from 1989\u20132007. The dots (colored by party affiliation) show the empirical voting frequencies for all legislation in the cluster, from 2008 (not used in model). Only four clusters are utilized during session 2008, out of five inferred by the model for the overall period 1989\u20132007. Last plot: Estimated log p(\u03b1) and log p(\u03b2). Note how p(\u03b2) is much more sharply peaked near zero.\nfor legislation for which most senators vote \u201cyes\u201d (large positive \u03b1_j^(t)) or \u201cno\u201d (large negative \u03b1_j^(t)).\nWhen testing the predictive quality of the model for the held-out year 2008, we assume \u03b1_j^(t) = 0 (since this parameter cannot be inferred without modeling the text and votes jointly, while for 2008 we are only modeling the documents); we therefore only test the model on legislation from 2008 for which fewer than 90% of the senators agreed, such legislation being assumed to correspond to small |\u03b1_j^(t)| (it is assumed that in practice it would be simple to determine whether a piece of legislation is likely to be near-unanimous \u201cyes\u201d or \u201cno\u201d, and therefore model-based prediction of votes for such legislation is deemed less interesting).\nIn Figure 4 we compare the predicted, probit-based probability of a given senator voting \u201cyes\u201d for legislation within clusters 1-4 (see Figure 2); the points in Figure 4 represent the empirical data for each senator, and the curve represents the predictions of the probit link function. These results are deemed to be remarkably good. In Figure 4, the senators along each horizontal axis are ordered according to the probability of voting \u201cyes\u201d.\nOne interesting issue that arises in this prediction concerns clusters 1 and 4 in Figure 2, and the associated predictions for the held-out year 2008, in Figure 4.
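The prediction just described reduces to a simple computation: with β and α set to zero in (1), the probability of a "yes" vote is the standard normal CDF of the inner product between the senator's feature vector and the legislation's feature vector. A minimal sketch (feature values here are made up for illustration):

```python
import numpy as np
from math import erf, sqrt

def prob_yes(y_i, x_j):
    """P(vote = yes) = Phi(<y_i, x_j>) under model (1) with beta = alpha = 0."""
    m = float(np.dot(y_i, x_j))
    return 0.5 * (1.0 + erf(m / sqrt(2.0)))  # standard normal CDF Phi(m)

# Illustrative feature vectors (assumptions, not inferred values from the paper)
y_i = np.array([0.8, -0.3, 0.1])   # a senator's latent features
x_j = np.array([0.5, 0.2, -0.4])   # features inferred for new legislation

print(prob_yes(y_i, x_j))
```

A zero inner product gives probability 0.5, and larger positive (negative) inner products push the prediction toward "yes" ("no"), which is the behavior plotted in Figure 4.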
Since the distributions of these clusters over topics are very similar, the documents alone cannot distinguish between clusters 1 and 4. However, we also have the sponsor of each piece of legislation, and based upon the data from 1989-2007, if a piece of legislation from 2008 is mapped to either cluster 1 or 4, it is disambiguated based upon the party affiliation of the sponsor (cluster 1 is a Republican viewpoint on these topics, while cluster 4 is a Democratic viewpoint, based upon voting records from 1989-2007).\n3.3 Time evolution of congressmen and legislation\nThe above joint analysis of text and votes was restricted to 1989-2008, since the documents (legislation) were only available for those years. However, the dataset contains votes on all legislation from 1789 to the present, and we now analyze the vote data from 1789-1988. Figure 5 shows snapshots in time of the latent space for voters and legislation, for the House of Representatives (similar results have been computed for the Senate, and are omitted for brevity; as supplemental material, at http://sites.google.com/site/matrixtopics/ we present movies of how legislation and congressmen evolve across all times, for both the House and Senate). Five features were inferred, with the two highest-variance features chosen for the axes. The blue symbols denote Democratic legislators, or legislation sponsored by a Democrat, and the red points correspond to Republicans. Results like these are of interest to political scientists, and allow examination of the degree of partisanship over time, for example.\n[Figure: empirical vs. predicted voting frequency for clusters 1-4 (102 senators; 26, 43, 46, and 29 votes, respectively), and estimated log-posterior probabilities of \u03b1 and \u03b2.]\nFigure 5: Congressmen (top) and legislation (bottom) in latent space for sessions 1-98 of the House of Representatives. The Democrat/Republican separation is usually sharper than for the Senate, and frequently only the partisan information seems to matter. Note the gradual rotation of the red/blue axis. Best viewed electronically, zoomed-in.\n3.4 Additional quantitative tests\nOne may ask how well this model addresses the more-classical problem of estimating the values of matrix data that are missing uniformly at random, in the absence of documents. To examine this question, we considered binary Senate vote data from 1989-2008, removed a fraction of the votes uniformly at random, and then used the proposed time-evolving matrix model to process the observed data, and to compute the probability of a \u201cyes\u201d vote on all missing data (via the probit link function).
If the probability is larger than 0.5 the vote is set to \u201cyes\u201d, and otherwise it is set to \u201cno\u201d. We compare our time-evolving model to [12], with the addition of a probit link function; for the latter we processed all 20 years as one large matrix, rather than analyzing time-evolving structure. Up to 40% missingness, the proposed model and a modified version of that in [12] performed almost identically, with an average probability of error (on the binary vote) of approximately 0.1. For greater than 40% missingness, the proposed time-evolving model manifested a \u201cphase transition\u201d, and the probability of error increased smoothly up to 0.3 as the fraction of missing data rose to 80%; in contrast, the generalized model in [12] (with probit link) continued to yield a probability of error of about 0.1. The phase transition of the proposed model likely arises because the entire matrix is partitioned by year, with linkage between years imposed via the Markov process on the legislator features (we do not analyze all the data as one contiguous, large matrix). The phase transition is expected based on the theory in [5] when the fraction of missing data gets large enough (since the size of the contiguous matrices analyzed by the time-evolving model is much smaller than that of the entire matrix, such a phase transition is expected with less missingness than via analysis of the entire matrix at once).\nWhile the above results are of interest and deemed encouraging, such uniformly random missingness on matrix data alone is not the motivation of the proposed model.
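The hold-out scoring rule used above (threshold the predicted "yes" probability at 0.5 and count disagreements with the held-out binary votes) amounts to the following; the predicted probabilities here are synthetic stand-ins, not model output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for held-out votes and model-predicted P("yes")
# (assumption: a well-calibrated model concentrates mass on the true label)
true_votes = rng.binomial(1, 0.5, size=500)
noise = rng.normal(0.0, 0.15, size=500)
p_yes = np.clip(0.75 * true_votes + 0.25 * (1 - true_votes) + noise, 0.0, 1.0)

pred = (p_yes > 0.5).astype(int)        # threshold rule from the text
error = np.mean(pred != true_votes)     # probability of error on the binary votes

print(error)
```

With the synthetic parameters above the error lands near 0.1, the same order as the empirical error the text reports below 40% missingness; the point is only to make the evaluation protocol concrete.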
Rather, traditional matrix-analysis methods [10, 15, 1, 12, 13, 18] are incapable of predicting votes on new legislation based on the words alone (as in Figure 4), and such models do not allow analysis of the time-evolving properties of elements of the matrix, as in Figure 5.\n4 Conclusions\nA new model has been developed for the joint analysis of time-evolving matrices and associated documents. To the authors\u2019 knowledge, this paper represents the first integration of research heretofore performed separately on topic models and on matrix analysis/completion. The model has been implemented efficiently via Gibbs sampling. A unique set of results is presented using data from the US Senate and House of Representatives, demonstrating the ability to predict the votes on new legislation, based only on the associated documents. The legislation data were considered principally because they were readily available and interesting in their own right; however, the proposed framework is of interest for many other problems.
For example, the model is applicable to analysis of time-evolving relationships between multiple entities, augmented by the presence of documents (e.g., links between websites, and the associated document content).\nAcknowledgement\nThe research reported here was supported by the US Army Research Office, under grant W911NF-08-1-0182, and the Office of Naval Research under grant N00014-09-1-0212.\n[Figure 5 panels: latent-space snapshots (Democrat/Republican/Others) for the years 1789-1790, 1939-1940, 1947-1948, 1963-1964, and 1983-1984, for congressmen (top) and legislation (bottom).]\nReferences\n[1] J. Abernethy, F. Bach, T. Evgeniou, and J.-P. Vert. A new approach to collaborative filtering: operator estimation with spectral regularization. J. Machine Learning Research, 2009.\n[2] D. M. Blei and J. D. Lafferty. Dynamic topic models. Proceedings of the 23rd International Conference on Machine Learning, pages 113\u2013120, 2006.\n[3] D. M. Blei and J. D. Lafferty. A correlated topic model of science. The Annals of Applied Statistics, 1(1):17\u201335, 2007.\n[4] D.
M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993\u20131022, 2003.\n[5] E. J. Cand\u00e8s and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053\u20132080, 2010.\n[6] J. Clinton, S. Jackman, and D. Rivers. The statistical analysis of roll call data. Am. Political Sc. Review, 2004.\n[7] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209\u2013230, 1973.\n[8] P. D. Hoff. Multiplicative latent factor models for description and prediction of social networks. Computational and Mathematical Organization Theory, 2009.\n[9] H. Ishwaran and L. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161\u2013174, 2001.\n[10] E. Meeds, Z. Ghahramani, R. Neal, and S. Roweis. Modeling dyadic data with binary latent factors. In Advances in NIPS, pages 977\u2013984, 2007.\n[11] I. Pruteanu-Malinici, L. Ren, J. Paisley, E. Wang, and L. Carin. Hierarchical Bayesian modeling of topics in time-stamped documents. IEEE Trans. Pattern Analysis Mach. Intell., 2010.\n[12] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization with MCMC. In Advances in NIPS, 2008.\n[13] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in NIPS, 2008.\n[14] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639\u2013650, 1994.\n[15] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In Advances in NIPS, 2005.\n[16] R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In International Conference on Artificial Intelligence and Statistics, 2007.\n[17] H. M. Wallach. Topic modeling: beyond bag of words.
Proceedings of the 23rd International Conference on Machine Learning, 2006.\n[18] K. Yu, J. Lafferty, S. Zhu, and Y. Gong. Large-scale collaborative prediction using a nonparametric random effects model. In Proc. Int. Conf. Machine Learning, 2009.", "award": [], "sourceid": 417, "authors": [{"given_name": "Eric", "family_name": "Wang", "institution": null}, {"given_name": "Dehong", "family_name": "Liu", "institution": null}, {"given_name": "Jorge", "family_name": "Silva", "institution": null}, {"given_name": "Lawrence", "family_name": "Carin", "institution": null}, {"given_name": "David", "family_name": "Dunson", "institution": null}]}