{"title": "Meta Learning with Relational Information for Short Sequences", "book": "Advances in Neural Information Processing Systems", "page_first": 9904, "page_last": 9915, "abstract": "This paper proposes a new meta-learning method -- named HARMLESS (HAwkes Relational Meta Learning method for Short Sequences) for learning heterogeneous point process models from a collection of short event sequence data along with a relational network. Specifically, we propose a hierarchical Bayesian mixture Hawkes process model, which naturally incorporates the relational information among sequences into point process modeling. Compared with existing methods, our model can capture the underlying mixed-community patterns of the relational network, which simultaneously encourages knowledge sharing among sequences and facilitates adaptively learning for each individual sequence. We further propose an efficient stochastic variational meta-EM algorithm, which can scale to large problems. Numerical experiments on both synthetic and real data show that HARMLESS outperforms existing methods in terms of predicting the future events.", "full_text": "Meta Learning with Relational Information\n\nfor Short Sequences\n\nYujia Xie\n\nCollege of Computing, Georgia Tech\n\nXie.Yujia000@gmail.com\n\nHaoming Jiang\n\nCollege of Engineering, Georgia Tech\n\njianghm@gatech.edu\n\nFeng Liu\n\nFlorida Atlantic University\n\nFLIU2016@fau.edu\n\nTuo Zhao\n\nCollege of Engineering, Georgia Tech\n\ntuo.zhao@isye.gatech.edu\n\nInstitute for Data and Decision Analytics, the Chinese University of Hong Kong, Shenzhen\n\nShenzhen Institute of Arti\ufb01cial Intelligence and Robotics for Society\n\nHongyuan Zha0\n\nzhahy@cuhk.edu.cn\n\nAbstract\n\nThis paper proposes a new meta-learning method \u2013 named HARMLESS (HAwkes\nRelational Meta LEarning method for Short Sequences) for learning heterogeneous\npoint process models from short event sequence data along with a relational net-\nwork. 
Specifically, we propose a hierarchical Bayesian mixture Hawkes process model, which naturally incorporates the relational information among sequences into point process modeling. Compared with existing methods, our model can capture the underlying mixed-community patterns of the relational network, which simultaneously encourages knowledge sharing among sequences and facilitates adaptive learning for each individual sequence. We further propose an efficient stochastic variational meta expectation maximization algorithm that can scale to large problems. Numerical experiments on both synthetic and real data show that HARMLESS outperforms existing methods in terms of predicting future events.

1 Introduction

Event sequence data naturally arises in analyzing the temporal behavior of real-world subjects (Cleeremans and McClelland, 1991). These sequences often contain rich information, which can predict the future evolution of the subjects. For example, the timestamps of the tweets of a Twitter user reflect his or her activeness and state of mind, and can be used to predict when the next tweet will occur (Kobayashi and Lambiotte, 2016). The job hopping history of a person usually suggests when he or she will hop next (Xu et al., 2017b). Unlike usual sequential data such as text data, event sequences are always asynchronous and tend to be noisy (Ross et al., 1996). Therefore, specialized algorithms are needed to learn from such data.

In this paper, we are interested in short sequences, a type of sequence data that commonly appears in many real-world applications. Such data is usually short for two possible reasons. One is that the event sequences are short in nature, such as the job hopping history. The other is that the observation window is narrow. For example, we may be interested in the criminal incidents of an area only after a specific regulation is published. 
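To make this kind of data concrete, a short event sequence can be simulated from a self-exciting (Hawkes) process. The sketch below uses Ogata's thinning algorithm with an exponential triggering kernel; all parameter values are illustrative and not taken from the paper.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a Hawkes process with intensity
    lambda(t) = mu + sum_{t_j < t} alpha * exp(-beta * (t - t_j))
    on [0, horizon], via Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # With an exponential kernel, the intensity only decays between events,
        # so the intensity at the current time is a valid upper bound.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - tj)) for tj in events)
        t += rng.expovariate(lam_bar)  # candidate next event time
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - tj)) for tj in events)
        if rng.random() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
            events.append(t)
    return events

# A short observation window yields a short sequence.
seq = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, horizon=10.0)
```

Shrinking the horizon mimics the narrow-observation-window setting described above.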
Moreover, this kind of data usually appears as a collection of sequences, such as the timestamps of many users' tweets. Our goal is to extract information that can predict the occurrence of future events from a large collection of such short sequences.

Corresponding author. On leave from the College of Computing, Georgia Institute of Technology.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Much of the existing literature considers medium-length or long sequences. These works first model a sequence as a parametric point process, e.g., a Poisson process, a Hawkes process, or their neural variants, and apply maximum likelihood estimation to find the optimal parameters (Ogata, 1999; Rasmussen, 2013). However, for short sequences, their lengths are insufficient for reliable inference. One remedy is to treat the collection of short sequences as independent and identically distributed realizations of the same point process, since many subjects, e.g., Twitter users, often share similar behaviors. This makes the inference manageable. However, the learned pattern can be highly biased against certain individuals, especially non-mainstream users, since this method ignores the heterogeneity within the collection.

An alternative is to recast the problem as a multitask learning problem (Zhang and Yang, 2017) – we target multi-sequence analysis for multiple subjects. For each sequence, we consider a point process model that slightly deviates from a common point process model, i.e., f̃_j = f_0 + f_j, where f_0 is the common model that captures the main effect, f̃_j is the model for the j-th sequence, and f_j is the relatively small deviation. Such an assumption that there exists a universal common model across all subjects, however, is still strong, since the subjects' patterns can differ dramatically. 
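In the simplest constant-rate (Poisson) special case, this shared-plus-deviation decomposition amounts to shrinking each sequence's own rate estimate toward a pooled common estimate. A minimal sketch, with made-up counts and a hypothetical shrinkage weight lam:

```python
import numpy as np

# Hypothetical toy data: event counts and observation windows per sequence.
counts = np.array([3.0, 5.0, 2.0, 40.0])    # short sequences plus one outlier subject
windows = np.array([10.0, 10.0, 10.0, 10.0])

rate_sep = counts / windows               # per-sequence MLE (high variance)
rate_com = counts.sum() / windows.sum()   # pooled common model f0 (biased per subject)

# Multitask view: f~_j = f0 + f_j, with the deviation f_j shrunk toward zero.
lam = 0.5  # shrinkage strength; larger values pull estimates toward the common model
deviation = (rate_sep - rate_com) / (1.0 + lam)
rate_mtl = rate_com + deviation  # lies between the separate and common estimates
```

The outlier subject keeps most of its own rate while the short sequences borrow strength from the pool, which is exactly the trade-off the multitask formulation encodes.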
For example, the job hopping history of a software engineer and that of a human resource manager should have distinct characteristics. Furthermore, such methods ignore the relationships among subjects, which can often be revealed by side information. For example, a social network often shows a community pattern (Girvan and Newman, 2002) – across communities the variation of the subjects is large, while within communities the variation is small. The connections in the social network, such as the "follow" or retweet relationships in Twitter data, can provide valuable information to identify such community patterns, but the aforementioned methods do not take such understanding into account to help analyze subjects' behavior.

To this end, we propose a HAwkes Relational Meta LEarning method for Short Sequences (HARMLESS), which can adaptively learn from a collection of short sequences. More specifically, in a social network, each user often has multiple identities (Airoldi et al., 2008). For example, a Twitter user can be both a military fan and a tech fan. Both his tweet history and his social connections are based on his identities. Motivated by the above facts, we model each sequence as a hierarchical Bayesian mixture of Hawkes processes – the weights of each Hawkes process are determined jointly by the hidden pattern of the sequences and the relational information, e.g., social graphs.

We then propose a variational meta expectation maximization algorithm to efficiently perform inference. Different from existing fully Bayesian inference methods (Box and Tiao, 2011; Rasmussen, 2013; Xu and Zha, 2017), we make no assumption on the prior distribution of the parameters of the Hawkes processes. Instead, when inferring the Hawkes process parameters of the same identity for all the subjects, we perform a model-agnostic adaptation from a common model for this identity (Finn et al., 2017; see Section 3 for more details). 
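As a rough illustration of such model-agnostic (MAML-style) adaptation, the sketch below takes one inner gradient step per task and differentiates through that step, on toy one-dimensional quadratic losses. The task centers, step sizes, and loss form are invented for illustration; this is not the paper's algorithm.

```python
# Illustrative MAML sketch on 1-D tasks with loss_i(theta) = (theta - c_i)^2.
# The meta-gradient is computed analytically; c_i and the step sizes are made up.
def maml_step(theta, centers, inner_lr=0.1, outer_lr=0.05):
    grad = 0.0
    for c in centers:
        # One inner adaptation step from the common parameter theta.
        theta_i = theta - inner_lr * 2.0 * (theta - c)
        # d loss_i(theta_i) / d theta, using d theta_i / d theta = (1 - 2*inner_lr).
        grad += 2.0 * (theta_i - c) * (1.0 - 2.0 * inner_lr)
    return theta - outer_lr * grad / len(centers)

theta = 0.0                 # common model shared across tasks
centers = [1.0, 3.0, -2.0]  # hypothetical per-task optima
for _ in range(200):
    theta = maml_step(theta, centers)
# theta converges to the point from which one adaptation step serves all tasks best;
# for symmetric quadratics this is the mean of the centers.
```

The point is that the common model is trained so that a single cheap adaptation step yields a good per-task (here, per-identity) model, which is the role model-agnostic adaptation plays above.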
This is more flexible since it does not restrict the prior to a specific form. We apply HARMLESS to both synthetic and real short event sequences, and achieve competitive performance.

Notations: Throughout the paper, unbold letters denote vectors or scalars, while bold letters denote the corresponding matrices or sequences. We refer to the k-th entry of vector a_i as a_{i,k}, and to the i-th subject as subject i.

2 Preliminaries

We briefly introduce Hawkes processes and Model-Agnostic Meta-Learning.

A Hawkes process (Hawkes, 1971) is a doubly stochastic temporal point process H(θ) with conditional intensity function λ = λ(t; θ, τ) defined as

λ(t; θ, τ) = μ + Σ_{τ^(j) < t} φ(t − τ^(j)),

where μ is the base intensity, φ is the triggering kernel, and τ = {τ^(j)} denotes the past event timestamps.

When K0 > K, some of the communities would split.

Benefit of joint training. To validate the benefit of joint training on graphs and sequences, we compare the HARMLESS result with a two-step procedure: we first train an MMB model to obtain the identities, and then train HARMLESS (MAML) with fixed identities. In Figure 3 we plot the obtained log-likelihood with respect to K0. HARMLESS (MAML) consistently achieves a larger log-likelihood than the two-step procedure. This suggests that joint training on graphs and sequences indeed improves the prediction of future events.

Log-likelihood with respect to K0. We also include the results of the baselines and HARMLESS (FOMAML) in Figure 3. The performance of HARMLESS is consistently better than the baselines. Besides, we find that the performance of HARMLESS (Reptile) is very dependent on the dataset. For this synthetic dataset, Reptile cannot perform well.

5.2 Real Data

We adopt four real datasets.

Figure 3: Plot of synthetic data. 
S = 1.

[Figure 3 plots log-likelihood against K0 for MLE-Sep, MLE-Com, MTL, DMHP, Two Step, HARMLESS (MAML), and HARMLESS (FOMAML).]

Table 2: Log-likelihood of real datasets.

Method               911-Calls         LinkedIn          MathOverflow      StackOverflow
MLE-Sep              4.0030 ± 0.3763   0.8419 ± 0.0251   0.5043 ± 0.0657   0.2862 ± 0.0177
MLE-Com              4.5111 ± 0.3192   0.8768 ± 0.0028   1.7805 ± 0.0345   1.5594 ± 0.0134
DMHP                 4.4812 ± 0.3434   0.8348 ± 0.0030   1.5394 ± 0.0347   N/A
MTL                  4.4621 ± 0.3173   0.9270 ± 0.0027   1.7225 ± 0.0336   1.4910 ± 0.0089
HARMLESS (MAML)      4.5208 ± 0.3256   1.4070 ± 0.0105   1.8563 ± 0.0345   1.3886 ± 0.0082
HARMLESS (FOMAML)    4.6362 ± 0.3241   1.0129 ± 0.0040   1.8344 ± 0.0348   1.5988 ± 0.0083
HARMLESS (Reptile)   4.4929 ± 0.3503   0.9540 ± 0.0082   1.8663 ± 0.0342   1.6017 ± 0.0097

911-Calls dataset: The 911-Calls dataset contains emergency phone call records of fire, traffic and other emergencies for Montgomery County, PA. The county is divided into disjoint areas, each of which has a unique ZIP Code. For each area, the timestamps of emergency phone calls in this area are recorded as an event sequence. We consider each area as a subject, and two subjects are connected if they are adjacent. We finally obtain 57 subjects and 81 connections among them. The average length of the sequences is 219.1.

LinkedIn dataset: The LinkedIn dataset (Xu et al., 2017b) contains job hopping records of the users. For each user, her/his check-in timestamps corresponding to different companies are recorded as an event sequence. We consider each user as a subject, and two subjects are connected if the two users joined the same company within 2 weeks of each other. After removing the singleton subjects, we have 1,369 subjects and 12,815 connections among them. 
The average length of the sequences is 4.9.

MathOverflow dataset: The MathOverflow dataset (Paranjape et al., 2017) contains records of users posting and answering math questions. We adopt the records from May 2, 2014 to March 6, 2016. For each user, her/his timestamps of answering questions are recorded as an event sequence. We consider each user as a subject, and two subjects are connected if one user answers another user's question. After removing the singleton subjects, we have 1,529 subjects and 6,937 connections among them. The average length of the sequences is 11.8.

StackOverflow dataset: StackOverflow is a question and answer site similar to MathOverflow. We adopt the records from November 8, 2015 to December 1, 2015. We construct the sequences and graphs in the same way as for MathOverflow. After removing the singleton subjects, we have 13,434 users and 19,507 connections among them. The average length of the sequences is 7.7.

Result: The log-likelihood is summarized in Table 2. Note that because Markov chain Monte Carlo is needed for DMHP, we cannot obtain a reasonable result for the largest dataset, i.e., StackOverflow. HARMLESS performs consistently better than the baselines. Since the standard errors of the results on the 911-Calls dataset are large, we also performed a paired t-test. The test shows that the difference in log-likelihood between MLE-Com, i.e., the best of the baselines, and HARMLESS (FOMAML), i.e., the best of the HARMLESS series, is statistically significant (with p-value = 1.3 × 10^-5).

5.3 Ablation Study

We then perform an ablation study using the LinkedIn dataset. 
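The paired t-test reported above can be sketched as follows. The per-subject log-likelihoods here are synthetic placeholders (the paper's per-subject results are not given), and the critical value 2.003 corresponds to a two-sided 5% level with 56 degrees of freedom:

```python
import numpy as np

# Hypothetical per-subject log-likelihoods for two methods (made-up numbers,
# not the paper's data). Pairing by subject removes the large per-subject
# variance, which is why the test can detect a small but consistent gap.
rng = np.random.default_rng(0)
base = rng.normal(4.5, 0.3, size=57)            # baseline, one value per subject
improved = base + rng.normal(0.12, 0.05, 57)    # consistent per-subject improvement

d = improved - base                              # paired differences
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
significant = abs(t_stat) > 2.003                # two-sided 5% critical value, df = 56
```

An unpaired test on the same numbers would be swamped by the ±0.3 per-subject spread, mirroring the large standard errors on the 911-Calls dataset.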
Three sets of ablation studies are considered here:

Remove inner heterogeneity: We model each community of sequences using the same parameters, i.e., we set θ̃_k^(i) = θ_k.

Remove grouping: We set K = 1, so that the whole graph is one community. This is equivalent to applying the MAML-type algorithms on the sequences directly.

Remove graph: We do not consider the graph information, i.e., we remove z→, z←, Y and B from the panel in Figure 2.

Table 3: Results of ablation study.

Method                                Log-Likelihood
HARMLESS (MAML)                       1.4070 ± 0.0105
HARMLESS (FOMAML)                     1.0129 ± 0.0042
HARMLESS (Reptile)                    0.9540 ± 0.0082
Remove inner heterogeneity (K = 3)    0.9405 ± 0.0032
Remove inner heterogeneity (K = 5)    0.9392 ± 0.0032
Remove grouping (MAML)                0.9432 ± 0.0031
Remove grouping (FOMAML)              0.9376 ± 0.0031
Remove grouping (Reptile)             0.9455 ± 0.0041
Remove graph (MAML)                   0.9507 ± 0.0032
Remove graph (FOMAML)                 0.9446 ± 0.0032
Remove graph (Reptile)                0.9489 ± 0.0072

(Data for the 911-Calls dataset is provided by montcoalert.org.)

The results in Table 3 suggest that MAML-type adaptation, graph information, and the use of multiple identities all contribute to the good performance of HARMLESS.

6 Discussions

The setting of meta learning. In conventional meta-learning settings, the goal is to train a model on a set of tasks so that it can quickly adapt to a new task with only a few training samples. Therefore, people divide the tasks into a meta training set and a meta test set, where each task contains a training set and a test set. The meta model is trained on the meta training set, aiming to minimize the test errors, and validated on the meta test set (Vinyals et al., 2016; Santoro et al., 2016). This setting is designed for supervised learning or reinforcement learning tasks that have accuracy or reward as a clear evaluation metric. 
Extracting information from the event sequences, however, is essentially an unsupervised learning task. Therefore, we do not separate a meta training set and a meta test set. Instead, we pool the collection of tasks together, and aim to extract shared information from the collection to help the training of models on individual tasks. Here, each short sequence is a task. We exploit the shared pattern of the collection of sequences to obtain the models for individual sequences.

Community Pattern. The target of Mixed Membership stochastic Blockmodels (MMB) is to identify the communities in a social graph, e.g., the classes in a school. However, real social graphs cannot always be viewed as the Erdős-Rényi (ER) graphs assumed by MMB. As argued in Karrer and Newman (2011), for real-world networks, MMB tends to assign nodes with similar degrees to the same communities, which is different from the popular interpretation of the community pattern. This property, however, is actually very helpful in our case. As an example, Twitter users that are more active tend to have similar behavior: they tend to make more connections and post tweets more frequently. In contrast, users with very different node degrees often have tweet histories with different characteristics, and thus should be assigned to different identities. Such a property of MMB allows the identities in HARMLESS to represent these non-traditional community patterns in non-ER graphs, i.e., it assigns subjects with different levels of activeness to different communities.

Mixture of Hawkes processes. Many existing works adopt mixtures of Hawkes processes to model sequences that are generated from complicated mechanisms (Yang and Zha, 2013; Li and Zha, 2013; Xu and Zha, 2017). Those works differ from HARMLESS in that they consider neither the hierarchical heterogeneity of the sequences nor the relational information.

Variants of Hawkes process. 
Some attempts have been made to further enhance the \ufb02exibility of\nHawkes processes. For example, the time-dependent Hawkes process (TiDeH) in Kobayashi and\nLambiotte (2016) and the neural network-based Hawkes process (N-SM-MPP) in Mei and Eisner\n(2017) learn very \ufb02exible Hawkes processes with complicated intensity functions. Those models\nusually have more parameters than vanilla Hawkes processes. For longer sequences, HARMLESS can\nalso be naturally extended to TiDeHs or N-SM-MPP. However, this work focuses on short sequences.\nThese methods are not useful here, since they have too many degrees of freedom.\n\nAcknowledgement\nThis work is partially supported by the grant NSF IIS 1717916 and NSF CMMI 1745382. Part of\nthe work done by Hongyuan Zha is supported by Shenzhen Institute of Arti\ufb01cial Intelligence and\nRobotics for Society, and Shenzhen Research Institute of Big Data.\n\nReferences\nACHAB, M., BACRY, E., GA\u00cfFFAS, S., MASTROMATTEO, I. and MUZY, J.-F. (2017). Uncovering\ncausality from multivariate hawkes integrated cumulants. The Journal of Machine Learning\nResearch, 18 6998\u20137025.\n\nAIROLDI, E. M., BLEI, D. M., FIENBERG, S. E. and XING, E. P. (2008). Mixed membership\n\nstochastic blockmodels. Journal of machine learning research, 9 1981\u20132014.\n\nBACRY, E., DAYRI, K. and MUZY, J.-F. (2012). Non-parametric kernel estimation for symmetric\nhawkes processes. application to high frequency \ufb01nancial data. The European Physical Journal B,\n85 157.\n\nBAUWENS, L. and HAUTSCH, N. (2009). Modelling \ufb01nancial high frequency data using point\n\nprocesses. In Handbook of \ufb01nancial time series. Springer, 953\u2013979.\n\nBENGIO, Y., BENGIO, S. and CLOUTIER, J. (1990). Learning a synaptic learning rule. Universit\u00e9\n\nde Montr\u00e9al, D\u00e9partement d\u2019informatique et de recherche ?\n\nBLEI, D. M., KUCUKELBIR, A. and MCAULIFFE, J. D. (2017). Variational inference: A review for\n\nstatisticians. 
Journal of the American Statistical Association, 112 859\u2013877.\n\n9\n\n\fBLUNDELL, C., BECK, J. and HELLER, K. A. (2012). Modelling reciprocating relationships with\n\nhawkes processes. In Advances in Neural Information Processing Systems.\n\nBOX, G. E. and TIAO, G. C. (2011). Bayesian inference in statistical analysis, vol. 40. John Wiley\n\n& Sons.\n\nCHALMERS, D. J. (1991). The evolution of learning: An experiment in genetic connectionism. In\n\nConnectionist Models. Elsevier, 81\u201390.\n\nCLEEREMANS, A. and MCCLELLAND, J. L. (1991). Learning the structure of event sequences.\n\nJournal of Experimental Psychology: General, 120 235.\n\nEICHLER, M., DAHLHAUS, R. and DUECK, J. (2017). Graphical modeling for multivariate hawkes\n\nprocesses with nonparametric link functions. Journal of Time Series Analysis, 38 225\u2013242.\n\nFARAJTABAR, M., YANG, J., YE, X., XU, H., TRIVEDI, R., KHALIL, E., LI, S., SONG, L. and\nZHA, H. (2017). Fake news mitigation via point process based intervention. In Proceedings of the\n34th International Conference on Machine Learning-Volume 70. JMLR. org.\n\nFARAJTABAR, M., YE, X., HARATI, S., SONG, L. and ZHA, H. (2016). Multistage campaigning in\n\nsocial networks. In Advances in Neural Information Processing Systems.\n\nFINN, C., ABBEEL, P. and LEVINE, S. (2017). Model-agnostic meta-learning for fast adaptation of\ndeep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume\n70. JMLR. org.\n\nFINN, C., XU, K. and LEVINE, S. (2018). Probabilistic model-agnostic meta-learning. In Advances\n\nin Neural Information Processing Systems.\n\nFOX, E. W., SHORT, M. B., SCHOENBERG, F. P., CORONGES, K. D. and BERTOZZI, A. L. (2016).\nModeling e-mail networks and inferring leadership using self-exciting point processes. Journal of\nthe American Statistical Association, 111 564\u2013584.\n\nGIRVAN, M. and NEWMAN, M. E. (2002). 
Community structure in social and biological networks.\n\nProceedings of the national academy of sciences, 99 7821\u20137826.\n\nGRANT, E., FINN, C., LEVINE, S., DARRELL, T. and GRIFFITHS, T. (2018). Recasting gradient-\n\nbased meta-learning as hierarchical bayes. arXiv preprint arXiv:1801.08930.\n\nHANSEN, N. R., REYNAUD-BOURET, P., RIVOIRARD, V. ET AL. (2015). Lasso and probabilistic\n\ninequalities for multivariate point processes. Bernoulli, 21 83\u2013143.\n\nHAWKES, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes.\n\nBiometrika, 58 83\u201390.\n\nHOFFMAN, M. D., BLEI, D. M., WANG, C. and PAISLEY, J. (2013). Stochastic variational inference.\n\nThe Journal of Machine Learning Research, 14 1303\u20131347.\n\nKARRER, B. and NEWMAN, M. E. (2011). Stochastic blockmodels and community structure in\n\nnetworks. Physical review E, 83 016107.\n\nKOBAYASHI, R. and LAMBIOTTE, R. (2016). Tideh: Time-dependent hawkes process for predicting\n\nretweet dynamics. In Tenth International AAAI Conference on Web and Social Media.\n\nKOCH, G., ZEMEL, R. and SALAKHUTDINOV, R. (2015). Siamese neural networks for one-shot\n\nimage recognition. In ICML deep learning workshop, vol. 2.\n\nLAUB, P. J., TAIMRE, T. and POLLETT, P. K. (2015). Hawkes processes. arXiv preprint\n\narXiv:1507.02822.\n\nLI, L. and ZHA, H. (2013). Dyadic event attribution in social networks with mixtures of hawkes\nprocesses. In Proceedings of the 22nd ACM international conference on Information & Knowledge\nManagement. ACM.\n\n10\n\n\fLINDERMAN, S. and ADAMS, R. (2014). Discovering latent network structure in point process data.\n\nIn International Conference on Machine Learning.\n\nLUO, D., XU, H., ZHEN, Y., NING, X., ZHA, H., YANG, X. and ZHANG, W. (2015). Multi-task\nmulti-dimensional hawkes processes for modeling event sequences. In Twenty-Fourth International\nJoint Conference on Arti\ufb01cial Intelligence.\n\nMACLAURIN, D., DUVENAUD, D. and ADAMS, R. (2015). 
Gradient-based hyperparameter opti-\n\nmization through reversible learning. In International Conference on Machine Learning.\n\nMEI, H. and EISNER, J. M. (2017). The neural hawkes process: A neurally self-modulating\n\nmultivariate point process. In Advances in Neural Information Processing Systems.\n\nMUNKHDALAI, T. and YU, H. (2017). Meta networks. In Proceedings of the 34th International\n\nConference on Machine Learning-Volume 70. JMLR. org.\n\nNICHOL, A., ACHIAM, J. and SCHULMAN, J. (2018). On \ufb01rst-order meta-learning algorithms.\n\narXiv preprint arXiv:1803.02999.\n\nNICHOL, A. and SCHULMAN, J. (2018). Reptile: a scalable metalearning algorithm. arXiv preprint\n\narXiv:1803.02999.\n\nOGATA, Y. (1999). Seismicity analysis through point-process modeling: A review. In Seismicity\n\npatterns, their statistical signi\ufb01cance and physical meaning. Springer, 471\u2013507.\n\nPARANJAPE, A., BENSON, A. R. and LESKOVEC, J. (2017). Motifs in temporal networks. In\nProceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM.\nRASMUSSEN, J. G. (2013). Bayesian inference for hawkes processes. Methodology and Computing\n\nin Applied Probability, 15 623\u2013642.\n\nRAVI, S. and BEATSON, A. (2018). Amortized bayesian meta-learning.\nRAVI, S. and LAROCHELLE, H. (2016). Optimization as a model for few-shot learning.\nREYNAUD-BOURET, P., SCHBATH, S. ET AL. (2010). Adaptive estimation for hawkes processes;\n\napplication to genome analysis. The Annals of Statistics, 38 2781\u20132822.\n\nROSS, S. M., KELLY, J. J., SULLIVAN, R. J., PERRY, W. J., MERCER, D., DAVIS, R. M.,\nWASHBURN, T. D., SAGER, E. V., BOYCE, J. B. and BRISTOW, V. L. (1996). Stochastic\nprocesses, vol. 2. Wiley New York.\n\nSANTORO, A., BARTUNOV, S., BOTVINICK, M., WIERSTRA, D. and LILLICRAP, T. (2016).\nMeta-learning with memory-augmented neural networks. In International conference on machine\nlearning.\n\nSNELL, J., SWERSKY, K. and ZEMEL, R. (2017). 
Prototypical networks for few-shot learning. In\n\nAdvances in Neural Information Processing Systems.\n\nSUNG, F., YANG, Y., ZHANG, L., XIANG, T., TORR, P. H. and HOSPEDALES, T. M. (2018).\nIn Proceedings of the IEEE\n\nLearning to compare: Relation network for few-shot learning.\nConference on Computer Vision and Pattern Recognition.\n\nTRAN, L., FARAJTABAR, M., SONG, L. and ZHA, H. (2015). Netcodec: Community detection from\nindividual activities. In Proceedings of the 2015 SIAM International Conference on Data Mining.\nSIAM.\n\nTRIVEDI, R., FARAJTABAR, M., BISWAL, P. and ZHA, H. (2018). Dyrep: Learning representations\n\nover dynamic graphs.\n\nVINYALS, O., BLUNDELL, C., LILLICRAP, T., WIERSTRA, D. ET AL. (2016). Matching networks\n\nfor one shot learning. In Advances in neural information processing systems.\n\nXIE, J., KELLEY, S. and SZYMANSKI, B. K. (2013). Overlapping community detection in networks:\n\nThe state-of-the-art and comparative study. Acm computing surveys (csur), 45 43.\n\n11\n\n\fXU, H., LUO, D., CHEN, X. and CARIN, L. (2017a). Bene\ufb01ts from superposed hawkes processes.\n\narXiv preprint arXiv:1710.05115.\n\nXU, H., LUO, D. and ZHA, H. (2017b). Learning hawkes processes from short doubly-censored event\nsequences. In Proceedings of the 34th International Conference on Machine Learning-Volume 70.\nJMLR. org.\n\nXU, H. and ZHA, H. (2017). A dirichlet mixture model of hawkes processes for event sequence\n\nclustering. In Advances in Neural Information Processing Systems.\n\nYANG, S.-H. and ZHA, H. (2013). Mixture of mutually exciting processes for viral diffusion. In\n\nInternational Conference on Machine Learning.\n\nZAREZADE, A., KHODADADI, A., FARAJTABAR, M., RABIEE, H. R. and ZHA, H. (2017). Corre-\nlated cascades: Compete or cooperate. In Thirty-First AAAI Conference on Arti\ufb01cial Intelligence.\nZHANG, Y. and YANG, Q. (2017). A survey on multi-task learning. arXiv preprint arXiv:1707.08114.\nZHAO, Q., ERDOGDU, M. A., HE, H. 
Y., RAJARAMAN, A. and LESKOVEC, J. (2015). Seismic: A\nself-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM\nSIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.\n\nZHOU, K., ZHA, H. and SONG, L. (2013). Learning social infectivity in sparse low-rank networks\n\nusing multi-dimensional hawkes processes. In Arti\ufb01cial Intelligence and Statistics.\n\n12\n\n\f", "award": [], "sourceid": 5251, "authors": [{"given_name": "Yujia", "family_name": "Xie", "institution": "Georgia Institute of Technology"}, {"given_name": "Haoming", "family_name": "Jiang", "institution": "Georgia Institute of Technology"}, {"given_name": "Feng", "family_name": "Liu", "institution": "Florida Atlantic University"}, {"given_name": "Tuo", "family_name": "Zhao", "institution": "Georgia Tech"}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": "Georgia Tech"}]}