{"title": "A Dirichlet Mixture Model of Hawkes Processes for Event Sequence Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 1354, "page_last": 1363, "abstract": "How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point processes --- Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet process as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can learn structural triggering patterns hidden in asynchronous event sequences robustly and achieve superior performance on clustering purity and consistency compared to existing methods.", "full_text": "A Dirichlet Mixture Model of Hawkes Processes for\n\nEvent Sequence Clustering\n\nHongteng Xu\u2217\nSchool of ECE\n\nGeorgia Institute of Technology\nhongtengxu313@gmail.com\n\nHongyuan Zha\n\nCollege of Computing\n\nGeorgia Institute of Technology\n\nzha@cc.gatech.edu\n\nAbstract\n\nHow to cluster event sequences generated via different point processes is an inter-\nesting and important problem in statistical machine learning. 
To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point processes --- Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet distribution as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can learn structural triggering patterns hidden in asynchronous event sequences robustly and achieve superior performance on clustering purity and consistency compared to existing methods.\n\n1 Introduction\n\nIn many practical situations, we need to deal with a huge amount of irregular and asynchronous sequential data. Typical examples include the viewing records of users in an IPTV system and the electronic health records of patients in hospitals, among many others. All of these data are so-called event sequences, each of which contains a series of events of different types in the continuous time domain, e.g., when and which TV program a user watched, or when and to which care unit a patient was transferred. Given a set of event sequences, an important task is learning their clustering structure robustly. Event sequence clustering is meaningful for many practical applications.
Take the previous two examples: clustering IPTV users according to their viewing records is beneficial to the program recommendation system and the ad-serving system; clustering patients according to their health records helps hospitals to optimize their medication resources.\n\nEvent sequence clustering is very challenging. Existing work mainly focuses on clustering synchronous (or aggregated) time series with discrete time-lagged observations [19, 23, 39]. Event sequences, on the contrary, lie in the continuous time domain, so it is difficult to find a universal and tractable representation for them. A potential solution is constructing features of event sequences via parametric [22] or nonparametric [18] methods. However, these feature-based methods have a high risk of overfitting because of their large number of parameters. What is worse, these methods decompose the clustering problem into two phases: extracting features and learning clusters. As a result, their clustering results are very sensitive to the quality of the learned (or predefined) features.\n\n\u2217Corresponding author.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nTo make concrete progress, we propose a Dirichlet Mixture model of Hawkes Processes (DMHP for short) and study its performance on event sequence clustering in depth. In this model, the event sequences belonging to different clusters are modeled via different Hawkes processes. The priors of the Hawkes processes' parameters are designed based on their physically meaningful constraints. The prior of the clusters is generated via a Dirichlet distribution.
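The generative process just described --- mixture weights drawn from a Dirichlet distribution, a latent cluster label per sequence, and events drawn from that cluster's Hawkes process --- can be sketched in a few lines. This is a minimal illustration rather than the paper's model: it assumes 1-D Hawkes processes with exponential impact functions, made-up cluster parameters, and Ogata's thinning algorithm for simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hawkes_1d(mu, alpha, beta, T):
    """Simulate a 1-D Hawkes process on [0, T] with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta (t - t_i))
    via Ogata's thinning algorithm (alpha < 1 keeps it stable)."""
    events, t = [], 0.0
    while True:
        # With an exponentially decaying kernel, the intensity is
        # non-increasing between events, so lambda(t) is a valid bound.
        lam_bar = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            return np.array(events)
        lam_t = mu + sum(alpha * beta * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:   # accept candidate with prob lam_t / lam_bar
            events.append(t)

# Hypothetical cluster-specific parameters (mu, alpha, beta) for K = 2 clusters.
params = [(0.5, 0.3, 1.0), (2.0, 0.1, 2.0)]
pi = rng.dirichlet(alpha=np.ones(len(params)))   # cluster prior ~ Dirichlet
sequences, labels = [], []
for _ in range(10):
    k = int(rng.choice(len(params), p=pi))       # latent cluster label
    labels.append(k)
    sequences.append(simulate_hawkes_1d(*params[k], T=20.0))
```

Inference would then invert this process, i.e., recover the labels and the per-cluster parameters from the sequences alone.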
We propose a variational Bayesian inference algorithm to learn the DMHP model in a nested Expectation-Maximization (EM) framework. In particular, we introduce a novel inner iteration allocation strategy into the algorithm with the help of open-loop control theory, which improves the convergence of the algorithm. We prove the local identifiability of our model and show that our learning algorithm has better sample complexity and computational complexity than its competitors.\n\nThe contributions of our work include: 1) We propose a novel Dirichlet mixture model of Hawkes processes and demonstrate its local identifiability. To our knowledge, this is the first systematic study of the identifiability problem in the task of event sequence clustering. 2) We apply an adaptive inner iteration allocation strategy based on open-loop control theory to our learning algorithm and show its superiority over other strategies. The proposed strategy achieves a trade-off between convergence performance and computational complexity. 3) We propose a DMHP-based clustering method. It requires few parameters, is robust to overfitting and model misspecification, and achieves encouraging clustering results.\n\n2 Related Work\n\nA temporal point process [4] is a random process whose realization consists of an event sequence {(t_i, c_i)}_{i=1}^M with time stamps t_i \in [0, T] and event types c_i \in C = {1, ..., C}. It can be equivalently represented as C counting processes {N_c(t)}_{c=1}^C, where N_c(t) is the number of type-c events occurring at or before time t. A way to characterize point processes is via the intensity function \lambda_c(t) = E[dN_c(t) | H^C_t]/dt, where H^C_t = {(t_i, c_i) | t_i < t, c_i \in C} collects historical events of all types before time t.
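The equivalence between an event sequence and its counting processes is easy to make concrete; the sequence below is a made-up example with C = 2 event types.

```python
def counting_process(events, c, t):
    """N_c(t): the number of type-c events occurring at or before time t,
    for an event sequence given as [(t_i, c_i), ...]."""
    return sum(1 for ti, ci in events if ci == c and ti <= t)

# A made-up event sequence with two event types.
seq = [(0.5, 1), (1.2, 2), (2.0, 1), (3.7, 1)]
assert counting_process(seq, c=1, t=2.0) == 2   # type-1 events at 0.5 and 2.0
assert counting_process(seq, c=2, t=2.0) == 1   # type-2 event at 1.2
```

Each N_c is a right-continuous step function that jumps by one at every type-c event, which is exactly what the intensity \lambda_c(t) describes the instantaneous rate of.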
It is the expected instantaneous rate of type-c events given the history, and it captures phenomena of interest such as self-triggering [13] or self-correcting [44] behavior.\n\nHawkes Processes. A Hawkes process [13] is a kind of point process for modeling complicated event sequences in which historical events have influence on current and future ones. It can also be viewed as a cascade of non-homogeneous Poisson processes [8, 34]. We focus on the clustering problem of event sequences obeying Hawkes processes because Hawkes processes have been proven useful for describing real-world data in many applications, e.g., financial analysis [1], social network analysis [3, 51], system analysis [22], and e-health [30, 42]. Hawkes processes have a particular form of intensity:\n\n\lambda_c(t) = \mu_c + \sum_{c'=1}^{C} \int_0^t \phi_{cc'}(s) dN_{c'}(t - s),   (1)\n\nwhere \mu_c is the exogenous base intensity independent of the history, while \sum_{c'=1}^{C} \int_0^t \phi_{cc'}(s) dN_{c'}(t - s) is the endogenous intensity capturing the peer influence. The decay in the influence of historical type-c' events on subsequent type-c events is captured via the so-called impact function \phi_{cc'}(t), which is nonnegative. A lot of existing work uses predefined impact functions with known parameters, e.g., the exponential functions in [29, 50] and the power-law functions in [49]. To enhance flexibility, a nonparametric model of 1-D Hawkes processes was first proposed in [16] based on an ordinary differential equation (ODE) and extended to the multi-dimensional case in [22, 51]. Another nonparametric model is the contrast function-based model in [30], which leads to a Least-Squares (LS) problem [7]. A Bayesian nonparametric model combining Hawkes processes with the infinite relational model is proposed in [3].
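With exponential impact functions of the kind used in [29, 50] (e.g., \phi_{cc'}(s) = a_{cc'} e^{-bs}), the integral in Eq. (1) reduces to a finite sum over the historical events, so the intensity can be evaluated directly. A sketch with illustrative parameters:

```python
import numpy as np

def hawkes_intensity(t, history, mu, A, b):
    """lambda_c(t) = mu_c + sum_{t_i < t} A[c, c_i] * exp(-b (t - t_i)):
    Eq. (1) specialized to exponential impact functions
    phi_{cc'}(s) = A[c, c'] * exp(-b s)."""
    lam = mu.copy()
    for ti, ci in history:
        if ti < t:
            lam += A[:, ci] * np.exp(-b * (t - ti))
    return lam

mu = np.array([0.2, 0.1])                 # exogenous base intensities mu_c
A = np.array([[0.5, 0.1],                 # A[c, c'] scales the influence of
              [0.3, 0.4]])                # type-c' events on type-c events
history = [(1.0, 0), (2.5, 1)]            # made-up past events (t_i, c_i)
lam = hawkes_intensity(3.0, history, mu, A, b=1.0)
```

Each past event lifts the intensity of every type and the lift decays exponentially with the elapsed time, which is the self-/mutual-triggering behavior the model is built to express.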
Recently, the basis representation of impact functions was used in [6, 15, 41] to avoid discretization.\n\nSequential Data Clustering and Mixture Models. Traditional methods mainly focus on clustering synchronous (or aggregated) time series with discrete time-lagged variables [19, 23, 39]. These methods rely on probabilistic mixture models [46], extracting features from sequential data and then learning clusters via a Gaussian mixture model (GMM) [25, 28]. Recently, a mixture model of Markov chains was proposed in [21], which learns potential clusters from aggregate data. For asynchronous event sequences, most of the existing clustering methods can be categorized as feature-based methods, clustering event sequences from learned or predefined features. Typical examples\n\n\lambda^k_c(t) = \mu^k_c + \sum_{t_i < t} \sum_{d=1}^{D} a^k_{c c_i d} g_d(t - t_i), where \mu^k = [\mu^k_c] \in R^C and the impact function \phi^k_{cc'}(t) is represented via basis functions as \sum_d a^k_{cc'd} g_d(t)
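The basis representation of impact functions can be sketched as follows; the Gaussian bases g_d and the coefficients a_{cc'd} here are illustrative assumptions, not values learned by any of the cited methods.

```python
import numpy as np

def gaussian_bases(t, centers, width):
    """g_d(t) = exp(-(t - center_d)^2 / (2 width^2)): one value per basis."""
    return np.exp(-(t - centers) ** 2 / (2 * width ** 2))

def impact(t, a_ccp, centers, width):
    """phi_{cc'}(t) = sum_d a_{cc'd} g_d(t): an impact function expressed as a
    nonnegative combination of D fixed basis functions, so that learning
    phi reduces to learning the coefficient vector a_{cc'}."""
    return float(a_ccp @ gaussian_bases(t, centers, width))

centers = np.linspace(0.0, 4.0, 5)             # D = 5 basis centers on [0, 4]
a_ccp = np.array([0.4, 0.2, 0.1, 0.05, 0.0])   # coefficients a_{cc'd} >= 0
phi = impact(1.0, a_ccp, centers, width=0.5)
```

Because the bases are smooth and fixed, this avoids discretizing the time axis: only the finite coefficient tensor (a_{cc'd}) has to be estimated.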