{"title": "Time-series Generative Adversarial Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 5508, "page_last": 5518, "abstract": "A good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between variables across time. Existing methods that bring generative adversarial networks (GANs) into the sequential setting do not adequately attend to the temporal correlations unique to time-series data. At the same time, supervised models for sequence prediction - which allow finer control over network dynamics - are inherently deterministic. We propose a novel framework for generating realistic time-series data that combines the flexibility of the unsupervised paradigm with the control afforded by supervised training. Through a learned embedding space jointly optimized with both supervised and adversarial objectives, we encourage the network to adhere to the dynamics of the training data during sampling. Empirically, we evaluate the ability of our method to generate realistic samples using a variety of real and synthetic time-series datasets. Qualitatively and quantitatively, we find that the proposed framework consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.", "full_text": "Time-series Generative Adversarial Networks\n\nJinsung Yoon\u2217\n\nUniversity of California, Los Angeles, USA\n\njsyoon0823@g.ucla.edu\n\nDaniel Jarrett\u2217\n\nUniversity of Cambridge, UK\n\ndaniel.jarrett@maths.cam.ac.uk\n\nMihaela van der Schaar\n\nUniversity of Cambridge, UK\n\nmv472@cam.ac.uk, mihaela@ee.ucla.edu\n\nUniversity of California, Los Angeles, USA\n\nAlan Turing Institute, UK\n\nAbstract\n\nA good generative model for time-series data should preserve temporal dynamics,\nin the sense that new sequences respect the original relationships between variables\nacross time. 
Existing methods that bring generative adversarial networks (GANs)\ninto the sequential setting do not adequately attend to the temporal correlations\nunique to time-series data. At the same time, supervised models for sequence\nprediction\u2014which allow \ufb01ner control over network dynamics\u2014are inherently\ndeterministic. We propose a novel framework for generating realistic time-series\ndata that combines the \ufb02exibility of the unsupervised paradigm with the control\nafforded by supervised training. Through a learned embedding space jointly\noptimized with both supervised and adversarial objectives, we encourage the\nnetwork to adhere to the dynamics of the training data during sampling. Empirically,\nwe evaluate the ability of our method to generate realistic samples using a variety of\nreal and synthetic time-series datasets. Qualitatively and quantitatively, we \ufb01nd that\nthe proposed framework consistently and signi\ufb01cantly outperforms state-of-the-art\nbenchmarks with respect to measures of similarity and predictive ability.\n\n1\n\nIntroduction\n\nWhat is a good generative model for time-series data? The temporal setting poses a unique challenge\nto generative modeling. A model is not only tasked with capturing the distributions of features\nwithin each time point, it should also capture the potentially complex dynamics of those variables\nacross time. Speci\ufb01cally, in modeling multivariate sequential data x1:T = (x1, ..., xT ), we wish to\naccurately capture the conditional distribution p(xt|x1:t\u22121) of temporal transitions as well.\nOn the one hand, a great deal of work has focused on improving the temporal dynamics of au-\ntoregressive models for sequence prediction. These primarily tackle the problem of compounding\nerrors during multi-step sampling, introducing various training-time modi\ufb01cations to more accurately\nre\ufb02ect testing-time conditions [1, 2, 3]. 
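As a toy illustration of such compounding errors (ours, not from the paper; the AR(1) coefficients are arbitrary choices for the demo): a slightly mis-estimated one-step model stays accurate under closed-loop (teacher-forced) evaluation, yet drifts badly under open-loop multi-step sampling.

```python
# Toy illustration (not from the paper): with a slightly mis-estimated
# AR(1) coefficient, closed-loop (teacher-forced) one-step errors stay
# bounded, while open-loop multi-step sampling compounds the error.
true_phi, est_phi = 0.90, 0.85   # hypothetical true vs learned coefficient
T = 20

x = [1.0]
for _ in range(T):               # noise-free ground-truth trajectory
    x.append(true_phi * x[-1])

# closed loop: each prediction is conditioned on the ground truth x_t
closed_rel = [abs(est_phi * x[t] - x[t + 1]) / x[t + 1] for t in range(T)]

# open loop: each prediction is conditioned on the previous *prediction*
pred, open_rel = x[0], []
for t in range(T):
    pred = est_phi * pred
    open_rel.append(abs(pred - x[t + 1]) / x[t + 1])
```

Here the closed-loop relative error is constant (about 5.6%), while the open-loop relative error grows with the horizon, which is exactly the training/inference discrepancy the cited remedies target.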
Autoregressive models explicitly factor the distribution of sequences into a product of conditionals ∏_t p(x_t | x_{1:t−1}). However, while useful in the context of forecasting, this approach is fundamentally deterministic, and is not truly generative in the sense of allowing new sequences to be randomly sampled without external conditioning. On the other hand, a separate line of work has focused on directly applying the generative adversarial network (GAN) framework to sequential data, primarily by instantiating recurrent networks for the roles of generator and discriminator [4, 5, 6]. While straightforward, the adversarial objective seeks to model p(x_{1:T}) directly, without leveraging the autoregressive prior. Importantly, simply summing the standard GAN loss over sequences of vectors may not be sufficient to ensure that the dynamics of the network efficiently capture the stepwise dependencies present in the training data.

* indicates equal contribution

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

In this paper, we propose a novel mechanism to tie together both threads of research, giving rise to a generative model explicitly trained to preserve temporal dynamics. We present Time-series Generative Adversarial Networks (TimeGAN), a natural framework for generating realistic time-series data in various domains. First, in addition to the unsupervised adversarial loss on both real and synthetic sequences, we introduce a stepwise supervised loss using the original data as supervision, thereby explicitly encouraging the model to capture the stepwise conditional distributions in the data.
This takes advantage of the fact that there is more information in the training data than simply whether each datum is real or synthetic; we can expressly learn the transition dynamics of real sequences. Second, we introduce an embedding network to provide a reversible mapping between features and latent representations, thereby reducing the high dimensionality of the adversarial learning space. This capitalizes on the fact that the temporal dynamics of even complex systems are often driven by fewer, lower-dimensional factors of variation. Importantly, the supervised loss is minimized by jointly training both the embedding and generator networks, such that the latent space not only serves to promote parameter efficiency; it is specifically conditioned to facilitate the generator in learning temporal relationships. Finally, we generalize our framework to handle the mixed-data setting, where both static and time-series data can be generated at the same time.
Our approach is the first to combine the flexibility of the unsupervised GAN framework with the control afforded by supervised training in autoregressive models. We demonstrate the advantages in a series of experiments on multiple real-world and synthetic datasets. Qualitatively, we conduct t-SNE [7] and PCA [8] analyses to visualize how well the generated distributions resemble the original distributions. Quantitatively, we examine how well a post-hoc classifier can distinguish between real and generated sequences. Furthermore, by applying the "train on synthetic, test on real (TSTR)" framework [5, 9] to the sequence prediction task, we evaluate how well the generated data preserves the predictive characteristics of the original.
We \ufb01nd that TimeGAN achieves consistent\nand signi\ufb01cant improvements over state-of-the-art benchmarks in generating realistic time-series.\n\n2 Related Work\n\nTimeGAN is a generative time-series model, trained adversarially and jointly via a learned embedding\nspace with both supervised and unsupervised losses. As such, our approach straddles the intersection\nof multiple strands of research, combining themes from autoregressive models for sequence prediction,\nGAN-based methods for sequence generation, and time-series representation learning.\nAutoregressive recurrent networks trained via the maximum likelihood principle [10] are prone to\npotentially large prediction errors when performing multi-step sampling, due to the discrepancy\nbetween closed-loop training (i.e. conditioned on ground truths) and open-loop inference (i.e.\nconditioned on previous guesses). Based on curriculum learning [11], Scheduled Sampling was\n\ufb01rst proposed as a remedy, whereby models are trained to generate output conditioned on a mix of\nboth previous guesses and ground-truth data [1]. Inspired by adversarial domain adaptation [12],\nProfessor Forcing involved training an auxiliary discriminator to distinguish between free-running\nand teacher-forced hidden states, thus encouraging the network\u2019s training and sampling dynamics to\nconverge [2]. Actor-critic methods [13] have also been proposed, introducing a critic conditioned\non target outputs, trained to estimate next-token value functions that guide the actor\u2019s free-running\npredictions [3]. However, while the motivation for these methods is similar to ours in accounting for\nstepwise transition dynamics, they are inherently deterministic, and do not accommodate explicitly\nsampling from a learned distribution\u2014central to our goal of synthetic data generation.\nOn the other hand, multiple studies have straightforwardly inherited the GAN framework within the\ntemporal setting. 
The first of these (C-RNN-GAN) [4] directly applied the GAN architecture to sequential data, using LSTM networks for the generator and discriminator. Data are generated recurrently, taking as inputs a noise vector and the data generated at the previous time step. Recurrent Conditional GAN (RCGAN) [5] took a similar approach, introducing minor architectural differences such as dropping the dependence on the previous output while conditioning on additional input [14]. A multitude of applied studies have since utilized these frameworks to generate synthetic sequences in such diverse domains as text [15], finance [16], biosignals [17], sensor [18] and smart grid data [19], as well as renewable scenarios [20]. Recent work [6] has proposed conditioning on time-stamp information to handle irregular sampling. However, unlike our proposed technique, these approaches rely only on binary adversarial feedback for learning, which by itself may not be sufficient to guarantee that the network efficiently captures the temporal dynamics in the training data.
Finally, representation learning in the time-series setting primarily deals with learning compact encodings for the benefit of downstream tasks such as prediction [21], forecasting [22], and classification [23]. Other works have studied the utility of learning latent representations for purposes of pre-training [24], disentanglement [25], and interpretability [26]. Meanwhile, in the static setting, several works have explored the benefit of combining autoencoders with adversarial training, with objectives such as learning similarity measures [27], enabling efficient inference [28], and improving generative capability [29], an approach that has subsequently been applied to generating discrete structures by encoding and generating entire sequences for discrimination [30].
By contrast, our proposed method generalizes to arbitrary time-series data, incorporates stochasticity at each time step, and employs an embedding network to identify a lower-dimensional space in which the generative model learns the stepwise distributions and latent dynamics of the data.
Figure 1(a) provides a high-level block diagram of TimeGAN, and Figure 2 gives an illustrative implementation, with C-RNN-GAN and RCGAN similarly detailed. For purposes of expository and experimental comparison with existing methods, we employ a standard RNN parameterization. A table of related works with additional detail can be found in the Supplementary Materials.

3 Problem Formulation

Consider the general data setting where each instance consists of two elements: static features (that do not change over time, e.g. gender) and temporal features (that occur over time, e.g. vital signs). Let S be a vector space of static features, X one of temporal features, and let S ∈ S, X ∈ X be random vectors that can be instantiated with specific values denoted s and x. We consider tuples of the form (S, X_{1:T}) with some joint distribution p. The length T of each sequence is also a random variable, the distribution of which we absorb into p for notational convenience. In the training data, let individual samples be indexed by n ∈ {1, ..., N}, so we can denote the training dataset D = {(s_n, x_{n,1:T_n})}_{n=1}^N.
Our goal is to use the training data D to learn a density p̂(S, X_{1:T}) that best approximates p(S, X_{1:T}). This is a high-level objective and, depending on the lengths, dimensionality, and distribution of the data, may be difficult to optimize in the standard GAN framework.
Therefore, we additionally make use of the autoregressive decomposition of the joint, p(S, X_{1:T}) = p(S) ∏_t p(X_t | S, X_{1:t−1}), to focus specifically on the conditionals, yielding the complementary and simpler objective of learning a density p̂(X_t | S, X_{1:t−1}) that best approximates p(X_t | S, X_{1:t−1}) at any time t. (Going forward, subscripts n are omitted unless explicitly required.)
Two Objectives. Importantly, this breaks down the sequence-level objective (matching the joint distribution) into a series of stepwise objectives (matching the conditionals). The first is global,

min_p̂ D( p(S, X_{1:T}) ‖ p̂(S, X_{1:T}) )    (1)

where D is some appropriate measure of distance between distributions. The second is local,

min_p̂ D( p(X_t | S, X_{1:t−1}) ‖ p̂(X_t | S, X_{1:t−1}) )    (2)

for any t. Under an ideal discriminator in the GAN framework, the former takes the form of the Jensen-Shannon divergence. Using the original data for supervision via maximum-likelihood (ML) training, the latter takes the form of the Kullback-Leibler divergence. Note that minimizing the former relies on the presence of a perfect adversary (which we may not have access to), while minimizing the latter only depends on the presence of ground-truth sequences (which we do have access to). Our target, then, will be a combination of the GAN objective (proportional to Expression 1) and the ML objective (proportional to Expression 2). As we shall see, this naturally yields a training procedure that involves the simple addition of a supervised loss to guide adversarial learning.

4 Proposed Model: Time-series GAN (TimeGAN)

TimeGAN consists of four network components: an embedding function, recovery function, sequence generator, and sequence discriminator.
The key insight is that the autoencoding components (first two) are trained jointly with the adversarial components (latter two), such that TimeGAN simultaneously learns to encode features, generate representations, and iterate across time. The embedding network provides the latent space, the adversarial network operates within this space, and the latent dynamics of both real and synthetic data are synchronized through a supervised loss. We describe each in turn.

4.1 Embedding and Recovery Functions

The embedding and recovery functions provide mappings between feature and latent space, allowing the adversarial network to learn the underlying temporal dynamics of the data via lower-dimensional representations. Let H_S, H_X denote the latent vector spaces corresponding to feature spaces S, X. Then the embedding function e : S × ∏_t X → H_S × ∏_t H_X takes static and temporal features to their latent codes h_S, h_{1:T} = e(s, x_{1:T}). In this paper, we implement e via a recurrent network,

h_S = e_S(s),    h_t = e_X(h_S, h_{t−1}, x_t)    (3)

where e_S : S → H_S is an embedding network for static features, and e_X : H_S × H_X × X → H_X a recurrent embedding network for temporal features. In the opposite direction, the recovery function r : H_S × ∏_t H_X → S × ∏_t X takes static and temporal codes back to their feature representations s̃, x̃_{1:T} = r(h_S, h_{1:T}). Here we implement r through a feedforward network at each step,

s̃ = r_S(h_S),    x̃_t = r_X(h_t)    (4)

where r_S : H_S → S and r_X : H_X → X are recovery networks for static and temporal embeddings. Note that the embedding and recovery functions can be parameterized by any architecture of choice, with the only stipulation being that they be autoregressive and obey causal ordering (i.e. outputs at each step can only depend on preceding information). For example, it is just as possible to implement the former with temporal convolutions [31], or the latter via an attention-based decoder [32]. Here we choose the implementations in (3) and (4) as a minimal example to isolate the source of gains.

4.2 Sequence Generator and Discriminator

Instead of producing synthetic output directly in feature space, the generator first outputs into the embedding space. Let Z_S, Z_X denote vector spaces over which known distributions are defined, and from which random vectors are drawn as input for generating into H_S, H_X. Then the generating function g : Z_S × ∏_t Z_X → H_S × ∏_t H_X takes a tuple of static and temporal random vectors to synthetic latent codes ĥ_S, ĥ_{1:T} = g(z_S, z_{1:T}). We implement g through a recurrent network,

ĥ_S = g_S(z_S),    ĥ_t = g_X(ĥ_S, ĥ_{t−1}, z_t)    (5)

where g_S : Z_S → H_S is a generator network for static features, and g_X : H_S × H_X × Z_X → H_X is a recurrent generator for temporal features. Random vector z_S can be sampled from a distribution of choice, and z_t follows a stochastic process; here we use the Gaussian distribution and Wiener process

Figure 1: (a) Block diagram of component functions and objectives.
(b) Training scheme; solid lines indicate forward propagation of data, and dashed lines indicate backpropagation of gradients.

respectively. Finally, the discriminator also operates from the embedding space. The discrimination function d : H_S × ∏_t H_X → [0, 1] × ∏_t [0, 1] receives the static and temporal codes, returning classifications ỹ_S, ỹ_{1:T} = d(h̃_S, h̃_{1:T}).
The h̃_∗ notation denotes either real (h_∗) or synthetic (ĥ_∗) embeddings; similarly, the ỹ_∗ notation denotes classifications of either real (y_∗) or synthetic (ŷ_∗) data. Here we implement d via a bidirectional recurrent network with a feedforward output layer,

ỹ_S = d_S(h̃_S),    ỹ_t = d_X(→u_t, ←u_t)    (6)

where →u_t = →c_X(h̃_S, h̃_t, →u_{t−1}) and ←u_t = ←c_X(h̃_S, h̃_t, ←u_{t+1}) respectively denote the sequences of forward and backward hidden states, →c_X and ←c_X are recurrent functions, and d_S, d_X are output-layer classification functions. Similarly, there are no restrictions on architecture beyond the generator being autoregressive; here we use a standard recurrent formulation for ease of exposition.

4.3 Jointly Learning to Encode, Generate, and Iterate

First, purely as a reversible mapping between feature and latent spaces, the embedding and recovery functions should enable accurate reconstructions s̃, x̃_{1:T} of the original data s, x_{1:T} from their latent representations h_S, h_{1:T}. Therefore our first objective function is the reconstruction loss,

L_R = E_{s,x_{1:T}∼p}[ ‖s − s̃‖_2 + Σ_t ‖x_t − x̃_t‖_2 ]    (7)

In TimeGAN, the generator is exposed to two types of inputs during training. First, in pure open-loop mode, the generator (which is autoregressive) receives synthetic embeddings ĥ_S, ĥ_{1:t−1} (i.e. its own previous outputs) in order to generate the next synthetic vector ĥ_t. Gradients are then computed on the unsupervised loss.
This is as one would expect: the discriminator maximizes, and the generator minimizes, the likelihood of providing correct classifications y_S, y_{1:T} for the training data h_S, h_{1:T} as well as ŷ_S, ŷ_{1:T} for the synthetic output ĥ_S, ĥ_{1:T} from the generator,

L_U = E_{s,x_{1:T}∼p}[ log y_S + Σ_t log y_t ] + E_{s,x_{1:T}∼p̂}[ log(1 − ŷ_S) + Σ_t log(1 − ŷ_t) ]    (8)

Relying solely on the discriminator's binary adversarial feedback may not be sufficient incentive for the generator to capture the stepwise conditional distributions in the data. To achieve this more efficiently, we introduce an additional loss to further discipline learning. In an alternating fashion, we also train in closed-loop mode, where the generator receives sequences of embeddings of actual data h_{1:t−1} (i.e. computed by the embedding network) to generate the next latent vector. Gradients can now be computed on a loss that captures the discrepancy between the distributions p(H_t | H_S, H_{1:t−1}) and p̂(H_t | H_S, H_{1:t−1}). Applying maximum likelihood yields the familiar supervised loss,

L_S = E_{s,x_{1:T}∼p}[ Σ_t ‖h_t − g_X(h_S, h_{t−1}, z_t)‖_2 ]    (9)

Figure 2: (a) TimeGAN instantiated with RNNs, (b) C-RNN-GAN, and (c) RCGAN.
Solid lines denote function application, dashed lines denote recurrence, and orange lines indicate loss computation.

where g_X(h_S, h_{t−1}, z_t) approximates E_{z_t∼N}[ p̂(H_t | H_S, H_{1:t−1}, z_t) ] with one sample z_t, as is standard in stochastic gradient descent.
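To make the supervised loss concrete, here is a minimal sketch (ours, not the paper's implementation) of the one-sample estimate over a single sequence, with made-up latent codes and a toy linear map standing in for the learned generator g_X:

```python
import math
import random

random.seed(1)

def g_X(h_prev, z_t, w=0.8):
    # toy linear stand-in for the learned recurrent generator g_X;
    # the weight w and noise scale 0.1 are arbitrary demo values
    return [w * hp + 0.1 * z for hp, z in zip(h_prev, z_t)]

# latent codes h_{1:T} of one real sequence, as the embedding network
# might produce them (made-up numbers for illustration)
h = [[0.50, -0.20], [0.45, -0.10], [0.40, -0.05], [0.38, 0.00]]

# closed-loop supervised loss: condition on the *actual* previous latent
# h_{t-1}, and penalize the distance to the actual next latent h_t
L_S = 0.0
for t in range(1, len(h)):
    z_t = [random.gauss(0, 1) for _ in h[t]]      # one-sample MC estimate
    h_hat_t = g_X(h[t - 1], z_t)
    L_S += math.sqrt(sum((a - b) ** 2 for a, b in zip(h[t], h_hat_t)))
```

In real training the same quantity would be computed over minibatches inside an autodiff framework, so that its gradient reaches both the generator and the embedding parameters.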
In sum, at any step in a training sequence, we assess the difference between the actual next-step latent vector (from the embedding function) and the synthetic next-step latent vector (from the generator, conditioned on the actual historical sequence of latents). While L_U pushes the generator to create realistic sequences (evaluated by an imperfect adversary), L_S further ensures that it produces similar stepwise transitions (evaluated by ground-truth targets).
Optimization. Figure 1(b) illustrates the mechanics of our approach during training. Let θ_e, θ_r, θ_g, θ_d respectively denote the parameters of the embedding, recovery, generator, and discriminator networks. The first two components are trained on both the reconstruction and supervised losses,

min_{θ_e, θ_r} (λ L_S + L_R)    (10)

where λ ≥ 0 is a hyperparameter that balances the two losses. Importantly, L_S is included such that the embedding process not only serves to reduce the dimensions of the adversarial learning space; it is actively conditioned to facilitate the generator in learning temporal relationships from the data. Next, the generator and discriminator networks are trained adversarially as follows,

min_{θ_g} (η L_S + max_{θ_d} L_U)    (11)

where η ≥ 0 is another hyperparameter that balances the two losses. That is, in addition to the unsupervised minimax game played over classification accuracy, the generator additionally minimizes the supervised loss. By combining the objectives in this manner, TimeGAN is simultaneously trained to encode (feature vectors), generate (latent representations), and iterate (across time).
In practice, we find that TimeGAN is not sensitive to λ and η; for all experiments in Section 5, we set λ = 1 and η = 10. Note that while GANs in general are not known for their ease of training, we do not discover any additional complications in TimeGAN.
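Schematically, the alternating optimization in (10) and (11) can be sketched as follows (a placeholder loop of ours; the step_* functions stand in for one gradient update on the named objective, which in practice an autodiff framework would perform):

```python
# lam = 1 and eta = 10 are the settings used for all experiments here
lam, eta = 1.0, 10.0
log = []  # records which objective each component is updated on

def step_embedding_recovery():      # Eq. (10): min over (theta_e, theta_r)
    log.append("e,r: min lam*L_S + L_R")

def step_generator():               # Eq. (11): min over theta_g
    log.append("g: min eta*L_S + L_U")

def step_discriminator():           # Eq. (11): max over theta_d
    log.append("d: max L_U")

for _ in range(3):                  # alternate the three updates
    step_embedding_recovery()
    step_generator()
    step_discriminator()
```

The point of the sketch is the alternation itself: the embedding and recovery networks see L_S as well as L_R, so the latent space is shaped for the generator rather than for reconstruction alone.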
The embedding task serves\nto regularize adversarial learning\u2014which now occurs in a lower-dimensional latent space. Similarly,\nthe supervised loss has a constraining effect on the stepwise dynamics of the generator. For both\nreasons, we do not expect TimeGAN to be more challenging to train, and standard techniques for\nimproving GAN training are still applicable. Algorithm pseudocode and illustrations with additional\ndetail can be found in the Supplementary Materials.\n\n5 Experiments\n\nBenchmarks and Evaluation. We compare TimeGAN with RCGAN [5] and C-RNN-GAN [4],\nthe two most closely related methods. For purely autoregressive approaches, we compare against\nRNNs trained with teacher-forcing (T-Forcing) [33, 34] as well as professor-forcing (P-Forcing)\n[2]. For additional comparison, we consider the performance of WaveNet [31] as well as its GAN\ncounterpart WaveGAN [35]. To assess the quality of generated data, we observe three desiderata:\n(1) diversity\u2014samples should be distributed to cover the real data; (2) \ufb01delity\u2014samples should be\nindistinguishable from the real data; and (3) usefulness\u2014samples should be just as useful as the real\ndata when used for the same predictive purposes (i.e. train-on-synthetic, test-on-real).\n(1) Visualization. We apply t-SNE [7] and PCA [8] analyses on both the original and synthetic\ndatasets (\ufb02attening the temporal dimension). This visualizes how closely the distribution of generated\nsamples resembles that of the original in 2-dimensional space, giving a qualitative assessment of (1).\n(2) Discriminative Score. For a quantitative measure of similarity, we train a post-hoc time-series\nclassi\ufb01cation model (by optimizing a 2-layer LSTM) to distinguish between sequences from the\noriginal and generated datasets. First, each original sequence is labeled real, and each generated\nsequence is labeled not real. 
Then, an off-the-shelf (RNN) classifier is trained to distinguish between the two classes as a standard supervised task. We then report the classification error on the held-out test set, which gives a quantitative assessment of (2).
(3) Predictive Score. In order to be useful, the sampled data should inherit the predictive characteristics of the original. In particular, we expect TimeGAN to excel in capturing conditional distributions over time. Therefore, using the synthetic dataset, we train a post-hoc sequence-prediction model (by optimizing a 2-layer LSTM) to predict next-step temporal vectors over each input sequence. Then, we evaluate the trained model on the original dataset. Performance is measured in terms of the mean absolute error (MAE); for event-based data, the MAE is computed as |1 − estimated probability that the event occurred|. This gives a quantitative assessment of (3).
The Supplementary Materials contain additional information on benchmarks and hyperparameters, as well as further details of visualizations and hyperparameters for the post-hoc evaluation models. Implementation of TimeGAN can be found at https://bitbucket.org/mvdschaar/mlforhealthlabpub/src/master/alg/timegan/.

5.1 Illustrative Example: Autoregressive Gaussian Models

Our primary novelties are twofold: a supervised loss to better capture temporal dynamics, and an embedding network that provides a lower-dimensional adversarial learning space. To highlight these advantages, we experiment on sequences from autoregressive multivariate Gaussian models as follows: x_t = φ x_{t−1} + n, where n ∼ N(0, σ1 + (1 − σ)I), with 1 denoting the all-ones matrix.
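As a reference, this data-generating process can be sampled with a short script (ours, not from the paper's code). For σ ∈ [0, 1], noise with covariance σ1 + (1 − σ)I admits a standard one-factor construction with a shared latent draw; negative σ would require a different sampler.

```python
import math
import random

random.seed(0)

def ar_gaussian(T, dim, phi, sigma):
    """Sample x_{1:T} with x_t = phi*x_{t-1} + n, n ~ N(0, sigma*1 + (1-sigma)*I).

    For sigma in [0, 1]: n_i = sqrt(sigma)*z0 + sqrt(1-sigma)*z_i with a
    shared z0 gives unit variances and pairwise correlation sigma.
    """
    x = [0.0] * dim
    seq = []
    for _ in range(T):
        z0 = random.gauss(0, 1)   # shared factor -> cross-feature correlation
        n = [math.sqrt(sigma) * z0 + math.sqrt(1 - sigma) * random.gauss(0, 1)
             for _ in range(dim)]
        x = [phi * xi + ni for xi, ni in zip(x, n)]
        seq.append(x)
    return seq

data = ar_gaussian(T=24, dim=5, phi=0.8, sigma=0.8)
```

The sequence length, dimension, and (φ, σ) values above are illustrative; the experiments vary φ and σ as described next.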
The coefficient φ ∈ [0, 1] allows us to control the correlation across time steps, and σ ∈ [−1, 1] controls the correlation across features.
As shown in Table 1, TimeGAN consistently generates higher-quality synthetic data than benchmarks, in terms of both discriminative and predictive scores. This is true across the various settings for the underlying data-generating model. Importantly, observe that the advantage of TimeGAN is greater for higher settings of temporal correlation φ, lending credence to the motivation and benefit of the supervised loss mechanism. Likewise, observe that the advantage of TimeGAN is also greater for higher settings of feature correlation σ, providing confirmation for the benefit of the embedding network.

Table 1: Results on Autoregressive Multivariate Gaussian Data (bold indicates best performance; the φ = 0.8 and σ = 0.8 columns correspond to the same setting).

Discriminative Score (lower the better)

Settings   | Temporal Correlations (fixing σ = 0.8) | Feature Correlations (fixing φ = 0.8)
           | φ = 0.2    φ = 0.5    φ = 0.8          | σ = 0.2    σ = 0.5    σ = 0.8
TimeGAN    | .175±.006  .174±.012  .105±.005        | .181±.006  .152±.011  .105±.005
RCGAN      | .177±.012  .190±.011  .133±.019        | .186±.012  .190±.012  .133±.019
C-RNN-GAN  | .391±.006  .227±.017  .220±.016        | .198±.011  .202±.010  .220±.016
T-Forcing  | .500±.000  .500±.000  .499±.001        | .499±.001  .499±.001  .499±.001
P-Forcing  | .498±.002  .472±.008  .396±.018        | .460±.003  .408±.016  .396±.018
WaveNet    | .337±.005  .235±.009  .229±.013        | .217±.010  .226±.011  .229±.013
WaveGAN    | .336±.011  .213±.013  .230±.023        | .192±.012  .205±.015  .230±.023

Predictive Score (lower the better)

Settings   | φ = 0.2    φ = 0.5    φ = 0.8          | σ = 0.2    σ = 0.5    σ = 0.8
TimeGAN    | .640±.003  .412±.002  .251±.002        | .282±.005  .261±.002  .251±.002
RCGAN      | .652±.003  .435±.002  .263±.003        | .292±.003  .279±.002  .263±.003
C-RNN-GAN  | .696±.002  .490±.005  .299±.002        | .293±.005  .280±.006  .299±.002
T-Forcing  | .737±.022  .732±.012  .503±.037        | .515±.034  .543±.023  .503±.037
P-Forcing  | .665±.004  .571±.005  .289±.003        | .406±.005  .317±.001  .289±.003
WaveNet    | .718±.002  .508±.003  .321±.005        | .331±.004  .297±.003  .321±.005
WaveGAN    | .712±.003  .489±.001  .290±.002        | .325±.003  .353±.001  .290±.002

5.2 Experiments on Different Types of Time Series Data

We test the performance of TimeGAN across time-series data with a variety of different characteristics, including periodicity, discreteness, level of noise, regularity of time steps, and correlation across time and features. The following datasets are selected on the basis of different combinations of these properties (detailed statistics of each dataset can be found in the Supplementary Materials).
(1) Sines. We simulate multivariate sinusoidal sequences of different frequencies η and phases θ, providing continuous-valued, periodic, multivariate data where each feature is independent of the others. For each dimension i ∈ {1, ..., 5}, x_i(t) = sin(2πηt + θ), where η ∼ U[0, 1] and θ ∼ U[−π, π].
(2) Stocks. By contrast, sequences of stock prices are continuous-valued but aperiodic; furthermore, features are correlated with each other. We use the daily historical Google stocks data from 2004 to 2019, including as features the volume and high, low, opening, closing, and adjusted closing prices.
(3) Energy. Next, we consider a dataset characterized by noisy periodicity, higher dimensionality, and correlated features.
The UCI Appliances energy prediction dataset consists of multivariate, continuous-valued measurements, including numerous temporal features measured at close intervals.
(4) Events. Finally, we consider a dataset characterized by discrete values and irregular time stamps. We use a large private lung cancer pathways dataset consisting of sequences of events and their times, and model both the one-hot encoded sequence of event types and the event timings.

Figure 3: t-SNE visualization on Sines (1st row) and Stocks (2nd row), with one column per method: (a) TimeGAN, (b) RCGAN, (c) C-RNN-GAN, (d) T-Forcing, (e) P-Forcing, (f) WaveNet, (g) WaveGAN. Red denotes original data, and blue denotes synthetic. Additional and larger t-SNE and PCA visualizations can be found in the Supplementary Materials.

Visualizations with t-SNE and PCA. In Figure 3, we observe that synthetic datasets generated by TimeGAN show markedly better overlap with the original data than those of the other benchmarks under t-SNE visualization (PCA analysis can be found in the Supplementary Materials). In fact, we observe (in the first column) that the blue (generated) samples and red (original) samples are almost perfectly in sync.
Discriminative and Predictive Scores. As indicated in Table 2, TimeGAN consistently generates higher-quality synthetic data than benchmarks on the basis of both discriminative (post-hoc classification error) and predictive (mean absolute error) scores across all datasets. For instance, on Stocks, TimeGAN-generated samples achieve a discriminative score of 0.102, which is 48% lower than that of the next-best benchmark (RCGAN, at 0.196), a statistically significant improvement.
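The discriminative score reported in these tables admits a compact sketch: train a post-hoc classifier to distinguish real from synthetic sequences and report how far its held-out accuracy is from chance (0 means indistinguishable). The paper optimizes a 2-layer RNN classifier; the version below substitutes logistic regression on per-feature mean/std summaries so it stays dependency-free — that stand-in, and every name in it, is our own simplification for illustration, not the paper's implementation:

```python
import numpy as np

def discriminative_score(real, synth, n_iter=500, lr=0.1, seed=0):
    """Two-sample test: label real sequences 1 and synthetic 0, fit a
    classifier on half the data, and return |held-out accuracy - 0.5|.
    real, synth: arrays of shape (n_seq, seq_len, dim)."""
    rng = np.random.default_rng(seed)

    def featurize(x):  # (n, T, d) -> (n, 2d) summary statistics
        return np.concatenate([x.mean(axis=1), x.std(axis=1)], axis=1)

    X = np.concatenate([featurize(real), featurize(synth)])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    idx = rng.permutation(len(X))          # shuffle, then split 50/50
    X, y = X[idx], y[idx]
    split = len(X) // 2
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]

    # Plain gradient descent on the logistic loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
        g = p - ytr
        w -= lr * (Xtr.T @ g) / len(Xtr)
        b -= lr * g.mean()

    acc = (((Xte @ w + b) > 0) == (yte == 1)).mean()
    return abs(acc - 0.5)
```

A good generator drives this score toward 0: the classifier cannot beat a coin flip. A poor generator lets the classifier reach accuracy near 1, giving a score near 0.5.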
Remarkably, the predictive scores of TimeGAN are almost on par with those of the original datasets themselves.

Table 2: Results on Multiple Time-Series Datasets (Bold indicates best performance).

Discriminative Score (Lower the Better)

Method        Sines      Stocks     Energy     Events
TimeGAN       .011±.008  .102±.021  .236±.012  .161±.018
RCGAN         .022±.008  .196±.027  .336±.017  .380±.021
C-RNN-GAN     .229±.040  .399±.028  .499±.001  .462±.011
T-Forcing     .495±.001  .226±.035  .483±.004  .387±.012
P-Forcing     .430±.027  .257±.026  .412±.006  .489±.001
WaveNet       .158±.011  .232±.028  .397±.010  .385±.025
WaveGAN       .277±.013  .217±.022  .363±.012  .357±.017

Predictive Score (Lower the Better)

TimeGAN       .093±.019  .038±.001  .273±.004  .303±.006
RCGAN         .097±.001  .040±.001  .292±.005  .345±.010
C-RNN-GAN     .127±.004  .038±.000  .483±.005  .360±.010
T-Forcing     .150±.022  .038±.001  .315±.005  .310±.003
P-Forcing     .116±.004  .043±.001  .303±.006  .320±.008
WaveNet       .117±.008  .042±.001  .311±.005  .333±.004
WaveGAN       .134±.013  .041±.001  .307±.007  .324±.006
Original      .094±.001  .036±.001  .250±.003  .293±.000

5.3 Sources of Gain

TimeGAN is characterized by (1) the supervised loss, (2) the embedding network, and (3) the joint training scheme. To analyze the importance of each contribution, we report the discriminative and predictive scores with the following modifications to TimeGAN: (1) without the supervised loss, (2) without the embedding networks, and (3) without jointly training the embedding and adversarial networks on the supervised loss.
(The first corresponds to λ = η = 0, and the third to λ = 0.)

Table 3: Source-of-Gain Analysis on Multiple Datasets (via Discriminative and Predictive scores).

Discriminative Score (Lower the Better)

Method               Sines      Stocks     Energy     Events
TimeGAN              .011±.008  .102±.021  .236±.012  .161±.018
w/o Supervised Loss  .193±.013  .145±.023  .298±.010  .195±.013
w/o Embedding Net.   .197±.025  .260±.021  .286±.006  .244±.011
w/o Joint Training   .048±.011  .131±.019  .268±.012  .181±.011

Predictive Score (Lower the Better)

TimeGAN              .093±.019  .038±.001  .273±.004  .303±.006
w/o Supervised Loss  .116±.010  .054±.001  .277±.005  .380±.023
w/o Embedding Net.   .124±.002  .048±.001  .286±.002  .410±.013
w/o Joint Training   .107±.008  .045±.001  .276±.004  .348±.021

We observe in Table 3 that all three elements make important contributions to improving the quality of the generated time-series data. The supervised loss plays a particularly important role when the data is characterized by high temporal correlations, as in the Stocks dataset. In addition, we find that the embedding network and joint training with the adversarial networks (thereby aligning the targets of the two) clearly and consistently improve generative performance across the board.

6 Conclusion

In this paper we introduce TimeGAN, a novel framework for time-series generation that combines the versatility of the unsupervised GAN approach with the control over conditional temporal dynamics afforded by supervised autoregressive models.
Leveraging the contributions of the supervised loss and jointly trained embedding network, TimeGAN demonstrates consistent and significant improvements over state-of-the-art benchmarks in generating realistic time-series data. Future work may investigate incorporating the differential privacy framework into the TimeGAN approach in order to generate high-quality time-series data with differential privacy guarantees.

Acknowledgements

The authors would like to thank the reviewers for their helpful comments. This work was supported by the National Science Foundation (NSF grants 1407712, 1462245 and 1533983), and the US Office of Naval Research (ONR).

References

[1] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.

[2] Alex M Lamb, Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C Courville, and Yoshua Bengio. Professor forcing: A new algorithm for training recurrent networks. In Advances in Neural Information Processing Systems, pages 4601–4609, 2016.

[3] Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, and Yoshua Bengio. An actor-critic algorithm for sequence prediction. arXiv preprint arXiv:1607.07086, 2016.

[4] Olof Mogren. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv preprint arXiv:1611.09904, 2016.

[5] Cristóbal Esteban, Stephanie L Hyland, and Gunnar Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. arXiv preprint arXiv:1706.02633, 2017.

[6] Giorgia Ramponi, Pavlos Protopapas, Marco Brambilla, and Ryan Janssen. T-CGAN: Conditional generative adversarial network for data augmentation in noisy time series with irregular sampling.
arXiv preprint arXiv:1811.08295, 2018.

[7] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

[8] Fred B Bryant and Paul R Yarnold. Principal-components analysis and exploratory and confirmatory factor analysis. 1995.

[9] Jinsung Yoon, James Jordon, and Mihaela van der Schaar. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations, 2019.

[10] Ronald J Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.

[11] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48. ACM, 2009.

[12] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

[13] Vijay R Konda and John N Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems, pages 1008–1014, 2000.

[14] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

[15] Yizhe Zhang, Zhe Gan, and Lawrence Carin. Generating text via adversarial training. In NIPS Workshop on Adversarial Training, volume 21, 2016.

[16] Luca Simonetto. Generating spiking time series with generative adversarial networks: an application on banking transactions. 2018.

[17] Shota Haradal, Hideaki Hayashi, and Seiichi Uchida. Biosignal data augmentation based on generative adversarial networks.
In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 368–371. IEEE, 2018.

[18] Moustafa Alzantot, Supriyo Chakraborty, and Mani Srivastava. SenseGen: A deep learning architecture for synthetic sensor data generation. In 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pages 188–193. IEEE, 2017.

[19] Chi Zhang, Sanmukh R Kuppannagari, Rajgopal Kannan, and Viktor K Prasanna. Generative adversarial network for synthetic time series data generation in smart grids. In 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), pages 1–6. IEEE, 2018.

[20] Yize Chen, Yishen Wang, Daniel Kirschen, and Baosen Zhang. Model-free renewable scenario generation using generative adversarial networks. IEEE Transactions on Power Systems, 33(3):3265–3275, 2018.

[21] Andrew M Dai and Quoc V Le. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pages 3079–3087, 2015.

[22] Xinrui Lyu, Matthias Hueser, Stephanie L Hyland, George Zerveas, and Gunnar Raetsch. Improving clinical predictions through unsupervised time series representation learning. arXiv preprint arXiv:1812.00490, 2018.

[23] Nitish Srivastava, Elman Mansimov, and Ruslan Salakhudinov. Unsupervised learning of video representations using LSTMs. In International Conference on Machine Learning, pages 843–852, 2015.

[24] Otto Fabius and Joost R van Amersfoort. Variational recurrent auto-encoders. arXiv preprint arXiv:1412.6581, 2014.

[25] Yingzhen Li and Stephan Mandt. Disentangled sequential autoencoder. arXiv preprint arXiv:1803.02991, 2018.

[26] Wei-Ning Hsu, Yu Zhang, and James Glass.
Unsupervised learning of disentangled and interpretable representations from sequential data. In Advances in Neural Information Processing Systems, pages 1878–1889, 2017.

[27] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.

[28] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.

[29] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

[30] Yoon Kim, Kelly Zhang, Alexander M Rush, Yann LeCun, et al. Adversarially regularized autoencoders. arXiv preprint arXiv:1706.04223, 2017.

[31] Aäron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. SSW, 125, 2016.

[32] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[33] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

[34] Ilya Sutskever, James Martens, and Geoffrey E Hinton. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1017–1024, 2011.

[35] Chris Donahue, Julian McAuley, and Miller Puckette. Adversarial audio synthesis.
arXiv preprint arXiv:1802.04208, 2018.