{"title": "Gene Expression Clustering with Functional Mixture Models", "book": "Advances in Neural Information Processing Systems", "page_first": 683, "page_last": 690, "abstract": "", "full_text": "Gene Expression Clustering with Functional\n\nMixture Models\n\nDarya Chudova,\n\nDepartment of Computer Science\nUniversity of California, Irvine\n\nIrvine CA 92697-3425\n\ndchudova@ics.uci.edu\n\nChristopher Hart\nDivision of Biology\n\nCalifornia Institute of Technology\n\nPasadena, CA 91125\n\nhart@caltech.edu\n\nEric Mjolsness\n\nDepartment of Computer Science\nUniversity of California, Irvine\n\nIrvine CA 92697-3425\n\nemj@uci.edu\n\nPadhraic Smyth\n\nDepartment of Computer Science\nUniversity of California, Irvine\n\nIrvine CA 92697-3425\nsmyth@ics.uci.edu\n\nAbstract\n\nWe propose a functional mixture model for simultaneous clustering and\nalignment of sets of curves measured on a discrete time grid. The model\nis speci\ufb01cally tailored to gene expression time course data. Each func-\ntional cluster center is a nonlinear combination of solutions of a simple\nlinear differential equation that describes the change of individual mRNA\nlevels when the synthesis and decay rates are constant. The mixture of\ncontinuous time parametric functional forms allows one to (a) account for\nthe heterogeneity in the observed pro\ufb01les, (b) align the pro\ufb01les in time by\nestimating real-valued time shifts, (c) capture the synthesis and decay of\nmRNA in the course of an experiment, and (d) regularize noisy pro\ufb01les\nby enforcing smoothness in the mean curves. We derive an EM algo-\nrithm for estimating the parameters of the model, and apply the proposed\napproach to the set of cycling genes in yeast. The experiments show\nconsistent improvement in predictive power and within cluster variance\ncompared to regular Gaussian mixtures.\n\n1 Introduction\n\nCurve data arises naturally in a variety of applications. 
Each curve typically consists of a\nsequence of measurements as a function of discrete time or some other independent vari-\nable. Examples of such data include trajectory tracks of individuals or objects (Gaffney\nand Smyth, 2003) and biomedical measurements of response to drug therapies (James and\nSugar, 2003). In some cases, the curve data is measured on regular grids and the curves have\nthe same length. It is straightforward to treat such curves as elements of the corresponding\nvector spaces, and apply traditional vector-based clustering methodologies such as k-means\nor mixtures of Gaussian distributions. Often the curves are sampled irregularly, have vary-\ning lengths, lack proper alignment in the time domain, or the task requires interpolation\nor inference at the off-grid locations. Such properties make vector-space representations\nundesirable. Curve data analysis is typically referred to as \u201cfunctional data analysis\u201d in\nthe statistical literature (Ramsay and Silverman, 1997), where the observed measurements\nare treated as samples from an assumed underlying continuous-time process. Clustering\nin this context can be performed using mixtures of continuous functions such as splines\n(James and Sugar, 2003) and polynomial regression models (DeSarbo and Cron, 1988;\nGaffney and Smyth, 2003). In this paper we focus on the speci\ufb01c problem of analyzing\ngene expression time course data and extend the functional mixture modelling approach to\n(a) cluster the data using plausible biological models for the expression dynamics, and (b)\nalign the expression pro\ufb01les along the time axis.\n\nLarge scale gene expression pro\ufb01ling measures the relative abundance of tens of thousands\nof mRNA molecules in the cell simultaneously. The goal of clustering in this context is to\ndiscover groups of genes with similar dynamics and \ufb01nd sets of genes that participate in the\nsame regulatory mechanism. 
For the most part, clustering approaches to gene expression\ndata treat the observed curves as elements of the corresponding vector space. A variety of\nvector-based clustering algorithms have been successfully applied, ranging from hierarchi-\ncal clustering (Eisen et al., 1998) to model-based methods (Yeung et al., 2001). However,\napproaches operating in the observed \u201cgridded\u201d domain of discrete time are insensitive to\nmany of the constraints that the temporal nature of the data imposes, including:\n\nContinuity of the temporal process: The continuous-time nature of gene expression\ndynamics is quite important from a scienti\ufb01c viewpoint. There has been some\nprevious work on continuous time models in this context, e.g., mixed effects mix-\ntures of splines (Bar-Joseph et al., 2002) were applied to clustering and alignment\nof the cell-cycle regulated genes in yeast and good interpolation properties were\ndemonstrated. However, such spline models are \u201cblack boxes\u201d that can approx-\nimate virtually any temporal behavior; they do not take the speci\ufb01cs of gene\nregulation mechanisms into account. In contrast, in this paper we propose speci\ufb01c\nfunctional forms that are targeted at short time courses, in which fairly simple\nreaction kinetics can describe the possible dynamics.\n\nAlignment: Individual genes within clusters of co-regulated genes can exhibit variations\nin the time of the onset of their characteristic behaviors or in their initial con-\ncentrations. Such differences can signi\ufb01cantly increase within-cluster variability\nand produce incorrect cluster assignments. We address this problem by explicitly\nmodelling the unknown real-valued time shifts between different genes.\n\nSmoothing: The high noise levels of observed gene expression data imply the need for\nrobust estimation of mean behavior. 
Functional models (such as those that we\npropose here) naturally impose smoothness in the learned mean curves, providing\nimplicit regularization for such data.\n\nWhile some of these problems have been previously addressed individually, no prior work\nhandles all of them in a uni\ufb01ed manner. The primary contributions of this paper are (a) a\nnew probabilistic model based on functional mixtures that can simultaneously cluster and\nalign sets of curves observed on irregular time grids, and (b) a proposal for a speci\ufb01c func-\ntional form that models changes in mRNA levels for short gene expression time courses.\n\n2 Model Description\n\n2.1 Generative Model\n\nWe describe a generative model that allows one to simulate heterogeneous sets of curves\nfrom a mixture of functional curve models. Each generated curve Yi is a series of\nobservations at a discrete set of values Xi of an independent variable. In many applications, and\nfor gene expression measurements in particular, the independent variable X is time.\nWe adopt the same general approach to functional curve clustering that is used in regression\nmixture models (DeSarbo and Cron, 1988), random effects regression mixtures (Gaffney\nand Smyth, 2003) and mixtures of spline models (James and Sugar, 2003). 
In all of these\nmodels, the component densities are conditioned on the values of the independent variable\nXi, and the conditional likelihood of a set Y of N curves is de\ufb01ned as\n\n$$P(Y \\mid X, \\Theta) = \\prod_{i=1}^{N} \\sum_{k=1}^{K} P(Y_i \\mid X_i, \\Theta_k)\\, P(k) \\tag{1}$$\n\nHere P (k) is the component probability and \u0398 is a complete set of model parameters.\nThe clusters are de\ufb01ned by their mean curves parametrized by a set of parameters \u00b5k:\nfk(x) = f(x, \u00b5k), and a noise model that describes the deviation from the mean functional\nform (described below in Section 2.2).\n\nIn contrast to standard Gaussian mixtures, the functional mixture is de\ufb01ned in continuous\ntime, allowing evaluation of the mean curves on a continuum of \u201coff-grid\u201d time points.\nThis allows us to extend the functional mixture models described above by incorporating\nreal-valued alignment of observed curves along the time axis. In particular, the precise time\ngrid Xi of observation i is assumed unknown and is allowed to vary from curve to curve.\nThis is common in practice when the measurement process cannot be synchronized from\ncurve to curve. For simplicity we assume (unknown) linear shifts of the curves along the\ntime axis. We \ufb01x the basic time grid X, but generate each curve on its own grid (X + \u03c6i)\nwith a curve-speci\ufb01c time offset \u03c6i. We treat the offset corresponding to curve Yi as an\nadditional real-valued latent variable in the model. 
The conditional likelihood of a single\ncurve under component k is calculated by integrating out all possible offset values:\n\n$$P(Y_i \\mid X, \\Theta_k) = \\int_{\\phi_i} P(Y_i \\mid X + \\phi_i, \\Theta_k)\\, P(\\phi_i \\mid \\Theta_k)\\, d\\phi_i \\tag{2}$$\n\nFinally, we assume that the measurements have additive Gaussian noise with zero mean\nand diagonal covariance matrix Ck, and express the conditional likelihood as\n\n$$P(Y_i \\mid X + \\phi_i, \\Theta_k) \\propto \\mathcal{N}(Y_i \\mid f_k(X + \\phi_i), C_k) \\tag{3}$$\n\nThe full set of cluster parameters \u0398k includes mean curve parameters \u00b5k that de\ufb01ne fk(x),\ncovariance matrix Ck, cluster probability P (k), and time shift probability P (\u03c6|k): \u0398k =\n{\u00b5k, Ck, P (k), P (\u03c6|k)}.\n\n2.2 Functional form of the mean curves\n\nThe generative model described above uses a generic functional form f(x, \u00b5) for the mean\ncurves. In this section, we introduce a parametric representation of f(x, \u00b5) that is speci\ufb01-\ncally tailored to short gene expression time courses.\nTo a \ufb01rst-order approximation, the raw mRNA levels {v1, . . . , vN} measured in gene ex-\npression experiments can be modeled via a system of differential equations with the follow-\ning structure (see Gibson and Mjolsness (2001), eq. 1.19, and Mestl, Lemay, and Glass (1996)):\n\n$$\\frac{dv_i}{dt} = \\sigma g_{1,i}(v_1, \\ldots, v_N) - \\epsilon v_i g_{2,i}(v_1, \\ldots, v_N) \\tag{4}$$\n\nThe \ufb01rst term on the right-hand side is responsible for the synthesis of vi with maximal\nrate \u03c3, and the second term represents decay with maximal fractional rate \u03b5. In general, we\ndo not know the speci\ufb01c coef\ufb01cients or nonlinear saturating functions g1 and g2 that de\ufb01ne\nthe right-hand side of the equations. Instead, we make a few simplifying assumptions about\nthe equation and use it as a motivation for the parametric functional form that we propose\nbelow. 
Speci\ufb01cally, suppose that\n\n\u2022 the set of N heterogeneous variables can be divided into K groups of variables,\nwhose production is driven by similar mechanisms;\n\u2022 the synthesis and decay functions g1 and g2 are approximately piecewise constant\nin time for any given group;\n\u2022 there are at most two regimes involved in the production of vi, each characterized\nby its own synthesis and decay rates (this is appropriate for short time courses);\n\u2022 for each group there can be an unknown change point on the time axis where a\nrelatively rapid switching between the two different regimes takes place, due to\nexogenous changes in the variables (v1, . . . , vN ) outside the group.\n\nWithin the regions of constant synthesis and decay functions g1 and g2, we can solve equa-\ntion (4) analytically and obtain a family of simple exponential solutions parametrized by\n\u00b51 = {\u03bd, \u03c3, \u03b5}:\n\n$$f^a(x, \\mu_1) = \\left(\\nu - \\frac{\\sigma}{\\epsilon}\\right) e^{-\\epsilon x} + \\frac{\\sigma}{\\epsilon} \\tag{5}$$\n\nThis motivates us to construct the functional forms for the mean curves by concatenat-\ning two parameterized exponentials, with an unknown change point and a smooth transition\nmechanism:\n\n$$f(x, \\mu) = f^a(x, \\mu_1)\\,(1 - \\Phi(x, \\delta, \\psi)) + f^a(x, \\mu_2)\\,\\Phi(x, \\delta, \\psi) \\tag{6}$$\n\nHere $f^a(x, \\mu_1)$ and $f^a(x, \\mu_2)$ represent the exponentials to the left and right of the switching\npoint, with different sets of initial conditions, synthesis and decay rates denoted by param-\neters \u00b51 and \u00b52. The nonlinear sigmoid transfer function \u03a6(x, \u03b4, \u03c8) allows us to model\nswitching between the two regimes at x = \u03c8 with slope \u03b4: $\\Phi(x, \\delta, \\psi) = (1 + e^{-\\delta(x - \\psi)})^{-1}$.\nThe random effects on the time grid allow us to time-shift each curve individually by replac-\ning x with (x + \u03c6i) in Equation (6). There are other biologically plausible transformations\non the curves in a cluster that we do not pursue in this paper, such as allowing \u03c8 to vary\nwith each curve, or representing minor differences in the regulatory functions g1,i and g2,i\nwhich affect the timing of their transitions.\n\nWhen learning these models from data, we restrict the class of functions in Equation (6) to\nthose with non-negative initial conditions, synthesis and decay rates, as well as enforcing\ncontinuity of the exponentials at the switching point: $f^a(\\psi, \\mu_1) = f^a(\\psi, \\mu_2)$. Finally, given\nthat the log-normal noise model is well-suited to gene expression data (Yeung et al., 2001),\nwe use the logarithm of the functional forms proposed in Equation (6) as a general class of\nfunctions that describe the mean behavior within the clusters.\n\n3 Parameter Estimation\n\nWe use the well-known Expectation Maximization (EM) algorithm to simultaneously re-\ncover the full set of model parameters \u0398 = {\u03981, . . . 
, \u0398K}, as well as the posterior joint\ndistribution of cluster membership Z and time offsets \u03c6 for each observed curve. Each\ncluster is characterized by the parameters of the mean curves, noise variance, cluster prob-\nability and time shift distribution: \u0398k = {\u00b5k, Ck, P (k), P (\u03c6|k)}.\n\n\u2022 In the E-step, we \ufb01nd the posterior distribution of the cluster membership Zi and\nthe time shift \u03c6i for each curve Yi, given current cluster parameters \u0398;\n\u2022 In the M-step, we maximize the expected log-likelihood with respect to the poste-\nrior distribution of Z and \u03c6 by adjusting \u0398.\n\nFigure 1: A view of the cluster mean curves (left) and variation in the switching-point pa-\nrameters across 10 cross-validation folds (right) using functional clustering with alignment\n(see Section 4 for full details). The left panels plot log-intensity against time in minutes; the\nright panel plots inverse switching slope against switching time in minutes for clusters K = 1\nto K = 5.\n\nSince the time shifts \u03c6 are real-valued, the E-step requires evaluation of the posterior dis-\ntribution over a continuous domain of \u03c6. Similarly, the M-step requires integration with\nrespect to \u03c6. 
We approximate the domain of \u03c6 with a \ufb01nite sample from its prior distribu-\ntion. The sample is kept \ufb01xed throughout the computation. The posterior probability of the\nsampled values is updated after each M-step to approximate the model distribution P (\u03c6|k).\nThe M-step optimization problem does not allow closed-form solutions due to non-\nlinearities with respect to function parameters. We use conjugate gradient descent with\na pseudo-Newton step size selection. The step size selection issue is crucial in this prob-\nlem, as the second derivatives with respect to different parameters of the model differ by\norders of magnitude. This indicates the presence of ridges and ravines on the likelihood\nsurface, which makes gradient descent highly sensitive to the step size and slow to con-\nverge. To speed up the EM algorithm, we initialize the coef\ufb01cients of the mean functional\nforms by approximating the mean vectors obtained using a standard vector-based Gaussian\nmixture model on the same data. This typically produces a useful set of initial parameter\nvalues which are then optimized by running the full EM algorithm for a functional mixture\nmodel with alignment.\n\nWe use the EM algorithm in its maximum a posteriori (MAP) formulation, using a zero-\nmean Gaussian prior distribution on the curve-speci\ufb01c time shifts. The variance of the prior\ndistribution allows us to control the amount of shifting allowed in the model. We also use\nconjugate prior distributions for the noise variance Ck to regularize the model and prohibit\ndegenerate solutions with near-zero covariance terms.\nFigure 1 shows examples of mean curves (Equation (6)), that were learned from actual gene\nexpression data. 
Each functional form has 7 free parameters: \u00b5 = {\u03bd, \u03c31, \u03b51, \u03c32, \u03b52, \u03b4, \u03c8}.\nNote that, as with many time-course gene expression data sets, having so few points\npresents an obvious problem for parameter estimation directly from a single curve. How-\never, the curve-speci\ufb01c time shifts in effect provide a \ufb01ner sampling grid that helps to\nrecover the parameters from observed data, in addition to the \u201cpooling\u201d effect of learning\ncommon functional forms for groups of curves. The right-hand side of Figure 1 shows a\nscatter plot of the switching parameters for 5 clusters estimated from 10 different cross-\nvalidation runs. The 5 clusters exhibit different dynamics (as indicated by the spread in\nparameter space) and the algorithm \ufb01nds qualitatively similar parameter estimates for each\ncluster across the 10 different runs.\n\nFigure 2: Cross-validated conditional logP scores (left) and cross-validated interpolation\nmean-squared error (MSE) (right), as a function of the number of mixture components, for\nthe \ufb01rst cell cycle of the Cho et al. data set. Both panels compare the functional mixture\nmodel (Functional MM) with the Gaussian mixture model (Gaussian MM) for 5 to 9\ncomponents; the vertical axes are per-point logP (left) and MSE (right).\n\n4 Experimental Evaluation\n\n4.1 Gene expression data\n\nWe illustrate our approach using the immediate responses of yeast Saccharomyces cere-\nvisiae when released from cell cycle arrest, using the raw data reported by Cho et al. (1998).\nBrie\ufb02y, the CDC28 TS mutants were released from the cell cycle arrest by temperature\nshift. Cells were harvested and RNA was collected every 10 min for 170 min, spanning two\ncell cycles. 
The RNA was then analyzed using Affymetrix gene chip arrays. From these\ndata we select only those of the 416 genes reported to be actively regulated throughout\nthe cell cycle that are expressed for 30 continuous minutes above an Affymetrix absolute\nlevel of 100 (a total of 385 genes pass these criteria). We normalize each gene expression\nvector by its median expression value throughout the time course to reduce the in\ufb02uence of\nprobe-speci\ufb01c intensity biases.\n\n4.2 Experimental results\n\nIn order to study the immediate cellular response we analyze only the \ufb01rst 8 time points of\nthis data set. We evaluate the cross-validated out-of-sample performance of the proposed\nfunctional mixture model. A conventional Gaussian mixture model applied to observations\non the discrete time grid is used for baseline comparison. It is not at all clear a priori\nthat the functional mixture models with a highly constrained parametric set of mean curves\nshould outperform Gaussian mixtures that impose no parametric assumptions and are free\nto approximate any discrete grid observation. 
While one can expect that mixtures of splines\n(Bar-Joseph et al., 2002) or functions with universal approximation capabilities can be\n\ufb01tted to any mean behavior, the restricted class of functions that we proposed (based on the\nsimpli\ufb01ed dynamics of the mRNA changes implied by the differential equation in Equation\n(4)) is likely to fail if the true dynamics do not match the assumptions.\n\nThere are two main reasons to use the proposed restricted class of functional forms: (1)\nto be able to interpret the resulting mean curves in terms of the synthesis / decay rates at\neach of the regimes as well as the switching times; (2) to naturally incorporate alignment\nby real-valued shifts along the time axis.\n\nFigure 3: Cross-validated one-step-ahead prediction MSE (left) and cross-validated inter-\npolation MSE (right) for the \ufb01rst cell cycle of the Cho et al. data set. Each panel plots MSE\nagainst the number of components (5 to 9) for the functional and regular mixture models; the\nleft panels cover T = 6, T = 7, T = 8 and T = 6:8, and the right panels cover T = 2 through T = 7.\n\nIn Figures 2 and 3, we present 5-fold cross-validated out-of-sample scores, as a function\nof the number of clusters, for both the functional mixture model and the baseline Gaussian\nmixture model. The conditional logP score (Figure 2, left panel) estimates the average\nlog-probability assigned to a single measurement at time points 6, 7, 8 within an unseen curve,\ngiven the \ufb01rst \ufb01ve measurements of the same curve. Higher scores indicate a better \ufb01t.\nThe conditioning on the \ufb01rst few time points allows us to demonstrate the power of models\nwith random effects since estimation of alignment based on partial curves improves the\nprobability of the remainder of the curve.\n\nThe interpolation error in Figure 2 (right panel) shows the accuracy of recovering miss-\ning measurements. The observed improvement in this score is likely due to the effect of\naligning the test curves. To evaluate the interpolation error, we trained the models on the\nfull training curves, and then assumed a single measurement was missing from the test\ncurve (at one of time points 2 through 7). The model was then used to make a prediction at the\ntime point of the missing measurement, and the interpolation error was averaged for all\ntime points and test curves. The right panel of Figure 3 contains a detailed view of these\nresults: each subplot shows the mean error in recovering values at a particular time point.\nWhile some time points are harder to approximate than the others (in particular, T = 2, 3),\nthe functional mixture models provide better interpolation properties overall. 
Dif\ufb01culties\nin approximating at T = 2, 3 can be attributed to the large changes in the intensities at\nthese time points, and possibly indicate the limitations of the functional forms chosen as\ncandidate mean curves.\n\nFinally, the left panel of Figure 3 shows improvement in one-step-ahead prediction error.\nAgain, we trained the models on the full curves, and then used the models to make predic-\ntions for test curves at time T given all measurements up to T \u2212 1 (T = 6, 7, 8). Figures\n2 and 3 demonstrate a consistent improvement in the out-of-sample performance of the\nfunctional mixtures.\n\nThe improvements seen in these plots result from integrating alignment along the time\naxis into the clustering framework. We found that the functional mixture model without\nalignment does not result in better out-of-sample performance than discrete-time Gaussian\nmixtures. This may not be surprising given the constrained nature of the \ufb01tted functions.\n\nIn the experiments presented in this paper we used a Gaussian prior distribution on the\ntime-shift parameter to softly constrain the shifts to lie roughly within 1.5 time grid intervals.\nThe discrete grid alignment approaches that we proposed earlier in Chudova et al. (2003)\ncan successfully align curves if one assumes offsets on the scale of multiple time grid\npoints. However, they are not designed to handle \ufb01ner sub-grid alignments. Also worth\nnoting is the fact that continuous time mixtures can align curves sampled on non-uniform\ntime grids (such non-uniform sampling in time is relatively common in gene expression\ntime course data).\n\n5 Conclusions\n\nWe presented a probabilistic framework for joint clustering and alignment of gene expres-\nsion time course data using continuous time cluster models. 
These models allow (1) real-\nvalued off-grid alignment of unequally spaced measurements, (2) off-grid interpolation,\nand (3) regularization by enforcing smoothness implied by the functional cluster forms.\nWe have demonstrated that a mixture of simple parametric functions with nonlinear transi-\ntion between two exponential regimes can model a broad class of gene expression pro\ufb01les\nin a single cell cycle of yeast. Cross-validated performance scores show the advantages\nof continuous time models over standard Gaussian mixtures. Possible extensions include\nadding additional curve-speci\ufb01c parameters, incorporating other alignment methods, and\nintroducing periodic functional forms for multi-cycle data.\n\nReferences\n\nBar-Joseph, Z., Gerber, G., Gifford, D., Jaakkola, T., and Simon, I. (2002). A new approach to\nanalyzing gene expression time series data. In The Sixth Annual International Conference on\n(Research in) Computational (Molecular) Biology (RECOMB), pages 39\u201348, N.Y. ACM Press.\nCho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg,\nT. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J., and Davis, R. W. (1998). A genome-\nwide transcriptional analysis of the mitotic cell cycle. Mol Cell, 2(1):65\u201373.\n\nChudova, D., Gaffney, S., Mjolsness, E., and Smyth, P. (2003). Mixture models for translation-\ninvariant clustering of sets of multi-dimensional curves.\nIn Proceedings of the Ninth ACM\nSIGKDD International Conference on Knowledge Discovery and Data Mining, pages 79\u201388,\nWashington, DC.\n\nDeSarbo, W. S. and Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear\n\nregression. Journal of Classi\ufb01cation, 5(1):249\u2013282.\n\nEisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of\n\ngenome-wide expression patterns. Proc Natl Acad Sci U S A, 95(25):14863\u20138.\n\nGaffney, S. J. and Smyth, P. (2003). 
Curve clustering with random effects regression mixtures. In\nBishop, C. M. and Frey, B. J., editors, Proceedings of the Ninth International Workshop on\nArti\ufb01cial Intelligence and Statistics, Key West, FL.\n\nGibson, M. and Mjolsness, E. (2001). Modeling the activity of single genes. In Bower, J. M. and\n\nBolouri, H., editors, Computational Methods in Molecular Biology. MIT Press.\n\nJames, G. M. and Sugar, C. A. (2003). Clustering for sparsely sampled functional data. Journal of\n\nthe American Statistical Association, 98:397\u2013408.\n\nMestl, T., Lemay, C., and Glass, L. (1996). Chaos in high-dimensional neural and gene networks.\n\nPhysica, 98:33.\n\nRamsay, J. and Silverman, B. W. (1997). Functional Data Analysis. Springer-Verlag, New York, NY.\nYeung, K. Y., Fraley, C., Murua, A., Raftery, A. E., and Ruzzo, W. L. (2001). Model-based clustering\n\nand data transformations for gene expression data. Bioinformatics, 17(10):977\u2013987.\n\n\f", "award": [], "sourceid": 2445, "authors": [{"given_name": "Darya", "family_name": "Chudova", "institution": null}, {"given_name": "Christopher", "family_name": "Hart", "institution": null}, {"given_name": "Eric", "family_name": "Mjolsness", "institution": null}, {"given_name": "Padhraic", "family_name": "Smyth", "institution": null}]}