{"title": "Learning Representations for Time Series Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 3781, "page_last": 3791, "abstract": "Time series clustering is an essential unsupervised technique in cases when category information is not available. It has been widely applied to genome data, anomaly detection, and in general, in any domain where pattern detection is important. Although feature-based time series clustering methods are robust to noise and outliers, and can reduce the dimensionality of the data, they typically rely on domain knowledge to manually construct high-quality features. Sequence to sequence (seq2seq) models can learn representations from sequence data in an unsupervised manner by designing appropriate learning objectives, such as reconstruction and context prediction. When applying seq2seq to time series clustering, obtaining a representation that effectively represents the temporal dynamics of the sequence, multi-scale features, and good clustering properties remains a challenge. How to best improve the ability of the encoder is still an open question. Here we propose a novel unsupervised temporal representation learning model, named Deep Temporal Clustering Representation (DTCR), which integrates the temporal reconstruction and K-means objective into the seq2seq model. This approach leads to improved cluster structures and thus obtains cluster-specific temporal representations. Also, to enhance the ability of encoder, we propose a fake-sample generation strategy and auxiliary classification task. Experiments conducted on extensive time series datasets show that DTCR is state-of-the-art compared to existing methods. 
The visualization analysis not only shows the effectiveness of cluster-specific representation but also shows the learning process is robust, even if K-means makes mistakes.", "full_text": "Learning Representations for Time Series Clustering\n\nQianli Ma\nSouth China University of Technology\nGuangzhou, China\nqianlima@scut.edu.cn\n\nSen Li\u2217\nSouth China University of Technology\nGuangzhou, China\nawslee@foxmail.com\n\nJiawei Zheng\u2217\nSouth China University of Technology\nGuangzhou, China\ncsjwzheng@foxmail.com\n\nGarrison W. Cottrell\nUniversity of California, San Diego\nCA, USA\ngary@ucsd.edu\n\nAbstract\n\nTime series clustering is an essential unsupervised technique in cases when category\ninformation is not available. It has been widely applied to genome data, anomaly\ndetection, and in general, in any domain where pattern detection is important.\nAlthough feature-based time series clustering methods are robust to noise and\noutliers, and can reduce the dimensionality of the data, they typically rely on domain\nknowledge to manually construct high-quality features. Sequence to sequence\n(seq2seq) models can learn representations from sequence data in an unsupervised\nmanner by designing appropriate learning objectives, such as reconstruction and\ncontext prediction. When applying seq2seq to time series clustering, obtaining a\nrepresentation that effectively represents the temporal dynamics of the sequence,\nmulti-scale features, and good clustering properties remains a challenge. How\nto best improve the ability of the encoder is still an open question. Here we\npropose a novel unsupervised temporal representation learning model, named\nDeep Temporal Clustering Representation (DTCR), which integrates the temporal\nreconstruction and K-means objective into the seq2seq model. This approach\nleads to improved cluster structures and thus obtains cluster-specific temporal\nrepresentations.\n
Also, to enhance the ability of encoder, we propose a fake-sample\ngeneration strategy and auxiliary classification task. Experiments conducted on\nextensive time series datasets show that DTCR is state-of-the-art compared to\nexisting methods. The visualization analysis not only shows the effectiveness of\ncluster-specific representation but also shows the learning process is robust, even if\nK-means makes mistakes.\n\n\u2217Two authors have equal contribution.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n1 Introduction\n\nTime series clustering is an important data mining technology widely applied to genome data [1],\nanomaly detection [2] and in general, to any domain where pattern detection is important. Time series\nclustering aids in the discovery of interesting patterns that empower data analysts to extract valuable\ninformation from complex and massive datasets [3].\nFeature-based methods typically consist of feature extraction followed by clustering. Such an approach is robust\nto noise and can filter out some irrelevant information [4], which can reduce the data dimension\nand thus improve the efficiency of clustering algorithms [3, 4]. However, most existing methods are\ndomain-dependent, requiring domain knowledge to construct high-quality features manually [5]. In a\nnumber of studies [6, 7, 8, 9], discriminative features were selected with the help of pseudo cluster\nlabels learned via local learning. However, the selected features are typically linear, while non-linear\ndynamics are more common in time series [10, 11, 12, 13].\nIn recent years, deep learning models have been applied to a wide variety of tasks and achieved great\nsuccess. Among them, the seq2seq model can learn general representations from sequence data in\nan unsupervised manner by designing learning objectives that exploit labels that are freely available\nwith the data [14]. For example, Kiros et al.\n
[15] used it to learn the sentence representations by\npredicting the context sentences of a given sentence. Gan et al. [16] learned sentence representations\nby predicting multiple future sentences based on the seq2seq model. As shown by their experiments,\nfine-tuning the general representations on the downstream classification task can significantly\nimprove the performance. This verifies the benefits of a task-related representation.\nMotivated by this research, we aim to learn a non-linear temporal representation for time series\nclustering using the seq2seq model. When applying it to time series clustering, due to the lack of\nlabels, effectively guiding the learning process to generate cluster-specific representations as well as\ncapturing the dynamics and multi-scale characteristics of time series is a challenge. Moreover, the\nseq2seq model relies on the capabilities of the encoder. Improving the ability of the encoder for time\nseries clustering remains an open question.\nIn this paper, we propose a novel unsupervised temporal representation learning model, Deep Temporal\nClustering Representation (DTCR), which can generate cluster-specific temporal representations.\nDTCR integrates temporal reconstruction and the K-means objective into a seq2seq model. Specifically,\nDTCR adopts bidirectional Dilated recurrent neural networks [17] as the encoder, enabling the\nlearned representation to capture the temporal dynamics and multi-scale characteristics of time series.\nThe learned representation forms a cluster structure with the guidance of the K-means objective. To\nfurther enhance the ability of the encoder, inspired by [18], we propose a fake-sample generation\nstrategy for time series and introduce an auxiliary classification task for the encoder. Our contributions\ncan be summarized as follows:\n\n1. We propose a novel unsupervised temporal representation learning model for time series\nclustering, which integrates the temporal reconstruction and K-means objective to generate\ncluster-specific temporal representations.\n\n2. We propose a fake-sample generation strategy for time series and introduce an auxiliary\nclassification task for the encoder to enhance its ability.\n\n3. Our experimental results on a large number of benchmark time series datasets show that the\nproposed model achieves state-of-the-art performance. Visualization analysis illustrates the\neffectiveness of cluster-specific temporal representations and demonstrates the robustness of\nthe learning process, even if K-means makes mistakes.\n\n2 Related Work\n\nTime series clustering algorithms can be broadly classified into two approaches: raw-data-based\nmethods and feature-based methods [19].\n\n2.1 Raw-data-based methods\n\nRaw-data-based methods mainly modify the distance function to adapt to the time series characteristics\n(e.g., scaling and distortion). For example, Petitjean et al. [20] proposed a k-DBA algorithm for better\nalignment, which combines K-means and dynamic time warping [21]. Yang et al. [22] developed the\nK-Spectral Centroid (K-SC) method to uncover the temporal dynamics by using a similarity metric\nthat is invariant to scaling and shifting. Paparrizos et al. [5] presented a method called k-Shape that\nfurther considers the shapes of the time series, using a normalized version of the cross-correlation\nmeasure. However, the above methods are usually sensitive to outliers and noise, since all time points\nare taken into account [4].\n\n2.2 Feature-based methods\n\nFeature-based methods use clustering algorithms on the extracted feature representations of input time\nseries, which mitigates the impact of noise or outliers while also reducing the dimensionality of the\ndata.\n
Since our method is related to this category, we here subdivide the feature-based methods into:\n(i) two-stage approaches that cluster after extracting features; (ii) approaches that jointly optimize the\nfeature learning and clustering. The former first extracts features and then performs clustering. Guo\net al. [23] used independent component analysis to convert the data into low dimensional features.\nZakaria et al. [24] proposed u-shapelet to learn local patterns. The features extracted by these\nmethods may not be suitable for clustering because feature extraction is only a pre-processing\nstep. The latter category of algorithms jointly optimizes feature learning and clustering. The methods in [6, 7, 8, 9]\niteratively adopt local learning to obtain pseudo-labels and then employ them to select\ndiscriminative features. The features extracted by these methods are linear, while real time series tend\nto be non-linear [10, 11, 12, 13]. Sai et al. [25] proposed deep temporal clustering (DTC), using an\nauto-encoder and a clustering layer [26] to learn a non-linear cluster representation. The clustering\nlayer is designed by measuring the KL divergence between the predicted and target distributions.\nDuring training, the target distribution is calculated from the predicted distribution and updated at each\niteration, which leads to instability [27]. Moreover, the performance of DTC strongly depends on the\nability of the encoder since the predicted distribution is calculated on the learned representations.\n\n3 Proposed Method\n\nFigure 1: The general architecture of the Deep Temporal Clustering Representation (DTCR).\n\nIn this paper, we propose a novel model called Deep Temporal Clustering Representation (DTCR) to\ngenerate cluster-specific representations. The general structure of DTCR is illustrated in Figure 1.\nThe encoder maps original time series into a latent space of representations.\n
Then the representations\nare used to reconstruct the input data with the decoder. At the same time, a K-means objective is\nintegrated into the model to guide the representation learning. Furthermore, we propose a fake-sample\ngeneration strategy and an auxiliary classification task to enhance the ability of the encoder.\n\n3.1 Deep Temporal Representation Clustering\nGiven a set of n time series $D = \\{x_1, x_2, ..., x_n\\}$, each time series $x_i$ contains T ordered real\nvalues denoted as $x_i = (x_{i,1}, x_{i,2}, ..., x_{i,T})$. Define non-linear mappings $f_{enc}: x_i \\to h_i$ and\n$f_{dec}: h_i \\to \\hat{x}_i$, where $f_{enc}$ and $f_{dec}$ denote the encoding and decoding processes, respectively. $h_i \\in \\mathbb{R}^m$ is\nthe m-dimensional latent representation of time series $x_i$, defined by:\n\n$$h_i = f_{enc}(x_i) \\qquad (1)$$\n\nWe aim to train a good $f_{enc}$, making the learned representations facilitate the clustering task. We\ninstantiate the non-linear mapping as a bidirectional RNN. Furthermore, considering that time series\nare commonly multi-scale, the encoder RNN is instantiated by a multi-layer Dilated RNN [17]. The\nlatent representation is obtained by concatenating the last hidden state output of each layer of the\nDilated RNN. After decoding, we can obtain the output $\\hat{x}_i \\in \\mathbb{R}^T$, given by:\n\n$$\\hat{x}_i = f_{dec}(h_i) \\qquad (2)$$\n\nWe use Mean Square Error (MSE) as the reconstruction loss, which is defined by:\n\n$$L_{reconstruction} = \\frac{1}{n} \\sum_{i=1}^{n} \\| x_i - \\hat{x}_i \\|_2^2 \\qquad (3)$$\n\nNote that although the representations learned with the reconstruction loss capture the informative features\nof the original time series, they are not necessarily suitable for the clustering task.\n
To enable the\nlearned representations to form cluster structures and thus obtain cluster-specific representations, we\nfurther guide the network learning through K-means.\nGiven a static data matrix $H \\in \\mathbb{R}^{m \\times N}$, Zha et al. [28] showed that the minimization of K-means\ncould be reformulated as a trace maximization problem associated with the Gram matrix $H^T H$,\nwhich possesses optimal global solutions without local minima. Spectral relaxation converts the\nK-means objective into the following problem:\n\n$$L_{\\text{K-means}} = \\mathrm{Tr}(H^T H) - \\mathrm{Tr}(F^T H^T H F) \\qquad (4)$$\n\nwhere Tr denotes the matrix trace and $F \\in \\mathbb{R}^{N \\times k}$ is the cluster indicator matrix. Considering H is\ngiven, the minimization of Eq. (4) can be further relaxed to a trace maximization problem by setting\nF to be an arbitrary orthogonal matrix:\n\n$$\\max_{F} \\mathrm{Tr}(F^T H^T H F), \\quad \\text{s.t. } F^T F = I \\qquad (5)$$\n\nThe closed-form solution of F is obtained by composing the first k singular vectors of H according\nto the Ky Fan theorem.\nHowever, in our case, H is learned by the network rather than static. This motivates regarding Eq. (4)\nas a regularization term for learning H, which guides the representation learning process to form\nthe cluster structures. Thus, our target is to minimize the objective below (\u03bb is a scalar):\n\n$$\\min_{H,F} J(H) + \\frac{\\lambda}{2} [\\mathrm{Tr}(H^T H) - \\mathrm{Tr}(F^T H^T H F)], \\quad \\text{s.t. } F^T F = I \\qquad (6)$$\n\nwhere J(H) is the sum of the reconstruction loss and the classification loss (see Section 3.2). The\nwhole training process of DTCR consists of iteratively updating F and H. Fixing F, updating H\ncan follow the standard stochastic gradient descent (SGD), with the gradient given as $\\nabla J(H) + \\lambda H(I - F F^T)$.\nFixing H, we update F using the closed-form solution to Eq. (5), by computing\nthe k-truncated singular value decomposition (SVD) of H. In this way, the K-means objective guides\nthe representations to form the cluster structures.\n
Note that to avoid instability, F should not be\nupdated at each iteration. In practice, we update F once every 10 iterations. We provide an\nanalysis of this in Section F of the Supplementary Material.\n\n3.2 Encoder Classification Task\n\nSince the seq2seq model relies on the capabilities of the encoder, the better the encoder is trained,\nthe better the learned representations will be. For time series, we propose a fake-sample generation\nstrategy and auxiliary classification task to enhance the ability of the encoder.\nGiven a time series $x_i \\in \\mathbb{R}^T$, we generate its fake version by randomly shuffling some time steps.\nThe number of selected time steps is $\\lfloor \\alpha \\times T \\rfloor$, where $\\alpha \\in (0, 1]$ is a hyper-parameter we set to 0.2.\nFor each raw time series, we generate the corresponding fake sample. The auxiliary classification\ntask is to train the encoder to detect whether a given time series is real or fake. Formally, the encoder\nis trained by minimizing the following loss function:\n\n$$\\hat{y}_i = W_{fc2}(W_{fc1} h_i) \\qquad (7)$$\n\n$$L_{classification} = -\\frac{1}{2N} \\sum_{i=1}^{2N} \\sum_{j=1}^{2} \\mathbf{1}\\{y_{i,j} = 1\\} \\log \\frac{\\exp(\\hat{y}_{i,j})}{\\sum_{j'=1}^{2} \\exp(\\hat{y}_{i,j'})} \\qquad (8)$$\n\nwhere $y_i$ is a 2-dim one-hot vector indicating real or fake, and $\\hat{y}_i$ is the classification result. For\nsimplicity, we ignore the bias term.\n
$W_{fc1} \\in \\mathbb{R}^{m \\times d}$ and $W_{fc2} \\in \\mathbb{R}^{d \\times 2}$ are parameters of the fully\nconnected layers and d is set to 128.\nThe ability of the encoder is enhanced to distinguish between real and fake samples, enabling the\nlearned representation to better represent real time series.\n\nAlgorithm 1 DTCR Training Method\nInput: Data set: D; Number of clusters: K; Alternate update: T; Maximum iterations: MaxIter\nOutput: Cluster result s\n1: For each time series in D, generate the corresponding fake samples.\n2: for iter = 1 to MaxIter do\n3:     Update latent representation $\\{h_i = f_{enc}(x_i)\\}_{i=1}^{n}$ using SGD based on Eq. (9).\n4:     if iter % T = 0 then\n5:         Update F using the closed-form solution of Eq. (5).\n6:     end if\n7: end for\n8: Apply K-means to the learned representation and get the cluster result s.\n\n3.3 Overall Loss Function\nFinally, the overall training loss $L_{DTCR}$ of DTCR is defined by:\n\n$$L_{DTCR} = L_{reconstruction} + L_{classification} + \\lambda L_{\\text{K-means}} \\qquad (9)$$\n\nwhere \u03bb is the regularization coefficient. Eq. (9) is minimized to learn the cluster-specific representations.\nSpecifically, $L_{reconstruction}$ makes the representations reconstruct the input. $L_{classification}$\nenhances the ability of the encoder. $L_{\\text{K-means}}$ encourages the representations to form cluster structures.\nAfter training, we apply K-means to the learned representations. The detailed training method\nof DTCR is presented in Algorithm 1.\n\n4 Experiments\n\nFollowing the protocol used in [20, 24, 5, 25, 29], we conduct experiments on the 36 UCR [30] time\nseries datasets to evaluate performance. The statistics of these 36 datasets are shown in Table 1 of the\nSupplementary Material. Each data set has a default train/test split. We adopted the protocol used in\nUSSL [29], training on the training set and evaluating on the test set for comparison.\n
As mentioned\nabove, we employ the bidirectional multi-layer Dilated RNN [17] as the encoder, capturing the\ndynamics and multi-scale characteristics of the time series. In our experiments, we fixed the number\nof layers to 3 and the dilations of the three layers to 1, 4, and 16, respectively. DTCR performs\nwell under this setting. With further tuning, the performance could be improved. The decoder is a\nsingle-layer RNN. Gated Recurrent Units (GRU) are used in the RNNs [31]. The number of units\nper layer of the encoder is $[m_1, m_2, m_3] \\in \\{[100, 50, 50], [50, 30, 30]\\}$. The number of hidden units\nin the decoder is $(m_1 + m_2 + m_3) \\times 2$. The decoder takes the final hidden state of the encoder as\nits initial state and performs iterative prediction, i.e., the output at time t \u2212 1 is fed as the input at\ntime t. The \u03bb of Eq. (9) takes values in {1, 1e-1, 1e-2, 1e-3}. The batch size is 2N. To reduce the impact of\nrandom initialization, we ran each experiment 5 times and report means and standard deviations.\nThe experiments are run on the TensorFlow [32] platform using an Intel Core i7-6850K, 3.60-GHz\nCPU, 64-GB RAM and a GeForce GTX 1080-Ti 11G GPU. The Adam [33] optimizer is employed\nwith an initial learning rate of 5e-3.\n\n4.1 Comparison with State-of-the-art Methods\n\nFollowing USSL, the Rand Index (RI) [34] and Normalized Mutual Information (NMI) [35] are used for\nevaluating clustering performance.\n
The RI is defined as:\n\n$$RI = \\frac{TP + TN}{n(n - 1)/2} \\qquad (10)$$\n\nwhere TP (True Positive) is the number of pairs of time series that are correctly put in the same\ncluster, TN (True Negative) is the number of pairs that are correctly put in different clusters and n is\nthe size of the data set.\nThe NMI is defined as:\n\n$$NMI = \\frac{\\sum_{i=1}^{M} \\sum_{j=1}^{M} N_{ij} \\log\\left(\\frac{N \\cdot N_{ij}}{|G_i||A_j|}\\right)}{\\sqrt{\\left(\\sum_{i=1}^{M} |G_i| \\log \\frac{|G_i|}{N}\\right)\\left(\\sum_{j=1}^{M} |A_j| \\log \\frac{|A_j|}{N}\\right)}} \\qquad (11)$$\n\nwhere N represents the total number of time series, $|G_i|$ and $|A_j|$ are the numbers of time series in clusters\n$G_i$ and $A_j$, and $N_{ij} = |G_i \\cap A_j|$ denotes the number of time series belonging to the intersection of sets\n$G_i$ and $A_j$. In these two metrics, values close to 1 indicate high quality clustering [29].\nWe compare DTCR with 11 representative recent time series clustering methods. We also compare\nDTCR with 2 state-of-the-art non-time-series deep clustering methods (DEC [26], IDEC [27]). The\ndetails of these methods are described in Section B of the Supplementary Material. All the results in\nTable 1 are collected from [29] (2018 TPAMI) except for the new time series method DTC [25] (footnote 2) and\ntwo non-time-series methods (DEC, footnote 3; IDEC, footnote 4). The results of these 3 methods are obtained by running\ntheir published code.\nAs shown in Table 1, DTCR achieves the best performance, with the lowest average rank\n(3.0694), the highest average RI (0.7714) and the largest number of best results (17). To further analyze the\nperformance, we perform a pairwise comparison for each method against DTCR. Specifically, we\nconduct the Wilcoxon signed rank test [36] to measure the significance of the difference. As shown\nin Table 1, DTCR is significantly better than all of the other methods at the p < 0.05 level, except\nUSSL [29]. Although DTCR is numerically superior in average rank and RI, it is not significantly\nbetter than USSL.\n
Note that USSL depends on pseudo-labels to guide the learning, while there is\nno mechanism to reduce the negative impact when mistakes occur in the pseudo-labels. In contrast,\nDTCR is capable of correcting mistakes with the help of temporal reconstruction (for analysis see\nSection 4.3.3). Due to space limitations, the results using the NMI metric are reported in Table 2 of\nthe Supplementary Material. On NMI, DTCR also achieves the lowest average rank, 2.2500. We\nalso show the performance on the ACC metric in Tables 3 and 4 of the Supplementary Material.\nIn addition, following the YADING paper [19], a larger and more complex dataset (StarLightCurves:\n9236 samples, each of length 1024) is used for evaluation. We adopted the same metric\n(NMI) for direct comparison. As shown in Table 2, DTCR again achieves the best performance.\n\n2 https://github.com/saeeeeru/dtc-tensorflow\n3 https://github.com/piiswrong/dec\n4 https://github.com/XifengGuo/IDEC\n\nTable 1: Rand Index (RI) comparisons on 36 time series datasets (the values in parentheses are\nstandard deviations)\n\nDataset | K-means [37] | UDFS [6] | NDFS [7] | RUFS [8] | RSFS [9] | KSC [22] | KDBA [20] | k-shape [5] | u-shapelet [24] | DTC [25] | USSL [29] | DEC [26] | IDEC [27] | DTCR\nArrow | 0.6905 | 0.7254 | 0.7381 | 0.7476 | 0.7108 | 0.7254 | 0.7222 | 0.7254 | 0.6460 | 0.6692 | 0.7159 | 0.5817 | 0.6210 | 0.6868(0.0026)\nBeef | 0.6713 | 0.6759 | 0.7034 | 0.7149 | 0.6975 | 0.7057 | 0.6713 | 0.5402 | 0.6966 | 0.6345 | 0.6966 | 0.5954 | 0.6276 | 0.8046(0.0018)\nBeetleFly | 0.4789 | 0.4949 | 0.5579 | 0.6053 | 0.6516 | 0.6053 | 0.6052 | 0.6053 | 0.7314 | 0.5211 | 0.8105 | 0.4947 | 0.6053 | 0.9000(0.0001)\nBirdChicken | 0.4947 | 0.4947 | 0.7316 | 0.5579 | 0.6632 | 0.7316 | 0.6053 | 0.6632 | 0.5579 | 0.4947 | 0.8105 | 0.4737 | 0.4789 | 0.8105(0.0033)\nCar | 0.6345 | 0.6757 | 0.6260 | 0.6667 | 0.6708 | 0.6898 | 0.6254 | 0.7028 | 0.6418 | 0.6695 | 0.7345 | 0.6859 | 0.6870 | 0.7501(0.0022)\nchlorineConcentration | 0.5241 | 0.5282 | 0.5225 | 0.5330 | 0.5316 | 0.5256 | 0.5300 | 0.4111 | 0.5318 | 0.5353 | 0.4997 | 0.5348 | 0.5350 | 0.5357(0.0011)\ncoffee | 0.7460 | 0.8624 | 1.0000 | 0.5476 | 1.0000 | 1.0000 | 0.4851 | 1.0000 | 1.0000 | 0.4841 | 1.0000 | 0.4921 | 0.5767 | 0.9286(0.0016)\ndiatomsizeReduction | 0.9583 | 0.9583 | 0.9583 | 0.9333 | 0.9137 | 1.0000 | 0.9583 | 1.0000 | 0.7083 | 0.8792 | 1.0000 | 0.9294 | 0.7347 | 0.9682(0.0032)\ndist.phal.outl.agegroup | 0.6171 | 0.6531 | 0.6239 | 0.6252 | 0.6539 | 0.6535 | 0.6750 | 0.6020 | 0.6273 | 0.7812 | 0.6650 | 0.7785 | 0.7786 | 0.7825(0.0008)\ndist.phal.outl.correct | 0.5252 | 0.5362 | 0.5362 | 0.5252 | 0.5327 | 0.5235 | 0.5203 | 0.5252 | 0.5098 | 0.5010 | 0.5962 | 0.5029 | 0.5330 | 0.6075(0.0024)\nECG200 | 0.6315 | 0.6533 | 0.6315 | 0.7018 | 0.6916 | 0.6315 | 0.6018 | 0.7018 | 0.5758 | 0.6018 | 0.7285 | 0.6422 | 0.6233 | 0.6648(0.0034)\nECGFiveDays | 0.4783 | 0.5020 | 0.5573 | 0.5020 | 0.5953 | 0.5257 | 0.5573 | 0.5020 | 0.5968 | 0.5016 | 0.8340 | 0.5103 | 0.5114 | 0.9638(0.0032)\nGunPoint | 0.4971 | 0.5029 | 0.5102 | 0.6498 | 0.4994 | 0.4971 | 0.5420 | 0.6278 | 0.6278 | 0.5400 | 0.7257 | 0.4981 | 0.4974 | 0.6398(0.0011)\nHam | 0.5025 | 0.5219 | 0.5362 | 0.5107 | 0.5127 | 0.5362 | 0.5141 | 0.5311 | 0.5362 | 0.5648 | 0.6393 | 0.5963 | 0.4956 | 0.5362(0.0035)\nHerring | 0.4965 | 0.5099 | 0.5164 | 0.5238 | 0.5151 | 0.4940 | 0.5164 | 0.4965 | 0.5417 | 0.5045 | 0.6190 | 0.5099 | 0.5099 | 0.5759(0.0017)\nLighting2 | 0.4966 | 0.5119 | 0.5373 | 0.5729 | 0.5269 | 0.6263 | 0.5119 | 0.6548 | 0.5192 | 0.5770 | 0.6955 | 0.5311 | 0.5519 | 0.5913(0.0016)\nMeat | 0.6595 | 0.6483 | 0.6635 | 0.6578 | 0.6657 | 0.6723 | 0.6816 | 0.6575 | 0.6742 | 0.3220 | 0.7740 | 0.6475 | 0.6220 | 0.9763(0.0016)\nMid.phal.outl.agegroup | 0.5351 | 0.5269 | 0.5350 | 0.5315 | 0.5473 | 0.5364 | 0.5513 | 0.5105 | 0.5396 | 0.5757 | 0.5807 | 0.7059 | 0.6800 | 0.7982(0.0028)\nMid.phal.outl.correct | 0.5000 | 0.5431 | 0.5047 | 0.5114 | 0.5149 | 0.5014 | 0.5563 | 0.5114 | 0.5218 | 0.5272 | 0.6635 | 0.5423 | 0.5423 | 0.5617(0.0006)\nMid.phal.TW | 0.0983 | 0.1225 | 0.1919 | 0.7920 | 0.8062 | 0.8187 | 0.8046 | 0.6213 | 0.7920 | 0.7115 | 0.7920 | 0.8590 | 0.8626 | 0.8638(0.0007)\nMoteStrain | 0.4947 | 0.5579 | 0.6053 | 0.5579 | 0.6168 | 0.6632 | 0.4789 | 0.6053 | 0.4789 | 0.5062 | 0.8105 | 0.7435 | 0.7324 | 0.7686(0.0036)\nOSULeaf | 0.5615 | 0.5372 | 0.5622 | 0.5497 | 0.5665 | 0.5714 | 0.5541 | 0.5538 | 0.5525 | 0.7329 | 0.6551 | 0.7484 | 0.7607 | 0.7739(0.0014)\nPlane | 0.9081 | 0.8949 | 0.8954 | 0.9220 | 0.9314 | 0.9603 | 0.9225 | 0.9901 | 1.0000 | 0.9040 | 1.0000 | 0.9447 | 0.9447 | 0.9549(0.0037)\nProx.phal.outl.ageGroup | 0.5288 | 0.4997 | 0.5463 | 0.5780 | 0.5384 | 0.5305 | 0.5192 | 0.5617 | 0.5206 | 0.7430 | 0.7939 | 0.4263 | 0.8091 | 0.8091(0.0038)\nProx.phal.TW | 0.4789 | 0.4947 | 0.6053 | 0.5579 | 0.5211 | 0.6053 | 0.5211 | 0.5211 | 0.4789 | 0.8380 | 0.7282 | 0.8189 | 0.9030 | 0.9023(0.0023)\nSonyAIBORobotSurface | 0.7721 | 0.7695 | 0.7721 | 0.7787 | 0.7928 | 0.7726 | 0.7988 | 0.8084 | 0.7639 | 0.5563 | 0.8105 | 0.5732 | 0.6900 | 0.8769(0.0033)\nSonyAIBORobotSurfaceII | 0.8697 | 0.8745 | 0.8865 | 0.8756 | 0.8948 | 0.9039 | 0.8684 | 0.5617 | 0.8770 | 0.7012 | 0.8575 | 0.6514 | 0.6572 | 0.8354(0.0016)\nSwedishLeaf | 0.4987 | 0.4923 | 0.5500 | 0.5192 | 0.5038 | 0.4923 | 0.5500 | 0.5333 | 0.6154 | 0.8871 | 0.8547 | 0.8837 | 0.8893 | 0.9223(0.0021)\nSymbols | 0.8810 | 0.8548 | 0.8562 | 0.8525 | 0.9060 | 0.8982 | 0.9774 | 0.8373 | 0.9603 | 0.9053 | 0.9200 | 0.8841 | 0.8857 | 0.9168(0.0022)\nToeSegmentation1 | 0.4873 | 0.4921 | 0.5873 | 0.5429 | 0.4968 | 0.5000 | 0.6143 | 0.6143 | 0.5873 | 0.5077 | 0.6718 | 0.4984 | 0.5017 | 0.5659(0.0006)\nToeSegmentation2 | 0.5257 | 0.5257 | 0.5968 | 0.5968 | 0.5826 | 0.5257 | 0.5573 | 0.5257 | 0.5020 | 0.5348 | 0.6778 | 0.4991 | 0.4991 | 0.8286(0.0028)\nTwoPatterns | 0.8529 | 0.8259 | 0.8530 | 0.8385 | 0.8588 | 0.8585 | 0.8446 | 0.8046 | 0.7757 | 0.6251 | 0.8318 | 0.6293 | 0.6338 | 0.6984(0.0025)\nTwoLeadECG | 0.5476 | 0.5495 | 0.6328 | 0.8246 | 0.5635 | 0.5464 | 0.5476 | 0.8246 | 0.5404 | 0.5116 | 0.8628 | 0.5007 | 0.5016 | 0.7114(0.0014)\nwafer | 0.4925 | 0.4925 | 0.5263 | 0.5263 | 0.4925 | 0.4925 | 0.4925 | 0.4925 | 0.4925 | 0.5324 | 0.8246 | 0.5679 | 0.5597 | 0.7338(0.0006)\nWine | 0.4984 | 0.4987 | 0.5123 | 0.5021 | 0.5033 | 0.5006 | 0.5064 | 0.5001 | 0.5033 | 0.4906 | 0.8985 | 0.4913 | 0.5157 | 0.6271(0.0039)\nWordsSynonyms | 0.8775 | 0.8697 | 0.8760 | 0.8861 | 0.8817 | 0.8727 | 0.8159 | 0.7844 | 0.8230 | 0.8855 | 0.8540 | 0.8893 | 0.8947 | 0.8984(0.0003)\nAVG Rank | 10.6667 | 9.6806 | 7.2222 | 7.3889 | 6.8750 | 7.1389 | 7.9167 | 8.2361 | 8.2500 | 8.8194 | 3.5000 | 8.6528 | 7.5833 | 3.0694\nAVG RI | 0.5975 | 0.6077 | 0.6402 | 0.6478 | 0.6542 | 0.6582 | 0.6335 | 0.6419 | 0.6402 | 0.6238 | 0.7676 | 0.6351 | 0.6515 | 0.7714\nBest | 0 | 0 | 1 | 1 | 1 | 3 | 2 | 0 | 1 | 0 | 12 | 0 | 1 | 17\np-value | 2.089E-6 | 4.8823E-6 | 3.4131E-5 | 5.7729E-5 | 4.1222E-5 | 1.3545E-4 | 1.2565E-5 | 1.4814E-4 | 3.4141E-5 | 3.0287E-7 | 9.7386E-1 | 8.7697E-07 | 3.2916E-7 | -\n\nTable 2: Normalized Mutual Information (NMI) comparisons on StarLightCurves\n\nDataset | YADING | DEC | IDEC | DTC | DTCR\nStarLightCurves | 0.6000 | 0.6058 | 0.6056 | 0.6072 | 0.6731\n\n4.2 Ablation Study\nTo verify the effectiveness of $L_{\\text{K-means}}$ and $L_{classification}$, here we show a comparison between\nthe full DTCR model and its two ablation models: 1) DTCR without K-means loss; and 2) DTCR\nwithout the auxiliary classification loss (w/o classification loss).\n\nTable 3: Rand Index (RI) ablation study results of DTCR\n\nNo. | Dataset | w/o K-means | w/o classification | DTCR\n1 | Arrow | 0.5698 | 0.5980 | 0.6868\n2 | Beef | 0.6497 | 0.7352 | 0.8046\n3 | BeetleFly | 0.6053 | 0.6305 | 0.9000\n4 | BirdChicken | 0.4821 | 0.5600 | 0.8105\n5 | Car | 0.6688 | 0.6610 | 0.7501\n6 | chlorineConcentration | 0.5004 | 0.5341 | 0.5357\n7 | coffee | 0.5434 | 0.6672 | 0.9286\n8 | diatomsizeReduction | 0.7851 | 0.8892 | 0.9682\n9 | dist.phal.outl.agegroup | 0.7780 | 0.7775 | 0.7825\n10 | dist.phal.outl.correct | 0.5051 | 0.5056 | 0.6075\n11 | ECG200 | 0.5412 | 0.6064 | 0.6648\n12 | ECGFiveDays | 0.5623 | 0.6970 | 0.9638\n13 | GunPoint | 0.4969 | 0.5589 | 0.6398\n14 | Ham | 0.5040 | 0.5330 | 0.5362\n15 | Herring | 0.4967 | 0.5173 | 0.5759\n16 | Lighting2 | 0.5554 | 0.5626 | 0.5913\n17 | Meat | 0.7181 | 0.8245 | 0.9763\n18 | Mid.phal.outl.agegroup | 0.7923 | 0.7981 | 0.7982\n19 | Mid.phal.outl.correct | 0.5033 | 0.5137 | 0.5617\n20 | Mid.phal.TW | 0.8620 | 0.8625 | 0.8638\n21 | MoteStrain | 0.7239 | 0.7121 | 0.7686\n22 | OSULeaf | 0.7314 | 0.7416 | 0.7739\n23 | Plane | 0.9409 | 0.9530 | 0.9549\n24 | Prox.phal.outl.ageGroup | 0.7922 | 0.8004 | 0.8091\n25 | Prox.phal.TW | 0.8359 | 0.8549 | 0.9023\n26 | SonyAIBORobotSurface | 0.7702 | 0.7561 | 0.8769\n27 | SonyAIBORobotSurfaceII | 0.6332 | 0.7069 | 0.8354\n28 | SwedishLeaf | 0.9047 | 0.9107 | 0.9223\n29 | Symbols | 0.9043 | 0.8989 | 0.9168\n30 | ToeSegmentation1 | 0.4993 | 0.5598 | 0.5659\n31 | ToeSegmentation2 | 0.6012 | 0.6878 | 0.8286\n32 | TwoPatterns | 0.6650 | 0.6537 | 0.6984\n33 | TwoLeadECG | 0.5262 | 0.5316 | 0.7114\n34 | wafer | 0.5322 | 0.5900 | 0.7338\n35 | Wine | 0.5159 | 0.5642 | 0.6271\n36 | WordsSynonyms | 0.8891 | 0.8920 | 0.8984\n\n
Table 3 shows that the full DTCR\noutperforms both of its ablations on every dataset, demonstrating the effectiveness of $L_{\\text{K-means}}$ and\n$L_{classification}$.\n\n4.3 Visualization Analysis\n\nThrough visualization, we analyze the benefits of the cluster-specific representations and illustrate\nthe robustness of our model even if K-means makes mistakes. In all of the following experiments, we\nuse t-SNE [38] to map the learned representations into 2D and plot them.\n\n4.3.1 Contribution of Each Loss\n\nTo explore the effectiveness of cluster-specific representations, we visualize the representations\nlearned by DTCR and two of its ablations on datasets ECGFiveDays and SonyAIBORobotSurface.\nAs shown in Figure 2, the representations learned by DTCR form 2 clusters\ndespite a small amount of mixing. In contrast, the representations of DTCR without the K-means loss show no\ncluster shape and also contain a small amount of mixing. As for DTCR without the classification loss,\nthe representations are also mixed, which verifies the importance of the ability of the encoder.\n\n4.3.2 The Process of Learning Representations\n\nTo better understand how DTCR learns the cluster-specific representations, we visualize its learning\nprocess. As shown in Figure 3, the representations at the beginning are scattered and chaotic. At\nEpoch 30, the prototype of 2 clusters has been formed. At Epoch 50, a well learned cluster-specific\nrepresentation is established, with a small intra-class distance and a large inter-class\ndistance. This experiment has been conducted on all other datasets and the same experimental results\nwere obtained. We report these in Table 3 of the Supplementary Material.\n\n4.3.3 Robustness Analysis\n\nSince our model uses the information provided by K-means, what if K-means makes mistakes?\n
Here\nwe argue that our model is capable of correcting such mistakes with the help of L_{reconstruction} and verify\nthis point experimentally.\n\nNote that DTCR has 3 loss terms: L_{reconstruction}, L_{K-means} and L_{classification}. We can disrupt\nL_{K-means} while retaining either L_{reconstruction} or L_{classification} to figure out which one plays the\nmore important role in preventing the model from being misled by K-means. The detailed settings are as follows. First,\nDTCR is trained with all loss terms for 50 epochs, and we plot the learned representations as the initial\nstate. Then, we randomly shuffle the cluster indicator matrix F (disrupting L_{K-means})\nwhile retaining only one of L_{reconstruction} or L_{classification}, and train for 50 epochs. We plot\nthe learned representations as the intermediate state. Finally, we put the missing loss term back and\ntrain DTCR with all loss terms for another 50 epochs, obtaining the final state.\n\n(a) ECGFiveDays\n\n(b) SonyAIBORobotSurface\n\nFigure 2: The visualizations with t-SNE on the datasets (a) ECGFiveDays and (b) SonyAIBORobotSurface.\nThe panels compare DTCR without the K-means loss, DTCR without the classification loss, and the full DTCR model.\nThe colors of the points indicate the actual labels.\n\n(a) Epoch 0\n\n(b) Epoch 30\n\n(c) Epoch 50\n\nFigure 3: The learned representations on the dataset ECGFiveDays during the training process. From\nleft to right, the subfigures are obtained at Epoch 0, 30 and 50, respectively.\n\nAs shown in the first row of Figure 4, when we train DTCR with only the shuffled K-means loss\nand the classification loss, the wrong clustering information does mislead the learning: the RI decreases\nand the representations become mixed (Fig. 4 (b)). However, once L_{reconstruction} is added back,\nthe RI improves, indicating that the learning of the model has been corrected (Fig. 4 (c)). Similarly, we repeat\nthe procedure to check what happens without L_{classification}. As the second row of Figure 
4 shows, even\nwithout L_{classification} but with the help of L_{reconstruction}, the RI still improves and the representations are less mixed\n(Fig. 4 (e)). Finally, putting L_{classification} back improves the RI further (Fig. 4 (f)). Comparing Fig. 4 (b) with (e),\nit is clear that L_{reconstruction} enables our model to correct mistakes. Comparing Fig. 4 (c) with (f)\nshows that the earlier and longer L_{reconstruction} is used, the stronger the ability to avoid being\nmisled by K-means, and thus the higher the RI. Note that L_{reconstruction} is only put back in\nFig. 4 (c) (trained for 50 epochs), while it is used in both Fig. 4 (e) and (f) (trained for 100 epochs). The\nrobustness analysis has been conducted on all other datasets and shows the same results (see the\ndetails in Section E of the Supplementary Material).\n\n
(a) Initial state (RI = 0.5498)\n\n(b) Intermediate state with only the shuffled K-means and classification losses (RI = 0.5062)\n\n(c) Final state, putting the reconstruction loss back (RI = 0.6548)\n\n(d) Initial state (RI = 0.5498)\n\n(e) Intermediate state with only the shuffled K-means and reconstruction losses (RI = 0.6165)\n\n(f) Final state, putting the classification loss back (RI = 0.7026)\n\nFigure 4: Robustness Analysis of DTCR on SonyAIBORobotSurface. Note that (d) is the same as\n(a), replicated here for better illustration; hence the first and second rows start from the same state.\n\n5 Conclusion\n\nIn this paper, we propose a novel model called Deep Temporal Clustering Representation (DTCR)\nthat effectively generates cluster-specific representations. We integrate the temporal reconstruction\nand K-means objective into the seq2seq model, enabling the learned representations to encode the\ntime series and to form cluster structures. Moreover, a fake-sample generation strategy for time series\nand an auxiliary classification task are proposed to enhance the ability of the encoder. 
The extensive\nexperimental results verify the effectiveness of the proposed method. Furthermore, we provide\na visualization analysis to demonstrate the advantages of the cluster-specific representations and\nto show that the learning process is robust even if K-means makes mistakes. How to extend our clustering\nframework to time series with missing values is left for future work.\n\nAcknowledgments\n\nWe thank the anonymous reviewers for their helpful feedback. The work described in this paper\nwas partially funded by the National Natural Science Foundation of China (Grant Nos. 61502174,\n61872148), the Natural Science Foundation of Guangdong Province (Grant Nos. 2017A030313355,\n2017A030313358, 2019A1515010768), and the Guangzhou Science and Technology Planning Project\n(Grant Nos. 201704030051, 201902010020).\n\nReferences\n\n[1] Andr\u00e9 Fujita, Patricia Severino, Kaname Kojima, Jo\u00e3o Ricardo Sato, Alexandre Galv\u00e3o Patriota,\nand Satoru Miyano. Functional clustering of time series gene expression data by Granger\ncausality. BMC Systems Biology, 6(1):137, 2012.\n\n[2] Philip K Chan and Matthew V Mahoney. Modeling multiple time series for anomaly detection.\nIn Fifth IEEE International Conference on Data Mining (ICDM\u201905), pages 8\u2013pp. 
IEEE, 2005.\n\n[3] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah. Time-series clustering\u2013a\ndecade review. Information Systems, 53:16\u201338, 2015.\n\n[4] Leonardo N Ferreira and Liang Zhao. Time series clustering via community detection in\nnetworks. Information Sciences, 326:227\u2013242, 2016.\n\n[5] John Paparrizos and Luis Gravano. k-shape: Efficient and accurate clustering of time series.\nIn Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data,\npages 1855\u20131870. ACM, 2015.\n\n[6] Yi Yang, Heng Tao Shen, Zhigang Ma, Zi Huang, and Xiaofang Zhou. L2,1-norm regularized\ndiscriminative feature selection for unsupervised. In Twenty-Second International Joint\nConference on Artificial Intelligence, 2011.\n\n[7] Zechao Li, Yi Yang, Jing Liu, Xiaofang Zhou, and Hanqing Lu. Unsupervised feature selection\nusing nonnegative spectral analysis. In Twenty-Sixth AAAI Conference on Artificial Intelligence,\n2012.\n\n[8] Mingjie Qian and Chengxiang Zhai. Robust unsupervised feature selection. 
In Twenty-Third\nInternational Joint Conference on Artificial Intelligence, 2013.\n\n[9] Lei Shi, Liang Du, and Yi-Dong Shen. Robust spectral learning for unsupervised feature\nselection. In 2014 IEEE International Conference on Data Mining, pages 977\u2013982. IEEE, 2014.\n\n[10] Huang Lei, Yingcun Xia, Xu Qin, et al. Estimation of semivarying coefficient time series\nmodels with ARMA errors. The Annals of Statistics, 44(4):1618\u20131660, 2016.\n\n[11] Zongwu Cai, Jianqing Fan, and Qiwei Yao. Functional-coefficient regression models for\nnonlinear time series. Journal of the American Statistical Association, 95(451):941\u2013956, 2000.\n\n[12] Dag Tj\u00f8stheim and Bj\u00f8rn H Auestad. Nonparametric identification of nonlinear time series:\nprojections. Journal of the American Statistical Association, 89(428):1398\u20131409, 1994.\n\n[13] Qianli Ma, Sen Li, Lifeng Shen, Jiabing Wang, Jia Wei, Zhiwen Yu, and Garrison W Cottrell.\nEnd-to-end incomplete time-series modeling from linear memory of latent variables. IEEE\nTransactions on Cybernetics, 2019.\n\n[14] Lajanugen Logeswaran and Honglak Lee. An efficient framework for learning sentence\nrepresentations. arXiv preprint arXiv:1803.02893, 2018.\n\n[15] Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio\nTorralba, and Sanja Fidler. Skip-thought vectors. In Advances in Neural Information Processing\nSystems, pages 3294\u20133302, 2015.\n\n[16] Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, and Lawrence Carin.\nUnsupervised learning of sentence representations using convolutional neural networks. arXiv\npreprint arXiv:1611.07897, 2016.\n\n[17] Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael\nWitbrock, Mark A Hasegawa-Johnson, and Thomas S Huang. 
Dilated recurrent neural networks.\nIn Advances in Neural Information Processing Systems 30, pages 77\u201387, 2017.\n\n[18] Viresh Ranjan, Heeyoung Kwon, Niranjan Balasubramanian, and Minh Hoai. Fake sentence\ndetection as a training task for sentence encoding. arXiv preprint arXiv:1808.03840, 2018.\n\n[19] Rui Ding, Qiang Wang, Yingnong Dang, Qiang Fu, Haidong Zhang, and Dongmei Zhang.\nYading: fast clustering of large-scale time series data. Proceedings of the VLDB Endowment,\n8(5):473\u2013484, 2015.\n\n[20] Fran\u00e7ois Petitjean, Alain Ketterlin, and Pierre Gan\u00e7arski. A global averaging method for\ndynamic time warping, with applications to clustering. Pattern Recognition, 44(3):678\u2013693,\n2011.\n\n[21] Lawrence R Rabiner, Biing-Hwang Juang, and Janet C Rutledge. Fundamentals of speech\nrecognition, volume 14. PTR Prentice Hall Englewood Cliffs, 1993.\n\n[22] Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. In Proceedings\nof the Fourth ACM International Conference on Web Search and Data Mining, pages 177\u2013186.\nACM, 2011.\n\n[23] Chonghui Guo, Hongfeng Jia, and Na Zhang. Time series clustering based on ICA for stock data\nanalysis. In 2008 4th International Conference on Wireless Communications, Networking and\nMobile Computing, pages 1\u20134. IEEE, 2008.\n\n[24] Jesin Zakaria, Abdullah Mueen, and Eamonn Keogh. Clustering time series using unsupervised-\nshapelets. In 2012 IEEE 12th International Conference on Data Mining, pages 785\u2013794. IEEE,\n2012.\n\n[25] Naveen Sai Madiraju, Seid M Sadat, Dimitry Fisher, and Homa Karimabadi. Deep temporal\nclustering: Fully unsupervised learning of time-domain features. arXiv preprint arXiv:1802.01059,\n2018.\n\n[26] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering\nanalysis. In International Conference on Machine Learning, pages 478\u2013487, 2016.\n\n[27] Xifeng Guo, Long Gao, Xinwang Liu, and Jianping Yin. 
Improved deep embedded clustering\nwith local structure preservation. In IJCAI, pages 1753\u20131759, 2017.\n\n[28] Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, and Horst D Simon. Spectral relaxation for\nk-means clustering. In Advances in Neural Information Processing Systems, pages 1057\u20131064,\n2002.\n\n[29] Qin Zhang, Jia Wu, Peng Zhang, Guodong Long, and Chengqi Zhang. Salient subsequence\nlearning for time series clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence,\n2018.\n\n[30] Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen,\nand Gustavo Batista. The UCR time series classification archive, July 2015.\nwww.cs.ucr.edu/~eamonn/time_series_data/.\n\n[31] Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares,\nHolger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN\nencoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.\n\n[32] Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu\nDevin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for\nlarge-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and\nImplementation (OSDI 16), pages 265\u2013283, 2016.\n\n[33] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint\narXiv:1412.6980, 2014.\n\n[34] William M Rand. Objective criteria for the evaluation of clustering methods. Journal of the\nAmerican Statistical Association, 66(336):846\u2013850, 1971.\n\n[35] Hui Zhang, Tu Bao Ho, Yang Zhang, and Mao Song Lin. Unsupervised feature extraction for\ntime series clustering using orthogonal wavelet transform. Informatica, 30(3):305\u2013319, 2006.\n\n[36] Janez Dem\u0161ar. Statistical comparisons of classifiers over multiple data sets. 
Journal of Machine\nLearning Research, 7(Jan):1\u201330, 2006.\n\n[37] John A Hartigan and Manchek A Wong. Algorithm AS 136: A k-means clustering algorithm.\nJournal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100\u2013108, 1979.\n\n[38] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine\nLearning Research, 9(Nov):2579\u20132605, 2008.", "award": [], "sourceid": 2058, "authors": [{"given_name": "Qianli", "family_name": "Ma", "institution": "South China University of Technology"}, {"given_name": "Jiawei", "family_name": "Zheng", "institution": "South China University of Technology"}, {"given_name": "Sen", "family_name": "Li", "institution": "South China University of Technology"}, {"given_name": "Gary", "family_name": "Cottrell", "institution": "UCSD"}]}