Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Originality: The methods are not necessarily original - the approach is a fairly straightforward application of triplet embeddings from NLP to the time series context. Of course, they did need to map the context, positive, and negative examples into the time series setting, which they have done. I am curious, though, how the method would perform if the "context" and "positive" examples did not explicitly overlap. For example, if they were simply close to each other in time, either slightly overlapping or adjacent, how would this change the performance? Quality: The paper quality is mediocre overall, though I do like the idea and do want to see it published. Clarity: Can be improved, as the writing is poor at times. Significance: Embedding time series is an important problem. This paper does apply a useful technique to embed time series, which to the best of my knowledge has not been done. In that regard, this paper is significant, and the community does need to see these results. That said, there is only one comparison to an existing state-of-the-art unsupervised method in the main paper, DTW, so it is difficult to know how this performs in comparison to other embedding methods like seq2seq. Again, I like the ideas in the paper and think they could be very useful in many applied areas, but I'm not sure the current version of the paper is of NeurIPS quality.
Originality: The paper builds on several known ideas (dilated causal convolutions for the encoder, and similar triplet loss ideas from other domains), but the application to learning time series embeddings appears novel (and, as the empirical evaluation seems to demonstrate, effective). Quality: The proposed approach appears technically sound. The empirical evaluation is extensive and demonstrates several desirable attributes of the proposed embeddings. Clarity: The paper is mostly clear and easy to follow. Some of the details, in particular about the exact model architecture used for the experiments, are relegated to the supplementary material. I was surprised not to see a weighting factor (also depending on K) for the different terms in eq. (1), but looking at the code it seems one was actually used -- this should be described in the paper. It is also not quite clear how the hyperparameters for the experiments were chosen. Significance: The proposed technique, though fairly straightforward and making use of established techniques, appears novel and, especially given its encouraging results, could lead to further fruitful work in this direction.
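For context on the encoder building block mentioned above, here is a minimal single-channel sketch of a dilated causal convolution (a toy NumPy illustration of the operation, not the authors' actual multi-layer architecture; the function name and signature are my own):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ... (left zero-padding keeps the output
    the same length as the input, so no future values leak in)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

Stacking such layers with exponentially increasing dilations is what gives this encoder family a large receptive field over long sequences at modest depth.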
This paper proposes a model for unsupervised time series modeling. The model consists of an encoder (for subsequences of varying length), a sampling strategy for triplets of subsequences, and a loss function called the triplet loss. The three samples are a reference sequence, a subsequence of the reference called the positive sequence, and a negative sequence that bears no relation to the reference or positive sequences. The encoder maps each of the subsequences to its embedding, and the triplet loss ensures that the embeddings of the positive sample and the reference are 'similar' to each other, while the embeddings of the reference and the negative sample are 'dissimilar'. The claim is that the learned representations capture meaningful features of the time series. Representation learning is an active area of research, but somehow embeddings for time series data have not yet enjoyed as much attention in the research community. Time series embeddings are useful because they map time series data to fixed-length representations that can be used as input features for downstream tasks or for qualitative exploration of the data. In a large-scale empirical study, the authors evaluate the time series embeddings on different tasks: classification, classification with sparse labels, and 'forecasting'. Evaluating time series embeddings is notoriously hard, especially since quantifying their quality requires defining a task. But for a given task (e.g. classification) the best model is simply a supervised model. It is intuitively clear that good embeddings are also useful for other tasks (e.g. clustering of the time series), but again it is difficult to quantitatively 'prove' that a specific embedding method would be preferable. Large benchmark time series tasks are only available for classification and not for other tasks. The empirical study of this paper is quite extensive.
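To make the sampling scheme and loss described above concrete, here is a minimal NumPy sketch of the triplet construction and of a sigmoid-of-dot-product triplet loss of the kind the paper uses (the function names, the single-negative-per-series simplification, and drawing the negative from the same series are my own; in the paper, negatives are drawn from other series and there are K of them):

```python
import numpy as np

def sample_triplet(series, rng):
    """Sample (reference, positive, negative) subsequences: the positive
    is a subsequence of the reference; the negative is an unrelated
    subsequence (here, for simplicity, from the same series)."""
    T = len(series)
    ref_len = rng.integers(2, T + 1)
    ref_start = rng.integers(0, T - ref_len + 1)
    ref = series[ref_start:ref_start + ref_len]
    pos_len = rng.integers(1, ref_len + 1)
    pos_start = ref_start + rng.integers(0, ref_len - pos_len + 1)
    pos = series[pos_start:pos_start + pos_len]
    neg_len = rng.integers(1, T + 1)
    neg_start = rng.integers(0, T - neg_len + 1)
    neg = series[neg_start:neg_start + neg_len]
    return ref, pos, neg

def triplet_loss(z_ref, z_pos, z_negs):
    """Pull the reference and positive embeddings together, push the
    reference away from each negative embedding."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sigmoid(z_ref @ z_pos))
    for z_neg in z_negs:
        loss -= np.log(sigmoid(-(z_ref @ z_neg)))
    return loss
```

The loss is minimized when the reference/positive dot product is large and the reference/negative dot products are small, which is exactly the 'similar'/'dissimilar' behavior described above.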
I would have been curious to see performance on other tasks (though of course I understand there might be no benchmarks). I also would have been curious to see experiments that compare the embeddings of different encoders trained with the triplet loss. This might help understand how much of the performance is due to the encoder architecture and how much is due to the triplet loss being a good objective. Similarly, it would be interesting to see the performance of the proposed encoder trained on a different objective, e.g. autoencoding or the 'forecasting' task from Section 5.3. In Bagnall et al., many of the competitive methods are based on KNN (K-nearest neighbours). Have the authors tried doing the classification with KNN instead of an SVM? Are distances in the embedding space more useful than DTW? An advantage would be that one wouldn't need any class labels during training. I would rate the originality of the work as medium (the encoder architecture and triplet loss come from other work, but how to best combine them requires some thought), with significance medium to high, as time series embedding is an important topic. The work is of good quality (especially the extensive experiments in a domain that is hard to evaluate) and clearly presented.
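The KNN variant asked about above would be straightforward: classify each test series by majority vote among the k nearest training embeddings. A minimal sketch (a hypothetical baseline, not something from the paper; the function name is my own):

```python
import numpy as np

def knn_predict(train_emb, train_labels, test_emb, k=1):
    """Classify each test embedding by majority vote among its k nearest
    training embeddings, using Euclidean distance in embedding space."""
    # Pairwise distances: (n_test, n_train)
    dists = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nn_idx:
        votes = train_labels[row]
        vals, counts = np.unique(votes, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```

Since this only needs pairwise distances between embeddings, it would indeed require no class labels at representation-learning time, matching the advantage noted above.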