Reviews: Diffeomorphic Temporal Alignment Nets

EDIT: I've read the rebuttal and have increased my score to 6. I still couldn't understand what you mean by "DBA/SDTW (but not DTAN) require test-data labels"; it would be great to explain that better in the paper. Make sure you define all symbols before you use them, and I'd recommend giving more background and context for your work in the introduction.
-----
This paper proposes a deep learning method for time-series alignment and averaging. The key tool is a diffeomorphism T that transforms an input time series U into a warped time series V. On the plus side, the paper was enjoyable to read, includes several nice visualizations, and presents quite convincing experiments (although UCR contains only univariate time series). On the negative side, I found that the notation was often not clearly defined, which made the proposed method hard to follow. For instance, AFAIK, \circ is never defined in the paper. The proposed method doesn't seem to handle variable-length time series, which I think is a very important feature. The technical methodology seems to follow [43] closely, although I don't think this is an issue if its use in a time-series context is novel. The proposed method has several hyper-parameters to tune, whereas DBA has none and SDTW only one. I will adjust my score after the authors have answered my questions below, in particular regarding notation, the ability to handle variable-length time series, and the computational cost of CPAB.

General comments
----------------
* The introduction jumps right into an equation. It would be better to provide some background and motivation for this work first.
* The criticism that DTW and SDTW can't handle test data seems unfair, as it suffices to recompute an alignment for the test series. It would be fairer to say that your approach is model-based, while DTW and SDTW are cast as an optimization problem.
* It seems like the proposed method cannot compute alignments between variable-length time series, but this was not entirely clear. Could you clarify?
* Likewise, does the proposed approach handle multivariate time series? The fact that DTW and SDTW are cast as an optimization problem allows them to deal with variable-length time series.
* Although it's probably not possible to describe it in full detail, it would be great to explain in more detail how to compute CPAB and its gradient. Could you also clarify the big-O complexity?
* Could you clarify how you choose the partition Omega in practice?

Detailed comments
------------------
* Line 191: It was not clear why you claim that DTW and SDTW require class information. As I see it, this is not true.
* Equations (1) and (2): what does \circ mean? It's not clear whether you mean element-wise multiplication between matrices or function composition. If they are matrices, please indicate the shapes of U_i, V_i and W_i.
* Instead of single-class vs. multiclass time series, I would refer to unlabeled vs. labeled time series.
* Line 64: an optimal *monotonic* alignment
* Line 66: The original DTW [41,42] is between a pair of time series. Since you give an O(K^N) cost, I think you mean the joint alignment of N time series. Please provide a reference for this setting.
* Equation (3): what is theta_i? Although its meaning can be inferred, it should be explicitly defined.
* Equation (3): Is the l.h.s. necessary? It is confusing, since it uses V_i while the r.h.s. uses (U_i, theta_i) (the signature of F is different on the l.h.s. and on the r.h.s.).
* Equation (4): Typo: phi^theta -> T^theta
* Line 157: I don't think you need to introduce the notation theta_i(U_i, w); theta_i = f_loc(U_i, w) is enough. Same for V_i.
* Line 159: To be more explicit, maybe write "V_i = ..., where theta_i depends on w, as defined above".
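To illustrate two of the points above (the meaning of \circ and pairwise DTW on variable-length series), here is a minimal NumPy sketch. It is not the paper's code. It assumes \circ denotes function composition, so V = U \circ T means V(t) = U(T(t)), computed here by resampling U at the warped time points T(t) with linear interpolation. Under the element-wise-multiplication reading, by contrast, U and T would have to be arrays of the same shape and nothing would be resampled in time. The helper names `warp_by_composition`, `dtw_distance` and `square_warp` are hypothetical, and the monotone map t -> t**2 merely stands in for the CPAB transformation T^theta used in the paper. The second function is textbook pairwise DTW restricted to monotonic steps: it compares series of lengths K and L directly, at O(K*L) cost per pair, which is why an optimization-based method can simply recompute alignments for new or variable-length data.

```python
import numpy as np


def warp_by_composition(u, T):
    """Return samples of V = U o T, i.e. V(t) = U(T(t)), on a uniform grid.

    u: samples of U at len(u) equally spaced points in [0, 1].
    T: callable mapping [0, 1] onto [0, 1], assumed monotonically increasing.
    """
    grid = np.linspace(0.0, 1.0, len(u))
    # Resample U at the warped time points T(t) by linear interpolation.
    return np.interp(T(grid), grid, u)


def dtw_distance(x, y):
    """Pairwise DTW between 1-D series x (length K) and y (length L).

    The (K+1) x (L+1) cumulative-cost table makes this O(K * L) per pair,
    and K need not equal L.
    """
    K, L = len(x), len(y)
    D = np.full((K + 1, L + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, K + 1):
        for j in range(1, L + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # squared local distance
            # Only monotonic steps are allowed: diagonal, down, or right.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[K, L]


def square_warp(t):
    """A simple monotone map of [0, 1] onto itself (stand-in for T^theta)."""
    return t ** 2


u = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 50))
v = warp_by_composition(u, square_warp)
print(v.shape)  # (50,)
print(dtw_distance(u, u[::2]))  # length-50 series vs. length-25 series
```

Extending the same dynamic program to a joint alignment of N series of length K would require an N-dimensional table with K^N cells, which matches the O(K^N) cost discussed in the Line 66 comment.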

I know little about time series, but the approach seems sound to me and is in line with recent approaches in image and shape analysis. I think the authors should look at recent approaches in 3D shape analysis that perform alignment via predicted deformations, such as Groueix, T., Fisher, M., Kim, V. G., Russell, B. C., & Aubry, M. (2018). 3D-CODED: 3D correspondences by deep deformation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 230-246). The philosophy of this type of approach is very similar to that of the paper, but the parametric deformation family T_\theta is learned by a neural network (an MLP) and thus could be better adapted to the signals. This would provide a nice comparison (i.e., CPAB performance vs. MLP performance). I was a bit disappointed by the evaluation, which relies only on NN classification performance for the real data. I am not sure whether there are better datasets that could be used. In any case, I think it would be good to analyse the results in much more detail, for example by correlating the performance or improvement of the proposed approach with the amount of training data available or the variety of the data.

=== post-rebuttal
I was quite disappointed by the rebuttal, which didn't address my comments, for reasons I didn't agree with. I strongly encourage the authors to improve their experiments. However, I still think the paper is sound and quite original, so I do not oppose acceptance, but I don't strongly support it either.

Paper ID:	3549
Title:	Diffeomorphic Temporal Alignment Nets
