| Paper ID: | 642 |
|---|---|
| Title: | Region-specific Diffeomorphic Metric Mapping |

The paper is well written, although *very* dense, both in its mathematical prerequisites and development and in its use of space. It is *not* an easy read (unless, I suppose, you are fluent in the LDDMM background). I think the authors could improve this to help the paper reach a broader audience, though perhaps they are not interested in this; I'm not sure. As it stands, it is a *bit* hard to evaluate because it is so condensed. I believe this is a technically clean contribution with a clear advancement. I do have a few concerns, but am happy to read and evaluate a rebuttal. The more serious concerns are with the experiments, below.

General conceptual concerns:

- In general, the usefulness of spatially varying regularization is confusing to me. I know there is literature towards this end, but in some sense regularization can be seen as a prior, and a prior is devoid of effects from data. From a Bayesian perspective, wouldn't it be up to the data to affect the deformation such that it "overcomes" the prior when the signal is strong enough? Other than technical curiosity, could the authors argue the need for this -- or make it clear that it's a technical exercise?
- Along the same lines, while following all of the LDDMM development, I've never seen a practical example where the time-dependent velocities/momenta provide any advantage over stationary ones, but perhaps the authors could provide an example (since their model is time-dependent). It's also fine if this is a technical exercise, but it would be great if the authors were up-front about this.
- I don't fully follow how the velocity field is bounded at the boundaries of segmentations (line 113), and in fact the experiments seem to have more folding voxels there (as the authors state). Can the authors explain whether this is indeed true?
In fact, it's unclear to me why invertibility needs to be maintained -- wouldn't this method be applicable in situations where there are strong changes along edges and smoothness and invertibility are not necessary?
- I am a bit confused why the symmetric loss (169) is necessary -- wouldn't the fact that the deformations are invertible guarantee this property? Is the loss aimed at regularizing numerical issues, in a sense? When using the DL frameworks, the numerical approximation should be good enough to avoid this issue.
- Finally, I am a bit unclear if this belongs in NeurIPS. I believe this could be a very appropriate (and strong) contribution in a place like MICCAI or even directly to IEEE-TMI or similar. I am not sure, however, how the development applies to NeurIPS -- there are minimal contributions to learning, neuroscience or optimization, and the main contribution is the extension/improvement to LDDMM theory, which would probably fit better in differential geometry or, as the authors state, fluid mechanics. However, perhaps I am misinterpreting the growing encapsulation of NeurIPS -- I welcome an author rebuttal and Meta-Reviewer override on this aspect.

Experimental concerns:

- For the DL approaches: In the reproducibility checklist, there are questions about ensuring that the data is properly split into train/validate/test sets, and that there is sufficient description of the motivation for hyper-parameter choices. These questions are really important in ML -- without a separate validation set (separate from the test set, which should be held out), for example, it is very likely that the hyper-parameters are over-fit. While the authors responded 'yes' to these questions, these aspects are missing from the work -- as far as I can tell, there is no validation/test set discussion.
I understand that this is registration and some of these methods are unsupervised, but it's been shown that there is a difference between train and test set performance of these networks depending on the amount of training data. What datasets are the results on? Why does this differ from the checklist?
- To me, the results are very similar to the baselines -- e.g. LDDMM and the DL methods. While the authors bold their results, as far as I can see the results are essentially the same -- if the authors disagree, can they provide a statistical significance test? If not, the practical advantage/utility of the method seems quite limited -- RDMM shows more folds (minimal, but more), and the maps they produce are, I would argue, only minimally informative -- in fact, I suspect that if the deformation fields themselves are visualized in image-based ways, the same or similar features can also be seen. Perhaps the authors can explain how I am wrong in this regard -- but it does seem that overall, the results are comparable (in accuracy, regularity (folds), runtime, and insight (if visualizing the deformation)). Unless there is a strong counter-argument to this, I would have liked to see a clearer write-up essentially saying: "Look, our contribution is mainly theoretical improvement, and these experiments are mostly to validate that we don't break anything compared to current approaches. We don't mean that this by itself will be useful, but rather that it will provide theoretical grounding for the future" -- I think this would make it clearer what the setting of the paper is. As it stands, it seems like at times the authors try to declare that RDMM is better in some ways, but I personally don't observe this. Again, I'm happy to read a rebuttal to this.
- Less important -- I'm curious why the authors didn't apply this work to the brain, which is probably the most common use case that I've seen for LDDMM, and which would fit well with NeurIPS since there is an entire neuroscience sub-community.

Minor:

- Please be careful about the order of introducing mathematical definitions. For example, you define <.,.> as inner product after having used it several times already.
- Please specify where in the suppl. Material to refer to for each part where you say "(see suppl. Material)". It's very hard to track otherwise.
- Please cite papers correctly -- [10] is MICCAI, [11] is DLMIA, [17] is ICLR, [23] and [33] are CVPR, etc.
- Considering that NeurIPS now accepts code at submission, stating "our code will be open-sourced" is peculiar (and if anything, raises skepticism for me). All papers can make this empty promise -- if you'd like that to be a bonus to help your paper, include it in the submission.

* update after author response: Thank you for the thorough response. Overall, I don't feel like I should change my overall score -- I *do* certainly still feel that the paper should be accepted, and will argue for this in the discussion, but currently feel that a 7 is a fair assessment. I believe the authors mainly addressed the train/validate/test split issue, but several concerns still stand -- the answers for several of them were a bit too short for me to understand how they are really addressed (e.g. the symmetry question).
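
To make the request for a statistical significance test in the experimental concerns concrete, here is a minimal sketch of one option: a paired sign-flip permutation test on per-case scores. This is only an illustration under stated assumptions, not the authors' evaluation protocol. The function name `paired_permutation_test` and the Dice values are hypothetical and made up for this example; a Wilcoxon signed-rank test on the same per-case differences would be an equally reasonable choice.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on per-case differences.

    Returns an estimated p-value for the null hypothesis that the two
    methods are exchangeable on each test case.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs one score per case for each method")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is arbitrary,
        # so flip each one independently with probability 1/2.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-case Dice scores, invented purely for illustration.
dice_rdmm = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85]
dice_lddmm = [0.80, 0.79, 0.83, 0.81, 0.82, 0.78, 0.81, 0.84]

p_value = paired_permutation_test(dice_rdmm, dice_lddmm)
print(f"estimated p-value: {p_value:.3f}")
```

The test is paired because both methods are evaluated on the same held-out cases, so only the per-case differences matter. Reporting such a p-value alongside the bolded table entries would show whether the differences exceed case-to-case variability.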

The material of this submission is substantial. Unfortunately, the NeurIPS template is short and the authors have deferred many details to the appendix. Splitting the paper in two parts has a drawback: the reading is not as smooth as it could be in a single file, as the Appendix now contains large technical blocks (with little text), which makes it tedious to read. The paper contains few typos, and the proof seems correct to me, although I did not check everything line by line. Here are the minor typos I found:

Paper:

- lines 4-8 (Abstract): The sentence is too long. Please consider splitting it in two.
- page 2 (caption Fig 1): t=0 -> $t=0$ and t=1 -> $t=1$
- line 92: missing a space after w.r.t.
- lines 107 and 372: Is it possible to find a better notation than $K = \sum_i w_i K w_i$?
- equation 3.5: Please introduce the notation $\frac{\delta}{\delta I(1)}$.

Supplementary material:

- Equation 7.1: there is an extra | (vert) in the LHS.
- Equation 7.5: the dependence on $t$ is implicit in the RHS. Please make it explicit.
- line 421: euqation -> equation

The paper is, however, heavy on notation, which may hinder a clear reading. Examples include the lack of formal definitions for terms that are used, such as Reg(); the cramming of much notation into single lines (eq. 2.1-2.4); and heavy wording ("regularize the regularizer"). Perhaps for this reason, I missed an important step in the paper: how exactly are the initial momenta learned? Is this from ground-truth momenta, and if so, are they computed via a conventional gradient-descent optimization? If so, (a) how does the learning method compare with the learned method, and (b) how does the conventional optimization method compare with non-spatially-varying methods (LDDMMs, SVFs)? In the experiments, the proposed learning method is directly compared with conventional non-spatially-varying methods, which prevents the reader from appreciating the separate contributions of (a) the learning approach and (b) the spatially-varying regularization. In other words, is deep learning really adding value to the registration regularization (the contribution)? Spatially-varying methods have been studied in the medical imaging community, notably in the work on "Probabilistic non-linear registration with spatially adaptive regularization", MICCAI'13, MedIA'15. Other follow-up work should be mentioned and studied if the spatially-varying aspect is promoted in the submission.