Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper is well written and presents a novel idea: pairwise image registration using RNNs that can be trained in an unsupervised manner. Registration performance is on par with the B-spline image registration method, and the approach is much faster than the latter. The authors have described their approach very well, and they provided convincing results as well as supplementary material that was helpful for the review.
1. The main advantage of this approach is its efficiency at inference time, with performance comparable to the B-spline based approach, where an optimization is needed per registration. It also has, according to the authors, far fewer parameters to optimize. Please confirm whether this understanding is correct.
2. What is the reason for choosing multiple steps to gradually transform the moving image into the fixed one? Could the local transformation be done in one step instead? For instance, the position network could directly predict K locations to transform in one step instead of predicting one location at each of K steps. What is the difference?
3. What is the reason for using a Gaussian to perform the local transformation? Are there any other choices?
4. It seems odd to me that the parameter network is not explicitly made aware of the decision made by the position network, as the two would have to collaborate to perform the image transformation. If the GRU needs to run for very many steps, the positions predicted early may not be relevant to what the parameter network produces at the end.
5. The chest MRI application is not properly motivated. Why is it important to solve such a task, and how hard is it to solve?
6. Judging from the visualizations included, the registration tasks seem to be quite easy to solve. I would like to see whether this model could be applied to more challenging tasks, such as registering longitudinal MR/CT studies of the same patient from different time points.
7. Perhaps this is not very clear in the text: if I have a point in the moving image, say (x, y), how do I derive its corresponding point (x', y') in the fixed image with the trained model?
8. Can you describe the principle of the B-spline method in more detail, as this is the only benchmark that you have compared against?
-----------
Post author feedback: It is very clear to me that the main novelty lies in achieving the speedup in inference while maintaining good accuracy.
The authors' response resolves most of my concerns around architecture choices and parameterization. There are still two issues for which I did not get satisfactory answers:
1. Why use multiple fixations instead of one? Why are there no experiments justifying such an important decision? All experiments use a fixed step size of 25; why is that?
2. Why is the application hard to solve, and how relevant is it to solve it at a certain accuracy? It is hard for me to grasp how the metric used relates to the clinical impact on patient outcomes.
Overall, I think there is novelty in the computational efficiency gain, but the experimental design and execution are rather weak.
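The contrast raised in question 2 (K sequential local steps versus K locations applied at once) and the point mapping asked about in question 7 can be made concrete with a minimal sketch. This is not the authors' implementation: the function names, the isotropic Gaussian weighting, the single shared `sigma`, and the forward (moving to fixed) direction of the mapping are all assumptions made for illustration only.

```python
import numpy as np

def gaussian_shift(points, center, shift, sigma):
    """Displacement per point: `shift` weighted by an isotropic Gaussian at `center`.

    Hypothetical helper; the paper's actual local transformation may differ.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    shift = np.asarray(shift, dtype=float)
    sq_dist = np.sum((points - center) ** 2, axis=-1, keepdims=True)
    weight = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return weight * shift

def map_sequential(points, centers, shifts, sigma):
    """Apply K local transformations one after another (composition).

    Each step sees the coordinates already moved by the previous steps.
    """
    out = np.asarray(points, dtype=float).copy()
    for center, shift in zip(centers, shifts):
        out = out + gaussian_shift(out, center, shift, sigma)
    return out

def map_one_shot(points, centers, shifts, sigma):
    """Apply all K local transformations at once (sum of displacements).

    Every displacement is evaluated at the original coordinates.
    """
    pts = np.asarray(points, dtype=float)
    total = np.zeros_like(pts)
    for center, shift in zip(centers, shifts):
        total += gaussian_shift(pts, center, shift, sigma)
    return pts + total

# Illustrative values only: one point, two local transformations.
points = [[10.0, 10.0]]
centers = [[10.0, 10.0], [14.0, 10.0]]
shifts = [[4.0, 0.0], [0.0, 3.0]]

seq = map_sequential(points, centers, shifts, sigma=2.0)
one = map_one_shot(points, centers, shifts, sigma=2.0)
```

In this toy case the sequential version first moves the point onto the second center, so the second shift applies at full weight, whereas the one-shot version evaluates the second Gaussian at the original position and applies only a fraction of that shift. Under these assumptions the two schemes are not equivalent in general, which is why an ablation over the number of steps would be informative.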
Overall, I think this is very nice work. The idea is clear and interesting, and the mathematics is well formulated. The only drawback is that the comparison experiments seem somewhat insufficient, so readers may not know how the method performs relative to other, more recent methods such as [4,5]. I also think the authors should release the code as an additional contribution. Overall, though, I think the idea of step-wise alignment is interesting enough for this paper to be considered a good submission.