__ Summary and Contributions__: The authors propose a graph matching method, which aligns the nodes of multiple graphs simultaneously. Moreover a clustering can be constructed performing the alignment only within the clusters. These hard optimization problems are solved by adapting the classical graduated assignment method for graph matching to the new settings.

__ Strengths__: 1) The combination of multiple graph matching and clustering is really nice and has only recently been introduced.
2) The methods introduced are sound relying on well-established techniques and are well adapted to the new setting.
3) The empirical evaluation is extensive and clearly shows the merits of the approach.

__ Weaknesses__: My main concern is that the graduated assignment method has been adapted to multi-graph matching before limiting the original contribution of the authors. Unfortunately, these publications (see below) are not discussed in the paper. Besides this (severe) issue, this is a good publication.
The discussion of images in Sec. 2.1.1 comes unexpected. The application to image data should better be explained in the introduction.

__ Correctness__: The proposed methods and claims are correct and the evaluation seems reasonable.

__ Clarity__: Overall the paper is well-written and comprehensible. However, there are some language problems such as missing articles, which could be easily fixed.
Minor remarks:
- The sentence: "The converged S is the doubly-stochastic relaxation from the output of the Hungarian algorithm." does not make sense. S is a solution to the relaxed problem.
- Eq (8): Since C is optimized, it should probably appear together with X_ij below \max.

__ Relation to Prior Work__: Graph matching has been studied extensively in the literature. Unfortunately, the authors seem to have overlooked these publications extending the GA method to multi-graph matching:
Albert Solé-Ribalta, Francesc Serratosa:
Graduated Assignment Algorithm for Finding the Common Labelling of a Set of Graphs. SSPR/SPR 2010: 180-190
Albert Solé-Ribalta, Francesc Serratosa:
Graduated Assignment Algorithm for Multiple Graph Matching Based on a Common Labeling. Int. J. Pattern Recognit. Artif. Intell. 27(1) (2013)
The method proposed in the publications above appears to be closely related to Section 2.1 of the paper under review.

__ Reproducibility__: Yes

__ Additional Feedback__: === Update after the rebuttal ===
Thank you for pointing out the differences to the omitted references. However, in my opinion the original contribution of the MGM part is limited and the differences are not fully sufficient. It is not clear, if the main difference (highlighted in [red] in the feedback) really provides an advantage compared to adding dummy nodes as proposed and discussed in the omitted journal paper. The other differences are technical, but no major contributions in my opinion. The original contribution should be made clear on acceptance, which I cannot fully recommend for the reason stated above.

__ Summary and Contributions__: This work deals with a joint problem of graph matching and graph clustering. First, the authors proposed a new graph matching solver based on the first-order Taylor expansion of the Koopmans-Beckmann’s graph matching formulation. Then, the authors combined the multi-graph matching and clustering as an end-to-end model. Finally, they developed an unsupervised graph matching framework. Numerical experiments on Willows and CUB2011 datasets shows the proposed method reach a good performance in both non-learnable and learnable setting.

__ Strengths__: The unsupervised graph matching model is particularly interesting and potentially has nice impact on many practical applications, where the manual labeling is expensive.

__ Weaknesses__: The authors should test the proposed method on more challenging datasets, for example PASCAL VOC graph matching dataset, as the performance of recent proposed matching algorithms on Willows and CUB2011 is near saturated.

__ Correctness__: The method and experiment are sound.

__ Clarity__: The paper is well written, easy to read.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: Update after reading the rebuttal letter:
Thanks for the authors' feedback. I'd like to keep my score unchanged.

__ Summary and Contributions__: Authors propose a novel method for multi-graph matching that allows to backpropagate through the per-node feature extractors. The main novelty lies in utilizing the differentiable Sinkhorn algorithm for the bipartite matching part of the problem, and solving the multi-graph problem via Taylor expansion of the Koopmans-Beckmann’s QAP objective function. Results indicate SOTA performance on 2 datasets.

__ Strengths__: + A good theoretical contribution that meaningfully combines several graph matching algorithms while maintaining backpropagability.
+ SOTA performance.

__ Weaknesses__: - Similar to alternatives, it seems the method is quite memory heavy.
- It is not obvious whether both Alg. 1 and Alg. 2 are guaranteed to converge.

__ Correctness__: The paper seems theoretically correct.

__ Clarity__: The paper is reasonably easy to follow.

__ Relation to Prior Work__: The paper revises the prior work well. It builds on top of previous findings in a meaningful way.

__ Reproducibility__: Yes

__ Additional Feedback__: I like the chosen methodology and the problem in general. As the main disadvantage I view large memory consumption due to the need to keep all matching matrices in memory, which is the common issue with all alternatives. Also, it would be great to state whether both Alg. 1 and Alg. 2 are guaranteed to converge.
Questions:
1) I wonder whether it is meaningful to setup A_i (the graph connectivities) using distances of 2D keypoints. Intuitively, these distances change as the object changes pose in front of the camera and such "distance matrix" thus does not contain information invariant to the pose of the object.
2) It would be great to conduct some experiments where one does not rely on matching GT annotated keypoints. MatchALS did such experiment on a dataset of cars.
3) What is the theoretical limit for the size of a dataset that could be matched using the proposed method? I presume that one can only match 100s of images with 10-100 keypoints in them. Can authors suggest a way to decrease the memory consumption? E.g. can the matching matrix be subsampled somehow and thus only a small random subpart of it will be considered during every iteration?

__ Summary and Contributions__: As a commonly used graph data processing technology, graph matching has application scenarios and research value in the field of machine learning. In this paper, the authors handle the multi-graph matching problem via a graduated assignment procedure. Furthermore, the authors devise the first unsupervised deep graph matching networks.

__ Strengths__: 1. The authors propose a novel multi-graph matching solver by iteratively solving the first-order Taylor expansion of MGM Koopmans-Beckmann’s QAP.
2. The authors devise the first unsupervised deep graph matching network whose performance can be on par with supervised model for two graph matching.
3. Sufficient experimental results show the proposed approach effective.

__ Weaknesses__: Even though I am not an expert in this field，I feel that the novelty of proposed method is somewhat Incremental. All the important formulas，such as Eq.(1) to Eq.(5), are referenced from the existing works，which makes the readers hard to know which the acutal contribution is.

__ Correctness__: yes

__ Clarity__: yes

__ Relation to Prior Work__: I am not sure

__ Reproducibility__: Yes

__ Additional Feedback__: see weaknesses