__ Summary and Contributions__: This manuscript proposes a new deep graph learning framework which integrates feature learning and graph structure learning for a link prediction task. Based on anchor-based metric learning, a scalable method is also proposed to accelerate the computation of the original model. The proposed method slightly outperforms other state-of-the-art models.

__ Strengths__: 1. The manuscript is readable.
2. The idea of alternatively optimizing the node embedding vectors and revising the graph structure seems to be reasonable even though the effect of this integration is somewhat limitedly observed in the experimental results.
3. The anchor-based metric learning effectively reduces the model complexity while not losing much accuracy.

__ Weaknesses__: The main algorithm (Algorithm 1) should be described in the main text instead of Appendix. I was able to understand the overall flow of the algorithm only after reading Algorithm 1 in Appendix.
In Table 1, it is kind of disappointing to see that the performance gap between the proposed method and the best baseline method is so little on all the datasets. Since the proposed method is a more expensive method than other methods as shown in Table 4, one might expect to see this cost is payed off by a large performance increase. However, I'm afraid that Table 1 does not support this.
Also, it would be better to show the performance of all the graph-based models (GCN, GAT, GraphSage, and so on) on the Wine, Cancer, Digits datasets by generating a simple kNN graph based on the features. (Currently, only GCN_{kNN} is presented.) It would be informative if we can check the performances of all the graph-based models in the kNN graph.
In the experiments, it would be better to include more 'link prediction' tasks and include more datasets related to the link prediction task because the main goal of the proposed method is to increase the accuracy in link prediction.

__ Correctness__: Overall, the reasoning of the main idea and the proposed algorithm seem to be reasonable even though I did not check all the details. The experiments use benchmark datasets and standard metrics to compare the performances of the methods.

__ Clarity__: Algorithm 1 should be presented in the main text, or there should be some high-level description about the overall algorithm somewhere in the main text. In the experiments, how the training set and the test set is divided should be described in the manuscript.

__ Relation to Prior Work__: It seems that the proposed method is properly compared with other related methods.

__ Reproducibility__: No

__ Additional Feedback__:

__ Summary and Contributions__: In this paper, the author presented a new graph learning method for graph neural networks. The authors started to analyze the significant drawbacks of existing GNNs methods: 1. work only when the graph data input is given; 2. ignore potentially imperfect graph inputs (due to the noise and cannot reflect true graph topology); 3. completely fail when inputs like texts are not given in graph format. To solve these problems, this paper proposed a new deep graph learning framework for learning the graph embedding and graph structure at the same time. Specifically, this paper introduced an iterative deep graph learning approach, where the key idea is to alternatively produce a better and more robust graph node embedding with a better learned graph structure and then learn a better graph structure based on better graph node embeddings. They further proposed a scalable version of the proposed method IDGL by leveraging the anchor-based approximation method. Graph similarity learning and graph regularization are also proposed to learn a graph structure with controlled quality, instead of learning a fully connected graph in existing methods. Authors show the experimental comparisons with other state-of-the-art GNNs methods. The proposed method showed better performance on accuracy and computational time. In addition, the presented method showed good scalability even on relatively large dataset without compromising the accuracy, owning to the use of anchor-based approximation technique.

__ Strengths__: (1) The idea of using iterative method for constructing a graph and learning graph node embeddings in a joint fashion seems very novel, to the best of my knowledge. Authors further presented improved version of IDGL in order to overcome the scalability issue of original IDGL for dealing with large graphs. Authors also provided theoretical analysis of the convergence of Anchor graphs to the exact similarity graph, making the proposed method more principled.
(2) Instead of using fully connected graph (e.g. employing self-attention/transformer-like techniques), authors tend to use sparsified graph over the fully connected graph by performing epsilon-neighborhood and adaptive graph regularization and showed better performance in the experiments. I found this insight is very interesting and could be transferred to other domain as well.
(3) The paper is very well written and clearly motivated. For instance, the paper started to discuss the issues of existing methods (assuming graph is perfect, LDS only dealt with transductive setting and has scalability issue) and they proposed their solutions. These issues are well explained in the paper and the authors do a good job at explaining how their proposed method addresses these issues.
(4) The overall experimental and ablation study results well investigated several advantages of the method in terms of both accuracy and scalability compared to existing GNNs methods.

__ Weaknesses__: (1) Authors discussed anchor-based metric learning and showed how it can use affinity matrix between graphs nodes and anchor nodes to reduce both runtimes and memory consumption. I think this is very interesting direction for dealing with large-scale graphs. However, I did not see any experiments how the convergence of performance when increasing the number of anchor points. More detailed discussion on the sensitivity of the number of anchor nodes should be given.
(2) Authors did not provide the full algorithm 1 in the main section, which I found it is very inconvenient to digest the whole framework. I would suggest authors to move it in the main section in order to better discuss them in details.
(3) It seems like this work is also related to adversarial robustness of graph neural networks. I would like to see more discussions regarding the connection and difference between this work and adversarial examples in GNN domain.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper focus on the problem of how to develop a (scalable) graph learning technique, which has been underexplored in the domain. Existing popular methods still have some limitations: first, GNNs can be only used when graph-structured data is provided; second, the intrinsic graph inputs are often noisy (not optimal) or incomplete; third, many applications may even only have sequence data or data matrix and lack of graph structured data.
To overcome such issues, this paper presented a novel end-to-end graph learning framework, called Iterative Deep Graph Learning (IDGL), to joint learn graph structure and graph embeddings. The model can generate more robust and better node embedding when learning graph structure through an iterative learning framework. It also has lower computational complexity (IDGL-ANCH) with the anchor-based approximation. The model can cope with both transductive and inductive settings, where the learned embeddings can be applied for many tasks.
Besides, the key ideas of the paper, i.e., iterative graph learning and anchor approximation, seem generally useful for other machine learning tasks and problems.
The author also performed extensive set of experiments to study the properties of the proposed methods and demonstrated the effectiveness and efficiency compared to other popular GNN and graph learning methods.

__ Strengths__: (1) This paper studies an important problem and I like the overall idea behind the work. A simple and elegant solution is proposed to develop robust and better node embeddings, and at the same time, learn graph structure using a novel iterative approach.
(2) The model has several key components including graph similarity learning, graph regularization, joint learning, and particular anchor-based graph approximation. They are designed and combined in a nice architecture for fulfilling the above mentioned properties, although some of them are not completely new in the vast of literature.
(3) A good set of baselines and datasets are chosen in the experiments and an appropriate set of experiments are performed to show the strong empirical performance of the IDGL and IDGL-ANCH in comparison to other state-of-the-art baselines. In fact, the ablation study and model analyses help me better understand the roles of each important component, which brings me good points.

__ Weaknesses__: (1) In lines 139-143, the authors mentioned the final graph is the combination of the initial graph topology and learned graph topology. However, it is unclear to me why we also have the combination of A(t) and A(1) in Eq. (3). I suggest that elaborating more on this part.
(2) It seems like the graph embedding component is fixed to GCN in the paper. But it seems to me that it can be any existing variant of GNNs. Could you elaborate more on it? Also, have you tried to use different variants of GNNs to combine with your IDGL framework? I suggest adding some analyses about this point.
(3) Minors: Some hyper parameters need better explanations (such as lamda, gamma, beta and eta) and the paper needs some editing to fix a few grammar issues.

__ Correctness__: Yes, The model is in a nice architecture and the experiments have shown the effectiveness of each part.

__ Clarity__: Yes, the paper is well written, which is easy to follow, although some hyper parameters needs better explanations.

__ Relation to Prior Work__: Yes, The paper has pointed out the differences between the proposed model and existing works and highlighted the contributions.

__ Reproducibility__: Yes

__ Additional Feedback__: Thanks for the authors' response. I hope my suggestions would be helpful.

__ Summary and Contributions__: Many real-world applications naturally admit network-structured data (e.g., social networks). However, these intrinsic graph-structures are not always optimal for the downstream tasks.
To address this real-world issue, this paper presents an end-to-end graph learning framework, namely Iterative Deep Graph Learning (IDGL), for jointly and iteratively learning the graph structure and the GNN parameters that are optimized towards the downstream prediction task.
The performance of the proposed IDGL is consistently better than or close to that of SOTA baselines. The ablation study is provided to verify each component in IDGL is effective. In general, this paper is a good paper and should be presented in NeurIPS2020.

__ Strengths__: This paper addresses an important problem existed in GNNs. When training a GNN, many real-world applications naturally admit network-structured data (e.g., social networks). However, these intrinsic graph-structures are not always optimal for the downstream tasks.
To address this real-world issue, this paper presents an end-to-end graph learning framework, namely Iterative Deep Graph Learning (IDGL), for jointly and iteratively learning the graph structure and the GNN parameters that are optimized towards the downstream prediction task.
The performance of the proposed IDGL is consistently better than or close to that of SOTA baselines. The ablation study is provided to verify each component in IDGL is effective.

__ Weaknesses__: I would be the best to provide an empirical evidence that intrinsic graph-structures are not always optimal for the downstream tasks in the introduction section.

__ Correctness__: Yes, all claims and methods are correct. The the empirical methodology is correct.

__ Clarity__: This paper is well written.

__ Relation to Prior Work__: Yes. it is clearly discussed how this work differs from previous contributions.

__ Reproducibility__: Yes

__ Additional Feedback__: Many real-world applications naturally admit network-structured data (e.g., social networks). However, these intrinsic graph-structures are not always optimal for the downstream tasks.
To address this real-world issue, this paper presents an end-to-end graph learning framework, namely Iterative Deep Graph Learning (IDGL), for jointly and iteratively learning the graph structure and the GNN parameters that are optimized towards the downstream prediction task.
The performance of the proposed IDGL is consistently better than or close to that of SOTA baselines. The ablation study is provided to verify each component in IDGL is effective. In general, this paper is a good paper and should be presented in NeurIPS2020.
The detailed comments can be seen below.
1. It would be the best to provide empirical evidence that intrinsic graph-structures are not always optimal for the downstream tasks (in the introduction section).
2. Line 35 says that "if not available we use a kNN graph". How does the k value affect the performance of the proposed method?
3. The motivation behind the graph regularization is unclear. Some references should be added to support using this regularizer.
4. Since the second contribution is related to time complexity, the theoretical time complexity analysis (and the running time) should be presented in the main contents rather than in the appendix.
5. Similar to the 4th comment, the whole algorithm should be moved to the main content.