__ Summary and Contributions__: The paper describes an FHE-based technique for privacy-preserving training of neural networks. The core contribution of this work is a new cryptographic scheme for switching between the BGV and TFHE cryptosystems, since the two are suited to different types of layers in a neural network. This new technique offers a ~3x speedup over the state-of-the-art Chimera approach.

__ Strengths__: The main strength of this work is the implementation of a newly proposed scheme for switching between BGV and TFHE ciphertexts. This scheme is derived from the work on Chimera (which switches between BFV and TFHE ciphertexts). The authors perform detailed benchmarking of the runtime of linear layers in a neural network and show that BGV is a better choice than BFV. Based on this insight, they implement their new scheme for two networks and show that they can reduce latency by ~69%. Finally, the authors use transfer learning to make the task of training competitive CNNs more tractable.

__ Weaknesses__: The main weakness of the work stems from its incremental nature. Very similar ideas have been demonstrated in the past (e.g., Chimera for cryptosystem switching). The notion of using transfer learning for training, although novel, is not a substantial leap in the context of past work like EPIC. Finally, there is the issue of the scalability of these approaches to real-world problems. Training a network for an MNIST/CIFAR-10-like task still takes >7 days even with transfer learning. The authors should ideally compare against plaintext and/or SGX-based approaches to give a rough sense of the computational overhead.
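To make concrete why transfer learning nonetheless helps tractability, here is a back-of-the-envelope sketch. The layer names and parameter counts below are hypothetical (not taken from the paper): freezing a pretrained convolutional feature extractor means only the small fully connected head incurs the expensive homomorphic gradient computation.

```python
# Hypothetical CNN parameter counts (illustrative only, not from the paper).
layers = {
    "conv1": 3 * 64 * 3 * 3,      # frozen pretrained feature extractor
    "conv2": 64 * 128 * 3 * 3,    # frozen
    "conv3": 128 * 256 * 3 * 3,   # frozen
    "fc":    256 * 10,            # retrained on encrypted data
}
frozen = {"conv1", "conv2", "conv3"}

total = sum(layers.values())
trainable = sum(v for k, v in layers.items() if k not in frozen)

# Only this fraction of weights needs encrypted gradient updates.
fraction = trainable / total
print(f"trainable under encryption: {trainable}/{total} ({fraction:.1%})")
```

Even so, as noted above, the encrypted training of just the head remains very slow in absolute terms, which is why a plaintext/SGX baseline comparison would be informative.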

__ Correctness__: The technical claims in the paper seem fair and are borne out by the experimental results. The authors claim a 69-99% reduction in latency in the abstract. The 69% number is explicitly reiterated in the main text; however, the exact conditions under which the 99% claim holds are unclear. It would be good to state these conditions more explicitly.

__ Clarity__: The paper is well written. It does require prior familiarity with FHE cryptosystems, but that is well justified given the technical nature of the contributions. The bitwidths used to represent the various activations could be specified more clearly (e.g., Section 3 shows 3-bit LUTs, but Section 5 indicates 8-bit quantization).

__ Relation to Prior Work__: The authors clearly highlight that their contribution differs from prior art in the selection of BGV over BFV for the linear layers. Similarly, the use of transfer learning for training is clearly identified.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper presents a new method for training neural networks on homomorphically encrypted data. The technical contributions over prior work are (1) using a combination of BGV and TFHE schemes for delegating parts of the computation to the most appropriate schemes; (2) implementing a new TFHE-BGV scheme switching method. The paper demonstrates significant performance improvements over prior work.
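The delegation idea in contribution (1) can be caricatured in a few lines. The BGV/TFHE split below follows the paper's high-level description (linear layers in BGV, nonlinear layers as TFHE lookup tables), but the dispatch table and function are my own illustrative construction, not the authors' implementation:

```python
# Illustrative dispatch: which FHE scheme handles which layer type.
SCHEME_FOR_LAYER = {
    "conv":    "BGV",   # linear: SIMD multiply-accumulate is cheap in BGV
    "fc":      "BGV",   # linear
    "relu":    "TFHE",  # nonlinear: evaluated as a lookup table
    "softmax": "TFHE",  # nonlinear
}

def plan_switches(network):
    """Return (layer, scheme) pairs and count the ciphertext switches needed."""
    plan, switches = [], 0
    prev = None
    for layer in network:
        scheme = SCHEME_FOR_LAYER[layer]
        if prev is not None and scheme != prev:
            switches += 1  # a BGV<->TFHE ciphertext switch happens here
        plan.append((layer, scheme))
        prev = scheme
    return plan, switches

plan, n = plan_switches(["conv", "relu", "conv", "relu", "fc", "softmax"])
print(n)  # scheme switches along this toy network
```

The sketch also makes visible why the cost of the switching method itself matters: every linear/nonlinear boundary in the network triggers a conversion.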

__ Strengths__: The paper is beautifully written, at least from the point of view of a reader who is familiar with homomorphic encryption. All claims are easy to follow and mostly well justified; comparisons to prior work are clear and thorough. This work is likely to be of interest to ML privacy audiences.

__ Weaknesses__: I'm not sure how readable this is by people unfamiliar with homomorphic encryption. Unfortunately, with the low page limit, it just may be that such material is not possible to present at NeurIPS, and/or is simply outside the scope of the conference.
There are a couple of issues I have with the paper:
- I didn't see data sizes presented anywhere. How large is the encrypted training data, and how large are the key switching keys?
- The machine used is very powerful. What was the memory use of the implementation? Is there a chance to run this on a weaker machine? If not, is it purely an implementation issue of the libraries used?
- Can you comment on the machines used to evaluate the prior work, and how those machines may compare to your setup?
- You mention that HEAAN supports floating point computations better. Is there a reason it was not used, instead of BGV?
- Regarding "Broader impact", I would say that one disadvantage of training on encrypted data is that when data is contributed by multiple sources (through public-key encryption) any kind of model poisoning may be impossible to detect during the training phase. Have you considered such issues?

__ Correctness__: The claims, methodology, and comparison to prior work in the paper seem both correct and meaningful.

__ Clarity__: The paper is very clear and excellently written for homomorphic encryption experts, but I'm not sure how others will read it.

__ Relation to Prior Work__: The paper contains a thorough comparison to prior work, but I could not find anything about the machines used for evaluating the performance of said prior work.

__ Reproducibility__: No

__ Additional Feedback__: There is a typo "transferring learning" repeated multiple times.

__ Summary and Contributions__: The paper proposes an FHE-based technique, Glyph, to train DNNs quickly and accurately on encrypted data. The proposed method switches between the TFHE and BGV cryptosystems, exploiting the logic-operation-friendly TFHE. The method achieves state-of-the-art performance. The paper claims that this is the first work to use transfer learning in private training.

__ Strengths__: The paper is the first to apply transfer learning to training DNNs on encrypted data. The proposed system balances speed and accuracy. The results are convincing and achieve state-of-the-art performance.

__ Weaknesses__: 1. All components, BGV and TFHE, are borrowed from other papers. The switching mechanism is similar to Chimera's and is not an original proposal of the authors. Transfer learning, too, is a long-established technique. The authors appear to be merely assembling these existing ideas together, so the reviewer doubts the novelty of the proposed method.
2. The authors did not explain why FHESGD equipped with BGV performs worse than their system equipped with TFHE-BGV, given that the authors claim BGV is better than TFHE. The reviewer expects more analysis of the underlying mechanics.
Minor: Fig. 5 and Fig. 6 are supposed to include a comparison with Chimera.

__ Correctness__: The claims and method are basically correct. However, the reviewer has some concerns:
1. Why not use HEAAN? Please provide some explanation.
2. The switching strategy applies BGV first and TFHE later. Why do the authors choose this order?

__ Clarity__: The paper is well written, but please pay some attention to the layout: it is hard to read when figures/tables and the corresponding text are far apart.

__ Relation to Prior Work__: The relation to prior work is discussed very clearly, and the motivation for choosing the key components is well explained.

__ Reproducibility__: No

__ Additional Feedback__: Try to propose a novel mechanism for training DNNs on encrypted data.