__ Summary and Contributions__: POST REBUTTAL COMMENT
----
I thank the authors for the useful comments in the rebuttal. I will keep my score as is: I think it is an interesting contribution, but it needs to be put more in context with the existing literature and existing benchmarks, and it should be made more accessible to readers without a background in quantum computing.
____
The paper presents a novel parametrisation for unitary RNNs using parametrised quantum neurons and amplitude amplification. An implementation of this method is provided which allows optimisation of parametrised quantum circuits with tens of thousands of parameters, and which can run on current hardware. Experiments on simple memorisation tasks and on traditional integer classification tasks such as pixel-by-pixel MNIST show decent performance, especially given the small number of qubits being used.

__ Strengths__: The proposed method seems to be the first recurrent and entirely quantum neural network. It could serve as a promising approach once more qubits can be utilized, as the results on classical benchmarks such as pixel-by-pixel MNIST classification are already very decent given the small number of qubits being used.

__ Weaknesses__: A lot of traditional memorisation and sequence prediction tasks, such as the copy task, the adding task, or PTB, are left out, and some crucial baselines are missing; including them could have significantly enhanced the paper. Unitary RNNs have their shortcomings, as they are only able to store information in memory but typically fail when more complex computations are necessary. It would have been nice to have a discussion of whether the proposed method could be useful beyond the unitary RNN case. Many of the unitary RNNs proposed in the literature are not full capacity, i.e. they have reduced expressivity in that they do not parametrise the entire Stiefel manifold; it was not clear to me whether the proposed method suffers from the same limitation.

__ Correctness__: The empirical methodology seems correct, although I lack the expertise in applied quantum computing to properly assess the technical details of the proposed method.

__ Clarity__: The paper is well written and thoroughly explained, but details could be hard to follow for someone who lacks the necessary background in applied quantum computing. When talking about the literature on unitary and orthogonal RNNs, it would be good to cite [1] and use it as a baseline for experiments, as it outperforms all the mentioned papers on traditional tasks like the copy task, permuted MNIST, etc. The conclusion is merged with the Broader Impact section, thus technically exceeding the 8 pages.
[1]: M. Lezcano-Casado and D. Martinez-Rubio: Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group. ICML 2019.

__ Relation to Prior Work__: It seems like the proposed method is the first of its kind, but I am very unfamiliar with related prior work in applied quantum computing. There are some important references missing in the unitary RNN literature (see previous comment).

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: UPDATE: the authors addressed my explicit points sufficiently. However, my general judgement that the paper needs more formal statements remains. Also, the question of the complexity of the post-selection procedure came up in another review. I think this is a major unanswered point (see my updated “weakness” section).
I updated my score to 4 (previously 3), and lowered my confidence to 2 (previously 3).
ORIGINAL REVIEW:
The paper introduces quantum recurrent neural networks. On its own, the evolution of quantum states is intrinsically unitary. However, by introducing measurements and a post-selection strategy, it is possible to get non-linear amplitudes.
The "Quantum Neuron" used here was introduced in Reference [CGA17], and the present work generalizes it to higher-order control.
From the single neurons, a larger network is constructed. However, the exact way to do this is not described in the text. At least for me, it is not clear from the figures how the whole computation in the network works.
For the present work, the terminology "Quantum inspired RNN" seems more appropriate, since it is not clear why using actual quantum hardware should be useful.
Unfortunately, the paper comes almost without any mathematical formulae and instead uses pictorial representations of quantum circuits. This made it hard for me to understand the ideas that the authors want to convey.
Due to the missing clarity in the exposition of the main ideas, I cannot sensibly assess the empirical evaluations. Also, looking at the handed-in code, I did not manage to understand the fundamental ideas presented in the paper.
At this point, my main criticism is that the paper is not accessible enough. It might, however, be that I am missing the relevance and contribution of the paper due to this aspect.

__ Strengths__: The paper seems to make an exhaustive empirical analysis. However, due to the problems outlined in my review, I am not able to sensibly assess the outcomes of the empirical results.

__ Weaknesses__: - The paper misses a clear motivation and does not clearly state what its contribution is. Which problem does the paper address?
- The paper does not give enough technical details about the proposed ideas. For example, I would like to see a calculation on how to arrive at Equation (2), at least in the appendix.
- It is not clear why we need any notion of "Quantum" in this work. From the treatment in the paper, I do not see why using a quantum computer (if available) is beneficial. Thus the provocative question: what is the contribution of the paper, if we cancel all references to quantum and just treat it as a model that builds on basic linear algebra (as does QM)?
UPDATE: As pointed out by another reviewer, the complexity of the post-selection might grow exponentially. In my understanding, this happens with the number of neurons used: for d neurons the probability of success is O(p^d), where p is (roughly) the probability of accepting the computation of a single cell. As long as this cannot be addressed, there is no advantage in using quantum hardware. The paper should address this in a formal way.
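To make the scaling concern concrete, here is a minimal sketch of the rough model stated above. It assumes that the d post-selections succeed independently, each with the same probability p, and that no amplitude amplification is applied; the function names are hypothetical and this is not the paper's procedure.

```python
def success_probability(p: float, d: int) -> float:
    """Probability that all d independent post-selections succeed (assumed model: p ** d)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if d < 0:
        raise ValueError("d must be non-negative")
    return p ** d


def expected_trials(p: float, d: int) -> float:
    """Expected number of repetitions until one run passes every post-selection."""
    return 1.0 / success_probability(p, d)


if __name__ == "__main__":
    # The expected repetition count grows exponentially in the number of neurons d.
    for d in (1, 5, 10, 20):
        print(d, expected_trials(0.5, d))
```

Under these assumptions the repetition count is 1/p^d, which is the exponential growth a formal treatment in the paper would need to rule out or bound.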

__ Correctness__: The paper does not clearly state its claims. And the few used equations are not explained in enough detail.

__ Clarity__: Overall the paper reads very nicely. The figures and writing are very professional. However, on a content level, the given information is not sufficient.

__ Relation to Prior Work__: - The quantum neuron heavily builds on previous work [CGA17]. Although this is attributed sometimes, it is not always clear which ideas are new and which are taken from previous work. Based on my understanding, the claim in l.68 "a novel type of quantum neuron..." should be revised. Instead, I suggest referencing the original work and stating in a clearer way what the contributions of the present work are.
- The paper discusses problems with "classical" RNNs. Unfortunately, it is unclear why the presented methods are suitable to address such issues.

__ Reproducibility__: No

__ Additional Feedback__: - I am not able to reproduce Equation (2). There are no steps given on how to arrive at this result. I tried to reproduce it but failed. I wrote down the quantum states for the steps of the circuit in Figure 1 left. However, based on my calculations, the state before the measurement is exactly the initial state |x>|0>|0>. I might well be wrong, but if so, then it should be made clear from the paper. Also, looking up reference [CGA17] did not clarify this sufficiently.
UPDATE: This was addressed by the authors. I encourage the authors to include such calculations in the appendix, since most people in the community are likely not familiar with it.
- In l.145ff, the problem of inputs in superposition is discussed. However, is this even something that is intended to be done?
UPDATE: After the author response, I understand why superpositions are needed.
- The construction of the whole model (Figure 3,4) is not clear enough. I recommend a more extensive description and discussion in the text and potentially also using some formulae. I am not able to understand the intention here.
UPDATE: for the submission this point is still relevant.
- l.196ff: I do not understand how the data is encoded into the quantum state. So far $x \in \{0,1\}^d$. How does this work for other inputs?
A few minor points that should be improved:
- l17: Typo: exting -> existing
- l47: the quantum time evolution should have a minus "-" sign in the exponent. It might also be nice to include a reference here.
- l.67: "a [...] circuits" -> "a [...] circuit"
- l. 116: Consider including the explicit form of the Pauli matrix Y. Or at least include a textbook reference (Nielsen & Chuang)
- l.123ff: Why is it reasonable to think of superpositions of inputs in your applications?
- l.117: The rotation matrix has a sign error. The sine terms have switched signs. See, for example, Eq. 4.5 in the book of Nielsen & Chuang. In the following, I think, however, that the rotation has been applied with the correct sign.
- l.132ff: The order parameter is introduced once as $o$ and once as $\text{ord}$. I assume it is the same. Please consider removing the redundancy. Also, in l.134, a closing bracket is missing.
- l.134: Do you mean to write $\theta$ in the equation? Or should it rather be $\eta$?
- I think the broader impact section has been abused for the "discussion". However, the discussion should be included in the 8-page limit. I consider this a slight violation of the paper submission guidelines.
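For reference on the sign convention raised in the l.117 point, a small sketch follows. It assumes the textbook definition $R_y(\theta) = e^{-i\theta Y/2}$ and checks that this yields the matrix with $-\sin(\theta/2)$ in the upper-right entry; the helper names are hypothetical and the paper's own definition may differ.

```python
import math


def ry_closed_form(theta: float):
    """R_y(theta) with the textbook sign convention: -sin in the upper-right entry."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def ry_from_exponential(theta: float):
    """exp(-i * theta/2 * Y), expanded as cos(theta/2) I - i sin(theta/2) Y since Y @ Y = I."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    identity = [[1, 0], [0, 1]]
    pauli_y = [[0, -1j], [1j, 0]]
    return [[c * identity[r][k] - 1j * s * pauli_y[r][k] for k in range(2)]
            for r in range(2)]


def max_abs_difference(a, b) -> float:
    """Largest entry-wise deviation between two 2x2 matrices."""
    return max(abs(a[r][k] - b[r][k]) for r in range(2) for k in range(2))


if __name__ == "__main__":
    theta = 0.7
    print(max_abs_difference(ry_closed_form(theta), ry_from_exponential(theta)))
```

The minus sign in the exponent is what places the negative sine term in the upper-right corner; flipping it transposes the matrix.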

__ Summary and Contributions__: The authors introduce the concept of a Quantum Recurrent Neural Network (QRNN) and show its performance on non-trivial tasks such as sequence learning and MNIST.
Exciting and remarkable is the provided implementation, since such software brings real Quantum Machine Learning closer to the user.

__ Strengths__: The model presented here represents a strong contribution to quantum ML. In general, this work describes systematically how the quantum neuron is generated, a part that is crucial in the construction of the QRNN, and which is then used to construct the QRNN cell.
The main contribution of this article is the method in which the QRNN cell was constructed. Beyond the specific application, I think that this recipe will help to better understand how to prepare gates to advance the field of QML.

__ Weaknesses__: The PyTorch implementation is a great contribution, but it would have been better to present this in a more ad hoc framework such as TF-quantum.

__ Correctness__: In general, the framework looks and reads consistent.

__ Clarity__: The paper is nicely written.

__ Relation to Prior Work__: Gives a nice introduction.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposes a recurrent neural network (RNN) based on cells
comprised of quantum unitary operators that would promise to
intrinsically circumvent the ubiquitous gradient issues associated
with classical RNNs. As quantum operations are fundamentally linear,
thought is needed to effect the nonlinearities required for flexible
modelling. A sequence of controlled rotations operating on a vector in
the computational basis is expressible as the nonlinear function in Eq
1. To enhance the suitability of the transformation to use as a
neuron, an order-parametrized linear map is invoked. Application of
this quantum neuron requires reading out an ancillary variable to
indicate that the operation has been successfully applied, which when
the state is not a computational basis vector requires fixed point
amplitude amplification. By including multi-control gates the neuron
representational flexibility can be increased so that \eta is now the
higher-order pseudo-Boolean function defined in Eq 3 rather than the
affine function appearing in Eq 1.
This new quantum cell is then employed in the overall quantum RNN,
which has analogous structure to Elman RNNs, to perform
sequence-to-sequence translation. Numerical results are presented on
some admittedly basic but nonetheless (IMHO) interesting and
instructive test cases in the current context.
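The statement that a chain of controlled rotations reduces to a single function of the input can be illustrated with a small sketch. It assumes the affine form $\eta = \theta_0 + \sum_i \theta_i x_i$ and rotations $R_y(2\theta)$ acting on an ancilla that starts in $|0\rangle$; the exact form of Eq 1 is not reproduced in this review, and all function names are hypothetical.

```python
import math


def ry(theta: float):
    """Real 2x2 rotation R_y(theta) in the textbook convention."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def eta(x, thetas, bias: float) -> float:
    """Assumed affine angle: bias + sum_i thetas[i] * x[i]."""
    return bias + sum(t * x_i for t, x_i in zip(thetas, x))


def ancilla_amplitudes(x, thetas, bias: float):
    """Apply R_y(2*bias), then R_y(2*theta_i) for every control bit x_i == 1.

    Returns the ancilla amplitudes on |0> and |1> when it starts in |0>.
    """
    u = ry(2 * bias)
    for x_i, theta_i in zip(x, thetas):
        if x_i == 1:  # the control fires only for basis inputs with this bit set
            u = matmul(ry(2 * theta_i), u)
    return u[0][0], u[1][0]


if __name__ == "__main__":
    x, thetas, bias = [1, 0, 1], [0.3, 0.5, -0.2], 0.1
    print(ancilla_amplitudes(x, thetas, bias))
    print(math.cos(eta(x, thetas, bias)), math.sin(eta(x, thetas, bias)))
```

Because rotations about the same axis add their angles, the ancilla ends in $\cos(\eta)|0\rangle + \sin(\eta)|1\rangle$ for a computational basis input, which is where the trigonometric nonlinearity in the input comes from.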

__ Strengths__: I found this to be a refreshing and very interesting paper, with clever contributions towards the goal of constructing quantum RNNs. I really appreciated the fact that this paper was honest and hype-free, with justified enthusiasm about the contributions and careful discussion of limitations both theoretical and experimental.
While state-of-the-art performance and engineered models are well and good, I am very much in favour of new and promising ideas such as this one also being presented at NeurIPS. Its results don't break records, but the paper is nonetheless an interesting contribution to a field still in its early stages. I think it would be a worthy inclusion in this conference, perhaps as a poster presentation.

__ Weaknesses__: In common with essentially all quantum machine learning, the paper's direct relevance to practical ML is some time away. Even so, the work is solid.
My main reservation about the particular methodology concerns the future scaling of the number of rounds of post-selection required for controls in superposition. It is encouraging that the authors observe this to be mild and diminishing for the experiments considered, but since the number of qubits was small by virtue of requiring classical simulation, is there reason to suspect instabilities or an exponential increase in the post-selection complexity as more qubits are involved? Can the authors provide some sort of ansatz or heuristic argument to reassure us?

__ Correctness__: The claims and methods used in the paper are to my understanding correct.

__ Clarity__: The paper is very well-written.

__ Relation to Prior Work__: Relation to prior work is well-acknowledged.

__ Reproducibility__: Yes

__ Additional Feedback__: To share a few high-level remarks about the work as a whole:
- Competing against classical algorithms on classical problem domains
is IMHO a long-shot. One issue is the gate delays that can be
expected for any realistic quantum hardware in the future. The
unfortunate fact for quantum machine learning (and many quantum
algorithms overall) is that classical computing hardware such as
FPGAs, ASICs, and GPUs is extremely nimble. Unless a true
exponential speedup can be expected on a given problem domain,
objective gains will be quite hard to achieve. One area in which
such a speedup may be expected is in the very important problem of
simulation of dynamical quantum systems, which as stated in the
conclusion are currently done with classical RNNs.
- Re: solving the vanishing/exploding gradient problem: this
is an admirable goal; as the paper acknowledges, there has been
considerable study into mitigating the effect in classical
RNNs. Many of these solutions are indeed expensive, but progress is
being made (indeed this reviewer has made some unpublished progress
in getting around several of the obstacles). This paper's
contribution is appealing as the unitary operations defining quantum
gates "natively" have this property.
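The norm-preservation property mentioned in the last remark can be seen in a toy linear recurrence. This is a minimal sketch under simplifying assumptions (a 2x2 real rotation standing in for a unitary, no nonlinearity, hypothetical function names), not the paper's QRNN.

```python
import math


def apply(matrix, vec):
    """Multiply a 2x2 matrix with a length-2 vector."""
    return [sum(matrix[r][k] * vec[k] for k in range(2)) for r in range(2)]


def norm(vec) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vec))


def repeated_norm(matrix, vec, steps: int) -> float:
    """Norm of the vector after applying the same matrix `steps` times."""
    for _ in range(steps):
        vec = apply(matrix, vec)
    return norm(vec)


angle = 0.3
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle), math.cos(angle)]]  # orthogonal: norm preserved
contraction = [[0.9, 0.0], [0.0, 0.9]]           # singular values < 1: norm vanishes
expansion = [[1.1, 0.0], [0.0, 1.1]]             # singular values > 1: norm explodes

if __name__ == "__main__":
    start = [1.0, 0.0]
    for name, m in (("rotation", rotation),
                    ("contraction", contraction),
                    ("expansion", expansion)):
        print(name, repeated_norm(m, start, 100))
```

In a linear recurrence, back-propagated gradients are multiplied by the transpose of the same matrix at every step, so the same three behaviours apply to gradient norms.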