Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
In this paper, the authors propose to use a fully-connected NN to improve the BP decoding for block codes of regular degree distribution. The results are quite interesting because it shows that we can do better than BP for these regular codes by weighting the different contributions coming from the parity check. In a way, it tells each bit which parity check should trust more when doing each BP step and it allows the modified BP algorithm to converge faster and more accurately to the right code word. The gains are marginal, but given how good BP typically is that should not come as a surprise and should not be held against the paper. I have several comments about the paper that I would like to be addressed in the final version. The comment about changing the arctanh for a Taylor expansion is not a regularization of training, it is due to numerical instabilities and randomness of the stochastic gradient descent that can make intermediate x too close to one and making the arctanh to overflow. I can imagine thresholding the arctanh between -4 and 4 would have the same effect (and is easier to understand). Please say that you are doing the Taylor expansion to avoid numerical instabilities and I encourage you to apply the threshold of the arctanh. The authors should also address the issues of non-regular LDPC codes and how the training would be done in this case, as the f and g network will have to be trained for each potential degree. Would this hurt their performance? How good would it be compared to BP in this case, given that BP achieves capacity? I would also like to see in Figure 3 the results with BP with $L = \infty$, because clearly in 3 of the cases the BP needs more iterations while the proposed algorithm does not (only in (b) the BP has converged). The authors can complain that the BP needs more iterations, but BP is simpler than the proposed algorithm and we should know if just applying more BP iterations would be enough. Finally, it would also be interesting to see codes with thousands or tens of thousands of bits, because in that case the NNs would need to be larger and harder to train and BP will be closer to its ideal conditions. This would allow us to understand if the proposed technique is universally applicable or only is valid for short (or not-so-long) codes, which is still a relevant problem. Minor issues: In the introduction to Section 5, you give the impression that you might need to train the network for all possible codewords and that only training with the zero-word might be a way of cheating. I know this I not the case, but I would reconsider rephrasing. What happens if the network f is trained with x instead of |x|, given that you have a large enough network it should not matter. I know that After the feedback: 1 Thank you for changing the nonlinearity (and it is good that it did not work). It will be good to add this comment to the paper so people understand why the Taylor series expansion is needed. 2 The results for irregular codes it is also good because it shows that you can do this not only for regular codes and will make the paper stronger. 3 With larger codes (n = 1000 bits), the gain almost vanishes and if you had used 10^4 or 10^5, I would be surprised that you can measure the difference. This is expected and your algorithm would be more useful in the ten to hundred bits range. I have decided to raise my score on this paper.
While I acknowledge the novelties of the paper, as outlined above, I think the authors could have done a better job in presenting their results. Right now, the paper looks like a report of a series of tricks to perform better NN decoding of algebraic codes. What is the basic idea behind these tricks and why are they playing a major role in improving the performance with respect to previous methods, e.g. Nachmani et al.? I simply don't see an interesting framework behind this paper. Also, in the experimental results, why haven't the authors provided comparison with the state-of-the-art decoding methods? E.g. for polar codes, the methods should also be compared with the list-successive -cancellation decoder. Right now, the methods have a marginal improvement in themes of the error performance with respect to Nachmani's method.
The paper provides a new deep unfolding formulation of belief propagation. The idea looks natural and reasonable. However, I felt that the novelty is a bit weak because the idea is a straight extension of Nachmani's works. Numerical results show the new algorithm outperforms the conventional algorithm when the code is short BCH codes. The gain obtained by the modification is not so large (compared with Nachmani's original) but the idea deserves to be known. I have read the author response and my opinion remains the same.