Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper shows that sum-product operations on tensors (equivalent to so-called tensor networks) can be viewed as a generalization of previously used low-rank factorization techniques, and also of techniques such as depthwise convolution. This is quite interesting. The problem is that tensor network approximation/factorization is typically used together with fine-tuning, and that is the setting in which the architecture is really useful: you first approximate a pretrained network with SVD-type algorithms and then fine-tune. Here the layer is introduced more formally, and an extensive GA search is applied "on top of it", which, in my opinion, does not add to the contribution. Still, I think the idea is interesting, and the considerable effort spent by the authors shows that such layers work. However, the paper gives no indication of how they behave on datasets larger than CIFAR.
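The claim that a sum-product over tensors subsumes both depthwise separable convolution and low-rank factorization can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper names (`extract_patches`, `conv_full`, `conv_depthwise_separable`, `conv_cp`), the tensor shapes, and the CP rank are all assumptions chosen for illustration, and the convolution uses stride 1 without padding.

```python
import numpy as np


def extract_patches(x, kh, kw):
    """Collect all stride-1, unpadded patches of x (C, H, W) into (C, H', W', kh, kw)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    patches = np.empty((c, out_h, out_w, kh, kw))
    for a in range(kh):
        for b in range(kw):
            patches[:, :, :, a, b] = x[:, a:a + out_h, b:b + out_w]
    return patches


def conv_full(x, kernel):
    """Standard convolution (cross-correlation) with a dense kernel of shape (O, C, kh, kw)."""
    p = extract_patches(x, kernel.shape[2], kernel.shape[3])
    return np.einsum("cyxab,ocab->oyx", p, kernel)


def conv_depthwise_separable(x, depthwise, pointwise):
    """Depthwise filter per input channel, followed by a 1x1 (pointwise) channel mix."""
    p = extract_patches(x, depthwise.shape[1], depthwise.shape[2])
    per_channel = np.einsum("cyxab,cab->cyx", p, depthwise)
    return np.einsum("cyx,oc->oyx", per_channel, pointwise)


def conv_cp(x, f_out, f_in, f_h, f_w):
    """Convolution whose kernel is a rank-R CP decomposition, contracted in one einsum."""
    p = extract_patches(x, f_h.shape[0], f_w.shape[0])
    return np.einsum("cyxab,or,cr,ar,br->oyx", p, f_out, f_in, f_h, f_w)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))  # hypothetical input: 3 channels, 8x8

# Depthwise separable convolution as a sum-product of two factors.
depthwise = rng.standard_normal((3, 3, 3))  # (C, kh, kw)
pointwise = rng.standard_normal((4, 3))     # (O, C)
dense_kernel = np.einsum("oc,cab->ocab", pointwise, depthwise)
separable_matches = np.allclose(
    conv_full(x, dense_kernel),
    conv_depthwise_separable(x, depthwise, pointwise),
)

# CP (low-rank) kernel as a sum-product of four factors sharing one inner index r.
rank = 2  # hypothetical rank
f_out = rng.standard_normal((4, rank))
f_in = rng.standard_normal((3, rank))
f_h = rng.standard_normal((3, rank))
f_w = rng.standard_normal((3, rank))
cp_kernel = np.einsum("or,cr,ar,br->ocab", f_out, f_in, f_h, f_w)
cp_matches = np.allclose(conv_full(x, cp_kernel), conv_cp(x, f_out, f_in, f_h, f_w))

print(separable_matches, cp_matches)
```

In both cases the factorized layer equals a standard convolution with the contracted dense kernel, so the two structures differ only in which factors appear in the contraction and which indices they share.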
This paper investigates the class of tensor decompositions in CNNs and illustrates them with hypergraph structures. The experimental evaluation is thorough and gives a good comparison of the various methods. Overall, the paper is quite easy to follow: the text is clear, the figures help clarify the concepts, and the motivation is clearly stated in the introduction. I believe the paper will offer some perspective for the study of tensors and hypergraphs. However, I have some concerns, listed below.
1. Why not compare all methods in each figure?
2. What does the black circle mean in each figure?
Minor comments:
- In Figures 3 and 5, some symbol markers are not clear. I think the authors should plot them after the black circles.
- On page 3, line 93, "in a clearn manner" should be "in a clear manner".
There are some major concerns about the paper.
1. The main theoretical results are in Sec. 3.3. However, this part is not well written: Propositions 1-4 are given without any explanation of their content. The final result in Theorem 1 is not very informative, because it is obvious that if the number of inner indices and the filter size are finite, then the number of combinations of different tensor decompositions is finite.
2. In the experiments, especially Fig. 4 on 3D filters, the results are shown without enumerating the case with two inner indices. However, tensor decompositions are more useful in the high-order case, and tensor networks, e.g., TT, are powerful for compressing filters. These methods were not compared.
3. The paper presents a general framework for various CNNs using tensor decomposition, and the results show that the standard method has the best performance and that CP is Pareto optimal. What do these experiments demonstrate? The main message the authors are trying to convey to the reader is unclear.
4. The paper is not well organized. A lot of space is devoted to well-known introductory material, e.g., CNNs, graphical notation of tensor operations, and the Einconv layer, while the parts on innovation and contribution are incredibly short.
5. The experiments are not very convincing, since the authors fail to explain many important parts. For example, how are the many different tensor decompositions enumerated? And why can the proposed method achieve good results?
Thanks for the authors' response. The paper makes some interesting contributions by establishing a nice connection between some existing models and different tensor decompositions. However, the method in this paper simply enumerates all possible decompositions and compares the results obtained with each. These results are interesting to show, but this is not a practical solution at all, since we cannot enumerate all possible tensor decompositions when training a CNN. Therefore, I will keep my evaluation score unchanged.