Review for NeurIPS paper: Geometric All-way Boolean Tensor Decomposition

NeurIPS 2020

Geometric All-way Boolean Tensor Decomposition

Review 1

Summary and Contributions: The authors propose a Boolean Tensor Decomposition technique called GETF. GETF performed various experiments on both synthetic and real datasets including crime data in Chicago and breast cancer data. Related work and problem definition sections are clear, however, experiments have fundamental issues.

Strengths: N/A

Weaknesses: There exist several limitations in experiments which are summarized as follows: 1-Authors need to show useful scenarios where binary representation is the best way to model the tensor. For instance, it seems count representation makes more sense rather than the binary representation. By binary representation, the quantity of crimes is ignored. att least authors need to verbally mention why binary representation is better than count representation. 2-The author mentioned GETF is more scalable and faster than other baselines, however, scalability experiments need more work. For figures 6c, 6I, it is better to specify the number of patterns. Why not showing the scalability of GETF and other baselines as we increase 1- number of patterns 2- the size of a tensor dimension, and 3- number of non-zero elements in the tenor. Moreover, it is very important to compare the scalability of GETF and other baselines on real-world datasets including Chicago crime record, and breast cancer data sets. Currently, the scalability experiment on real-world data is missing. Another minor issue is that the authors mentioned "We also evaluated each algorithm on different data scale in supplementary materials". It is better and easier for the reader to specify the section number. 3- Authors mentioned, "For a 5-way tensor with more than 3 ∗ 108 elements, GETF completed the task in less than 1 minute." This sentence is very vague. It is not clear what "completed the task" means? What is the convergence tolerance? what is the number of patterns? Is it 5? The authors need to mention that. 4- If I understand correctly, I assume the y-axis in Figures Figure 7C and 7D is the same, however, one is crime index and the other one is crime data. Moreover, it is necessary to mention why GETF is able to distinguish the two regions better than other baselines. Also, it is not clear how the authors determine the crime index for each region. Minor issues: 1-There is no description of figure 7A. 2- The caption of Figure 8 is vague. Does the visualization come from GETF? or is it ground truth representation. 2-Authors need to provide more details about the captions in Figures 6 and 7. 3-I suggest authors provide the statistics of the data sets in a table. 4- Line 274: data densitie and noise level ->data densities and noise levels. 5- "On the other hand, GETF manages to derive patterns gradually (Figure 7I)". Decreasing in reconstruction error doesn't neeccesarly mean that a technique is finding patterns. 6- line 225, 226: "Figure 8A shows the changes of GETF and LOM in reconstructing error along with the addition of pattern numbers". There is no figure 8a. 7- It would be better to include the code in the submission. 8- Provide the notations and symbols in a table.

Correctness: Please check the Weaknesses section.

Clarity: No, it is very hard to understand this paper. Captions of Figures 1,2, and 3 are very vague. Authors need to provide more detail. This paper needs more iteration. In addition, there are issues with experimental designs and also the results.

Relation to Prior Work: Yes the differnece is clearly mentioned.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper proposed a geometric method for Boolean Tensor Decomposition. The method sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. Experiments showed its effectiveness in reconstruction accuracy and efficiency on running time. The performance of the proposed method is impressive based on the claims in the paper.

Strengths: The paper tries to provide theoretical proofs and descriptions to endorse the correctness of the proposed method, and provide experimental support using both synthetic data and real world data. The impact of the work seems to be significant compared with existing Boolean Tensor Decomposition methods, as the proposed method has a better accuracy and less cost in running time. Although this method is mostly algorithmic, I believe it is relevant to the NeurIPS community as it provides a state-of-the-art way to solving an important problem.

Weaknesses: To the best of my understanding, the proposed method is actually a greedy algorithm. It would be good if the paper provides theoretical analysis on how close the greedy result is to the ground truth. But I guess it does not affect the significance of the paper from the perspective of experimental results.

Correctness: No obvious error was found.

Clarity: The paper is mostly well written. Detailed comments: 1. Figure 1 is not clearly described. It is also not self-explainable. 2. Typo: line 293, “coutns” -> “counts”

Relation to Prior Work: The paper provides clear discussion for related works. However, I have one doubt regarding the claim at the end of Section 2: “none of the existing algorithms are designed to handle the HBTD problem for higher order tensors”. From what I understand, the LOM paper actually describes the method for higher order tensors, while their implementation was only for 3-order tensors.

Reproducibility: Yes

Additional Feedback: The first step of GETF is to reorder the indices of the tensor into a (k-1)-LTL tensor by IRT. Although the paper provides the computational complexity analysis, I actually have doubts on the efficiency of this step. Does it need all permutations?

Review 3

Summary and Contributions: I have read the rebuttal and my concerns have been partially addressed. || Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors like CP decomposition. In this work, the authors proposes an algorithm that sequentially identify rank-1 component via geometry (Geometric Expansion for all-order Tensor Factorization). A theoretical analysis on the validity of the algorithm is provided as well as experiments on both synthetic and real-world data structures, which shows the efficiency of the method.

Strengths: The works apply the recent geometric advances in matrix factorization to tensor decomposition. It is notable that it provide an algorithm that it is not limited to 3d order tensors.

Weaknesses: Geometry-based boolean matrix factorization has been proposed recently and is based on finding the closest Upper Triangular Like matrix possibly with direct consecutive 1’s. Then based on the geometry of such matrix, the pattern of the decomposition can be retrieved from the medium column or row with successive expansion. Once the first component is found, the same methodology is applied to the residual. Even in matricial case, finding a good permutation toward the Triangular shape rely yet on heuristic. So one of the challenges is finding the right permutations up to a (k-1) LTL tensor and then its closest flat 2-LTL tensor, for which the geometry is analogous to that of Upper Triangular matrix. As the proposed algorithm to solve efficiently the determination of a flat 2-LTL tensor is based on the closest plane, it rely on having effectively a (k-1) LTL tensor. The existence of such tensor for any tensor is not assured, or it may be computationally demanding in very high order to do. A study of the performance with a degraded IRT would be a plus. Moreover, there no comparison with other methods. While alternating optimization methods have not been written as yet for higher-order BTF, they can easily be adapted, as the matricization formula used holds for higher-order tensors (and have been used for real-valued tensor factorization).

Correctness: the claims appears to be correct

Clarity: some examples p-LTL tensors just after the definitions (especially (k-1) LTL) and a better insight on flat 2 LTL tensors would be a plus for the reader.

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: The paper addresses tensor decomposition problem for a special tensor, Boolean tensor. The objective is to factorize a binary tensor into the Boolean sum of multiple rank-1 tensors. The paper proposes an efficient BTD algorithm, namely Geometric Expansion for all-order Tensor Factorization (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. Both theoretical analysis and computational experiments are given to validate the performance of the GETF algorithm.

Strengths: The paper proposes a geometric perspective for identifying the largest rank-1 pattern and seeding the most likely pattern fibers. Mathematical formulation is presented and some theoretical analysis on the validity as well as algorithemic efficiency of GETF in decomposing all-order tensor is also provided. Experimental results on both synthetic and real-world data are given to show good performance in reconstruction accuracy, extraction of latent structures.

Weaknesses: The paper provided experimental results on one synthetic dataset and two real world datasets to validate the efficiency and robustness of GETF. But No comparisons with other methods were given. The characters in figure 7 are too small and almost invisible.

Correctness: The mathematical analysis is sound.

Clarity: Well written

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: