NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4813
Title: Learning Bayesian Networks with Low Rank Conditional Probability Tables

Reviewer 1

To summarize the basic idea: by assuming a low-rank representation of the CPTs, it becomes possible to identify the leaves and their parents; one can then iteratively learn the structure from the leaves to the roots. The low-rank representation of a CPT can be viewed as a mixture of a set of CPTs, each with a bounded number of parents. The paper would be improved by experiments that support its claims. Some preliminary results are provided in the supplementary material, but the plots are noisy and do not convincingly support the claim that the algorithm recovers the true structure as the amount of data/queries increases. On the plots: rather than varying C on the x-axis, it would be more direct and easier to interpret to plot the amount of data/queries.
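As a side note, the mixture view of a low-rank CPT can be made concrete with a small numerical sketch. This is my own illustration (the parent counts, probabilities, and weights are arbitrary, not taken from the paper): a CPT over three binary parents is built as a mixture of two component CPTs that each depend on a single parent, and the full table then lies in the span of the two components.

```python
import numpy as np
from itertools import product

# Hedged illustration (not the paper's exact construction): a CPT over
# 3 binary parents built as a mixture of two CPTs that each depend on
# a single parent. The resulting table P(X=1 | pa1, pa2, pa3) lies in
# the span of the two component tables, i.e. it is "rank 2" in the
# mixture sense described in the review.

parents = list(product([0, 1], repeat=3))  # all 8 parent configurations

# Component CPTs: each looks at one parent only (values are arbitrary).
t1 = np.array([0.9 if pa[0] == 1 else 0.2 for pa in parents])
t2 = np.array([0.7 if pa[2] == 1 else 0.1 for pa in parents])

w1, w2 = 0.6, 0.4                       # mixture weights, sum to 1
full_table = w1 * t1 + w2 * t2          # mixture CPT over all 8 configs

# The full table adds no new direction beyond the two components:
stacked = np.vstack([t1, t2, full_table])
print(np.linalg.matrix_rank(stacked))   # 2
```

The point of the sketch is only that a table over 2^d parent configurations collapses onto a 2-dimensional span when it is a 2-component mixture, which is what makes the leaves/parents identifiable in principle.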

Reviewer 2

The paper studies the problem of learning Bayesian network structures from data in a particular setting, specified by Assumptions 1 and 3, in which all variables are binary. It presents an algorithm that correctly recovers the true structure under Assumption 4, and provides the sample and time complexity of the algorithm. Overall the paper is clearly written, and the results are nice under the setting studied.

My main concern is with the motivation of the problem setting. Why is this setting (Assumptions 1 and 3) interesting or useful in practice? What does a CPT being rank 2, or low rank, mean in practice? The paper claims that the proposed method can easily be extended to rank-k CPTs; can you provide a discussion of how this would be done? I also wonder whether Assumption 4 becomes less likely to hold as k grows. How strong is Assumption 4, and under what situations is it violated? An example situation in which Assumption 4 fails would be helpful. Finally, the paper assumes all variables are binary; are the results extendable to arbitrary discrete variables?

Minor comments:
- line 35: the citation entry (D., 1996) should be fixed.
- line 195: "unifromly" -> uniformly

----------------
Comments after author response: Thanks for providing a motivation for the problem setting. I still have concerns about Assumption 4: it looks to me as though it could easily be violated, and I wonder what impact that would have on the correctness of the algorithm.

Reviewer 3

This paper presents a method for structure learning of a BN from observational data. The work is mainly theoretical, and the proposal rests on several assumptions. Considerable effort is also devoted to presenting and theoretically developing the complexity of the algorithm.

One of the key points of the proposed algorithm is the use of Fourier basis vectors (coefficients) and how they are applied in the compressed-sensing step. I haven't thoroughly checked all of the mathematical part, which is the core of the paper; being familiar with BNs but not with the Fourier transformation, this part is hard for me to evaluate. However, the theory seems to be thoroughly developed and extensively stated (also in the supplementary material).

My biggest concern is the restriction to only two parents per node. I am aware that the authors indicate it would not be a problem to make this number (rank) larger, but then one wonders why a more realistic setting was not used. I can see that the paper is mainly theoretical, and I would not say that is insufficient when the methodology is correctly presented, as is the case here. Still, for a problem as practical as learning BNs one would expect some empirical evaluation on at least a classical benchmark of datasets; some 'simulation' could have been used for the black-box queries. All in all, I like the idea and the way the work is presented, and I think this paper could generate interesting debate in the NeurIPS community.

Minor comments: there are some places where the English could be improved, so I suggest further checking. For instance:
- Introduction, motivation: "Bayesian networks are one of the most important classES of probabilistic graphical models"
- Introduction, related work: "... data is a well known but AN incredibly difficult problem to solve in the machine learning community"
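For readers who, like this reviewer, know BNs but not the Fourier machinery: the standard Fourier basis for functions on {0,1}^n consists of the parity functions chi_S(x) = (-1)^(sum of x_i over i in S). A function that depends on only a few variables has only a few nonzero Fourier coefficients, and that sparsity is the generic reason a compressed-sensing step is applicable. The sketch below is background on this basis only (my own example function, not the paper's pipeline):

```python
import numpy as np
from itertools import product

# Background sketch (illustrative, not the paper's exact method):
# Fourier coefficients of a function on {0,1}^n via correlation with
# parity functions chi_S(x) = (-1)^(sum_{i in S} x_i).

n = 3
points = list(product([0, 1], repeat=n))

def parity(x, S):
    return (-1) ** sum(x[i] for i in S)

# A function depending only on variables 0 and 2 (here: their XOR).
f = np.array([float(x[0] ^ x[2]) for x in points])

# One Fourier coefficient per subset S of {0, ..., n-1}.
subsets = [tuple(i for i in range(n) if (mask >> i) & 1)
           for mask in range(2 ** n)]
coeffs = {S: float(np.mean([f[k] * parity(x, S)
                            for k, x in enumerate(points)]))
          for S in subsets}

# Sparsity: of the 8 coefficients, only 2 are nonzero.
nonzero = {S: c for S, c in coeffs.items() if abs(c) > 1e-9}
print(nonzero)  # {(): 0.5, (0, 2): -0.5}
```

Here the 8-entry truth table compresses to 2 nonzero coefficients, which is the kind of structure compressed sensing recovers from few measurements; how the paper ties this to CPTs is exactly the part evaluated above.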