
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors try to tackle the problem when bidirected edge (<>) and undirected edge () meet in a loopless mixed graph. They propose to construct a new mixed graph called segregated graph (SG) (where bidirected edge (<>) and undirected edge () never meet) which preserves all conditional independencies on the observed variables.
I like the originality of the idea. I am somewhat disappointed by the clarity of the paper, which affects the quality too. Also the significance of the problem considered is not well addressed. Here I give more details on each criterion: *Clarity: I like the authors' explanation of the big picture before going to details, which helps to clarify to some degree. However, once we check the details, it becomes less clear caused by typos and undefined/undeclared symbols. It would be much easier to read if the authors follow notations in any of the well known (and recent) publications on this topic such as [4], [1*], and [2*].
*Quality: it is dragged down by the level of clarity. Also I notice that if we move most of the proofs to the supplementary file, the paper would have much more spare space (23 pages perhaps). This suggests much more content can be added to the paper. *Originality: good. *Significance: the importance of the problem is only lightly touched (line 128129) with a bit further discussion on the implication in conclusions.
Missing references: [1*] Sadeghi, K. and Lauritzen, S. (2014). Markov properties for mixed graphs. Bernoulli 20, 676696. MR3178514 [2*] Wermuth, N. (2011). Probability distributions with summary graph structure. Bernoulli 17, 845879. MR2817608
After reviewers' dissuasion and seeing authors' response, I've increased the score by one step.
Q2: Please summarize your review in 12 sentences
This paper proposes segregated graphs in order to tackle a problem in loopless mixed graphs, where bidirected edge (<>) and undirected edge () meet.
Submitted by Assigned_Reviewer_2
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
SGs are graphical models over observed variables, consisting of undirected, directed, and bidirected edges. Bidirected edges represent a common latent parent, while undirected edges capture a bidirectional causal relationship between variables, where A can cause B or vice versa. The overall model seems to be a combination of ADMGs and MRFs.
SGs contain no partially directed cycles; also  and <> edges are not allowed to meet. These properties imply a factorization of the joint distribution that is a combination of ADMGs and MRF factors. If  and <> edges do meet in the original model, the authors show how to orient  edges while preserving the conditional independencies; this is one of the main technical contributions.
Quality: The paper is technically sound, and all claims are supported by rigorous proofs. There are no experiments, but the authors give examples of causal interactions for which SGs would be appropriate.
Clarity: The paper is wellwritten; however it is very dense and some of the proofs are hard to follow. I had to draw diagrams on the margins in order to understand claims throughout the paper. The presentation would probably be clearer if the paper included more illustrative figures.
Originality: SGs were constructed by merging existing models, and as such are not that original (even though the work is highly nontrivial).
Significance: Not very significant on its own, but might have impact on future work. SGs would be more interesting if accompanied by e.g. some new results on causal identifiability (that cannot be expressed using existing models), or an inference algorithm for some parametric family.
Q2: Please summarize your review in 12 sentences
This is a high quality paper that describes segregated graphs (SGs), a novel graphical model over observed variables, consisting of bidirected, directed, and undirected edges. The model seems to combine ADMGs and MRS, and has a bit more flexibility in specifying causal relationships.
Submitted by Assigned_Reviewer_3
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes a reformulation of a large class of mixed graphical models that allows a series of factorization. The paper is quite technical and my main concern is that I did not really to what extant the proposed representation can simplifying some problems or reduce the complexity of some inference algorithm.
I fear that the 8page NIPS format does not fit for this kind of work, which would need a series of illustration to examplify the progress it allows.
Q2: Please summarize your review in 12 sentences
See below
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Hidden variables in Bayesian networks and in chain graphs add extra complexity to parameter learning and to inference. Mixed graphical models introduce bidirected edges that can eliminate the hidden variables while preserving the conditional independencies induced by the DAG or chain graph representations. However, after applying latent projection operation to eliminate the hidden variable in the resulting mixed graphs, bidirectional and undirected edges can meet, which leads to issues when trying to factorize the model.
As a solution segregated graphs are proposed, which preserve the conditional independencies between the nonlatent random variables in the original DAG or chain graph, and at the same time they avoid having nodes where undirected and bidirected edges meet.
Clarity: The introduction uses notations that are only introduced in the background section, which makes the reading of the paper more difficult.
Significance: It is hard to judge the paper's significance without having a real world practical example and its evaluation at hand.
Typo: line 099 "...B_1,Y_1..." > "B_1,Y_2"
Q2: Please summarize your review in 12 sentences
Segregated graphs are proposed, which can preserve conditional independencies between nonlatent random variables in a Bayesian network or chain graph, while eliminating latent variables without introducing nodes where undirected and bidirected edges meet. This can have a significant impact, however, this is not demonstrated through a practical example and with experimental evaluation.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 5000 characters. Note
however, that reviewers and area chairs are busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We thank the reviewers for their thoughtful comments
about the paper. Specific replies, where warranted,
follow.
Reviewer 1: We wanted to thank the reviewer for the
additional relevant references. We will do a better job of proofreading
the paper. Because the arguments were not straightforward, we felt it was
worthwhile to sacrifice some exposition to make sure all arguments were
exposed to peer review.
We believe moving simpler arguments to the
supplement would create space to better justify the result, and otherwise
improve the clarity of the paper.
Reviewer 2: A fitting algorithm
for discrete statespace models represented by segregated graphs follows
immediately from our results. The likelihood is decomposable by blocks
and districts, thus we can simply use the algorithm from Evans and
Richardson for districts, and a chain graph fitting algorithm
(e.g. iterative proportional fitting) for block pieces to fit our
model. Variants of belief propagation are possible by e.g. constructing a
join tree from the augmented (moralized) graph obtained from a segregated
graph in section 5. We did not discuss these extensions in detail in the
paper in the interests of space  we felt it was important to give our
arguments in the main body of the paper as the main contribution is
theoretical.
We agree that thinking about identification in causal
SGs is very interesting. But we believe it is first necessary to define
and justify counterfactuals that do not come from a directed acyclic graph
causal model. This is possible, and we are pursuing this, but we feel it
would be out of scope for this paper. Tchetgen Tchetgen and Vanderweele
did some preliminary work on this in the context of interference in causal
inference (we modified one of their examples to apply to two time points
in section 2).
In terms of significance, the relationship of our
work to latent variable chain graphs is similar to the relationship of
Evans and Richardson's work on the ordinary Markov model in Annals, and
hidden variable DAG models. It follows from our results (but this is out
of scope for a conference paper) that we get a welldefined conditional
independence supermodel (e.g. forms a curved exponential family), just as
E&R get for their case. We believe this is important to show. We
will be happy to clarify this in the text.
Reviewers 3, 5 and
6:
This paper shows that there exist "nice models" that capture
much of the structure of latent variable chain graphs, a very general
class of hidden variable model. Given that we wanted to prioritize
exposing our arguments to peer review, showing this took most of the
paper. We believe this result is important for causal inference, analysis
of social networks, and other types of hidden variable modeling where
directed and symmetric relationships appear together. We simply did not
have the space to pursue extensions and applications further in a
conference paper. We would be happy to move some of the simpler arguments
to a supplement, and use the space to better justify the result, and
improve the clarity of the paper. 
