
Submitted by
Assigned_Reviewer_1
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors propose a new exact Hamiltonian Monte
Carlo (HMC) samplers to sample from binary distributions. Their method
uses the fact that for a suitable auxiliary variable construction, it is
possible to solve exactly the equations of motion appearing in HMC. This
is neat. This approach has been pioneered to the best of my knowledge in a
recent paper of Paninski and it's a natural followup.
To the best
of my knowledge, this is original and interesting. However the paper is
unfortunately not as clear as it could/should be and the writing has
clearly been rushed: there are embarrassing typos (e.g. p(y)=p(y,s)in
equations 4 and 5, supplemetary material).
The paper organization
should be improved. We have a long section 3 dedicated to spike and slab
regression with truncated parameters (interesting but having simulation
for the standard case would have been interesting too) and simulation
results appear in section 4.3 while section 4.1 and 4.2 are dedicated to
the Ising model... It would have been better to expand section 2 and
rewrite it very clearly with the method described in a pseudoalgorithm
form and merge sections 3 and 4 together (e.g. do we need for example
equations (26) and (27))
The simulation section is a bit
deficient. It would have been also interesting to study at least
experimentally the sensitivity of the algorithm to the travel time T. It's
not clear why for various applications how the authors have chosen the
values of T. Giving some recommendations to the potential users would have
been useful.
Q2: Please summarize your review in 12
sentences
There is a neat idea in this paper and I think the
method proposed here is potentially useful. I only gave a 7 to the paper
because it is unfortunately not very wellwritten. Submitted
by Assigned_Reviewer_3
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present a new method for sampling from
binary distributions using an exact Hamiltonian Monte Carlo scheme and
demonstrate its applicability on linear and probit regression models with
spike and slab priors and truncated parameters, problems that consist of
mixtures of continuous and binary variables.
The paper is
polished, well written and generally very clear. This work provides a
solution that may be applied to general binary or mixed binary/continuous
distributions, building upon previous work using a different approach
which was only applicable to Markov random fields. As far as I am aware
this is a novel piece of work and I very much enjoyed reading it. As such
I only have a few fairly minor comments to make:
Section 3.1:
Slightly confusing for the reader that D is used for the data, but then
without warning z is immediately used to represent the data.
Examples section 4: A variety of examples are shown with nice
graphs showing the sampler output. I think the reader would appreciate a
more rigorous quantitative comparison in terms of the effective sample
size, perhaps also normalised by the computational time required to draw
the samples. Of course this must be done carefully trying to minimise the
effect of coding differences in implementation, but I think it would
strengthen the argument for the adoption of such samplers.
Page 4,
line 209: Typo, "Horsehoe" > "Horseshoe"
Q2: Please summarize your review in 12
sentences
An interesting paper, well presented and of reasonable
interest to the wider community. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper describes a method for relaxing discrete
sampling problems into continuous ones that are suitable for Hamiltonian
Monte Carlo. The idea is to divide the continuous space into orthants,
creating a onetoone correspondence between the possible values of the
discrete space and orthants in the continuous space. The complication that
arises is that the relaxation has noncontinuous jumps at the boundaries
between orthants, which the authors address by augmenting Hamiltonian
dynamics to allow the sample to either reflect off or jump on top of
cliffs, depending on its energy. A nice effect of this construction is
within the orthant, the problem is a constrained quadratic, for which
recent work has pointed out that the dynamics can be computed exactly,
eliminating the need for numerical integration.
This is a creative
and nice application of recent work on exact Hamiltonian samplers for
logquadratic densities. Spikeandslab models are a great application of
this technique, and an important model, and the results show that this
method has promise as an effective sampling algorithm for this model.
I have a question about the validity of the method. It's clear to
me that with energy barriers added, the dynamics is reversible and
preserves the value of the Hamiltonian. I cannot immediately see why the
dynamics conserves volume. Obviously it does so away from the boundaries,
but it is less clear to me why the dynamics conserves volume at the energy
barriers. Could the authors please expand on this?
Assuming that
the method is valid, it's an interesting paper that merits publication.
Minor comments:
* lines 108ff: A possible comment that
might help the reader's intuition: The equation (9) has a solution iff the
right hand side is positive. So if Delta is positive, meaning that the new
discrete state would have higher probability than the current one, then
the coordinate always crosses the boundary, i.e., we always move to the
new discrete state. In the physical analogy of HMC, the dynamics reaches
the top of a cliff, so it always falls downward. On the other hand, if
Delta is negative, then the new discrete state has lower probability, so
we only switch to it if the sampler has enough energy, i.e., if q^2_j
(t_j^)/2 > Delta.
========== AFTER REBUTTAL
1.
Thanks for your explanation about the boundary effects.
2. Perhaps
a bit selfindulgent, rereading my review, I thought of a perhaps better
way to express my intuition at the end.
"On the other hand, if
Delta is negative, then the new discrete state has lower probability.
Because the sampler has to be reversible, we're only allowed to roll up
the cliff if there exists a reverse trajectory that could have been
generated by the dynamics. In the reverse trajectory, the particle gains
energy when it falls down the cliff, so this creates a lower bound on the
energy of particles that jump up the cliff."
Feel free to not use
this if you feel it doesn't help. Q2: Please summarize
your review in 12 sentences
Assuming that the method is valid, it's an interesting
paper that merits publication.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We would like to thank the reviewers for their
comments and suggestions. Below we address their comments in detail:
From Assigned_Reviewer_1
> there are embarrassing typos
(e.g. p(y)=p(y,s)in equations 4 and 5, supplementary material).
The identity "p(y)=p(y,s)" is not a typo, but follows from the
definition of p(y). Note that the rhs only depends on y, since s=sign(y).
To make this clear, perhaps the sum in (4) should be over s' instead of
over s. We will change this in a revised version.
> We have
a long section 3 dedicated to spike and slab regression with truncated
parameters (interesting but having simulation for the standard case would
have been interesting too)
As mentioned in line 247ff, in the
standard case (without truncated parameters) one can integrate out the
weights and just sample the binary variables. This gives a
RaoBlackwellized result, with lower variance. As mentioned there, our
approach is most useful precisely when such an exact integration is not
possible because the parameter region is truncated.
> and
simulation results appear in section 4.3 while section 4.1 and 4.2 are
dedicated to the Ising model... > It would have been better to
expand section 2 and rewrite it very clearly with the method described in
a pseudoalgorithm form and merge sections 3 and 4 together > (e.g.
do we need for example equations (26) and (27))
A problem with
merging sections 3 and 4 is that the latter contains Ising model examples
that would belong in section 2, and adding examples already in Section 2
would break the continuity of the theoretical exposition. As for the
equations, since equation (28) is quite central to the posterior analysis,
we thought a reader would find useful to see its derivation from (26) and
(27).
Summarizing the method in a pseudoalgorithm form is a good
idea, we will add it in a revised version.
> The simulation
section is a bit deficient. It would have been also interesting to study
at least experimentally the sensitivity of the algorithm to the travel
time T. It's not clear why for various applications how the authors have
chosen the values of T. Giving some recommendations to the potential users
would have been useful.
Thanks, great suggestion. As usual with
the HMC method, the value of T giving the best results requires
experimenting in each particular problem. We will address this point in a
revised version.
From Assigned_Reviewer_3
>
Section 3.1: Slightly confusing for the reader that D is used for the
data, but then without warning z is immediately used to represent the
data.
Thanks. We will correct the notation in a revised version.
> Examples section 4: I think the reader would appreciate a
more rigorous quantitative comparison in terms of the effective sample
size, perhaps also normalised by the computational time required to draw
the samples.
That would be a good addition, we will add such a
comparison in a revised version.
> Page 4, line 209: Typo,
"Horsehoe" > "Horseshoe"
Thanks, will correct that.
From Assigned_Reviewer_6
> it is less clear to
me why the dynamics conserves volume at the energy barriers. Could the
authors please expand on this?
Thanks, this is an important point.
A region of phase space with spatial coordinates near an orthant boundary
splits into two regions, according to whether the coordinate is reflected
or crosses the boundary. One region will have the momentum reflected and
the other transforms discretely, its volume conservation following from
the limit of a smooth but steep potential jump. Both transformations
preserve volume and hence the total volume is conserved. This argument
applies both to the binary and mixed continuousbinary cases. We will be
happy to elaborate on these points in a revised version.
>
lines 108ff: A possible comment that might help the reader's intuition:
The equation (9) has a solution iff the right hand side is positive. So if
Delta is positive, meaning that the new discrete state would have higher
probability > than the current one, then the coordinate always
crosses the boundary, i.e., we always move to the new discrete state. In
the physical analogy of HMC, the dynamics reaches the top of a cliff, so
it always falls downward. > On the other hand, if Delta is
negative, then the new discrete state has lower probability, so we only
switch to it if the sampler has enough energy, i.e., if q^2_j (t_j^)/2
> Delta.
Thanks, we will add a comment along these lines in a
revised version.
 