Paper ID: 237
Title: Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking
Reviews

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This is an interesting and novel paper about using variational
inference in Bayesian online change point detection. It seemed to me
that the experimental results are interesting and thorough, though I
am not in the field of radar tracking.

A problem with the paper is that I found the exposition hard to
follow---a more holistic view of the algorithm is needed to make the
paper clearer. (That is, after reading the paper, I'm not sure I could
implement it.) Perhaps you can include pseudocode with pointers to
various equations.

Comments:

p3: Intuitions about the Rice distribution would be nice. When and
why is it used in tracking?

p3: Nitpick: Variational Bayes finds a *local* optimum.

p4: Variational inference is easiest in the settings you describe,
those that are conjugate-exponential with latent variables. But see
recent work like [1] and [2] which seek to expand on this.

p7: I was intrigued by the comment "improved online updating in terms
of KL to batch" but could not unpack it---I don't think that turn of
phrase appears earlier to explain an idea in the algorithm. Here again
a holistic summary of the algorithm would help make the paper clearer.

p8: In the conclusions, "closing a hole" is a little strong. You
needed a variational approximation to the Rice so you developed it.
This is straightforward because it is a conjguate-exponential model.
(This is not to take away from the contribution---you make a good case
via an interesting application for the Rice being a distribution for
which we want a variational approximation.)

Also, I found referring to "non-exponential family distributions"
confusing. Yes, the Rice is not in the exponential family but, because
I read many papers about variational inference, I read the expression
to mean latent variable models that are not in the
conjugate-exponential family, which the Rice is in. (No need to change
if you feel otherwise.)
Q2: Please summarize your review in 1-2 sentences
This is an interesting paper. I am not in the field of radar tracking, but it seemed like a novel and important application of variational inference.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

The paper presents a variational Bayes approach for online change-point detection in Rice distributed signal, accounting for the censoring of some observations, in the perspective of radar tracking. As far as I understand, the goal is to associate radar signals with real objects, which sounds like a unsupervised classification problem. Change point detection come on the top of that, due to changes in the data acquisition process.
One of the contribution is the variational posterior for a Rice distribution which does no belong to the exponential family, but that can be re-parametrized as a combination of exponential family distribution involving one unobserved variable $x$. The trick is nice, but then resumes to a classical variational Bayes approximation. A series of calculations derived in the paper resort from the same family, used in a relevant manner, but not really new.
I found this paper very hard to read for several reasons. The authors define more than 10 different acronyms. The model is never clearly stated so conditional dependences or independences have to be guessed, e.g. from equation (1). The connection between sections 1.2 and 1.3 is a bit hard to find. It is not clear to me if the Z (resp. a) of section 1.3 plays the role of the observed y (resp. unobserved x) of section 1.2. Also the connexion between classification (data association) and change point detection is still a bit obscure to me.
Q2: Please summarize your review in 1-2 sentences

Maybe an interesting paper, but very hard to read.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper develops a Bayesian approach to change point models using a variational approximation. The basic approach is not new, but the distribution used here (Rice) and the approximation needed for handling it are unpublished. In particular, the way they modify the Rice to facilitate VB, while not deep, is technically clever and might find uses in other models. The approach to online learning is an interesting twist on Sato's work, but I miss evidence that it is necessary or better. There's probably a way to make this section clearer. I do not have sufficient domain expertise to judge the significance of the results on the 2 datasets (Slocumb and Klusman, flightradar24) but they seem impressive.

Quality: the paper is technically solid.
Clarity: the paper is well organized but not always easy to follow.
Originality: the approach taken by the paper is not in itself novel, but the application domain is relatively rare for this type of approach, and the paper adds technical novelties.
Significance: the problem addressed is difficult and the results seem to improve on previous approaches.
Q2: Please summarize your review in 1-2 sentences
This paper addresses the problem of multibody tracking in a radar context, which is quite hard. It is technically sound, and the approach contains some interesting technical novelties. Results seem convincing.

Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their useful comments.

Reviewer 4 motivates the use of a few extra citations. The use of a Rice distribution is motivated by the physics of electromagnetic reflections; a more modern citation than [25] (i.e. Swerling 1954) may be helpful for those interested. Sufficient detail to review the background of coding an entire tracking system is difficult to cover in the background section of a NIPS paper but augmenting the current citations of [2] and [20] might be helpful.

Reviewer 4 observes that the case of VB for a Rice noise model is simplified by the fact that it belongs to a class of models that are conjugate exponential family after being augmented with latent variables. However, as the reviewers note, this is a much larger class than exponential family since it includes Rice, Student-t, and mixture of Gaussians, just to name a few examples. Additionally, the VB bounds of Section 2.4 hold even if there is no clear way to augment the model to make it conjugate exponential.

Reviewers 4 and 6 would probably like if we added a few more sentences clarifying the performance benefits of the explained online method over Sato's method shown in Figure 2(a). To elaborate what is stated in the paper: The performance curves are the KL divergence between various online approximations to the posterior and the batch VB solution (i.e. redoing VB from scratch every new data point) vs sample size. Sato's online approximation to the posterior tends to have a KL divergence to the batch solution of around 1 nat, where as our approximation has an error of about 0.01 nats, which is a 100x improvement. On an absolute scale, 0.01 nats is typically considered a small approximation error where as 1 nat is a large one (see Kass and Raftery [1995] and Murray [2007, Ch. 4]).

Reviewer 5 appears to have some questions about the connections between different sub-sections of the background section. In our application, the radar cross section (RCS) measurement is used as the observed variable in VB (y in section 1.2). In practice, RCS will typically be one of the components of the z vector along with kinematic measurements (positions and perhaps velocity). So, the reviewer is effectively correct in that the Z measurements of Section 1.3 (background on tracking) are used as the observed y of Section 1.2 (background on VB). However, z will also contain measurements from other sensors that are ignored in BOCPD and VB.

Reviewer 5 would also like some extra explanation on the conditional independencies or what is sometimes referred to as the "probability of everything" or the joint of the model. We do provide the full joint for the tracking problem in (5) as we determined that would be most useful in explaining tracking to a machine learning audience; writing the full joint is not as common in the tracking literature. For space reasons we did not state the full joint of the BOCPD model (and its implied conditional independencies) as it is already present in the cited background on BOCPD.