Paper ID:1305
Title:Analysis of Brain States from Multi-Region LFP Time-Series
Current Reviews

Submitted by Assigned_Reviewer_25

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The manuscript describes a very interesting model for the analysis of brain states from multi-region LFP time-series. The time-series are separated into different time-windows. An infinite mixture of Gaussian processes is considered to model the observations in each window. Brain states are assigned to each observation by means of an underlying HDP, and brain regions are assigned to clusters by means of a second HDP. The paper is original and overall clearly written, but the interpretation of the results needs some improvement. More specific comments are below:

a) Please clarify the algorithm used to separate the time-windows. That's very important and can affect considerably the results.

Line 41: you should clearly distinguish between brain connectivity and brain states.

Line 100: "each window is considered a single observation". Since you are not summarizing the observations in each window in a statistic, the statement has no clear meaning to me.

Line 098 and following: There's a bit of confusion in the notation. Are the windows different in each region, as suggested by "For each region, the time-series is split...", or are the windows common across regions, as suggested by the model formulation?

Line 107: What is L? How do you choose it?

Line 113: In (1), \lambda_g^{(a)} should be explicitly written down. Is it a vector across states or animals? That becomes clear only on line 124.

Line 183: Once (6) is established, the induced joint distribution over all windows is no longer block diagonal. Worse, the joint distribution is not even well defined, since the joint covariance matrix is no longer positive semi-definite. As a matter of fact, your infinite mixture of Gaussian processes is no longer a process, which is odd. This limitation should at least be acknowledged.

Section 2.2: Gaussian processes are probably not the best choice for describing brain connectivity in each given state. The covariance function depends on only a few parameters and cannot reproduce the pattern of sparsity of the brain (even when coupled across regions). This limitation, again, should be acknowledged.

Section 2.3 I don't see the necessity of this section. Since you are decomposing a matrix of latent probabilities, way down in the hierarchy, the tensor characterization seems quite silly to me.

Line 414: The statement is repeated (see line 410). In addition, the explanation is very weak. The results may depend on the choice of the clustering mechanism. It is well known that the DP shouldn't be used for cluster analysis in an absolute way (Antoniak, 1974; Miller and Harrison, 2013). Besides, you fix the parameters of the DPs (see line 333). Hence, your conclusions don't seem well supported.

Line 418: What's the significance of this "network" with respect to the literature?
Q2: Please summarize your review in 1-2 sentences
The paper is interesting, original, and overall clearly written, but the interpretation of the results needs some improvement.

Submitted by Assigned_Reviewer_32

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
In this paper, the authors describe an HMM-based model for analysing local field potential data. The generative model assumes that for any particular window (short time period), the brain is in one of a set of discrete states. Conditioned on the state, the different brain regions are then assigned to particular LFP clusters and, conditioned on the cluster, the observed data are generated from a GP. Variational inference is performed on three data sets: toy data, mouse sleep data, and mouse novel-environment data. Performance on the toy data is very good (to be expected, as the data are generated from the model). Performance on the other datasets is harder to gauge, although this is due to the nature of the problem.

Quality: this paper is of high quality. There is interesting technical development coupled with an interesting problem area. The model proposed is well presented and the performance appears (as far as it is possible to tell) to be good. I have no real criticisms of the paper.

Clarity: very well written and very clear.

Originality and significance: the paper tackles an important problem. For me, the originality is predominantly in the application of this class of algorithm to this problem - I don't feel that there is a huge amount of originality in the technical development itself.
Q2: Please summarize your review in 1-2 sentences
The authors propose an HMM-based model for analysing LFP data and assess it using real data from mice. The model is hierarchical, assuming that the observed data for each region come from a region-specific cluster which is itself conditioned on a global brain state. The performance on the datasets considered is good.

Submitted by Assigned_Reviewer_43

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
SUMMARY

This paper reports a bespoke variational scheme for inverting state-space models of spectral density implicit in LFP time-series. In particular, its contribution is to provide a bespoke variational inversion using a hidden Markov model of discrete brain states that generate activity, where the form of spectral responses provides a Gaussian process model for the time-series. I thought that this was an interesting if colloquial application of variational Bayes that may need to be contextualized within the broader church of dynamic causal modelling – and its application to neurobiological time-series.

COMMENTS TO AUTHORS

I enjoyed reading this interesting and detailed description of a variational state-space model inversion for time-series. Technically, this was an impressive piece of work. My main suggestions would be to contextualize this within the broader church of dynamic causal modelling and highlight the potential usefulness of your scheme. Perhaps you could consider the following:

1) In the neurosciences, the variational inversion of state-space models of electrophysiological time-series is usually described in terms of dynamic causal modelling. In fact, there is a literature on the dynamic causal modelling of cross spectral density that has been applied to multi region LFP time-series (and MEG). I think it would be scholarly to look at this work. You can find an overview of dynamic causal modelling at:

http://www.scholarpedia.org/article/Dynamic_causal_modeling

Your special contribution is a state-space model that is formulated in terms of a hidden Markov model. This contrasts with usual DCMs that are based upon differential equations. You might want to highlight this because it is particularly useful for things like sleep staging or endogenous transitions among different brain states.

2) Your description of the generative model is framed for a machine learning audience (with things like spectral mixture kernel and Gaussian processes). However, your rhetoric may confuse people in engineering and signal processing (and neurobiology). It would be useful to link your terminology to more standard concepts (perhaps in a glossary). For example, the Fourier transform of your spectral mixture is simply the auto-covariance function. Furthermore, your use of the word kernel is colloquial. In other fields, the kernel will be taken to mean the impulse response function or first order Volterra kernel whose Fourier transform is the transfer function (that corresponds to the spectral mixture). It might be useful to clarify terminology here?
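To make the equivalence concrete: the statement that the Fourier transform of the spectral mixture is the auto-covariance function is the Wiener-Khinchin relation, and it can be checked numerically. The sketch below uses a single-component spectral mixture kernel in the standard form of Wilson & Adams (2013); the parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# Single-component spectral mixture (SM) kernel; the parameter values
# here are ours, chosen only for illustration.
w, mu, sigma = 1.0, 0.3, 0.05   # weight, mean frequency, bandwidth

def sm_kernel(tau):
    """Closed-form SM kernel: the Fourier transform of a symmetrized
    Gaussian spectral density, i.e. the auto-covariance function."""
    return w * np.exp(-2 * np.pi**2 * tau**2 * sigma**2) * np.cos(2 * np.pi * mu * tau)

def autocov_from_spectrum(tau, s_grid):
    """Wiener-Khinchin: recover the auto-covariance at lag tau by
    numerically Fourier-transforming the spectral density."""
    dens = 0.5 * w * (np.exp(-(s_grid - mu)**2 / (2 * sigma**2))
                      + np.exp(-(s_grid + mu)**2 / (2 * sigma**2)))
    dens /= np.sqrt(2 * np.pi) * sigma
    ds = s_grid[1] - s_grid[0]
    return np.sum(dens * np.cos(2 * np.pi * s_grid * tau)) * ds

s_grid = np.linspace(-2.0, 2.0, 200001)
for tau in [0.0, 0.5, 1.7]:
    assert abs(sm_kernel(tau) - autocov_from_spectrum(tau, s_grid)) < 1e-7
```

A glossary entry along these lines would let signal-processing readers map "kernel" onto "auto-covariance function" directly.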

3) When you introduce the bound on model evidence or marginal likelihood, could you describe this as the variational free energy? This will enable people to see the connections between the use of variational free energy in dynamic causal modelling and in your application.
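For readers unfamiliar with this equivalence, the standard identity (written here in generic notation, not the paper's) is

\begin{align}
\log p(y) \;=\; \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(y,\theta) - \log q(\theta)\right]}_{F[q]\;\text{(variational free energy / ELBO)}} \;+\; \mathrm{KL}\!\left[\,q(\theta)\,\|\,p(\theta\mid y)\,\right] \;\ge\; F[q],
\end{align}

so maximizing the bound on the marginal likelihood is exactly maximizing the variational free energy used in dynamic causal modelling.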

4) To illustrate the potential usefulness of your approach perhaps you could make more of the clusters implicit in the sleep data. Perhaps with something like:

“To illustrate the potential importance of our (Bayes-optimal) state-space model inversion, we can now relate the clusters identified during sleep to classical sleep staging schemes (four distinct states). By examining the similarity between the clusters (spectral mixtures) we identified and the classic spectral profiles, we can see how stage four can be decomposed into three sub-stages…”

I am not sure how you would do this but it would be very nice if you could provide a proof of principle that your approach can take us beyond what we already know.

MINOR POINTS

1) In the abstract, I would say: “The model is able to estimate the number of brain states….”

2) On page 2, it is not clear which of “the above two methods” you are referring to. Can I suggest you say:

“More recently, new methods for tensor factorisation have been developed: in reference 7, tensor factorisation was applied to short-term FFT…”

This will make it clear that the tensor factorisation does not refer to the current paper.

3) Below Equation 6, I would say “the parameters describe the auto-correlation content associated with each y”. I know what you mean, but you are actually characterising the data in the time domain, not the spectral domain. In other words, you are using spectral mixtures to provide constraints on the Gaussian process.

4) At the top of page 8, it was not clear to me exactly what was being predicted by the results of Table 1. When you talk about a held-out log predictive probability for different priors, what was this probability distribution over?

5) Finally, I think you need to address a crucial issue in your generative model. Usually, state-space models of spectral density (or auto covariance functions) accommodate cross spectra or cross covariance functions. In other words, it is not just the spectral density at each node or region but the coupling between regions that is predicted on the basis of connectivity among regions. I think you need to make it clear that your generative model does not consider complex cross spectra (cross covariance functions) and that – in principle – you could extend the generative model in this direction.

6) In the supplementary material, when talking about the updates for global probability vectors, you might want to mention that the use of point estimates means you do not have to consider the entropy of the posterior distribution implicit in the variational free energy (and that you can use the log posterior directly).

I hope these comments help should any revision be required.
Q2: Please summarize your review in 1-2 sentences
This was an interesting variational scheme for state-space models based on an HMM and a spectral mixture model of electrophysiological time-series. It is not very biologically plausible but may have a role in sleep staging and the classification of epileptic discharges.
Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
First, we would like to thank the reviewers for all the insightful comments and suggestions about this work. The majority of the critiques can be separated into three broad categories: first, the impact of separating the LFP time-series into time-windows; second, contextualizing the brain-state model within the vast research on dynamic causal modeling; and, finally, the interpretation of the results. In this response, we address all comments that fall within these three broad categories. The more minor points will certainly be addressed via clarifications within the paper.


TIME-WINDOWS:
One set of critiques about this work centers on the separation of the LFP time-series into time-windows. This separation is done prior to modeling and ensures that the windows are common across regions. By this, we mean that window w comprises the same N time points for each region of a given animal. This point will be clarified in the paper. The separation of the time-series into time-windows is a feature of our model that allows for computationally efficient inference.

When separating the data into these time-windows, we can choose either overlapping or non-overlapping windows; with overlapping windows, individual data points may be shared between two consecutive windows. In our model, we treat these consecutive windows as independent observations conditioned on the cluster assignments. We are not modeling the time-series itself as a stochastic process, but rather our preprocessed, "independent" observation vectors. As mentioned in the reviews, the true joint distribution of the original time-series may not be block diagonal. The independence assumption is a limitation of the model and will be addressed as such in the paper. However, the windowing allows for efficient computation via the mixture of Gaussian processes and is necessary for our formulation.
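The preprocessing just described can be sketched in a few lines; the function name, shapes, and overlap handling below are ours, for illustration only, and do not reproduce the authors' code.

```python
import numpy as np

def split_into_windows(lfp, n_per_window, overlap=0):
    """Split a multi-region recording into windows that are common across
    regions, so window w covers the same samples in every region.

    lfp: array of shape (n_regions, n_samples).
    Returns an array of shape (n_windows, n_regions, n_per_window)."""
    n_regions, n_samples = lfp.shape
    step = n_per_window - overlap
    starts = range(0, n_samples - n_per_window + 1, step)
    return np.stack([lfp[:, s:s + n_per_window] for s in starts])

rng = np.random.default_rng(0)
lfp = rng.standard_normal((4, 1000))          # 4 regions, 1000 samples
windows = split_into_windows(lfp, 100)        # non-overlapping windows
assert windows.shape == (10, 4, 100)
overlapped = split_into_windows(lfp, 100, overlap=50)
assert overlapped.shape == (19, 4, 100)       # shared samples between windows
```

Each slice along the first axis is one "observation" in the mixture model; with a nonzero overlap, consecutive observations share samples, which is exactly where the conditional-independence assumption discussed above becomes an approximation.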


DYNAMIC CAUSAL MODELING:
We have chosen to formulate a state-space model on discrete brain states in terms of a hidden Markov model. This differs from the dynamic causal modeling (DCM) literature, where differential equations are crafted based on neurobiological heuristics. These equations control the state of each region, and these states influence an observation model of the data. DCM has been used to model the underlying neurophysiology of multi-channel LFP data, and this approach enjoys several benefits, such as causal modeling.

The proposed model is significantly different from DCM and has its own benefits, including: a) our model is not necessarily specialized to electrophysiological measurements, whereas the differential equations in a DCM tend to be crafted for a particular application; b) our discrete brain-state assignments are directly useful for tasks such as sleep staging – we note, however, that discrete states may also be obtained in a less natural manner via a DCM with additional modeling and/or post-processing; and c) we obtain an analytical form of the spectral density of the observations, which is easy to interpret.

The DCM will be referenced in the revised version, and the relationships to the proposed model will be clarified.


RESULTS:
A final area we would like to address is the interpretation of the results.

While it is true that a DP will typically over-cluster even when the model is well matched, this isn't a detriment to the results. Over-clustering is preferred to under-clustering in many neuroscience tasks, such as spike sorting (Adamos et al., 2008), and our held-out predictive log-likelihood values are higher for the learned model than when the number of states is manually set to a lower or higher number.

The interesting result is that we repeatedly identify many more clusters with significant mass than classical sleep staging schemes (Dzirasa et al., 2006, uses 3 distinct states: REM, SWS, and WK). In the paper, we plot the probability of each brain state identified by our model conditioned on each of the 3 states from the classical scheme. This reveals how the classical sleep stages may be decomposed into discovered sub-stages with our algorithm. This plot is shown in the paper, but, as suggested by some reviewers, we can clarify how this plot demonstrates the potential usefulness of our approach. Despite this result, however, we are hesitant to make any broad neuroscience interpretations without further experimentation.


OTHER COMMENTS:
The language in this paper is framed for a machine learning audience. Since the implications of the paper may go beyond the machine learning community, we will make the effort to define our terminology in a way that is less confusing to the engineering, signal processing, and neuroscience communities.