Paper ID: 1841 Title: Clustered factor analysis of multineuronal spike data
Current Reviews

Submitted by Assigned_Reviewer_19

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
1841 - Clustered factor analysis of multineuronal spike data

This paper presents an important extension of the previous PLDS model by allowing disjoint latent dynamics for subpopulations.

~~ Quality

The paper presents state-of-the-art modeling and optimization techniques.

~~ Clarity

The model is described clearly. 8 pages is clearly not enough to describe everything and show detailed results, but the appendix describes the inference in more detail.

~~ Originality

This paper is largely based on the previously successful PLDS model, in which a latent linear dynamical system is observed through Poisson processes. The key advancement in the model is the concept of subpopulation. For fitting the model, the authors propose sophisticated initialization procedures and compare methods.

~~ Significance

I believe this paper represents significant conceptual and technical progress towards better analysis of population neural data.

I do not understand why the latent linear dynamics for each subpopulation are allowed to interact. In other words, why is the matrix A not block diagonal? Wouldn't it allow any mixture factor model to be represented as well? It blurs the “subpopulation” interpretation. I would like to see a justification/explanation for allowing such interaction.

This tool is more useful as a confirmatory analysis than as an exploratory analysis. The addition of non-negativity in C for biological interpretation self-demonstrates its weakness.

~~ Minor

The last paragraph in 2.2 is confusing (duplicate info?).
The paper presents state-of-the-art modeling and optimization techniques that mark conceptual progress in population neural data analysis.

Submitted by Assigned_Reviewer_41

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents a clustered factor analysis/latent linear dynamical systems model for neural population data that contains distinct, non-overlapping clusters of neurons that are each governed by low-dimensional dynamics.

Overall the paper is well written and the results on the toy data seem to indicate that the approach works. However, my enthusiasm is substantially dampened because I got stuck at several places due to oddities in the terminology, the model definition and the results, which made it hard to properly evaluate the work.

1. I don’t see how the proposed model relates to the cited mixture models (e.g. mixture of factor analyzers). In a mixture model we have

p(x) = \sum_i w_i p_i(x),

i.e. observations are assumed to be generated by individual mixture components. In the present paper, in contrast, the dimensions (neurons), not the observations, are clustered. The result is a certain block structure in the loading matrix, but as far as I can tell it’s still essentially a factor analysis model. The authors should explain more clearly what the mixture distribution is.

2. The definition of s as a multinomial random variable doesn’t make sense to me. If s indicates the cluster label, it should be a simple categorical RV (or binary, if there is one dimension per cluster). The multinomial distribution measures the number of successes for N draws of a categorical RV. I don’t understand why one would use that distribution for s. Also, in line 131, why are there K different sets of \phi, i.e. K*M parameters? Shouldn’t we just need M parameters specifying the probability of neurons belonging to cluster m?

3. Why is the loading matrix (C) of the mixPLDS model shown in Fig. 2B not block-diagonal? If the zero structure is not enforced during training, how is this model different from a normal PLDS model (with non-negativity constraint)?

UPDATE AFTER AUTHOR REBUTTAL:

I upped my score a bit since the authors addressed my questions. The explanations provided regarding points 2 & 3 should definitely be included in the manuscript!
A clustered factor analysis model that could potentially be very useful for identifying groups of neurons in neural population data. Unfortunately, I couldn’t follow the model derivation/description entirely.

Submitted by Assigned_Reviewer_43

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents a new method for simultaneously clustering spiking data from a neuronal population and extracting the latent factors (assuming an LDS) for each of the clusters. It is shown how to fit the model with a variational technique, and that it outperforms previous clustering methods on synthetic data. The inferred uncertainty over the cluster assignment is roughly consistent. The technique is applied to actual data from the spinal cord and produces sensible results.

I recommend publication, and only have minor comments:
- how consistent are the outputs when applied to the same dataset with different seeds?
- there is no colorbar in Fig 2A as referred to in the caption
Well-written methods paper, presenting a technique that simultaneously clusters a population of neurons based on their joint activity and extracts latent factors for each cluster assuming an underlying LDS. Clustering performance is improved compared to previous techniques in synthetic data, and sensible results are shown when applied to actual neural data.

Submitted by Assigned_Reviewer_44

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes a method for clustering multi-neuronal data based on a parametric latent variable model. The main contribution of the approach presented is the extension of previous mixture of factor analyzers models to a dynamic setup where a hidden latent model captures the temporal structure of the model. Latent variable models have proven very successful in recent years, and offer great flexibility and lead to efficient (usually approximate) estimation algorithms. The basic building block of the model is a multinomial population model allowing each neuron to belong to one of M classes, where each latent variable is modeled using a linear dynamical system, and the observation model is based on a Poisson firing model. The main distinction compared to previous work is the incorporation of temporal dependencies in the latent variables. The general inference and parameter estimation is intractable within the present setting, leading to the introduction of a factorization assumption for the posterior distribution, and to a variational lower bound. This lower bound is optimized based on coordinate ascent, which is facilitated by using the dual formulation. The parameter update is interleaved with the update of the variational bound variables, leading to a variational EM type algorithm. In order to improve performance, an initialization scheme is proposed based on Poisson subspace sampling. Additionally, in order to prevent certain undesirable properties of the solution (like assigning all neurons to a single cluster), a non-negativity constraint is introduced into the neuronal activity model. The authors conclude their paper by presenting experiments with both artificial and natural data, taken from calcium imaging of motor neuron activity. The analysis of the empirical data led to some interesting conclusions about the firing phase of neurons.
Interestingly, validation using electro-physiological measurements was suggested, opening an interesting route for model validation.

Overall this is a solid paper, extending earlier work in several promising directions and demonstrating good empirical results. However, as far as NIPS papers go, the contribution beyond previous work seems rather incremental.

A question that should be addressed:
The model described in eqs. (3-5) is linear and assumes a specific noise model. What are the implications of these assumptions? How robust is the model to misspecification?

Following the authors' response I have re-evaluated the contribution of the paper and raised my original score.
A solid paper, extending earlier work in several promising directions and demonstrating good empirical results. As far as NIPS papers go, the contribution beyond previous work seems rather incremental.
Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We would like to thank the reviewers for the constructive comments.

Reviewer 19:

1) "I do not understand why the latent linear dynamics for each subpopulation are allowed to interact. In other words, why is the matrix A not block diagonal? Wouldn't it allow any mixture factor model to be represented as well? It blurs the “subpopulation” interpretation. I would like to see a justification/explanation for allowing such interaction."

We agree that completely independent subpopulations (with a block-diagonal $A$) are paradigmatic examples of cell clusters. However, there might well be cases where cell clusters interact over time. This is almost certainly the case in the data analyzed in section 3.2: Different motor neuron pools are believed to interact over time in a central-pattern-generator manner, producing slow oscillations. In our analysis of this data, we don't see how a block-diagonal matrix $A$ could capture these interactions.
However, even with non-block-diagonal $A$, at any given time step $t$ the neurons of cluster $m$ only depend on the factor $x^m_t$ and are independent of the factors of the remaining clusters. Hence, our model relaxes the assumption of ("global") independence of clusters to the assumption of independence at each time step (conditioned on current state), allowing for interactions over time.
We will try to better motivate a non-block-diagonal $A$ in the future revision of the manuscript.
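
The difference between a block-diagonal and a non-block-diagonal $A$ can be illustrated with a small sketch. It is not the paper's implementation: it assumes two clusters with one-dimensional factors, omits the innovation noise, and uses a scaled rotation as a hypothetical coupled $A$ that produces slow oscillations. The names `step`, `rollout`, `A_block` and `A_coupled` are introduced here for illustration only.

```python
import math

def step(A, x):
    """One noise-free step of the latent dynamics: x_{t+1} = A x_t."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def rollout(A, x0, T):
    """Return the list of latent states x_0, ..., x_{T-1}."""
    xs = [list(x0)]
    for _ in range(T - 1):
        xs.append(step(A, xs[-1]))
    return xs

# Hypothetical dynamics matrices for two clusters with 1-D factors.
theta = 0.3  # assumed rotation angle per time step
A_block = [[0.95, 0.0],
           [0.0, 0.95]]                      # clusters evolve independently
A_coupled = [[0.95 * math.cos(theta), -0.95 * math.sin(theta)],
             [0.95 * math.sin(theta), 0.95 * math.cos(theta)]]  # clusters interact

x0 = [1.0, 0.0]  # only the factor of cluster 1 is active at t = 0
uncoupled = rollout(A_block, x0, 20)
coupled = rollout(A_coupled, x0, 20)
```

With `A_block` the factor of cluster 2 stays at zero for all time steps, whereas with `A_coupled` activity in cluster 1 drives cluster 2 over time. In both cases the neurons of a cluster would, at any given time step, depend only on their own cluster's current factor.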

2) "The last paragraph in 2.2 is confusing (duplicate info?)."

We will phrase this paragraph more clearly in the future revision.

Reviewer 41:

1) "I don’t see how the proposed model relates to the cited mixture models. [...] The authors should explain more clearly what’s the mixture distribution"

The model is a mixture model not in the time dimension but in the neuron dimension. Conditioning on the latent variables $x$ and the model parameters $\theta$ and marginalizing out the (categorical) indicator variables $s$, the distribution over the spike count $y_{kt}$ for neuron $k$ at time $t$ is a mixture of $M$ Poisson distributions. This reflects the modelling assumption that any given neuron comes from exactly one of the $M$ clusters.
The data are assumed to come in the form of a spike count matrix of size (neurons x time). Applying a standard mixture of factor analyzer model to the transpose of this matrix (where time steps are features and neurons are samples) would be one simple way of clustering neurons. The mixPLDS model is an extension of this approach (including a dynamical system prior in time and individual loading parameters for each neuron). We therefore think stating that the "resulting model is similar to a mixture of factor analyzers" (ll64) etc. is well justified.
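
The marginalization over $s$ described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the mixture weights and the per-cluster rates are hypothetical numbers, and the exponential form used to produce the rates is an assumption made here. The helper names `poisson_pmf` and `mixture_pmf` are likewise introduced for this example.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of count y at the given rate (computed in log space)."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))

def mixture_pmf(y, weights, rates):
    """p(y) = sum_m weights[m] * Poisson(y; rates[m]) for one neuron and one time bin."""
    return sum(w * poisson_pmf(y, r) for w, r in zip(weights, rates))

# Hypothetical values for M = 2 clusters: assignment probabilities of the neuron,
# and the rate it would have if it were driven by each cluster's factor.
weights = [0.7, 0.3]
rates = [math.exp(0.5), math.exp(-1.0)]

# Sanity check: the mixture is a proper distribution over spike counts.
total = sum(mixture_pmf(y, weights, rates) for y in range(100))
```

Each mixture component corresponds to one hypothesis about which cluster the neuron belongs to, which is the sense in which the mixture is over neurons rather than over time steps.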

2) "The definition of s as a multinomial random variable doesn’t make sense to me. [...] Also, in line 131, why are there K different sets of \phi, i.e. K*M parameters? Shouldn’t we just need M parameters specifying the probability of neurons belonging to cluster m?"

The reviewer is correct that $s$ is a categorical and not a multinomial RV; we will fix this.
Concerning the number of parameters: For each neuron (of which there are $K$), one needs $M$ parameters to specify the posterior cluster assignments ($M-1$ would be sufficient, the one additional parameter per neuron is compensated for by normalization). Hence one needs $K*M$ parameters for all neurons.
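
The parameter count can be made concrete with a short sketch. It assumes the posterior assignment probabilities are stored as a $K \times M$ table that is normalized per neuron; the softmax normalization, the example values and the name `normalize_rows` are assumptions made for illustration.

```python
import math

def normalize_rows(logits):
    """Softmax each row: K rows (neurons), M columns (clusters)."""
    out = []
    for row in logits:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - top) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

K, M = 4, 3  # hypothetical numbers of neurons and clusters
logits = [[0.1 * k * m for m in range(M)] for k in range(K)]  # made-up values
phi = normalize_rows(logits)

n_params = sum(len(row) for row in phi)  # K * M stored numbers
n_free = K * (M - 1)                     # degrees of freedom after normalization
```

Each row of `phi` sums to one, so only $M-1$ of its $M$ entries are free, which matches the count given above.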

3) "Why is the loading matrix (C) of the mixPLDS model shown in Fig. 2B not block-diagonal? If the zero structure is not enforced during training, how is this model different from a normal PLDS model (with non-negativity constraint)?"

The estimates of $C$ would only be block-diagonal if there was no uncertainty on the cluster assignments for all neurons. For most neurons in Fig 2B, the algorithm is very certain to which cluster they belong. But there are a few for which the posterior uncertainty is high and hence the corresponding off-block-diagonal elements of $C$ are non-zero. Visual inspection of the spike-count time series for these neurons (Fig 2A) shows that they could indeed plausibly belong to both clusters.
Even with non-block-diagonal loading $C$ the mixPLDS model differs from the PLDS model. The PLDS model can use all factors *jointly* to explain the spike observations of any neuron $k$. In contrast, the mixPLDS can explain the spikes of neuron $k$ by using the factors of cluster 1 *or* the factors of cluster 2 *or* etc. (Being a Bayesian model, the mixPLDS integrates over these $M$ different hypotheses for each neuron.) Hence, e.g., the likelihood of observed data under the PLDS and the mixPLDS (even with the same loading $C$) will in general be different.
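
The "jointly" versus "or" distinction can be illustrated for a single neuron and time bin. The sketch below is a simplification under stated assumptions: one-dimensional factors per cluster, an exponential link with offset `d`, and made-up values for the loadings `c`, the factors `x` and the assignment probabilities `pi`. The function names are hypothetical.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of count y at the given rate."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))

def plds_likelihood(y, c, x, d):
    """All factors jointly set one rate: exp(sum_m c[m] * x[m] + d)."""
    rate = math.exp(sum(cm * xm for cm, xm in zip(c, x)) + d)
    return poisson_pmf(y, rate)

def mixplds_likelihood(y, c, x, d, pi):
    """One factor per hypothesis: sum_m pi[m] * Poisson(y; exp(c[m] * x[m] + d))."""
    return sum(p * poisson_pmf(y, math.exp(cm * xm + d))
               for p, cm, xm in zip(pi, c, x))

# Made-up values: the same loading row c is used in both models.
c = [1.0, 0.5]
x = [0.8, -0.4]
d = 0.0
pi = [0.9, 0.1]
y = 2

joint = plds_likelihood(y, c, x, d)
mixed = mixplds_likelihood(y, c, x, d, pi)
```

With these values `joint` and `mixed` differ even though the loading row is identical. They coincide only in special cases, for example when the neuron is assigned to one cluster with certainty and its loading on the other factor is zero.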

Reviewer 43:

We agree that this is an important point. We will add more details about algorithmic complexity in the future revision.

2) "how consistent are the outputs when applied to the same dataset with different seeds?"

In the analysis of the real data (section 3.2), we observed that there is virtually no variability in the results for multiple restarts. The same holds for artificial data sets (section 3.1, fig 1) on which the mixPLDS shows good performance. On "hard" instances of the artificial data sets, different runs show some variability. We will try to quantify this and add the information to the future revision.

3) "there is no colorbar in Fig 2A as referred to in the caption"

We will fix this.

Reviewer 44:

1) "The model described in eqs. (3-5) is linear and assumed a specific noise model. What are the implications of these assumptions? How robust is the model to misspecification?"

We agree that this is an important point that needs careful analysis to understand the limitations of the model. We aim to address this in a future revision.