NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 3075
Title: Learning Disentangled Representations for Recommendation

Reviewer 1

This paper proposes a VAE-based model that learns disentangled representations from user behavior at both the macro and the micro level. The authors model the user representation as a latent variable z, the item representations as the condition c, and the user-item interactions as x. The paper is well organized, well written, and easy to follow. The authors compare MacridVAE with state-of-the-art collaborative filtering approaches and demonstrate the superiority of their method. However, the paper makes many assumptions, such as the independence and sufficiency assumptions. It would be great if the authors could provide more insight into why such assumptions are important, since one item may fall into more than one category. Hierarchical recommendation algorithms are not new to the community, and it is not clear how the disentangled representations differ from them or what advantage they offer. The disentangled representation seems to be just a two-level representation. Another confusing point is the setting of the parameter K. Is it always set to the ground-truth value? What if the ground-truth value is unknown? I have read the authors' response, and I tend to increase my score.

Reviewer 2

Originality:
- The task is not new. The proposed approach is in line with some of the prior art, but the specific approach is new. Related work is fairly cited.

Quality:
- The technique appears to be technically sound.
- Claims are well supported by theoretical analysis and experimental results.
- This is a complete piece of work.
- Choosing the number of components K: Maybe I missed this, but I think the paper does not discuss how to choose K or what happens if the chosen K is far from the actual K. Figures 2 and 3 use K = 7, which appears to be the number of ground-truth categories for that dataset.
- Line 209 states that the dimension of user representations is not constrained since they are not parameters. I am not sure I understand or agree with this, especially when comparing different methods.

Clarity:
- The paper is well written and well organized.

Significance:
- The contributions of the paper are relevant and significant.
- The authors will also be releasing a new recommendation ratings dataset.

I have carefully considered the authors' response. The rebuttal looks fair, but my question was more of a request for clarification, so it does not change my scores.

Reviewer 3

- The macro disentanglement resembles a cluster assignment process, and the micro disentanglement (encouraging independence between dimensions) is an ordinary method for learning disentangled representations. However, the whole framework makes sense to me, and the use of the Gumbel-softmax trick and cosine similarity is also reasonable.
- It would be better to show visualizations of baselines (e.g. MultDAE) in Figure 2 for comparison, since learning such an item representation (separated by category, as in clustering) is not hard. The micro disentanglement (Figure 3) is interesting, but a quantitative measurement is missing.
- I would like to see more experimental analysis, such as an ablation study of the macro and micro disentanglement (e.g. setting K=1 to remove the macro disentanglement).
- Is there an explanation for the superior performance, especially on sparse data? Maybe the proposed macro-micro structure alleviates the data sparsity problem in some way? It may be nitpicking, but line 218 says "consistently outperforms baselines", which is not exactly true.
- My main concern is the lack of baselines: the paper compares only with two methods from a recent work [30], while many CF baselines such as BPR-MF, which often show competitive performance, are missing.
---
The rebuttal addressed most of my concerns, hence I decided to raise my score.
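
Note on Reviewer 3's first point: the cluster-assignment view of macro disentanglement can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes that each item vector is compared with K prototype vectors by cosine similarity, that the scaled similarities are used as logits, and that a relaxed one-hot assignment is drawn with the Gumbel-softmax trick. The names `assign_items`, `prototypes`, `sim_scale`, and `tau` are hypothetical and chosen only for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed one-hot sample from the categorical defined by logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(x - top) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def assign_items(item_vecs, prototypes, tau=0.1, sim_scale=10.0, seed=0):
    """Softly assign each item to one of K prototypes (hypothetical helper).

    Assumption: logits are cosine similarities multiplied by sim_scale.
    """
    rng = random.Random(seed)
    assignments = []
    for item in item_vecs:
        logits = [sim_scale * cosine(item, proto) for proto in prototypes]
        assignments.append(gumbel_softmax(logits, tau, rng))
    return assignments
```

Calling `assign_items(items, prototypes)` returns one probability vector of length K per item. With a single prototype (K=1) every item receives the assignment `[1.0]`, which corresponds to the ablation the reviewer suggests for removing macro disentanglement.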