NeurIPS 2019
Sun Dec 8th – Sat Dec 14th, 2019, Vancouver Convention Center
Paper ID: 4164
Title: Continual Unsupervised Representation Learning

Reviewer 1

I like the problem introduced in the paper and the approach taken. I have a few questions and suggestions for the authors:

1) "For dynamic expansion, we set the threshold for the negative log-likelihood (approximated by the ELBO) at cnew = 100": How did the authors decide on cnew? Was any tuning done to select it? What is the effect of a low cnew on the experimental results?
2) What is the time complexity of the algorithm? How long did it take to run all the experiments on a Tesla V100 GPU?
3) Did the authors try any preliminary experiments on somewhat more complex datasets, such as CIFAR-100?
4) Figure 3b: Why does the performance of "1" get worse over time while the performance of "0" does not?
5) Figure 3b: The performance of "5" never seems to rise beyond 50-60%. Is there a reason why?

The submission is technically sound and its claims are well supported by experimental results. The submission is clearly written and well organized. The authors address a novel problem setting with a combination of existing techniques. It is clear how this work differs from previous contributions, and related work is adequately cited. Researchers and practitioners are likely to use the ideas presented in this paper and build on them.
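To make question 1) concrete, here is a minimal sketch of the kind of thresholding rule the question refers to: a new mixture component is spawned when the negative ELBO (which upper-bounds the negative log-likelihood) of incoming samples exceeds cnew. The function name, the per-batch form, and the `min_poor` buffer criterion are illustrative assumptions, not the paper's actual code.

```python
def should_expand(neg_elbos, c_new=100.0, min_poor=8):
    """Decide whether to add a new mixture component.

    neg_elbos: per-sample negative ELBO values (approximate NLL) for a batch.
    c_new:     threshold on the approximate negative log-likelihood.
    min_poor:  how many poorly modeled samples trigger expansion
               (a hypothetical buffer-style criterion).
    Returns (expand_flag, list_of_poorly_modeled_values).
    """
    poor = [x for x in neg_elbos if x > c_new]
    return len(poor) >= min_poor, poor

# Two of four samples exceed the threshold, so expansion is triggered.
expand, poor = should_expand([30.0, 120.0, 95.0, 150.0], c_new=100.0, min_poor=2)
```

A low cnew would make `poor` fill up quickly and lead to over-frequent expansion, which is exactly the sensitivity the question asks about.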

Reviewer 2

============== After rebuttal ============== I thank the authors for their response; they have managed to clarify some of my concerns, and overall I vote for acceptance of the paper.

The authors introduce a method for continual unsupervised learning. They propose a generative categorical model in which the latent space is modeled as a mixture of Gaussians, with a Bernoulli decoder. An expansion technique is used to include new mixture components for poorly modeled examples, and the generative model, together with the previous model parameters, is used to prevent forgetting of old tasks. Their method is analysed on tasks constructed around MNIST and Omniglot, with an ablation study on the expansion and generative replay. An extension to a more standard supervised setting is also presented.

Novelty and quality: The exact setting proposed in the paper, as well as the proposed model, are novel to my knowledge.

Significance: The main contribution of the paper is empirical. The setting of unsupervised continual learning proposed by the authors is relevant and provides an interesting proof of concept for other tasks that can benefit from it, such as reinforcement learning. The experiments in 4.1 to 4.3 lack comparison to simple baselines, such as a hierarchical clustering technique. Since comparison to other methods is not possible, I believe such a baseline would strengthen the paper. Could the authors provide one? The fact that the method does well in supervised tasks is reassuring.

Clarity: The paper is overall clear and the method is well presented. I have the following detailed comments:

1) The lack of comparison to baselines, as previously mentioned.
2) Why is the accuracy on Omniglot so low in Table 1? It is difficult to draw any conclusions if the method has such a high error on the dataset.
3) Why is only one sample \tilde{z}_k used in eq. (3)?
4) I would appreciate more details on how eq. (4) is derived; it is currently motivated rather intuitively.
5) What value is chosen for N_new, and how is it selected? Moreover, is it sensible to initialise the parameters of a new cluster as in eq. (5) if the new class is very different from previous classes? Did you try random initialization?
6) What exactly is p_{\theta} in eq. (6)?
7) How are the training and test splits determined in the experiments, and how are the error bars computed?
8) How reproducible are Figure 4 and Figure 2b?
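The generative-replay mechanism summarised in this review can be sketched as follows: training batches mix real samples from the current task with samples drawn from a frozen snapshot of the previous model, so that old mixture components keep being rehearsed. All names here (`make_replay_batch`, `old_model_sample`, `replay_frac`) are hypothetical stand-ins; `old_model_sample` abstracts away drawing a latent from the mixture-of-Gaussians prior and decoding it with the Bernoulli decoder.

```python
import random

def make_replay_batch(real_batch, old_model_sample, replay_frac=0.5, seed=0):
    """Build a training batch mixing real and replayed samples.

    real_batch:       samples from the current task.
    old_model_sample: callable taking an RNG and returning one sample
                      generated by the frozen previous model (assumed API).
    replay_frac:      fraction of the batch replaced by generated samples.
    """
    rng = random.Random(seed)
    n_replay = int(len(real_batch) * replay_frac)
    replayed = [old_model_sample(rng) for _ in range(n_replay)]
    # Keep the remaining real samples and append the generated ones.
    return real_batch[: len(real_batch) - n_replay] + replayed

# Half of a 4-element batch comes from the (stubbed) old model.
batch = make_replay_batch([1, 2, 3, 4], lambda rng: "gen", replay_frac=0.5)
```

This also makes the reproducibility questions (7 and 8 above) concrete: results depend on the replay fraction and on the seeding of the generator.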

Reviewer 3

Originality: Unsupervised continual learning, as treated in this paper, is challenging and quite important in the context of continual learning and lifelong learning. The proposed algorithm has several desirable properties, including dynamic expansion and mixture generative replay, and can handle both unsupervised and supervised continual learning problems.

Quality: Although the authors do not provide a theoretical guarantee for the algorithm, the numerical experiments show its effectiveness. The authors seem honest in evaluating the proposed algorithm against other state-of-the-art algorithms, as far as I can tell from Table 3 and Table 4.

Clarity: The paper is well written throughout, and the explanation of the proposed model is intelligible. Moreover, the authors provide adequate information about related work.

Significance: The problem setting treated in this paper is quite significant in continual learning, and the proposed algorithm has several interesting properties. I believe this paper will inspire various ideas in readers.