NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
In this paper, the authors propose a multi-task learning approach for few-shot learning using deep neural networks. The proposed method can automatically adapt to new tasks at test time after initial multi-task training. To make this possible and achieve better performance than existing methods, the authors use two networks to produce task-specific parameters. One generates the task-specific parameters of the final classification layer; the other generates task-specific parameters that adapt the common feature extractor (a network shared among tasks) to each task. The superiority of this method is demonstrated not only by better-than-state-of-the-art performance on few-shot learning problems, but also by competitive performance on continual learning and active learning tasks. Even though there are several existing works on few-shot learning, as the empirical results in the paper show, this work significantly advances the state of the art.

The paper is well organized and easy to follow. I found the illustrations (Figures 1-3) very helpful for understanding the architecture of the proposed method. There is one typo that I noticed on line 246: D_{\tau}; should it be D^{\tau} to keep it consistent with the notation in the rest of the paper?
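To make the two adaptation networks described above concrete, here is a minimal PyTorch-style sketch of that structure. It is an illustration based on the review's description, not the authors' code; the class names, layer sizes, and the choice of a plain MLP are assumptions.

    import torch.nn as nn

    class ClassifierAdapter(nn.Module):
        # Maps the mean support-set embedding of each class to that class's
        # row of weights in the final linear classification layer.
        def __init__(self, d=512):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

        def forward(self, class_prototypes):            # (num_classes, d)
            return self.net(class_prototypes)           # per-class classifier weights

    class FiLMAdapter(nn.Module):
        # Maps a pooled task embedding to per-channel scale/shift (FiLM)
        # parameters that adapt one block of the shared feature extractor.
        def __init__(self, d=512, channels=256):
            super().__init__()
            self.to_gamma = nn.Linear(d, channels)
            self.to_beta = nn.Linear(d, channels)

        def forward(self, task_embedding):              # (d,)
            gamma = 1.0 + self.to_gamma(task_embedding) # scale around identity
            beta = self.to_beta(task_embedding)
            return gamma, beta                          # applied as gamma * h + beta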
Reviewer 2
Quality: The technical content of the paper is well motivated and the approach taken is interesting. However, a few things are worth mentioning.

1 - The classification parameters for a given class are generated independently of the other classes. This means that the classifier is more likely to act as a prototypical model than a discriminative one.
2 - In the adaptation network, the auto-regressive component is not technically motivated. The fact that it improves results only shows that the FiLM network alone lacks the capacity to modulate the feature extractor parameters. Did you compare different ways of modulating the feature extractor parameters? (A sketch of FiLM modulation is given after this review.)
3 - z_G is computed using only the inputs from the query set; what about the labels?
4 - The statement "Allowing θ to adapt during the second phase violates the principle of 'train as you test', i.e., when test tasks are encountered, θ will be fixed, so it is important to simulate this scenario during training" is technically false, as within each meta-learning step θ is fixed even when it is not pretrained. Thus, the justification for the training procedure is a bit weak despite the comparison between the proposed approach and the classical one. Perhaps sensitivity to the hyper-parameters is the main reason for those differences.
5 - Related to the previous point, pretraining θ requires a large dataset, which is not always as available in other domains as it is in computer vision; this does not play in favor of the proposed training procedure. Thus, it is critical to find an alternative that trains all parameters together using the meta-dataset instead of the proposed two-phase approach.
6 - Despite the great results shown for the few-shot learning settings, the results section is a bit unfocused, as the application to active learning and continual learning seems unnatural and forced.

Clarity: The paper is generally well written and structured, and easy to understand.

Originality: The main originality of this work is the proposed auto-regressive modulation network.

Significance: This work shows significant improvements over the state of the art in few-shot classification, which is an important contribution. While weakly motivated, it also proposes a new neural network architecture that improves upon the modulation achieved by FiLM, helping to achieve better results.
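To make the notion of "modulating the feature extractor parameters" in point 2 concrete, the following is a minimal FiLM-style sketch in PyTorch. The block structure and channel count are illustrative assumptions, not taken from the paper.

    import torch.nn as nn
    import torch.nn.functional as F

    class FiLMedConvBlock(nn.Module):
        # A conv block whose normalized activations are modulated by
        # task-conditioned FiLM parameters (gamma, beta) rather than a
        # fixed, task-independent affine transform.
        def __init__(self, channels=64):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn = nn.BatchNorm2d(channels, affine=False)  # FiLM supplies the affine part

        def forward(self, x, gamma, beta):
            # gamma, beta: (channels,) vectors produced by the adaptation network
            h = self.bn(self.conv(x))
            h = gamma.view(1, -1, 1, 1) * h + beta.view(1, -1, 1, 1)
            return F.relu(h)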
Reviewer 3
The authors present a method called Conditional Neural Adaptive Processes (CNAPs) that can efficiently solve new multi-class classification problems after an initial pre-training phase. The proposed approach, based on Conditional Neural Processes [1], adapts a small number of task-specific parameters for each new task encountered at test time. These parameters are conditioned on a set of training examples for the new task, require no additional tuning, and adapt both the final classification layer and the feature extraction process, allowing the model to handle different input distributions. While very close to CNPs, this work focuses on image classification and makes several additions to the original method. These additions (FiLM layers, an auto-regressive feature adapter, the use of deep sets) are clearly justified, and their individual contributions are explored in the experiments.

The major negative point of this paper is its similarity with CNPs. The authors compare the two approaches in section 2 (lines 67-70), but this argument is not convincing at all; the adapted parameters can also be seen as a simple vector. I think the article would gain from putting a bigger emphasis on the auto-regressive way of dynamically adapting the parameters, which is an interesting and novel contribution.

The article is very well written. While the approach is complex, the authors did a good job of progressively presenting the different components used, with clear explanations and corresponding references to justify each choice they made.

[1] Garnelo et al., Conditional Neural Processes, 2018.
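For reference, a minimal sketch of the deep-sets style encoding mentioned above: each context example is embedded and the embeddings are pooled into a permutation-invariant task representation that conditions the adaptation networks. This is an illustrative PyTorch assumption, not the authors' implementation.

    import torch.nn as nn

    class SetEncoder(nn.Module):
        # Deep-sets style encoder: embed each context example, then mean-pool
        # to obtain a permutation-invariant global task representation.
        def __init__(self, in_dim=512, out_dim=512):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
            self.rho = nn.Sequential(nn.Linear(out_dim, out_dim), nn.ReLU())

        def forward(self, context_features):                 # (num_context, in_dim)
            pooled = self.phi(context_features).mean(dim=0)  # order-invariant pooling
            return self.rho(pooled)                          # global task embedding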