NeurIPS 2020

Adapting Neural Architectures Between Domains

Review 1

Summary and Contributions: This work aims to minimize the cross-domain generalization gap that generally exists in current neural architecture search (NAS) methods with proxy tasks. Instead of searching directly on the target dataset, which incurs a high computation cost, the authors propose to improve the generalizability of neural architectures by leveraging a small portion of target samples via a domain adaptation technique. They first theoretically analyze the generalization bounds of architectures in NAS methods with proxy tasks, and then design a novel approach based on the analysis to minimize the bounds by adapting architectures between domains. Overall, the theoretical analysis is solid, and the experiments show that the proposed approach can achieve a good trade-off between the search cost and the target domain accuracy.

Strengths: There are several strengths of this work:
- The motivation of this work is clear. The generalization issue between CIFAR-10 and ImageNet is a general problem in NAS. Existing methods aim to solve this problem by direct searching on ImageNet, which increases the search cost dramatically. This work provides a novel perspective to solve this problem.
- The theoretical analysis is very solid and reveals important facts of how the generalization gap of neural architectures can be bounded. This can be used as a guideline for the algorithm design.
- The proposed approach is based on and consistent with the theoretical results. Both the algorithm and theory are well supported by experiments.
- Experiments of this work are very solid, including different pairs of domains and detailed ablation study, which makes this work convincing.

Weaknesses: Besides the explicit constraint on domain distance, which is the main contribution of the proposed method, this work differs from others in one more respect: the self-supervised technique used to bridge the inconsistency of labels between domains. This is rarely discussed in the paper. The authors should discuss what role self-supervision plays in this work and to what extent it contributes to the final result.

Correctness: The claims, method and the empirical methodology are correct.

Clarity: This paper is well written.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: There are several questions for the authors:
- In the ablation study part, both the validation and test error rates are reported in Table 1. However, only the test error rates are reported in Tables 2 and 3. Could the authors explain why there is such an inconsistency?
- In FBNet and HM-NAS, the searches are on a subset of ImageNet with 100 classes, while this work constructs the subset of ImageNet in a different way. Please compare the two methods and explain why the latter is applied.

UPDATE after rebuttal: I have read the authors' rebuttal and the other reviews. The paper has an interesting idea to adapt neural architectures between domains, and the authors suggest that AdaptNAS could be a strong plus over existing NAS methods. It is also nice to see a more detailed analysis of the hyperparameters in the rebuttal. I would like to recommend acceptance.

Review 2

Summary and Contributions: This paper aims to improve the generalization of neural architectures via domain adaptation. It analyzes the generalization bounds of the derived architecture and finds that they are closely related to the validation error and the data distribution distance on both domains. It proposes AdaptNAS, a novel and principled approach to adapt neural architectures between domains in NAS.

Strengths: In this paper, the generalization issue in ProxyNAS is studied, and two versions of the generalization bound are proposed. Motivated by the generalization bound, this paper designs the AdaptNAS method to find architectures with better generalizability. This paper provides a new perspective in NAS: instead of searching directly on ImageNet or its subset, optimizing the generalizability of architectures by adding a domain distance constraint during the search can reach better performance with lower computation cost. Extensive experiments on CIFAR-10 and ImageNet demonstrate that AdaptNAS is a more affordable search method with more controllable generalizability compared to the current state-of-the-art proxy or proxyless NAS methods.

Weaknesses: In Tab 4, the proposed method shows no obvious improvement over P-DARTS. Figure 1: Table (d) is not reproducible in my implementation. My implementation of AdaptNAS cannot outperform 'search on target', and there is even an obvious performance gap. Please also include experiments with SVHN (source) and MNIST (target) as in this paper:

After reading the comments from the other reviewers and the rebuttal, I guess there may be some problems in my implementation, and this paper is a very nice submission.

Correctness: The proof in the supplementary material looks correct when I follow the authors' logic.

Clarity: The writing of this paper is good except for several typos.

Relation to Prior Work: NAS applied in domain adaptation is new.

Reproducibility: No

Additional Feedback: Please double-check the experiments in Figure 1: Table (d).

Review 3

Summary and Contributions: The authors investigate the generalization gap of neural architectures between two different domains. Based on the analysis of generalization ability, the authors propose an AdaptNAS method by applying domain adaptation techniques to neural architecture search (NAS). Specifically, the proposed method incorporates a domain distance constraint and cross-domain self-supervised learning technique into the training of NAS models. Extensive experiments on CIFAR-10 and ImageNet demonstrate the superiority of AdaptNAS over existing methods.

Strengths:
1. It is worth mentioning that the authors theoretically analyze the generalization ability of architectures searched by NAS between two different domains.
2. The authors propose a theory-inspired AdaptNAS method by incorporating the domain adaptation and the self-supervised learning techniques into NAS.
3. Experimental results demonstrate the effectiveness of the proposed method on two benchmark datasets.

Weaknesses:
1. Some notations are very confusing. The authors use alpha to denote both the searched architecture and a scalar hyperparameter in Eq. (17).
2. The authors claim that the proposed method is able to reduce the generalization gap of architectures between two different domains. How much gap can be reduced in practice? It would be stronger to provide more discussions and results to illustrate this.
3. The self-supervised learning task seems to be very important for the training of the proposed method. What would happen if the authors use a different self-supervised learning task (e.g., solving jigsaw puzzles [1])? More discussions should be provided.
4. What would happen if the authors use a larger subset of ImageNet as the target samples during the search? Can the proposed method use the entire ImageNet dataset?
5. More implementation details should be provided. How many epochs do the authors train the model on CIFAR-10 and ImageNet? Do the authors randomly sample images from each category to construct the subset of ImageNet?

Refs
[1] Noroozi, Mehdi, and Paolo Favaro. "Unsupervised learning of visual representations by solving jigsaw puzzles." European Conference on Computer Vision. Springer, Cham, 2016.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: NA

Review 4

Summary and Contributions: This paper addresses the problem of NAS. The main idea is to bridge the domain discrepancy between the proxy task dataset and the target dataset by domain adaptation. A theoretical generalization bound is analyzed and a corresponding algorithm is given, namely AdaptNAS.

Strengths: The motivation is sound. The idea is novel and the proofs are detailed. I like the idea of using a transformed task as a latent space for the domain discriminator.

Weaknesses:
1. In AdaptNAS, a domain discriminator is used to approximate the domain discrepancy, which might introduce a certain amount of computation overhead.
2. The results only show marginal improvement compared to the previous state of the art, especially P-DARTS[6] and MdeNAS[23]. In particular, [23] performs consistently better than the proposed method while only searching on CIFAR-10.
3. More importantly, there seems to be no ablation study comparing training with and without L_d (domain adaptation loss). This makes it difficult to identify whether the performance comes from using training data from both domains (L_S, L_T) or from the domain adaptation loss (L_d), which is the main contribution.
4. Also, in Tab 1, when alpha=0, there is indeed no L_d. However, it performs even better than most of the other settings where L_d is present (alpha > 0). This indicates that the proposed L_d is less effective than directly utilizing data from the target domain.
5. What causes the inconsistency between Rot-4 and Rot-1? For AdaptNAS-S, the Rot-4 version performs worse, while for AdaptNAS-C, the Rot-4 version performs better with even smaller network capacity.

After rebuttal: The authors convinced me with a detailed explanation of my concerns. I encourage the authors to add these details to the final version of the paper.

Correctness: As listed above, a key ablation study is missing.

Clarity: The paper is well written and easy to follow. A typo in L267: “deceasing”

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: I give my score mainly based on points 2, 3, and 4 listed in the weaknesses. I like the general idea of this paper, but whether the experiments can consistently validate its effectiveness is a more important criterion for publication at NeurIPS.