Review for NeurIPS paper: Passport-aware Normalization for Deep Model Protection

NeurIPS 2020

Passport-aware Normalization for Deep Model Protection

Review 1

Summary and Contributions: The paper encode the passport into the affine transformation parameters in Batch Normalization Layer for deep model IP protection. The method is generally applicable to most existing normalization layers and only needs to add another passport-aware branch for IP protection.

Strengths: (1) The idea of independent branch for passport-free for deployment and passport-aware for verification is novel, only the passport-free branch will be delivered to end-users to prevent malicious attack and there is no structure change in the target model for end-users. (2) Compare to [17], Since the method use a separate branch for passport-awareness, it does not need to modify all the BN layers in the target model as [17] did

Weaknesses: (1) The training cost is a least 2x time since two identical sized branch need to be trained.

Correctness: The claims and methods are correct.

Clarity: The paper is well-written and easy to understand

Relation to Prior Work: Yes, they compare the Relationship to [17], another BN based passport method.

Reproducibility: No

Additional Feedback:

Review 2

Summary and Contributions: This work deals with establishing ownership of a DNN, which is an important problem given resources and IP involved in training accurate models. The authors propose a passport-aware scheme by batch normalization which is implemented by an independent branch and learnable shift parameters. Such a design can improve the performance of the previous method. Extensive experiments demonstrate the effectiveness of the proposed method.

Strengths: The paper is well-organized and easy to follow. The method is new and differs from previous contributions. Extensive experiments are conducted to verify the effectiveness of the proposed method.

Weaknesses: The design of passport-aware branch seems to be similar to the Squeeze-and Excitation block [1] that is a popular module in network architecture area. Please provide any intuitions why the authors choose the transformation like in eq3. When comparing eq3 with eq4, the improvement of passport-aware normalization over eq4 seems a bit incremental. Moreover, the proposed scheme also introduces extra complexity in model parameters and network training. [1] Jie Hu, et al. Squeeze-and Excitation Networks, CVPR2018.

Correctness: The technical details of the article are basically correct. The motivation is valid.

Clarity: The writing quality is good. The authors clearly present the contributions of the paper.

Relation to Prior Work: To my best of knowledge, the related work section sec.3 is complete.

Reproducibility: Yes

Additional Feedback: There seen to be many normalization methods focusing on transformation on gamma and beta. It would be great if comparisons are provided.

Review 3

Summary and Contributions: This paper proposed a passport-aware normalization design by adding a new normalization branch beside the original norm. layer for IP protection. This work is an extension of [17] “Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks”, by addressing the network structure change and model performance drop issues.

Strengths: Contributions clearly stated and validated.

Weaknesses: Some concerns: 1. One big concern is, the proposed method is combined with existing trigger-set-based IP protection methods to support the black-box verification, however, if the existing trigger-set-based method is strong enough to verify the suspect model do we need an extra step (i.e., the proposed method) to confirm? Or is this two-step verification necessary? I think the authors should provide some discussions or data to claim it. 2. The training process of the passport-free branch and the passport-aware branch is in an alternative way. What's the training cost? Why the authors use this training strategy, rather than train them simultaneously? More details are appreciated. 3. The details of the trigger-set-based method are missing. What existing method did the author use? How many special sets of data are used to identify a suspect model? Will this impact the DNN performance? Could the authors provide some detailed evaluation results of this part?

Correctness: The proposed method is clearly claimed.

Clarity: This paper is well written and organized.

Relation to Prior Work: The auhtors clearly discussed the relationship between the proposed method and [17], i.e. [17] is one special case of the proposed method.

Reproducibility: No

Additional Feedback:

Review 4

Summary and Contributions: This paper considers the problem of intellectual property protection for a learned deep model. It extends previous work [17] which introduced the notion of a passport layer, whereby the performance of the network would be significantly deteriorated unless the genuine passport was supplied. However, the introduction of the passport layer made it not possible to use batch-normalization, and additionally degraded overall performance of the network somewhat. This paper modifies the passport layer formulation, allowing for the use of batch-normalization, and allowing for learnable parameters in the passport layer.

Strengths: As mentioned in the Broader Impact statement, IP protection for deep models seems like an important but currently under-researched area given the state of deep learning and usage in commercial products, so this paper could have wide relevance. It addresses a specific issue with the existing method of [17] and leads to a small but consistent increase in performance.

Weaknesses: The proposed method is somewhat incremental over the method of [17]. In order to produce a network for deployment (that does not require the passport), [17] uses multi-task learning to optimize performance of the network when the passport layers are used as well as when they are skipped. This paper essentially does the same thing, except the version where the passport layers are skipped is now a batch or group normalization layer, with normalization statistics decoupled from the passport layer, and the passport layer can also contain learnable parameters. However, the ablation analysis presented in the paper suggests that the learnable affine transformation parameters give only a very small improvement.

Correctness: From my understanding, yes.

Clarity: Overall yes, with some typos. However, the section describing the training on page 4, line 147-150, could be improved. Specifically, there is a difference between "alternative" - meaning "different from the usual standard", and "alternating" - meaning "switching between two modes/paths". I assume the authors meant "alternating". Beyond this change, it might be helpful to be explicit - the training consists of one pass/update with one branch, and then one pass/update with the other?

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback: update after reading author feedback: after reading the author response, although there are some empirical improvements, I still have the above concerns with novelty, and so maintain my original rating.