NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Establishing ownership of a DNN is an important problem, especially given the resources and IP involved in training accurate models. The paper identifies gaps in previous watermarking-based efforts and proposes a new method that is robust to attacks. Overall I liked the paper. It explains the key observation driving the design, and then discusses a few ways of embedding passports during training that create a dependence between the model's performance and the passport input. I do have the following questions/concerns.

1) In Definition 1, is D_t a pre-defined dataset, i.e., known at training time? If it is not known at training time, it is unclear how M_t is computed. And if it is, is it also known to the attacker?

2) Is epsilon_f also known during training? If it is not, how does training guarantee Proposition 2, II? It appears that epsilon_f is set after training based on model performance (line 273), which seems very ad hoc.

3) The experiment evaluating robustness to persistent reverse-engineering attacks (where the adversary is assumed to have access to the training dataset) is not entirely satisfactory. The paper only explores one way of reverse-engineering passports, i.e., freezing the trained weights. There may be other attack strategies, e.g., freezing the weights while maximizing the distance from the original passport and minimizing the accuracy loss (a sketch of such an attack follows below). More broadly, is there an alternative way of showing robustness without constraining what the adversary can do?
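A minimal PyTorch sketch of the alternative reverse-engineering attack suggested in point 3: with the trained weights frozen, a fake passport is optimized to keep task accuracy high while being pushed away from the original passport. The `model(x, passport=...)` interface, the cosine-similarity distance term, and all hyperparameters are illustrative assumptions rather than the paper's actual API.

```python
import torch
import torch.nn.functional as F
from itertools import cycle

def reverse_engineer_passport(model, P_orig, loader, steps=1000, lam=0.1):
    # Freeze all trained weights; only the fake passport is optimized.
    for p in model.parameters():
        p.requires_grad_(False)
    P_fake = torch.randn_like(P_orig, requires_grad=True)
    opt = torch.optim.Adam([P_fake], lr=1e-3)
    batches = cycle(loader)
    for _ in range(steps):
        x, y = next(batches)
        logits = model(x, passport=P_fake)      # hypothetical interface
        task_loss = F.cross_entropy(logits, y)  # keep the accuracy loss small
        # Minimizing cosine similarity pushes P_fake away from P_orig.
        sim = F.cosine_similarity(P_fake.flatten(), P_orig.flatten(), dim=0)
        loss = task_loss + lam * sim
        opt.zero_grad()
        loss.backward()
        opt.step()
    return P_fake
```

Any robustness claim would then have to hold against this objective (and others), not only against the freeze-weights attack evaluated in the paper.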
Reviewer 2
- Originality: The method is new and differs from previous contributions. The related work is adequately cited.
- Quality: For most of the submission, it is technically sound; the claims are supported by theoretical analysis and experimental results; it is a complete piece of work; and the authors are careful about evaluating the strengths and weaknesses of the work. The part on resilience against ambiguity attacks is a little weak to me.
- Clarity: The paper is clearly written and well organized.
- Significance: The paper addresses an interesting task and presents a new idea.

Question about resilience to ambiguity attacks: Unlike existing watermarking methods, which do not need any training/test data for verification, this method uses test performance as a metric to verify ownership, i.e., it requires a dataset at verification time (see the sketch below). To be fair, if anyone who wants to forge a watermark is informed about the verification method (on the passport layer) and the verification data, they could easily learn a qualified passport using the same objective with the weights fixed. If they are not informed of how the verification works, it is surely hard to construct an ambiguity attack, but that also holds true for any other watermarking method. Together with the increased complexity of both the training and verification processes, it is questionable whether people would use this idea in practice.
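To make the concern concrete, here is a hedged sketch of the performance-based verification the review describes: ownership is accepted only if the model, evaluated with the presented passport, stays within some tolerance of the original test metric. The `model(x, passport=...)` interface and the threshold `eps_f` are assumptions for illustration, echoing the epsilon_f discussion under Reviewer 1.

```python
import torch

@torch.no_grad()
def verify_ownership(model, passport, test_loader, M_orig, eps_f=0.05):
    # Accept the ownership claim only if inference with the presented
    # passport retains near-original accuracy on the verification set.
    correct, total = 0, 0
    for x, y in test_loader:
        preds = model(x, passport=passport).argmax(dim=1)  # hypothetical interface
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total >= M_orig - eps_f
```

An adversary who knows this check and holds the verification data could run an optimization like the one sketched under Reviewer 1 to find a different passport that also passes, which is exactly the forgery scenario the review raises.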
Reviewer 3
The paper:
- shows an important weakness of the current watermarking methods, namely the fact that they are prone to ambiguity attacks,
- offers an analysis of the issue, investigating the requirements that have to be fulfilled by any method that should withstand such attacks,
- proposes such a method based on "passport layers" which are appended after convolutions (see the sketch after this review).

Overall the paper is well structured and the method is explained in enough detail to probably allow reimplementation. The text is clear enough, with the exception of the experiments section, which would require some additional attention from the authors. Details follow below.

Concerning the method, I would be interested in seeing how much the performance (accuracy) suffers because of including the passports (no passports vs. the V1 setting) and because of the multi-task setting (V2/V3 vs. V1). In general, a comparison of the three proposed settings V1, V2, and V3 is missing from the experiments/discussion.

Specific comments on the experiments follow:
- It is not clear whether the experiments use V1, V2, or V3.
- It is not entirely clear what Table 2 shows. I guess the numbers in parentheses are the accuracies either on the source task or after fine-tuning on the target task, and the numbers in front of the parentheses are the fraction of cases in which the signature withstood the fine-tuning. Either the table headers or the legend should be improved. (Also, please make the left and right tables symmetric in how the numbers are shown, i.e., with or without the "%" sign.)
- In the Fig. 4 legend, please specify the performance metric (accuracy?) instead of writing "DNN performances".
- Consider reformulating sentences in the "Experiment results" section to make the experiments easier to understand, especially the paragraph on fine-tuning (L245-53) and sentences like "In this experiment..." (L255). Sometimes one has to search for the meaning, as in the sentence "This type of weight pruning..." (L256), where it is not clear which special kind of weight pruning (if any) is referred to. In subsection 4.2, it is not entirely clear what the "fake2" attack consists of; please clarify.
- In Fig. 5, it would be helpful to specify how "valid" and "orig" differ.
- The figures use fonts that are too small, which makes them hard to read (especially Figs. 3 & 5). Please adjust the figures.
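For reference, a minimal sketch of a passport layer as the review describes it: a layer appended after a convolution whose scale and bias are derived from passport tensors, so inference quality degrades when the wrong passport is presented. Deriving the per-channel scale and bias by convolving the passports with the layer's own weights and averaging is our reading of the paper's design; treat the details as assumptions.

```python
import torch
import torch.nn as nn

class PassportConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, bias=False)

    def forward(self, x, p_gamma, p_beta):
        # p_gamma / p_beta: passport tensors of shape (1, in_ch, H, W).
        out = self.conv(x)
        # Scale and bias depend on the passports through the conv weights,
        # so a wrong passport yields a mis-scaled, degraded output.
        gamma = self.conv(p_gamma).mean(dim=(0, 2, 3))  # one scalar per out channel
        beta = self.conv(p_beta).mean(dim=(0, 2, 3))
        return gamma.view(1, -1, 1, 1) * out + beta.view(1, -1, 1, 1)
```

A comparison such as the one the review asks for (no passports vs. V1, and V2/V3 vs. V1) would quantify how much this extra dependence costs in accuracy.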