Reviews: Dual Variational Generation for Low Shot Heterogeneous Face Recognition

UPDATE: ========
After reading the other reviewers' comments and the rebuttal, I decided to raise my score by one point, from 6 to 7. I am satisfied with the effort the authors made to address my two major concerns, and I recommend accepting this submission, in agreement with the other reviewers.

Overview/Contribution: ========
The paper proposes a dual variational autoencoder that generates synthetic training data to combat the limited data available for heterogeneous face recognition. The synthetic data is meant to preserve identity, through identity-preserving generation in both the image and embedding spaces, while providing sufficient variation in the training data for the downstream recognition task. The authors claim an 18% improvement in TPR at FPR = 10e-5.

Strengths: ========
- Most facial recognition tasks involve assumptions that constrain the task to a homogeneous set of inputs. Heterogeneous face recognition (HFR) is an important task for many practical applications and has recently been attracting attention. Learning heterogeneous face recognition from a limited dataset by generating synthetic data is interesting.
- The choice of paired unconditional generation instead of image-to-image translation seems to promote inter- and intra-identity diversity while preserving identity.
- The semi-supervised HFR formulation, with an added L2 loss component for the generated unlabeled pairs, could allow sufficient training of the network while implicitly learning identity when training data is limited, as in this HFR setting.
- Capturing visual quality and identity preservation using FID and MD in addition to the recognition metric, together with the ablation that omits loss components of the generation model in Table I, is interesting.
- Evaluating the method on multiple HFR datasets helps in drawing useful conclusions. These include near-infrared (NIR) and visual (VIS) pairs as well as sketch and VIS pairs.
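As a rough illustration of the L2 component for generated unlabeled pairs noted in the strengths, the sketch below computes a mean squared L2 distance between paired embeddings. It is not the authors' formulation: the function name `pairwise_l2_loss`, the use of NumPy arrays, and averaging over pairs are all assumptions made for illustration.

```python
import numpy as np

def pairwise_l2_loss(emb_a, emb_b):
    """Mean squared L2 distance between row-aligned embedding pairs.

    emb_a[i] and emb_b[i] are assumed to be the embeddings of the two
    modalities (e.g. NIR and VIS) of the same generated pair.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    if emb_a.shape != emb_b.shape:
        raise ValueError("paired embeddings must have the same shape")
    # Squared Euclidean distance per pair, then average over pairs.
    per_pair = np.sum((emb_a - emb_b) ** 2, axis=1)
    return float(np.mean(per_pair))
```

No identity labels enter this term, which is why it can be applied to generated pairs whose identity is unknown.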
Weaknesses: ===========
- Although there are multiple datasets, the variation in modality is limited to NIR-VIS and Sketch-VIS. Other heterogeneous datasets with wide variations in resolution, cameras and environmental conditions would have made the conclusions stronger. As the task is HFR, more modalities would have been useful.
- The formulations include 'trade-off' parameters in both the generation and recognition models, yet their effect on overall recognition is not fully explored. Ablation experiments exploring the effect of each of those terms would have helped to pinpoint where the significant improvement comes from.

Overall, the paper reads well and the task is relevant to the NeurIPS audience. However, more datasets and more ablations would have helped, especially since the major contribution of the paper is that paired unconditional generation improves HFR. I therefore suggest the authors add another dataset whose pairing differs from those explored here, such as pose or resolution pairing.
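For readers less familiar with the verification metric quoted in the overview (TPR at a fixed FPR), the following is a minimal sketch of how it is commonly computed from similarity scores. It is not the authors' evaluation code: the function name `tpr_at_fpr`, the "accept if score is strictly above the threshold" convention, and the handling of ties are assumptions.

```python
import numpy as np

def tpr_at_fpr(genuine, impostor, target_fpr):
    """TPR at the loosest threshold whose FPR does not exceed target_fpr.

    genuine:  similarity scores of same-identity pairs
    impostor: similarity scores of different-identity pairs
    A pair is accepted when its score is strictly above the threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending
    # Maximum number of impostor pairs that may be accepted.
    allowed = int(np.floor(target_fpr * impostor.size))
    if allowed >= impostor.size:
        threshold = -np.inf  # every pair may be accepted
    else:
        # At most `allowed` impostor scores lie strictly above this value.
        threshold = impostor[allowed]
    return float(np.mean(genuine > threshold))
```

At very low FPR targets the number of impostor pairs has to be large for the threshold to be meaningful, since `allowed` is zero whenever `target_fpr * impostor.size` is below one.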

This paper presents a new unconditional Dual Variational Generation (DVG) framework that generates large-scale paired heterogeneous images with the same identity from noise. DVG promotes inter-class diversity and allows the generated images to be used as augmented data for optimizing recognition models through a pairwise distance constraint, which aims to reduce the domain discrepancy. Extensive experiments on four HFR databases show that the proposed method significantly improves on SOTA results.

- Pros: The organization, writing and presentation are clear and easy to follow. The formulas are sufficient and correct. The idea is novel, the contributions are solid and the experimental results are impressive.
- Cons:
1) PIM [1] (Zhao et al., CVPR 2018) also achieves one-to-many face generation via noise term injection; please add a corresponding discussion. Some related works should also be mentioned, e.g., [2], [3].
[1] Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, Fang Zhao, Karlekar Jayashree, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng. Towards Pose Invariant Face Recognition in the Wild. CVPR, 2018.
[2] Luan Tran, Xi Yin, Xiaoming Liu. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition. CVPR, 2017.
[3] Jian Zhao, Lin Xiong, Yu Cheng, Yi Cheng, Jianshu Li, Li Zhou, Yan Xu, Karlekar Jayashree, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng. 3D-Aided Deep Pose-Invariant Face Recognition. IJCAI, 2018.
2) Eqn. (10) has 3 hyperparameters (\lambda_1 to \lambda_3), and Eqn. (13) has 1 hyperparameter (\alpha_1). It is unclear how to assign appropriate values to these hyperparameters. A sensitivity analysis is recommended to make this work more complete.
3) An evaluation of complexity (training and inference) is recommended, as this is important for real applications.
- Additional minor comments:
1) Releasing the generated paired heterogeneous face databases is highly recommended, to push the research frontier of heterogeneous face recognition.
2) The word "multivariate" in Line 129 Page 4 could be revised to "multi-variate" to be consistent with the same word in Line 122 Page 4. The titles of Sec. 4.3 and Sec. 4.4 could be reconsidered, since both sections report experimental results.
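One simple form the sensitivity analysis requested in Cons 2) could take is a one-at-a-time sweep: hold all weights at a baseline and rescale a single weight per run. The sketch below only enumerates such configurations. The baseline values and scale factors are placeholders, not values from the paper, and the helper name `one_at_a_time_configs` is hypothetical.

```python
def one_at_a_time_configs(baseline, scales=(0.1, 10.0)):
    """Return the baseline plus configs that rescale exactly one weight each."""
    configs = [dict(baseline)]
    for name, value in baseline.items():
        for scale in scales:
            varied = dict(baseline)
            varied[name] = value * scale  # change only this one weight
            configs.append(varied)
    return configs

# Placeholder baseline values, NOT taken from the paper.
baseline = {"lambda_1": 1.0, "lambda_2": 1.0, "lambda_3": 1.0, "alpha_1": 1.0}
configs = one_at_a_time_configs(baseline)
```

Each configuration would then be trained and evaluated with the same protocol, so that any change in the recognition metric can be attributed to a single hyperparameter.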

Paper ID:	1536
Title:	Dual Variational Generation for Low Shot Heterogeneous Face Recognition
