Review for NeurIPS paper: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

NeurIPS 2020

Review 1

Summary and Contributions: This paper proposes DARP, a simple technique for refining biased pseudo-labels in imbalanced semi-supervised learning (SSL). DARP is applicable to many existing SSL methods. **The authors addressed my questions. The experiments do show promising results, but I think the theoretical grounding is a little weak.**

Strengths: 1. The proposed method DARP is simple and effective at improving the performance of existing SSL algorithms on the imbalanced SSL problem. 2. The experiments are well-designed.

Weaknesses: 1. Algorithm 1 is proposed to solve Eq. (1), but the derivation is not very clear and the notation is somewhat confusing (e.g., X denotes the input; does it mean the same thing in Algorithm 1?). 2. Sec. 3.2 states that "DARP increases at most 20% of the overall training time of an existing SSL scheme". I don't quite understand this claim. 3. The experiments show that DARP can improve the performance of SSL methods. However, I observe that the results of SSL methods not specifically designed for imbalanced SSL (VAT, Mean-Teacher...) are comparable with those of imbalanced SSL methods. This seems a little strange, and I wonder what the reason is.

Correctness: The technique appears to be correct.

Clarity: The paper is readable.

Relation to Prior Work: The paper deals with a valuable problem (imbalanced SSL), reviews previous work, and compares against it. However, there is still room for improvement in the motivation, because the differences from and connections to previous work are not very clear.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: For semi-supervised learning (SSL), Distribution Aligning Refinery of Pseudo-label (DARP) is proposed to match the pseudo-labels with the underlying class distribution of the unlabeled data. The objective is to minimize the KL divergence between the "aligned" pseudo-labels and the original pseudo-labels, subject to the constraint that the "aligned" pseudo-labels are consistent with the desired class/label distribution for the unlabeled data. To speed up the process, DARP uses a coordinate ascent algorithm on the Lagrangian dual of the objective function. The evaluation was conducted on the CIFAR10 dataset with various artificial degrees of imbalance. DARP was used with a few existing algorithms for imbalanced SSL. In one scenario, the unlabeled data has the same class distribution as the labeled data. In a second scenario, the two class distributions are different. In the second scenario, the authors estimated the class distribution of the unlabeled data using essentially a validation set from the training data. DARP was also compared with two existing distribution matching methods. The results are generally favorable for DARP.
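To make the objective summarized above concrete, the following is a minimal sketch, not the authors' Algorithm 1. It assumes the problem is to find refined pseudo-labels X minimizing KL(X || original pseudo-labels) such that every row of X sums to 1 and the column sums match the desired class counts. Instead of the paper's coordinate ascent on the Lagrangian dual, it swaps in plain iterative proportional fitting, a standard method for this kind of KL projection. The function name `align_pseudo_labels` and its parameters are hypothetical.

```python
import numpy as np

def align_pseudo_labels(pseudo, target_dist, n_iters=500, eps=1e-12):
    """Rescale soft pseudo-labels so their class marginals match target_dist.

    Illustrative stand-in (iterative proportional fitting), not DARP itself.
    pseudo:      (M, K) array, each row a soft pseudo-label.
    target_dist: length-K class distribution, assumed to sum to 1.
    """
    pseudo = np.asarray(pseudo, dtype=float)
    m, _ = pseudo.shape
    col_target = m * np.asarray(target_dist, dtype=float)  # desired class counts
    x = pseudo + eps  # keep entries strictly positive so scaling is well defined
    for _ in range(n_iters):
        x *= col_target / x.sum(axis=0)    # scale columns to the class counts
        x /= x.sum(axis=1, keepdims=True)  # renormalize each row to sum to 1
    return x

# Usage: 50 random soft labels over 3 classes, aligned to a skewed prior.
rng = np.random.default_rng(0)
original = rng.dirichlet(np.ones(3), size=50)
refined = align_pseudo_labels(original, [0.7, 0.2, 0.1])
```

Because the loop ends on the row normalization, rows sum to 1 exactly and the class marginals are matched only approximately, with the gap shrinking as `n_iters` grows.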

Strengths: The main strength of the DARP algorithm is to align pseudo-labels with the desirable class distribution in the unlabeled data for semi-supervised learning with imbalanced labeled data. Empirical results are favorable compared against existing techniques.

Weaknesses: The main weakness is the justification for the proposed method for estimating the desired/actual class distribution of the unlabeled data, which is not known. Another weakness is that only the CIFAR10 dataset was used for evaluation. [After the author feedback: in the supplementary materials, Section F has the justification and Sections C and D have results from additional datasets, but summarizing some insights of the justification and key results from other datasets in the main text would be beneficial.]

Correctness: The methods and empirical methodology are reasonable.

Clarity: The paper is generally well written. For the desired/actual class distribution of the unlabeled data (which is not known), the reasoning behind the proposed estimation technique could be expanded in the main paper. This is quite important for the DARP algorithm. Also, expanding on why setting small probabilities to zero enhances the quality of pseudo-labels would be beneficial.

Relation to Prior Work: Discussion with prior work is reasonable.

Reproducibility: Yes

Additional Feedback: Line 32 and Figure 1b: the ratio seems to be less than 4 (100/25) in Figure 1b rather than 1046 as stated on line 32. Line 119: is the true distribution assumed from the labeled training data? Eq. 1: what is \hat{y}_m? It looks like the refined pseudo-labels. Line 134: why does removing small entries enhance the quality of pseudo-labels? Table 1: the test error gains are negative; it might be simpler to report test error reduction as a positive number. Line 203: the test error gains are positive? (This is not consistent with Table 1.) Line 212: what is the justification for "we approximate C^{unlabeled} using some small subset of the labeled dataset, which is not used for the training until the confusion matrix is estimated"?
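For context on the line 212 question, one common way a confusion matrix from held-out labeled data can be turned into a class-distribution estimate is sketched below. This is a generic illustration under stated assumptions, not the authors' procedure: it assumes the confusion matrix C[i, j] = P(predicted j | true i) measured on held-out labeled data also holds on the unlabeled data, so that the predicted-class frequencies mu on the unlabeled set satisfy mu = C^T q for the unknown class distribution q. The function names are hypothetical.

```python
import numpy as np

def estimate_confusion(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix: C[i, j] = P(pred = j | true = i)."""
    counts = np.zeros((num_classes, num_classes))
    np.add.at(counts, (y_true, y_pred), 1)          # accumulate (true, pred) pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)         # avoid division by zero

def estimate_class_distribution(confusion, unlabeled_pred, num_classes):
    """Solve C^T q = mu in the least-squares sense, then project to a distribution."""
    mu = np.bincount(unlabeled_pred, minlength=num_classes) / len(unlabeled_pred)
    q, *_ = np.linalg.lstsq(confusion.T, mu, rcond=None)
    q = np.clip(q, 0, None)                         # negative estimates are clipped
    return q / q.sum()

# Usage: a 2-class toy problem with 75% per-class accuracy on held-out data.
held_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
held_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])
C = estimate_confusion(held_true, held_pred, 2)
unlabeled_pred = np.array([0] * 65 + [1] * 35)      # hard predictions on unlabeled data
q_hat = estimate_class_distribution(C, unlabeled_pred, 2)
```

In this toy case the raw predictions suggest a 65/35 split, while correcting for the confusion matrix gives a more skewed estimate. How far such an estimate sits from the true distribution is exactly the kind of analysis the question above asks for.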

Review 3

Summary and Contributions: This paper focuses on semi-supervised learning under class imbalance. The authors propose a class distribution matching method to refine the pseudo-labels for unlabeled data. Specifically, they try to refine the biased pseudo-label distribution to match the true distribution of the unlabeled data. Experimental results are reported on CIFAR-10 datasets. However, the proposed method assumes that the true distribution of the unlabeled data is known, which is not feasible in real tasks. Moreover, the distribution estimation method needs more discussion, and experimental results on other datasets need to be reported.

Strengths: 1) The paper focuses on semi-supervised learning under class imbalance. This is an important problem and it is not well studied. 2) They propose a distribution matching method to refine the biased pseudo-label distribution. However, the method needs to know the true distribution of unlabeled data which is not feasible in real applications. So the contribution is quite limited. 3) The paper is relevant to the NeurIPS community.

Weaknesses: 1) The authors claim that the reason SSL cannot work well is that it adopts biased pseudo-labels. However, most deep SSL methods are based on the smoothness assumption and encourage the original data and the augmented data to have similar predictions. Actually, they don't need to assign a pseudo-label to an unlabeled example explicitly. So I think the claim does not make sense. 2) The proposed method in Section 3 is based on the true class distribution of the unlabeled data. This is not feasible in real applications. Although the authors give an estimation method in the experiment section, the practicability of the proposal is still a problem. More analysis of the estimation method is needed, for example, the distance between the estimated distribution and the true distribution. 3) All experiments are conducted on CIFAR-10 datasets; experiments on more datasets should be reported to demonstrate the effectiveness of the proposal. **The authors have carefully addressed issues on 2) and 3) in the rebuttal**

Correctness: The proposed method is based on the true distribution of the unlabeled data, which is not available in real applications.

Clarity: No. More analysis of the distribution estimation method is needed.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: Semi-supervised learning models trained on label-imbalanced datasets tend to output even more biased predictions and therefore perform badly under a balanced testing criterion. To overcome the problem, this work proposes an approach that refines pseudo-labels to match the prior label distribution.

Strengths: - As far as I know, this is the first work to explore the label-imbalance problem in training deep SSL models. - The proposed approach is general and can be adopted in any self-training SSL method. - The authors empirically show that the proposed method consistently improves deep self-training methods such as MixMatch, ReMixMatch, and FixMatch.

Weaknesses: - The work fails to compare with simple baseline methods that use both sample reweighting and unlabeled data. For example, one could reweight both labeled and unlabeled samples by the inverse of the (pseudo-)label frequency. - It would be better to provide theoretical and empirical analysis of the computational cost of Alg. 1. - It would be better to perform experiments on real-world imbalanced datasets to validate the proposed method.
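The reweighting baseline suggested in the first point could look roughly like the sketch below. It is only one plausible reading of "inverse of (pseudo) label frequency": each sample gets a loss weight proportional to one over the count of its (pseudo-)label, normalized so the average weight is 1. The function name and the normalization are assumptions made for illustration.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Per-sample weights proportional to 1 / count of the sample's class.

    labels may be true labels (labeled set) or hard pseudo-labels (unlabeled set).
    Weights are normalized so that their mean over the samples is 1.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    num_seen = np.count_nonzero(counts)              # classes that actually occur
    # Every label in `labels` has count >= 1, so the division below is safe.
    return len(labels) / (num_seen * counts[labels])

# Usage: class 0 is three times as frequent as class 1.
weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
# The three class-0 samples get weight 2/3 each; the class-1 sample gets weight 2.
```

Multiplying each sample's loss term by these weights makes every observed class contribute equally to the total loss, which is the effect the suggested baseline is after.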

Correctness: To the best of my knowledge, the method is technically sound.

Clarity: The paper is well organized and clearly written.

Relation to Prior Work: There are many SSL methods that use class distribution as prior knowledge to align the prediction distribution. Please check section 7 of [1] for a review. However, the paper fails to refer to these works and discuss their relationship. [1] Xiaojin Zhu. Semi-Supervised Learning Literature Survey. Computer Sciences TR-1530, 2008.

Reproducibility: Yes

Additional Feedback: