Review for NeurIPS paper: Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

NeurIPS 2020

Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

Review 1

Summary and Contributions: This paper targets at the problem of reconstructing a partially-observed rank-one matrix when some of the observed entries are corrupted with arbitrary perturbations. A new algorithm is proposed to recover the original rank-one matrix with bounded recovery probability. Experiments on both synthetic and real datasets showed that the proposed method can achieve better performance than other state-of-the-art methods.

Strengths: The main contribution of this work is theoretical, i.e., the recovery probability bound in Theorem 1. To the best of my knowledge, the results are new to the problem and could be used for other similar problems, .e.g., low-rank matrix completion with corrupted entries.

Weaknesses: The bound of Theorem 1 is tight according to the authors. But the bound seems not very informative in general cases, i.e., the probability of exact recovery bound is usually very small. It may be better to have some assumptions about the adversaries so that a bound related to the adversarial behaviors may be derived for more practical scenarios. I am not sure how close are the current results of this paper to recommendation problem. The user-item rating matrix in recommender systems are usually low-rank but not rank-one. It is better to discuss more on this because the paper selected recommender system and collaborative filtering as its areas. ------------------------- I have read the authors' rebuttal and will keep my original scores for this paper.

Correctness: The theoretical and empirical results seem correct to me.

Clarity: The paper is well-written.

Relation to Prior Work: Prior works are clearly compared and discussed.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper studies the problem of reconstructing a rank-one matrix from observed entries in the presence of adversarial corruption. It proposes a robust version of the alternating minimization algorithm by removing the F largest and smallest values in the aggregation step. It shows that the proposed algorithm correctly recovers the underlying matrix if and only if the set of observed entries represented as a bipartite graph satisfies the so-called (2F+1)-robust condition. Further, it shows that the proposed algorithm is information-theoretically optimal when the set of observed entries is given by an ER random graph. These results are then applied to the problem of learning workers’ skills in crowd-sourcing under the David-Scene model. Extensive experimental results demonstrate the improvement of the proposed algorithm compared to the state-of-the-art in the adversarial scenario. --- Update based on response The authors addressed some of my concerns. I am satisfied with the response.

Strengths: The proposed algorithm combines the idea of alternating minimization and the robust aggregation in network consensus. The results and analysis are solid. The performance characterization of the proposed algorithm is tight and the optimality condition in the ER random graph case is sharp. The application to crowd-sourcing is important. Extensive numerical studies further justify the applicability of the obtained results.

Weaknesses: (1) The focus on rank-one setting is restrictive. It is unclear how the results in the paper can be extended to more general settings. (2) The idea of removing the F largest and smallest values and the so-called (2F+1)-robust condition are borrowed from the consensus literature. The convergence analysis also builds upon the previous work. These need to be discussed more thoroughly in the main paper with proper references/credits to previous work. (3) The baseline methods used in numerical comparisons are mostly not designed for adversarial settings. Comparisons to existing robust methods are needed to demonstrate the improvement of the proposed algorithm.

Correctness: Yes

Clarity: The writing is very well written and the results are clearly explained.

Relation to Prior Work: Yes. The paper has an adequate discussion of the prior work and how this work differs from prior work.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: This paper has proposed a matrix completion method for the label inference problem in crowdsourcing under the classification (categorization) setting. This framework is particularly effective when the adversary workers, who intend to provide the incorrect labels, are involved. The writing is excellent and framework is straightforward to understand.

Strengths: - Very strong motivation and well-explained problem definition as well as background analysis. - The framework is elegant with nice theoretical ganrantees. - The experiments are extensive, many popular crowdsourcing data sets are covered.

Weaknesses: - One coid model assumption may not fit well with crowdsourcing tasks that have dramatically different item difficulties, which could be a potential issue. - A few key refererences that are highly related are not compared and explained. - Experiments needs further improvement to show the full effectiveness of the proposed framework.

Correctness: Technical details are mostly solid with minor typos without affecting the reading.

Clarity: Good writing. Clear and straightforward.

Relation to Prior Work: References need to be expanded and add a few baselines for experimental comprehensiveness.

Reproducibility: No

Additional Feedback: These are a few issues need to be addressed for acceptance: - The one-coin model is not that popular after the two-coin model's prevelance in 2012. I believe the authors may adopt his model for mathematical convinence, however, you should also add the popular two-coid models into comparison. These papers includes: 1. Variational Inference for Crowdsourcing. 2. Learning from the Wisdom of Crowds by Minimax Entropy. 3. Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing 4. The KOS model also has a two-coin version. Please adopt it if possible. - The matrix completion methods are popular for crowdsourcing problems. The current state-of-the-art are tensor-based completions on workers' labeling tensors. Please compare with the following SOTA methods. 1. crowdsourcing via tensor augmentation and completion - The study of the adversary workers in crowdsourcing has been emerged in recent years, please elaborate the difference between the proposed framework and the exsiting works. 1. Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction. 2. MultiC2: An Optimization framework for learning from task and worker dual heterogeneity. - The experiments are mainly focus on the performance with adversary workers being added. However, in real applications, adversary could be easily eliminated by inserting gold standard questions into tasks. The workers with consistently wrong answers on these gold standard questions will be banned. Therefore, it is more important to test the accuracy when the adversary's portion among workers are very small. However, this work didn't show much improvement in such scenario. - The crowdsourcing data set is usually very sparse, I don't suggest to remove the workers who has less than 10 items labeled. With this modification, the inference problem will become extremely easy. So, what about the experimental results using the raw dataset without this strong preprocessing step? - Notation error: e.g., line 41, nuclear norm of matrix X, not matrix L. - Notation abuse: rank-one matrix M and M-classification tasks. ---- review after rebuttal ---- The authors have addressed most of my doubts, I'm glad that the two-coin model's results are added to increase the comprehensiveness of your framework. However, it is still bizarre that two-coin performs worse than one-coin. I will raise my score to 7 for a good response from the authors. Please add the explanation in the latest version to explain your new experimental results. I also agree with review #1 and have concerns on the rank-1 assumption which might be too strong? Unfortunately, it didn't receive an intuitive explanation. Please address your motivation for using rank-1 matrix completion, maybe build the connection (not only the difference) with the standard MF-based methods and nuclear norm minimization methods.

Review 4

Summary and Contributions: The authors studied the problem of matrix completion in the presence of corrupted entries. To address the problem, the authors proposed an algorithm, which integrates alternating minimization and extreme-value filtering to perform reliable matrix completion. Experimental results on 17 real data sets show the performance of the proposed method.

Strengths: (1) Some theoretical groundings are provided to support the validity of the proposed method. (2) Extensive experiments are conducted to demonstrate the performance of the algorithm.

Weaknesses: (1) The writing of the paper can be improved. Some notions are not well described. (2) It is unclear about the novelty and actual contribution for advancing the state of the art of the crowd sourcing. (3) Some key works are not considered in both literature review and comparison experiments.

Correctness: The claims and empirical methodology are clear and correct.

Clarity: No, The notations are messy. Many are used on-the-fly without clear definition.

Relation to Prior Work: The technical innovations are minor and they don't bring any conceptual breakthrough. They are just adhoc straightforward modifications of known algorithms.

Reproducibility: Yes

Additional Feedback: The approach and the solution to the problem are very well handled. However, it also suffers from multiple drawbacks. Below are the detailed comments: (1) The studied problem is not novel and the proposed technicals are somehow related to the existing works on matrix completion based crowdsourcing. (2) The notations are messy. Many are used on-the-fly without a clear definition. For example, in line 126~127, the authors introduce "W workers" and "M classes". Are they equivalent to "n" and "m" in Def. 1? If yes, the authors may want to clarify it and ensure the consistency through the paper. (3) According to Eq. 2, it is clear that D&S model does not consider the different capability/reliability of workers across different tasks, which could be critical in real applications. The authors may want to provide insightful discussion and empirical studies to show whether the M-MSR can handle the capability/reliability of workers and how in real scenarios. (4) The authors may want to conduct comparison experiments between the proposed one and two-coin model. It is interesting to know whether the proposed method can beat two coin model and to what extend. (5) Some related works are missing, e.g., Houping Xiao, Jing Gao, Zhaoran Wang, Shiyu Wang, Lu Su, Han Liu. A Truth Discovery Approach with Theoretical Guarantee. KDD'16 Jing Gao, Qi Li, Bo Zhao, Wei Fan, Jiawei Han: Mining Reliable Information from Passively and Actively Crowdsourced Data. KDD 2016: 2121-2122 Houping Xiao, Jing Gao, Qi Li, Fenglong Ma, Lu Su, Yunlong Feng, Aidong Zhang: Towards Confidence Interval Estimation in Truth Discovery. IEEE Trans. Knowl. Data Eng. 31(3): 575-588 (2019) ########################################################## Thank you for the feedback. Some of my concerns have been addressed, and I've adjusted my score accordingly. I've also included additional references.