NeurIPS 2020

Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion


Meta Review

The paper addresses the problem of rank-one matrix completion when some rows can be adversarially corrupted. Motivations include crowdsourcing under the Dawid-Skene model and matrix completion on Yelp and other online platforms. The reviewers agree that the topic and results are of interest to the community and recommend acceptance. There were three themes of concern brought up in multiple reviews, as well as in my own reading of the paper: (1) the assumptions underlying the work, and (2) appropriate credit and references to prior work. A reviewer also brought up concerns with (3) the results. I discuss these issues in more detail below. IMPORTANT: In the revised version, please issue appropriate clarifications, references, and discussion to address the issues brought up by the reviewers.

(1) Assumptions underlying the work: Multiple reviewers question the application to recommendation systems: "The user-item rating matrix in recommender systems are usually low-rank but not rank-one." "The focus on rank-one setting is restrictive. It is unclear how the results in the paper can be extended to more general settings." "I also agree with review #1 and have concerns on the rank-1 assumption which might be too strong? Unfortunately, it didn't receive an intuitive explanation." This is indeed a significantly restrictive assumption; it was brought up by multiple reviewers but was not addressed in the rebuttal. The paper nevertheless seems to overclaim applicability to this setting. In my opinion, connections to this setting should be discussed with a clear highlight of the disconnect, and are best left as future work. Reviewer 3 also complains: "One coin model assumption may not fit well with crowdsourcing tasks that have dramatically different item difficulties, which could be a potential issue." I agree that this is an extremely stylized model, and the community has moved beyond both the one-coin and the two-coin model, since neither captures tasks with dramatically different item difficulties. See instead the papers https://arxiv.org/pdf/1602.03481.pdf and http://arxiv.org/pdf/1606.09632. In order to give a complete picture to the reader, please include a brief discussion of one-coin, two-coin, and more general models, possibly mentioning extensions to other models as future work (a rough sketch of these models is given at the end of this meta-review).

(2) Appropriate credit and references to prior work: A reviewer criticizes: "The idea of removing the F largest and smallest values and the so-called (2F+1)-robust condition are borrowed from the consensus literature. The convergence analysis also builds upon the previous work." These connections need to be discussed more thoroughly in the main paper, with proper references and credit to previous work. Reviewers 2, 3, and 4 provide relevant literature in crowdsourcing.

(3) Results: Reviewer 1 is also concerned about Theorem 1 being uninformative. In my own reading, I could not find the definition of "with high probability" anywhere. Is it 1 - e^(-n)? Or 1 - 1/log log(n)? Or something else? Please issue appropriate clarifications regarding these two points.

In my reading, there is another issue that should be clarified in the camera-ready. In the experiments with real data, the leftmost points are the ones that correspond to real data; the entire remaining plot is semi-synthetic. But if we look only at what happens on the real data, the algorithm performs comparably to, or frequently much worse than, prior work. We do not know how many workers in this real data were adversarial.
If there were many, then the algorithm performs well only when many more adversarial workers are added (the points further to the right). If there were not, then this goes against the crowdsourcing motivation of the work. The performance of the proposed algorithm improves as more adversaries are synthetically added. Such a tradeoff (between algorithms that are robust to adversaries and those that are not, and between settings with and without adversaries) is natural, and must at least be discussed, while also making it clear that the proposed algorithm does perform worse on the actual data obtained from the real world.

--- Note: Review 4 was inappropriate in raising criticisms of the form "does not compare with related work" without providing concrete instances of what was not compared with. I have downweighted these parts of Review 4 and requested the reviewer to update their review accordingly.
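For the authors' convenience, here is a rough sketch of the model hierarchy referenced in point (1), written for binary labels x_j in {-1, +1} with worker answers y_ij; the exact parameterizations in the cited papers may differ.

One-coin model: P(y_ij = x_j) = p_i for every item j, so E[y_ij] = (2 p_i - 1) x_j and the expected answer matrix is rank one (presumably the structure the paper exploits).
Two-coin model: P(y_ij = +1 | x_j = +1) = s_i and P(y_ij = -1 | x_j = -1) = t_i, allowing asymmetric worker errors.
Item-difficulty models (e.g., in the style of GLAD): P(y_ij = x_j) = 1 / (1 + exp(-a_i b_j)), with worker ability a_i and item easiness b_j, so that error rates vary across items.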