This is a paper for which the reviewers failed to reach consensus, despite each reviewer having read and responded to the author feedback in their review, and despite a discussion amongst the reviewers. I have therefore read the paper myself, along with each review and the author response, and will include my own viewpoint on the points raised by the reviewers.

The primary points raised in favor of the paper are well summarized by R3 in discussion, who wrote: "However, what makes me feel good are the three instructive issues discussed by the authors and the corresponding solutions presented in this work. The proposed approach is not that hard to implement but can address the key problems in co-saliency detection and finally obtains the outperforming experimental results. So this work should be insightful to the co-saliency detection community."

The primary point raised against the paper is a perceived lack of novelty, with the paper viewed more as an integration of previous approaches. After careful consideration of each reviewer's thoughts on this topic, I have decided to down-weight this point for the following reasons. First, I do think it is important that every paper have a novel contribution, but this does not necessarily mean a novel method. The contribution can be new results that help us understand previous methods more fully. Indeed, I personally feel that our field places far too much emphasis on "new methods" and gives far too little time or space to verification and understanding of previous methods. A paper that does a good job of selecting and integrating previous methods that work well together, and presenting the results of doing so, can therefore be seen as an important and useful contribution.

The second criticism concerns the experimental setup.
In discussion, R2 notes: "Indeed,  also uses an extra SOD dataset (MSRA-B, which contains 2,500 training images) to train their model. However, the proposed model relies on a much larger dataset, i.e., DUTS, which includes 10,553 training images. Since the authors treat  as the baseline for comparison, I am not clear why they did not use the same training set. Previous works have shown that using DUTS as the training set can bring a significant performance boost in salient object detection, so it is hard to know whether the performance improvements over  are from additional data or from the proposed OIaSG." I believe the authors have attempted to address this issue in the supplementary material, which notes: "Note that, for fair comparison, the backbone feature extractor in baseline model is pretrained on a single-image saliency detection dataset (i.e., DUTS) for shared feature learning." Furthermore, in their author response, they note: "Effectiveness of OIaSG: Sorry for making the confusion. As mentioned in the ablation study, we have pre-trained the baseline for the SOD task on the DUTS dataset, which could prove the superiority of our OIaSG scheme." Thus, I believe the authors have indeed produced a reasonable apples-to-apples comparison, giving the baseline method the same data to train on. However, I do expect the authors to clarify this explicitly in the final version of the paper so that it is easy to understand.

As a minor point, I think the discussion of related work in the introduction would benefit from a signpost sub-heading, "Related Work", to help readers navigate the paper.

Finally, I am flagging this paper to the Senior Area Chair for additional review, given the unusually wide level of disagreement here.