The reviews on this paper are mixed; even reviewers supporting acceptance stated during the discussion phase that the paper would benefit from another round of revisions before publication.

The paper explores a combination of several ideas, most notably the separation of similar and dissimilar tasks along with the masking of critical parameters, yielding significant improvements in continual learning. The results seem solid and demonstrate the strength of the proposed method; this is a nice contribution. However, the reviewers had significant concerns about the computational complexity of the proposed method. There are also a variety of clarity issues throughout the paper, including confusing presentation and wording as well as minor misstatements about previous work; these must be corrected before publication.

In particular, one issue raised in the reviews absolutely MUST be corrected before publication. This area chair agrees with Reviewer 3's concern that the paper misstates previous work, and disagrees with the authors' rebuttal on this point. Many previous works did explicitly consider mixes of similar and dissimilar tasks, such as the references mentioned in the reviews and those cited in [Chen & Liu, Lifelong Machine Learning, 2nd Edition, Ch. 9.3.5]. This makes the authors' statement on Line 93 (current numbering), "To the best of our knowledge, no existing work has been done to learn a sequence of mixed similar and dissimilar tasks that deal with forgetting and improve learning at the same time," inaccurate. The authors have a responsibility to correct this before publication. Note that these earlier methods would likely not match the performance of the proposed method; regardless, the technique used in this paper is novel and shows good performance gains, so it would be beneficial to the community.

The private comments to the Area Chair were taken into consideration.
Minor notes:
- Citations should all be enclosed in [] or () so that they are separated from the main text.
- There is a mistake on line 86. Backward transfer was also investigated, under the name "reverse transfer", in some of the works mentioned on line 85, not just in Wang et al. 2019.
- In response to the second concern you mentioned privately to the Area Chair: the issue likely stems from that conclusion not being stated explicitly in lines 286-290, emerging only from careful analysis of Table 3. To prevent such a misunderstanding, you should explicitly state what the ablation study shows about the importance of similarity detection.