The reviews and scores for this paper were somewhat divergent, but in the end the reviewers agreed that, with the modifications proposed in the rebuttal, the paper presents an interesting, technically sound model and is therefore appropriate for acceptance. However, all reviewers also agreed that the authors need to scale back their claims of biological plausibility in the camera-ready version. In particular, the following issues were noted in discussion:

1) The proposed algorithm still has a number of biologically implausible components, such as the lack of excitatory recurrence, negative activities, etc. This is fine, but these failings should be recognized as shortcomings, not swept away as unproblematic.

2) The proposed learning algorithm is spatially local, but not Hebbian in the typical meaning of the word. Specifically, the algorithm uses correlations across data points, not correlations in real time (per Hebb's original proposal), and so is not in line with what most people understand a "Hebbian" algorithm to be. Thus, the learning rule should be called "local", but not "Hebbian".

3) The supposed global third factor is actually a set of layer-by-layer (or group-by-group) calculations involving both activity from the layer and information about labels (the claim that labels aren't needed holds only with one-hot outputs, which is also biologically questionable). Moreover, this third factor must still be communicated backwards to the appropriate layers at the appropriate time, i.e., a backward pass is still required. So, again, the references to a "global" signal, and the claims that the need for labels or backward passes is eliminated, should be removed from the paper.