NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 8531
Title: Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Reviewer 1


The paper uses brain activity data (fMRI and MEG) recorded from subjects while reading natural text and computes representations of NN models (ELMo, BERT, etc.) on the same text data. The goal is to see which layers predict brain activity in different areas of the brain, as well as the role of context size for each model. Conclusions:
- T-XL's prediction accuracy increases as the context size increases.
- BERT and T-XL capture context in a way that is relevant to predicting brain activity in their middle layers.
- Removing the learned attention at various layers of BERT (replacing it with uniform attention) has similar effects on brain-activity prediction and on NLP tasks: uniform attention at the lower layers of BERT is better in both cases, while at layer 11 it decreases performance in both cases.

While this type of analysis is not completely novel, the observations made are new and very interesting. For the most part the paper is clearly written (see the Improvements section for clarification questions). The paper would be stronger if the empirical implications of the observations (the attention removal) were tested more thoroughly.
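For concreteness, a minimal NumPy sketch of the attention ablation described above: the learned, input-dependent softmax attention at a layer is replaced by uniform weights, so every token's output becomes the mean of the value vectors. This is an illustration of the idea only, not the paper's actual BERT code:

    # Standard scaled dot-product attention vs. the uniform-attention ablation.
    import numpy as np

    def softmax_attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)           # (seq, seq) attention logits
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)      # learned, input-dependent weights
        return w @ V

    def uniform_attention(V):
        # Every query attends with weight 1/seq_len to every token,
        # so each output row is simply the mean value vector.
        return np.tile(V.mean(axis=0, keepdims=True), (V.shape[0], 1))

    seq_len, d = 8, 16
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape, uniform_attention(V).shape)  # (8, 16) (8, 16)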

Reviewer 2


This paper presents interesting work on using brain imaging data to measure the quality of linguistic representations. Although the authors claim that they are the first to do this, I believe there is a long tradition with this scope. See, for instance, https://aclweb.org/anthology/papers/D/D13/D13-1202/. Nonetheless, this is important work. I can't judge the neuroscience methods, but otherwise the paper seems pretty solid. There are a few points that I'm not completely sure about:
* The authors explore a fixed-length window, whereas many of these models are trained at the sentence level. Aren't they introducing some arbitrary artifacts there (see the sketch after this list)?
* The data reported in Figure 2 may also be interesting to report in a quantitative fashion (e.g., the distribution of red/blue areas in regions 1 and 2).
* What are called NLP tasks are more like syntactic-processing diagnostic tasks. It is probably worth distinguishing them from downstream NLP tasks (e.g., sentiment analysis).
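To make the fixed-length-window point concrete, here is a toy sketch (illustrative only; the tokenization and window size are my assumptions) of how a sliding k-word context ignores sentence boundaries, so the input handed to a sentence-trained model can straddle two sentences:

    # A fixed-length context window can cross sentence boundaries.
    words = "The dog barked . Mary opened the door and smiled .".split()

    def context_windows(tokens, k):
        # For each position i, return the k tokens ending at i (shorter at the start).
        return [tokens[max(0, i - k + 1): i + 1] for i in range(len(tokens))]

    for window in context_windows(words, 5):
        print(window)  # e.g. ['.', 'Mary', 'opened', 'the', 'door'] mixes two sentences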

Reviewer 3


This work describes experiments that relate representations from pretrained word-embedding models to fMRI and MEG recordings made while subjects read the same text. Bringing these together is original and an interesting avenue of research, yet I have doubts about the significance, clarity, and quality of this work.

There are several points where references would have been needed to refer to prior work or to back up a claim (e.g., line 41, line 63). The paper furthermore defers a significant part of its material to the appendix, including parts that are crucial to understanding the experiment. From the main paper alone it is, for example, unclear which metrics are used when evaluating the fitted linear models, and even information as basic as whether the task is a regression or a classification task. In the main prediction task, no other baselines are tested (e.g., prediction from previous brain activity alone, without any text-encoder representations).

Linking observations made with the brain activity prediction model (uniform attention) to better NLP task performance is a relatively weak argument that would in my view need additional justification or empirical support. I am not convinced that the presented results (Table 1) are clearly different from the base model (small sample sizes, multiple testing), nor that the chosen syntactic tasks are very meaningful for NLP tasks in general. I also believe that predictiveness for tasks other than brain activity could equally have suggested the uniform-attention layer modifications, and one would have to be more precise about what exactly predicting fMRI or MEG adds compared to predictiveness of other NLP signals (e.g., syntactic ones). Changing the BERT model architecture after pretraining is somewhat of a hack: the model would have to be retrained with the new architecture to test whether the claimed insights on architecture have a practical benefit.

I did not obtain insight into or an interpretation of the pretrained neural models, as suggested in the abstract ("We propose here a novel approach for interpreting neural networks [...]"), to the point that I think this framing is misleading. Very specific hypotheses are made: "When we align that specific network representation with fMRI and MEG data, the result will be a decomposition of the representation into parts that correspond to different processes and should therefore be more interpretable." (line 67) These are neither experimentally tested nor referred back to later on.

Overall I think this is a very interesting direction of research; the paper relates well to prior work and takes care with some important experimental aspects (cross-validation, multiple participants, etc.). But the line of argumentation, presentation, and experiments have not convinced me.
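For reference, the kind of fitted linear model left underspecified in the main text could look like the following hypothetical sketch; ridge regression, cross-validation, and per-voxel Pearson correlation are my assumptions here, not the paper's confirmed pipeline:

    # Hypothetical encoding model: predict voxel activity from layer representations.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 768))   # one network-layer representation per stimulus
    Y = rng.standard_normal((400, 1000))  # fMRI activity per stimulus, one column per voxel

    fold_scores = []
    for train, test in KFold(n_splits=4).split(X):
        model = Ridge(alpha=1.0).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Score each voxel by Pearson correlation between predicted and observed activity.
        r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1] for v in range(Y.shape[1])]
        fold_scores.append(np.mean(r))
    print("mean cross-validated voxel correlation:", np.mean(fold_scores))

Stating whether the evaluation is of this regression form (and with which score), or instead a classification task, would resolve the ambiguity noted above.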