Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

NeurIPS 2020


Meta Review

The paper studies bias in neural models and proposes an approach based on causal mediation analysis. It focuses on pre-trained transformer language models, specifically GPT-2. The proposed use of mediation analysis to examine attention heads and neurons through interventions is novel and interesting, and it can be generalized to other types of bias. The paper is well written, and the experiments are thorough.