NeurIPS 2020

Towards Neural Programming Interfaces

Review 1

Summary and Contributions: The paper proposes a Neural Program Interface that interfaces with a pretrained language model by manipulating its hidden activations to produce desired outputs. Pretrained models can thus be re-purposed for new tasks without overwriting any aspect of the language model.

Strengths: - The problem setting is useful and timely for GPT-like models.

Weaknesses: - The method is only evaluated on GPT-2. It would be nice if it could also be applied to other models, such as seq-to-seq models. - More human evaluation should be included in the experiments; it is important for supporting the claims. I don't think the current evaluations are sound enough. - The method should be compared with Plug and Play Language Models [ICLR 2020].

Correctness: - More human evaluation should be included in the experiments; it is important for supporting the claims. I don't think the current evaluations are sound enough.

Clarity: - Overall, the paper's presentation could be improved. - Figure 1 could be re-plotted to better describe the method. - Tables 1 and 2 do not use the same format.

Relation to Prior Work: - The method proposed in [1] is simpler and also works well. There should be more comparison between [1] and the proposed method. [1] Plug and Play Language Models: A Simple Approach to Controlled Text Generation, ICLR 2020.

Reproducibility: Yes

Additional Feedback: - Is the method compatible with different decoding algorithms, such as top-p sampling? - In the fifth example of Table 4, the NPI results are repetitive. Is the repetition issue also considered in the NPI module? - How many examples are needed to train the NPI models?
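For reference, top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, renormalizes, and samples from that set. The sketch below is illustrative only: the plain list `probs` is a hypothetical stand-in for a next-token distribution, and the function names are made up. Since NPI acts on hidden activations rather than on the decoding step, a decoder like this could in principle be applied to the perturbed model's output distribution; whether quality holds up in practice is the question raised above.

```python
import random
from typing import Dict, List


def top_p_filter(probs: List[float], p: float) -> Dict[int, float]:
    """Return the renormalized nucleus: the smallest set of highest-probability
    token ids whose cumulative probability reaches p."""
    # Sort token ids by descending probability (ties keep index order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    total = 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return {i: probs[i] / total for i in kept}


def top_p_sample(probs: List[float], p: float, rng: random.Random) -> int:
    """Sample one token id from the top-p nucleus of probs."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Example with a hypothetical 4-token distribution.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.75))  # only tokens 0 and 1 survive
print(top_p_sample(probs, 0.75, random.Random(0)))
```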

Review 2

Summary and Contributions: This paper proposes a controllable text generation model in which control perturbation vectors are added to the hidden states of (a subset of) the layers of a pre-trained model (here, GPT-2). The control vectors are predicted by an NPI network that takes the hidden states of a (possibly different) pre-trained model as input. The NPI network is trained to predict the GPT-2 activation perturbations corresponding to the desired output. As training data, fixed-size windows of text are generated by GPT-2 and labeled according to whether they exhibit the desired behavior. Experiments show that the proposed approach can increase the probability that a generated sequence contains a given word (a task on which fine-tuning fails completely), or reduce the probability that the model generates a single word or a set of words. The best models included the desired word in 54% of outputs, and included the words to be avoided in 11% or 16% of outputs.
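The core mechanism summarized above (adding perturbation vectors to the hidden states of selected layers of a frozen model) can be sketched on a toy network. This is a minimal illustration under stated assumptions, not the paper's implementation: the two "layers" are hypothetical stand-ins for GPT-2 layers, the perturbations are passed in directly rather than predicted by an NPI network, and each perturbation is added to the output of its layer. It also shows that a perturbation applied at layer 0 changes every later hidden state, which is the propagation effect a trained NPI would have to account for.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward_with_perturbations(
    layers: List[Layer], x: Vector, perturbations: Dict[int, Vector]
) -> List[Vector]:
    """Run a frozen stack of layers; after layer i, add perturbations[i]
    (if present) to its output. Returns every layer's hidden state."""
    hidden_states: List[Vector] = []
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)  # the layer itself is never modified
        if i in perturbations:
            h = [a + d for a, d in zip(h, perturbations[i])]
        hidden_states.append(h)
    return hidden_states


# Hypothetical two-layer "model" standing in for the pretrained network.
layers: List[Layer] = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
]
x = [1.0, 2.0]
print(forward_with_perturbations(layers, x, {}))              # → [[2.0, 4.0], [3.0, 5.0]]
print(forward_with_perturbations(layers, x, {0: [0.5, -0.5]}))  # → [[2.5, 3.5], [3.5, 4.5]]
```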

Strengths: - Proposes an approach to controlling the output of neural language models without fine-tuning, which yields better control than fine-tuning.

Weaknesses: - The experiments are limited to learning to include or exclude specific words, which is quite a narrow task in controllability; there has to be something more interesting than trying to get the model to predict the word "cat". - There is a lack of simple baselines, as well as of any comparison to previous work on text controllability, to put the results in context (see below). - The model description is not clear enough, in particular with regard to how the NPI model is trained (see below). Moreover, it is unclear how the model learns the effect of the perturbations as they propagate through multiple layers of the pre-trained network. - In many of the generated text examples (in Table 4, in particular for offense avoidance), the quality of the NPI output is worse than that of the original model.

Correctness: The methodology is correct, but it is unclear whether the approach is conceptually well suited to learning to control the output for specific attributes, and why it should be expected to pick up on the desired attributes better than fine-tuning would.

Clarity: The description of how the NPI model is trained (Section 3.1.2) is not clear enough. The loss function does not specify which loss is applied to D_out (the output of the NPI model X), and the interaction between X, the classifier Y, and the discriminator Z is also unclear.

Relation to Prior Work: Related work is discussed, but there isn't any experimental comparison with previous approaches to controlled text generation.

Reproducibility: No

Additional Feedback: - There are no ablation studies on the model. One possible ablation would be to learn only perturbations applied to the output layer of the model, rather than to the hidden states across multiple layers. - A simple baseline for avoiding specific words is a hard restriction that sets their output probabilities to 0. To show the benefits of the proposed approach, you need to show that it has some advantages over this baseline (e.g., generating more fluent text, though from the examples this does not really seem to be the case). - For word inclusion, a baseline could increase the probability of the target word, or decode it whenever its probability or rank is above some threshold (probably only until it has been generated, to avoid unnecessary repetition). --- Thanks for your response. While I appreciate the additional baseline experiments, the actual results are not given, so this does not sufficiently address my concerns about the very limited form of "topic control" on which the model is evaluated, which does not fully justify the proposed model. Additionally, neither the response nor the appendix addresses my concerns about the clarity of the model description.
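The two baselines suggested above can be sketched in a few lines over a single next-token distribution. The function names are hypothetical, `probs` is assumed to be a list indexed by token id, and the inclusion baseline uses a probability threshold (the rank-based variant mentioned above is not shown). The sketch also assumes each word maps to one token id, whereas with subword tokenization a word may span several tokens.

```python
from typing import List, Optional, Set


def ban_tokens(probs: List[float], banned: Set[int]) -> List[float]:
    """Word-avoidance baseline: set banned tokens' probabilities to 0
    and renormalize the remaining mass."""
    masked = [0.0 if i in banned else p for i, p in enumerate(probs)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("all probability mass is on banned tokens")
    return [p / total for p in masked]


def force_target(
    probs: List[float], target: int, threshold: float, already_generated: bool
) -> Optional[int]:
    """Word-inclusion baseline: return the target token id if it has not been
    generated yet and its probability reaches the threshold; otherwise return
    None, meaning the caller should decode normally."""
    if not already_generated and probs[target] >= threshold:
        return target
    return None


print(ban_tokens([0.5, 0.25, 0.25], {0}))  # → [0.0, 0.5, 0.5]
print(force_target([0.1, 0.6, 0.3], target=2, threshold=0.2, already_generated=False))  # → 2
```

Gating on `already_generated` follows the reviewer's suggestion of forcing the word only until it has appeared, to avoid unnecessary repetition.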

Review 3

Summary and Contributions: This paper proposes neural program interfaces (NPIs) to control the behaviour of large language models that generate text sequences unconditionally. The original model is kept intact; instead, its hidden activations are manipulated at output time to obtain the desired outputs through this interface. The authors also contribute a dataset and qualitatively show the impact of this framework in experiments on important tasks (e.g., offensive speech filtering) to highlight how large language model outputs can be controlled. Update: My score was fairly positive and I stand by it. I think this paper is well placed in time, since we now have large language models that need to be tweaked/changed/interfaced with without overwriting too many parameters, and there is substantial progress to be made on the best/most efficient ways to do so. I think its main flaw is that the authors are unaware of previous work that attempts to do the same, but I still think their contributions are different enough to be useful to the community and to others working to make progress in this area.

Strengths: 1. This is well placed in the literature and focuses on an important topic. 2. Besides its other benefits (e.g., being able to control outputs), this approach is far more efficient and much lower in cost than a fine-tuning method that updates weights and retrains components of the model. 3. I find the idea and its ties to NPI work fascinating. There are obvious limitations when control is exercised through fixed programs, but I think this is a promising approach for making changes to large generative language models (provided the outputs remain sound after this control). 4. The authors describe the components of the loss function and the experimental setup in detail. However, more detail could be given on the additive components of the loss (fluency, etc.). This is especially important given that the qualitative results do not look very fluent. 5. The authors include a broader impact statement about potential pitfalls and biases that could propagate through.

Weaknesses: 1. "no permanent changes are made to the weights of the original language model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model" is an efficient/interesting/useful way to let prevalent information propagate through without being subject to any kind of language drift. However, this possibly means that biases and inaccuracies of the original model can be exacerbated through the process. (In general, this is not really a weakness, but I think it is good that it is addressed/elaborated.) 2. Although the loss function has a component that addresses fluency, the NPI-GPT2 outputs do not look very sensible/fluent. It would be good if the authors could give insights into, or experiment with, different ways of preventing this drift. 3. Although the dataset collection is interesting (and necessary for the process), it seems somewhat time-consuming and intractable.

Correctness: The experimental setup is sound and explained in detail.

Clarity: The paper is well written, and all experimental details, tables, and figures are adequately explained.

Relation to Prior Work: This is well positioned in the literature.

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: The work aims at controlling natural language generation from a pretrained model using a Neural Program Interface, a neural network that learns to manipulate the activations of the pretrained model to produce the desired output. Training an NPI requires fewer training samples and avoids the risks involved in fine-tuning LMs. The work applies NPIs to some simple tasks to demonstrate their application.

Strengths: The training scheme and the experiments are well thought out and rigorous. The approach is simple and interesting.

Weaknesses: The experiments could be carried out on other LMs as well. In some of the examples shown in Table 4, the NPI output is less coherent and more repetitive than GPT-2's; some analysis on this front could strengthen the paper. A description of the NPI model is not provided.

Correctness: Yes. I wonder whether the perturbations can push the activations outside the distribution the layer was originally trained on, in which case constraining the perturbations could help.
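One simple way to realize the constraint suggested above is to cap the L2 norm of each perturbation vector before it is added to a layer's activations. This is an assumption for illustration, not something the paper describes: `max_norm` is a hypothetical hyperparameter, and a norm bound alone does not guarantee that the perturbed activations stay within the original distribution.

```python
import math
from typing import List


def clip_perturbation(delta: List[float], max_norm: float) -> List[float]:
    """Rescale a perturbation vector so its L2 norm is at most max_norm
    (assumed non-negative); vectors already within the bound are returned
    unchanged."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= max_norm:
        return list(delta)
    scale = max_norm / norm
    return [d * scale for d in delta]


# Hypothetical perturbation with norm 5, clipped to norm 1.
print(clip_perturbation([3.0, 4.0], max_norm=1.0))  # approximately [0.6, 0.8]
```

Clipping preserves the direction of the perturbation and only limits its magnitude; a tighter alternative would tie `max_norm` to the typical activation norm of each layer.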

Clarity: The formulations in Section 3, Algorithm 1, and Section 3.1.1 are difficult to understand; rewriting them would help.

Relation to Prior Work: Yes

Reproducibility: No

Additional Feedback: If the code is not going to be released, then a description of the NPI model is necessary.