NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 9260
Title: Generative Models for Graph-Based Protein Design

Reviewer 1

I think the idea of using attention- or Transformer-inspired architectures for protein modelling is useful, and the authors' changes to the standard Transformer are helpful, since full attention computations are typically costly.

Reviewer 2

This paper addresses the problem of generating protein sequences for a desired 3D structure, also known as the “inverse protein folding problem”. The authors introduce a model inspired by recent advances in language modeling (for the sequence-decoder part of the model) and graph representation learning (for the encoder part of the model). Protein structures are represented as k-NN graphs, enriched with orientation/location-based features and features based on structural bindings. The encoder takes the form of an adapted Graph Attention Network, here termed the “Structured Transformer”, which is enriched with edge features and relative positional encodings, and the decoder takes the form of an autoregressive Transformer-based model. Results indicate improvements over a recent deep neural network baseline for this task.

The problem is of high significance, and the authors make several non-trivial contributions that clearly improve the state of the art in this area. The paper is very well written and generally of high quality. It is well positioned w.r.t. related work, and all the contributions are well motivated. I am not an expert in the area of “inverse protein folding”, but the paper did a great job of introducing the problem and related work. It would be good, however, to provide a more detailed description of the SPIN2 baseline and to discuss differences from this particular model.

The experiments are chosen well, but it would be nice to see error bars on the results and further ablation studies, especially on the proposed attention mechanism and the relative positional encodings. Overall, I can recommend this paper for acceptance.

The authors seem to take it as a given that a Transformer-based model is naturally the best fit for this task, but I wonder whether a (potentially simpler) message passing neural network, as in Gilmer et al. (ICML 2017), would perform similarly when used as an encoder for this task.
To be more precise, this would correspond to performing a) an ablation study on the attention mechanism (i.e., leaving out a_{ij} in the update for h_i), and b) using a small MLP to transform v_{ij} instead of a linear map, which corresponds to the message function in the message passing neural network framework.

---

My questions have been addressed in the rebuttal, and I am looking forward to seeing the comparison against message passing neural networks and a discussion of the SPIN2 baseline in the updated version of the paper. My recommendation remains accept, but I leave it to the other reviewers to judge the relevance of the new results comparing the method against non-deep-learning baselines (for which I do not have any expertise).
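The suggested ablation can be made concrete. The sketch below (NumPy, with stand-in weights; all function names and shapes are hypothetical illustrations, not taken from the paper) contrasts an attention-weighted aggregation of linearly transformed messages with a Gilmer-style MPNN update that drops the attention weights a_{ij} and instead passes each message v_{ij} through a small MLP before an unweighted sum:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_update(h_i, neighbors_v):
    # Attention-style update: weights a_ij over linearly mapped messages.
    # h_i: (d,) node state; neighbors_v: (k, d) messages v_ij (hypothetical shapes).
    W = np.eye(len(h_i))             # stand-in for a learned linear map
    scores = neighbors_v @ h_i       # simplistic scores; a real model uses learned queries/keys
    a = softmax(scores)              # attention weights a_ij
    return a @ (neighbors_v @ W.T)   # attention-weighted sum of linear messages

def mpnn_update(h_i, neighbors_v):
    # Ablation: drop a_ij, pass each v_ij through a small MLP, then sum (Gilmer-style).
    d = len(h_i)
    W1 = np.full((d, d), 0.1)        # stand-in MLP weights
    W2 = np.full((d, d), 0.1)
    msgs = np.maximum(neighbors_v @ W1.T, 0) @ W2.T  # ReLU MLP applied per message
    return msgs.sum(axis=0)          # unweighted aggregation
```

The two update functions share the same interface, so swapping one for the other inside an encoder layer is exactly the kind of controlled ablation described above.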

Reviewer 3

Originality: Overall, the approach taken in this work differs significantly from past work. Unlike previous applications of deep learning to protein design, the authors present a model capable of modeling the joint distribution of protein sequences conditioned on their structure. The authors extend the Transformer to include sparse relations and to handle spatially structured data, and they provide a new representation of protein structure as a graph with relative spatial encodings. This representation is likely useful beyond the scope of protein design.

Quality: The submission is technically sound, and the experimental results are clearly described and analyzed. I do have one concern: for this algorithm to be practically useful, there must be a way to decode a sequence with high probability under the model given an input structure. By only evaluating log-likelihoods of individual sequences, the authors avoid using any decoding strategies. One concern is that, given the length of protein sequences, approximate decoding strategies such as beam search will produce low-quality sequences (i.e., sequences with low log-likelihoods). A simple experiment would be to compare the log-likelihoods of the top sequences from the authors' choice of decoding strategy to the log-likelihoods of the native sequences. I would ask the authors to include a discussion of decoding strategies and corresponding experiments in the paper.

Clarity: This paper is well written. The related work section is thorough and makes clear what the deficiencies of previous deep learning approaches are. The explanations of the graph representation and the neural architecture are also clear. The results section provides a nice analysis of perplexities based on random protein sequences and first-order profiles of protein sequences from the Pfam database. There were a few aspects of the paper that could use clarification and/or expansion.
1) I do not think the authors explain what the input backbone coordinates are. Since there is one coordinate for each residue, I would assume the authors are using the C_alpha coordinates.

2) I believe there is a typo on line 198: "We find that the model is able to attain considerably improved statistical performance over other statistical performance."

Significance: The results provide a significant improvement over previous deep learning approaches but make no comparison to traditional protein design algorithms. In my opinion, such comparisons would provide substantial value by illustrating how deep learning methods compare to conventional approaches. Nonetheless, the approach the authors take is unique and seems to work reasonably well, and the representations and architecture are likely useful outside the scope of protein design.

The authors do not provide any discussion of protein redesign, in which only a subset of the amino acid residues are designed and the rest are maintained at wild type. The approach seems sub-optimal for redesign, since it is tied to a particular choice of decoding order. The only option I see for performing redesign with this method is to fix the amino acids that are not being designed to their identities during decoding. However, this might require more sophisticated decoding strategies to produce high-probability sequences. This limits the practical use of the method, since full design is rarely the task at hand in practical protein design problems.

---

My primary concerns were addressed in the rebuttal stage. I am updating my score from a (6) to a (7).
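The decoding experiment suggested in the Quality paragraph (comparing the model's log-likelihood of decoded sequences against that of native sequences) can be sketched with a toy stand-in for the conditional sequence model. Everything below is a hypothetical illustration under assumed interfaces, not the authors' model: the "decoder" is a deterministic pseudo-random distribution per prefix, standing in for p(s_t | s_<t, structure).

```python
import numpy as np

VOCAB = 20  # amino-acid alphabet size

def next_token_logprobs(prefix):
    # Toy stand-in for the conditional decoder: a deterministic
    # pseudo-random log-softmax distribution keyed on the prefix.
    seed = hash(tuple(prefix)) % (2**32)
    logits = np.random.default_rng(seed).normal(size=VOCAB)
    return logits - np.log(np.exp(logits).sum())

def sequence_logprob(seq):
    # Autoregressive log-likelihood: sum of per-step conditional log-probs.
    return sum(next_token_logprobs(seq[:t])[seq[t]] for t in range(len(seq)))

def greedy_decode(length):
    # Simplest decoding strategy: pick the argmax token at each step.
    seq = []
    for _ in range(length):
        seq.append(int(np.argmax(next_token_logprobs(seq))))
    return seq

# The proposed experiment: decode a sequence, then compare its
# log-likelihood against that of a (here randomly drawn) "native" sequence.
rng = np.random.default_rng(0)
native = [int(rng.integers(VOCAB)) for _ in range(10)]
decoded = greedy_decode(10)
gap = sequence_logprob(decoded) - sequence_logprob(native)
```

In the real experiment, `next_token_logprobs` would be the trained conditional decoder and `native` the crystal-structure sequence; beam search would replace `greedy_decode` with a beam of partial sequences ranked by cumulative log-probability.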