{"title": "Spherical convolutions and their application in molecular modelling", "book": "Advances in Neural Information Processing Systems", "page_first": 3433, "page_last": 3443, "abstract": "Convolutional neural networks are increasingly used outside the domain of image analysis, in particular in various areas of the natural sciences concerned with spatial data. Such networks often work out-of-the box, and in some cases entire model architectures from image analysis can be carried over to other problem domains almost unaltered. Unfortunately, this convenience does not trivially extend to data in non-euclidean spaces, such as spherical data. In this paper, we introduce two strategies for conducting convolutions on the sphere, using either a spherical-polar grid or a grid based on the cubed-sphere representation. We investigate the challenges that arise in this setting, and extend our discussion to include scenarios of spherical volumes, with several strategies for parameterizing the radial dimension. As a proof of concept, we conclude with an assessment of the performance of spherical convolutions in the context of molecular modelling, by considering structural environments within proteins. We show that the models are capable of learning non-trivial functions in these molecular environments, and that our spherical convolutions generally outperform standard 3D convolutions in this setting. 
In particular, despite the lack of any domain-specific feature engineering, we demonstrate performance comparable to state-of-the-art methods in the field, which build on decades of domain-specific knowledge.", "full_text": "Spherical convolutions and their application in\n\nmolecular modelling\n\nWouter Boomsma\n\nDepartment of Computer Science\n\nUniversity of Copenhagen\n\nwb@di.ku.dk\n\nJes Frellsen\n\nDepartment of Computer Science\n\nIT University of Copenhagen\n\njefr@itu.dk\n\nAbstract\n\nConvolutional neural networks are increasingly used outside the domain of image analysis, in particular in various areas of the natural sciences concerned with spatial data. Such networks often work out of the box, and in some cases entire model architectures from image analysis can be carried over to other problem domains almost unaltered. Unfortunately, this convenience does not trivially extend to data in non-Euclidean spaces, such as spherical data. In this paper, we introduce two strategies for conducting convolutions on the sphere, using either a spherical-polar grid or a grid based on the cubed-sphere representation. We investigate the challenges that arise in this setting, and extend our discussion to include scenarios of spherical volumes, with several strategies for parameterizing the radial dimension. As a proof of concept, we conclude with an assessment of the performance of spherical convolutions in the context of molecular modelling, by considering structural environments within proteins. We show that the models are capable of learning non-trivial functions in these molecular environments, and that our spherical convolutions generally outperform standard 3D convolutions in this setting. 
In particular, despite the lack of any domain-specific feature engineering, we demonstrate performance comparable to state-of-the-art methods in the field, which build on decades of domain-specific knowledge.\n\n1 Introduction\n\nGiven the transformational role that convolutional neural networks (CNNs) have had in the area of image analysis, it is natural to consider whether such networks can be efficiently applied in other contexts. In particular, spatially embedded data can often be interpreted as images, allowing for direct transfer of neural network architectures to these domains. Recent years have demonstrated interesting examples in a broad selection of the natural sciences, ranging from physics (Aurisano et al., 2016; Mills et al., 2017) to biology (Wang et al., 2016; Min et al., 2017), in many cases showing convolutional neural networks to substantially outperform existing methods.\nThe standard convolutional neural network can be applied naturally to data embedded in a Euclidean space, where uniformly spaced grids can be trivially defined. For other manifolds, such as the sphere, it is less obvious, and to our knowledge, convolutional neural networks for such manifolds have not been systematically investigated. In particular for the sphere, the topic has direct applications in a range of scientific disciplines, such as the earth sciences, astronomy, and modelling of molecular structure.\nThis paper presents two strategies for creating spherical convolutions, as understood in the context of convolutional neural networks (i.e., discrete, and efficiently implementable as tensor operations). The first is a straightforward periodically wrapped convolution on a spherical-polar grid. The second builds on the concept of a cubed-sphere (Ronchi et al., 1996). 
We proceed with extending these strategies to include the radial component, using concentric grids, which allows us to conduct convolutions in spherical volumes.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nOur hypothesis is that these concentric spherical convolutions should outperform standard 3D convolutions in cases where data is naturally parameterized in terms of a radial component. We test this hypothesis in the context of molecular modelling. We will consider structural environments in a molecule as being defined from the viewpoint of a single amino acid or nucleotide: how does such an entity experience its environment in terms of the mass and charge of surrounding atoms? We show that a standard convolutional neural network architecture can be used to learn various features of molecular structure, and that our spherical convolutions indeed outperform standard 3D convolutions for this purpose. We conclude by demonstrating state-of-the-art performance in predicting mutation-induced changes in protein stability.\n\n2 Spherical convolutions\n\nConventional CNNs work on discretized input data on a grid in R^n, such as time series data in R and image data in R^2. At each convolutional layer l, a CNN performs discrete convolutions (or correlations)\n\n[f ∗ k^i](x) = Σ_{x′∈Z^n} Σ_{c=1}^{C_l} f_c(x′) k^i_c(x − x′)    (1)\n\nof the input feature map f : Z^n → R^{C_l} and a set of C_{l+1} filters k^i : Z^n → R^{C_l} (Cohen and Welling, 2016; Goodfellow et al., 2016). While such convolutions are equivariant to translation on the grid, they are not equivariant to scaling (Cohen and Welling, 2016). This means that in order to preserve the translation equivariance in R^n, conventional CNNs rely on the grid being uniformly spaced within each dimension of R^n. Constructing such a grid is straightforward in R^n. 
However, for convolutions on other manifolds such as the 2D sphere, S^2 = {v ∈ R^3 | v⊤v = 1}, no such planar uniform grid is available, due to the non-linearity of the space (Mardia and Jupp, 2009). In this section, we briefly discuss the consequences of using convolutions in the standard non-uniform spherical-polar grid, and present an alternative grid for which the non-uniformity is expected to be less severe.\n\n2.1 Convolutions of features on S^2\n\nA natural approach to a discretization on the sphere is to represent points v on the sphere by their spherical-polar coordinates (θ, φ) and construct a uniformly spaced grid in these coordinates, where the spherical coordinates are defined by v = (cos θ, sin θ cos φ, sin θ sin φ)⊤. Convolutions on such a grid can be implemented efficiently using standard 2D convolutions when taking care of using periodic padding at the φ boundaries. The problem with a spherical-polar coordinate grid is that it is highly non-equidistant when projected onto the sphere: the distance between grid points becomes increasingly small as we move from the equator to the poles (figure 1, left). This reduces the ability to share filters between different areas of the sphere.\n\nFigure 1: Two realizations of a grid on the sphere. Left: a grid using equiangular spacing in a standard spherical-polar coordinate system, and Right: an equiangular cubed-sphere representation, as described in Ronchi et al. (1996).\n\nFigure 2: Left: a cubed-sphere grid and a curve on the sphere. Right: the six planes of a cubed-sphere representation and the transformation of the curve to this representation.\n\nAs a potential improvement, we will investigate a spherical convolution based on the cubed-sphere transformation (Ronchi et al., 1996). 
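The periodically wrapped convolution on the spherical-polar grid can be sketched in a few lines (a minimal, single-channel NumPy illustration of the padding scheme described above; the function name and the single-channel restriction are ours, not the paper's):

```python
import numpy as np

def spherical_polar_conv(f, k):
    """'Valid' 2D correlation on a (theta, phi) grid with the phi axis
    wrapped periodically (phi is circular on the sphere) and the theta
    axis zero-padded at the poles.

    f: (n_theta, n_phi) single-channel feature map
    k: 2D filter with odd side lengths
    """
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    f = np.pad(f, ((0, 0), (pw, pw)), mode="wrap")      # periodic in phi
    f = np.pad(f, ((ph, ph), (0, 0)), mode="constant")  # zeros beyond the poles
    out = np.empty((f.shape[0] - 2 * ph, f.shape[1] - 2 * pw))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out
```

The output grid has the same shape as the input, and a filter reaching past the φ = 0 boundary sees the features at the opposite φ boundary, preserving the wrapped topology; in practice this is realized as a batched, multi-channel tensor operation rather than an explicit loop.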
The transformation is constructed by decomposing the sphere into six patches defined by projecting the circumscribed cube onto the sphere (figure 1, right). In this transformation, a point on the sphere v ∈ S^2 is mapped to a patch b ∈ {1, 2, 3, 4, 5, 6} and two coordinates (ξ, η) ∈ [−π/4, π/4[^2 on that patch. The coordinates are given by the angles between the axis pointing to the patch and v, measured in the two coordinate planes perpendicular to the patch. For instance, the vectors {v ∈ S^2 | vx > vy and vx > vz} map to patch b = 1, and we have tan ξ = vy/vx and tan η = vz/vx. The remaining mappings are described by Ronchi et al. (1996).\nIf we grid the two angles (ξ, η) uniformly in the cubed-sphere transformation and project this grid onto the sphere, we obtain a grid that is more regular (Ronchi et al., 1996), although it has artefacts in the 8 corners of the circumscribed cube (figure 1, right). The cubed-sphere convolution is then constructed by applying the conventional convolution in equation (1) to a uniformly spaced grid on each of the six cube-shaped patches. This construction has two main advantages: 1) within each patch, the convolution is almost equivariant to translation in ξ and η, and 2) features on the cubed-sphere grid can naturally be expressed using tensors, which means that the spherical convolution can be efficiently implemented on a GPU. 
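The mapping from a point on the sphere to a patch and its two angles can be sketched as follows (a pure-Python illustration; only the +x rule is given explicitly in the text, so the numbering of the remaining patches here is our own illustrative convention, whereas Ronchi et al. (1996) define the full set of mappings):

```python
import math

def cubed_sphere_coords(x, y, z):
    """Map a direction (x, y, z) to cubed-sphere coordinates (b, xi, eta).

    The patch b is the face of the circumscribed cube hit by the ray
    through (x, y, z); xi and eta lie in [-pi/4, pi/4].  For the +x
    patch (b = 1), tan(xi) = y/x and tan(eta) = z/x, as in the text.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:            # +x or -x face dominates
        return (1 if x > 0 else 2), math.atan(y / x), math.atan(z / x)
    if ay >= az:                         # +y or -y face dominates
        return (3 if y > 0 else 4), math.atan(x / y), math.atan(z / y)
    return (5 if z > 0 else 6), math.atan(x / z), math.atan(y / z)
```

Since only the direction of the ray matters, the input need not be normalized; gridding ξ and η uniformly and inverting this map yields the equiangular cubed-sphere grid.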
When implementing convolutions and pooling operations for the cubed-sphere grid, one has to be careful in padding each patch with the contents of the four neighbouring patches, in order to preserve the wrapped topology of the sphere (figure 2, right).\nBoth of these approaches to spherical convolutions are hampered by a lack of rotational equivariance, which restricts the degree to which filters can be shared over the surface of the sphere, leading to suboptimal efficiency in the learning of the parameters. Despite this limitation, for capturing patterns in spherical volumes, we expect that the ability to express patterns naturally in terms of radial and angular dimensions has advantages over standard 3D convolutions. We test this hypothesis in the following sections.\n\n2.2 Convolutions of features on B^3\n\nThe two representations from figure 1 generalize to the ball B^3 by considering concentric shells at uniformly separated radii. In the case of the cubed-sphere, this means that a vector v ∈ B^3 is mapped to the unique coordinates (r, b, ξ, η), where r = √(v⊤v) is the radius and (b, ξ, η) are the cubed-sphere coordinates at r, and we construct a uniform grid in r, ξ and η. Likewise, in the spherical-polar case, we construct a uniform grid in r, θ and φ. We will refer to these grids as the concentric cubed-sphere grid and the concentric spherical-polar grid, respectively (figure 3). As is the case for their S^2 counterparts, features on these grids can be naturally expressed using tensors.\nWe can apply the conventional 3D convolutions in equation (1) to features on the concentric cubed-sphere and the concentric spherical-polar grids, and denote these as concentric cubed-sphere convolution (CCSconv) and concentric spherical-polar convolution (CSPconv). For fixed r, the convolutions will thus have the same properties as in the S^2 case. 
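Concretely, the coordinates (r, b, ξ, η) of a point in the ball are obtained by adding the radius to the spherical mapping; a minimal sketch handling only the +x patch (b = 1), with illustrative names of our own:

```python
import math

def concentric_cubed_sphere_coords(x, y, z):
    """Map a point in the ball B^3 to (r, b, xi, eta): its radius plus
    the cubed-sphere coordinates of its projection onto the sphere.

    Only the +x patch (b = 1, where tan(xi) = y/x and tan(eta) = z/x)
    is handled in this sketch; the other five patches follow by symmetry.
    """
    r = math.sqrt(x * x + y * y + z * z)   # r = sqrt(v'v)
    if not (x > abs(y) and x > abs(z)):
        raise NotImplementedError("only the +x patch is handled here")
    return r, 1, math.atan(y / x), math.atan(z / x)
```

Binning r uniformly, together with uniform bins in ξ and η, then yields the concentric cubed-sphere grid; the concentric spherical-polar grid is obtained analogously from (r, θ, φ).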
In these concentric variants, the convolutions will not be equivariant to translations in r, which again reduces the potential to share filter parameters.\n\nFigure 3: Three realizations of a grid on the ball. Left: a grid using equiangular spacing in a standard spherical-polar coordinate system (concentric spherical-polar grid). Center: an equiangular cubed-sphere representation, as described in Ronchi et al. (1996) (concentric cubed-sphere grid). Right: a Cartesian grid.\n\nWe propose to address this issue in three ways. First, we can simply apply the convolution over the full range of r with a large number of filters C_{l+1} and hope that the network will automatically allocate different filters at different radii. Secondly, we can make the filters k^i(x − x′, x_r) depend on r, which corresponds to using different (possibly overlapping) filters on each spherical shell (conv-banded-disjoint). Thirdly, we can divide the r-grid into segments and apply the same filter within each segment (conv-banded), potentially with overlapping regions (depending on the stride). The three approaches are illustrated in figure 4.\nIn the experiments below, we will be comparing the performance of our concentric spherical convolution methods to that of a simple 3D convolution in a Cartesian grid (figure 3, right).\n\n(a) conv  (b) conv-banded-disjoint (convbd)  (c) conv-banded (convb)\n\nFigure 4: Three strategies for the radial component of concentric cubed-sphere or concentric spherical convolutions. 
(a) conv: the same convolution filter is applied to all values of r; (b) conv-banded-disjoint (convbd): convolution filters are only applied in the angular directions, using different filters for each block in r; (c) conv-banded (convb): convolutions are applied within radial segments. Note that for visual clarity, we use a stride of 3 in this figure, although we use a stride of 1 in practice.\n\n3 Modelling structural environments in molecules\n\nIn recent decades, substantial progress has been made in the ability to simulate and analyse molecular structures on a computer. Much of this progress can be ascribed to the molecular force fields used to capture the physical interactions between atoms. The basic functional forms of these models were established in the late 1960s, and through gradual refinements they have become a success story of molecular modelling. Despite these positive developments, the accuracy of molecular force fields is known to still be a limiting factor for many biological and pharmaceutical applications, and further improvements are necessary in this area to increase the robustness of methods for e.g. protein prediction and design. There are indications that Machine Learning could provide solutions to such challenges. While, traditionally, most of the attention in the Machine Learning community has been dedicated\n\nFigure 5: Example of the environment surrounding an amino acid in a protein, in this case the phenylalanine at position 30 in protein GB1 (PDB ID: 2GB1). Left: a cartoon representation of GB1, where the helix is red, the sheets are yellow and the coils are grey. The phenylalanine is shown using an atomic representation in green. Right: an atomic representation of GB1, where carbon atoms are green, oxygen atoms are red, nitrogen atoms are blue and hydrogen atoms are grey. 
A sphere centered at the Cα atom of the phenylalanine with a radius of 12 Å is shown in grey.\n\nto predicting structural features from amino acid sequences (e.g. secondary structure, disorder, and contact prediction), there are increasingly many applications taking three-dimensional molecular structure as input (Behler and Parrinello, 2007; Jasrasaria et al., 2016; Schütt et al., 2017; Smith et al., 2017). In particular in the field of quantum chemistry, a number of studies have demonstrated the ability of deep learning techniques to accurately predict energies of molecular systems. Common to many of these methods is a focus on manually engineered features, where the molecular input structure is encoded based on prior domain-specific knowledge, such as specific functional relationships between atoms and their environments (Behler and Parrinello, 2007; Smith et al., 2017). Recently, a few studies have demonstrated the potential of automatically learning such features, by encoding the molecular structural input in a more domain-agnostic manner, for instance considering only pairwise distance matrices (Schütt et al., 2017), space-filling curves (Jasrasaria et al., 2016), or basic structural features (Wallach et al., 2015).\nThe fact that atomic forces are predominantly distance-based suggests that molecular environments are most naturally represented with a radial-based parameterization, which makes it an obvious test case for the convolutions presented in the previous section. If successful, such convolutions could allow us to make inferences directly from the raw molecular structure of a molecule, avoiding the need for manual feature engineering. We will consider the environments that each amino acid experiences within its globular protein structure as images in the 3-ball. Figure 5 shows an example of the environment experienced by an arbitrarily chosen amino acid in the GB1 protein (PDB ID: 2GB1). 
Although distorted by the fish-eye perspective, the local environment (right) displays several key features of the data: we see clear patterns among neighboring atoms, depending on their local structure, and we can imagine the model learning to recognize hydrogen bonds and charge interactions between an amino acid and its surroundings.\nOur representation of the molecular environment includes all atoms within a 12 Å radius of the Cα atom of the amino acid in question. Each atom is represented by three fundamental properties: 1) its position relative to the amino acid in question (i.e., the position in the grid), 2) its mass, and 3) its partial charge, as defined by the amber99sb force field (Hornak et al., 2006). We construct two types of models, which are identical except for their output. The first outputs the propensity for different secondary structure labels at a given position (i.e., helix, extended, coil), while the second outputs the propensity for different amino acid types. Each of these models will be implemented with the Cartesian, the concentric spherical-polar and the concentric cubed-sphere convolutions. Furthermore, for the concentric cubed-sphere convolutions, we compare the three strategies for dealing with the radial component illustrated in figure 4.\n\nTable 1: The architecture of the CNN, where o represents the output size, which is 3 for secondary structure output and 20 for amino acid output. As an example, we use the convolutional filter sizes from the concentric cubed-sphere (CCS) case. 
Similar sizes are used for the other representations.\n\nLayer  Operation          Filter / weight size    Layer output size\n0      Input              -                       6 × 24 × 38 × 38 × 2\n1      CCSconv + ReLU     3 × 5 × 5 × 2 × 16      6 × 22 × 19 × 19 × 16\n1      CCSsumpool         1 × 3 × 3               6 × 22 × 10 × 10 × 16\n2      CCSconv + ReLU     3 × 3 × 3 × 16 × 32     6 × 20 × 10 × 10 × 32\n2      CCSsumpool         3 × 3 × 3               6 × 9 × 5 × 5 × 32\n3      CCSconv + ReLU     3 × 3 × 3 × 32 × 64     6 × 7 × 5 × 5 × 64\n3      CCSsumpool         1 × 3 × 3               6 × 7 × 3 × 3 × 64\n4      CCSconv + ReLU     3 × 3 × 3 × 64 × 128    6 × 5 × 3 × 3 × 128\n4      CCSsumpool         1 × 3 × 3               6 × 5 × 3 × 3 × 128\n5      Dense + ReLU       34 560 × 2 048          2 048\n6      Dense + ReLU       2 048 × 2 048           2 048\n7      Dense + Softmax    2 048 × o               o\n\n3.1 Model architecture\n\nThe input to the network is a grid (concentric cubed-sphere, concentric spherical-polar or Cartesian). Each voxel has two input channels: the mass of the atom that lies in the given bin and the atom's partial charge (or zeros if no atom is found). The resolution of the grids is chosen so that the maximum distance within a bin is 0.5 Å, which ensures that bins are occupied by at most one atom. The radius of the ball is set to 12 Å, since most physical interactions between atoms occur within this distance (Irbäck and Mohanty, 2006). 
This gives us an input tensor of shape (b = 6, r = 24, ξ = 38, η = 38, C_1 = 2) for the concentric cubed-sphere case, (r = 24, θ = 76, φ = 151, C_1 = 2) for the concentric spherical-polar case, and (x = 60, y = 60, z = 60, C_1 = 2) for the Cartesian case.\nWe use a deep model architecture that is loosely inspired by the VGG models (Simonyan and Zisserman, 2015), but employs the convolution operators described above. Our models have four convolutional layers followed by three dense layers, as illustrated in table 1. Each convolutional layer is followed by a rectified linear unit (ReLU) activation function (Hahnloser et al., 2000; Glorot et al., 2011) and a sum pooling operation, which is appropriately wrapped in the case of the concentric cubed-sphere and the concentric spherical-polar grid. We use sum pooling since the input features, mass and partial charge, are both physical quantities that are naturally additive. The total numbers of parameters in the models (with the amino acid output) are 75 313 253 (concentric cubed-sphere), 69 996 645 (concentric spherical-polar), and 61 159 077 (Cartesian). Furthermore, for the concentric cubed-sphere case, we include a comparison of the two alternative strategies for the radial component: the convb and the convbd, which have 75 745 333 and 76 844 661 parameters respectively. Finally, to see the effect of convolutions over a purely dense model, we include a baseline model where the convolutional layers are replaced with dense layers, but otherwise following the same architecture, and roughly the same number of parameters (66 670 613).\n\n3.2 Training\n\nWe minimized the cross-entropy loss using Adam (Kingma and Ba, 2015), regularized by penalizing the loss with the sum of the L2 norms of all weights, using a multiplicative factor of 0.001. All dense layers also used dropout regularization with a probability of 0.5 of keeping a neuron. 
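The training objective just described corresponds to the following per-example loss (a NumPy sketch with names of our own; we read the L2 regularization as the usual squared-norm weight penalty, a detail the text does not spell out):

```python
import numpy as np

def training_loss(logits, label, weights, l2=0.001):
    """Cross-entropy of a softmax output plus an L2 weight penalty with
    multiplicative factor 0.001, as described in the training setup.

    logits: pre-softmax network outputs; label: true class index;
    weights: list of weight arrays to regularize.
    """
    z = logits - np.max(logits)                    # numerically stable softmax
    log_probs = z - np.log(np.sum(np.exp(z)))
    cross_entropy = -log_probs[label]
    # squared-L2 (weight-decay) reading of the regularizer -- an assumption
    penalty = l2 * sum(np.sum(w ** 2) for w in weights)
    return cross_entropy + penalty
```

In training, this quantity is averaged over a mini-batch and minimized with Adam.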
The models were trained on NVIDIA Titan X (Pascal) GPUs, using a batch size of 100 and a learning rate of 0.0001.\nThe models were trained on a data set of high-resolution crystal structures. A large initial (non-homology-reduced) data set was constructed using the PISCES server (Wang and Dunbrack, 2003). For all structures, hydrogen atoms were added using the Reduce program (Word et al., 1999), after which partial charges were assigned using the OpenMM framework (Eastman et al., 2012), using the amber99sb force field (Hornak et al., 2006). During these stages, strict filters were applied to remove structures that 1) were incomplete (missing chains or missing residues compared to the seqres entry), 2) had chain breaks, 3) failed to parse in OpenMM, or 4) led the Reduce program to crash. Finally, the remaining set was resubmitted to the PISCES server, where homology-reduction was done at the 30% level. This left us with 2336 proteins, out of which 1742 were used for training, 10 for validation, and the remainder was set aside for testing. The homology-reduction ensures that any pair of sequences in the data set are at most 30% identical at the amino-acid level, which allows us to safely split the data into non-overlapping sets.\n\n4 Results\n\nWe now discuss results obtained with the secondary structure and amino acid models, respectively. Despite the apparent similarity of the two models, the two tasks have substantially different biological implications: secondary structure is related to the 3D structure locally at a given position in a protein, i.e. whether the protein assumes a helical or a more extended shape. 
In contrast, amino acid propensities describe allowed mutations in a protein, which is related to the fundamental biochemistry of the molecule, and is relevant for understanding genetic disease and for the design of new proteins.\n\n4.1 Learning the DSSP secondary structure function\n\nPredicting the secondary structure of a protein conditioned on knowledge of the three-dimensional structure is not considered a hard problem. We include it here because we are interested in the ability of the neural network to learn the function that is typically used to annotate three-dimensional structures with secondary structure, in our case DSSP (Kabsch and Sander, 1983). Interestingly, the different concentric convolutional models are seen to perform about equally well on this problem (table 2, Q3), marginally outperforming the Cartesian convolution and substantially outperforming the dense baseline model.\nTo get a sense of the absolute performance, we would ideally compare to existing methods on the same problem. However, rediscovering the DSSP function is not a common task in bioinformatics, and not many tools are available that would constitute a meaningful comparison, in particular because secondary structure annotation algorithms use different definitions of secondary structure. We here use the TORUSDBN model (Boomsma et al., 2008, 2014) to provide such a baseline. The model is sequential in the sequence of a protein, and thus captures local structural information only. While the model is originally designed to sample backbone dihedral angles conditioned on an amino acid sequence or secondary structure sequence, it is generative, and can thus be used in reverse to provide the most probable secondary structure or amino acid sequence, using Viterbi decoding. Most importantly, it is trained on DSSP, making it useful as a comparison for this study. 
Included as the last\nrow in table 2, TORUSDBN demonstrates slightly lower performance compared to our convolutional\napproaches, illustrating that most of the secondary structure signal is encoded in the local angular\npreferences. It is encouraging to see that the convolutional networks capture all these local signals,\nbut obtain additional performance through more non-local interactions.\n\n4.1.1 Learning amino acid propensities\n\nCompared to secondary structure, predicting the amino acid propensity is substantially harder\u2014partly\nbecause of the larger sample space, but also because we expect such preferences to be de\ufb01ned by\nmore global interaction patterns. Interestingly, the two concentric convolutions perform about equally\nwell, suggesting that the added regularity of the cubed-sphere representation does not provide a\nsubstantial bene\ufb01t for this case (table 2, Q20). However, both methods substantially outperform the\nstandard 3D convolution, which again outperforms the dense baseline model. We also note that there\nis now a signi\ufb01cant difference between the three radial strategies, with conv-banded-disjoint (bd)\nand conv-banded (b) both performing worse than the simpler case of using a single convolution over\nthe entire r-range. Again, we include TorusDBN as an external reference. The substantially lower\nperformance of this model con\ufb01rms that the amino acid label prediction task depends predominantly\non non-local features not captured by this model. Finally, we include another baseline: the most\nfrequent amino acid observed at this position among homologous (evolutionarily related) proteins. It\nis remarkable that the concentric models (which are trained on a homology-reduced protein set), are\ncapable of learning the structural preferences of amino acids to the same extent as the information\nthat is encoded as genetic variation in the sequence databases. 
This strongly suggests the ability of our models to learn general relationships between structure and sequence.\n\nTable 2: Performance of various models in the prediction of (a) DSSP-style secondary structure and (b) amino acid propensity, both conditioned on the structure. The Q3 score is defined as the percentage of correct predictions for the three possible labels: helix, extended and coil. The Q20 score is defined as the percentage of correct predictions for the 20 possible amino acid labels.\n\nModel       Q3 (secondary structure)   Q20 (amino acid)\nCCSconv     0.933                      0.564\nCCSconvbd   0.931                      0.515\nCCSconvb    0.932                      0.548\nCSPconv     0.932                      0.560\nCartesian   0.922                      0.500\nCCSdense    0.888                      0.348\nPSSM        -                          0.547\nTORUSDBN    0.894                      0.183\n\n4.1.2 Predicting change-of-stability\n\nThe models in the previous section not only predict the most likely amino acid, but also the entire distribution. A natural question is whether the ratio of probabilities of two amino acids according to this distribution is related to the change of stability induced by the corresponding mutation. We briefly explore this question here.\nThe stability of a protein is the difference in free energy ΔG between the folded and unfolded conformation of a protein. The change in stability that occurs as a consequence of a mutation is thus frequently referred to as ΔΔG. These values can be measured experimentally, and several data sets with these values are publicly available. As a simple approximation, we can interpret the sum of negative log-probabilities of each amino acid along the sequence as a free energy of the folded state G_f. To account for the free energy of the unfolded state, G_u, we could consider the negative log-probability that the amino acid in question occurs in the given amino acid sequence (without conditioning on the environment). 
Again, assuming independence between sites in the chain, this could be modelled by simply calculating the log-frequencies of the different amino acids across the data set, and summing over all sites of the specific protein to get the total free energy. Subtracting these two pairs of values for the wild type (W) and mutant (M) would give us a rough estimate of the ΔΔG, which due to our assumption of independence between sites simplifies to just the difference in values at the given site:\n\nΔΔG(W̄, M̄) = (G_f(M_n) − G_u(M_n)) − (G_f(W_n) − G_u(W_n)),    (2)\n\nwhere W̄ and M̄ denote the full wild type and mutant sequence respectively, and W_n and M_n denote the amino acids of wild type and mutant at the site n at which they differ. Given the extensive set of simplifying assumptions in the argument above, we do not use the expression in equation (2) directly but rather use the four log-probabilities (G_f(M_n), G_u(M_n), G_f(W_n), G_u(W_n)) as input to a simple regression model (a single hidden layer neural network with 10 hidden nodes and a ReLU activation function), trained on experimentally observed ΔΔG data. We calculate the performance on several standard experimental data sets on mutation-induced change-of-stability, in each case using 5-fold cross validation, and reporting the correlation between experimentally measured and our calculated ΔΔG. As a baseline, we compare our performance to two of the best-known programs for calculating ΔΔG: Rosetta and FoldX. The former values were taken from a recent publication (Conchúir et al., 2015), while the latter were calculated using the FoldX program (version 4). The comparison shows that even a very simple approach based on our convolutional models produces results that are comparable to the state-of-the-art in the field (table 3). 
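The four per-site terms, and the direct equation (2) estimate they would give, can be sketched as follows (a NumPy illustration; the function and variable names are ours, and in the paper the four values are fed to a small regression network rather than combined directly):

```python
import numpy as np

def ddg_features(p_env, p_bg, wt, mut):
    """Per-site inputs to the change-of-stability regression.

    p_env: model distribution over the 20 amino acids given the
    structural environment at the mutated site; p_bg: background
    amino-acid frequencies across the data set; wt, mut: wild-type
    and mutant amino-acid indices.
    """
    Gf_M, Gu_M = -np.log(p_env[mut]), -np.log(p_bg[mut])
    Gf_W, Gu_W = -np.log(p_env[wt]), -np.log(p_bg[wt])
    ddg = (Gf_M - Gu_M) - (Gf_W - Gu_W)   # direct estimate, equation (2)
    return (Gf_M, Gu_M, Gf_W, Gu_W), ddg
```

Note that when the environment-conditioned distribution coincides with the background distribution, the direct estimate is zero, as expected for a mutation the model considers neutral.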
This is despite the fact that we use a rather crude approximation of the free energy, and that our approach disregards the fact that a mutation at a given site modifies the environment grids of all amino acids within the 12 Å range. Although these initial results should therefore not be considered conclusive, they suggest that models like the ones we propose could play a future role in ∆∆G prediction.

Table 3: Pearson correlation coefficients between experimentally measured and predicted changes of stability for several sets of published stability measurements.

            Rosetta   FoldX   CCSconv   CSPconv   Cartesian
Kellogg      0.65     0.70     0.66      0.64      0.66
Guerois      0.65     0.73     0.66      0.64      0.66
Potapov      0.52     0.59     0.52      0.51      0.52
ProTherm*    0.44     0.53     0.49      0.48      0.49

Apart from the overall levels of performance, the most remarkable feature of table 3 is that it shows equal performance for the Cartesian and concentric cubed-sphere convolutions, despite the fact that the former displayed substantially lower Q20 scores. This peculiar result points to an interesting caveat in the interpretation of the predicted distribution over amino acids for a given environment. At sufficiently high resolution of the structural environment, a perfect model would be able to reliably predict the identity of the wild type amino acid from the specific shape of the hole it left behind. This means that as models improve, the entropy of the predicted amino acid distributions is expected to decrease, with increasingly peaked distributions centered at the wild type. An increased sensitivity towards the exact molecular environment will therefore eventually decrease the model's ability to consider other amino acids at that position, leading to lower ∆∆G performance. The missing ingredient in our approach is the structural rearrangement in the environments that occurs as a consequence of the mutation.
A full treatment of the problem should average the predictions over the available structural variation, and structural resampling is indeed part of both Rosetta and FoldX. For these reasons, it is difficult to draw clear conclusions from the relative differences in performance of the three convolution procedures in table 3. The overall performance of all three, however, indicates that the convolutions might be useful as part of a more comprehensive modelling strategy such as those used in Rosetta and FoldX.

5 Conclusions

Convolutional neural networks are a powerful tool for analyzing spatial data. In this paper, we investigated the possibility of extending the applicability of the technique to data in the 3-ball, presenting two strategies for conducting convolutions in these spherical volumes. We assessed the performance of the two strategies (and variants thereof) on various tasks in molecular modelling, and demonstrated the substantial potential of such concentric convolution approaches to outperform standard 3D convolutions for this type of data.

We expect that further improvements to the concentric convolution approach can be obtained by improving the spherical convolutions themselves. In particular, a convolution operation that is rotationally equivariant would provide greater data efficiency than the approach used here. Very recently, a procedure for conducting convolutions in SO(3) was proposed, which seems to provide an elegant solution to this problem (Cohen et al., 2018).

Finally, we note that while this manuscript was in review, another paper on the application of convolutional neural networks for predicting amino acid preferences conditioned on structural environments was published by Torng and Altman (2017).
Their study is conceptually similar to one of the applications described in this paper, but uses a Cartesian grid and standard 3D convolutions (in addition to other minor differences, such as a one-hot atom-type encoding). While Torng and Altman present a more thorough biological analysis in their paper than we do here, the accuracy they report is considerably lower than what we obtained. Based on the comparisons reported here, we anticipate that models such as theirs could be improved by switching to a concentric representation.

6 Availability

The spherical convolution TensorFlow code and the datasets used in this paper are available at https://github.com/deepfold.

Acknowledgments

This work was supported by the Villum Foundation (W.B., grant number VKR023445).

References

A. Aurisano, A. Radovic, D. Rocco, A. Himmel, M. Messier, E. Niner, G. Pawloski, F. Psihas, A. Sousa, and P. Vahle. A convolutional neural network neutrino event classifier. Journal of Instrumentation, 11(9):P09001, 2016.

J. Behler and M. Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters, 98(14):146401, 2007.

W. Boomsma, K. V. Mardia, C. C. Taylor, J. Ferkinghoff-Borg, A. Krogh, and T. Hamelryck. A generative, probabilistic model of local protein structure. Proceedings of the National Academy of Sciences, 105(26):8932–8937, 2008.

W. Boomsma, P. Tian, J. Frellsen, J. Ferkinghoff-Borg, T. Hamelryck, K. Lindorff-Larsen, and M. Vendruscolo. Equilibrium simulations of proteins using molecular fragment replacement and NMR chemical shifts. Proceedings of the National Academy of Sciences, 111(38):13852–13857, 2014.

T. Cohen and M. Welling. Group equivariant convolutional networks. In M. F. Balcan and K. Q.
Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2990–2999, New York, USA, 2016.

T. S. Cohen, M. Geiger, J. Köhler, and M. Welling. Spherical CNNs. In International Conference on Learning Representations, 2018.

S. Ó. Conchúir, K. A. Barlow, R. A. Pache, N. Ollikainen, K. Kundert, M. J. O'Meara, C. A. Smith, and T. Kortemme. A web resource for standardized benchmark datasets, metrics, and Rosetta protocols for macromolecular modeling and design. PLoS ONE, 10(9):e0130433, 2015.

P. Eastman, M. S. Friedrichs, J. D. Chodera, R. J. Radmer, C. M. Bruns, J. P. Ku, K. A. Beauchamp, T. J. Lane, L.-P. Wang, D. Shukla, T. Tye, M. Houston, T. Stich, C. Klein, M. R. Shirts, and V. S. Pande. OpenMM 4: a reusable, extensible, hardware independent library for high performance molecular simulation. Journal of Chemical Theory and Computation, 9(1):461–469, 2012.

X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In G. Gordon, D. Dunson, and M. Dudík, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 315–323, Fort Lauderdale, FL, USA, 2011.

I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

R. H. R. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405:947–951, 2000.

V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics, 65(3):712–725, 2006.

A. Irbäck and S. Mohanty.
PROFASI: a Monte Carlo simulation package for protein folding and aggregation. Journal of Computational Chemistry, 27(13):1548–1555, 2006.

D. Jasrasaria, E. O. Pyzer-Knapp, D. Rappoport, and A. Aspuru-Guzik. Space-filling curves as a novel crystal structure representation for machine learning models. arXiv, 1608.05747, 2016.

W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22(12):2577–2637, 1983.

D. Kingma and J. Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, San Diego, USA, 2015.

K. V. Mardia and P. E. Jupp. Directional Statistics. Wiley, 2009.

K. Mills, M. Spanner, and I. Tamblyn. Deep learning and the Schrödinger equation. Physical Review A, 96:042113, 2017.

S. Min, B. Lee, and S. Yoon. Deep learning in bioinformatics. Briefings in Bioinformatics, 18(5):851–869, 2017.

C. Ronchi, R. Iacono, and P. Paolucci. The "cubed sphere": A new method for the solution of partial differential equations in spherical geometry. Journal of Computational Physics, 124(1):93–114, 1996.

K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8:13890, 2017.

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, San Diego, USA, 2015.

J. Smith, O. Isayev, and A. Roitberg. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science, 8(4):3192–3203, 2017.

W. Torng and R. B. Altman. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics, 18(1):302, 2017.

I. Wallach, M. Dzamba, and A. Heifets.
AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv, 1510.02855, 2015.

G. Wang and R. L. Dunbrack. PISCES: a protein sequence culling server. Bioinformatics, 19(12):1589–1591, 2003.

S. Wang, J. Peng, J. Ma, and J. Xu. Protein secondary structure prediction using deep convolutional neural fields. Scientific Reports, 6, 2016.

J. M. Word, S. C. Lovell, J. S. Richardson, and D. C. Richardson. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of Molecular Biology, 285(4):1735–1747, 1999.