{"title": "3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data", "book": "Advances in Neural Information Processing Systems", "page_first": 10381, "page_last": 10392, "abstract": "We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.", "full_text": "3D Steerable CNNs: Learning Rotationally\nEquivariant Features in Volumetric Data\n\nMaurice Weiler*\n\nUniversity of Amsterdam\n\nm.weiler@uva.nl\n\nMario Geiger*\n\nEPFL\n\nmario.geiger@epfl.ch\n\nMax Welling\n\nWouter Boomsma\n\nUniversity of Amsterdam, CIFAR,\n\nUniversity of Copenhagen\n\nQualcomm AI Research\n\nm.welling@uva.nl\n\nwb@di.ku.dk\n\nTaco Cohen\n\nQualcomm AI Research\ntaco.cohen@gmail.com\n\nAbstract\n\nWe present a convolutional network that is equivariant to rigid body motions.\nThe model uses scalar-, vector-, and tensor \ufb01elds over 3D Euclidean space to\nrepresent data, and equivariant convolutions to map between such representations.\nThese SE(3)-equivariant convolutions utilize kernels which are parameterized\nas a linear combination of a complete steerable kernel basis, which is derived\nanalytically in this paper. We prove that equivariant convolutions are the most\ngeneral equivariant linear maps between \ufb01elds over R3. 
Our experimental results\ncon\ufb01rm the effectiveness of 3D Steerable CNNs for the problem of amino acid\npropensity prediction and protein structure classi\ufb01cation, both of which have\ninherent SE(3) symmetry.\n\nIntroduction\n\n1\nIncreasingly, machine learning techniques are being applied in the natural sciences. Many problems\nin this domain, such as the analysis of protein structure, exhibit exact or approximate symmetries.\nIt has long been understood that the equations that de\ufb01ne a model or natural law should respect\nthe symmetries of the system under study, and that knowledge of symmetries provides a powerful\nconstraint on the space of admissible models. Indeed, in theoretical physics, this idea is enshrined\nas a fundamental principle, known as Einstein\u2019s principle of general covariance. Machine learning,\nwhich is, like physics, concerned with the induction of predictive models, is no different: our models\nmust respect known symmetries in order to produce physically meaningful results.\nA lot of recent work, reviewed in Sec. 2, has focused on the problem of developing equivariant\nnetworks, which respect some known symmetry. In this paper, we develop the theory of SE(3)-\nequivariant networks. This is far from trivial, because SE(3) is both non-commutative and non-\ncompact. Nevertheless, at run-time, all that is required to make a 3D convolution equivariant using our\nmethod, is to parameterize the convolution kernel as a linear combination of pre-computed steerable\nbasis kernels. Hence, the 3D Steerable CNN incorporates equivariance to symmetry transformations\nwithout deviating far from current engineering best practices.\nThe architectures presented here fall within the framework of Steerable G-CNNs [8, 10, 40, 45],\nwhich represent their input as \ufb01elds over a homogeneous space (R3 in this case), and use steerable\n\n* Equal Contribution. 
MG initiated the project, derived the kernel space constraint, wrote the \ufb01rst network\nimplementation and ran the Shrec17 experiment. MW solved the kernel constraint analytically, designed the\nanti-aliased kernel sampling in discrete space and coded / ran many of the CATH experiments.\n\nSource code is available at https://github.com/mariogeiger/se3cnn.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\f\ufb01lters [15, 37] to map between such representations. In this paper, the convolution kernel is modeled\nas a tensor \ufb01eld satisfying an equivariance constraint, from which steerable \ufb01lters arise automatically.\nWe evaluate the 3D Steerable CNN on two challenging problems: prediction of amino acid preferences\nfrom atomic environments, and classi\ufb01cation of protein structure. We show that a 3D Steerable CNN\nimproves upon state of the art performance on the former task. For the latter task, we introduce a\nnew and challenging dataset, and show that the 3D Steerable CNN consistently outperforms a strong\nCNN baseline over a wide range of trainingset sizes.\n\n2 Related Work\nThere is a rapidly growing body of work on neural networks that are equivariant to some group\nof symmetries [3, 9, 10, 12, 19, 20, 28, 30\u201332, 36, 42, 46]. At a high level, these models can\nbe categorized along two axes: the group of symmetries they are equivariant to, and the type of\ngeometrical features they use [8]. The class of regular G-CNNs represents the input signal in terms of\nscalar \ufb01elds on a group G (e.g. SE(3)) or homogeneous space G/H (e.g. R3 = SE(3)/ SO(3)) and\nmaps between feature spaces of consecutive layers via group convolutions [9, 29]. Regular G-CNNs\ncan be seen as a special case of steerable (or induced) G-CNNs which represent features in terms\nof more general \ufb01elds over a homogeneous space [8, 10, 27, 30, 40]. 
The models described in this\npaper are of the steerable kind, since they use general \ufb01elds over R3. These \ufb01elds typically consist of\nmultiple independently transforming geometrical quantities (vectors, tensors, etc.), and can thus be\nseen as a formalization of the idea of convolutional capsules [18, 34].\nRegular 3D G-CNNs operating on voxelized data via group convolutions were proposed in [43, 44].\nThese architectures were shown to achieve superior data ef\ufb01ciency over conventional 3D CNNs\nin tasks like medical imaging and 3D model recognition. In contrast to 3D Steerable CNNs, both\nnetworks are equivariant to certain discrete rotations only.\nThe most closely related works achieving full SE(3) equivariance are the Tensor Field Network\n(TFN) [40] and the N-Body networks (NBNs) [26]. The main difference between 3D Steerable\nCNNs and both TFN and NBN is that the latter work on irregular point clouds, whereas our model\noperates on regular 3D grids. Point clouds are more general, but regular grids can be processed\nmore ef\ufb01ciently on current hardware. The second difference is that whereas the TFN and NBN use\nClebsch-Gordan coef\ufb01cients to parameterize the network, we simply parameterize the convolution\nkernel as a linear combination of steerable basis \ufb01lters. Clebsch-Gordan coef\ufb01cient tensors have 6\nindices, and depend on various phase and normalization conventions, making them tricky to work\nwith. Our implementation requires only a very minimal change from the conventional 3D CNN.\nSpeci\ufb01cally, we compute conventional 3D convolutions with \ufb01lters that are a linear combination of\npre-computed basis \ufb01lters. Further, in contrast to TFN, we derive this \ufb01lter basis directly from an\nequivariance constraint and can therefore prove its completeness.\nThe two dimensional analog of our work is the SE(2) equivariant harmonic network [45]. 
The\nharmonic network and 3D steerable CNN use features that transform under irreducible representations\nof SO(2) resp. SO(3), and use \ufb01lters related to the circular resp. spherical harmonics.\nSE(3) equivariant models were already investigated in classical computer vision and signal processing.\nIn [33, 38], a spherical tensor algebra was utilized to expand signals in terms of spherical tensor\n\ufb01elds. In contrast to 3D Steerable CNNs, this expansion is \ufb01xed and not learned. Similar approaches\nwere used for detection and crossing preserving enhancement of \ufb01brous structures in volumetric\nbiomedical images [13, 21, 22].\n\n3 Convolutional feature spaces as \ufb01elds\nA convolutional network produces a stack of Kn feature maps fk in each layer n. In 3D, we can\nmodel the feature maps as (well-behaved) functions fk : R3 \u2192 R. Written another way, we have a\nmap f : R3 \u2192 RKn that assigns to each position x a feature vector f (x) that lives in what we call\nthe \ufb01ber RKn at x. In practice f will have compact support, meaning that f (x) = 0 outside of some\ncompact domain \u2126 \u2208 R3. We thus de\ufb01ne the feature space Fn as the vector space of continuous\nmaps from R3 to RKn with compact support.\nIn this paper, we impose additional structure on the \ufb01bers. Speci\ufb01cally, we assume the \ufb01ber consists\nof a number of geometrical quantities, such as scalars, vectors, and tensors, stacked into a single\n\n2\n\n\fKn-dimensional vector. The assignment of such a geometrical quantity to each point in space is\ncalled a \ufb01eld. Thus, the feature spaces consist of a number of \ufb01elds, each of which consists of a\nnumber of channels (dimensions).\nBefore deriving SE(3)-equivariant networks in Sec. 4 we discuss the transformation properties of\n\ufb01elds and the kinds of \ufb01elds we use in 3D Steerable CNNs.\n\n3.1 Fields, Transformations and Disentangling\nWhat makes a geometrical quantity (e.g. 
a vector) anything more than an arbitrary grouping of feature channels? The answer is that under rigid body motions, information flows within the channels of a single geometrical quantity, but not between different quantities. This idea is known as Weyl's principle, and has been proposed as a way of formalizing the notion of disentangling [6, 23].

As an example, consider the three-dimensional vector field over R^3 shown in Figure 1. At each point x ∈ R^3 there is a vector f(x) of dimension K = 3. If the field is translated by t, the vector at x − t would simply move to a new (translated) position x. When the field is rotated, however, two things happen: the vector at r^{-1}x is moved to a new (rotated) position x, and each vector is itself rotated by a 3 × 3 rotation matrix ρ(r). Thus, the rotation operator π(r) for vector fields is defined as [π(r)f](x) := ρ(r)f(r^{-1}x). Notice that in order to rotate this field, we need all three channels: we cannot rotate each channel independently, because ρ introduces a functional dependency between them.

Figure 1: To transform a vector field (L) by a 90° rotation g, first move each arrow to its new position (C), keeping its orientation the same, then rotate the vector itself (R). This is described by the induced representation π = Ind_{SO(3)}^{SE(3)} ρ, where ρ(g) is a 3 × 3 rotation matrix that mixes the three coordinate channels.

For contrast, consider the common situation where in the input space we have an RGB image with K = 3 channels. Then f(x) ∈ R^3, and the rotation can be described using the same formula ρ(r)f(r^{-1}x) if we choose ρ(r) = I_3 to be the 3 × 3 identity matrix for all r. 
Since \u03c1(r) is diagonal for all r, the channels do not get mixed, and so in geometrical terms,\nwe would describe this feature space as consisting of three scalar \ufb01elds, not a 3D vector \ufb01eld. The\nRGB channels each have an independent physical meaning, while the x and y coordinate channels of\na vector do not.\nThe RGB and 3D-vector cases constitute two examples of \ufb01elds, each one determined by a different\nchoice of \u03c1. As one might guess, there is a one-to-one correspondence between the type of \ufb01eld and\nthe type of transformation law (group representation) \u03c1. Hence, we can speak of a \u03c1-\ufb01eld.\nSo far, we have concentrated on the behaviour of a \ufb01eld under rotations and translations separately.\nA 3D rigid body motion g \u2208 SE(3) can always be decomposed into a rotation r \u2208 SO(3) and a\ntranslation t \u2208 R3, written as g = tr. So the transformation law for a \u03c1-\ufb01eld is given by the formula\n(1)\n\n[\u03c0(tr)f ](x) := \u03c1(r)f (r\u22121(x \u2212 t)).\n\nThe map \u03c0 is known as the representation of SE(3) induced by the representation \u03c1 of SO(3), which\nis denoted by \u03c0 = IndSE(3)\n\nSO(3) \u03c1. For more information on induced representations, see [5, 8, 17].\n\nIrreducible SO(3) features\n\n3.2\nWe have seen that there is a correspondence between the type of \ufb01eld and the type of inducing\nrepresentation \u03c1, which describes the rotation behaviour of a single \ufb01ber. To get a better understanding\nof the space of possible \ufb01elds, we will now de\ufb01ne precisely what it means to be a representation of\nSO(3), and explain how any such representation can be constructed from elementary building blocks\ncalled irreducible representations.\nA group representation \u03c1 assigns to each element in the group an invertible n \u00d7 n matrix. Here n is\nthe dimension of the representation, which can be any positive integer (or even in\ufb01nite). 
For ρ to be called a representation of G, it has to satisfy ρ(gg′) = ρ(g)ρ(g′), where gg′ denotes the composition of two transformations g, g′ ∈ G, and ρ(g)ρ(g′) denotes matrix multiplication.

To make this more concrete, and to introduce the concept of an irreducible representation, we consider the classical example of a rank-2 tensor (i.e. a matrix). A 3 × 3 matrix A transforms under rotations as A ↦ R(r) A R(r)^T, where R(r) is the 3 × 3 rotation matrix representation of the abstract group element r ∈ SO(3). This can be written in matrix-vector form using the Kronecker / tensor product: vec(A) ↦ [R(r) ⊗ R(r)] vec(A) ≡ ρ(r) vec(A). This is a 9-dimensional representation of SO(3).

One can easily verify that the symmetric and anti-symmetric parts of A remain symmetric respectively anti-symmetric under rotations. This splits R^{3×3} into 6- and 3-dimensional linear subspaces that transform independently. According to Weyl's principle, these may be considered as distinct quantities, even if this is not immediately visible by looking at the coordinates A_ij. The 6-dimensional space can be broken down further, because scalar matrices A_ij = α δ_ij (which are invariant under rotation) and traceless symmetric matrices also transform independently. Thus a rank-2 tensor decomposes into representations of dimension 1 (trace), 3 (anti-symmetric part), and 5 (traceless symmetric part). In representation-theoretic terms, we have reduced the 9-dimensional representation ρ into irreducible representations of dimension 1, 3 and 5. We can write this as

ρ(r) = Q^{-1} [⊕_{l=0}^{2} D^l(r)] Q,    (2)

where we use ⊕ to denote the construction of a block-diagonal matrix with blocks D^l(r), and Q is a change of basis matrix that extracts the trace, symmetric-traceless and anti-symmetric parts of A.

More generally, it can be shown that any representation of SO(3) can be decomposed into irreducible representations of dimension 2l + 1, for l = 0, 1, 2, . . .. The irreducible representation acting on this (2l + 1)-dimensional space is known as the Wigner-D matrix of order l, denoted D^l(r). Note that the Wigner-D matrix of order 4 is a representation of dimension 9; it has the same dimension as the representation ρ acting on A, but the two are different representations.

Since any SO(3) representation can be decomposed into irreducibles, we only use irreducible features in our networks. This means that the feature vector f(x) in layer n is a stack of F_n features f^i(x) ∈ R^{2l^i_n+1}, so that K_n = Σ_{i=1}^{F_n} (2l^i_n + 1).

4 SE(3)-Equivariant Networks

Our general approach to building SE(3)-equivariant networks will be as follows: First, we will specify for each layer n a linear transformation law π_n(g) : F_n → F_n, which describes how the feature space F_n transforms under transformations of the input by g ∈ SE(3). 
Then, we will study the vector space Hom_SE(3)(F_n, F_{n+1}) of equivariant linear maps (intertwiners) Φ between adjacent feature spaces:

Hom_SE(3)(F_n, F_{n+1}) = {Φ ∈ Hom(F_n, F_{n+1}) | Φ π_n(g) = π_{n+1}(g) Φ, ∀g ∈ SE(3)}    (3)

Here Hom(F_n, F_{n+1}) is the space of linear (not necessarily equivariant) maps from F_n to F_{n+1}. By finding a basis for the space of intertwiners and parameterizing Φ_n as a linear combination of basis maps, we can make sure that layer n + 1 transforms according to π_{n+1} if layer n transforms according to π_n, thus guaranteeing equivariance of the whole network by induction.

As explained in the previous section, fields transform according to induced representations [5, 8, 10, 17]. In this section we show that equivariant maps between induced representations of SE(3) can always be expressed as convolutions with equivariant / steerable filter banks. The space of equivariant filter banks turns out to be a linear subspace of the space of filter banks of a conventional 3D CNN. The filter banks of our network are expanded in terms of a basis of this subspace, with parameters corresponding to expansion coefficients.

Sec. 4.1 derives the linear constraint on the kernel space for arbitrary induced representations. From Sec. 4.2 on we specialize to representations induced from irreducible representations of SO(3) and derive a basis of the equivariant kernel space for this choice analytically. Subsequent sections discuss choices of equivariant nonlinearities and the actual discretized implementation.

4.1 The Subspace of Equivariant Kernels
A continuous linear map between F_n and F_{n+1} can be written using a continuous kernel κ with signature κ : R^3 × R^3 → R^{K_{n+1} × K_n}, as follows:

[κ · f](x) = ∫_{R^3} κ(x, y) f(y) dy    (4)

Lemma 1. The map f ↦ κ · f is equivariant if and only if for all g ∈ SE(3),

κ(gx, gy) = ρ_2(r) κ(x, y) ρ_1(r)^{-1}.    (5)

Proof. For this map to be equivariant, it must satisfy κ · [π_1(g)f] = π_2(g)[κ · f]. Expanding the left hand side of this constraint, using g = tr and the substitution y ↦ gy, we find:

κ · [π_1(g)f](x) = ∫_{R^3} κ(x, gy) ρ_1(r) f(y) dy    (6)

For the right hand side,

π_2(g)[κ · f](x) = ρ_2(r) ∫_{R^3} κ(g^{-1}x, y) f(y) dy.    (7)

Equating these, and using that the equality has to hold for arbitrary f ∈ F_n, we conclude:

ρ_2(r) κ(g^{-1}x, y) = κ(x, gy) ρ_1(r).    (8)

Substitution of x ↦ gx and right-multiplication by ρ_1(r)^{-1} yields the result.

Theorem 2. A linear map from F_n to F_{n+1} is equivariant if and only if it is a cross-correlation with a rotation-steerable kernel.

Proof. Lemma 1 implies that we can write κ in terms of a one-argument kernel, since for the pure translation g = −x:

κ(x, y) = κ(0, y − x) ≡ κ(y − x).    (9)

Substituting this into Equation 4, we find

[κ · f](x) = ∫_{R^3} κ(x, y) f(y) dy = ∫_{R^3} κ(y − x) f(y) dy = [κ ⋆ f](x).    (10)

Cross-correlation is always translation-equivariant, but Eq. 5 still constrains κ rotationally:

κ(rx) = ρ_2(r) κ(x) ρ_1(r)^{-1}.    (11)

A kernel satisfying this constraint is called rotation-steerable.

We note that κ ⋆ f (Eq. 10) is exactly the operation used in a conventional convolutional network, just written in an unconventional form, using a matrix-valued kernel ("propagator") κ : R^3 → R^{K_{n+1} × K_n}. Since Eq.
11 is a linear constraint on the correlation kernel κ, the space of equivariant kernels (i.e. those satisfying Eq. 11) forms a vector space. We will now proceed to compute a basis for this space, so that we can parameterize the kernel as a linear combination of basis kernels.

4.2 Solving for the Equivariant Kernel Basis
As mentioned before, we assume that the K_n-dimensional feature vectors f(x) = ⊕_i f^i(x) consist of irreducible features f^i(x) of dimension 2l^i_n + 1. In other words, the representation ρ_n(r) that acts on fibers in layer n is block-diagonal, with irreducible representation D^{l^i_n}(r) as the i-th block. This implies that the kernel κ : R^3 → R^{K_{n+1} × K_n} splits into blocks^1 κ^{jl} : R^3 → R^{(2j+1) × (2l+1)} mapping between irreducible features. The blocks themselves are by Eq. 11 constrained to transform as

κ^{jl}(rx) = D^j(r) κ^{jl}(x) D^l(r)^{-1}.    (12)

^1 For more details on the block structure see Sec. 2.7 of [10].

Figure 2: Angular part of the basis for the space of steerable kernels κ^{jl} (for j = l = 1, i.e. 3D vector fields as input and output). From left to right we plot three 3 × 3 matrices, for j − l ≤ J ≤ j + l, i.e. J = 0, 1, 2. Each 3 × 3 matrix corresponds to one learnable parameter per radial basis function φ_m. A seasoned eye will see the identity, the curl (∇∧) and the gradient of the divergence (∇∇·).

To bring this constraint into a more manageable form, we vectorize these kernel blocks to vec(κ^{jl}(x)), so that we can rewrite the constraint as a matrix-vector equation^2

vec(κ^{jl}(rx)) = [D^j ⊗ D^l](r) vec(κ^{jl}(x)),    (13)

where we used the orthogonality of D^l. The tensor product of representations is itself a representation, and hence can be decomposed into irreducible representations. 
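The vectorization step behind Eq. 13 is just the Kronecker-product identity referenced in footnote 2. A minimal numpy check, in which random matrices stand in for the Wigner-D blocks (the identity is purely algebraic and holds for arbitrary matrices):

```python
import numpy as np

# Kronecker identity behind Eq. 13 (cf. footnote 2):
#   vec(Dj @ K @ Dl.T) == np.kron(Dj, Dl) @ vec(K),
# where vec is numpy's row-major flatten.  With orthogonal Dl we have
# Dl.T == Dl^{-1}, which turns the constraint of Eq. 12 into Eq. 13.
rng = np.random.default_rng(1)
Dj = rng.standard_normal((3, 3))  # stand-in for D^j(r), j = 1
Dl = rng.standard_normal((3, 3))  # stand-in for D^l(r), l = 1
K = rng.standard_normal((3, 3))   # kernel block kappa^{jl}(x) at one point x

lhs = (Dj @ K @ Dl.T).flatten()
rhs = np.kron(Dj, Dl) @ K.flatten()
assert np.allclose(lhs, rhs)
```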
For irreducible SO(3) representations D^j and D^l of orders j and l it is well known [17] that D^j ⊗ D^l can be decomposed in terms of 2 min(j, l) + 1 irreducible representations of order^3 |j − l| ≤ J ≤ j + l. That is, we can find a change of basis matrix^4 Q of shape (2l + 1)(2j + 1) × (2l + 1)(2j + 1) such that the representation becomes block diagonal:

[D^j ⊗ D^l](r) = Q^T [⊕_{J=|j−l|}^{j+l} D^J(r)] Q    (14)

Thus, we can change the basis to η^{jl}(x) := Q vec(κ^{jl}(x)) such that constraint 12 becomes

η^{jl}(rx) = [⊕_{J=|j−l|}^{j+l} D^J(r)] η^{jl}(x).    (15)

The block diagonal form of the representation in this basis reveals that η^{jl} decomposes into 2 min(j, l) + 1 invariant subspaces of dimension 2J + 1 with separated constraints:

η^{jl}(x) = ⊕_{J=|j−l|}^{j+l} η^{jl,J}(x),    η^{jl,J}(rx) = D^J(r) η^{jl,J}(x)    (16)

This is a famous equation for which the unique and complete solution is well known to be given by the spherical harmonics Y^J(x) = (Y^J_{−J}(x), . . . , Y^J_J(x)) ∈ R^{2J+1}. More specifically, since x lives in R^3 instead of the sphere, the constraint only restricts the angular part of η^{jl} but leaves its radial part free. Therefore, the solutions are given by spherical harmonics modulated by an arbitrary continuous radial function φ : R^+ → R as η^{jl,J}(x) = φ(‖x‖) Y^J(x/‖x‖).

To obtain a complete basis, we can choose a set of radial basis functions φ_m : R^+ → R, and define kernel basis functions η^{jl,Jm}(x) = φ_m(‖x‖) Y^J(x/‖x‖). Following [42], we choose a Gaussian radial shell φ_m(‖x‖) = exp(−(‖x‖ − m)^2 / 2σ^2) in our implementation. 
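For J = 1 the steerability of such a basis kernel can be verified directly: in a basis where D^1(r) is the rotation matrix R itself, the order-1 angular part is Y^1(u) = u, so η(rx) = φ(‖rx‖) rx/‖rx‖ = R η(x). A minimal numpy sketch (the basis choice and parameter values are illustrative, not the paper's code):

```python
import numpy as np

def eta(x, m=1.0, sigma=0.6):
    """Basis kernel eta^{jl,Jm} for J = 1: Gaussian radial shell phi_m
    times the order-1 angular part, taken here as Y^1(u) = u."""
    r = np.linalg.norm(x)
    phi = np.exp(-0.5 * (r - m) ** 2 / sigma ** 2)
    return phi * x / r

def rot_z(a):
    """Rotation about the z-axis; plays the role of D^1(r) in this basis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = rot_z(1.1)
x = np.array([0.3, -0.8, 0.5])
# Steerability constraint of Eq. 16: eta(r x) = D^1(r) eta(x).
assert np.allclose(eta(R @ x), R @ eta(x))
```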
The angular dependency at a fixed radius of the basis for j = l = 1 is shown in Figure 2. By mapping each η^{jl,Jm} back to the original basis via Q^T and unvectorizing, we obtain a basis κ^{jl,Jm} for the space of equivariant kernels between features of order j and l. This basis is indexed by the radial index m and frequency index J. In the forward pass, we linearly combine the basis kernels as κ^{jl} = Σ_{J,m} w_{jl,Jm} κ^{jl,Jm} using learnable weights w, and stack them into a complete kernel κ, which is passed to a standard 3D convolution routine.

4.3 Equivariant Nonlinearities
In order for the whole network to be equivariant, every layer, including the nonlinearities, must be equivariant. In a regular G-CNN, any elementwise nonlinearity will be equivariant because the regular representation acts by permuting the activations. In a steerable G-CNN, however, special equivariant nonlinearities are required.

^2 Vectorization corresponds to flatten and the tensor product to np.kron in numpy.
^3 There is a fascinating analogy with the quantum states of a two-particle system, for which the angular momentum states decompose in a similar fashion.
^4 Q can be expressed in terms of Clebsch-Gordan coefficients, but here we only need to know that it exists.

Trivial irreducible features, corresponding to scalar fields, do not transform under rotations. So for these features we use conventional nonlinearities like ReLUs or sigmoids. For higher order features we considered tensor product nonlinearities [26] and norm nonlinearities [45], but settled on a novel gated nonlinearity. For each non-scalar irreducible feature κ^i_n ⋆ f_{n-1}(x) ∈ R^{2l^i_n+1} in layer n, we produce a scalar gate σ(γ^i_n ⋆ f_{n-1}(x)), where σ denotes the sigmoid function and γ^i_n is another learnable rotation-steerable kernel. 
Then, we multiply the feature (a non-scalar field) by the gate (a scalar field): f^i_n(x) = κ^i_n ⋆ f_{n-1}(x) σ(γ^i_n ⋆ f_{n-1}(x)). Since γ^i_n ⋆ f_{n-1} is a scalar field, σ(γ^i_n ⋆ f_{n-1}) is a scalar field, and multiplying any feature by a scalar is equivariant. See Section 1.3 and Figure 1 in the Supplementary Material for details.

4.4 Discretized Implementation
In a computer implementation of SE(3) equivariant networks, we need to sample both the fields / feature maps and the kernel on a discrete sampling grid in Z^3. Since this could introduce aliasing artifacts, care is required to make sure that high-frequency filters, corresponding to large values of J, are not sampled on a grid of low spatial resolution. This is particularly important for small radii, since near the origin only a small number of pixels is covered per solid angle. In order to prevent aliasing, we hence introduce a radially dependent angular frequency cutoff. Aliasing effects originating from the radial part of the kernel basis are counteracted by choosing a smooth Gaussian radial profile as described above. Below we describe how our implementation works in detail.

4.4.1 Kernel space precomputation
Before training, we compute basis kernels κ^{jl,Jm}(x_i) sampled on an s × s × s cubic grid of points x_i ∈ Z^3, as follows. For each pair of output and input orders j and l we first sample spherical harmonics Y^J, |j − l| ≤ J ≤ j + l, in a radially independent manner in an array of shape (2J + 1) × s × s × s. 
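Returning briefly to the gated nonlinearity of Sec. 4.3: because the gate is a rotation-invariant scalar, gating commutes with the fiber representation ρ(r). A one-fiber numpy sketch makes this concrete (values are illustrative, not the paper's code):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gate(feature, scalar):
    """Gated nonlinearity at one fiber: non-scalar feature times sigmoid(gate)."""
    return sigmoid(scalar) * feature

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Rotating the feature by rho(r) (here l = 1, so rho = R) and then gating
# gives the same result as gating first and rotating afterwards.
R = rot_z(0.4)
v = np.array([1.0, -2.0, 0.5])  # an l = 1 (vector) feature at one point
s = 0.3                         # scalar gate activation at the same point
assert np.allclose(gate(R @ v, s), R @ gate(v, s))
```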
Then, we transform the spherical harmonics back to the original basis by multiplying by Q_J ∈ R^{(2j+1)(2l+1) × (2J+1)}, consisting of 2J + 1 adjacent columns of Q, and unvectorize the resulting array to unvec(Q_J Y^J(x_i)), which has shape (2j + 1) × (2l + 1) × s × s × s.

The matrix Q itself could be expressed in terms of Clebsch-Gordan coefficients [17], but we find it easier to compute it by numerically solving Eq. 14.

The radial dependence is introduced by multiplying the cubes with each windowing function φ_m. We use integer means m = 0, . . . , ⌊s/2⌋ and a fixed width of σ = 0.6 for the radial Gaussian windows.

Sampling high-order spherical harmonics will introduce aliasing effects, particularly near the origin. Hence, we introduce a radius-dependent bandlimit J^m_max, and create basis functions only for |j − l| ≤ J ≤ J^m_max. Each basis kernel is scaled to unit norm for effective signal propagation [42]. In total we get B = Σ_{m=0}^{⌊s/2⌋} Σ_{J=|j−l|}^{J^m_max} 1 ≤ (⌊s/2⌋ + 1)(2 min(j, l) + 1) basis kernels mapping between fields of order j and l, and thus a basis array of shape B × (2j + 1) × (2l + 1) × s × s × s.

4.4.2 Spatial dimension reduction
We found that the performance of the Steerable CNN models depends critically on the way of downsampling the fields. In particular, the standard procedure of downsampling via strided convolutions performed poorly compared to smoothing the feature maps before subsampling. Following [1], we experimented with applying a low pass filter before the downsampling step, which can be implemented either via an additional strided convolution with a Gaussian kernel or via average pooling. We observed significant improvements of the rotational equivariance by doing so. 
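The basis count B of Sec. 4.4.1 can be sketched by enumerating the surviving (m, J) pairs under a radius-dependent bandlimit. The specific cutoff rule J_max(m) = min(j + l, 2m) below is a hypothetical example, since the paper does not fix one:

```python
def num_basis_kernels(s, j, l, J_max):
    """Count steerable basis kernels between fields of orders j and l on an
    s x s x s grid, given a radius-dependent bandlimit J_max(m)."""
    B = 0
    for m in range(s // 2 + 1):                    # radial means m = 0..floor(s/2)
        for J in range(abs(j - l), J_max(m) + 1):  # surviving angular frequencies
            B += 1
    return B

s, j, l = 7, 2, 1
# Without a cutoff (J_max(m) = j + l) the bound of Sec. 4.4.1 is attained:
B_full = num_basis_kernels(s, j, l, lambda m: j + l)
assert B_full == (s // 2 + 1) * (2 * min(j, l) + 1)
# A hypothetical radius-dependent cutoff only removes kernels near the origin:
B_cut = num_basis_kernels(s, j, l, lambda m: min(j + l, 2 * m))
assert B_cut <= B_full
```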
See Table 2 in the Supplementary Material for a comparison between performances with and without low pass filtering.

4.4.3 Forward pass
At training time, we linearly combine the basis kernels using learned weights, and stack them together into a full filter bank of shape K_{n+1} × K_n × s × s × s, which is used in a standard convolution routine. Once the network is trained, we can convert the network to a standard 3D CNN by linearly combining the basis kernels with the learned weights, and storing only the resulting filter bank.

5 Experiments
We performed several experiments to gauge the performance and data efficiency of our model.

5.1 Tetris
In order to confirm the equivariance of our model, we performed a variant of the Tetris experiments reported by [40]. We constructed a 4-layer 3D Steerable CNN and trained it to classify 8 kinds of Tetris blocks, stored as voxel grids, in a fixed orientation. Then we test on Tetris blocks rotated by random rotations in SO(3). As expected, the 3D Steerable CNN generalizes over rotations and achieves 99±2% accuracy on the test set. In contrast, a conventional CNN is not able to generalize over larger unseen rotations and gets a result of only 27±7%. For both networks we repeated the experiment over 17 runs.

5.2 3D model classification
Moving beyond the simple Tetris blocks, we next considered classification of more complex 3D objects. The SHREC17 task [35], which contains 51300 models of 3D shapes belonging to 55 classes (chair, table, light, oven, keyboard, etc.), has a 'perturbed' category where images are arbitrarily rotated, making it a well-suited test case for our model. We converted the input into voxel grids of size 64×64×64, and used an architecture similar to the Tetris case, but with an increased number of layers (see Table 3 in the Supplementary Material). 
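The weight combination of Sec. 4.4.3 amounts to a single tensor contraction of the learned weights with the precomputed basis array, after which an ordinary 3D convolution routine applies. A minimal numpy sketch for one (j, l) block (random stand-in values, shapes per Sec. 4.4.1; not the paper's code):

```python
import numpy as np

# One (j, l) block: B precomputed basis kernels of shape
# (2j+1) x (2l+1) x s x s x s, combined with B learned weights.
B, j, l, s = 12, 1, 1, 5
rng = np.random.default_rng(0)
basis = rng.standard_normal((B, 2 * j + 1, 2 * l + 1, s, s, s))
weights = rng.standard_normal(B)  # learnable expansion coefficients w_{jl,Jm}

# kappa^{jl} = sum_b w_b * kappa^{jl,b}; such blocks are then stacked into
# the full K_{n+1} x K_n x s x s x s filter bank and fed to standard conv3d.
kernel = np.einsum('b,bpqxyz->pqxyz', weights, basis)
assert kernel.shape == (2 * j + 1, 2 * l + 1, s, s, s)
```

Because the combination is linear, it can equally well be folded in after training, which is exactly the conversion to a standard 3D CNN described above.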
Although we have not done extensive fine-tuning on this dataset, we find our model to perform comparably to the current state of the art, see Figure 3 and Table 4 in the Supplementary Material.

Figure 3: Shrec17 results [2, 7, 14, 16, 24, 35, 39]. Comparison of different architectures by number of parameters and score. See Table 4 in the Supplementary Material for all the details. [The plot shows score (0.8–1.1) versus number of parameters (10^5–10^8) for Ours, Esteves, Furuya, Tatsuma, Zhou, Kanezaki, Deng, and others.]

5.3 Visualization of the equivariance property
We made a movie to show the effect of rotating the input on the internal fields. We found that the fields transform remarkably stably. A visualization is provided at https://youtu.be/ENLJACPHSEA.

5.4 Amino acid environments
Next, we considered the task of predicting amino acid preferences from atomic environments, a problem which has been studied by several groups in the last year [4, 41]. Since physical forces are primarily a function of distance, one of the previous studies argued for the use of a concentric grid, investigated strategies for conducting convolutions on such grids, and reported substantial gains when using such convolutions over a standard 3D convolution on a regular grid (0.56 vs 0.50 accuracy) [4]. Since the classification of molecular environments involves the recognition of particular interactions between atoms (e.g. hydrogen bonds), one would expect rotationally equivariant convolutions to be more suitable for the extraction of relevant features. We tested this hypothesis by constructing the exact same network as used in the original study, merely replacing the conventional convolutional layers with equivalent 3D steerable convolutional layers. 
Since the latter use substantially fewer parameters per channel, we chose to use the same number of fields as the number of channels in the original model, which still corresponds to only roughly half the number of parameters (32.6M vs 61.1M for the regular grid and 75.3M for the concentric representation). Without any alterations to the model and using the same training procedure (apart from adjustment of the learning rate and regularization factor), we obtained a test accuracy of 0.58, substantially outperforming the conventional CNN on this task, and also improving over the state of the art on this problem.

5.5 CATH: Protein structure classification
The molecular environments considered in the task above are oriented based on the protein backbone. Similar to standard images, this implies that the inputs have a natural orientation. For the final experiment, we wished to investigate the performance of our 3D Steerable convolutions on a problem domain with full rotational invariance, i.e. where the inputs have no inherent orientation. For this purpose, we consider the task of classifying the overall shape of protein structures.

We constructed a new data set based on the CATH protein structure classification database [11], version 4.2 (see http://cathdb.info/browse/tree). The database is a classification hierarchy containing millions of experimentally determined protein domains at different levels of structural detail. For this experiment, we considered the CATH classification level of "architecture", which splits proteins based on how protein secondary structure elements are organized in three-dimensional space.
Predicting the architecture from the raw protein structure thus poses a particularly challenging task for the model, which is required not only to detect the secondary structure elements at any orientation in the 3D volume, but also to detect how these secondary structures orient themselves relative to one another. We limited ourselves to architectures with at least 500 proteins, which left us with 10 categories. For each of these, we balanced the data set so that all categories are represented by the same number of structures (711), also ensuring that no two proteins within the set have more than 40% sequence identity. See the Supplementary Material for details. The new dataset is available at https://github.com/wouterboomsma/cath_datasets.

We first established a state-of-the-art baseline consisting of a conventional 3D CNN by conducting a range of experiments with various architectures. We converged on a ResNet34-inspired architecture with half as many channels as the original, and global pooling at the end. The final model consists of 15,878,764 parameters. For details on the experiments done to obtain the baseline, see the Supplementary Material.

Following the same ResNet template, we then constructed a 3D Steerable network by replacing each layer with an equivariant version, keeping the number of 3D channels fixed. The channels are allocated such that there is an equal number of fields of order l = 0, 1, 2, 3 in each layer except the last, where we only used scalar fields (l = 0). This network contains only 143,560 parameters, more than a factor of a hundred fewer than the baseline.

We used the first seven of the ten splits for training, the eighth for validation, and the last two for testing. The data set was augmented by randomly rotating the input proteins whenever they were presented to the model during training.
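Such on-the-fly rotation augmentation of voxel grids can be sketched in a few lines; the following is a minimal illustration using `scipy.ndimage.rotate`, where the sampling scheme (three random in-plane rotations, which is not exactly uniform on SO(3)) and the interpolation settings are our assumptions, not the authors' exact procedure:

```python
import numpy as np
from scipy.ndimage import rotate

def random_rotation(grid, rng):
    """Randomly rotate a voxel grid of shape (C, D, H, W) about its center.

    Assumption: we compose random rotations in the three spatial planes;
    linear interpolation (order=1) and zero padding keep the grid shape fixed.
    """
    for axes in ((1, 2), (1, 3), (2, 3)):  # the three spatial planes
        angle = rng.uniform(0.0, 360.0)
        grid = rotate(grid, angle, axes=axes,
                      reshape=False, order=1, mode='constant', cval=0.0)
    return grid
```

Applying a fresh rotation each time a sample is drawn (rather than pre-rotating the dataset) keeps the augmentation unbiased across epochs.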
Note that due to their rotational equivariance, 3D Steerable CNNs benefit only marginally from rotational data augmentation compared to the baseline CNN. We trained the models for 100 epochs using the Adam optimizer [25], with an exponential learning rate decay of 0.94 per epoch starting after an initial burn-in phase of 40 epochs.

Despite having 100 times fewer parameters, a comparison of the accuracy on the test set shows a clear benefit of the 3D Steerable CNN on this dataset (Figure 4, leftmost value). We proceeded with an investigation of the dependency of this performance on the size of the dataset by reducing the size of each training split by increasing powers of two, maintaining the same network architecture but re-optimizing the regularization parameters of the networks. We found that the proposed model outperforms the baseline even when trained on a fraction of the training set size. The results further demonstrate the accuracy improvements to be robust across these reductions (Figure 4).

Figure 4: Accuracy on the CATH test set as a function of increasing reduction in training set size (x-axis: reduction factors 2^0 to 2^4; y-axis: test accuracy, 0.50 to 0.65; curves: 3D Steerable CNN vs. 3D CNN).

6 Conclusion
In this paper we have presented 3D Steerable CNNs, a class of SE(3)-equivariant networks which represents data in terms of various kinds of fields over R^3. We have presented a comprehensive theory of 3D Steerable CNNs, and have proven that convolutions with SO(3)-steerable filters provide the most general way of mapping between fields in an equivariant manner, thus establishing SE(3)-equivariant networks as a universal class of architectures.
3D Steerable CNNs require only a minor adaptation to the code of a 3D CNN, and can be converted to a conventional 3D CNN after training. Our results show that 3D Steerable CNNs are indeed equivariant, and that they show excellent accuracy and data efficiency in amino acid propensity prediction and protein structure classification.

References
[1] Aharon Azulay and Yair Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177, 2018.
[2] Song Bai, Xiang Bai, Zhichao Zhou, Zhaoxiang Zhang, and Longin Jan Latecki. GIFT: A real-time and scalable 3D shape search engine. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[3] Erik J Bekkers, Maxime W Lafarge, Mitko Veta, Koen AJ Eppenhof, and Josien PW Pluim. Roto-translation covariant convolutional networks for medical image analysis. arXiv preprint arXiv:1804.03393, 2018.
[4] Wouter Boomsma and Jes Frellsen. Spherical convolutions and their application in molecular modelling. In Advances in Neural Information Processing Systems 30, pages 3436–3446, 2017.
[5] Tullio Ceccherini-Silberstein, A Machì, Fabio Scarabotti, and Filippo Tolli. Induced representations and Mackey theory. Journal of Mathematical Sciences, 156(1):11–28, 2009.
[6] Taco Cohen and Max Welling. Learning the irreducible representations of commutative Lie groups. In Proceedings of the 31st International Conference on Machine Learning (ICML), volume 31, pages 1755–1763, 2014.
[7] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In International Conference on Learning Representations (ICLR), 2018.
[8] Taco S Cohen, Mario Geiger, and Maurice Weiler. Intertwiners between induced representations (with applications to the theory of equivariant neural networks).
arXiv preprint arXiv:1803.10743, 2018.
[9] Taco S Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of The 33rd International Conference on Machine Learning (ICML), volume 48, pages 2990–2999, 2016.
[10] Taco S Cohen and Max Welling. Steerable CNNs. In International Conference on Learning Representations (ICLR), 2017.
[11] Natalie L Dawson, Tony E Lewis, Sayoni Das, Jonathan G Lees, David Lee, Paul Ashford, Christine A Orengo, and Ian Sillitoe. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Research, 45(D1):D289–D295, 2016.
[12] Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. In International Conference on Machine Learning (ICML), 2016.
[13] Remco Duits and Erik Franken. Left-invariant diffusions on the space of positions and orientations and their application to crossing-preserving smoothing of HARDI images. International Journal of Computer Vision, 92(3):231–264, 2011.
[14] Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. 3D object classification and retrieval with Spherical CNNs. arXiv preprint arXiv:1711.06721, 2017.
[15] William T. Freeman and Edward H Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis & Machine Intelligence, (9):891–906, 1991.
[16] Takahiko Furuya and Ryutarou Ohbuchi. Deep aggregation of local 3D geometric features for 3D model retrieval. In Proceedings of the British Machine Vision Conference (BMVC), pages 121.1–121.12, September 2016.
[17] David Gurarie. Symmetries and Laplacians: Introduction to Harmonic Analysis, Group Representations and Applications. 1992.
[18] Geoffrey Hinton, Nicholas Frosst, and Sara Sabour. Matrix capsules with EM routing.
In International Conference on Learning Representations (ICLR), 2018.
[19] Emiel Hoogeboom, Jorn W T Peters, Taco S Cohen, and Max Welling. HexaConv. In International Conference on Learning Representations (ICLR), 2018.
[20] Truong Son Hy, Shubhendu Trivedi, Horace Pan, Brandon M. Anderson, and Risi Kondor. Predicting molecular properties with covariant compositional networks. The Journal of Chemical Physics, 148(24):241745, 2018.
[21] Michiel HJ Janssen, Tom CJ Dela Haije, Frank C Martin, Erik J Bekkers, and Remco Duits. The Hessian of axially symmetric functions on SE(3) and application in 3D image analysis. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 643–655. Springer, 2017.
[22] Michiel HJ Janssen, Augustus JEM Janssen, Erik J Bekkers, Javier Oliván Bescós, and Remco Duits. Design and processing of invertible orientation scores of 3D images. Journal of Mathematical Imaging and Vision, pages 1–32, 2018.
[23] Kenichi Kanatani. Group-Theoretical Methods in Image Understanding. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1990.
[24] Asako Kanezaki, Yasuyuki Matsushita, and Yoshifumi Nishida. RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints, 2018.
[25] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[26] Risi Kondor. N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
[27] Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network. In Neural Information Processing Systems (NIPS), 2018.
[28] Risi Kondor, Hy Truong Son, Horace Pan, Brandon Anderson, and Shubhendu Trivedi.
Covariant compositional networks for learning graphs. In International Conference on Learning Representations (ICLR), 2018.
[29] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690, 2018.
[30] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. Rotation equivariant vector field networks. In International Conference on Computer Vision (ICCV), 2017.
[31] Chris Olah. Groups and group convolutions. https://colah.github.io/posts/2014-12-Groups-Convolution/, 2014.
[32] Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Equivariance through parameter-sharing. arXiv preprint arXiv:1702.08389, 2017.
[33] Marco Reisert and Hans Burkhardt. Efficient tensor voting with 3D tensorial harmonics. In Computer Vision and Pattern Recognition Workshops (CVPRW'08), pages 1–7. IEEE, 2008.
[34] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems 30, pages 3856–3866, 2017.
[35] Manolis Savva, Fisher Yu, Hao Su, Asako Kanezaki, Takahiko Furuya, Ryutarou Ohbuchi, Zhichao Zhou, Rui Yu, Song Bai, Xiang Bai, Masaki Aono, Atsushi Tatsuma, S. Thermos, A. Axenopoulos, G. Th. Papadopoulos, P. Daras, Xiao Deng, Zhouhui Lian, Bo Li, Henry Johan, Yijuan Lu, and Sanjeev Mk. Large-Scale 3D Shape Retrieval from ShapeNet Core55. In Ioannis Pratikakis, Florent Dupont, and Maks Ovsjanikov, editors, Eurographics Workshop on 3D Object Retrieval. The Eurographics Association, 2017.
[36] Laurent Sifre and Stephane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[37] Eero P Simoncelli and William T Freeman.
The steerable pyramid: A flexible architecture for multi-scale derivative computation. In Proceedings of the International Conference on Image Processing, volume 3, pages 444–447. IEEE, 1995.
[38] Henrik Skibbe. Spherical Tensor Algebra for Biomedical Image Analysis. PhD thesis, 2013.
[39] Atsushi Tatsuma and Masaki Aono. Multi-Fourier spectra descriptor and augmentation with spectral clustering for 3D shape retrieval. The Visual Computer, 25(8):785–804, August 2009.
[40] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor Field Networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018.
[41] Wen Torng and Russ B Altman. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinformatics, 18(1):302, June 2017.
[42] Maurice Weiler, Fred A Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Computer Vision and Pattern Recognition (CVPR), 2018.
[43] Marysia Winkels and Taco S Cohen. 3D G-CNNs for pulmonary nodule detection. arXiv preprint arXiv:1804.04656, 2018.
[44] Daniel Worrall and Gabriel Brostow. CubeNet: Equivariance to 3D rotation and translation. arXiv preprint arXiv:1804.04458, 2018.
[45] Daniel E Worrall, Stephan J Garbin, Daniyar Turmukhambetov, and Gabriel J Brostow. Harmonic networks: Deep translation and rotation equivariance. In Computer Vision and Pattern Recognition (CVPR), 2017.
[46] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets.
In Advances in Neural Information Processing Systems, pages 3391–3401, 2017.