General E(2)-Equivariant Steerable CNNs

Advances in Neural Information Processing Systems, pp. 14334–14345

Maurice Weiler* (University of Amsterdam, QUVA Lab, m.weiler@uva.nl)
Gabriele Cesa*† (University of Amsterdam, cesa.gabriele@gmail.com)

Abstract

The big empirical success of group equivariant networks has led in recent years to the sprouting of a great variety of equivariant network architectures. A particular focus has thereby been on rotation and reflection equivariant CNNs for planar images. Here we give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs.
The theory of Steerable CNNs thereby yields constraints on the convolution kernels which depend on group representations describing the transformation laws of feature spaces. We show that these constraints for arbitrary group representations can be reduced to constraints under irreducible representations. A general solution of the kernel space constraint is given for arbitrary representations of the Euclidean group E(2) and its subgroups. We implement a wide range of previously proposed and entirely new equivariant network architectures and extensively compare their performances. E(2)-steerable convolutions are further shown to yield remarkable gains on CIFAR-10, CIFAR-100 and STL-10 when used as drop-in replacement for non-equivariant convolutions.

1 Introduction

The equivariance of neural networks under symmetry group actions has in recent years proven to be a fruitful prior in network design. By guaranteeing a desired transformation behavior of convolutional features under transformations of the network input, equivariant networks achieve improved generalization capabilities and sample complexities compared to their non-equivariant counterparts. Due to their great practical relevance, a big pool of rotation- and reflection-equivariant models for planar images has been proposed by now. Unfortunately, an empirical survey, reproducing and comparing all these different approaches, is still missing.
An important step in this direction is given by the theory of Steerable CNNs [1, 2, 3, 4, 5], which defines a very general notion of equivariant convolutions on homogeneous spaces. In particular, steerable CNNs describe E(2)-equivariant (i.e. rotation- and reflection-equivariant) convolutions on the image plane R². The feature spaces of steerable CNNs are thereby defined as spaces of feature fields, characterized by a group representation which determines their transformation behavior under transformations of the input.
In order to preserve the specified transformation law of feature spaces, the convolutional kernels are subject to a linear constraint, depending on the corresponding group representations. While this constraint has been solved for specific groups and representations [1, 2], no general solution strategy has been proposed so far. In this work we give a general strategy which reduces the solution of the kernel space constraint under arbitrary representations to much simpler constraints under single, irreducible representations.
Specifically for the Euclidean group E(2) and its subgroups, we give a general solution of this kernel space constraint. As a result, we are able to implement a wide range of equivariant models, covering regular GCNNs [6, 7, 8, 9, 10, 11], classical Steerable CNNs [1], Harmonic Networks [12], gated Harmonic Networks [2], Vector Field Networks [13], Scattering Transforms [14, 15, 16, 17, 18] and entirely new architectures, in one unified framework. In addition, we are able to build hybrid models, mixing different field types (representations) of these networks both over layers and within layers.

* Equal contribution, author ordering determined by random number generator.
† This research has been conducted during an internship at QUVA lab, University of Amsterdam.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

We further propose a group restriction operation, allowing for network architectures which are decreasingly equivariant with depth. This is useful e.g. for natural images, which show low level features like edges in arbitrary orientations but carry a sense of preferred orientation globally.
An adaptive level of equivariance accounts for the resulting loss of symmetry in the hierarchy of features.
Since the theory of steerable CNNs does not give a preference for any choice of group representation or equivariant nonlinearity, we run an extensive benchmark study, comparing different equivariance groups, representations and nonlinearities. We do so on MNIST 12k, on rotated MNIST SO(2) and on reflected and rotated MNIST O(2) to investigate the influence of the presence or absence of certain symmetries in the dataset. A drop-in replacement of our equivariant convolutional layers is shown to yield significant gains over non-equivariant baselines on CIFAR10, CIFAR100 and STL-10.
Beyond the applications presented in this paper, our contributions are of relevance for general steerable CNNs on homogeneous spaces [3, 4] and gauge equivariant CNNs on manifolds [5], since these models obey the same kind of kernel constraints. More specifically, 2-dimensional manifolds, endowed with an orthogonal structure group O(2) (or subgroups thereof), necessitate exactly the kernel constraints solved in this paper. Our results can therefore readily be transferred to e.g. spherical CNNs [19, 5, 20, 21, 22, 23] or more general models of geometric deep learning [24, 25, 26, 27].

2 General E(2)-Equivariant Steerable CNNs

Convolutional neural networks process images by extracting a hierarchy of feature maps from a given input signal. The convolutional weight sharing ensures the inference to be translation-equivariant, which means that a translated input signal results in a corresponding translation of the feature maps. However, vanilla CNNs leave the transformation behavior of feature maps under more general transformations, e.g. rotations and reflections, undefined. In this work we devise a general framework for convolutional networks which are equivariant under the Euclidean group E(2), that is, under isometries of the plane R².
We work in the framework of steerable CNNs [1, 2, 3, 4, 5], which provides a quite general theory for equivariant CNNs on homogeneous spaces, including Euclidean spaces R^d as a specific instance. Sections 2.2 and 2.3 briefly review the theory of Euclidean steerable CNNs as described in [2]. The following subsections explain our main contributions: a decomposition of the kernel space constraint into irreducible subspaces (2.4), their solution for E(2) and subgroups (2.5), an overview of the group representations used to steer features, their admissible nonlinearities and their use in related work (2.6), the group restriction operation (2.7) and implementation details (2.8).

2.1 Isometries of the Euclidean plane R²

The Euclidean group E(2) is the group of isometries of the plane R², consisting of translations, rotations and reflections. Characteristic patterns in images often occur at arbitrary positions and in arbitrary orientations. The Euclidean group therefore models an important factor of variation of image features. This is especially true for images without a preferred global orientation, like satellite imagery or biomedical images, but often also applies to low level features of globally oriented images.
One can view the Euclidean group as being constructed from the translation group (R², +) and the orthogonal group O(2) = {O ∈ R^{2×2} | OᵀO = id_{2×2}} via the semidirect product operation as E(2) ≅ (R², +) ⋊ O(2). The orthogonal group thereby contains all operations leaving the origin invariant, i.e. continuous rotations and reflections. In order to allow for different levels of equivariance and to cover a wide spectrum of related work, we consider subgroups of the Euclidean group of the form (R², +) ⋊ G, defined by subgroups G ≤ O(2).
Specifically, G could be either the special orthogonal group SO(2), the group ({±1}, ∗) of reflections along a given axis, the cyclic groups C_N, the dihedral groups D_N or the orthogonal group O(2) itself. While SO(2) describes continuous rotations (without reflections), C_N and D_N contain N discrete rotations by angles that are multiples of 2π/N and, in the case of D_N, reflections. C_N and D_N are therefore discrete subgroups of order N and 2N, respectively. For an overview of the groups and their interrelations see Table 6 in the Appendix. Since the groups (R², +) ⋊ G are semidirect products, one can uniquely decompose any of their elements into a product tg where t ∈ (R², +) and g ∈ G [3], which we will do in the rest of the paper.

2.2 E(2)-steerable feature fields

Steerable CNNs define feature spaces as spaces of steerable feature fields f : R² → R^c which associate a c-dimensional feature vector f(x) ∈ R^c to each point x of a base space, in our case the plane R². In contrast to vanilla CNNs, the feature fields of steerable CNNs are associated with a transformation law which specifies their transformation under actions of E(2) (or subgroups) and therefore endows features with a notion of orientation. Formally, a feature vector f(x) encodes the coefficients of a coordinate independent geometric feature relative to a choice of reference frame or, equivalently, image orientation (see Appendix A). An important example are scalar feature fields s : R² → R, describing for instance gray-scale images or temperature fields.
The Euclidean group acts on scalar fields by moving each pixel to a new position, that is, s(x) ↦ s((tg)⁻¹ x) = s(g⁻¹(x − t)) for some tg ∈ (R², +) ⋊ G; see Figure 1, left. Vector fields v : R² → R², like optical flow or gradient images, on the other hand transform as v(x) ↦ g · v(g⁻¹(x − t)). In contrast to the case of scalar fields, each vector is therefore not only moved to a new position but additionally changes its orientation via the action of g ∈ G; see Figure 1, right.

[Figure 1: Transformation behavior of ρ-fields. Left: scalar field, ρ(g) = 1. Right: vector field, ρ(g) = g.]

The transformation law of a general feature field f : R² → R^c is fully characterized by its type ρ. Here ρ : G → GL(R^c) is a group representation, specifying how the c channels of each feature vector f(x) mix under transformations. A representation satisfies ρ(gg̃) = ρ(g)ρ(g̃) and therefore models the group multiplication gg̃ as multiplication of c × c matrices ρ(g) and ρ(g̃). More specifically, a ρ-field transforms under the induced representation¹ ² of (R², +) ⋊ G as

    f(x) ↦ ( [Ind_G^{(R²,+)⋊G} ρ](tg) · f )(x) := ρ(g) · f(g⁻¹(x − t)) .    (1)

As in the examples above, it transforms feature fields by moving the feature vectors from g⁻¹(x − t) to a new position x and acting on them via ρ(g). We thus find scalar fields to correspond to the trivial representation ρ(g) = 1 ∀g ∈ G, which reflects that the scalar values do not change when being moved.
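These transformation laws are easy to check numerically on sampled fields. The following is a minimal NumPy sketch of ours (an illustration, not code from the paper or its e2cnn library): it applies the induced representation of Eq. (1) to a scalar and a vector field and verifies that the action respects the composition law of (R², +) ⋊ SO(2).

```python
import numpy as np

def rot(theta):
    """Rotation matrix g in SO(2) <= O(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def induced_action(rho_g, g, t, f):
    """[Ind rho](tg) acting on a field f: R^2 -> R^c as in Eq. (1):
    the feature at g^{-1}(x - t) is moved to x and acted on by rho(g)."""
    g_inv = np.linalg.inv(g)
    return lambda x: rho_g @ f(g_inv @ (x - t))

# a scalar field (trivial type, rho(g) = 1): values are moved but unchanged
s = lambda x: np.array([np.sin(x[0]) * x[1]])
# a vector field (standard type, rho(g) = g): values are moved AND rotated
v = lambda x: np.array([x[1], -x[0]])

g, t = rot(np.pi / 3), np.array([1.0, 2.0])
x = np.array([0.5, -0.3])
s_out = induced_action(np.eye(1), g, t, s)(x)   # transformed scalar field at x
v_out = induced_action(g, g, t, v)(x)           # transformed vector field at x
```

Composing the action of (t₁, g₁) and (t₂, g₂) agrees with acting once by their product (t₂ + g₂t₁, g₂g₁), i.e. the induced representation is indeed a representation.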
Similarly, a vector field corresponds to the standard representation ρ(g) = g of G.
In analogy to the feature spaces of vanilla CNNs comprising multiple channels, the feature spaces of steerable CNNs consist of multiple feature fields f_i : R² → R^{c_i}, each of which is associated with its own type ρ_i : G → GL(R^{c_i}). A stack f = ⊕_i f_i of feature fields is then defined to be concatenated from the individual feature fields and transforms under the direct sum ρ = ⊕_i ρ_i of the individual representations. A common example for a stack of feature fields are RGB images f : R² → R³. Since the color channels transform independently under rotations, we identify them as three independent scalar fields. The stacked field representation is thus given by the direct sum ⊕_{i=1}^3 1 = id_{3×3} of three trivial representations. While the input and output types of steerable CNNs are given by the learning task, the user needs to specify the types ρ_i of intermediate feature fields as hyperparameters, similar to the choice of channels for vanilla CNNs. We discuss different choices of representations in Section 2.6 and investigate them empirically in Section 3.1.

2.3 E(2)-steerable convolutions

In order to preserve the transformation law of steerable feature spaces, each network layer is required to be equivariant under the group actions.
As proven for Euclidean groups in [2], the most general equivariant linear map between steerable feature spaces, transforming under ρ_in and ρ_out, is given by convolutions with G-steerable kernels³ k : R² → R^{c_out × c_in}, satisfying a kernel constraint

    k(gx) = ρ_out(g) k(x) ρ_in(g⁻¹)    ∀g ∈ G, x ∈ R² .    (2)

Intuitively, this constraint determines the form of the kernel in transformed coordinates gx in terms of the kernel in non-transformed coordinates x and thus its response to transformed input fields. It ensures that the output feature fields transform as specified by Ind ρ_out when the input fields are being transformed by Ind ρ_in; see Appendix G.1 for a proof.

¹ Induced representations are the most general transformation laws compatible with convolutions [3, 4].
² Note that this simple form of the induced representation is a special case for semidirect product groups.
³ As k : R² → R^{c_out × c_in} returns a matrix of shape (c_out, c_in) for each position x ∈ R², its discretized version can be represented by a tensor of shape (c_out, c_in, X, Y) as usually done in deep learning frameworks.

Since the kernel constraint is linear, its solutions form a linear subspace of the vector space of unconstrained kernels considered in conventional CNNs. It is thus sufficient to solve for a basis of the G-steerable kernel space in terms of which the equivariant convolutions can be parameterized. The lower dimensionality of the restricted kernel space enhances the parameter efficiency of steerable CNNs over conventional CNNs, similarly to the increased parameter efficiency of CNNs over MLPs.

2.4 Irrep decomposition of the kernel constraint

The kernel constraint (2) in principle needs to be solved individually for each pair of input and output types ρ_in and ρ_out to be used in the network.
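Before decomposing the constraint, it is instructive to verify (2) numerically for a simple case. The kernel k(x) = x · exp(−|x|²), mapping a scalar input field (trivial ρ_in) to a vector output field (standard ρ_out), satisfies k(gx) = g k(x) for rotations, while a generic unconstrained kernel does not. This is a hypothetical example of ours, not one from the paper:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def k(x):
    """2x1 kernel mapping a scalar input field (rho_in trivial) to a vector
    output field (rho_out standard): k(x) = x * exp(-|x|^2). Its angular
    part is the frequency-1 harmonic (cos(phi), sin(phi))^T."""
    x = np.asarray(x, dtype=float)
    return (x * np.exp(-x @ x)).reshape(2, 1)

def constraint_residual(kernel, theta, x):
    """Max-abs violation of Eq. (2), k(gx) = rho_out(g) k(x) rho_in(g^-1),
    for g = rot(theta) with rho_out(g) = g and rho_in(g) = 1."""
    g = rot(theta)
    return np.abs(kernel(g @ x) - g @ kernel(x)).max()
```

The residual vanishes for the steerable kernel at every rotation angle and position, whereas e.g. kernel entries like x₁² break the constraint.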
Here we show how the solution of the kernel constraint for arbitrary representations can be reduced to much simpler constraints under irreducible representations (irreps). Our approach relies on the fact that any representation of a finite or compact group decomposes under a change of basis into a direct sum of irreps, each corresponding to an invariant subspace of the representation space R^c on which ρ acts. Denoting the change of basis by Q, this means that one can always write

    ρ = Q⁻¹ [ ⊕_{i∈I} ψ_i ] Q ,

where the ψ_i are irreducible representations of G and the index set I encodes the types and multiplicities of irreps present in ρ. A decomposition can be found by exploiting basic results of character theory and linear algebra [28]. The decomposition of ρ_in and ρ_out in the kernel constraint (2) leads to

    k(gx) = Q_out⁻¹ [ ⊕_{i∈I_out} ψ_i(g) ] Q_out k(x) Q_in⁻¹ [ ⊕_{j∈I_in} ψ_j⁻¹(g) ] Q_in    ∀g ∈ G, x ∈ R² ,

which, defining a kernel relative to the irrep bases as κ := Q_out k Q_in⁻¹, implies

    κ(gx) = [ ⊕_{i∈I_out} ψ_i(g) ] κ(x) [ ⊕_{j∈I_in} ψ_j⁻¹(g) ]    ∀g ∈ G, x ∈ R² .

The left and right multiplication with a direct sum of irreps reveals that the constraint decomposes into independent constraints

    κ_ij(gx) = ψ_i(g) κ_ij(x) ψ_j⁻¹(g)    ∀g ∈ G, x ∈ R² ,  where i ∈ I_out, j ∈ I_in ,    (3)

on blocks κ_ij in κ corresponding to invariant subspaces of the full space of equivariant kernels; see Appendix H for a visualization.
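The decomposition underlying this reduction can be made concrete for the regular representation of C₄, which a real change of basis Q assembled from the discrete Fourier basis brings into the block-diagonal irrep form ψ₀ ⊕ ψ₂ ⊕ ψ₁. This is a sketch of ours, not the paper's implementation:

```python
import numpy as np

N = 4
# regular representation of C_4: the generator r cyclically permutes the axes
rho_r = np.roll(np.eye(N), 1, axis=0)      # maps e_k to e_{k+1 mod 4}

def R(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# change of basis Q assembled from the real discrete Fourier basis; its rows
# span the invariant subspaces of the trivial irrep (frequency 0), the sign
# irrep (frequency 2) and one 2-dimensional frequency-1 irrep
k = np.arange(N)
Q = np.stack([
    np.ones(N) / 2,                       # trivial:  psi_0(r) =  1
    (-1.0) ** k / 2,                      # sign:     psi_2(r) = -1
    np.cos(np.pi * k / 2) / np.sqrt(2),   # frequency 1, cosine component
    np.sin(np.pi * k / 2) / np.sqrt(2),   # frequency 1, sine component
])

# expected block-diagonal form psi_0 (+) psi_2 (+) psi_1 of the generator
block = np.zeros((N, N))
block[0, 0], block[1, 1] = 1.0, -1.0
block[2:, 2:] = R(np.pi / 2)

decomposed = Q @ rho_r @ Q.T              # Q is orthogonal, so Q^{-1} = Q^T
```

The same block structure persists for every group element, so constraints on ρ-kernels indeed split into independent constraints per irrep block.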
In order to solve for a basis of equivariant kernels satisfying the original constraint (2), it is therefore sufficient to solve the irrep constraints (3) to obtain bases for each block, revert the change of basis and take the union over different blocks. Specifically, given d_ij-dimensional bases {κ_ij^1, ···, κ_ij^{d_ij}} for the blocks κ_ij of κ, we get a d = Σ_ij d_ij-dimensional basis

    {k_1, ···, k_d} := ∪_{i∈I_out} ∪_{j∈I_in} { Q_out⁻¹ κ̂_ij^1 Q_in, ···, Q_out⁻¹ κ̂_ij^{d_ij} Q_in }    (4)

of solutions of (2). Here κ̂_ij denotes a block κ_ij being filled in at the corresponding location of a matrix of the shape of κ, with all other blocks being set to zero; see Appendix H. The completeness of the basis found this way is guaranteed by construction if the bases for each block ij are complete. Note that while this approach shares some basic ideas with the solution strategy proposed in [2], it is computationally more efficient for large representations; see Appendix J. We want to emphasize that this strategy for reducing the kernel constraint to irreducible representations is not restricted to subgroups of O(2) but applies to steerable CNNs in general.

2.5 General solution of the kernel constraint for O(2) and subgroups

In order to build isometry-equivariant CNNs on R² we need to solve the irrep constraints (3) for the specific case of G being O(2) or one of its subgroups. For this purpose note that the action of G on R² is norm-preserving, that is, |g.x| = |x| ∀g ∈ G, x ∈ R². The constraints (2) and (3) therefore only restrict the angular parts of the kernels but leave their radial parts free.
Since furthermore all irreps of G correspond to one unique angular frequency (see Appendix I.2), it is convenient to expand the kernel w.l.o.g. in terms of an (angular) Fourier series

    (κ_ij)_{αβ}(x(r, φ)) = A_{αβ,0}(r) + Σ_{μ=1}^∞ [ A_{αβ,μ}(r) cos(μφ) + B_{αβ,μ}(r) sin(μφ) ]    (5)

with real-valued, radially dependent coefficients A_{αβ,μ} : R⁺ → R and B_{αβ,μ} : R⁺ → R for each matrix entry (κ_ij)_{αβ} of block κ_ij. By inserting this expansion into the irrep constraints (3) and projecting on individual harmonics we obtain constraints on the Fourier coefficients, forcing most of them to be zero. The vector spaces of G-steerable kernel blocks κ_ij satisfying the irrep constraints (3) are then parameterized in terms of the remaining Fourier coefficients. The completeness of this basis follows immediately from the completeness of the Fourier basis. Similar approaches have been followed in simpler settings for the cases of C_N in [7], SO(2) in [12] and SO(3) in [2].
The resulting bases for the angular parts of kernels for each pair of irreducible representations of O(2) are shown in Table 1. It turns out that each basis element is harmonic and associated to one unique angular frequency. Appendix I gives an explicit derivation and the resulting bases for all possible pairs of irreps for all groups G ≤ O(2), following the strategy presented in this section. The analytical solutions for SO(2), ({±1}, ∗), C_N and D_N are found in Tables 8, 10, 11 and 12. Since these groups are subgroups of O(2), they enforce a weaker kernel constraint as compared to O(2). As a result, the bases for G < O(2) are higher dimensional, i.e. they allow for a wider range of kernels.
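For a pair of 2-dimensional irreps ψ_m(θ) = R(mθ) and ψ_n(θ) = R(nθ), the surviving Fourier coefficients correspond to angular harmonics of frequencies m − n and m + n. The following sketch (our own check, with hypothetical helper names, not e2cnn code) verifies that both angular basis elements from Table 1 solve the rotational part of the irrep constraint (3):

```python
import numpy as np

def R(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

S = np.diag([1.0, -1.0])   # reflection generator of O(2)

def basis_lo(m, n, phi):
    """Angular basis element of frequency m - n: kappa(phi) = R((m-n) phi)."""
    return R((m - n) * phi)

def basis_hi(m, n, phi):
    """Angular basis element of frequency m + n: kappa(phi) = R((m+n) phi) S,
    i.e. the matrix [[cos((m+n)phi), sin((m+n)phi)],
                     [sin((m+n)phi), -cos((m+n)phi)]]."""
    return R((m + n) * phi) @ S

def rotation_residual(kappa, m, n, theta, phi):
    """Max-abs violation of the irrep constraint (3) for a rotation by theta,
    which maps the angular coordinate phi to phi + theta:
    kappa(phi + theta) = psi_m(theta) kappa(phi) psi_n(theta)^{-1}."""
    lhs = kappa(phi + theta)
    rhs = R(m * theta) @ kappa(phi) @ R(-n * theta)
    return np.abs(lhs - rhs).max()
```

Both elements pass for arbitrary frequencies, rotation angles and angular coordinates, in line with the harmonic form of the solutions in Table 1.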
A higher level of equivariance therefore leads both to a guaranteed transformation behavior of the inference process and to an improved parameter efficiency.

    ψ_i \ ψ_j          | trivial       | sign-flip      | frequency n ∈ N⁺
    -------------------|---------------|----------------|---------------------------------------------------------------
    trivial            | [ 1 ]         | ∅              | [ cos(nφ)   sin(nφ) ]
    sign-flip          | ∅             | [ 1 ]          | [ sin(nφ)  −cos(nφ) ]
    frequency m ∈ N⁺   | [ cos(mφ) ]   | [ sin(mφ) ]    | [ cos((m−n)φ)  −sin((m−n)φ) ]   [ cos((m+n)φ)   sin((m+n)φ) ]
                       | [ sin(mφ) ]   | [ −cos(mφ) ]   | [ sin((m−n)φ)   cos((m−n)φ) ] , [ sin((m+n)φ)  −cos((m+n)φ) ]

Table 1: Bases for the angular parts of O(2)-steerable kernels satisfying the irrep constraint (3) for different pairs of input field irreps ψ_j and output field irreps ψ_i. The different types of irreps are explained in Appendix I.2.

2.6 Group representations and nonlinearities

A question which so far has been left open is which field types, i.e. which representations ρ of G, should be used in practice. Considering only the convolution operation with G-steerable kernels for the moment, it turns out that any change of basis P to an equivalent representation ρ̃ := P⁻¹ρP is irrelevant.
To see this, consider the irrep decomposition ρ = Q⁻¹ [⊕_{i∈I} ψ_i] Q used in the solution of the kernel constraint to obtain a basis {k_i}_{i=1}^d of G-steerable kernels as defined by Eq. (4). Any equivalent representation will decompose into ρ̃ = Q̃⁻¹ [⊕_{i∈I} ψ_i] Q̃ with Q̃ = QP for some P and therefore result in a kernel basis {P_out⁻¹ k_i P_in}_{i=1}^d which entirely negates changes of bases between equivalent representations. It would therefore w.l.o.g. suffice to consider only direct sums of irreps ρ = ⊕_{i∈I} ψ_i as representations, reducing the question of which representations to choose to the question of which types and multiplicities of irreps to use.
In practice, however, convolution layers are interleaved with other operations which are sensitive to specific choices of representations. In particular, nonlinearity layers are required to be equivariant under the action of specific representations. The choice of group representations in steerable CNNs therefore restricts the range of admissible nonlinearities, or, conversely, a choice of nonlinearity allows only for certain representations. In the following we review prominent choices of representations found in the literature in conjunction with their compatible nonlinearities.
All equivariant nonlinearities considered here act spatially localized, that is, on each feature vector f(x) ∈ R^{c_in} for all x ∈ R² individually. They might produce different types of output fields ρ_out : G → GL(R^{c_out}), that is, σ : R^{c_in} → R^{c_out}, f(x) ↦ σ(f(x)). As proven in Appendix G.2, it is sufficient to require the equivariance of σ under the actions of ρ_in and ρ_out, i.e.
σ ∘ ρ_in(g) = ρ_out(g) ∘ σ ∀g ∈ G, for the nonlinearities to be equivariant under the action of induced representations when being applied to a whole feature field as σ(f)(x) := σ(f(x)).
A general class of representations are unitary representations, which preserve the norm of their representation space, that is, they satisfy |ρ_unitary(g) f(x)| = |f(x)| ∀g ∈ G. As proven in Appendix G.2.2, nonlinearities which solely act on the norm of feature vectors but preserve their orientation are equivariant w.r.t. unitary representations. They can in general be decomposed as

    σ_norm : R^c → R^c ,  f(x) ↦ η(|f(x)|) · f(x) / |f(x)|

for some nonlinear function η : R≥0 → R≥0 acting on the norm of feature vectors. Norm-ReLUs, defined by η(|f(x)|) = ReLU(|f(x)| − b) where b ∈ R⁺ is a learned bias, were used in [12, 2]. In [29], the authors consider squashing nonlinearities η(|f(x)|) = |f(x)|² / (|f(x)|² + 1). Gated nonlinearities were proposed in [2] as a conditional version of norm nonlinearities. They act by scaling the norm of a feature field by learned sigmoid gates 1/(1 + e^{−s(x)}), parameterized by a scalar feature field s. All representations considered in this paper are unitary such that their fields can be acted on by norm-nonlinearities. This applies specifically also to all irreducible representations ψ_i of G ≤ O(2), which are discussed in detail in Section I.2.
A common choice of representations of finite groups like C_N and D_N are regular representations. Their representation space R^|G| has dimensionality equal to the order of the group, e.g. R^N for C_N and R^{2N} for D_N.
The action of the regular representation is defined by assigning each axis e_g of R^|G| to a group element g ∈ G and permuting the axes according to ρ_reg^G(g̃) e_g := e_{g̃g}. Since this action is just permuting the channels of ρ_reg^G-fields, it commutes with pointwise nonlinearities like ReLU; a proof is given in Appendix G.2.3. While regular steerable CNNs were empirically found to perform very well, they lead to high dimensional feature spaces, with each individual field consuming |G| channels. Regular steerable CNNs were investigated for planar images in [6, 7, 8, 9, 10, 17, 18, 30], for spherical CNNs in [19, 5] and for volumetric convolutions in [31, 32]. Further, the translation of feature maps of conventional CNNs can be viewed as the action of the regular representation of the translation group.
Closely related to regular representations are quotient representations. Instead of permuting |G| channels indexed by G, they permute |G|/|H| channels indexed by cosets gH in the quotient space G/H of a subgroup H ≤ G. Specifically, they act on axes e_{gH} of R^{|G|/|H|} as defined by ρ_quot^{G/H}(g̃) e_{gH} := e_{g̃gH}. As permutation representations, quotient representations allow for pointwise nonlinearities; see Appendix G.2.3. Quotient representations were considered in [1, 11].
Regular and quotient fields can furthermore be acted on by nonlinear pooling operators. Via a group pooling or projection operation max : R^c → R, f(x) ↦ max(f(x)), the works [6, 7, 9, 32, 31] extract the maximum value of a regular or quotient field. The invariance of the maximum operation implies that the resulting features form scalar fields. Since group pooling operations discard information on the feature orientations entirely, vector field nonlinearities σ_vect : R^N → R² for regular representations of C_N were proposed in [13].
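The permutation action of the regular representation and its compatibility with pointwise nonlinearities can be sketched directly. This is our own NumPy illustration (`reg_rep` is a hypothetical helper, not e2cnn API):

```python
import numpy as np

def reg_rep(N, p):
    """Regular representation of C_N evaluated at the rotation r^p: the
    permutation matrix that cyclically shifts the N channels by p
    (i.e. it sends the axis e_g to e_{r^p g})."""
    return np.roll(np.eye(N), p, axis=0)

def relu(v):
    return np.maximum(v, 0.0)

N = 8
f_x = np.random.default_rng(2).normal(size=N)   # one regular feature vector f(x)
```

Because ρ_reg only permutes channels, applying ReLU before or after the group action gives the same result, which is exactly the equivariance property used above.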
Vector field nonlinearities do not only keep the maximum response max(f(x)) but also its index arg max(f(x)). This index corresponds to a rotation angle θ = (2π/N) arg max(f(x)) which is used to define a vector field with elements v(x) = max(f(x)) (cos(θ), sin(θ))ᵀ. The equivariance of this operation is proven in G.2.4.

2.7 Group restrictions and inductions

The key idea of equivariant networks is to exploit symmetries in the distribution of characteristic patterns in signals. The level of symmetry present in data might thereby vary over length scales. For instance, natural images typically show small features like edges in arbitrary orientations. On a larger length scale, however, the rotational symmetry is broken, as manifested in visual patterns exclusively appearing upright but still in different reflections. Each individual layer of a convolutional network should therefore be adapted to the symmetries present in the length scale of its fields of view.
A loss of symmetry can be implemented by restricting the equivariance at a certain depth to a subgroup (R², +) ⋊ H ≤ (R², +) ⋊ G, e.g. from rotations and reflections G = O(2) to mere reflections H = ({±1}, ∗) in the example above. This requires the feature fields produced by a layer with a higher level of equivariance to be reinterpreted in the following layer as fields transforming under a subgroup. Specifically, a ρ-field, transforming according to ρ : G → GL(R^c), needs to be reinterpreted as a ρ̃-field, where ρ̃ : H → GL(R^c) is a representation of the subgroup H ≤ G. This is naturally achieved by using the restricted representation ρ̃ := Res_H^G(ρ) : H → GL(R^c), h ↦ ρ(h), defined by restricting the domain of ρ to H.
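A minimal sketch of ours (not the paper's implementation, with dihedral elements written as hypothetical (rotation, reflection) pairs): restricting the regular representation of D_N to the rotation subgroup C_N leaves the matrices unchanged and merely reinterprets them as a C_N representation, which here decomposes into two copies of the regular representation of C_N.

```python
import numpy as np

N = 4   # dihedral group D_4: 4 rotations times {identity, reflection}

def mul(a, b):
    """Composition in D_N with elements written as (rotation k, reflection s)."""
    k1, s1, k2, s2 = *a, *b
    return ((k1 + (-1) ** s1 * k2) % N, s1 ^ s2)

elems = [(k, s) for s in (0, 1) for k in range(N)]
idx = {g: i for i, g in enumerate(elems)}

def reg(g):
    """Regular representation of D_N: rho(g) e_h = e_{gh}."""
    M = np.zeros((2 * N, 2 * N))
    for h in elems:
        M[idx[mul(g, h)], idx[h]] = 1.0
    return M

# group restriction Res^{D_N}_{C_N}: evaluate rho only on the rotation
# subgroup C_N = {(k, 0)}; the matrices themselves are left unchanged
res = {k: reg((k, 0)) for k in range(N)}
```

The restricted map is still a homomorphism of C_N, and its block-permutation structure exposes the two regular C_N fields that a subsequent C_N-steerable layer can process.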
Since subsequent H-steerable convolution layers can map fields of arbitrary representations, we can readily process the resulting Res_H^G(ρ)-field further.

2.8 Implementation details

E(2)-steerable CNNs rely on convolutions with O(2)-steerable kernels. Our implementation therefore requires the precomputation of steerable kernel bases according to the analytical solutions in Eq. (4) with arbitrary radial parts. Since the kernel basis is sampled on a discrete pixel grid, care has to be taken that no aliasing artifacts occur. During runtime, the sampled basis is expanded using learned weights. The resulting G-steerable kernel is then used in a standard convolution routine. For more details we refer to Appendix C. Our implementation is provided as a PyTorch extension which is available at https://github.com/QUVA-Lab/e2cnn.

Figure 2: Test errors of C_N and D_N regular steerable CNNs for different orders N for all three MNIST variants. Left: All equivariant models improve upon the non-equivariant baseline on MNIST O(2). The error decreases before saturating at around 8 orientations. Since the dataset contains reflected digits, the D_N-equivariant models perform better than their C_N counterparts. Middle: Since the intraclass variability of MNIST rot is reduced, the performances of the C_N model and the baseline improve. In contrast, the D_N models are invariant to reflections such that they can't distinguish between MNIST O(2) and MNIST rot. For N = 1 this leads to a worse performance than that of the baseline. Restricted dihedral models, denoted by D_N|5 C_N, make use of the local reflectional symmetries but are not globally invariant. This makes them perform better than the C_N models. Right: On MNIST 12k the globally invariant models C_N and D_N don't yield better results than the baseline; however, the restricted (i.e. non-invariant) models C_N|5 {e} and D_N|5 {e} do.
For more details see Appendix D.1.

3 Experiments
Since the framework of general E(2)-equivariant steerable CNNs supports many choices of groups, representations and nonlinearities, we first run an extensive benchmark study over the space of supported models in Section 3.1. The insights from these benchmark experiments are then applied to classify CIFAR and STL-10 images in Sections 3.2 and 3.3. All of our experiments can be found in a dedicated repository at https://github.com/gabri95/e2cnn_experiments.

3.1 Model benchmarking on transformed MNIST datasets
We first perform a comprehensive benchmarking to compare the impact of the different design choices covered in this work. All benchmarked models are evaluated on three different versions of the MNIST dataset, each containing 12000 training and 50000 test images. The digits in the three variants MNIST 12k, MNIST rot and MNIST O(2) are left untransformed, rotated, and rotated and reflected, respectively. These datasets allow us to study the benefit of different levels of G-steerability in the presence or absence of certain symmetries. In order not to disadvantage models with lower levels of equivariance, we train all models using data augmentation by the transformations present in the corresponding dataset.

Representation and nonlinearity benchmarking: Table 7 in the Appendix shows the test errors of 57 different models on the three MNIST variants. The first four columns state the equivariance groups, representations, nonlinearities and invariant maps which distinguish the models. The invariant maps of each model are applied after the last convolution layer to produce G-invariant features. Appendix D.1 compares and analyzes all results in detail. In particular, it discusses regular and quotient models, group pooling and vector field networks, as well as SO(2)- and O(2)-equivariant irrep models.
The latter employ new kinds of gated nonlinearities and norm nonlinearities and, in the case of O(2), introduce induced representations as new feature types. The results of all models whose feature fields transform according to regular representations are summarized in Figure 2.

Group restriction: All transformed MNIST datasets show local rotational and reflectional symmetries but differ in the level of symmetry present at the global scale. While DN- and O(2)-equivariant models exploit these local symmetries, their global invariance leads to a considerable loss of information. On the other hand, models which are equivariant only to the symmetries present at the global scale of the dataset are not able to generalize over all local symmetries. The proposed group restriction operation allows for models which are locally equivariant but are globally invariant only to the level of symmetry present in the data. Table 2 reports the results of models which are restricted at different depths. The overall trend is that a restriction at later stages of the model improves the performance. All restricted models perform significantly better than the invariant models. Figure 2 shows that this behavior is consistent for different orders N.

restriction depth | MNIST rot: D16 → C16 | MNIST 12k: D16 → {e} | MNIST 12k: C16 → {e}
(0)               | 0.82 ± 0.02          | 0.82 ± 0.01          | 0.82 ± 0.01
1                 | 0.86 ± 0.05          | 0.79 ± 0.03          | 0.80 ± 0.03
2                 | 0.82 ± 0.03          | 0.74 ± 0.03          | 0.77 ± 0.03
3                 | 0.77 ± 0.03          | 0.73 ± 0.03          | 0.76 ± 0.03
4                 | 0.79 ± 0.03          | 0.72 ± 0.02          | 0.77 ± 0.03
5                 | 0.78 ± 0.04          | 0.68 ± 0.04          | 0.75 ± 0.02
no restriction    | 1.65 ± 0.02 (D16)    | 1.68 ± 0.04 (D16)    | 0.95 ± 0.04 (C16)

Table 2: Effect of the group restriction operation at different depths of the network on MNIST rot and MNIST 12k (test error in %). All restricted models perform better than non-restricted, and hence globally invariant, models.

model | group    | representation | test error (%)
[6]   | C4       | regular/scalar | 3.21 ± 0.0012
[6]   | C4       | regular        | 2.28 ± 0.0004
[12]  | SO(2)    | irreducible    | 1.69
[33]  | -        | -              | 1.2
[13]  | C17      | regular/vector | 1.09
Ours  | C16      | regular        | 0.716 ± 0.028
[7]   | C16      | regular        | 0.714 ± 0.022
Ours  | C16      | quotient       | 0.705 ± 0.025
Ours  | D16|5C16 | regular        | 0.682 ± 0.022

Table 3: Final runs on MNIST rot.

model         | group         | CIFAR-10    | CIFAR-100
wrn28/10 [34] | -             | 3.87        | 18.80
wrn28/10      | D1 D1 D1      | 3.36 ± 0.08 | 17.97 ± 0.11
wrn28/10*     | D8 D4 D1      | 3.28 ± 0.10 | 17.42 ± 0.33
wrn28/10      | C8 C4 C1      | 3.20 ± 0.04 | 16.47 ± 0.22
wrn28/10      | D8 D4 D1      | 3.13 ± 0.17 | 16.76 ± 0.40
wrn28/10      | D8 D4 D4      | 2.91 ± 0.13 | 16.22 ± 0.31
wrn28/10 [35] | - (AA)        | 2.6 ± 0.1   | 17.1 ± 0.3
wrn28/10*     | D8 D4 D1 (AA) | 2.39 ± 0.11 | 15.55 ± 0.13
wrn28/10      | D8 D4 D1 (AA) | 2.05 ± 0.03 | 14.30 ± 0.09

Table 4: Test errors on CIFAR-10 and CIFAR-100 (AA = AutoAugment [35]).

Convergence rate: In our experiments we find that steerable CNNs converge significantly faster than non-equivariant CNNs. Figure 4 in the Appendix shows this behavior for regular CN-steerable CNNs in comparison to a vanilla CNN. The rate of convergence thereby increases with the order N and, as already observed in Figure 2, saturates at approximately N = 8. All models share about the same number of parameters. The faster convergence of equivariant networks is explained by the fact that they generalize over G-transformed images by design, which reduces the amount of intra-class variability which they have to learn.
Conversely, a conventional CNN has to learn to classify all transformed versions of an image explicitly, which requires an increased batch size or more training iterations. The enhanced data efficiency of E(2)-steerable CNNs thus leads to a reduced training time.

Competitive runs: As a final experiment on MNIST rot we replicate the regular C16 model from [7]. It is mostly similar to the models evaluated before but is wider and adds additional fully connected layers; see Table 14 in the Appendix. As reported in Table 3, our reimplementation matches the accuracy of the original model. Replacing the regular feature fields with the quotient representations used in the benchmarking leads to slightly better results. We refer to Appendix F for more insights on the improved performance of quotient models. A further significant improvement and a new state of the art is achieved by a D16 model which is restricted to C16 in the final layer.

3.2 CIFAR experiments
The statistics of natural images are typically invariant under global translations and reflections but not under global rotations. Here we investigate the benefit of G-steerable convolutions for such images by classifying CIFAR-10 and CIFAR-100. For this purpose we implement several DN- and CN-equivariant versions of WideResNet [34]. Different levels of equivariance, stated in the model specifications in Table 4, are thereby used in the three main blocks of the network. Regular representations are used throughout the whole model. For a fair comparison we scale the width of all layers such that the number of parameters of the original wrn28/10 model is preserved. We further add a small model, marked by an additional *, which has about the same number of channels as the non-equivariant wrn28/10. All runs use the same training procedure as reported in [34] and Appendix K.3.
We want to emphasize that we perform no further hyperparameter tuning. The results of the D1 D1 D1 model confirm that incorporating the global symmetries of the data yields a significant boost in accuracy. Interestingly, the C8 C4 C1 model, which is rotation- but not reflection-equivariant, achieves better results, which shows that it is worthwhile to leverage local rotational symmetries. Both symmetries are respected simultaneously by the wrn28/10 D8 D4 D1 model. While this model performs better than the two previous ones on CIFAR-10, it surprisingly yields slightly worse results on CIFAR-100. The best results are obtained by the D8 D4 D4 model, which suggests that rotational symmetries are useful even on a larger scale. The small wrn28/10* D8 D4 D1 model shows a remarkable gain compared to the non-equivariant wrn28/10 baseline despite not being computationally more expensive. To investigate whether equivariance is useful even when a powerful data augmentation policy is available, we further rerun both D8 D4 D1 models with AutoAugment (AA) [35]. As without AA, both equivariant models outperform the baseline by a large margin.

3.3 STL-10 experiments
In order to test whether the previous results generalize to natural images of higher resolution we run experiments on STL-10 [37]. We adapt the experiments in [36] by replacing the non-equivariant convolutions of their wrn16/8 model with regular DN-steerable convolutions. As in the CIFAR experiments, we adopt the training settings and hyperparameters of [36] without changes. Our four adapted models, reported in Table 5, are equivariant under either the action of D1 in all blocks or the actions of D8, D4 and D1. For both choices we build a large model, preserving the number of parameters of the baseline, and a small model, which preserves its number of channels and thus computational requirements. All models improve significantly over the baseline. Due to their extended equivariance, the small D8 D4 D1 model performs better than the large D1 D1 D1 model. In comparison to the CIFAR experiments, rotational equivariance gives a larger boost in accuracy since the higher resolution of 96px of STL-10 allows for more detailed local patterns which occur in arbitrary orientations. Appendix D.3 reports the results of a data ablation study. The results validate that the gains from incorporating equivariance are consistent over all training set sizes. More information on the training procedures is given in Appendix K.4.

model        | group    | #params | test error (%)
wrn16/8 [36] | -        | 11M     | 12.74 ± 0.23
wrn16/8*     | D1 D1 D1 | 5M      | 11.05 ± 0.45
wrn16/8      | D1 D1 D1 | 10M     | 11.17 ± 0.60
wrn16/8*     | D8 D4 D1 | 4.2M    | 10.57 ± 0.70
wrn16/8      | D8 D4 D1 | 12M     | 9.80 ± 0.40

Table 5: Test errors of different equivariant models on the STL-10 dataset. Models with * preserve the number of channels of the baseline.

4 Conclusions
In this work we presented a general theory of E(2)-equivariant steerable CNNs. By analytically solving the kernel constraint for any representation of O(2) or its subgroups we were able to reproduce and compare many different models from previous work. We further proposed a group restriction operation which allows us to adapt the level of equivariance to the symmetries present on the corresponding length scale. When using G-steerable convolutions as drop-in replacements for conventional convolution layers we obtained significant improvements on CIFAR and STL-10 without additional hyperparameter tuning. While the kernel expansion leads to a small overhead during train time, the final kernels can be stored such that during test time steerable CNNs are computationally not more expensive than conventional CNNs of the same width.
Due to the enhanced parameter efficiency of equivariant models, it is common practice to adapt the model width to match the parameter cost of conventional CNNs. Our results show that even non-scaled models outperform conventional CNNs in accuracy.
We believe that equivariant CNNs will in the long term become the default choice for tasks like biomedical imaging, where symmetries are present on a global scale. The impressive results on natural images demonstrate the great potential of applying E(2)-steerable CNNs to more general vision tasks which involve only local symmetries. Future research still needs to investigate the wide range of design choices of steerable CNNs in more depth and collect evidence on whether our findings generalize to different settings. We hope that our library will help equivariant CNNs to be adopted by the community and facilitate further research.

Acknowledgments
We would like to thank Taco Cohen for fruitful discussions on an efficient implementation and helpful feedback on the paper and Daniel Worrall for elaborating on the implementation of Harmonic Networks.

References
[1] Taco S. Cohen and Max Welling. Steerable CNNs. In International Conference on Learning Representations (ICLR), 2017.
[2] Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco S. Cohen. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
[3] Taco S. Cohen, Mario Geiger, and Maurice Weiler. Intertwiners between induced representations (with applications to the theory of equivariant neural networks). arXiv preprint arXiv:1803.10743, 2018.
[4] Taco S. Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. arXiv preprint arXiv:1811.02017, 2018.
[5] Taco S. Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling.
Gauge equivariant convolutional networks and the icosahedral CNN. In International Conference on Machine Learning (ICML), 2019.
[6] Taco S. Cohen and Max Welling. Group equivariant convolutional networks. In International Conference on Machine Learning (ICML), 2016.
[7] Maurice Weiler, Fred A. Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[8] Emiel Hoogeboom, Jorn W. T. Peters, Taco S. Cohen, and Max Welling. HexaConv. In International Conference on Learning Representations (ICLR), 2018.
[9] Erik J. Bekkers, Maxime W. Lafarge, Mitko Veta, Koen A.J. Eppenhof, Josien P.W. Pluim, and Remco Duits. Roto-translation covariant convolutional networks for medical image analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018.
[10] Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. In International Conference on Machine Learning (ICML), 2016.
[11] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In International Conference on Machine Learning (ICML), 2018.
[12] Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Harmonic networks: Deep translation and rotation equivariance. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. Rotation equivariant vector field networks. In International Conference on Computer Vision (ICCV), 2017.
[14] Laurent Sifre and Stéphane Mallat.
Combined scattering for rotation invariant texture analysis. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), volume 44, pages 68–81, 2012.
[15] Laurent Sifre and Stéphane Mallat. Rotation, scaling and deformation invariant scattering for texture discrimination. In Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[16] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.
[17] Laurent Sifre and Stéphane Mallat. Rigid-motion scattering for texture classification. arXiv preprint arXiv:1403.1687, 2014.
[18] Edouard Oyallon and Stéphane Mallat. Deep roto-translation scattering for object classification. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[19] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In International Conference on Learning Representations (ICLR), 2018.
[20] Risi Kondor, Zhen Lin, and Shubhendu Trivedi. Clebsch–Gordan nets: A fully Fourier space spherical convolutional neural network. In Conference on Neural Information Processing Systems (NeurIPS), 2018.
[21] Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. Learning SO(3) equivariant representations with spherical CNNs. In European Conference on Computer Vision (ECCV), 2018.
[22] Nathanaël Perraudin, Michaël Defferrard, Tomasz Kacprzak, and Raphael Sgier. DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling for cosmological applications. arXiv:1810.12186 [astro-ph], 2018.
[23] Chiyu Jiang, Jingwei Huang, Karthik Kashinath, Prabhat, Philip Marcus, and Matthias Niessner. Spherical CNNs on unstructured grids.
In International Conference on Learning Representations (ICLR), 2019.
[24] Adrien Poulenard and Maks Ovsjanikov. Multi-directional geodesic neural networks via equivariant convolution. ACM Transactions on Graphics, 2018.
[25] Jonathan Masci, Davide Boscaini, Michael M. Bronstein, and Pierre Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In International Conference on Computer Vision Workshop (ICCVW), 2015.
[26] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and deep locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.
[27] Davide Boscaini, Jonathan Masci, Simone Melzi, Michael M. Bronstein, Umberto Castellani, and Pierre Vandergheynst. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Computer Graphics Forum, 2015.
[28] Jean-Pierre Serre. Linear representations of finite groups. 1977.
[29] Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules. In Conference on Neural Information Processing Systems (NIPS), 2017.
[30] Nichita Diaconu and Daniel Worrall. Learning to convolve: A generalized weight-tying approach. In International Conference on Machine Learning (ICML), 2019.
[31] Marysia Winkels and Taco S. Cohen. 3D G-CNNs for pulmonary nodule detection. In Conference on Medical Imaging with Deep Learning (MIDL), 2018.
[32] Daniel E. Worrall and Gabriel J. Brostow. CubeNet: Equivariance to 3D rotation and translation. In European Conference on Computer Vision (ECCV), pages 585–602, 2018.
[33] Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, and Marc Pollefeys. TI-pooling: Transformation-invariant pooling for feature learning in convolutional neural networks. In Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[34] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks.
In British Machine Vision Conference (BMVC), 2016.
[35] Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. AutoAugment: Learning augmentation strategies from data. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[36] Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[37] Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
[38] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). In International Conference on Learning Representations (ICLR), 2016.
[39] Nathaniel Thomas, Tess Smidt, Steven M. Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018.
[40] Risi Kondor. N-body networks: A covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588, 2018.
[41] Brandon Anderson, Truong-Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.
[42] Diego Marcos, Michele Volpi, and Devis Tuia. Learning rotation invariant convolutional filters for texture classification. In International Conference on Pattern Recognition (ICPR), 2016.
[43] Diego Marcos, Michele Volpi, Benjamin Kellenberger, and Devis Tuia. Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS Journal of Photogrammetry and Remote Sensing, 145:96–107, 2018.
[44] Bastiaan S. Veeling, Jasper Linmans, Jim Winkens, Taco S.
Cohen, and Max Welling. Rotation equivariant CNNs for digital pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018.
[45] Geoffrey E. Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. In International Conference on Learning Representations (ICLR), 2018.