Max Jaderberg, Karen Simonyan, Andrew Zisserman, koray kavukcuoglu
Convolutional Neural Networks define an exceptionallypowerful class of model, but are still limited by the lack of abilityto be spatially invariant to the input data in a computationally and parameterefficient manner. In this work we introduce a new learnable module, theSpatial Transformer, which explicitly allows the spatial manipulation ofdata within the network. This differentiable module can be insertedinto existing convolutional architectures, giving neural networks the ability toactively spatially transform feature maps, conditional on the feature map itself,without any extra training supervision or modification to the optimisation process. We show that the useof spatial transformers results in models which learn invariance to translation,scale, rotation and more generic warping, resulting in state-of-the-artperformance on several benchmarks, and for a numberof classes of transformations.