Summary: | This paper proposes Drop Transformation Networks (DTNs), a novel framework for learning transformation-invariant image representations with good flexibility and generalization ability. Convolutional neural networks are a powerful end-to-end learning framework that can learn hierarchies of representations. Although stacking convolutional and max-pooling layers introduces invariance to translation, this approach is not effective against other geometric transformations such as rotation and scaling. Rotation and scale invariance are usually obtained through data augmentation, but that requires a larger model and more training time. A DTN builds transformation-invariant representations by explicitly manipulating geometric transformations inside the network: it applies multiple random transformations to its inputs but keeps only one output, chosen according to a given dropout policy. This weakens the complex dependence of the learned representations on the particular transformations present in the training data, and thereby improves generalization to transformations. Another advantage of DTNs is their flexibility; under the proposed framework, data augmentation can be seen as a special case. The authors evaluate DTNs on three benchmark data sets and show that they provide better performance with fewer parameters than state-of-the-art methods.
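
The apply-then-drop step can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch illustration of the idea as described above, not the authors' implementation: the branch count, the uniform drop policy, and the rotation/scale ranges are all assumptions made for the example.

    import math
    import torch
    import torch.nn.functional as F

    def random_affine_theta(batch, max_angle=30.0, scale_range=(0.8, 1.2)):
        # Sample one random rotation angle and isotropic scale per image and
        # pack them into the (batch, 2, 3) affine matrices F.affine_grid expects.
        angles = torch.empty(batch).uniform_(-max_angle, max_angle) * math.pi / 180.0
        scales = torch.empty(batch).uniform_(*scale_range)
        cos, sin = torch.cos(angles) * scales, torch.sin(angles) * scales
        theta = torch.zeros(batch, 2, 3)
        theta[:, 0, 0], theta[:, 0, 1] = cos, -sin
        theta[:, 1, 0], theta[:, 1, 1] = sin, cos
        return theta

    def drop_transformation(x, num_branches=4):
        # Apply several random rotation/scale transformations to the input,
        # then keep exactly one branch per forward pass (the "drop" step).
        # A uniform drop policy is assumed here; the paper's policy may differ.
        branches = []
        for _ in range(num_branches):
            theta = random_affine_theta(x.size(0))
            grid = F.affine_grid(theta, x.size(), align_corners=False)
            branches.append(F.grid_sample(x, grid, align_corners=False))
        keep = torch.randint(num_branches, (1,)).item()
        return branches[keep]

    x = torch.randn(8, 3, 32, 32)      # toy batch of 32x32 RGB images
    y = drop_transformation(x)
    print(y.shape)                     # torch.Size([8, 3, 32, 32])

Because the kept branch is re-sampled on every forward pass, downstream layers never see a fixed transformation of a given training image, which is one way to read the claim that the dependence on the transformations contained in the training data is alleviated. Note that with a uniform policy one could equivalently sample only the kept transformation and skip the discarded branches.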