Learning Transformation-Invariant Representations for Image Recognition With Drop Transformation Networks

This paper proposes drop transformation networks (DTNs), a novel framework for learning transformation-invariant image representations with good flexibility and generalization ability. Convolutional neural networks are a powerful end-to-end learning framework that can learn hierarchies of representations. Although invariance to translation can be introduced by stacking convolutional and max-pooling layers, this approach is not effective against other geometric transformations such as rotation and scaling. Rotation and scale invariance are usually obtained through data augmentation, but this requires a larger model and more training time. A DTN forms transformation-invariant representations by explicitly manipulating geometric transformations within the network: it applies multiple random transformations to its inputs but keeps only one output, according to a given dropout policy. In this way, the complex dependencies on the transformations contained in the training data are alleviated, and generalization across transformations is improved. Another advantage of DTNs is their flexibility: under the proposed framework, data augmentation can be seen as a special case. We evaluate DTNs on three benchmark data sets and show that they can outperform state-of-the-art methods with fewer parameters.

Bibliographic Details
Main Authors: Chunxiao Fan, Yang Li, Guijin Wang, Yong Li
Format: Article
Language: English
Published: IEEE, 2018-01-01
Series: IEEE Access
Subjects: Convolutional neural networks; deep learning; image representation; transformation-invariance
Online Access: https://ieeexplore.ieee.org/document/8397162/
id doaj-14e803b61cfa4ab6b4d9233356d23e16
record_format Article
doi 10.1109/ACCESS.2018.2850965
citation IEEE Access, vol. 6, pp. 73357-73369, published 2018-01-01
article number 8397162
author affiliations:
Chunxiao Fan: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China
Yang Li (https://orcid.org/0000-0001-5848-3461): School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China
Guijin Wang (https://orcid.org/0000-0002-2131-3044): Department of Electronic Engineering, Tsinghua University, Beijing, China
Yong Li: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China
collection DOAJ
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description This paper proposes drop transformation networks (DTNs), a novel framework for learning transformation-invariant image representations with good flexibility and generalization ability. Convolutional neural networks are a powerful end-to-end learning framework that can learn hierarchies of representations. Although invariance to translation can be introduced by stacking convolutional and max-pooling layers, this approach is not effective against other geometric transformations such as rotation and scaling. Rotation and scale invariance are usually obtained through data augmentation, but this requires a larger model and more training time. A DTN forms transformation-invariant representations by explicitly manipulating geometric transformations within the network: it applies multiple random transformations to its inputs but keeps only one output, according to a given dropout policy. In this way, the complex dependencies on the transformations contained in the training data are alleviated, and generalization across transformations is improved. Another advantage of DTNs is their flexibility: under the proposed framework, data augmentation can be seen as a special case. We evaluate DTNs on three benchmark data sets and show that they can outperform state-of-the-art methods with fewer parameters.
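The core step the abstract describes (apply multiple transformations to an input, then keep only one output according to a dropout policy) can be sketched roughly as follows. This is an illustrative reconstruction from the abstract only, not the authors' implementation: the transformation set, the `keep_probs` policy, and all function names here are assumptions.

```python
import random

import numpy as np

# Candidate geometric transformations. Simple axis-aligned ops stand in for
# the parameterized rotations/scalings a real DTN layer would use.
def identity(x):
    return x

def rotate90(x):
    return np.rot90(x)   # rotate the image array by 90 degrees

def hflip(x):
    return np.fliplr(x)  # mirror the image left-right

def drop_transformation(x, transforms, keep_probs, rng=random):
    """Apply every candidate transformation to the input, then keep exactly
    one output, sampled according to the keep/drop policy `keep_probs`."""
    outputs = [t(x) for t in transforms]  # multiple transformed copies
    idx = rng.choices(range(len(outputs)), weights=keep_probs, k=1)[0]
    return outputs[idx]                   # only one output survives
```

With a uniform keep policy over randomly sampled transformations, the scheme reduces to ordinary data augmentation, which is consistent with the abstract's remark that data augmentation is a special case of the framework.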
topic Convolutional neural networks
deep learning
image representation
transformation-invariance
url https://ieeexplore.ieee.org/document/8397162/