Extended Global–Local Representation Learning for Video Person Re-Identification
Person re-identification (ReID) has recently become a research hotspot in computer vision and has received extensive attention from the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification: the extended global-local representation learning network (E-GLRN).
Main Authors: | Wanru Song, Yahong Wu, Jieying Zheng, Changhong Chen, Feng Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video |
Online Access: | https://ieeexplore.ieee.org/document/8818112/ |
id |
doaj-2dcca853fce94e86bc3a8db5f989b9eb |
record_format |
Article |
spelling |
doaj-2dcca853fce94e86bc3a8db5f989b9eb | updated 2021-03-29T23:17:44Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2019-01-01 | Vol. 7, pp. 122684-122696 | DOI 10.1109/ACCESS.2019.2937974 | article no. 8818112
Extended Global–Local Representation Learning for Video Person Re-Identification
Wanru Song (https://orcid.org/0000-0002-7067-6108); Yahong Wu; Jieying Zheng (https://orcid.org/0000-0003-4933-4688); Changhong Chen; Feng Liu (all with the Jiangsu Key Laboratory of Image Processing and Image Communications, Nanjing University of Posts and Telecommunications, Nanjing, China)
Person re-identification (ReID) has recently become a research hotspot in computer vision and has received extensive attention from the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification: the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, E-GLRN extracts holistic and local features simultaneously. For global feature learning, a channel-attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module extracts key local information, also using Bi-LSTM networks. To extract local features more effectively, the paper introduces the concept of a "main image group," formed by selecting three representative frames; the local feature representation of a video is obtained by exploiting the spatial context and appearance information of this group. The local and global features are complementary and are combined into a discriminative, robust representation of the video sequence. Extensive experiments on three video-based ReID datasets (iLIDS-VID, PRID2011, and MARS) demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
https://ieeexplore.ieee.org/document/8818112/
Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Wanru Song; Yahong Wu; Jieying Zheng; Changhong Chen; Feng Liu |
spellingShingle |
Wanru Song; Yahong Wu; Jieying Zheng; Changhong Chen; Feng Liu; Extended Global–Local Representation Learning for Video Person Re-Identification; IEEE Access; Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video |
author_facet |
Wanru Song; Yahong Wu; Jieying Zheng; Changhong Chen; Feng Liu |
author_sort |
Wanru Song |
title |
Extended Global–Local Representation Learning for Video Person Re-Identification |
title_short |
Extended Global–Local Representation Learning for Video Person Re-Identification |
title_full |
Extended Global–Local Representation Learning for Video Person Re-Identification |
title_fullStr |
Extended Global–Local Representation Learning for Video Person Re-Identification |
title_full_unstemmed |
Extended Global–Local Representation Learning for Video Person Re-Identification |
title_sort |
extended global–local representation learning for video person re-identification |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Person re-identification (ReID) has recently become a research hotspot in computer vision and has received extensive attention from the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification: the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, E-GLRN extracts holistic and local features simultaneously. For global feature learning, a channel-attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module extracts key local information, also using Bi-LSTM networks. To extract local features more effectively, the paper introduces the concept of a "main image group," formed by selecting three representative frames; the local feature representation of a video is obtained by exploiting the spatial context and appearance information of this group. The local and global features are complementary and are combined into a discriminative, robust representation of the video sequence. Extensive experiments on three video-based ReID datasets (iLIDS-VID, PRID2011, and MARS) demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches. |
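The description above outlines a CNN-LSTM global branch: a channel-attention CNN extracts per-frame features, and Bi-LSTM networks aggregate them over the sequence. As a rough illustration of that idea only, the sketch below wires a ResNet-50 backbone with SE-style channel attention into a Bi-LSTM over frame features. The backbone choice, the attention variant, all layer sizes, and the temporal averaging are assumptions made for this sketch, not the authors' released implementation.

```python
# Hypothetical sketch of an E-GLRN-style global branch: a channel-attention
# CNN applied per frame, followed by a Bi-LSTM over the frame sequence.
# Backbone, attention variant, and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class ChannelAttention(nn.Module):
    """SE-style channel attention (assumed variant, not from the paper)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                  # global average pool -> (N, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # reweight feature channels


class GlobalBranch(nn.Module):
    """CNN-LSTM module: per-frame CNN features aggregated by a Bi-LSTM."""

    def __init__(self, feat_dim: int = 2048, hidden: int = 512):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc
        self.attn = ChannelAttention(feat_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = self.cnn(clip.flatten(0, 1))        # (B*T, C, h, w)
        x = self.pool(self.attn(x)).flatten(1)  # (B*T, C)
        x = x.view(b, t, -1)                    # per-frame feature sequence
        out, _ = self.bilstm(x)                 # (B, T, 2*hidden)
        return out.mean(dim=1)                  # temporal average -> global feature


# Usage: an 8-frame clip of 256x128 pedestrian crops.
feats = GlobalBranch()(torch.randn(2, 8, 3, 256, 128))
print(feats.shape)                              # torch.Size([2, 1024])
```

The local branch described in the record (Bi-LSTMs over a "main image group" of three representative frames) would be a separate module whose output is concatenated with this global feature; since the record does not specify the frame-selection criterion, it is omitted from the sketch.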
topic |
Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video |
url |
https://ieeexplore.ieee.org/document/8818112/ |
work_keys_str_mv |
AT wanrusong extendedglobalx2013localrepresentationlearningforvideopersonreidentification
AT yahongwu extendedglobalx2013localrepresentationlearningforvideopersonreidentification
AT jieyingzheng extendedglobalx2013localrepresentationlearningforvideopersonreidentification
AT changhongchen extendedglobalx2013localrepresentationlearningforvideopersonreidentification
AT fengliu extendedglobalx2013localrepresentationlearningforvideopersonreidentification |
_version_ |
1724189773691617280 |