Extended Global–Local Representation Learning for Video Person Re-Identification


Bibliographic Details
Main Authors: Wanru Song, Yahong Wu, Jieying Zheng, Changhong Chen, Feng Liu
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video
Online Access: https://ieeexplore.ieee.org/document/8818112/
id doaj-2dcca853fce94e86bc3a8db5f989b9eb
record_format Article
spelling doaj-2dcca853fce94e86bc3a8db5f989b9eb 2021-03-29T23:17:44Z
language eng
publisher IEEE
journal IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 122684-122696
doi 10.1109/ACCESS.2019.2937974
article_number 8818112
title Extended Global–Local Representation Learning for Video Person Re-Identification
authors Wanru Song (https://orcid.org/0000-0002-7067-6108); Yahong Wu; Jieying Zheng (https://orcid.org/0000-0003-4933-4688); Changhong Chen; Feng Liu
affiliation (all authors) Jiangsu Key Laboratory of Image Processing and Image Communications, Nanjing University of Posts and Telecommunications, Nanjing, China
url https://ieeexplore.ieee.org/document/8818112/
keywords Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video
collection DOAJ
language English
format Article
sources DOAJ
author Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
spellingShingle Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
Extended Global–Local Representation Learning for Video Person Re-Identification
IEEE Access
Bi-directional LSTM
feature extraction
global-local feature representation
person re-identification
video
author_facet Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
author_sort Wanru Song
title Extended Global–Local Representation Learning for Video Person Re-Identification
title_short Extended Global–Local Representation Learning for Video Person Re-Identification
title_full Extended Global–Local Representation Learning for Video Person Re-Identification
title_fullStr Extended Global–Local Representation Learning for Video Person Re-Identification
title_full_unstemmed Extended Global–Local Representation Learning for Video Person Re-Identification
title_sort extended global–local representation learning for video person re-identification
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Person re-identification has recently become a research hotspot in computer vision and has received extensive attention in the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification: the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, the E-GLRN network extracts holistic and local features simultaneously. For global feature learning, we adopt a channel-attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks, which together form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module relies on key local information extraction, also based on Bi-LSTM networks. To obtain local features more effectively, we define the concept of “the main image group” by selecting three representative frames; the local feature representation of a video is obtained by exploiting the spatial contextual and appearance information of this group. The local and global features extracted in this paper are complementary and are further combined into a discriminative and robust feature representation of the video sequence. Extensive experiments are conducted on three video-based ReID datasets: iLIDS-VID, PRID2011, and MARS. The experimental results demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
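The fusion pipeline the abstract describes (per-frame CNN features, Bi-LSTM temporal aggregation for a global branch, a three-frame "main image group" for a local branch, then concatenation) can be sketched roughly as follows. This is a minimal NumPy sketch, not the paper's implementation: the CNN and Bi-LSTM are replaced with toy stand-ins, and the frame-selection criterion (highest feature norm) is an assumption, since the record does not specify how the representative frames are chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(frames, w):
    """Stub for per-frame CNN features: a linear projection + ReLU."""
    return np.maximum(frames @ w, 0.0)

def bilstm_pool(feats):
    """Toy stand-in for Bi-LSTM aggregation: running means over the
    sequence in both temporal directions, averaged and concatenated."""
    steps = np.arange(1, len(feats) + 1)[:, None]
    fwd = np.cumsum(feats, axis=0) / steps          # forward pass
    bwd = np.cumsum(feats[::-1], axis=0) / steps    # backward pass
    return np.concatenate([fwd.mean(axis=0), bwd.mean(axis=0)])

T, D_in, D = 8, 32, 16                 # frames, input dim, feature dim
frames = rng.normal(size=(T, D_in))    # a mock video sequence
w = rng.normal(size=(D_in, D))

feats = cnn_features(frames, w)        # (T, D) per-frame features

# Global branch: aggregate the whole sequence.
global_feat = bilstm_pool(feats)       # (2*D,)

# Local branch: "main image group" of three representative frames
# (selection by feature norm is an assumption for illustration).
main_group = feats[np.argsort(np.linalg.norm(feats, axis=1))[-3:]]
local_feat = bilstm_pool(main_group)   # (2*D,)

# Complementary global and local features, fused by concatenation.
video_repr = np.concatenate([global_feat, local_feat])
print(video_repr.shape)  # (64,)
```

The final vector is what would be compared across camera views; in the actual E-GLRN the two branches are learned jointly rather than stubbed out.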
topic Bi-directional LSTM
feature extraction
global-local feature representation
person re-identification
video
url https://ieeexplore.ieee.org/document/8818112/
work_keys_str_mv AT wanrusong extendedglobal–localrepresentationlearningforvideopersonreidentification
AT yahongwu extendedglobal–localrepresentationlearningforvideopersonreidentification
AT jieyingzheng extendedglobal–localrepresentationlearningforvideopersonreidentification
AT changhongchen extendedglobal–localrepresentationlearningforvideopersonreidentification
AT fengliu extendedglobal–localrepresentationlearningforvideopersonreidentification
_version_ 1724189773691617280