Extended Global–Local Representation Learning for Video Person Re-Identification


Bibliographic Details
Main Authors: Wanru Song, Yahong Wu, Jieying Zheng, Changhong Chen, Feng Liu
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video
Online Access: https://ieeexplore.ieee.org/document/8818112/
id doaj-2dcca853fce94e86bc3a8db5f989b9eb
record_format Article
spelling doaj-2dcca853fce94e86bc3a8db5f989b9eb 2021-03-29T23:17:44Z
language eng
publisher IEEE
journal IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 122684-122696
doi 10.1109/ACCESS.2019.2937974
article_number 8818112
title Extended Global–Local Representation Learning for Video Person Re-Identification
authors Wanru Song (https://orcid.org/0000-0002-7067-6108); Yahong Wu; Jieying Zheng (https://orcid.org/0000-0003-4933-4688); Changhong Chen; Feng Liu
affiliation (all authors) Jiangsu Key Laboratory of Image Processing and Image Communications, Nanjing University of Posts and Telecommunications, Nanjing, China
url https://ieeexplore.ieee.org/document/8818112/
keywords Bi-directional LSTM; feature extraction; global-local feature representation; person re-identification; video
collection DOAJ
language English
format Article
sources DOAJ
author Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
spellingShingle Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
Extended Global–Local Representation Learning for Video Person Re-Identification
IEEE Access
Bi-directional LSTM
feature extraction
global-local feature representation
person re-identification
video
author_facet Wanru Song
Yahong Wu
Jieying Zheng
Changhong Chen
Feng Liu
author_sort Wanru Song
title Extended Global–Local Representation Learning for Video Person Re-Identification
title_short Extended Global–Local Representation Learning for Video Person Re-Identification
title_full Extended Global–Local Representation Learning for Video Person Re-Identification
title_fullStr Extended Global–Local Representation Learning for Video Person Re-Identification
title_full_unstemmed Extended Global–Local Representation Learning for Video Person Re-Identification
title_sort extended global–local representation learning for video person re-identification
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Person re-identification has recently become a research hotspot in computer vision and has received extensive attention in the academic community. Inspired by part-based research on image ReID, this paper presents a novel feature learning and extraction framework for video-based person re-identification: the extended global-local representation learning network (E-GLRN). Given a video sequence of a pedestrian, the E-GLRN network extracts holistic and local features simultaneously. For global feature learning, we adopt a channel-attention convolutional neural network (CNN) and bidirectional long short-term memory (Bi-LSTM) networks, which together form a CNN-LSTM module that learns features from consecutive frames. The local feature learning module relies on key local information extraction, also based on Bi-LSTM networks. To obtain local features more effectively, we define the concept of “the main image group” by selecting three representative frames; the local feature representation of a video is obtained by exploiting the spatial contextual and appearance information of this group. The local and global features extracted in this paper are complementary and are further combined into a discriminative and robust feature representation of the video sequence. Extensive experiments are conducted on three video-based ReID datasets: iLIDS-VID, PRID2011, and MARS. The experimental results demonstrate that the proposed method outperforms state-of-the-art video-based re-identification approaches.
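The fusion pipeline the abstract describes (per-frame CNN features, Bi-LSTM temporal aggregation for a global branch, a three-frame "main image group" for a local branch, then concatenation) can be sketched roughly as follows. This is a minimal NumPy sketch, not the paper's implementation: the CNN and Bi-LSTM are replaced with toy stand-ins, and the frame-selection criterion (highest feature norm) is an assumption, since the record does not specify how the representative frames are chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(frames, w):
    """Stub for per-frame CNN features: a linear projection + ReLU."""
    return np.maximum(frames @ w, 0.0)

def bilstm_pool(feats):
    """Toy stand-in for Bi-LSTM aggregation: running means over the
    sequence in both temporal directions, averaged and concatenated."""
    steps = np.arange(1, len(feats) + 1)[:, None]
    fwd = np.cumsum(feats, axis=0) / steps          # forward pass
    bwd = np.cumsum(feats[::-1], axis=0) / steps    # backward pass
    return np.concatenate([fwd.mean(axis=0), bwd.mean(axis=0)])

T, D_in, D = 8, 32, 16                 # frames, input dim, feature dim
frames = rng.normal(size=(T, D_in))    # a mock video sequence
w = rng.normal(size=(D_in, D))

feats = cnn_features(frames, w)        # (T, D) per-frame features

# Global branch: aggregate the whole sequence.
global_feat = bilstm_pool(feats)       # (2*D,)

# Local branch: "main image group" of three representative frames
# (selection by feature norm is an assumption for illustration).
main_group = feats[np.argsort(np.linalg.norm(feats, axis=1))[-3:]]
local_feat = bilstm_pool(main_group)   # (2*D,)

# Complementary global and local features, fused by concatenation.
video_repr = np.concatenate([global_feat, local_feat])
print(video_repr.shape)  # (64,)
```

The final vector is what would be compared across camera views; in the actual E-GLRN the two branches are learned jointly rather than stubbed out.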
topic Bi-directional LSTM
feature extraction
global-local feature representation
person re-identification
video
url https://ieeexplore.ieee.org/document/8818112/
work_keys_str_mv AT wanrusong extendedglobal–localrepresentationlearningforvideopersonreidentification
AT yahongwu extendedglobal–localrepresentationlearningforvideopersonreidentification
AT jieyingzheng extendedglobal–localrepresentationlearningforvideopersonreidentification
AT changhongchen extendedglobal–localrepresentationlearningforvideopersonreidentification
AT fengliu extendedglobal–localrepresentationlearningforvideopersonreidentification
_version_ 1724189773691617280