Person Re-Identification Based on Two-Stream Network With Attention and Pose Features

Due to variations in posture, blurring, occlusion, and other problems, person re-identification (Re-ID) remains a challenging task. In this paper, we combine the advantages of pose estimation and the attention mechanism in a two-stream network to address these problems and achieve better performance. Our proposed method consists of two main parts. 1) Spatial features from fused multi-layer features with attention: the same pedestrian presents different poses under different camera views, so simple spatial information is no longer reliable, and it becomes important to extract view-invariant features from multiple semantic levels. We therefore fuse the mid-level and high-level features and then correlate global information through self-attention. Because the mid-level and high-level features are fused, the semantic information is richer, which enables the attention mechanism to focus more effectively on the important regions of the image. 2) Aggregation of attention-stream and pose-estimation-stream features: although the self-attention mechanism can automatically attend to the important regions of the image, it may focus too heavily on the prominent parts of the body and ignore information at the body's edges. Hence, guidance from the pedestrian's pose is needed so that self-attention can attend to all parts of the body. Finally, we use bilinear pooling to aggregate the features of the two streams into the final representation. Without any data augmentation or re-ranking, we achieve rank-1 accuracy of 93.3% and 85.5% on the Market1501 and DukeMTMC-reID datasets, respectively, which indicates the effectiveness of our method.
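
The abstract describes the architecture only at a high level. As a rough illustration of the two-stream idea it outlines (self-attention over fused mid-/high-level features in one stream, pose features in the other, bilinear pooling to aggregate the two), here is a minimal PyTorch-style sketch. The class names, feature dimensions, pose-feature extractor, and the classifier size (751 identities, as in Market1501 training) are assumptions made for illustration, not the authors' actual implementation.

```python
# Minimal sketch, NOT the paper's implementation: one stream applies
# self-attention to fused mid-/high-level CNN features, a second stream
# stands in for pose-guided features, and bilinear pooling fuses them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    """Non-local-style self-attention over the spatial positions of a feature map."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # B x HW x C//8
        k = self.key(x).flatten(2)                        # B x C//8 x HW
        attn = torch.softmax(q @ k, dim=-1)               # B x HW x HW
        v = self.value(x).flatten(2).transpose(1, 2)      # B x HW x C
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out


class TwoStreamReID(nn.Module):
    def __init__(self, mid_channels=512, high_channels=1024,
                 pose_dim=256, embed_dim=256, num_ids=751):
        super().__init__()
        # fuse mid- and high-level features (assumes matching spatial size)
        self.fuse = nn.Conv2d(mid_channels + high_channels, embed_dim, 1)
        self.attention = SelfAttention2d(embed_dim)
        self.pose_proj = nn.Linear(pose_dim, embed_dim)   # pose-stream descriptor
        self.classifier = nn.Linear(embed_dim * embed_dim, num_ids)

    def forward(self, mid_feat, high_feat, pose_feat):
        fused = self.fuse(torch.cat([mid_feat, high_feat], dim=1))
        attended = self.attention(fused)
        attn_vec = F.adaptive_avg_pool2d(attended, 1).flatten(1)  # B x D
        pose_vec = self.pose_proj(pose_feat)                      # B x D
        # bilinear pooling: outer product of the two stream descriptors
        bilinear = torch.einsum("bi,bj->bij", attn_vec, pose_vec).flatten(1)
        bilinear = F.normalize(torch.sign(bilinear) * bilinear.abs().sqrt(), dim=1)
        return self.classifier(bilinear), bilinear


# Toy usage: random tensors stand in for backbone and pose-network outputs.
if __name__ == "__main__":
    model = TwoStreamReID()
    mid = torch.randn(2, 512, 16, 8)
    high = torch.randn(2, 1024, 16, 8)
    pose = torch.randn(2, 256)
    logits, embedding = model(mid, high, pose)
    print(logits.shape, embedding.shape)  # torch.Size([2, 751]) torch.Size([2, 65536])
```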

Bibliographic Details
Main Authors: Xiaowei Gong (ORCID: 0000-0002-4828-5928), Suguo Zhu; both with the Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access, Vol. 7 (2019), pp. 131374-131382
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2935116
Subjects: Attention; pose estimation; person re-identification; two-stream
Online Access: https://ieeexplore.ieee.org/document/8795487/