Video-Based Person Re-Identification by an End-To-End Learning Architecture with Hybrid Deep Appearance-Temporal Feature

Video-based person re-identification is an important task that faces the challenges of lighting variation, low-resolution images, background clutter, occlusion, and human appearance similarity in multi-camera visual sensor networks. In this paper, we propose a video-based person re-identification method called the end-to-end learning architecture with a hybrid deep appearance-temporal feature. It learns the appearance features of pivotal frames, the temporal features, and an independent distance metric for each type of feature. The architecture consists of a two-stream deep feature structure and two Siamese networks. For the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to obtain the appearance information of persons, and use one of the two Siamese networks to learn the similarity of the appearance features of a person pair. To utilize the temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn a person's temporal features and the distances between pairwise features. In addition, we select the pivotal frames of a video as inputs to the Inception-V3 network in the Two-branch Appearance Feature sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experiments on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets show that the proposed architecture reaches Rank-1 accuracies of 79%, 59%, and 72%, respectively, outperforming state-of-the-art algorithms. It also improves the feature representation ability for persons.

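The abstract describes two Siamese streams, each producing its own distance (appearance from the TAF sub-structure, temporal from the OTF sub-structure), which are then combined into one match score. As a minimal illustrative sketch of that idea — not the paper's implementation — the function names, the Euclidean stand-in for the learned metrics, and the fusion weight `alpha` below are all assumptions:

```python
# Hedged sketch: fusing appearance-stream and temporal-stream distances
# into a single match score for a probe/gallery pair. The paper learns
# an independent distance metric per stream; here a plain Euclidean
# distance stands in for each, and `alpha` is a hypothetical weight.
import math


def euclidean(a, b):
    """Distance between two pooled feature vectors (stand-in metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fused_distance(app_a, app_b, tmp_a, tmp_b, alpha=0.5):
    """Combine the two streams' distances for one person pair.

    app_*: appearance features (assumed pooled TAF outputs).
    tmp_*: temporal features (assumed pooled OTF outputs).
    alpha: illustrative fusion weight between the two streams.
    """
    d_app = euclidean(app_a, app_b)
    d_tmp = euclidean(tmp_a, tmp_b)
    return alpha * d_app + (1 - alpha) * d_tmp


# Toy example: the matching pair (identical appearance and motion)
# should score a smaller fused distance than the non-matching pair.
gallery = ([0.1, 0.9], [0.2, 0.8])        # (appearance, temporal)
probe_same = ([0.1, 0.9], [0.2, 0.8])
probe_diff = ([0.1, 0.9], [0.9, 0.1])     # same look, different motion

d_same = fused_distance(gallery[0], probe_same[0], gallery[1], probe_same[1])
d_diff = fused_distance(gallery[0], probe_diff[0], gallery[1], probe_diff[1])
assert d_same < d_diff
```

The point of the sketch is only the structure: each stream contributes its own distance, so a person who looks similar but moves differently can still be separated by the temporal stream.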
Bibliographic Details
Main Authors: Rui Sun, Qiheng Huang, Miaomiao Xia, Jun Zhang
Format: Article
Language: English
Published: MDPI AG, 2018-10-01
Series: Sensors
ISSN: 1424-8220
DOI: 10.3390/s18113669
Subjects: person re-identification; end-to-end architecture; appearance-temporal features; Siamese network; pivotal frames
Online Access: https://www.mdpi.com/1424-8220/18/11/3669
Author Affiliation (all authors): School of Computer Science and Information Engineering, Hefei University of Technology, Feicui Road 420, Hefei 230000, China