Video Person Re‐Identification with Frame Sampling–Random Erasure and Mutual Information–Temporal Weight Aggregation


Bibliographic Details
Main Authors: Li, J. (Author), Piao, Y. (Author)
Format: Article
Language: English
Published: MDPI 2022
Subjects:
Online Access: View Fulltext in Publisher
LEADER 02626nam a2200337Ia 4500
001 10-3390-s22083047
008 220425s2022 CNT 000 0 eng d
022 |a 1424-8220 (ISSN) 
245 1 0 |a Video Person Re‐Identification with Frame Sampling–Random Erasure and Mutual Information–Temporal Weight Aggregation 
260 0 |b MDPI  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/s22083047 
520 3 |a Partial occlusion and background clutter in camera video surveillance reduce the accuracy of video‐based person re‐identification (re‐ID). To address these problems, we propose a person re‐ID method based on frame sampling–random erasure and mutual information–temporal weight aggregation of partial and global features. First, for cases in which the target person is subject to interference or partial occlusion, the frame sampling–random erasure (FSE) method is used for data augmentation to effectively alleviate the occlusion problem, improve the generalization ability of the model, and match persons more accurately. Second, to further improve video‐based person re‐ID accuracy and learn more discriminative feature representations, we use a ResNet‐50 network to extract global and partial features and fuse them to obtain frame‐level features. In the temporal dimension, a mutual information–temporal weight aggregation (MI–TWA) module aggregates the partial features with different per‐frame weights and the global features with equal weights, then concatenates them to output sequence‐level features. The proposed method was extensively evaluated on three public video datasets, MARS, DukeMTMC‐VideoReID, and PRID‐2011; the mean average precision (mAP) values are 82.4%, 94.1%, and 95.3%, and the Rank‐1 values are 86.4%, 94.8%, and 95.2%, respectively. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. 
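The FSE augmentation described in the abstract pairs restricted frame sampling with random erasing to simulate occlusion at training time. The sketch below is a minimal PyTorch illustration of that idea, assuming a tracklet of at least `num_frames` frames; the sampling scheme, erasure probability, and area/aspect ranges are illustrative assumptions, not the authors' published settings.

```python
# Hypothetical sketch of frame sampling–random erasure (FSE): sample frames
# from a tracklet in temporal order, then randomly erase a rectangle in each
# sampled frame to mimic partial occlusion. Parameters are assumptions.
import random
import torch

def frame_sampling_random_erasure(tracklet, num_frames=8, p=0.5,
                                  area_ratio=(0.02, 0.2), aspect=(0.3, 3.3)):
    """tracklet: (T, C, H, W) tensor, T >= num_frames; returns (num_frames, C, H, W)."""
    T = tracklet.size(0)
    # Restricted random sampling: split the tracklet into equal chunks and
    # draw one frame per chunk, preserving temporal order.
    idx = [random.randint(i * T // num_frames, (i + 1) * T // num_frames - 1)
           for i in range(num_frames)]
    frames = tracklet[idx].clone()
    _, _, H, W = frames.shape
    for f in frames:
        if random.random() > p:
            continue  # leave this frame unerased
        for _ in range(10):  # retry until a valid rectangle fits
            area = random.uniform(*area_ratio) * H * W
            ratio = random.uniform(*aspect)
            h, w = int((area * ratio) ** 0.5), int((area / ratio) ** 0.5)
            if 0 < h < H and 0 < w < W:
                y, x = random.randint(0, H - h), random.randint(0, W - w)
                f[:, y:y + h, x:x + w] = torch.rand(f.size(0), h, w)
                break
    return frames
```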
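For the MI–TWA step, the abstract states that partial features are aggregated over time with different weights while global features receive equal weights, with the results concatenated into a sequence-level descriptor. The sketch below follows that structure but substitutes a cosine-similarity affinity for the paper's mutual-information score; the proxy, the `mi_twa` name, and the feature dimensions are assumptions for illustration.

```python
# Hypothetical sketch of mutual information–temporal weight aggregation
# (MI–TWA): weight each frame's partial features by their affinity to that
# frame's global feature, average global features with equal weights, and
# concatenate the two results into a sequence-level descriptor.
import torch
import torch.nn.functional as F

def mi_twa(global_feats, partial_feats):
    """global_feats: (T, D); partial_feats: (T, P, D) for P body parts."""
    T, P, D = partial_feats.shape
    # Stand-in for the mutual-information score: cosine similarity between
    # each partial feature and the same frame's global feature.
    sim = F.cosine_similarity(partial_feats,
                              global_feats.unsqueeze(1).expand(T, P, D), dim=-1)
    weights = torch.softmax(sim, dim=0)            # normalize over time, (T, P)
    part_seq = (weights.unsqueeze(-1) * partial_feats).sum(dim=0)  # (P, D)
    global_seq = global_feats.mean(dim=0)          # equal temporal weights, (D,)
    return torch.cat([global_seq, part_seq.flatten()], dim=0)      # (D + P*D,)
```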
650 0 4 |a Background clutter 
650 0 4 |a Deep learning 
650 0 4 |a Frame sampling–random erasure (FSE) 
650 0 4 |a Global feature 
650 0 4 |a Mutual information 
650 0 4 |a Mutual information–temporal weight aggregation (MI–TWA) 
650 0 4 |a Partial occlusion 
650 0 4 |a Person re‐identification 
650 0 4 |a Security systems 
650 0 4 |a Video person re‐identification 
650 0 4 |a Video surveillance 
700 1 |a Li, J.  |e author 
700 1 |a Piao, Y.  |e author 
773 |t Sensors