TIPCB: A simple but effective part-based convolutional baseline for text-based person search


Bibliographic Details
Main Authors: Chen, Y. (Author), Lu, Y. (Author), Wang, Z. (Author), Zhang, G. (Author), Zheng, Y. (Author)
Format: Article
Language: English
Published: Elsevier B.V. 2022
Subjects:
Online Access: View Fulltext in Publisher
LEADER 02453nam a2200397Ia 4500
001 10.1016-j.neucom.2022.04.081
008 220517s2022 CNT 000 0 und d
020 |a 0925-2312 (ISSN) 
245 1 0 |a TIPCB: A simple but effective part-based convolutional baseline for text-based person search 
260 0 |b Elsevier B.V.  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1016/j.neucom.2022.04.081 
520 3 |a Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description. The significant feature gap between the two modalities makes this task very challenging. Many existing methods attempt to utilize local alignment to address this problem at the fine-grained level. However, most relevant methods introduce additional models or complicated training and evaluation strategies, which are hard to use in realistic scenarios. To facilitate practical application, we propose a simple but effective baseline for text-based person search named TIPCB (i.e., Text-Image Part-based Convolutional Baseline). First, a novel dual-path local alignment network structure is proposed to extract visual and textual local representations, in which images are segmented horizontally and texts are aligned adaptively. Then, we propose a multi-stage cross-modal matching strategy, which eliminates the modality gap at three feature levels: low level, local level and global level. Extensive experiments are conducted on the widely used benchmark datasets (CUHK-PEDES and ICFG-PEDES) and verify that our method outperforms existing methods. Our code has been released at https://github.com/OrangeYHChen/TIPCB. © 2022 Elsevier B.V. 
650 0 4 |a adult 
650 0 4 |a article 
650 0 4 |a Convolution 
650 0 4 |a Cross-modality 
650 0 4 |a Fine grained 
650 0 4 |a human 
650 0 4 |a Image retrieval 
650 0 4 |a Local alignment 
650 0 4 |a Local representation 
650 0 4 |a Part based 
650 0 4 |a Person search 
650 0 4 |a Simple++ 
650 0 4 |a Subtask 
650 0 4 |a Text images 
650 0 4 |a Textual description 
700 1 |a Chen, Y.  |e author 
700 1 |a Lu, Y.  |e author 
700 1 |a Wang, Z.  |e author 
700 1 |a Zhang, G.  |e author 
700 1 |a Zheng, Y.  |e author 
773 |t Neurocomputing
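The abstract's core visual idea — segmenting the image feature map horizontally into part-level local representations — can be illustrated with a minimal sketch. This is not the authors' released code (see their GitHub repository for that); it is a generic PCB-style horizontal part-pooling example in NumPy, with illustrative dimensions (a ResNet-50-like feature map for a 384×128 pedestrian image) chosen for the sake of the example.

```python
import numpy as np

def horizontal_part_pool(feat_map, num_parts):
    """Split a (C, H, W) feature map into num_parts horizontal stripes
    and average-pool each stripe into one C-dimensional local feature.

    This sketches the part-based segmentation described in the abstract;
    the actual TIPCB model adds textual alignment and multi-level matching.
    """
    C, H, W = feat_map.shape
    assert H % num_parts == 0, "this simple sketch needs H divisible by num_parts"
    stripe_h = H // num_parts
    parts = []
    for k in range(num_parts):
        # take one horizontal stripe of the feature map
        stripe = feat_map[:, k * stripe_h:(k + 1) * stripe_h, :]
        # average-pool the stripe spatially -> one local representation
        parts.append(stripe.mean(axis=(1, 2)))
    return np.stack(parts)  # shape: (num_parts, C)

# Illustrative dimensions: 2048-channel map of size 24x8 (assumed, not from the paper)
feat = np.random.rand(2048, 24, 8)
local_feats = horizontal_part_pool(feat, num_parts=6)
print(local_feats.shape)  # (6, 2048): six part-level local representations
```

Each of the six pooled vectors serves as a local visual representation that the text branch can be aligned against, which is the intuition behind matching at the local level in addition to the global level.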