Researchers develop algorithm to solve text-based person search problems


Chenyang GAO | 10/18/2022

Text-based person search aims to retrieve a target person from an image gallery using a sentence that describes that person. It is an important and challenging problem with numerous real-world applications, such as intelligent video surveillance and criminal investigation.
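As a rough illustration of the retrieval task itself (not of the researchers' model), the sketch below ranks gallery images by cosine similarity to a query vector. It assumes the sentence and every gallery image have already been encoded into vectors of the same length; the function names, image ids, and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query_vec, gallery):
    """Return gallery image ids sorted from most to least similar to the query.

    `gallery` maps an image id to its feature vector. Both the ids and the
    vectors are placeholders for whatever encoders a real system would use.
    """
    scored = [(cosine(query_vec, vec), image_id) for image_id, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Toy data: a hypothetical text embedding and three hypothetical image embeddings.
query = [1.0, 0.0, 1.0]
gallery = {
    "img_a": [0.9, 0.1, 1.1],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [1.0, 1.0, 0.0],
}
ranking = rank_gallery(query, gallery)
print(ranking)
```

In a real system the vectors would come from learned text and image encoders, and the quality of the ranking depends on how well those encoders align the two modalities.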

With the increasing demand for public security, video surveillance technology has been widely deployed, producing massive amounts of surveillance data. However, finding criminal suspects in such large-scale video data is very difficult, so algorithms that automatically search for target pedestrians have important practical value.

Researchers from the Department of Computer Science and Engineering (CSE) at the Southern University of Science and Technology (SUSTech) have recently proposed a Conditional Feature Learning based Transformer to learn a better feature distribution. Their work, entitled “Conditional Feature Learning based Transformer for Text-Based Person Search,” has been published in IEEE Transactions on Image Processing, a flagship international journal in the computer vision field.

They noticed that most previous Transformer-based methods simply concatenate image region features and text features as input and learn a cross-modal representation in a brute-force manner. Such weakly supervised learning approaches fail to explicitly build alignment between image region features and text features, which leads to an inferior feature distribution. To address this issue, they proposed a novel Conditional Feature Learning based Transformer (Figure 1) that explicitly builds alignment between image regions and words. For each image region or word, the Transformer outputs a score that measures how well it matches the other modality. The experimental results show that their proposed method (Figure 2) is significantly more accurate than state-of-the-art methods, which is of great significance to research on Transformers for fine-grained cross-modal retrieval.
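To make the idea of a per-region and per-word matching score concrete, here is a minimal sketch. It is not the published method: in the paper the scores are produced by the Transformer, whereas this stand-in assumes a simple hand-written rule, namely that each region (or word) is scored by its highest cosine similarity to any item of the other modality. The function names and toy feature vectors are hypothetical, and both inputs are assumed to be non-empty.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_scores(region_feats, word_feats):
    """Score every image region and every word against the other modality.

    Assumed rule (not from the paper): an item's score is the maximum cosine
    similarity between it and any item of the other modality.
    """
    # sim[i][j] is the similarity between region i and word j.
    sim = [[cosine(r, w) for w in word_feats] for r in region_feats]
    region_scores = [max(row) for row in sim]
    word_scores = [
        max(sim[i][j] for i in range(len(region_feats)))
        for j in range(len(word_feats))
    ]
    return region_scores, word_scores


# Toy 2-D features: three hypothetical image regions and two hypothetical words.
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
words = [[1.0, 0.0], [0.0, 2.0]]
region_scores, word_scores = matching_scores(regions, words)
print(region_scores, word_scores)
```

Here the third region points away from both words and so receives the lowest score, which is the kind of signal an explicit alignment is meant to expose: regions or words with no counterpart in the other modality can be identified and down-weighted.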

Figure 1: Conditional Feature Learning based Transformer

Figure 2: Overall framework

Chenyang Gao, an undergraduate student in the Department of CSE at SUSTech, is the first author of this paper. Assoc. Prof. Feng Zheng from the Department of CSE at SUSTech is the corresponding author.

This work was supported by the 2020 Tencent Rhino-Bird Elite Training Program. With the support of Prof. Feng Zheng, Chenyang Gao was selected for the Tencent Rhino-Bird Elite Training Program in 2020.

Every year, undergraduates of the Department of CSE at SUSTech publish their research results in top international journals or conferences. The undergraduate computer science majors at SUSTech place a heavy emphasis on problem-solving and computational thinking skills rather than on book knowledge alone.

Paper link: https://ieeexplore.ieee.org/document/9893017

