Abstract: In real-world surveillance scenarios, pedestrian re-identification faces numerous challenges: partial occlusions (by trees, people, cars, small objects, etc.) cause the loss of key information and degrade recognition accuracy. To address the low recognition accuracy of occluded pedestrian re-identification, existing methods commonly combine local and global features or rely on pose estimators. Although single-stream networks can achieve good recognition performance under partial occlusion, they fail to fully exploit the remaining critical feature information. We therefore propose an occluded pedestrian re-identification method based on a multi-granularity dual-stream network, which strengthens the extraction of key features through three components: a multi-granularity local feature extraction strategy, a dual-stream feature processing network, and a feature weight fusion module. The method employs a Vision Transformer (ViT) to extract global features and partitions them into multiple groups of local features. Each group is then processed by the dual-stream feature processing network, and the resulting features are fused via a feature weight fusion mechanism, mining key feature information more effectively. Experimental results on the Occluded-Duke, Market-1501, DukeMTMC-reID, and MSMT17 datasets demonstrate the effectiveness of the proposed method, which achieves mAP/Rank-1 of 61.3%/68.3%, 89.0%/95.2%, 82.5%/91.1%, and 66.8%/84.5%, respectively.
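The pipeline sketched in the abstract (global ViT features, multi-granularity grouping, dual-stream processing per group, weighted fusion) might be outlined as follows. This is a minimal NumPy sketch with illustrative stand-ins: the group count, token shapes, both stream operations, and the norm-based fusion weights are assumptions for illustration, not the authors' actual architecture.

```python
import numpy as np

def split_into_groups(patch_tokens, num_groups):
    """Divide ViT patch tokens into groups of local features (assumed even split)."""
    return np.array_split(patch_tokens, num_groups, axis=0)

def dual_stream(group):
    """Process one local-feature group through two parallel streams.
    Both streams are toy stand-ins for the paper's (unspecified) branches."""
    stream_a = np.tanh(group.mean(axis=0))        # pooling-style stream
    stream_b = np.maximum(group, 0.0).max(axis=0) # max-response-style stream
    return stream_a, stream_b

def weighted_fusion(a, b):
    """Fuse the two stream outputs with softmax-normalized weights
    (here derived from feature norms; the real weights would be learned)."""
    w = np.exp(np.array([np.linalg.norm(a), np.linalg.norm(b)]))
    w = w / w.sum()
    return w[0] * a + w[1] * b

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 patch tokens, 8-dim embeddings (toy sizes)
groups = split_into_groups(tokens, 4)  # 4 multi-granularity local groups
fused = [weighted_fusion(*dual_stream(g)) for g in groups]
descriptor = np.concatenate(fused)     # final pedestrian descriptor
print(descriptor.shape)
```

Concatenating the fused group features yields one descriptor per pedestrian image, which would then feed a metric-learning or classification head during training.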