Abstract: Current cross-modal person re-identification research focuses on extracting modality-shared features from global or local features, supervised by identity labels, to reduce modality differences, but it ignores subtle discriminative features. This paper proposes a feature-enhanced clustering learning (FECL) network. The network mines and enhances the subtle features of different modalities through global and local features, and combines them with a multilevel joint clustering learning strategy to minimize modality differences and intra-class variation. In addition, this paper designs a random color transition module for the training data, which increases the interaction between modalities at the image input stage to overcome the influence of color deviation. Experiments on public datasets verify the effectiveness of the proposed method. In the all-search mode of the SYSU-MM01 dataset, Rank-1 and mAP reach 70.52% and 64.02%, respectively. In the V2I (visible-to-infrared) retrieval mode of the RegDB dataset, Rank-1 and mAP reach 88.88% and 80.93%, respectively.
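
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Rank-1 accuracy and mean average precision (mAP) are commonly computed from a query-to-gallery distance matrix. It is not the evaluation code used in this paper: the function name `rank1_and_map` and its arguments are hypothetical, and the sketch assumes a simplified setting that ignores the camera-based gallery filtering used in the official SYSU-MM01 and RegDB protocols.

```python
from typing import Sequence, Tuple


def rank1_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1, mAP) as fractions in [0, 1].

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Hypothetical helper for illustration;
    no camera-ID filtering is applied.
    """
    rank1_hits = 0
    ap_sum = 0.0
    valid_queries = 0

    for q, row in enumerate(dist):
        # Sort gallery indices from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]

        num_positives = sum(matches)
        if num_positives == 0:
            # A query with no true match in the gallery is skipped.
            continue
        valid_queries += 1

        # Rank-1: is the top-ranked gallery item a correct identity?
        rank1_hits += int(matches[0])

        # Average precision: mean of precision values at each correct hit.
        hits = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / rank
        ap_sum += precision_sum / num_positives

    if valid_queries == 0:
        raise ValueError("no query has a matching gallery identity")

    return rank1_hits / valid_queries, ap_sum / valid_queries


if __name__ == "__main__":
    # Toy example: 2 queries, 3 gallery items.
    toy_dist = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.3]]
    r1, mean_ap = rank1_and_map(toy_dist, [1, 2], [1, 2, 2])
    print(f"Rank-1 = {r1:.4f}, mAP = {mean_ap:.4f}")
```

Multiplying the returned fractions by 100 gives percentages in the form reported above.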