Abstract: Railway defect detection faces several challenges. The rail surface has a complex texture and is subject to severe background noise, which makes defects difficult to detect; defects vary widely in type and morphology, so traditional detection methods struggle to capture all feature details at once; and small defects have inconspicuous features and are therefore often missed. To address these issues, this paper proposes a novel semantic segmentation network that integrates a multi-level parallel attention mechanism with multi-scale information fusion to improve defect segmentation accuracy. In the encoder, stacked Inverted Bottleneck Convolutions and Fused Inverted Bottleneck Convolutions improve the efficiency of feature extraction and encoding. The decoder incorporates a multi-level parallel pixel attention module (PAM) that enables the network to focus on and localize defect regions despite considerable background noise. In addition, a pyramid pooling module (PPM) captures multi-scale contextual information, strengthening the model’s ability to extract both local and global features. A multi-scale spatial information fusion strategy then combines the outputs of PAM and PPM so that feature representations from different levels are fully exploited. Experimental evaluations on the NRSD-MN dataset show that the proposed method achieves mPA values of 0.8364 and 0.7258 and mIoU scores of 0.6858 and 0.6342 on the Craft and Real data subsets, respectively. These results indicate that the proposed network outperforms existing models in railway track surface defect segmentation in terms of accuracy and robustness.
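The abstract reports mPA and mIoU without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions (mean per-class pixel accuracy and mean per-class intersection over union), the two metrics could be computed as follows. The function names and the toy labels are illustrative assumptions, not the paper's evaluation code, and the paper's exact protocol (for example, whether the background class is counted) is not stated here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_pixel_accuracy(cm):
    """mPA: average over classes of TP / (pixels whose ground truth is that class)."""
    accs = []
    for c, row in enumerate(cm):
        total = sum(row)
        if total:  # skip classes absent from the ground truth
            accs.append(row[c] / total)
    return sum(accs) / len(accs) if accs else 0.0


def mean_iou(cm):
    """mIoU: average over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other pixels predicted as class c
        union = tp + fp + fn
        if union:  # skip classes that appear in neither labels nor predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened per-pixel labels (hypothetical data, 3 classes).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(mean_pixel_accuracy(cm), mean_iou(cm))
```

Under these definitions mIoU penalizes false positives as well as missed pixels, whereas mPA only measures how much of each ground-truth class is recovered, which is consistent with the mIoU scores in the abstract being lower than the corresponding mPA values.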