Abstract: To address low target segmentation accuracy and poor mask quality in traffic scenes, this paper proposes ETIS-YOLO, an efficient traffic instance segmentation algorithm based on an improved YOLOv11n. First, a C3k2-WTConv module is constructed by fusing wavelet transform convolution (WTConv) into the C3k2 module of the backbone network, which efficiently expands the receptive field and enhances low-frequency feature extraction. Second, a feature interaction enhancement module, AIFI-LA, is designed to reduce the multi-scale computational redundancy of spatial pyramid pooling-fast (SPPF) and to improve the handling of long sequences and the preservation of key feature information. Third, a feature recalibration module, EMCSA, is proposed and embedded into the content-aware reassembly of features (CARAFE) up-sampling operator to form the CARAFE-EMCSA module, which reconstructs the up-sampling process to strengthen the capture of contextual features and the overall discriminability of the feature maps. Finally, Soft-NMS and DIoU-NMS are fused to replace the original non-maximum suppression (NMS); the fused scheme uses relative position information to refine box selection and improve bounding-box accuracy while retaining more high-quality bounding boxes. Experimental results show that, compared with the YOLOv11n model, on the Cityscapes dataset the bounding-box mAP@0.5 and mAP@0.5:0.95 improve by 9.2% and 8.5%, and the segmentation-mask mAP@0.5 and mAP@0.5:0.95 improve by 10.6% and 8.8%, respectively; on the BDD100K dataset, the bounding-box mAP@0.5 and mAP@0.5:0.95 improve by 5.1% and 7.4%, and the segmentation-mask mAP@0.5 and mAP@0.5:0.95 improve by 4.5% and 6.6%, respectively. These results indicate that the proposed method is effective for traffic scene segmentation.
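The abstract does not specify how Soft-NMS and DIoU-NMS are fused, so the following is only an illustrative sketch of one plausible combination: the Gaussian score decay of Soft-NMS is driven by DIoU (IoU minus a normalized center-distance penalty) instead of plain IoU. The function names `diou` and `soft_diou_nms`, the `sigma` and `score_thresh` defaults, and the clipping of negative DIoU to zero are assumptions made for illustration, not details taken from ETIS-YOLO.

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an (N, 4) array of boxes in (x1, y1, x2, y2) format."""
    # Intersection area
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Squared distance between box centers
    center_dist = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 \
                + ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes
    ex1 = np.minimum(box[0], boxes[:, 0])
    ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2])
    ey2 = np.maximum(box[3], boxes[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - center_dist / diag

def soft_diou_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Hypothetical fusion: Soft-NMS Gaussian decay with DIoU as the overlap measure."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the highest-scoring remaining box
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Assumption: negative DIoU (distant boxes) is clipped so it causes no decay
        overlap = np.clip(diou(boxes[best], boxes[remaining]), 0, None)
        # Decay neighbor scores instead of discarding them outright
        scores[remaining] *= np.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose decayed score falls below the threshold
        remaining = remaining[scores[remaining] > score_thresh]
    keep = np.array(keep, dtype=int)
    return keep.tolist(), scores[keep]

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
kept, kept_scores = soft_diou_nms(boxes, scores)
print(kept, kept_scores)
```

In this toy input, the second box overlaps the first heavily, so its score is decayed rather than removed, while the distant third box is left unchanged. This mirrors the abstract's stated aim of retaining more high-quality boxes while using relative position information during selection.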