Abstract: Metal surface defect detection in industrial production suffers from low accuracy, false positives, and missed detections. The task is further complicated by small targets that are hard to distinguish, strong interference from complex backgrounds, and significant noise, which particularly affect small-target detection and multi-scale feature extraction. To address these issues, this study proposes GMS2-YOLO, a model that improves defect detection accuracy by combining multi-scale feature extraction, enhanced feature fusion, and a separated batch normalization detection head. First, GhostConv is combined with the HGBlock module to form the GHGnet backbone, which improves the model's ability to extract fine-grained features of defects. Second, MSNet, a novel fusion of MANet and StarBlock, replaces the C3k2 module in the neck, providing more diverse and richer gradient flow and strengthening the model's feature extraction and fusion. Third, the YOLOv11n detection head is redesigned as a separated batch normalization detection head, which lets information from different levels interact and fuse so that defect targets are identified accurately. Finally, the WIoU loss function is adopted; by paying more attention to medium-quality anchor boxes, it further improves detection accuracy. Experimental results show that on the Self-Dataset, the proposed method achieves a mAP@0.5 of 80.8%, 3.6% higher than YOLOv11n, and a mAP@0.5:0.95 of 53.5%, 8.5% higher than YOLOv11n. On the NEU-DET dataset, the AP for the small defect class CR (crazing) reaches 68.3%, 15.8% higher than YOLOv11n.
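To illustrate how a WIoU loss can shift attention toward medium-quality anchor boxes, the following is a minimal Python sketch of WIoU v1 combined with the v3 dynamic non-monotonic focusing coefficient from the Wise-IoU formulation. It is not the GMS2-YOLO implementation, and the abstract does not state which WIoU version is used; v3 is assumed here because it is the variant that targets ordinary-quality boxes. The `Box` class and all function names are hypothetical, the defaults α = 1.9 and δ = 3 are assumed, and the running mean of the IoU loss used during training is replaced by a simple per-batch mean.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical helper: axis-aligned box with center (cx, cy), width w, height h."""
    cx: float
    cy: float
    w: float
    h: float

    def corners(self):
        return (self.cx - self.w / 2, self.cy - self.h / 2,
                self.cx + self.w / 2, self.cy + self.h / 2)


def iou(a: Box, b: Box) -> float:
    ax1, ay1, ax2, ay2 = a.corners()
    bx1, by1, bx2, by2 = b.corners()
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box):
    """Return (WIoU v1 loss, plain IoU loss) for one prediction/target pair."""
    l_iou = 1.0 - iou(pred, gt)
    px1, py1, px2, py2 = pred.corners()
    gx1, gy1, gx2, gy2 = gt.corners()
    # Size of the smallest enclosing box; in training this term is detached from the gradient.
    wg = max(px2, gx2) - min(px1, gx1)
    hg = max(py2, gy2) - min(py1, gy1)
    dist2 = (pred.cx - gt.cx) ** 2 + (pred.cy - gt.cy) ** 2
    r_wiou = math.exp(dist2 / (wg ** 2 + hg ** 2 + 1e-9))  # center-distance penalty
    return r_wiou * l_iou, l_iou


def focusing_coefficient(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Non-monotonic gain r = beta / (delta * alpha**(beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(preds, gts, alpha=1.9, delta=3.0):
    """Per-box WIoU v3 losses; the batch mean of L_IoU stands in for the running mean."""
    pairs = [wiou_v1(p, g) for p, g in zip(preds, gts)]
    if not pairs:
        return []
    mean_l_iou = sum(l for _, l in pairs) / len(pairs)
    if mean_l_iou == 0.0:
        return [0.0 for _ in pairs]  # every box is perfect, nothing to penalize
    losses = []
    for loss_v1, l_iou in pairs:
        beta = l_iou / mean_l_iou  # outlier degree: large beta means a low-quality box
        losses.append(focusing_coefficient(beta, alpha, delta) * loss_v1)
    return losses
```

With α = 1.9 and δ = 3, the coefficient equals 1 at β = δ and peaks near β = 1/ln α ≈ 1.56. Boxes of medium quality therefore receive the largest gain, while both high-quality boxes (small β) and very poor boxes (large β, often caused by noisy labels) are down-weighted.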
The proposed model substantially improves defect detection accuracy and clearly reduces both false and missed detections. Moreover, compared with other mainstream models, it achieves higher detection accuracy without a significant loss of inference speed, indicating good prospects for engineering applications.