Abstract: Traditional methods for detecting surface defects on steel strips have limited feature extraction capability, restricted detection accuracy, and high computational resource consumption. To address these limitations, this study proposes ESE-YOLO, a YOLOv8-based model for detecting surface defects on steel strips. First, to strengthen edge feature extraction, an efficient front-end module, EIEStem, is introduced. It uses a SobelConv branch to extract edge information from the image and a pooling branch to capture essential spatial information, which improves the model's perception of defect regions. Second, shift-wise convolution is integrated into the C2f module of the backbone network to form the C2f_SWC module. Its shift operations enlarge the model's receptive field, strengthening its ability to capture contextual information and further improving the accuracy of spatial feature extraction. Third, the EMBSFPN module is adopted to optimize the structure of the feature pyramid network. This module adaptively selects convolutional kernels of different sizes for different feature layers, progressively acquiring multi-scale perceptual information, and fuses features from different scales using importance weights. This raises detection accuracy while substantially reducing the model's parameter count and computational cost. Experimental results show that, compared with the original YOLOv8n, ESE-YOLO improves mAP by 4.1% on the NEU-DET dataset while reducing the number of parameters by 26.8% and floating-point operations by 64%. On the GC10-DET dataset, ESE-YOLO improves mAP by 9.9%. ESE-YOLO therefore improves detection accuracy while greatly reducing computational resource requirements, making it better suited to deployment on resource-constrained devices in industrial settings.
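To illustrate the kind of edge information a Sobel branch extracts, the following is a minimal sketch in plain Python. It applies the standard fixed 3x3 Sobel kernels to a single-channel image stored as a list of lists and returns the gradient magnitude. This is an assumption-laden illustration of the underlying operation only: it is not the authors' SobelConv or EIEStem implementation, and it omits the pooling branch, channel handling, and any learned layers. The helper names `correlate3x3` and `sobel_magnitude` are hypothetical.

```python
# Sketch of Sobel edge extraction (assumption: single-channel image as a
# list of lists of numbers). NOT the ESE-YOLO SobelConv module itself.

# Standard Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(img, kernel):
    """Valid (no padding) 3x3 cross-correlation, as conv layers compute it."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * img[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def sobel_magnitude(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2); large values mark edges."""
    gx = correlate3x3(img, SOBEL_X)
    gy = correlate3x3(img, SOBEL_Y)
    return [[(a * a + b * b) ** 0.5 for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = [[0, 0, 10, 10, 10] for _ in range(5)]
    for row in sobel_magnitude(step):
        print(row)  # responses are nonzero only where the window spans the edge
```

For the 5x5 step image above, each output row is `[40.0, 40.0, 0.0]`: windows that straddle the intensity jump respond strongly, while the uniform region responds with zero. Fixed kernels like these make edge cues explicit at the network input, which is the motivation the abstract gives for the SobelConv branch.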