Abstract: Constrained by the single-channel structure of grayscale images, target contrast is low, feature information is indistinct, and color information is absent; as a result, detection accuracy is low and detection is difficult. To improve detection accuracy on grayscale images and reduce false and missed detections, an object detection algorithm, SAC-YOLO, combining dual observation with attention mechanisms, is proposed. First, switchable atrous convolution is integrated into the backbone network, converting standard convolution layers into atrous convolution layers, and is combined with a global context module to improve the model's accuracy on information of different scales and complexities. Second, the feature fusion stage employs an efficient multi-scale attention mechanism that recalibrates the weight of each channel by encoding global information and interactively captures cross-dimensional pixel-level relationships in grayscale images. Finally, a super-resolution reconstruction detection head is added, and a receptive-field attention module and a convolution module are constructed to focus on spatial information within the receptive field and provide effective attention weights for large convolution kernels, enabling the model to adapt to and represent small-target information in grayscale images more precisely. Comparison experiments on the NEU-DET dataset show that the improved YOLOv8 algorithm reaches a recognition accuracy of 79.3% on grayscale images, 3.1 percentage points higher than the original YOLOv8 network, and visualization experiments show that false and missed detections are alleviated. These results indicate that SAC-YOLO delivers excellent detection performance and achieves high-quality detection in grayscale image scenarios.
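The abstract's first improvement rests on atrous (dilated) convolution, which spaces the kernel taps apart so the same number of weights covers a wider context. The following minimal 1-D sketch (illustrative only, not the paper's implementation) shows how a dilation rate d enlarges the receptive field of a size-k kernel from k to k + (k - 1)(d - 1):

```python
# Sketch: a 1-D dilated ("atrous") convolution in pure Python.
# With dilation d, a kernel of size k spans (k - 1) * d + 1 input samples
# while still using only k weights.

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D convolution with taps spaced `dilation` apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += w * signal[start + j * dilation]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
k = [1.0, 1.0, 1.0]                        # simple summing kernel

print(dilated_conv1d(x, k, dilation=1))    # spans 3 samples: [6.0, 9.0, 12.0, 15.0, 18.0]
print(dilated_conv1d(x, k, dilation=2))    # spans 5 samples: [9.0, 12.0, 15.0]
```

The dilation=2 output aggregates context five samples wide with the same three weights, which is why converting standard convolution layers to atrous ones helps the backbone handle features at different scales without extra parameters.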
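The second improvement recalibrates channel weights by encoding global information. A minimal sketch of that general idea (illustrative only, not the paper's efficient multi-scale attention module) squeezes each channel of a C x H x W feature map to one scalar by global average pooling, maps it to a (0, 1) gate, and rescales the channel by that gate:

```python
# Sketch: channel recalibration from global statistics.
# Each channel is pooled to a single value, gated through a sigmoid,
# and multiplied back into that channel's activations.

import math

def global_avg_pool(fmap):
    """fmap: list of C channels, each an H x W list of lists -> C scalars."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def recalibrate(fmap):
    """Scale every channel by a gate derived from its global mean."""
    gates = [sigmoid(s) for s in global_avg_pool(fmap)]
    return [[[x * g for x in row] for row in ch] for ch, g in zip(fmap, gates)]

# Two 2x2 channels: a flat one and a strongly activated one.
feat = [
    [[0.0, 0.0], [0.0, 0.0]],   # mean 0 -> gate 0.5
    [[2.0, 2.0], [2.0, 2.0]],   # mean 2 -> gate ~0.88
]
out = recalibrate(feat)
print(out[1][0][0])             # strong channel is kept near full strength
```

A learned module would insert small fully connected or convolutional layers between the pooling and the gate; the point here is only the squeeze-then-reweight pattern by which global context adjusts per-channel importance in single-channel-derived grayscale features.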