Abstract: Small object detection in drone images is a key and difficult research area. Compared with large targets, small targets have fewer features and are more susceptible to interference from occlusion and complex backgrounds. To address this issue, a multi-model fusion object detection network, YOLO-DA, based on YOLOv7-tiny is proposed. First, detection layers for small and extremely small targets are added to enhance the network's ability to learn small-target features. Second, a spatially adaptive feature fusion (ASFF-L) detection head is introduced, which suppresses the inconsistency among features at different scales by learning to spatially filter conflicting information, achieving adaptive fusion of multi-scale features. Finally, DCNS deformable convolution is introduced together with a modulation mechanism that expands the range of deformable modeling, strengthens the model's modeling capability, and reduces the impact of occlusion and overlap on detection. Experiments show that the proposed method achieves an average precision of 44.7% and an inference speed of 71 FPS on the VisDrone2019 dataset; average precision is improved by 9.7% over the baseline algorithm, and the model size is 63.8 M, enabling real-time detection. Ablation and comparative experiments show that YOLO-DA significantly reduces false positives and false negatives in drone aerial image detection and delivers higher detection performance. Moreover, its parameter count and computational complexity meet the real-time detection requirements of edge devices such as drones.
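
To make the adaptive fusion idea concrete, the following is a minimal PyTorch sketch of ASFF-style per-pixel weighted fusion across three feature levels; the class name ASFFFusion, the channel width, and the layer choices are illustrative assumptions rather than the exact ASFF-L head used in YOLO-DA.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFFusion(nn.Module):
    # Illustrative ASFF-style fusion: resize three pyramid levels to a common
    # resolution, predict per-pixel weights, and take a softmax-weighted sum.
    # A sketch under assumed shapes, not the paper's implementation.
    def __init__(self, channels):
        super().__init__()
        # One 1x1 conv per level produces a single-channel weight map
        self.weight_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)
        )
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: three tensors [B, C, Hi, Wi] from different pyramid levels
        target_size = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        # Per-pixel fusion weights, normalized across the three levels
        weights = torch.softmax(
            torch.cat([conv(f) for conv, f in zip(self.weight_convs, resized)], dim=1),
            dim=1,
        )  # [B, 3, H, W]
        fused = sum(weights[:, i:i + 1] * resized[i] for i in range(3))
        return self.out_conv(fused)

# Example: fuse P3/P4/P5-like features onto the highest-resolution level
fusion = ASFFFusion(channels=128)
p3, p4, p5 = (torch.randn(1, 128, s, s) for s in (80, 40, 20))
out = fusion([p3, p4, p5])  # -> [1, 128, 80, 80]

Because the fusion weights are learned per spatial position, a level whose features conflict at a given location can be suppressed there while still contributing elsewhere, which is the mechanism the abstract refers to as spatially filtering conflicting information.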