Abstract: To address the challenges of detecting small, densely distributed targets with large scale variations in UAV aerial images, a cross-scale target detection model for UAV aerial images, named CS-YOLOv5s, is proposed. First, building on YOLOv5s, a micro-object detector is used to improve the model's ability to capture small targets. Then, a max-pooling branch is embedded into the context augmentation module, which extracts and enhances deep feature maps at the tail of the backbone network. PANet is then used to fuse deep and shallow features effectively, enhancing the cross-scale detection capability. Furthermore, the down-sampling convolution module is replaced with the SPDConv module to detect dense objects in UAV aerial images efficiently. Experiments demonstrate that CS-YOLOv5s achieves 42.0% mAP0.5 on the VisDrone2019 dataset, an improvement of 9.8% over the baseline model. The model effectively enhances the network's ability to recognize small targets in UAV aerial images and provides a new approach to intelligent target recognition for UAVs.
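As an illustration of the idea behind replacing strided down-sampling with SPDConv, the following minimal sketch shows the space-to-depth rearrangement that such a module generally builds on: spatial resolution is halved by moving pixels into extra channels instead of discarding them. This is not the authors' implementation. The function name `space_to_depth`, the nested-list tensor layout, and the channel ordering are assumptions made for illustration, and the non-strided convolution that normally follows is omitted.

```python
def space_to_depth(x, scale=2):
    """Rearrange a feature map so that spatial detail moves into channels.

    x is a list of channels, each an H x W list of lists (hypothetical layout).
    Returns scale*scale*C channels, each of size (H/scale) x (W/scale).
    No value is dropped, unlike strided convolution or pooling.
    """
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # One sub-map per (row offset, column offset) pair; ordering is an assumption.
    for dy in range(scale):
        for dx in range(scale):
            for channel in x:
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


# A single-channel 4 x 4 feature map holding the values 0..15.
feature_map = [[[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 9, 10, 11],
                [12, 13, 14, 15]]]
blocks = space_to_depth(feature_map)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```

The 4 x 4 single-channel input becomes four 2 x 2 channels that together still contain all 16 original values, which is why this kind of down-sampling is considered friendlier to small, densely packed objects than stride-2 convolution.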