Abstract: Small object detection in UAV aerial imagery faces three critical challenges: extremely small target sizes, complex background interference, and insufficient feature representation. To address the limitations of existing RT-DETR models in small object feature extraction and multi-scale fusion, this paper proposes an adaptive multi-scale gated enhancement fusion DETR (MGEF-DETR). A multi-order cross-stage gated aggregation (MCGA) module selectively enhances small object texture features through adaptive gating mechanisms. A Micro-OmniPyramid feature pyramid integrates space-to-depth (SPD) convolution sparse encoding and cross-stage enhanced spectral kernel (CESK) modules, establishing lossless transmission pathways for small object features. An enhanced feature correlation (EFC) module optimizes cross-scale feature fusion through grouped attention and multi-level reconstruction strategies. An inner-modified penalty distance IoU (IMIoU) loss function enhances boundary regression sensitivity for small objects. Experimental results on the VisDrone2019 dataset show that MGEF-DETR improves mAP@0.5 and mAP@0.5:0.95 by 3.9% and 3.1%, respectively, over the baseline RT-DETR while reducing parameters by 13.6%. Validation on the TinyPerson and CODrone datasets further confirms the generalization capability of the algorithm, indicating significant gains in both accuracy and efficiency for small object detection in aerial scenarios while the model remains lightweight.
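The "lossless transmission" attributed to the SPD component can be illustrated with the generic space-to-depth rearrangement, which halves spatial resolution by moving each 2x2 block of pixels into separate channels instead of discarding values as strided convolution or pooling would. The following is a minimal sketch of that generic operation on a single-channel map, not the paper's implementation: the function names `space_to_depth` and `depth_to_space` are hypothetical, and the convolution that would follow the rearrangement in an SPD layer is omitted.

```python
def space_to_depth(fmap, block=2):
    """Rearrange an H x W map into block*block maps of size (H/block) x (W/block).

    Illustrative helper (hypothetical name); every input value is kept.
    """
    h, w = len(fmap), len(fmap[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    channels = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            channels.append([[fmap[y][x] for x in range(dx, w, block)]
                             for y in range(dy, h, block)])
    return channels


def depth_to_space(channels, block=2):
    """Inverse rearrangement, used here only to check that nothing was lost."""
    h = len(channels[0]) * block
    w = len(channels[0][0]) * block
    fmap = [[None] * w for _ in range(h)]
    for idx, ch in enumerate(channels):
        dy, dx = divmod(idx, block)  # recover the offsets used above
        for y, row in enumerate(ch):
            for x, value in enumerate(row):
                fmap[y * block + dy][x * block + dx] = value
    return fmap


# A 4 x 4 toy feature map with values 0..15.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
channels = space_to_depth(fmap)
restored = depth_to_space(channels)
```

Because `restored` equals `fmap`, the downsampling step itself discards nothing; a target that occupies only a few pixels is redistributed across channels rather than skipped by a stride.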