UAV object detection method based on modality-guided selection and adaptive contrastive learning
Affiliation:

School of Automation, Harbin University of Science and Technology, Harbin 150080, China

CLC number:

TH74

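Fund Project:

Supported by the Natural Science Foundation of Heilongjiang Province (YQ2024E047) and the Heilongjiang Province Basic Research Support Program for Outstanding Young Teachers (YQJH2024067)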



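    Abstract (Chinese version):

    Existing multimodal object detection methods fall short in modality selection, spatial modeling, and cross-modal consistency constraints, and their detection accuracy drops markedly in challenging real-world scenarios such as low illumination, full or partial target occlusion, and cluttered backgrounds. To address these problems, a UAV object detection method based on modality guidance and adaptive contrastive learning is proposed. First, a modality-guided selection module is designed. It builds a global semantic-aware channel attention mechanism that dynamically evaluates the contribution of each modality and performs dynamic weight allocation and feature fusion, which resolves the imbalance in modality contributions that fixed-weight fusion strategies suffer from when the environment changes. Second, a modality enhancement module is proposed. Built on a single-branch spatial self-attention enhancement mechanism, it introduces local-global cooperative relative position biases into the standard multi-head self-attention structure and combines them with a normalized residual structure, strengthening the spatial structure perception of the fused features and thereby improving target discrimination in complex-background and occluded scenes. Finally, a detection-aware adaptive cross-modal contrastive learning strategy is proposed. Taking detection boxes as supervision units, it combines detection-aware weights with modality-adaptive temperature adjustment to explicitly constrain cross-modal features, reinforcing semantic consistency and improving model robustness. Experimental results show that the method reaches an mAP50 of 78.6% on the Drone Vehicle dataset and 98.3% on the LLVIP dataset, a clear improvement over existing methods. On a purpose-built UAV platform it achieves real-time inference at 12.61 fps, verifying the effectiveness and practical value of the method.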
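The abstract does not give the exact form of the global semantic-aware channel attention. As a minimal sketch, assuming an SKNet-style design (a descriptor pooled from both modalities, scored by a linear layer, then a softmax across the two modalities for each channel), the dynamic modality weighting could look like the code below. It uses plain Python lists instead of tensors to stay dependency-free; the names `modality_guided_fusion`, `w_rgb`, and `w_ir` are hypothetical, the scoring matrices stand in for learned layers, and this is not the authors' implementation.

```python
import math
from typing import List, Tuple

# A feature map is C channels, each a flattened list of H*W activations.
FeatureMap = List[List[float]]
Matrix = List[List[float]]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average every channel over its spatial positions (one value per channel)."""
    return [sum(ch) / len(ch) for ch in fmap]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product, standing in for a learned linear layer."""
    return [sum(wk * vk for wk, vk in zip(row, v)) for row in w]


def modality_guided_fusion(
    f_rgb: FeatureMap, f_ir: FeatureMap, w_rgb: Matrix, w_ir: Matrix
) -> Tuple[FeatureMap, List[Tuple[float, float]]]:
    """Fuse RGB and IR features with per-channel weights that sum to 1.

    w_rgb and w_ir are hypothetical C x C scoring matrices (learned in practice).
    Returns the fused map and the (rgb, ir) weight pair chosen for each channel.
    """
    # Global semantic descriptor pooled from both modalities together.
    summed = [[a + b for a, b in zip(cr, ci)] for cr, ci in zip(f_rgb, f_ir)]
    g = global_avg_pool(summed)
    # Each channel's score for each modality depends on the whole descriptor g.
    s_rgb, s_ir = matvec(w_rgb, g), matvec(w_ir, g)
    fused, weights = [], []
    for c in range(len(f_rgb)):
        # Softmax over the two modalities, max-subtracted for numerical stability.
        m = max(s_rgb[c], s_ir[c])
        e_rgb, e_ir = math.exp(s_rgb[c] - m), math.exp(s_ir[c] - m)
        a_rgb = e_rgb / (e_rgb + e_ir)
        a_ir = 1.0 - a_rgb
        weights.append((a_rgb, a_ir))
        fused.append([a_rgb * x + a_ir * y for x, y in zip(f_rgb[c], f_ir[c])])
    return fused, weights
```

With all-zero scoring matrices the weights collapse to (0.5, 0.5), which is exactly a fixed equal-weight fusion; non-zero matrices let each channel's weights shift toward whichever modality the pooled descriptor favors, which is the adaptive behavior the abstract contrasts with fixed-weight fusion.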

    Abstract (English version):

    Existing multimodal object detection methods have limitations in modality selection, spatial modeling, and cross-modal consistency, and their accuracy drops markedly under challenging conditions such as low illumination, target occlusion, and complex backgrounds. To address these issues, this paper proposes a UAV object detection method based on modality-guided selection and adaptive contrastive learning. First, a modality-guided selection module employs global semantic-aware channel attention to dynamically evaluate the contribution of each modality and fuse features with adaptive weights, addressing the modality imbalance that conventional fixed-weight fusion strategies suffer from when conditions change. Second, a modality enhancement module introduces local-global relative positional biases and normalized residual connections into a single-branch multi-head self-attention structure, enhancing the spatial perception of the fused features in complex and occluded scenes. Finally, a detection-aware adaptive cross-modal contrastive learning strategy takes detection boxes as supervision units and combines detection-aware weights with modality-adaptive temperature scaling to explicitly align multimodal features and improve semantic consistency. Experimental results show that the proposed method achieves an mAP50 of 78.6% on the Drone Vehicle dataset and 98.3% on the LLVIP dataset, outperforming existing approaches. Deployed on a real UAV platform, it runs in real time at 12.61 fps, validating both the accuracy and the practical utility of the framework.
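The abstract lists the ingredients of the contrastive strategy (detection boxes as supervision units, detection-aware weights, modality-adaptive temperature) but not their formulas. The sketch below shows one way they could fit together, assuming a symmetric InfoNCE loss over box-pooled RGB and IR embeddings, a detection-aware weight of confidence × IoU, and a temperature that rises linearly as a modality's reliability score falls. All three choices, the reliability scores, and every function name are hypothetical illustrations in plain Python, not the authors' loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def cosine_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between a[i] and b[j]."""
    an = [l2_normalize(v) for v in a]
    bn = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in bn] for u in an]


def detection_aware_weight(confidence: float, iou: float) -> float:
    """Hypothetical box weight: favor boxes that are both confident and well localized."""
    return confidence * iou


def adaptive_temperature(reliability: float, tau_min: float = 0.05, tau_max: float = 0.5) -> float:
    """Hypothetical mapping: a less reliable modality gets a higher (softer) temperature."""
    r = min(max(reliability, 0.0), 1.0)
    return tau_min + (tau_max - tau_min) * (1.0 - r)


def info_nce(sims: Sequence[float], pos: int, tau: float) -> float:
    """Cross-entropy of selecting index `pos` under softmax(sims / tau)."""
    logits = [s / tau for s in sims]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[pos]


def box_contrastive_loss(
    rgb_emb: Sequence[Vector],
    ir_emb: Sequence[Vector],
    box_weights: Sequence[float],
    rel_rgb: float,
    rel_ir: float,
) -> float:
    """Weighted symmetric InfoNCE; row i of rgb_emb and ir_emb come from the same box."""
    sim = cosine_matrix(rgb_emb, ir_emb)      # RGB box i vs IR box j
    sim_t = [list(col) for col in zip(*sim)]  # IR box i vs RGB box j
    tau_rgb, tau_ir = adaptive_temperature(rel_rgb), adaptive_temperature(rel_ir)
    total = norm = 0.0
    for i, w in enumerate(box_weights):
        # The same box in the other modality is the positive; other boxes are negatives.
        pair_loss = 0.5 * (info_nce(sim[i], i, tau_rgb) + info_nce(sim_t[i], i, tau_ir))
        total += w * pair_loss
        norm += w
    return total / norm if norm > 0 else 0.0
```

Because each box contributes in proportion to its detection-aware weight, poorly localized or low-confidence boxes pull less on the alignment, and the per-direction temperatures let the softmax be sharper when anchored on the more reliable modality.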

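Cite this article

SUN Mingxiao, YANG Zizhen, LI Chuanlong, LUAN Tiantian, LIANG Hongjie. UAV object detection method based on modality-guided selection and adaptive contrastive learning[J]. Chinese Journal of Scientific Instrument, 2026, 47(4): 398-406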

History
  • Published online: 2026-06-08