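Survey of hardware-software co-design acceleration for CNNs on FPGAs

Affiliation: 1. School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China; 2. Zhengzhou Advanced Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China

Chinese Library Classification (CLC) number: TN791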

    Abstract:

    Convolutional neural networks (CNNs) are widely used in deep learning because of their strong visual feature extraction capability. However, growing network complexity, together with the stringent performance and energy-efficiency demands of edge computing, makes hardware acceleration increasingly challenging. In scenarios that involve low-batch inference, require deterministic low latency, or rely on hardware that is difficult to update, general-purpose central processing units (CPUs) and graphics processing units (GPUs) often fail to meet the requirements. Against this background, CPU-FPGA heterogeneous computing platforms built on field-programmable gate arrays (FPGAs) offer an effective path to low-batch, low-latency CNN inference at the edge, owing to their customizable computational dataflow, reconfigurability, and deterministic low latency. From a hardware-software co-design perspective, we systematically review lightweighting (compression) and computational acceleration methods for deploying CNNs on CPU-FPGA platforms. First, we introduce network lightweighting techniques that reduce model complexity. Second, we present hierarchical hardware optimization strategies covering computational units, computational arrays, and data memory access. Third, we describe how an agile development flow based on high-level synthesis (HLS) enables rapid iteration and verification of accelerator systems. Finally, we discuss the technological positioning, development prospects, and challenges of FPGAs in the new era of artificial intelligence along three dimensions: front-end verification of complex chips, deterministic low-latency computing, and agile algorithm iteration.
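The abstract names network compression (lightweighting) as the first pillar but does not say which techniques the survey covers. As an illustration only, the sketch below assumes one common technique, symmetric per-tensor int8 post-training quantization, and shows why it suits FPGA compute units: the multiply-accumulate loop runs entirely on small integers, with a single floating-point rescale at the end. The functions `quantize_symmetric`, `int_dot`, and `quantized_dot` are hypothetical names, and the input values are made up; this is not presented as the authors' method.

```python
# Minimal sketch, assuming symmetric per-tensor int8 post-training quantization
# as one example of "network compression". Hypothetical helper names; not the
# survey's own method.

def quantize_symmetric(values, num_bits=8):
    """Map floats to signed ints in [-qmax, qmax] using a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to nearest integer, then clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(qa, qb):
    """Integer multiply-accumulate, mimicking a hardware MAC unit.

    Python ints never overflow; real hardware would size the accumulator
    (e.g. 32 bits) so that summing int8 x int8 products cannot overflow.
    """
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


def quantized_dot(a, b, num_bits=8):
    """Quantize both operands, accumulate in integers, rescale once at the end."""
    qa, sa = quantize_symmetric(a, num_bits)
    qb, sb = quantize_symmetric(b, num_bits)
    return int_dot(qa, qb) * sa * sb


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9]          # made-up example weights
    activations = [1.0, 0.25, -0.75, 0.5]      # made-up example activations
    exact = sum(w * x for w, x in zip(weights, activations))
    approx = quantized_dot(weights, activations)
    print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

With 8 bits the quantized result stays close to the floating-point dot product on these values; lowering `num_bits` shrinks storage and multiplier cost further at the price of larger rounding error, which is the accuracy-versus-resource trade-off that co-design has to balance.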

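Cite this article:

GUO Chuliang, LOU Yue, YUE Haoshuai, JI Tuo, CHENG Xuzhe, PENG Yu. Survey of hardware-software co-design acceleration for CNNs on FPGAs[J]. Journal of Electronic Measurement and Instrumentation, 2026, 40(4): 1-22.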

History
  • Published online: 2026-06-12