Abstract: Convolutional neural networks (CNNs) are widely used in deep learning because of their exceptional visual feature extraction capabilities. However, as model complexity grows and edge computing scenarios place stringent demands on computational performance and energy efficiency, hardware acceleration faces significant challenges. In application scenarios that require low-batch inference and deterministic low latency, and where hardware is difficult to update, general-purpose central processing units (CPUs) and graphics processing units (GPUs) often fall short of the requirements. In this context, heterogeneous CPU-FPGA computing platforms built on field-programmable gate arrays (FPGAs) offer an effective pathway for implementing low-batch, low-latency CNN inference at the edge. Their advantages lie in customized computational dataflow, reconfigurability, and deterministic low latency. This paper systematically explores CNN compression and computational acceleration methods for deployment on CPU-FPGA platforms from a hardware-software co-design perspective. First, we introduce network compression techniques for reducing model complexity. Second, we present hierarchical hardware optimization strategies for computational units, computational arrays, and data memory access. Third, we describe how an agile development flow based on high-level synthesis (HLS) enables rapid iteration and verification of accelerator systems. Finally, we examine the technological positioning, development prospects, and challenges of FPGAs in the new era of artificial intelligence along three dimensions: front-end verification of complex chips, deterministic low-latency computing, and agile algorithm iteration.