Abstract: Five approximate multiplier design methods are proposed to address the problems of incomplete models, high on-chip resource consumption, and limited performance that arise when Field Programmable Gate Arrays (FPGAs) accelerate convolutional neural networks, image processing algorithms, and other approximate computing workloads. Building on an 8-bit × 8-bit unsigned carry-chain approximate multiplier, two lookup-table (LUT) based 8-bit × 8-bit unsigned approximate multipliers are proposed for different real-world scenarios; they simplify the critical path through a compressed recursive invocation methodology and a sub-product recombination computation strategy. Compared with similar multipliers, these designs save up to 60% in area, about 60.76% in power consumption, and about 25.4% in critical path delay (CPD) while keeping accuracy within an acceptable range. To meet the needs of more complex scenarios, the operand width is doubled to obtain two 16-bit × 16-bit unsigned LUT-based approximate multipliers. Compared with similar multipliers, these save up to about 41.2% in area, about 77% in power consumption, and about 35.4% in CPD, gains that compensate for the loss caused by the reduced accuracy. In addition, based on a signed-number calculation module, a 16-bit × 16-bit signed LUT-based approximate multiplier is proposed to replace Xilinx's (now AMD's) Multiplier IP core; it is deployed in the convolutional layer of a convolutional neural network for handwritten digit recognition and tested on handwritten digit images from the MNIST dataset. It saves about 32.48% in area, about 41.21% in power consumption, and about 24.28% in CPD, at the cost of a 3.4% decrease in accuracy. These results show that the proposed multipliers effectively meet the requirements of FPGA-accelerated convolutional neural networks and strike an effective balance between accuracy and resource overhead.
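
For orientation only, the following minimal C sketch illustrates the general idea of sub-product recombination referenced above: an 8-bit × 8-bit product is decomposed into four 4 × 4 sub-products (each small enough to come straight out of an FPGA LUT), the three most significant ones are kept exactly, and the least significant one is replaced by a constant compensation term. The function name approx_mul8, the choice of dropped term, and the constant 56 (the expected value of the dropped 4 × 4 sub-product for uniform inputs) are illustrative assumptions, not the paper's actual carry-chain or LUT architecture.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Behavioral sketch of sub-product recombination (assumed
   decomposition, not the proposed RTL): split each operand
   into 4-bit halves and recombine the 4x4 sub-products. */
static uint16_t approx_mul8(uint8_t a, uint8_t b)
{
    uint8_t ah = a >> 4, al = a & 0xF;
    uint8_t bh = b >> 4, bl = b & 0xF;

    uint16_t hh = (uint16_t)(ah * bh);  /* sub-product, weight 2^8 */
    uint16_t hl = (uint16_t)(ah * bl);  /* sub-product, weight 2^4 */
    uint16_t lh = (uint16_t)(al * bh);  /* sub-product, weight 2^4 */

    /* Drop the least significant sub-product al*bl and add its
       expected value (~56 for uniform inputs) as compensation. */
    return (uint16_t)((hh << 8) + ((hl + lh) << 4) + 56);
}

int main(void)
{
    /* Exhaustively compare against the exact product to gauge
       the error introduced by dropping al*bl. */
    double sum_abs_err = 0.0;
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            sum_abs_err += abs(a * b - (int)approx_mul8((uint8_t)a, (uint8_t)b));
    printf("mean absolute error: %.2f\n", sum_abs_err / (256.0 * 256.0));
    return 0;
}

In this illustrative form, dropping the lowest-weight sub-product removes one 4 × 4 multiplication and one addition from the datapath at the cost of a small, bounded error, which is the kind of accuracy-for-resources trade-off the abstract quantifies for the proposed designs.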