Abstract: The coordinate rotation digital computer (CORDIC) algorithm has the advantage of simple hardware implementation and has been widely applied in fields such as electronic measurement, radar detection, and image processing. High-radix and parallel CORDIC approaches effectively reduce iteration latency to meet real-time requirements. However, both approaches introduce a variable scaling factor, increasing the computational complexity and resulting in additional resource consumption. In comparison, the scaling-free (SF) CORDIC algorithm eliminates the variable scaling factor. However, most existing SF-CORDIC algorithms still need improvements in resource consumption and latency while maintaining acceptable accuracy and a wide convergence range. Therefore, this article proposes a hybrid CORDIC algorithm and its computing architecture, which combine a look-up table (LUT) with parallel SF iterations. First, a method is proposed to determine the angle boundary between the LUT and the parallel SF iterations using approximate angles with fewer nonzero terms, which extends the convergence range supported by the parallel SF iterations. Second, a method is proposed to divide the parallel SF iterations into two-parallel and four-parallel SF iterations, balancing the computational complexity of each iteration stage and ensuring overall design performance. Specifically, the LUT is used to rapidly fold a large-angle input in the range (-π/2, π/2) into the convergence range supported by the two-parallel SF iterations. The two-parallel SF iterations then bring the residual angle into the convergence range supported by the four-parallel SF iterations. Finally, the four-parallel SF iterations are performed and the CORDIC results are output. The proposed design is implemented in the Verilog hardware description language and validated on a field-programmable gate array (FPGA). Experimental results demonstrate that, compared with existing designs, the proposed design reduces resource consumption by 23.1% and latency by 22.1%, while maintaining comparable accuracy and convergence range.
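For background, the sketch below is a minimal fixed-point software model of the conventional radix-2 rotation-mode CORDIC iteration, illustrating the shift-add structure and the constant gain (K ≈ 1.6468) that scaling-free variants are designed to avoid. It does not reproduce the proposed LUT folding stage or the two-/four-parallel SF iterations; the word length (FRAC_BITS) and iteration count (N_ITER) are arbitrary illustrative choices.

```c
/* Minimal fixed-point model of conventional radix-2 rotation-mode CORDIC.
 * Illustrative background only: the proposed hybrid design replaces these
 * sequential iterations with a LUT folding stage followed by two-parallel
 * and four-parallel scaling-free iterations. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define FRAC_BITS 16                      /* Q16 fixed-point format            */
#define N_ITER    16                      /* number of CORDIC iterations       */
#define TO_FIX(x) ((int32_t)((x) * (1 << FRAC_BITS)))

/* Rotate (x, y) by angle z (radians, Q16) using shift-add micro-rotations. */
static void cordic_rotate(int32_t *x, int32_t *y, int32_t z)
{
    /* Precomputed arctan(2^-i) table in Q16. */
    int32_t atan_tab[N_ITER];
    for (int i = 0; i < N_ITER; i++)
        atan_tab[i] = TO_FIX(atan(ldexp(1.0, -i)));

    for (int i = 0; i < N_ITER; i++) {
        int32_t dx = *x >> i, dy = *y >> i;   /* shifts replace multiplies */
        if (z >= 0) {                         /* rotate residual angle toward 0 */
            int32_t xn = *x - dy;
            *y += dx;
            *x  = xn;
            z  -= atan_tab[i];
        } else {
            int32_t xn = *x + dy;
            *y -= dx;
            *x  = xn;
            z  += atan_tab[i];
        }
    }
    /* The result is scaled by the constant gain K ~= 1.6468; scaling-free
     * CORDIC variants restructure the iterations so no correction is needed. */
}

int main(void)
{
    double  angle = 0.7;                      /* radians, inside (-pi/2, pi/2) */
    int32_t x = TO_FIX(1.0 / 1.64676), y = 0; /* pre-divide by K to compensate */
    cordic_rotate(&x, &y, TO_FIX(angle));
    printf("cos=%f sin=%f\n", (double)x / (1 << FRAC_BITS),
                              (double)y / (1 << FRAC_BITS));
    return 0;
}
```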