DOI: 10.3390/electronics15132850 ISSN: 2079-9292

An Optimized Floating-Point Unit Set for FPGA-Based DSP: Improving Area, Energy, and Throughput Trade-Offs

Fernando Flores, Juan Portela Queimaño, Jesús Manuel Costa Pazo, María Dolores Valdés-Peña, Camilo Quintáns Graña, José Manuel Villapún Sánchez

Floating-point arithmetic provides the dynamic range that fixed-point lacks for digital signal processing (DSP) algorithms with widely varying operand magnitudes. This work presents a parameterizable floating-point unit set for field-programmable gate array (FPGA)-based DSP. The set consists of five units: an adder/subtractor, a multiplier, a multiply–accumulate (MAC) unit, and fixed-to-float and float-to-fixed converters. Two architectural choices distinguish the proposed format from IEEE-754: exponent and mantissa widths that are configurable at synthesis time, and a 0.f significand encoding that reduces corner-case logic at the cost of one additional mantissa bit. The format is therefore IEEE-754-inspired rather than fully compliant: special values (NaN, ±∞) are not implemented, and overflow and underflow are handled by saturation to predefined constants. The design is written in standard VHDL-2008 without relying on high-level synthesis (HLS) tools or vendor-specific primitives, which keeps it portable across FPGA families and application-specific integrated circuits (ASICs). The multiplier and MAC are evaluated in two configurations, one inferring DSP blocks and one using look-up tables (LUTs) only; both close timing at 300 MHz on Artix-7 and Kintex UltraScale devices. The proposed blocks outperform vendor IP cores and recent academic designs in terms of area-throughput-power (ATP), with improvements ranging from 10% to 108%. The exception is the adder/subtractor, which does not outperform two optimized Xilinx IP cores (HS-R and HS-P) and is therefore included for design coherence rather than as a strict resource improvement over all vendor IPs. All blocks meet the theoretical error bound, and a representative 200-tap finite impulse response (FIR) filter built from them closes timing at 300 MHz with 76% LUT utilization.
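To make the format description above more concrete, the following Python sketch models a value encoded with a 0.f significand, configurable field widths, and saturation instead of NaN or ±∞. It is only an illustrative software model, not the authors' VHDL: the function names `encode` and `decode`, the default widths, the exponent bias, the round-to-nearest step, and the choice of saturation constants (largest magnitude on overflow, zero on underflow) are all assumptions that the abstract does not specify.

```python
import math

def encode(x, exp_bits=8, man_bits=24):
    """Return (sign, exponent_field, mantissa_field) for x.

    Hypothetical model: value = (-1)**sign * 0.f * 2**(exponent_field - bias).
    The leading 1 of the significand is stored explicitly in the top
    mantissa bit, so normalized values have 0.5 <= 0.f < 1.
    """
    bias = 1 << (exp_bits - 1)         # assumed bias, not taken from the paper
    e_max = (1 << exp_bits) - 1        # largest exponent field
    f_max = (1 << man_bits) - 1        # largest mantissa field
    sign = 1 if x < 0 else 0
    if x == 0:
        return (sign, 0, 0)
    m, e = math.frexp(abs(x))          # abs(x) == m * 2**e with 0.5 <= m < 1
    f = round(m * (1 << man_bits))     # assumed rounding: nearest
    if f > f_max:                      # rounding reached 1.0, so renormalize
        f = 1 << (man_bits - 1)
        e += 1
    field = e + bias
    if field > e_max:                  # overflow: saturate (assumed constant)
        return (sign, e_max, f_max)
    if field < 0:                      # underflow: saturate to zero (assumed)
        return (sign, 0, 0)
    return (sign, field, f)

def decode(sign, field, f, exp_bits=8, man_bits=24):
    """Inverse of encode under the same assumed layout."""
    bias = 1 << (exp_bits - 1)
    value = (f / (1 << man_bits)) * 2.0 ** (field - bias)
    return -value if sign else value
```

Under these assumptions, `encode(0.75)` returns `(0, 128, 12582912)`, and an out-of-range input such as `1e300` maps to the largest representable magnitude rather than to infinity. Because the leading significand bit is stored rather than implied, zero needs no reserved exponent code in this model, which is one plausible reading of how the 0.f encoding trims corner-case logic at the cost of one extra mantissa bit.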
