daiFPU - Configurable Floating-Point Unit
for LEON and NOEL-V
Overview
The daiteq FPU (daiFPU) is an IEEE Std 754-2019 compliant floating-point unit designed for the LEON and NOEL-V processors. It supports the binary64, binary32 and binary16 formats and their combinations, with full hardware support for subnormal numbers. The unit consists of a floating-point datapath and a floating-point controller. The datapath executes all floating-point arithmetic operations and format conversions. The controller manages data exchange between the integer pipeline and the daiFPU, and also executes floating-point comparisons.
IEEE Std 754 format | Abbreviation | Precision [bits] | Partitioning (sign, exponent, fraction) |
---|---|---|---|
binary64 | DP | 53 | (1,11,52) |
binary32 | SP | 24 | (1,8,23) |
binary16 | HP | 11 | (1,5,10) |
N/A | PSP | 24 | ((1,8,23),(1,8,23)) |
N/A | PHP | 11 | ((1,5,10),(1,5,10)) |
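The Partitioning column lists the widths in bits of the sign, exponent and fraction fields. As a minimal sketch (the helper name `format_params` is hypothetical and not part of the daiFPU deliverables), the usual IEEE 754 quantities can be derived from those widths:

```python
def format_params(sign_bits, exp_bits, frac_bits):
    """Derive IEEE 754 parameters from a (sign, exponent, fraction) partitioning."""
    bias = (1 << (exp_bits - 1)) - 1      # exponent bias, also the largest exponent
    emin = 1 - bias                       # smallest exponent of a normal number
    return {
        "width": sign_bits + exp_bits + frac_bits,
        "precision": frac_bits + 1,       # fraction bits plus the hidden bit
        "bias": bias,
        "max_finite": (2 - 2.0 ** -frac_bits) * 2.0 ** bias,
        "min_normal": 2.0 ** emin,
        "min_subnormal": 2.0 ** (emin - frac_bits),
    }

# Partitionings taken from the table above.
for name, part in {"DP": (1, 11, 52), "SP": (1, 8, 23), "HP": (1, 5, 10)}.items():
    print(name, format_params(*part))
```

The packed formats PSP and PHP hold two such values side by side, so each element keeps the parameters of SP and HP respectively.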
The daiFPU is intended to provide flexibility for the FPGA and ASIC technologies used in satellite navigation, deep learning and audio/video processing applications. Its key advantage is a higher functional density of the silicon used on board satellites, measured against the computations actually performed on board. The user parameterizes the FPU at synthesis time so that the application functions correctly while using no more resources than necessary. Classical FPUs, such as those used with the LEON processors, are built around fixed data widths of 32 or 64 bits, even where reduced precision (e.g. 16 bits) would be sufficient, and they implement operations that the application may never use. With the daiFPU the user selects one of seven major configurations at synthesis time (shown in the table below); these support individual floating-point formats, their combinations, or packed floating-point formats. For each major configuration the user can also specify whether floating-point division and square root are supported.
Implementation | DP | SP | HP | PSP | PHP |
---|---|---|---|---|---|
Two-precision configurations | | | | | |
DAIFPU-DUAL-DPSP | Y | Y | | | |
DAIFPU-DUAL-SPHP | | Y | Y | | |
One-precision configurations | | | | | |
DAIFPU-DP | Y | | | | |
DAIFPU-SP | | Y | | | |
DAIFPU-HP | | | Y | | |
Packed-word configurations | | | | | |
DAIFPU-PSP | | Y | | Y | |
DAIFPU-PHP | | | Y | | Y |
Packed operations are supported in some daiFPU configurations. They operate on pairs of floating-point values stored either in a single register (two half-precision values in one single-precision floating-point register) or in a pair of consecutive registers (two single-precision values in an even-odd pair of single-precision registers). In addition to common SIMD processing on pairs of values, new floating-point instructions support the implementation of complex floating-point arithmetic on the packed formats.
For packed-word operations the result is computed by performing the selected operation independently on the upper sub-words and on the lower sub-words. Exceptions and flags are computed as the logical OR of the exceptions and flags generated for the upper and lower sub-words.
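The packed-word rule can be illustrated with a small software model. This is a sketch only: the flag bit values, the function names and the choice of addition are assumptions made for the example, not the daiFPU register layout or instruction set. It models round-to-nearest-even and omits underflow and signaling-NaN handling.

```python
import struct

# Flag bits chosen for this sketch only; not the daiFPU or processor flag layout.
INVALID, OVERFLOW, INEXACT = 0x1, 0x2, 0x4

def half_to_float(bits):
    """Decode a binary16 bit pattern (subnormals included) to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_half(value):
    """Round to nearest-even binary16; return (bit pattern, flags)."""
    try:
        bits = struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:  # finite value too large for binary16
        return (0xFC00 if value < 0 else 0x7C00), OVERFLOW | INEXACT
    inexact = value == value and half_to_float(bits) != value
    return bits, (INEXACT if inexact else 0)

def add_half(a_bits, b_bits):
    """Add two binary16 values; return (result bits, flags) for one sub-word."""
    a, b = half_to_float(a_bits), half_to_float(b_bits)
    total = a + b  # exact in double precision for any two binary16 inputs
    # A NaN produced from two non-NaN operands means inf + (-inf).
    flags = INVALID if (total != total and a == a and b == b) else 0
    bits, rounding_flags = float_to_half(total)
    return bits, flags | rounding_flags

def packed_add(x, y):
    """Add upper and lower 16-bit sub-words independently, then OR the flags."""
    hi, hi_flags = add_half((x >> 16) & 0xFFFF, (y >> 16) & 0xFFFF)
    lo, lo_flags = add_half(x & 0xFFFF, y & 0xFFFF)
    return (hi << 16) | lo, hi_flags | lo_flags

# Upper sub-words: 1.0 + 2.0; lower sub-words: 65504 + 65504 (overflows).
result, flags = packed_add(0x3C007BFF, 0x40007BFF)
print(hex(result), bin(flags))
```

In this example the upper sub-word result is exact, while the lower one overflows; the combined flags report the overflow even though only one sub-word raised it.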
Validation
Validation of the daiFPU has been performed in these steps:
- Validation of individual FPU modules and operations in self-checking stand-alone testbenches. Test vectors were generated using the TestFloat tool developed and distributed by John Hauser.
- Validation of the FPU integration with the integer pipeline using a simple C program that applies a limited number of TestFloat vectors to the FPU inputs and compares each result with the reference result stored in the vector.
- Validation of the LEON2 / FPU integration using the paranoia program originally developed by Prof. Kahan.
- Validation of correct floating-point results computed in LEON2, LEON3 and NOEL-V with daiFPU by comparing them to results of a desktop execution of an identical C program.
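The vector-based comparison in the second step can be sketched on a host machine as follows. The sketch assumes text vectors with four hexadecimal fields per line (two binary32 operands, the expected result, and the exception flags), as written by TestFloat's generator for a two-operand operation. The function names are hypothetical, `f32_add_model` merely stands in for the unit under test, and the flags field is parsed but not checked.

```python
import struct

def is_nan32(bits):
    """True if a binary32 bit pattern encodes a NaN."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def f32_add_model(a_bits, b_bits):
    """Stand-in for the unit under test: binary32 add, round to nearest-even."""
    a = struct.unpack("<f", struct.pack("<I", a_bits))[0]
    b = struct.unpack("<f", struct.pack("<I", b_bits))[0]
    try:
        return struct.unpack("<I", struct.pack("<f", a + b))[0]
    except OverflowError:  # sum exceeds the binary32 range
        return 0xFF800000 if a + b < 0 else 0x7F800000

def check_vectors(lines, dut):
    """Run each vector through dut(a, b); return the mismatching vectors."""
    failures = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        a, b, expected, _flags = (int(field, 16) for field in fields)
        got = dut(a, b)
        # NaN payloads may legitimately differ, so any NaN matches any NaN.
        if got != expected and not (is_nan32(got) and is_nan32(expected)):
            failures.append((line, got))
    return failures

vectors = [
    "3F800000 40000000 40400000 00",  # 1.0 + 2.0 = 3.0
    "3F800000 BF800000 00000000 00",  # 1.0 + (-1.0) = +0.0
    "7F800000 FF800000 7FC00000 10",  # inf + (-inf) = NaN
]
print(check_vectors(vectors, f32_add_model))
```

On the target, `dut` would be replaced by code that issues the floating-point instruction on the processor and reads back the result.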
Availability
The daiFPU IP core is provided as synthesizable VHDL code or as an FPGA netlist. It is available either separately or bundled with the LEON2-FT processor or with the LEON3 and NOEL-V processors.
For the bundled options a separate license has to be obtained from the European Space Agency for the LEON2-FT processor, or from Cobham Gaisler AB for the GRLIB / LEON3 or NOEL-V package.
The deliverables include:
- VHDL-RTL code or gate-level netlist,
- testing environment,
- simulation scripts,
- golden reference test vectors,
- synthesis scripts,
- user documentation.
The IP core is guaranteed against defects for ninety days from the date of purchase. Thirty days of technical support over email and phone is included. Additional support and maintenance options are available.
Hardware Compatibility
The daiFPU is compatible with the following processors:
- LEON2 / LEON2-FT
- LEON3
- NOEL-V
Software Compatibility
When used with LEON and NOEL-V processors, the daiFPU in the DAIFPU-DUAL-DPSP configuration is compatible with existing compilation toolchains, because this configuration supports the same floating-point operations as other common FPUs, e.g. Meiko or GRFPU.
Other daiFPU configurations, i.e. those that introduce new floating-point data types and/or operations, require the SPARC V8 LLVM compiler and binutils with daiteq extensions to generate binary files with the new floating-point opcodes.
Implementation Results
Indicative implementation results are given for the daiFPU implemented with the LEON2 processor in a Xilinx Virtex-7 FPGA. Results for the LEON3 and NOEL-V processors and for other FPGA families are similar.
Flavour | Slices | Slice regs | LUTs | LUTRAM | DSP48E1 |
---|---|---|---|---|---|
daifpu-dual-dpsp | | | | | |
divsqrt | 3592 | 3402 | 9447 | 385 | 15 |
divonly | 2832 | 2920 | 8120 | 362 | 15 |
none | 2612 | 2588 | 6741 | 279 | 15 |
daifpu-dual-sphp | | | | | |
divsqrt | 2011 | 2228 | 5197 | 155 | 2 |
divonly | 1570 | 2022 | 4383 | 147 | 2 |
none | 1509 | 1621 | 3735 | 132 | 2 |
daifpu-dp | | | | | |
divsqrt | 2587 | 2581 | 6181 | 325 | 15 |
divonly | 2090 | 2259 | 5258 | 295 | 15 |
none | 1447 | 1921 | 4215 | 229 | 15 |
daifpu-sp | | | | | |
divsqrt | 1244 | 1540 | 3261 | 157 | 2 |
divonly | 1152 | 1394 | 2810 | 106 | 2 |
none | 771 | 1195 | 2354 | 106 | 2 |
daifpu-hp | | | | | |
divsqrt | 685 | 955 | 1824 | 73 | 1 |
divonly | 687 | 899 | 1534 | 62 | 1 |
none | 547 | 748 | 1327 | 57 | 1 |
daifpu-psp | | | | | |
divsqrt | 2859 | 2954 | 6641 | 280 | 4 |
divonly | 2561 | 2801 | 5701 | 226 | 4 |
none | 1626 | 2172 | 4615 | 208 | 4 |
daifpu-php | | | | | |
divsqrt | 1621 | 1834 | 3625 | 147 | 2 |
divonly | 1186 | 1732 | 2961 | 139 | 2 |
none | 1013 | 1440 | 2553 | 128 | 2 |
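To see what the optional division and square-root support costs in each configuration, the LUT counts from the implementation results can be compared against the `none` baseline. The sketch below only does arithmetic on the published figures; the helper name `overhead_percent` is hypothetical.

```python
# LUT counts copied from the implementation results: (divsqrt, divonly, none).
LUTS = {
    "daifpu-dual-dpsp": (9447, 8120, 6741),
    "daifpu-dual-sphp": (5197, 4383, 3735),
    "daifpu-dp": (6181, 5258, 4215),
    "daifpu-sp": (3261, 2810, 2354),
    "daifpu-hp": (1824, 1534, 1327),
    "daifpu-psp": (6641, 5701, 4615),
    "daifpu-php": (3625, 2961, 2553),
}

def overhead_percent(with_option, baseline):
    """Relative LUT increase of an option over the baseline, in percent."""
    return 100.0 * (with_option - baseline) / baseline

for name, (divsqrt, divonly, none) in LUTS.items():
    print(f"{name:18s} div: +{overhead_percent(divonly, none):5.1f}%  "
          f"div+sqrt: +{overhead_percent(divsqrt, none):5.1f}%")
```

The same calculation applies to the slice, register and LUTRAM columns.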
Floating-Point Performance
Benchmark | Unit | LEON2-FT: AT697 / Meiko | LEON2-FT: DAIFPU-DUAL-DPSP | LEON3: GRFPU-lite | LEON3: DAIFPU-DUAL-DPSP | LEON3: GRFPU | NOEL (RV64): nanofpunv | NOEL (RV64): DAIFPU-DUAL-DPSP | NOEL (RV64): GRFPUnv |
---|---|---|---|---|---|---|---|---|---|
whetstone-dp | kWIPS/MHz | 261.68 | 298.25 | 241.49 | 309.05 | 429.10 | 141.44 | 299.27 | 539.82 |
whetstone-sp | kWIPS/MHz | 445.71 | 451.13 | 391.58 | 461.46 | 620.45 | 187.67 | 312.65 | 539.08 |
linpack-dp-rolled | kFLOPS/MHz | 49.25 | 54.55 | 37.46 | 57.68 | 49.45 | 25.60 | 48.80 | 96.63 |
linpack-sp-rolled | kFLOPS/MHz | 83.49 | 71.3 | 55.30 | 67.44 | 69.94 | 31.60 | 51.10 | 101.43 |
linpack-dp-unrolled | kFLOPS/MHz | 49.51 | 59.4 | 38.24 | 63.56 | 51.14 | 26.90 | 53.77 | 117.14 |
linpack-sp-unrolled | kFLOPS/MHz | 84.05 | 78.2 | 59.20 | 76.44 | 76.87 | 33.62 | 56.59 | 125.32 |
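Because the scores are normalized per MHz, an absolute figure follows from multiplying by the clock frequency, and FPUs on the same processor can be compared by ratio. The sketch below uses the whetstone-dp row and assumes the column grouping LEON2-FT (two FPUs), LEON3 (three) and NOEL (RV64) (three). The function names and the 100 MHz clock are illustrative assumptions, not measured operating points.

```python
# whetstone-dp scores in kWIPS/MHz, keyed by (processor, FPU).
WHETSTONE_DP = {
    ("LEON2-FT", "AT697 / Meiko"): 261.68,
    ("LEON2-FT", "DAIFPU-DUAL-DPSP"): 298.25,
    ("LEON3", "GRFPU-lite"): 241.49,
    ("LEON3", "DAIFPU-DUAL-DPSP"): 309.05,
    ("LEON3", "GRFPU"): 429.10,
    ("NOEL (RV64)", "nanofpunv"): 141.44,
    ("NOEL (RV64)", "DAIFPU-DUAL-DPSP"): 299.27,
    ("NOEL (RV64)", "GRFPUnv"): 539.82,
}

def absolute_kwips(score_per_mhz, clock_mhz):
    """Scale a per-MHz score to an absolute kWIPS figure at a given clock."""
    return score_per_mhz * clock_mhz

def relative_to_daifpu(scores, processor):
    """Score of each FPU on one processor, relative to DAIFPU-DUAL-DPSP."""
    base = scores[(processor, "DAIFPU-DUAL-DPSP")]
    return {fpu: s / base for (cpu, fpu), s in scores.items() if cpu == processor}

# Hypothetical 100 MHz clock, chosen only to illustrate the scaling.
print(absolute_kwips(WHETSTONE_DP[("LEON2-FT", "DAIFPU-DUAL-DPSP")], 100))
print(relative_to_daifpu(WHETSTONE_DP, "LEON3"))
```

Per-MHz scores do not account for the maximum clock frequency each FPU reaches in a given technology, so the ratios compare work per cycle only.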