daiFPU - Configurable Floating-Point Unit
for LEON and NOEL-V

Overview

The daiteq FPU (daiFPU) is an IEEE Std 754-2019 compliant floating-point unit designed for the LEON and NOEL-V processors. The daiFPU supports the binary64, binary32 and binary16 formats and their combinations, including full hardware support for subnormal numbers. The unit consists of a floating-point datapath and a floating-point controller. The datapath executes all floating-point arithmetic operations and format conversions. The controller manages data exchange between the integer pipeline and the daiFPU, and also executes floating-point comparisons.

Supported precisions.
IEEE Std 754	Abbreviation	Precision [bits]	Partitioning (sign, exponent, fraction)
binary64	DP	53	(1,11,52)
binary32	SP	24	(1,8,23)
binary16	HP	11	(1,5,10)
N/A	PSP	24	((1,8,23),(1,8,23))
N/A	PHP	11	((1,5,10),(1,5,10))
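
The partitioning column can be illustrated with a short Python sketch that splits a raw bit pattern into its sign, exponent and fraction fields. The names `FORMATS`, `split_fields` and `precision` are hypothetical and are not part of any daiFPU deliverable; the field widths are those listed in the table above.

```python
import struct

# (sign, exponent, fraction) bit widths, copied from the table above.
FORMATS = {"DP": (1, 11, 52), "SP": (1, 8, 23), "HP": (1, 5, 10)}

def split_fields(bits, fmt):
    """Split a raw bit pattern into (sign, exponent, fraction) fields."""
    _, exp_w, frac_w = FORMATS[fmt]
    fraction = bits & ((1 << frac_w) - 1)
    exponent = (bits >> frac_w) & ((1 << exp_w) - 1)
    sign = bits >> (frac_w + exp_w)
    return sign, exponent, fraction

def precision(fmt):
    """Precision in bits: the stored fraction plus the implicit leading bit."""
    return FORMATS[fmt][2] + 1

# Raw binary16 bit pattern of 1.5, obtained through struct's 'e' format.
hp_bits = struct.unpack("<H", struct.pack("<e", 1.5))[0]
print(split_fields(hp_bits, "HP"), precision("HP"))
```

The precision values in the table (53, 24, 11) are one bit larger than the fraction widths because normal numbers carry an implicit leading one.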

The daiFPU aims to provide flexibility for the FPGA and ASIC technology used in satellite navigation, deep learning and audio/video processing applications. Its key advantage is the ability to increase the functional density of the silicon used on board satellites, relative to the computations actually performed on board. The user parameterizes the FPU at synthesis time so that the application functions correctly without using more resources than necessary. Classical FPUs, such as those used with the LEON processors, are based on fixed data bus widths of 32 or 64 bits, even in situations where reduced precision (e.g. 16 bits) would be sufficient, and they may implement operations that the application never uses. With the daiFPU the user selects one of seven major configurations (shown in the table below) at synthesis time; these support individual floating-point formats, their combinations, or packed floating-point formats. For each major configuration the user can specify whether floating-point division and square root should be supported.

FPU configurations.
Implementation	DP	SP	HP	PSP	PHP
Two-precision configurations
DAIFPU-DUAL-DPSP	Y	Y
DAIFPU-DUAL-SPHP		Y	Y
One-precision configurations
DAIFPU-DP	Y
DAIFPU-SP		Y
DAIFPU-HP			Y
Packed-word configurations
DAIFPU-PSP		Y		Y
DAIFPU-PHP			Y		Y
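
As a reading aid, the configuration table can be expressed as a small lookup that lists the configurations covering a required set of formats. This is only a sketch: the dictionary `CONFIGS` and the function `configs_supporting` are hypothetical names, and the entries are transcribed from the table above.

```python
# Formats supported by each major configuration, transcribed from the table.
CONFIGS = {
    "DAIFPU-DUAL-DPSP": {"DP", "SP"},
    "DAIFPU-DUAL-SPHP": {"SP", "HP"},
    "DAIFPU-DP": {"DP"},
    "DAIFPU-SP": {"SP"},
    "DAIFPU-HP": {"HP"},
    "DAIFPU-PSP": {"SP", "PSP"},
    "DAIFPU-PHP": {"HP", "PHP"},
}

def configs_supporting(required):
    """Return the names of all configurations that cover every required format."""
    required = set(required)
    return sorted(name for name, formats in CONFIGS.items() if required <= formats)

print(configs_supporting({"SP", "HP"}))
print(configs_supporting({"HP"}))
```

An empty result, for example for `{"DP", "HP"}`, means that no single configuration in the table covers the requested combination.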

Packed operations are supported in some daiFPU configurations. They are defined for pairs of floating-point values stored either in a single register (two half-precision values in one single-precision floating-point register) or in a pair of consecutive registers (two single-precision values in an even-odd pair of single-precision registers). In addition to common SIMD processing on pairs of values, new floating-point instructions have been implemented to support complex floating-point arithmetic in the packed formats.

For packed-word operations the result is computed by performing the selected operation independently on the upper sub-words and on the lower sub-words. Exceptions and flags are computed as the logical OR of the exceptions and flags generated for the upper and lower sub-words.
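
The packed-word rule can be sketched in Python for a packed half-precision addition. This is a behavioural illustration under stated assumptions, not the daiFPU implementation: the flag encodings `INEXACT` and `OVERFLOW`, the helper names, and the placement of the upper sub-word in bits 31..16 are all hypothetical, and only the inexact and overflow flags are modelled.

```python
import math
import struct

# Hypothetical flag encodings, used for this sketch only.
INEXACT, OVERFLOW = 0x1, 0x2

def hp_to_float(bits):
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def float_to_hp(value):
    """Round a Python float to binary16 and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def hp_add(a_bits, b_bits):
    """Add two binary16 values; return (result bits, flags)."""
    # The sum of two binary16 values is exactly representable in a Python float.
    exact = hp_to_float(a_bits) + hp_to_float(b_bits)
    if math.isnan(exact):
        return 0x7E00, 0  # NaN handling and the invalid flag are not modelled
    try:
        result = float_to_hp(exact)
    except (OverflowError, struct.error):
        # Finite sum too large for binary16: return a signed infinity.
        return (0xFC00 if exact < 0 else 0x7C00), OVERFLOW | INEXACT
    flags = INEXACT if hp_to_float(result) != exact else 0
    return result, flags

def php_add(a_word, b_word):
    """Packed add: each sub-word is computed independently, flags are ORed."""
    hi, hi_flags = hp_add(a_word >> 16, b_word >> 16)
    lo, lo_flags = hp_add(a_word & 0xFFFF, b_word & 0xFFFF)
    return (hi << 16) | lo, hi_flags | lo_flags

# Upper sub-words: 1.0 + 2.0 (exact). Lower sub-words: 65504 + 65504 (overflow).
word, flags = php_add(0x3C007BFF, 0x40007BFF)
print(hex(word), flags)
```

In this example the upper sub-word raises no flag, yet the packed result reports overflow and inexact because the lower sub-word raised them.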

Validation

Validation of the daiFPU has been performed in these steps:

Validation of individual FPU modules and operations in self-checking stand-alone testbenches. Test vectors were generated using the TestFloat tool developed and distributed by John Hauser.

Validation of the FPU integration with the integer pipeline using a simple C program that applies a limited number of TestFloat vectors to the FPU inputs and compares each result with the reference result stored in the corresponding TestFloat vector.

Validation of the LEON2 / FPU integration using the paranoia program originally developed by Prof. Kahan.

Validation of correct floating-point results computed in LEON2, LEON3 and NOEL-V with daiFPU by comparing them to results of a desktop execution of an identical C program.
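
The vector-based steps above follow a common pattern: apply the operands of each vector, then compare the result and flags with the stored reference. A minimal Python sketch of such a check is shown below. It assumes each vector is a line of whitespace-separated hexadecimal fields (operands, expected result, expected flags); this layout, the function names and the stand-in operation are assumptions for illustration and do not describe the actual daiFPU testbenches.

```python
def parse_vector(line):
    """Split one vector line into (operands, expected result, expected flags)."""
    *operands, expected, flags = (int(field, 16) for field in line.split())
    return operands, expected, flags

def check_vectors(lines, operation):
    """Apply `operation` to every vector and collect the mismatches.

    `operation` takes the operand bit patterns and returns (result, flags).
    """
    failures = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        operands, expected, flags = parse_vector(line)
        got = operation(*operands)
        if got != (expected, flags):
            failures.append((line.strip(), got))
    return failures

# Stand-in for the unit under test: integer addition that raises no flags.
def fake_add(a, b):
    return a + b, 0

vectors = ["01 02 03 00", "0A 01 0B 00", "01 01 03 00"]
failures = check_vectors(vectors, fake_add)
print(failures)
```

In a real setup `operation` would wrap the unit under test, and the third vector here is deliberately wrong to show how a mismatch is reported.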

Availability

The daiFPU IP core is provided as synthesizable VHDL code or as an FPGA netlist. The IP core is available either separately or bundled with the LEON2-FT processor or with the LEON3 and NOEL-V processors.

For the bundled options a separate license has to be obtained from the European Space Agency for the LEON2-FT processor, or from Cobham Gaisler AB for the GRLIB / LEON3 or NOEL-V package.

The deliverables include:

VHDL-RTL code or gate-level netlist,

testing environment,

simulation scripts,

golden reference test vectors,

synthesis scripts,

user documentation.

The IP core is guaranteed against defects for ninety days from the date of purchase. Thirty days of technical support over email and phone is included. Additional support and maintenance options are available.

Hardware Compatibility

The daiFPU is compatible with the following processors:

LEON2 / LEON2-FT

LEON3

NOEL-V

Software Compatibility

When used with LEON and NOEL-V processors, the daiFPU is compatible with existing compilation toolchains in the DAIFPU-DUAL-DPSP configuration, which supports the same floating-point operations as other common FPUs, e.g. Meiko or GRFPU.

For the other daiFPU configurations, i.e. those that introduce new floating-point data types and/or operations, the SPARCv8 LLVM compiler and binutils with daiteq extensions are required to generate binary files with the new floating-point opcodes.

Implementation Results

Indicative implementation results are provided for the daiFPU implemented with the LEON2 processor in a Xilinx Virtex-7 FPGA. For the LEON3 and NOEL-V processors and for other FPGA families the results are similar.

daiFPU, resources used.
Flavour	Slices	Slice regs	LUTs	LUTRAM	DSP48E1
daifpu-dual-dpsp
divsqrt	3592	3402	9447	385	15
divonly	2832	2920	8120	362	15
none	2612	2588	6741	279	15
daifpu-dual-sphp
divsqrt	2011	2228	5197	155	2
divonly	1570	2022	4383	147	2
none	1509	1621	3735	132	2
daifpu-dp
divsqrt	2587	2581	6181	325	15
divonly	2090	2259	5258	295	15
none	1447	1921	4215	229	15
daifpu-sp
divsqrt	1244	1540	3261	157	2
divonly	1152	1394	2810	106	2
none	771	1195	2354	106	2
daifpu-hp
divsqrt	685	955	1824	73	1
divonly	687	899	1534	62	1
none	547	748	1327	57	1
daifpu-psp
divsqrt	2859	2954	6641	280	4
divonly	2561	2801	5701	226	4
none	1626	2172	4615	208	4
daifpu-php
divsqrt	1621	1834	3625	147	2
divonly	1186	1732	2961	139	2
none	1013	1440	2553	128	2
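
To make the resource trade-off concrete, the LUT counts can be compared against the largest configuration. The Python sketch below uses the `divsqrt` rows of the table above; the names `LUTS_DIVSQRT` and `relative_luts` are hypothetical, and the ratios are only as indicative as the table itself.

```python
# LUT counts for the divsqrt flavour, copied from the table above.
LUTS_DIVSQRT = {
    "daifpu-dual-dpsp": 9447,
    "daifpu-dual-sphp": 5197,
    "daifpu-dp": 6181,
    "daifpu-sp": 3261,
    "daifpu-hp": 1824,
    "daifpu-psp": 6641,
    "daifpu-php": 3625,
}

def relative_luts(config, baseline="daifpu-dual-dpsp"):
    """LUT usage of `config` as a fraction of the baseline configuration."""
    return LUTS_DIVSQRT[config] / LUTS_DIVSQRT[baseline]

for name in sorted(LUTS_DIVSQRT, key=LUTS_DIVSQRT.get):
    print(f"{name:18s} {relative_luts(name):.2f}")
```

With these figures the half-precision configuration uses roughly a fifth of the LUTs of the dual DP/SP configuration.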

Floating-Point Performance

daiFPU performance for Whetstone and Linpack compared with alternative LEON and NOEL-V FPUs.
Benchmark	Unit	LEON2-FT		LEON3			NOEL (RV64)
.	.	AT697 / Meiko	DAIFPU-DUAL-DPSP	GRFPU-lite	DAIFPU-DUAL-DPSP	GRFPU	nanofpunv	DAIFPU-DUAL-DPSP	GRFPUnv
whetstone-dp	kWIPS/MHz	261.68	298.25	241.49	309.05	429.10	141.44	299.27	539.82
whetstone-sp	kWIPS/MHz	445.71	451.13	391.58	461.46	620.45	187.67	312.65	539.08
linpack-dp-rolled	kFLOPS/MHz	49.25	54.55	37.46	57.68	49.45	25.60	48.80	96.63
linpack-sp-rolled	kFLOPS/MHz	83.49	71.3	55.30	67.44	69.94	31.60	51.10	101.43
linpack-dp-unrolled	kFLOPS/MHz	49.51	59.4	38.24	63.56	51.14	26.90	53.77	117.14
linpack-sp-unrolled	kFLOPS/MHz	84.05	78.2	59.20	76.44	76.87	33.62	56.59	125.32
