Fast Float to Int
Assembler/Pentium+FPU

Converting a floating point number to integer is normally done like this:
        fistp   [dword ptr temp]
        mov     eax,[temp]
An alternative method is:
DATASEG
ALIGN 8

temp    dq      ?
magic   dd      59c00000h       ; f.p. representation of 2^51 + 2^52

CODESEG

        fadd    [magic]
        fstp    [qword ptr temp]
        mov     eax,[dword ptr temp]
Adding the 'magic number' of 2^51 + 2^52 has the effect that any integer between -2^31 and +2^31 will be aligned in the lower 32 bits of the mantissa when the sum is stored as a double precision floating point number. The result is the same as you get with FISTP for all rounding methods except truncation towards zero, so it differs from FISTP if the control word specifies truncation, and also in case of overflow. You may need a WAIT instruction for compatibility with other processors.
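The bit alignment described above can be checked outside the FPU. The following is only a sketch in C, not part of the original gem: the function name magic_round is made up for illustration, and it assumes IEEE 754 doubles, the default round-to-nearest mode and an input with magnitude below 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original listing): round x to the
   nearest 32-bit integer by adding 2^51 + 2^52 and reading the low
   32 bits of the stored double.
   Assumes IEEE 754 doubles, round-to-nearest mode and |x| < 2^31. */
static int32_t magic_round(double x)
{
    /* volatile forces the sum to be stored as a 64-bit double,
       like FSTP [qword ptr temp] in the assembly version */
    volatile double sum = x + 6755399441055744.0;   /* 2^51 + 2^52 */
    double stored = sum;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &stored, sizeof bits);  /* raw bit pattern of the double */
    low = (uint32_t)bits;                 /* lower 32 bits of the mantissa */
    memcpy(&result, &low, sizeof result); /* reinterpret as signed integer */
    return result;
}
```

With round-to-nearest, halfway cases go to the even neighbour, so magic_round(2.5) gives 2 and magic_round(3.5) gives 4, just as FISTP would in that rounding mode.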
This method is not faster than using FISTP, but it gives better scheduling opportunities because there is a 3 clock gap between FADD and FSTP which may be filled with other instructions. You may multiply or divide the number by a power of 2 in the same operation by applying the opposite scaling to the magic number. You may also add a constant by adding it to the magic number, which then has to be stored in double precision.
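As an illustration of the scaling remark, dividing the magic number by 2^16 makes the conversion multiply by 2^16, which yields a 16.16 fixed point value in one step. This is a sketch in C under the same assumptions as before (IEEE 754 doubles, round-to-nearest); the name to_fixed_16_16 is hypothetical, and the input must have magnitude below 2^15.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert x to 16.16 fixed point, i.e. round(x * 2^16).
   The magic number is (2^51 + 2^52) / 2^16 = 2^35 + 2^36, so the lowest
   mantissa bit of the sum is worth 2^-16 instead of 1.
   Assumes IEEE 754 doubles, round-to-nearest mode and |x| < 2^15. */
static int32_t to_fixed_16_16(double x)
{
    volatile double sum = x + 103079215104.0;   /* 2^35 + 2^36 */
    double stored = sum;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &stored, sizeof bits);  /* raw bit pattern of the double */
    low = (uint32_t)bits;                 /* lower 32 bits hold x * 2^16   */
    memcpy(&result, &low, sizeof result);
    return result;
}
```

For example, to_fixed_16_16(1.5) gives 98304, which is 1.5 * 65536.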

A related problem is the inability of the FP unit to round a floating point number to an integer internally at any reasonable speed (FRNDINT apparently takes 19 cycles). In some situations you could use:


        fistp   [qword mem]     ; 6 (estimated clock cycle count)
        fild    [qword mem]     ; 7-9
However, accuracy will be sacrificed, since a 64-bit integer cannot represent values as large as floating point numbers can.
You can also use the "magic" trick here, but you would get even less accuracy: just add a similar "magic" number and then subtract it again. Because the sum does not have enough precision to hold the fractional bits, the floating point value changes to an integer. Another FRNDINT replacement:
magic   dd      59c00000h       ; f.p. representation of 2^51 + 2^52

        fadd    [magic]         ; 1-3
        fsub    [magic]         ; 4-6
The throughput of the above code is only 2 clocks. This method has two drawbacks, however: there are situations where the result will not be the same as with FRNDINT, and the lowest bit of the result may be lost.
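The add-then-subtract idea can be sketched in C as well. This is an illustration rather than the authors' code: the name magic_rint is made up, and the sketch works in 53-bit double precision, which is what the magic number 2^51 + 2^52 assumes. It is only meaningful for inputs with magnitude below 2^51 and in round-to-nearest mode.

```c
/* Hypothetical helper: round x to an integral value while staying in
   floating point, by adding 2^51 + 2^52 and subtracting it again.
   Assumes IEEE 754 doubles, round-to-nearest mode and |x| < 2^51. */
static double magic_rint(double x)
{
    const double magic = 6755399441055744.0;   /* 2^51 + 2^52 */

    /* The sum has no mantissa bits left for a fraction, so it is rounded
       to an integer. volatile keeps the compiler from simplifying
       (x + magic) - magic back to x and forces 53-bit rounding. */
    volatile double sum = x + magic;

    return sum - magic;   /* exact: both operands are integers here */
}
```

For example, magic_rint(2.5) gives 2.0 and magic_rint(-0.75) gives -1.0.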
Gem writers: Vesa Karvonen
Agner Fog
last updated: 1998-03-16