Replace MOVSX/MOVZX
Assembler/Pentium

On the Pentium the byte extension:

        movzx   eax,byte ptr [addr]
is slow. The recommended way is:
        xor     eax,eax
        mov     al,[addr]
This performs well on a PPro too. The word extension:
        movzx   eax,word ptr [addr]
can be replaced with this sequence, which is faster only on the Pentium:
        xor     eax,eax
        mov     ax,[addr]
or:
        xor     eax,eax
        mov     al,[addr]
        mov     ah,[addr+1]
In that case you had better have enough other instructions to pair with these, so that you can hide all the inherent stalls. Another way:
        mov     eax,[addr]
        and     eax,0ffffh
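This is a win if addr is dword aligned, and can be OK even without alignment, but make sure that loading the two extra bytes at addr+2 and addr+3 cannot cause an exception (for example by reading past a segment limit or into an unmapped page). Sign extension can be sped up in a similar way. The word extension: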
        movsx   eax,word ptr [addr]
can be replaced with the faster sequence:
        mov     eax,[addr-2]
        sar     eax,16
if addr - 2 is divisible by 4, so that the dword load is aligned and the word at addr lands in the upper half of eax.
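To see why the two dword-load tricks give the same result as MOVZX and MOVSX, the arithmetic can be modelled outside the assembler. The Python sketch below is only an illustration and not part of the original gem. The helper names (`load_dword`, `zero_extend_word_by_mask`, `sign_extend_word_by_sar`) and the byte buffer standing in for memory are assumptions made for the example, and it assumes little-endian byte order as on x86.

```python
import struct

def load_dword(mem, addr):
    """Model of `mov eax,[addr]`: a little-endian 32-bit load."""
    return struct.unpack_from("<I", mem, addr)[0]

def zero_extend_word_by_mask(mem, addr):
    """Model of `mov eax,[addr]` / `and eax,0ffffh`."""
    return load_dword(mem, addr) & 0xFFFF

def sign_extend_word_by_sar(mem, addr):
    """Model of `mov eax,[addr-2]` / `sar eax,16`."""
    eax = load_dword(mem, addr - 2)
    # Reinterpret the 32-bit pattern as signed so that >> behaves like sar.
    if eax & 0x80000000:
        eax -= 1 << 32
    return eax >> 16

# Hypothetical memory image: four 16-bit words at offsets 0, 2, 4 and 6.
mem = struct.pack("<4H", 0x1234, 0x8001, 0xFFFE, 0xBEEF)

zx = zero_extend_word_by_mask(mem, 4)   # word 0xFFFE at a dword-aligned address
sx = sign_extend_word_by_sar(mem, 2)    # word 0x8001, and addr - 2 is divisible by 4
```

In this model, asking `zero_extend_word_by_mask` for the last word of the buffer (offset 6) fails with `struct.error`, because the 32-bit load reads two bytes past the end. That mirrors the caveat about the two extra bytes.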
For hand-written code I would suggest that you bias the input values instead (store each signed byte with 128 added), so that you can do:
        xor     eax,eax
        mov     al,[bytevar]    ; Biased by 128, so it is positive
        sub     eax,128
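The biasing idea can be checked with a small model as well. This Python sketch is an illustration under assumptions, not the author's code: `store_biased` and `load_biased` are hypothetical helper names, and the stored format is assumed to be the signed value plus 128 held in one unsigned byte.

```python
def store_biased(value):
    """Encode a signed byte (-128..127) as an unsigned byte biased by 128."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in a signed byte")
    return value + 128

def load_biased(bytevar):
    """Model of `xor eax,eax` / `mov al,[bytevar]` / `sub eax,128`."""
    eax = 0                  # xor eax,eax
    eax |= bytevar & 0xFF    # mov al,[bytevar]: the biased value is never negative
    return eax - 128         # sub eax,128 restores the signed value

# Round trip over the whole signed byte range.
round_trip_ok = all(load_biased(store_biased(v)) == v for v in range(-128, 128))
```

Adding 128 to a signed byte is the same as flipping the top bit of its two's complement encoding, so existing data could be converted to the biased form with an XOR by 80h.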
Some of these variants can also be used on 16-bit machines, where they can only extend 8 bits to 16 bits.
Gem writers: Terje Mathisen
Vesa Karvonen
last updated: 1998-06-06