On the Pentium the byte extension:
    movzx eax,byte ptr [addr]
is slow. The recommended way is:
    xor eax,eax
    mov al,[addr]
This performs well on a PPro too. The word extension:
    movzx eax,word ptr [addr]
can be replaced with the faster (only on Pentium) sequence:
    xor eax,eax
    mov ax,[addr]
or:
    xor eax,eax
    mov al,[addr]
    mov ah,[addr+1]
in which case you had better have enough instructions to pair with the operation so that you can hide all the inherent stalls. Another way:
    mov eax,[addr]
    and eax,0ffffh
This is a win if [addr] is dword aligned, and can be OK even without alignment, but make sure that the two extra bytes loaded won't cause an exception (for example by reaching past the segment limit or into an unmapped page). Other fast sign/zero extensions:
    movsx eax,word ptr [addr]
can be replaced with the faster sequence:
    mov eax,[addr-2]
    sar eax,16
if addr - 2 is divisible by 4.
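To see why the dword-load tricks give the same result as movzx/movsx, here is a small C model of the arithmetic. It is only a sketch: memory is modelled as a byte array read in little-endian order, and the helper names (load32_le, zext16_mask, sext16_sar) are made up for this illustration. It says nothing about timing.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of an x86 (little-endian) dword load at byte offset off. */
static uint32_t load32_le(const uint8_t *mem, size_t off) {
    return (uint32_t)mem[off]
         | (uint32_t)mem[off + 1] << 8
         | (uint32_t)mem[off + 2] << 16
         | (uint32_t)mem[off + 3] << 24;
}

/* mov eax,[addr] / and eax,0ffffh
   The two extra bytes are loaded and then masked away. */
static uint32_t zext16_mask(const uint8_t *mem, size_t addr) {
    return load32_le(mem, addr) & 0xFFFFu;
}

/* mov eax,[addr-2] / sar eax,16
   The wanted word lands in the upper half of eax; the arithmetic
   shift moves it down and copies its sign bit into the upper half.
   The shift is spelled out by hand to stay portable C. */
static int32_t sext16_sar(const uint8_t *mem, size_t addr) {
    uint32_t hi = load32_le(mem, addr - 2) >> 16;
    return (hi & 0x8000u) ? (int32_t)hi - 0x10000 : (int32_t)hi;
}
```

With the word 8034h stored at offset 2, zext16_mask yields 00008034h and sext16_sar yields FFFF8034h, whatever the neighbouring bytes contain.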
For hand-written code I would suggest you bias the input values instead, so you can do:
    xor eax,eax
    mov al,[bytevar]    ; Biased by 128, so it is positive
    sub eax,128
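The biasing idea can be checked with a short C model. This is a sketch under the assumption that the data is written with 128 added to each signed value; bias_store and bias_load are hypothetical names, not part of any existing code.

```c
#include <stdint.h>

/* Store side: add the bias so the stored byte is always 0..255.
   v must be in the signed byte range -128..127. */
static uint8_t bias_store(int v) {
    return (uint8_t)(v + 128);
}

/* Load side: xor eax,eax / mov al,[bytevar] / sub eax,128 */
static int32_t bias_load(uint8_t stored) {
    int32_t eax = stored;   /* zero-extended byte, 0..255 */
    return eax - 128;       /* removing the bias restores the sign */
}
```

Every value from -128 to 127 survives the round trip, so the load side never needs a sign extension at all.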
Some of these variants can also be used on 16-bit machines, where they can only extend 8 bits to 16 bits.
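The xor/mov sequences rely on partial register writes: a write to al or ah replaces one byte of eax and leaves the rest alone, which is why the register has to be cleared first. A rough C model of that behaviour follows; the function names are invented for the illustration.

```c
#include <stdint.h>

/* mov al,x: replace bits 0..7 of the register, keep the rest. */
static uint32_t write_al(uint32_t eax, uint8_t v) {
    return (eax & 0xFFFFFF00u) | v;
}

/* mov ah,x: replace bits 8..15 of the register, keep the rest. */
static uint32_t write_ah(uint32_t eax, uint8_t v) {
    return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

/* xor eax,eax / mov al,[addr] */
static uint32_t zext8(const uint8_t *mem) {
    uint32_t eax = 0;               /* xor eax,eax */
    return write_al(eax, mem[0]);
}

/* xor eax,eax / mov al,[addr] / mov ah,[addr+1] */
static uint32_t zext16_bytes(const uint8_t *mem) {
    uint32_t eax = 0;               /* xor eax,eax */
    eax = write_al(eax, mem[0]);
    return write_ah(eax, mem[1]);
}
```

Without the clearing step the stale upper bits stay in place: write_al applied to DEADBEEFh with the byte 34h gives DEADBE34h, not 00000034h. The 16-bit form (xor ax,ax followed by mov al,[addr]) works the same way on the lower word.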