Is there a simple way to add 4 unsigned bytes at a time with 255 saturation? This is one way:
;
; add 4 bytes with 255 saturation
;
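; input:
;   eax, ebx = four packed unsigned bytes each
; output:
;   eax = bytewise sums, saturated at 255 (after both passes)
; destroys:
;   cl, ch, flags (ebx is restored after both passes)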
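add al,bl        ; sum of the low bytes, carry set on overflow
sbb cl,cl        ; cl = 0ffh if carry, else 0
add ah,bh        ; sum of the second bytes
sbb ch,ch        ; ch = 0ffh if carry, else 0
or al,cl         ; saturate the low byte
ror ebx,16       ; bring the upper two bytes of ebx into bl/bh
or ah,ch         ; saturate the second byte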
ror eax,16 ; 5 cycles
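For checking the routine, a small C reference model can help. This is only a sketch of the same ADD / SBB / OR idea, not a translation of the listing; the names sat_add_u8 and sat_add_4xu8 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of one pair of unsigned bytes.
   Mirrors ADD / SBB / OR: the carry is turned into an all-ones mask. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;        /* add al,bl                     */
    uint8_t mask = (uint8_t)-(sum >> 8);   /* sbb cl,cl: 0xFF on carry      */
    return (uint8_t)((uint8_t)sum | mask); /* or al,cl                      */
}

/* Hypothetical helper: apply the byte routine to all four bytes
   packed in a 32-bit word. */
static uint32_t sat_add_4xu8(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t a = (uint8_t)(x >> (8 * i));
        uint8_t b = (uint8_t)(y >> (8 * i));
        r |= (uint32_t)sat_add_u8(a, b) << (8 * i);
    }
    return r;
}
```

For example, sat_add_4xu8(0x01FF8010, 0x0101807F) gives 0x02FFFF8F: the two middle bytes overflow and are clamped to 0xFF.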
Repeat the eight instructions above to handle the two upper bytes. To be any faster, we must handle more than one byte simultaneously. If we have data only in the first and third bytes of EAX and EBX (that is, both are masked with 00ff00ffh), then we can do something like this:
add eax,ebx ; Both byte sums at once, carries land in bits 8 and 24
mov ebx,eax
and ebx,01000100h ; Get the carry bits (bits 8 and 24) from both additions
mov ecx,ebx
shr ecx,8 ; Carry bit (0 or 1) in lowest position
sub ebx,ecx ; EBX is 00ff00ff, 00000000, 000000ff or 00ff0000
or eax,ebx ; Saturate; the carry bits themselves stay set in bits 8 and 24 of EAX
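The same trick can be modelled in C to check the mask arithmetic. This is a sketch under stated assumptions: both inputs are already masked with 0x00FF00FF, the carries therefore land in bits 8 and 24 (mask 0x01000100), and the function name sat_add_2xu8_masked is made up. The final AND is not in the listing; it is added here to clear the leftover carry bits.

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of the bytes in positions 0 and 2.
   Assumes a and b contain data only in those bytes (a & 0x00FF00FF == a). */
static uint32_t sat_add_2xu8_masked(uint32_t a, uint32_t b)
{
    uint32_t sum   = a + b;                /* add eax,ebx                   */
    uint32_t carry = sum & 0x01000100u;    /* carry bits at bits 8 and 24   */
    uint32_t mask  = carry - (carry >> 8); /* each carry bit -> 0xFF byte   */
    /* Assumption beyond the listing: mask off bits 8 and 24 so that
       only the two saturated bytes remain. */
    return (sum | mask) & 0x00FF00FFu;
}
```

To cover all four bytes, the routine would run twice: once on x & 0x00FF00FF and once on (x >> 8) & 0x00FF00FF, with the second result shifted back left by 8.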
This masked method uses 7 instructions to handle two bytes, so it will be faster to use the naive code above. This gem cannot run on a 16-bit machine: adding 4 bytes at a time requires 32-bit registers, which a 16-bit machine does not have. You can still do the addition there, but not the way shown here.