Back to the stone age: filled vectors. The dirtiest solution for filling a horizontal line (instead of a REP STOSB) is probably this:
;
; simple fill (no alignment)
;
; input:
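; eax = color, with all 4 bytes equal to color
; cx = number of bytes to fill
; es:edi = target
;
; output:
; es:edi = plotted line
;
; destroys:
; cx, edi
; flags
test cl,1 ; odd count?
je _one
stosb ; yes: write one byte
_one:
test cl,2 ; a leftover word?
je _two
stosw ; yes: write one word
_two:
shr cx,2 ; the rest goes out as
rep stosd ; dwords (count / 4)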
Generally this is a really time-wasting way. The routine never looks at the target address, so its dword writes land wherever EDI happens to point. A doubleword written to memory may take some extra cycles if it isn't aligned on a dword boundary. Writing ONE doubleword MISALIGNED may take as much time as writing TWO doublewords ALIGNED. So here follows a horizontal line filler which writes everything completely aligned without any conditional jumps:
;
; aligned fill
;
; input:
; eax = color, with all 4 bytes equal to color
; cx = number of bytes to fill
; es:edi = target
;
; output:
; es:edi = plotted line
;
; destroys:
; bx, cx, edi
; flags
;
mov bx,cx ; save CX
xor cx,cx ; set CX to 1 if the
test bx,bx ; count isn't 0, else
setne cl ; leave it zero
and cx,di ; leave CX 1 if DI is
sub bx,cx ; odd, else clear it
; and adjust BX
rep stosb ; fill one byte if DI
; was odd
cmp bx,2 ; set CX to 2 if we
setnb cl ; need to fill two or
add cx,cx ; more bytes, else 0
and cx,di ; clear CX if DI is on
sub bx,cx ; dword boundary, else
; leave it & adjust BX
shr cx,1 ; fill one word (if CX
rep stosw ; isn't 0)
mov cx,bx ; put the number of
shr cx,2 ; remaining bytes to
rep stosd ; CX and fill dwords
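and bx,3 ; fill the rest:
mov cx,bx ; 0..3 bytes are left
shr cx,1 ; CX = words, CF = odd byte
rep stosw ; (REP STOSW keeps flags)
adc cx,cx ; CX is 0 here, so CX = CF
rep stosb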
Is it really faster than a REP STOSB? Not always. When only a few bytes have to be filled (around 10), it will be slower. And of course it can be even faster with conditional jumps. But it looks so nice without them.
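The branch-free arithmetic of the aligned fill routine above can be checked on paper with a small C model. This is only a sketch under assumptions, not part of the original routine: the names fill_plan, plan_fill and model_fill are made up for illustration, a plain byte array stands in for video memory, and memset stands in for REP STOSB/STOSW/STOSD. The asserts verify that every word store starts on an even offset and every dword store on a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of how one fill is split into stores. */
struct fill_plan {
    unsigned head_byte;  /* 0 or 1: byte store to make DI even      */
    unsigned head_word;  /* 0 or 1: word store to reach dword align */
    unsigned dwords;     /* number of aligned dword stores          */
    unsigned tail_word;  /* 0 or 1: leftover word                   */
    unsigned tail_byte;  /* 0 or 1: leftover byte                   */
};

/* Mirrors the assembly step by step; di is the start offset,
   count the number of bytes (BX in the routine). */
static struct fill_plan plan_fill(uint16_t di, uint16_t count)
{
    struct fill_plan p;
    uint16_t bx = count;
    uint16_t cx;

    /* setne cl / and cx,di: 1 if count != 0 and DI is odd */
    cx = (uint16_t)((bx != 0) & di);
    bx = (uint16_t)(bx - cx);
    di = (uint16_t)(di + cx);          /* rep stosb advances DI */
    p.head_byte = cx;

    /* setnb cl / add cx,cx / and cx,di: 2 if bx >= 2 and DI bit 1 set */
    cx = (uint16_t)(((bx >= 2) << 1) & di);
    bx = (uint16_t)(bx - cx);
    p.head_word = cx >> 1u;            /* shr cx,1 / rep stosw */

    p.dwords = bx >> 2u;               /* shr cx,2 / rep stosd */
    bx &= 3;
    p.tail_word = bx >> 1u;            /* shr cx,1 / rep stosw */
    p.tail_byte = bx & 1u;             /* adc cx,cx / rep stosb */
    return p;
}

/* Carry out the plan on a byte array and check store alignment. */
static void model_fill(uint8_t *mem, uint16_t di, uint16_t count,
                       uint8_t color)
{
    struct fill_plan p = plan_fill(di, count);
    unsigned pos = di;

    memset(mem + pos, color, p.head_byte);
    pos += p.head_byte;
    assert(p.head_word == 0 || pos % 2 == 0);
    memset(mem + pos, color, 2 * p.head_word);
    pos += 2 * p.head_word;
    assert(p.dwords == 0 || pos % 4 == 0);
    memset(mem + pos, color, 4 * p.dwords);
    pos += 4 * p.dwords;
    assert(p.tail_word == 0 || pos % 2 == 0);
    memset(mem + pos, color, 2 * p.tail_word);
    pos += 2 * p.tail_word;
    memset(mem + pos, color, p.tail_byte);
}
```

For example, a 10-byte fill starting at offset 1 comes out as one byte, one word, one dword, one word and one byte (1 + 2 + 4 + 2 + 1 = 10). That also hints at why such a short line gains little: most of the work is setup.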