Aligned Fill (Assembler/8086)

Back to the stone age: filled vectors. The dirtiest solution for filling a horizontal line (instead of a REP STOSB) is probably this:

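;
; simple (unaligned) fill
;
; input:
;   eax    = color, with all 4 bytes equal to color
;   cx     = number of bytes to fill
;   es:edi = target
;   (direction flag clear; upper word of ECX zero, since with
;   32-bit addressing REP counts in ECX)
;
; output:
;   es:edi = first byte after the plotted line
;
; destroys:
;   cx, edi
;   flags
;

        test    cl,1            ; odd count?
        je      _one
        stosb                   ; fill one byte
_one:
        test    cl,2            ; bit 1 of the count set?
        je      _two
        stosw                   ; fill one word
_two:
        shr     cx,2            ; CX = number of dwords
        rep     stosd           ; fill them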
The simple fill is generally a time-wasting way to do it, because it ignores where EDI starts: the byte and the word are written first wherever the line begins, so every dword after them may end up misaligned. A doubleword written to memory may take some extra cycles if it isn't aligned on a dword boundary; writing ONE doubleword MISALIGNED may take as much time as writing TWO doublewords ALIGNED. So here follows a horizontal line filler which writes everything completely aligned, without any conditional jumps:
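;
; aligned fill
;
; input:
;   eax    = color, with all 4 bytes equal to color
;   cx     = number of bytes to fill
;   es:edi = target
;   (direction flag clear; upper word of ECX zero, since with
;   32-bit addressing REP counts in ECX)
;
; output:
;   es:edi = first byte after the plotted line
;
; destroys:
;   bx, cx, edi
;   flags
;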

        mov     bx,cx           ; BX = bytes left to fill

        xor     cx,cx           ; CX = 1 if the count
        test    bx,bx           ; isn't 0, else leave
        setne   cl              ; it 0

        and     cx,di           ; keep CX = 1 only if DI
        sub     bx,cx           ; is odd, and take that
                                ; byte off BX
        rep     stosb           ; fill one byte if DI
                                ; was odd
        cmp     bx,2            ; CX = 2 if two or more
        setnb   cl              ; bytes are left, else 0
        add     cx,cx           ; (CX is 0 after the REP,
                                ; so only CL needs setting)

        and     cx,di           ; clear CX if DI is on
        sub     bx,cx           ; dword boundary, else
                                ; leave it & adjust BX
        shr     cx,1            ; fill one word (if CX
        rep     stosw           ; isn't 0)

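        mov     cx,bx           ; CX = number of whole
        shr     cx,2            ; dwords left (if any are
        rep     stosd           ; left, DI is dword aligned)

        and     bx,3            ; 0..3 bytes remain
        mov     cx,bx
        shr     cx,1            ; CX = words, CF = odd byte
        rep     stosw           ; (REP STOSW leaves CF alone)
        adc     cx,cx           ; CX = 0 + 0 + CF
        rep     stosb           ; fill the last byte, if any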
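Is it really faster than a REP STOSB? Not always. When only a few bytes have to be filled (around 10 or fewer), it will be slower: the setup code and five REP instructions, each with its own startup cost, outweigh what a single REP STOSB costs. And of course it can be made even faster with conditional jumps, but it looks so nice without them.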
Gem writer: Ervin Toth
last updated: 1998-06-07