Clear / Fill memory fast
Assembler/MC68000

A common problem is how to clear or fill a range of memory quickly. When there is a lot of memory to clear, the following approach is very useful:

;
; fill / clear memory fast
;
; input:
;   MemEnd = where the memory region ends (filling runs downwards from here)
;
; output:
;   none (memory cleared / filled)
;
; destroys:
;   d0-d7/a0-a6
;   

        move.l  sp,TempSp       ; 20 cycles - save stack pointer
        lea     MemEnd,sp       ; 12 cycles (absolute long address)
        moveq   #0,d0           ;  4 cycles
        moveq   #0,d1           ;  4 cycles
        moveq   #0,d2           ;  4 cycles
        moveq   #0,d3           ;  4 cycles
        moveq   #0,d4           ;  4 cycles
        moveq   #0,d5           ;  4 cycles
        moveq   #0,d6           ;  4 cycles
        moveq   #0,d7           ;  4 cycles
        move.l  d0,a0           ;  4 cycles
        move.l  d0,a1           ;  4 cycles
        move.l  d0,a2           ;  4 cycles
        move.l  d0,a3           ;  4 cycles
        move.l  d0,a4           ;  4 cycles
        move.l  d0,a5           ;  4 cycles
        move.l  d0,a6           ;  4 cycles => setup time: 20+12+15*4 = 92

; after this, one instruction can clear 60 bytes of memory (15*4):

        movem.l d0-d7/a0-a6,-(sp)

The last instruction takes 8+8*n cycles, here 8+8*15 = 128 cycles. The naive approach, move.l d0,-(sp) repeated n times, would use 12*n cycles, here 12*15 = 180 cycles. The presented gem is a win when 92+128*blocks < 4+180*blocks (counting a single 4-cycle moveq as the setup of the naive version), that is, when 88 < 52*blocks. This happens when blocks is larger than about 1.7, which means that this solution is preferred as soon as two or more 60-byte blocks have to be cleared.
Do not forget the following line when you are finished clearing:

        move.l   TempSp,sp      ; 20 cycles - restore stack pointer
It restores the stack pointer to its previous state.
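
The listing refers to TempSp and MemEnd without defining them. A minimal sketch of what the definitions could look like, assuming Motorola-style assembler syntax; the buffer name FillBuf and its size of 100 blocks are hypothetical and only there to give MemEnd something to mark:

        even                    ; make sure the data starts on an even address
TempSp  ds.l    1               ; one longword for the saved stack pointer
FillBuf ds.l    15*100          ; hypothetical buffer: 100 blocks of 15 longwords
MemEnd                          ; label just past the last byte of FillBuf

Since the fill writes longwords downwards from MemEnd, MemEnd has to be an even address on the MC68000; a longword write to an odd address causes an address error.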

There is, however, one problem with this solution: it cannot be turned into a loop without sacrificing a register for the loop counter. The counter could be kept in memory instead, but it is faster to use the following solution (which does sacrifice a register):

;
; fill / clear memory fast
;
; input:
;   memtomove = number of 224-byte blocks (4*56 bytes) to fill, 1..65536
;   MemEnd = where the memory region ends (filling runs downwards from here)
;
; output:
;   none (memory cleared / filled)
;
; destroys:
;   d0-d7/a0-a6
;

        move.l  sp,TempSp       ; 20 cycles - save stack pointer
        lea     MemEnd,sp       ; 12 cycles (absolute long address)
        moveq   #0,d0           ;  4 cycles
        moveq   #0,d1           ;  4 cycles
        moveq   #0,d2           ;  4 cycles
        moveq   #0,d3           ;  4 cycles
        moveq   #0,d4           ;  4 cycles
        moveq   #0,d5           ;  4 cycles
        moveq   #0,d6           ;  4 cycles
        move.l  d0,a0           ;  4 cycles
        move.l  d0,a1           ;  4 cycles
        move.l  d0,a2           ;  4 cycles
        move.l  d0,a3           ;  4 cycles
        move.l  d0,a4           ;  4 cycles
        move.l  d0,a5           ;  4 cycles
        move.l  d0,a6           ;  4 cycles
        move.l  #memtomove-1,d7 ; 12 cycles - dbf loops count+1 times => setup time: 20+12+14*4+12 = 100

.localloop
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes => 224 bytes
        dbf     d7,.localloop

        move.l   TempSp,sp      ; 20 cycles - restore stack pointer
This should run in 4*(8+8*14)+10 = 490 cycles per 224 bytes (the 10 cycles being the dbf when it branches), which is about 2.2 cycles/byte (not including setup time). Unrolling the loop further will not give any noticeable speed increase. The last line must not be forgotten, since it restores the stack pointer.
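
The loop only handles whole 224-byte blocks. If the region is not a multiple of 224 bytes, the remaining longwords can be written one at a time after the dbf and before the stack pointer is restored. A minimal sketch, assuming a hypothetical assembly-time constant tailcount (not part of the original gem) that holds the number of leftover longwords, 1 to 55:

        moveq   #tailcount-1,d7 ; dbf runs count+1 times, so load count-1
.tailloop
        move.l  d0,-(sp)        ; 12 cycles per longword
        dbf     d7,.tailloop

d7 is free again at this point and d0 still holds the fill value. When tailcount is 0 the snippet has to be left out, because dbf would otherwise run the loop 65536 times.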

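Making the loop fill memory with a constant value is simple: replace moveq #0,d0 with something more appropriate, for example move.l #$FF00FF00,d0, and load the remaining data registers from d0 (move.l d0,d1 and so on) instead of clearing them with moveq. The address registers are already copied from d0. Note: The timings are for a MC68000 CPU and may be incorrect.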
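
For the looped version, the register setup for a constant fill could look like the sketch below. It uses the pattern $FF00FF00 from the text; d1-d6 are copied from d0 as well, because in the listings they are cleared by their own moveq instructions and would otherwise still write zeros:

        move.l  #$FF00FF00,d0   ; 12 cycles - fill pattern
        move.l  d0,d1           ;  4 cycles
        move.l  d0,d2           ;  4 cycles
        move.l  d0,d3           ;  4 cycles
        move.l  d0,d4           ;  4 cycles
        move.l  d0,d5           ;  4 cycles
        move.l  d0,d6           ;  4 cycles
        move.l  d0,a0           ;  4 cycles
        move.l  d0,a1           ;  4 cycles
        move.l  d0,a2           ;  4 cycles
        move.l  d0,a3           ;  4 cycles
        move.l  d0,a4           ;  4 cycles
        move.l  d0,a5           ;  4 cycles
        move.l  d0,a6           ;  4 cycles

The rest of the routine (saving sp, loading MemEnd and the loop counter, the movem loop and restoring sp) stays as it is.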

Gem writer: John Eckerdal
last updated: 1998-03-16