A common problem is how to clear or fill a range of memory quickly. When there is a lot of memory to clear, the following approach is very useful:
;
; fill / clear memory fast
;
; input:
; MemEnd = address just past the end of the memory region (filling runs downward)
;
; output:
; none (memory cleared / filled)
;
; destroys:
; d0-d7/a0-a6
;
move.l sp,TempSp ; 20 cycles - save stack pointer
lea MemEnd,sp ; 12 cycles (absolute long address)
moveq #0,d0 ; 4 cycles
moveq #0,d1 ; 4 cycles
moveq #0,d2 ; 4 cycles
moveq #0,d3 ; 4 cycles
moveq #0,d4 ; 4 cycles
moveq #0,d5 ; 4 cycles
moveq #0,d6 ; 4 cycles
moveq #0,d7 ; 4 cycles
move.l d0,a0 ; 4 cycles
move.l d0,a1 ; 4 cycles
move.l d0,a2 ; 4 cycles
move.l d0,a3 ; 4 cycles
move.l d0,a4 ; 4 cycles
move.l d0,a5 ; 4 cycles
move.l d0,a6 ; 4 cycles => setup time: 20+12+15*4 = 92
; after this, one instruction can clear 60 bytes of memory (15*4):
movem.l d0-d7/a0-a6,-(sp)
The movem.l instruction takes 8+8*n cycles for n registers, here 8+8*15 = 128 cycles. The naive move.l d0,-(sp) would use 12*n cycles, here 12*15 = 180 cycles. The presented gem is a win when 92+128*blocks < 4+180*blocks, that is when 88 < 52*blocks. This happens when blocks is larger than about 1.7, so from two blocks (120 bytes) onward this solution is preferred, which covers almost every case.
Do not forget the following line when you have finished clearing:
move.l TempSp,sp ; 20 cycles - restore stack pointer
It restores the stack pointer to its previous state.
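The routine refers to TempSp and MemEnd without defining them. A minimal sketch of the data it needs might look like this, assuming a Motorola-syntax assembler (Devpac or vasm style); MemStart, BlockBytes and NumBlocks are hypothetical names and values chosen for illustration, not part of the original routine.

```asm
; Sketch only: place these definitions in a writable data or BSS area.
BlockBytes  equ     15*4                    ; one movem.l of 15 registers clears 60 bytes
NumBlocks   equ     8                       ; hypothetical size: 8 blocks = 480 bytes

TempSp:     ds.l    1                       ; holds the real stack pointer during the fill
MemStart:   ds.b    NumBlocks*BlockBytes    ; the region to clear (hypothetical label)
MemEnd:                                     ; first address past the region
```

Because movem.l writes longwords, MemEnd has to be an even address on the MC68000; here that follows from the longword before the region and the even region size.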
There is, however, one problem with this solution: it cannot be turned into a loop without sacrificing a register. The loop counter could be kept in memory instead, but the following solution, which does sacrifice a register, is faster:
;
; fill / clear memory fast
;
; input:
; memtomove = number of 224-byte (4*56) blocks to fill
; MemEnd = address just past the end of the memory region
;
; output:
; none (memory cleared / filled)
;
; destroys:
; d0-d7/a0-a6
;
move.l sp,TempSp ; 20 cycles - save stack pointer
lea MemEnd,sp ; 12 cycles (absolute long address)
moveq #0,d0 ; 4 cycles
moveq #0,d1 ; 4 cycles
moveq #0,d2 ; 4 cycles
moveq #0,d3 ; 4 cycles
moveq #0,d4 ; 4 cycles
moveq #0,d5 ; 4 cycles
moveq #0,d6 ; 4 cycles
move.l d0,a0 ; 4 cycles
move.l d0,a1 ; 4 cycles
move.l d0,a2 ; 4 cycles
move.l d0,a3 ; 4 cycles
move.l d0,a4 ; 4 cycles
move.l d0,a5 ; 4 cycles
move.l d0,a6 ; 4 cycles
move.l #memtomove-1,d7 ; 12 cycles - dbf runs the loop d7+1 times => setup time: 20+12+12+14*4 = 100
.localloop
movem.l d0-d6/a0-a6,-(sp) ; move 56 bytes
movem.l d0-d6/a0-a6,-(sp) ; move 56 bytes
movem.l d0-d6/a0-a6,-(sp) ; move 56 bytes
movem.l d0-d6/a0-a6,-(sp) ; move 56 bytes => 224 bytes
dbf d7,.localloop
move.l TempSp,sp ; 20 cycles - restore stack pointer
Each pass through the loop should take 4*(8+8*14)+10 = 490 cycles per 224 bytes (the 10 is the dbf when its branch is taken), which is about 2.2 cycles/byte, not including setup time. Unrolling the loop further will not give any noticeable speed increase, since the dbf already accounts for only about 2% of each pass. Do not forget the last line, since it restores the stack pointer.
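While sp points into the region being filled, it is not a usable stack. A minimal sketch of one way to guard against that, assuming the routine runs in supervisor mode (where an interrupt pushes its return frame through this same sp), is to mask interrupts around the fill. TempSr is a hypothetical word-sized variable that does not appear in the original routine.

```asm
; Sketch only: assumes supervisor mode on an MC68000.
; TempSr is a hypothetical word variable (TempSr: ds.w 1).
        move.w  sr,TempSr       ; remember the current status register
        ori.w   #$0700,sr       ; raise the interrupt mask to 7 (blocks levels 1-6)
        move.l  sp,TempSp       ; save stack pointer
        ; ... load the registers and run the movem.l fill here ...
        move.l  TempSp,sp       ; restore stack pointer first
        move.w  TempSr,sr       ; then restore the old interrupt mask
```

Level 7 interrupts cannot be masked this way, and interrupts stay blocked for the whole fill, so this may not suit long fills in code with tight interrupt timing.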
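Making the loop fill memory with a constant value is simple: load the value into d0, for example with move.l #$FF00FF00,d0 instead of moveq #0,d0, and replace the remaining moveq #0,dN instructions with move.l d0,dN. The address registers are already copied from d0.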
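As a sketch of that change, the register setup for the looped routine could look as follows, assuming the example pattern $FF00FF00 from the text. Only the loads differ from the clearing version; d7 is left out because it holds the loop counter.

```asm
; Sketch only: register setup for filling with a pattern instead of zero.
        move.l  #$FF00FF00,d0   ; fill pattern (example value from the text)
        move.l  d0,d1           ; copy the pattern into the other data registers
        move.l  d0,d2
        move.l  d0,d3
        move.l  d0,d4
        move.l  d0,d5
        move.l  d0,d6
        move.l  d0,a0           ; address registers take the pattern from d0
        move.l  d0,a1
        move.l  d0,a2
        move.l  d0,a3
        move.l  d0,a4
        move.l  d0,a5
        move.l  d0,a6
```

Since every register holds the same longword, the pattern written to memory repeats every 4 bytes.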
Note: The timings are for an MC68000 CPU and may be incorrect.