The Motorola 68K family consists of a wide range of members, from the micro-coded MC68000 to the superscalar, hard-wired MC68060. This text discusses the integer performance of the 68K family and ways both to optimize for the individual processors and to produce code that runs well on any 68K processor.
In general, optimizations can take place on three different levels:
- On the assembly level for code that is written in assembly language.
- On the compiler level if the application is written in a high level language (e.g. C).
- On the user level by changing algorithms.
This text discusses what can be done to optimize 68K code on the first of these levels, the assembly level.
The 68K family includes the following members:
MC68000
First generation 68K processor.
16 bit internal/external data paths.
16 MB address space.
MC68008
8 bit external data path.
1 MB or 4 MB address space, depending on the package.
MC68010
Similar to the MC68000, but with restartable instructions.
Can be used in a virtual memory environment.
Loop mode.
MC68EC000
Embedded controller version of the MC68000.
8 or 16 bit external data bus.
MC68020
32 bit virtual memory microprocessor.
32 bit internal/external data paths.
4 GB address space.
Can be used with a floating point coprocessor (MC68881/2).
New instructions added including bitfield instructions.
New addressing modes added.
256 bytes instruction cache.
MC68030
Similar to the MC68020, but slightly faster.
256 bytes data cache added. On-chip MMU.
MC68EC030
Embedded controller version of the MC68030. No MMU.
CPU32
Basically a 68020 core, but without cache, bitfield instructions and memory indirect addressing modes.
16 bit external data path.
No coprocessor.
CPU32+
Same as CPU32, but with a 32 bit external data path.
MC68040
Third generation 32 bit processor.
4K instruction cache.
4K data cache.
On chip floating point processor.
On chip MMU.
Most instructions take one cycle.
MC68EC040
Embedded controller version of the MC68040.
No MMU.
No FPU.
MC68060
Superscalar implementation of the 68K architecture.
Can issue up to two instructions per cycle.
8K instruction cache.
8K data cache.
MC68EC060
Similar to the MC68060.
No FPU.
No MMU.
The following table summarizes the characteristics of the different members of the 68K family. Timings are in clock cycles; cache sizes are given as instruction / data:
Processor | Cache | Register Add | Memory Add | Mul | Index | Branch | UAcc | HWFP |
68000 | None | 6 | 18 | 40 | 18 | 10 / 6 | no | no |
68020 | 256 / 0 | 2 | 6 | 28 | 9 | 6 / 4 | yes | 68881/2 |
68030 | 256 / 256 | 2 | 5 | 28 | 8 | 6 / 4 | yes | 68881/2 |
CPU32 | None | 2 | 9 | 16 | 12 | 8 / 4 | no | no |
68040 | 4 K/4 K | 1 | 1 | 16 | 3 | 2 / 3 | yes | yes |
68060 | 8 K/8 K | 1 | 1 | 2 | 1 | 0 / 1 | yes | yes |
Register Add | Register to register 32 bit add (ADD.L D0,D1 ) |
Memory Add | Absolute long address to register add (ADD.L _MEM,D1 ) |
Mul | 16 x 16 multiplication (max. time) (MULU.W D0,D1 ) |
Index | Indexed addressing mode (MOVE.L 2(A0,D0),D1 ) |
Branch | Byte conditional branch taken / not taken (BNE.B Label ) |
UAcc | Unaligned access allowed (MOVE.L 0xFFFF0001,D1 ) |
HWFP | Hardware floating point support |
When optimizing for the 68K family, we divide the members into the following groups:
68000
Optimize for the following processors: MC68000/10, MC68008, MC68EC000
68020
Optimize for the following processors: MC68020/30, MC68EC020/30, CPU32/CPU32+
68040
Optimize for the following processors: MC68040, MC68EC040
68060
Optimize for the following processors: MC68060, MC68EC060
680xx
Optimize so the code will execute reasonably on any 68K processor.
Since optimizations for one 68K processor can make another one execute slower, it is important to know the individual instruction timings of each member. Here are some examples of different ways of doing the same operation, each given as two alternatives, A and B; where a second row is given, it lists the processors on which each alternative is preferred:
- Operations with long immediate values between -128 and 127:
A | add.l #20,d1 | B | moveq.l #20,d0 |
| | | add.l d0,d1 |
- Byte/word operations that could be replaced with long operations:
A | 68000/20/40/xx | B | 68020/40/60 |
- Keep memory operands in registers:
A | add.l _var,d1 | B | move.l _var,d0 |
| add.l _var,d2 | | add.l d0,d1 |
| | | add.l d0,d2 |
A | 68040 (as long as the total number of instructions is smaller) | B | 68000/20/60/xx |
- Reschedule operations using address registers:
A | add.l d0,d1 | B | move.l (a1),a0 |
| move.l (a1),a0 | | add.l d0,d1 |
| move.l (a0),d2 | | move.l (a0),d2 |
- Replace constant multiplications with adds/subs/shifts:
A | mulu.w #254,d1 | B | move.l d1,d0 |
| | | lsl.l #8,d1 |
| | | lsl.l #1,d0 |
| | | sub.l d0,d1 |
- Operations using indexing modes:
A | add.l (a0,d7),d1 | B | add.l d7,a0 |
| add.l (a0,d7),d2 | | add.l (a0),d1 |
| | | add.l (a0),d2 |
- Saving/restoring registers:
A | movem.l d4-d7,-(a7) | B | move.l d7,-(a7) |
| | | move.l d6,-(a7) |
| | | move.l d5,-(a7) |
| | | move.l d4,-(a7) |
A | 68000/20/60/xx | B | 68040 (if time critical) |
Summary of characteristics for each processor:
68000:
- Lacks 68020 instruction extensions:
No extb.l instruction
No 32 x 32 bit multiply (MULU.L/MULS.L)
No scaled indexing mode
No 32 bit branch displacements (Bcc.L, BSR.L)
- Use short instructions
- Keep values in registers
- No scheduling necessary
- Code optimized for 68020 or 68060 runs great
68020:
- Use short instructions
- Keep values in registers
- Almost no scheduling necessary
- Code optimized for the 68060 runs great
68040:
- Use as few instructions as possible (even if they are longer)
- Values can be kept in memory
- Avoid pipeline stalls for some effective addresses
- Avoid subtracting from address registers
68060:
- Use short instructions
- Keep values in registers
- Schedule instructions for superscalar execution
- Inline short functions
680xx:
- If the code is to be executed on a 68000 processor, the 68000 instruction subset must be used.
- Avoid bitfield instructions.
- Align all data.
- Schedule the instructions for the 68060.
- Avoid complex addressing modes (memory indirect).
Note: This text was originally longer, covering all 3 levels of optimisation, but since this is an MC68K-specific text the rest has been cut. The text has also been slightly modified. The original author is unknown.