When you start programming 32-bit Intel assembler, you will often find that you need more registers than are available. On the other hand, the registers are rarely fully utilised, mainly because the upper half of a 32-bit register is awkward to access in just a few cycles.
To explain this neat gem, we need some example code. I will take a linear texture-map adder as an example:
    add ax,a_fraction_value_1
    adc bh,an_integer_value_1
    add cx,a_fraction_value_2
    adc bx,an_integer_value_2

This piece of code takes two 16-bit fraction values and adds each one together with its corresponding integer value. Please note the order and the sizes of the two integer variables. The reason for

    adc bx,an_integer_value_2

lies in the fact that if BL overflows, BH must be incremented. In 16-bit code this is as fast as you can get with memory variables. The first and most obvious optimization is to use registers in all places, if you have any free. Another option might be self-modifying code.
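To make the carry chain easier to follow, here is a rough C model of the four instructions. It is only a sketch: the struct Regs16 and the function step16 are names I made up for illustration, and the memory variables are passed in as plain arguments. The sizes follow the listing: an 8-bit add into BH for the first integer value, and a 16-bit add into BX for the second.

```c
#include <stdint.h>

/* Hypothetical model of the registers used by the 16-bit stepper. */
typedef struct {
    uint16_t ax;  /* accumulator for fraction 1 */
    uint16_t cx;  /* accumulator for fraction 2 */
    uint16_t bx;  /* BH = integer part 1, BL = integer part 2 */
} Regs16;

/* One pass through the four instructions of the listing. */
void step16(Regs16 *r, uint16_t frac1, uint8_t int1,
            uint16_t frac2, uint16_t int2)
{
    uint32_t t;
    unsigned carry;
    uint8_t bh;

    /* add ax,a_fraction_value_1 */
    t = (uint32_t)r->ax + frac1;
    r->ax = (uint16_t)t;
    carry = (unsigned)(t >> 16);

    /* adc bh,an_integer_value_1 : 8-bit add, BL is left untouched */
    bh = (uint8_t)((r->bx >> 8) + int1 + carry);
    r->bx = (uint16_t)(((unsigned)bh << 8) | (r->bx & 0xFFu));

    /* add cx,a_fraction_value_2 */
    t = (uint32_t)r->cx + frac2;
    r->cx = (uint16_t)t;
    carry = (unsigned)(t >> 16);

    /* adc bx,an_integer_value_2 : 16-bit add, so an overflow
       of BL carries into BH */
    r->bx = (uint16_t)(r->bx + int2 + carry);
}
```

Starting from AX=FFFFh, CX=0 and BX=01FFh, one step with fractions 1 and 8000h and integer values 2 and 1 leaves BX=0500h: the carry out of AX went into BH, and the overflow of BL went into BH as well.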
But as everyone can see, if I have 32-bit registers, I only use the low half of them. Now one can start to fiddle around with the spare upper halves. First, move CX to the upper part of EBX. Since CX is zero at the start, all you need to do is zero EBX. The next thing is to make [an_integer_value_1] into something more useful:

    an_integer_value1:=256*an_integer_value_1+65536*a_fraction_value2

Or declare them like this:

    a_bogus_value_1   db 0
    an_integer_value1 db ?
    a_fraction_value2 dw ?

Now the second add can be removed, and the result will be like this:
    add ax,a_fraction_value_1
    adc ebx,a_bogus_value_1
    adc bx,an_integer_value_2

This does NEARLY the same thing as the piece above. The problem is that EBX is used: if that addition overflows or underflows BX, the top of EBX will be changed. The underflow problem is easy: just use two sets of loops, one to handle positive steps and one for the negative ones, but this is outside the scope of this article.
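The packed constant and the leak into the top of EBX can be checked with a small C model. Again this is only a sketch under assumptions: pack_step, add_ebx and add_separate are hypothetical helper names, and the carry coming in from the preceding add on AX is deliberately left out so that only the effect on the upper half is visible.

```c
#include <stdint.h>

/* Dword seen by a 32-bit load of the three declared variables.
   x86 is little-endian, so a_bogus_value_1 (0) is the low byte,
   the integer value lands in bits 8..15 (the BH position) and the
   fraction in bits 16..31 (the old CX, now the top of EBX). */
uint32_t pack_step(uint8_t int1, uint16_t frac2)
{
    return 256u * int1 + 65536u * frac2;
}

/* The 32-bit addition into EBX (carry-in not modelled). */
uint32_t add_ebx(uint32_t ebx, uint32_t step)
{
    return ebx + step;
}

/* Reference: what the separate 16-bit instructions would give,
   where BH wraps on its own and never touches the fraction. */
uint32_t add_separate(uint32_t ebx, uint8_t int1, uint16_t frac2)
{
    uint8_t  bl  = (uint8_t)(ebx & 0xFFu);
    uint8_t  bh  = (uint8_t)((ebx >> 8) + int1);     /* wraps inside BH */
    uint16_t top = (uint16_t)((ebx >> 16) + frac2);  /* wraps inside top */
    return ((uint32_t)top << 16) | ((uint32_t)bh << 8) | bl;
}
```

With EBX=0000FE00h and an integer step of 1, both versions give 0000FF00h. One step later BH wraps: the separate adds give 00000000h, but the 32-bit add gives 00010000h, so the fraction in the top half has been bumped by one. A negative step, stored as the byte FFh, carries into the top on almost every iteration, which is why the negative case needs its own loop.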