Remove shift by one latency
Assembler/80486

On the 486, there is an anomaly in the instruction latencies: the dedicated shift-by-1 encoding is slower than the general encoding that takes an "imm8" shift count.

This is a well-known issue. I have seen several solutions proposed for getting the faster encoding in the shift-by-1 case. Here is a method that I think is even easier.

It is based on the fact that all x86 processors since the 186 mask shift counts to their low five bits, that is, modulo 32. Note that the "imm8" operand can hold shift counts of 0 through 255. So we can code a shift count of 33 to get an effective shift count of 1. To make this a little more readable to the casual code reader, who might not realize right away that we are really shifting by 1, we might do something like this:
        FASTSHIFT   EQU   32

        SHR     reg, 1+FASTSHIFT
        SHL     reg, 1+FASTSHIFT
        SAR     reg, 1+FASTSHIFT
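
To check the arithmetic behind the trick, here is a minimal C sketch that models the count masking on 32-bit operands. It is only an illustration, not 486 code: the helper names (effective_count, shr_model, shl_model, sar_model, fastshift_selfcheck) are hypothetical, and the model assumes the five-bit masking described above.

```c
#include <assert.h>
#include <stdint.h>

/* Same constant as the FASTSHIFT EQU in the assembly snippet. */
enum { FASTSHIFT = 32 };

/* Hypothetical helper: models a 186-or-later CPU keeping only the
   low five bits of the shift count. */
static unsigned effective_count(unsigned imm8)
{
    return imm8 & 31u;
}

/* Model of SHR r32, imm8 (logical right shift). */
static uint32_t shr_model(uint32_t value, unsigned imm8)
{
    return value >> effective_count(imm8);
}

/* Model of SHL r32, imm8 (left shift). */
static uint32_t shl_model(uint32_t value, unsigned imm8)
{
    return value << effective_count(imm8);
}

/* Model of SAR r32, imm8 (arithmetic right shift), written with
   unsigned arithmetic so the result is portable C. */
static uint32_t sar_model(uint32_t value, unsigned imm8)
{
    unsigned n = effective_count(imm8);
    uint32_t result = value >> n;
    if ((value & 0x80000000u) && n != 0) {
        /* Replicate the sign bit into the n vacated top bits. */
        result |= ~(0xFFFFFFFFu >> n);
    }
    return result;
}

/* A count of 1+FASTSHIFT must behave exactly like a count of 1. */
static void fastshift_selfcheck(void)
{
    uint32_t v = 0x80000001u;
    assert(effective_count(1 + FASTSHIFT) == 1);
    assert(shr_model(v, 1 + FASTSHIFT) == shr_model(v, 1));
    assert(shl_model(v, 1 + FASTSHIFT) == shl_model(v, 1));
    assert(sar_model(v, 1 + FASTSHIFT) == sar_model(v, 1));
}
```

With these definitions, shr_model(0x80000000u, 1 + FASTSHIFT) yields 0x40000000, the same result as a shift by 1.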
Side note: SHL reg,1 should be replaced by the faster ADD reg,reg in all cases. For right shifts, however, this gem is indeed useful.
Gem writer: Norbert Juffa
last updated: 1998-03-16