Find common divisor (Assembler/8086)

On comp.lang.asm.x86, Jon Kirwan asked for compiler output for the "greatest common divisor" function from the following C implementation:

        unsigned int gcd (unsigned int a, unsigned int b)
        {
           if (a == 0 && b == 0)
              b = 1;
           else if (b == 0)
              b = a;
           else if (a != 0)
              while (a != b)
                 if (a < b)
                    b -= a;
                 else
                    a -= b;

           return b;
        }
Here is the assembly implementation (optimised):
;
; gcd - greatest common divisor
;       by Paul Hsieh
;
; input:
;   eax = a
;   edx = b
;
; output:
;   eax = gcd
;
; destroys:
;   edx
;   flags
;

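gcd:    neg     eax             ; eax = -a, ZF set if a == 0
        je      L3              ; a == 0: answer is b (or 1 if b == 0 too)
L1:     neg     eax             ; eax = -eax
        xchg    eax,edx         ; swap the two operands
L2:     sub     eax,edx         ; eax -= edx
        jg      L2              ; result still positive: subtract again
        jne     L1              ; result negative: negate, swap, continue
L3:     add     eax,edx         ; eax is 0 here, so eax = edx
        jne     L4              ; nonzero result: done
        inc     eax             ; a and b were both 0: return 1
L4:     ret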
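As a rough reading aid, the control flow of the routine above can be mirrored in C, one statement per instruction. This is only a sketch: `gcd_mirror` and `is_positive` are hypothetical names not found in the original, the registers are modelled as plain 32-bit variables, and both inputs are assumed to be below 2^31 so that the signed `jg` test reduces to a simple sign check on the result.

```c
#include <stdint.h>

/* Hypothetical helper: true if x, read as a signed 32-bit value, is > 0.
   Assumes the preceding subtraction did not overflow (inputs < 2^31). */
static int is_positive(uint32_t x)
{
    return x != 0 && (x & 0x80000000u) == 0;
}

/* Hypothetical name; mirrors the assembly routine instruction by instruction. */
uint32_t gcd_mirror(uint32_t a, uint32_t b)
{
    uint32_t eax = a, edx = b, tmp;

    eax = 0u - eax;                       /* neg  eax          */
    if (eax == 0) goto L3;                /* je   L3  (a == 0) */
L1:
    eax = 0u - eax;                       /* neg  eax          */
    tmp = eax; eax = edx; edx = tmp;      /* xchg eax,edx      */
L2:
    eax -= edx;                           /* sub  eax,edx      */
    if (is_positive(eax)) goto L2;        /* jg   L2           */
    if (eax != 0) goto L1;                /* jne  L1           */
L3:
    eax += edx;                           /* add  eax,edx (eax was 0) */
    if (eax == 0)                         /* jne  L4 skips the inc    */
        eax = 1;                          /* inc  eax                 */
    return eax;                           /* ret                      */
}
```

With this sketch, `gcd_mirror(12, 18)` returns 6, `gcd_mirror(7, 0)` returns 7, and `gcd_mirror(0, 0)` returns 1, matching the C implementation at the top.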
Although xchg is not a particularly fast instruction on Intel processors, it helps keep the assembly routine very tight, and the result is probably no more than a cycle off optimal performance. The main loop of the routine fits entirely within an instruction prefetch buffer (16 bytes).
Much as I would like to, it is pointless to try to describe the exact performance of the code given above. It is limited almost entirely by branch misprediction penalties and branch-taken clocks.
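One way to see why a single cycle count is hard to state is that the number of trips around the subtraction loop, and so the number of branches executed, depends entirely on the inputs. The sketch below counts the subtraction steps taken by the C implementation's loop; `gcd_steps` is a hypothetical helper added here for illustration only, and it says nothing about actual clock counts.

```c
/* Hypothetical helper: counts how many subtractions the loop of the
   C implementation performs for the given inputs. */
unsigned long gcd_steps(unsigned int a, unsigned int b)
{
    unsigned long steps = 0;

    /* The loop is only entered when both inputs are nonzero. */
    if (a != 0 && b != 0) {
        while (a != b) {
            if (a < b)
                b -= a;
            else
                a -= b;
            steps++;        /* one subtraction, one pass through the branches */
        }
    }
    return steps;
}
```

For example, `gcd_steps(1000, 1000)` is 0 and `gcd_steps(12, 18)` is 2, while `gcd_steps(1, 1000)` is 999, even though all three calls take inputs of similar size.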
Note: This gem runs on the 8086 only if the 32-bit registers are replaced with their 16-bit counterparts (eax with ax, edx with dx).
Paul Hsieh
last updated: 1998-03-16