On comp.lang.asm.x86, Jon Kirwan asked for compiler output for the "greatest common divisor" function from the following C implementation:

    unsigned int gcd (unsigned int a, unsigned int b)
    {
        if (a == 0 && b == 0)
            b = 1;
        else if (b == 0)
            b = a;
        else if (a != 0)
            while (a != b)
                if (a < b)
                    b -= a;
                else
                    a -= b;

        return b;
    }

Here is the assembly implementation (optimised):

    ;
    ; gcd - greatest common divisor
    ; by Paul Hsieh
    ;
    ; input:
    ;   eax = a
    ;   edx = b
    ;
    ; output:
    ;   eax = gcd
    ;
    ; destroys:
    ;   edx
    ;   flags
    ;
    gcd:    neg     eax
            je      L3
    L1:     neg     eax
            xchg    eax,edx
    L2:     sub     eax,edx
            jg      L2
            jne     L1
    L3:     add     eax,edx
            jne     L4
            inc     eax
    L4:     ret

Although xchg is not a particularly fast Intel instruction, it helps keep the code very tight, and the routine is probably no more than a cycle off optimal performance. The main loop of the routine fits entirely within an instruction prefetch buffer (16 bytes).
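
The control flow of the assembly routine can be checked against the C implementation by modelling it in C. The following is only a rough sketch, not part of the original posting: the names gcd_ref, gcd_model and check_agreement are made up for illustration, the variables eax and edx merely stand in for the registers (with b taken to be in edx, the register the code actually reads), and inputs are assumed to be below 2^31 because jg is a signed comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the C implementation given in the text. */
static unsigned int gcd_ref(unsigned int a, unsigned int b)
{
    if (a == 0 && b == 0)
        b = 1;
    else if (b == 0)
        b = a;
    else if (a != 0)
        while (a != b)
            if (a < b)
                b -= a;
            else
                a -= b;
    return b;
}

/* Hypothetical C model of the assembly routine. Each statement is
   annotated with the instruction it stands for. Assumes a and b are
   below 2^31, so the signed arithmetic cannot overflow. */
static unsigned int gcd_model(unsigned int a, unsigned int b)
{
    int32_t eax = (int32_t)a;
    int32_t edx = (int32_t)b;
    int32_t tmp;

    eax = -eax;                     /* neg eax                      */
    if (eax != 0) {                 /* je L3 is taken when a == 0   */
        do {
            eax = -eax;             /* L1: neg eax                  */
            tmp = eax;              /* xchg eax,edx                 */
            eax = edx;
            edx = tmp;
            do {
                eax -= edx;         /* L2: sub eax,edx              */
            } while (eax > 0);      /* jg L2                        */
        } while (eax != 0);         /* jne L1                       */
    }
    eax += edx;                     /* L3: add eax,edx              */
    if (eax == 0)                   /* jne L4 falls through on zero */
        eax = 1;                    /* inc eax                      */
    return (unsigned int)eax;       /* L4: ret                      */
}

/* Compare the model with the reference for all pairs below limit. */
static void check_agreement(unsigned int limit)
{
    unsigned int a, b;

    for (a = 0; a < limit; a++)
        for (b = 0; b < limit; b++)
            assert(gcd_model(a, b) == gcd_ref(a, b));
}
```

Calling check_agreement(200) from a test program compares the two functions on every pair of inputs below 200. The model also shows why the routine is so short: the inner loop keeps subtracting until the result is zero or negative, and the neg/xchg pair then swaps the roles of the two values, so a single subtract loop serves both branches of the C while loop.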