63 bit CRCAssembler/80386

The following is a typical cyclic redundancy check computing algorithm. It is basically a deterministic iterating step of a function that takes 63 bits of input scrambles the bits in some uniformly pseudo-random way, then returns those 63 bits. Such algorithms can be used for high performance (but only moderately secure) encryption, pseudo-random numbers, testing for data integrity or a hash function.

        typedef unsigned long UINT;

        void TestC(UINT &b0, UINT &b1)
        {
           UINT L1, L2, L3;
           UINT H1, H2;

           // x = (x>>31)^(x>>30)^(x<<32) (mod 2^63)

           L1 = (b1<<1)|(b0>>31);
           L2 = (b1<<2)|(b0>>30);
           H1 = (b1>>31);
           H2 = (b1>>30);
           b1 = H1^H2^b0 & 0x7FFFFFFF;
           b0 = L1^L2;

        }
I have implemented this by using 63 of 64 bits in a pair of 32 bit call by reference variables:
;
; 63 bit CRC
;
; by Paul Hsieh and Pisen Chiang
;
; input:
;   edx:eax
;
; output:
;
; destroys:
;   esi, cl, 
;

        mov     esi,edx         ; 1 U
        xor     cl,cl           ; 1   V
        shld    edx,eax,1       ; 5 NP  +4c*
        shld    esi,eax,2       ; 9 NP  +4c
        adc     cl,0            ;10 U
        xor     edx,esi         ;10   V
        xchg    eax,edx         ;12 NP  +2c
        xor     dl,cl           ;13 U
Here the cycle count is a close one on the Pentium MMX (*in fact pre-MMX CPUs brings the two to a tie since the first shld takes an extra clock to decode.) This is mostly because of the surprisingly slow shld instructions. On Pentium Pro/II, K6 and 6x86MX processors, shld is significantly improved in performance making this a more compelling solution based on size and speed. The following is optimized purely for performance on Pentium based CPUs:
;
; 63 bit CRC
;
; by hand for Pentiums
;
; input:
;   edx:eax
;
; output:
;
; destroys:
;   esi, ebx, 
;

        mov     esi,edx         ; 1 U
        xor     cl,cl           ; 1   V
        shl     esi,2           ; 2 U
        mov     ebx,eax         ; 2   V
        adc     cl,0            ; 3 U
        lea     edx,[edx*2]     ; 3   V
        shr     ebx,30          ; 4 U
        xor     edx,esi         ; 4   V
        xor     edx,ebx         ; 5 U
        xor     al,cl           ; 5   V
        shr     ebx,1           ; 6 U -
        xchg    eax,edx         ; 8 NP  +2c
        xor     eax,ebx         ; 9 U
Gem writer: Paul Hsieh
last updated: 1998-03-16