63 bit CRC | Assembler/80386 |
The following is a typical cyclic redundancy check computing algorithm. It is basically a deterministic iterating step of a function that takes 63 bits of input scrambles the bits in some uniformly pseudo-random way, then returns those 63 bits. Such algorithms can be used for high performance (but only moderately secure) encryption, pseudo-random numbers, testing for data integrity or a hash function.
typedef unsigned long UINT;
void TestC(UINT &b0, UINT &b1)
{
UINT L1, L2, L3;
UINT H1, H2;
// x = (x>>31)^(x>>30)^(x<<32) (mod 2^63)
L1 = (b1<<1)|(b0>>31);
L2 = (b1<<2)|(b0>>30);
H1 = (b1>>31);
H2 = (b1>>30);
b1 = H1^H2^b0 & 0x7FFFFFFF;
b0 = L1^L2;
}
I have implemented this by using 63 of 64 bits in a pair of 32 bit call by reference variables:;
; 63 bit CRC
;
; by Paul Hsieh and Pisen Chiang
;
; input:
; edx:eax
;
; output:
;
; destroys:
; esi, cl,
;
mov esi,edx ; 1 U
xor cl,cl ; 1 V
shld edx,eax,1 ; 5 NP +4c*
shld esi,eax,2 ; 9 NP +4c
adc cl,0 ;10 U
xor edx,esi ;10 V
xchg eax,edx ;12 NP +2c
xor dl,cl ;13 U
Here the cycle count is a close one on the Pentium MMX (*in fact pre-MMX CPUs brings the two to a tie since the first shld takes an extra clock to decode.) This is mostly because of the surprisingly slow shld instructions. On Pentium Pro/II, K6 and 6x86MX processors, shld is significantly improved in performance making this a more compelling solution based on size and speed. The following is optimized purely for performance on Pentium based CPUs:;
; 63 bit CRC
;
; by hand for Pentiums
;
; input:
; edx:eax
;
; output:
;
; destroys:
; esi, ebx,
;
mov esi,edx ; 1 U
xor cl,cl ; 1 V
shl esi,2 ; 2 U
mov ebx,eax ; 2 V
adc cl,0 ; 3 U
lea edx,[edx*2] ; 3 V
shr ebx,30 ; 4 U
xor edx,esi ; 4 V
xor edx,ebx ; 5 U
xor al,cl ; 5 V
shr ebx,1 ; 6 U -
xchg eax,edx ; 8 NP +2c
xor eax,ebx ; 9 U