Fast implementation of strlen()
Recently, someone wrote to me to point out that strlen() is a very commonly called function, and asked about possible performance improvements for it. At first, without thinking too hard about it, I didn't see how there was any opportunity to fundamentally improve the algorithm. I was right, but as far as low-level algorithmic scrutiny is concerned, there is plenty of opportunity. Basically, the algorithm is byte-scan based, and the typical thing the C version does wrong is miss the opportunity to reduce load redundancy: it issues one memory load per byte, where a single 32-bit load can fetch four bytes at once.

    ; fast strlen()
    ;
    ; input:
    ;   eax = offset to string
    ;
    ; output:
    ;   ecx = length
    ;
    ; destroys:
    ;   ebx
    ;   eflags
    ;
        lea     ecx,[eax-1]
    l1: inc     ecx
        test    ecx,3
        jz      l2
        cmp     [byte ptr ecx],0
        jne     l1
        jmp     l6
    l2: mov     ebx,[ecx]       ; U
        add     ecx,4           ; V
        test    bl,bl           ; U
        jz      l5              ; V
        test    bh,bh           ; U
        jz      l4              ; V
        test    ebx,0ff0000h    ; U
        jz      l3              ; V
        test    ebx,0ff000000h  ; U
        jnz     l2              ; V +1brt
        inc     ecx
    l3: inc     ecx
    l4: inc     ecx
    l5: sub     ecx,4
    l6: sub     ecx,eax

Here, I've sacrificed size for performance, by essentially unrolling the loop 4 times. If the input strings are fairly long (which is when performance will matter), then on a Pentium the asm code will execute at a rate of 1.5 clocks per byte, while the C compiler's code takes 3 clocks per byte.
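For anyone who would rather read C than 80386 assembly, the control flow of the routine can be sketched roughly as follows. This is only an illustration under stated assumptions, not a replacement for the assembly: the name strlen_by_words is made up here, and the 4-byte read can touch up to three bytes past the terminator. The assembly gets away with that because an aligned dword never crosses a page boundary, but C does not formally permit it, so treat the sketch as a reading aid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a C rendering of the assembly's control flow. */
size_t strlen_by_words(const char *s)
{
    const char *p = s;

    /* Labels l1/l6: step one byte at a time until p is 4-byte aligned. */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Label l2: one 4-byte load per iteration, then test each byte of
     * the loaded copy instead of going back to memory four times. */
    for (;;) {
        unsigned char w[4];

        /* Assumption: reading the whole aligned word is safe, even if
         * the terminator sits in one of its first three bytes. */
        memcpy(w, p, sizeof w);

        if (w[0] == 0) return (size_t)(p - s);      /* label l5 */
        if (w[1] == 0) return (size_t)(p - s) + 1;  /* label l4 */
        if (w[2] == 0) return (size_t)(p - s) + 2;  /* label l3 */
        if (w[3] == 0) return (size_t)(p - s) + 3;  /* fall-through */
        p += 4;
    }
}
```

Copying the word into a small byte array rather than masking a 32-bit integer keeps the sketch independent of byte order. The assembly can test bl and bh directly because the 80386 is little-endian, so the lowest register byte is the first byte in memory.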