Bit Transposition algorithm (Assembler/8086)

This gem shows how to transpose the bits in an 8-byte block, treated as an 8x8 bit matrix. For occasional use on single blocks, it's probably better to optimize for code size than for speed. What is the shortest sequence we can come up with, disregarding speed? Maybe something like:

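;
; bit transposition algorithm
;
; input:
;   ds:si = input buffer, 8 bytes (8x8 bit matrix, one row per byte)
;   es:di = output buffer, 8 bytes
;   direction flag clear (cld), so that stosb increments di
;
; output:
;   es:di buffer = transposed matrix: bit (7-j) of output byte k
;   is bit (7-k) of input byte j
;   the input buffer is left unchanged (each byte is rotated 8 times)
;
; destroys:
;   al, bx, cx, di (di is advanced by 8)
;   flags
;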

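	mov     cx,8                    ; 3 bytes: one pass per output byte
next_outer:
	mov     bx,-8                   ; 3 bytes: index runs -8..-1
next_inner:
	rol     byte ptr [si+bx+8],1    ; 3 bytes: top bit of input byte -> CF
	adc     al,al                   ; 2 bytes: shift CF into al
	inc     bx                      ; 1 byte
	jnz     next_inner              ; 2 bytes
	stosb                           ; 1 byte: store al at es:di, di += 1
	loop    next_outer              ; 2 bytes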
This is 17 bytes, but it is probably easy to beat.
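To check the bit mapping on a host machine, here is a minimal C model of the routine above. It is a sketch, not part of the original gem: `transpose8`, `rol1` and `check_transpose` are hypothetical names, `in[]` stands in for the ds:si buffer and `out[]` for es:di. It mirrors the asm step by step (rotate, shift the carry into an accumulator, store), so each input byte is rotated eight times and ends up unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "rol byte ptr [mem],1": rotates *b left by one
 * and returns the bit that the CPU would put into CF. */
static uint8_t rol1(uint8_t *b)
{
    uint8_t carry = (uint8_t)(*b >> 7);        /* top bit -> "CF" */
    *b = (uint8_t)((*b << 1) | carry);         /* rotate left by one */
    return carry;
}

/* Hypothetical C model of the 17-byte routine: in[] = ds:si, out[] = es:di.
 * in[] is modified during the call but restored by the end. */
void transpose8(uint8_t in[8], uint8_t out[8])
{
    for (int k = 0; k < 8; k++) {              /* outer loop: cx = 8..1 */
        /* In the asm, al is not initialized: after 8 "adc al,al" steps
         * every old bit has been shifted out, so 0 is as good as any. */
        uint8_t al = 0;
        for (int j = 0; j < 8; j++)            /* inner loop: bx = -8..-1 */
            al = (uint8_t)((al << 1) | rol1(&in[j]));   /* adc al,al */
        out[k] = al;                           /* stosb */
    }
}

/* Hypothetical reference check: bit (7-j) of out[k] must equal
 * bit (7-k) of in[j], counting bits MSB-first on both sides. */
static void check_transpose(const uint8_t in[8], const uint8_t out[8])
{
    for (int j = 0; j < 8; j++)
        for (int k = 0; k < 8; k++)
            assert(((out[k] >> (7 - j)) & 1) == ((in[j] >> (7 - k)) & 1));
}
```

For example, with in = {0xFF, 0, 0, 0, 0, 0, 0, 0x01} the model produces out = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x81}: input byte 0 lands in bit 7 of every output byte, and bit 0 of input byte 7 lands in bit 0 of output byte 7.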
Gem writer: Terje Mathisen
last updated: 1998-03-16