Bit Block Inversion
Assembler/80386

What if you want a routine to invert an 8x8, 16x16, or 32x32 bit array? Inversion, as used here, means exchanging the bit position and the byte/word/dword offset of each bit, i.e. transposing the bit matrix (as opposed to flipping each bit, which is also called inversion):

     Bit #: 76543210              76543210

        x0  00010000b         y0  00000000b
        x1  00111000b         y1  01111000b
        x2  01101100b         y2  01111100b
        x3  11000110b   -->   y3  00010110b
        x4  11111110b         y4  00010011b
        x5  11000110b         y5  00010110b
        x6  11000110b         y6  01111100b
        x7  00000000b         y7  01111000b
The inversion consists of a 90-degree clockwise rotation plus a flip around the X axis. The flip can be disregarded, since it comes for free by stepping through the rows with negative instead of positive Y-increments.
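A minimal C sketch of the definition, meant only as a slow reference to check faster code against. The function names `invert8x8_ref` and `invert8x8_via_rotation` are made up for this illustration. The second function assumes the picture above is read with bit 7 leftmost and row 0 on top, and folds the flip into the store by writing the rows from last to first.

```c
#include <stdint.h>

/* Reference inversion: bit j of x[i] becomes bit i of y[j]. */
void invert8x8_ref(const uint8_t x[8], uint8_t y[8])
{
    for (int j = 0; j < 8; j++) {
        uint8_t row = 0;
        for (int i = 0; i < 8; i++)
            row |= (uint8_t)(((x[i] >> j) & 1u) << i);
        y[j] = row;
    }
}

/* Same result via a 90-degree clockwise rotation (bit 7 drawn leftmost,
   row 0 on top). The flip is folded into the store: rows are written
   from y[7] down to y[0], i.e. with a negative Y-increment. */
void invert8x8_via_rotation(const uint8_t x[8], uint8_t y[8])
{
    for (int r = 0; r < 8; r++) {
        uint8_t rot = 0;                 /* row r of the rotated block */
        for (int b = 0; b < 8; b++)
            rot |= (uint8_t)(((x[b] >> (7 - r)) & 1u) << b);
        y[7 - r] = rot;
    }
}
```

Feeding the x0..x7 rows of the example into either function should reproduce the y0..y7 column, and applying the inversion twice gives back the original block.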

I'd do the rotation part with one or more lookup tables. Handling 4x4 bit chunks needs a single table of 64K 16-bit entries (i.e. 128KB), which fits easily inside the 256KB of L2 cache that I assume you have.
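How such a table might be filled can be sketched in C. The layout here is an assumption, not something the text specifies: row r of a 4x4 chunk sits in bits 4r..4r+3 of the index, and each entry holds the inverted (transposed) chunk in the same layout. A table holding a pure rotation would differ only in row order. `build_rotate_4x4` is a hypothetical helper name.

```c
#include <stdint.h>

uint16_t rotate_4x4[65536];          /* 64K entries x 2 bytes = 128KB */

/* Assumed layout: bit (4*r + c) of the index is row r, column c of the
   chunk; the entry has that bit moved to position (4*c + r). */
void build_rotate_4x4(void)
{
    for (uint32_t idx = 0; idx < 65536; idx++) {
        uint16_t out = 0;
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                if (idx & (1u << (4 * r + c)))
                    out |= (uint16_t)(1u << (4 * c + r));
        rotate_4x4[idx] = out;
    }
}
```

With this layout, a chunk whose first row is all ones (index 000fh) maps to an entry with the lowest bit of every row set (1111h).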

Each 8x4 block would then be rotated independently and joined together, something like this:

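        xor     eax,eax         ; upper bits must be zero: EAX/EDX become table indices
        xor     ebx,ebx
        mov     al,[x1]         ; 00,01,02,03,04,05,06,07
        mov     bl,[x2]         ; 08,09,0a,0b,0c,0d,0e,0f
        mov     ah,[x3]         ; 10,11,12,13,14,15,16,17
        mov     bh,[x4]         ; 18,19,1a,1b,1c,1d,1e,1f
        mov     esi,eax
        and     eax,0f0fh       ; low nibbles of x1,x3 in bits 0-3 and 8-11
        shr     esi,4
        mov     edx,ebx
        shl     ebx,4
        and     edx,0f0f0h      ; high nibbles of x2,x4 in bits 4-7 and 12-15
        and     esi,0f0fh       ; high nibbles of x1,x3 in bits 0-3 and 8-11
        and     ebx,0f0f0h      ; low nibbles of x2,x4 in bits 4-7 and 12-15
        or      eax,ebx         ; 00,01,02,03,08,09,0a,0b,10,11,12,13,18,19,1a,1b
        or      edx,esi         ; 04,05,06,07,0c,0d,0e,0f,14,15,16,17,1c,1d,1e,1f
        mov     ax,rotate_4x4[eax*2]    ; rotated low 4x4 chunk
        mov     dx,rotate_4x4[edx*2]    ; rotated high 4x4 chunk

        mov     [temp1],eax
        mov     [temp2],edx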
Using the same code twice (once for each 8x4 half) and splitting/merging the two sets of results, it should take about 30 cycles for each 8x8 block.
On a 90MHz Pentium, this equates to about 90/30*8 = 24MB/s, which is probably faster than the rest of the system can keep up with.
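To check the splitting and merging logic independently of the assembly, the whole scheme can be modelled in C. This is only a sketch under assumptions: the table holds the transposed 4x4 chunk with row r in bits 4r..4r+3, and the helper names `build_rotate_4x4`, `pack_nibbles`, `unpack_nibbles` and `invert8x8_lut` are invented for the illustration. It says nothing about cycle counts.

```c
#include <stdint.h>

uint16_t rotate_4x4[65536];          /* 64K entries x 2 bytes = 128KB */

/* Fill the table: bit (4*r + c) of the index moves to bit (4*c + r). */
void build_rotate_4x4(void)
{
    for (uint32_t idx = 0; idx < 65536; idx++) {
        uint16_t out = 0;
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                if (idx & (1u << (4 * r + c)))
                    out |= (uint16_t)(1u << (4 * c + r));
        rotate_4x4[idx] = out;
    }
}

/* Pack the low (shift = 0) or high (shift = 4) nibbles of four rows
   into one 16-bit table index, first row in the lowest nibble. */
uint16_t pack_nibbles(const uint8_t r[4], int shift)
{
    return (uint16_t)(((r[0] >> shift) & 0xF)
                    | ((r[1] >> shift) & 0xF) << 4
                    | ((r[2] >> shift) & 0xF) << 8
                    | ((r[3] >> shift) & 0xF) << 12);
}

/* Spread the four nibbles of a table result over four output rows. */
void unpack_nibbles(uint16_t t, uint8_t y[4], int shift)
{
    for (int k = 0; k < 4; k++)
        y[k] |= (uint8_t)(((t >> (4 * k)) & 0xF) << shift);
}

/* Invert an 8x8 block with four table lookups: two per 8x4 half. */
void invert8x8_lut(const uint8_t x[8], uint8_t y[8])
{
    for (int k = 0; k < 8; k++)
        y[k] = 0;
    for (int half = 0; half < 2; half++) {     /* rows 0-3, then rows 4-7 */
        const uint8_t *rows = x + 4 * half;
        uint16_t lo = rotate_4x4[pack_nibbles(rows, 0)];  /* cf. temp1 */
        uint16_t hi = rotate_4x4[pack_nibbles(rows, 4)];  /* cf. temp2 */
        unpack_nibbles(lo, y,     4 * half);   /* bit columns 0-3 -> y0..y3 */
        unpack_nibbles(hi, y + 4, 4 * half);   /* bit columns 4-7 -> y4..y7 */
    }
}
```

Call `build_rotate_4x4()` once before the first `invert8x8_lut()`. The first 8x4 half lands in the low nibble of every output row and the second half in the high nibble, which is the merge step the cycle estimate refers to.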
Gem writer: Terje Mathisen
last updated: 1998-03-16