This is an optimization question. I want to copy a bit-field of six 5-bit elements into a u8 buffer. The naive way looks like this:
void Expand(u32 x, u8 b[6]) {
    b[0] = (x >> 0) & 31;
    b[1] = (x >> 5) & 31;
    b[2] = (x >> 10) & 31;
    b[3] = (x >> 15) & 31;
    b[4] = (x >> 20) & 31;
    b[5] = (x >> 25) & 31;
}
This is the assembly generated with the flags /O2 /Ot /Gr (MSVC); GCC and Clang give roughly the same thing.
@Expand@8 PROC
mov al, cl
and al, 31
mov BYTE PTR [edx], al
mov eax, ecx
shr eax, 5
and al, 31
mov BYTE PTR [edx+1], al
mov eax, ecx
shr eax, 10
and al, 31
mov BYTE PTR [edx+2], al
mov eax, ecx
shr eax, 15
and al, 31
mov BYTE PTR [edx+3], al
mov eax, ecx
shr eax, 20
shr ecx, 25
and al, 31
and cl, 31
mov BYTE PTR [edx+4], al
mov BYTE PTR [edx+5], cl
ret 0
@Expand@8 ENDP
But I just don't like it; I know it does exactly what it should be doing, it just seems to me that it could be a lot more efficient.
To me it looks like a 30-bit number that needs to be scaled up to a 48-bit number while inserting zeroes.
11111 11111 11111 11111 11111 11111
↓
00011111 00011111 00011111 00011111 00011111 00011111
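I've been trying shifts and ORs, with only a single AND at the end against a u64 (0x1f1f1f1f1f1f), but my optimization attempts have been unsuccessful so far. I'm confident this should be doable in fewer than 10 instructions; any guidance would be appreciated.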
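To make the idea concrete, here is a rough sketch of the kind of 64-bit shift/mask sequence described above. It is not a tuned solution, and the names `spread5to8` and `Expand64` are made up for illustration. It uses `<stdint.h>` types instead of the `u32`/`u8` typedefs, and the final 6-byte copy assumes a little-endian target. Note that each shifted term needs its own mask: OR-ing the shifted copies first and applying one AND at the end does not work, because the unwanted bits of the other fields end up in the same byte positions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: spread six 5-bit fields (bits 0..29 of x)
 * so that field i ends up in bits 8*i .. 8*i+4 of the result. */
static uint64_t spread5to8(uint32_t x)
{
    uint64_t t = x & 0x3FFFFFFFu;            /* keep the 30 payload bits */

    /* Step 1: move fields 3..5 (bits 15..29) up by 9, to bits 24..38.
     * Fields 0..2 stay in bits 0..14. */
    t = (t & 0x7FFFu) | ((t & 0x3FFF8000u) << 9);

    /* Step 2: in each 24-bit half the fields sit at offsets 0, 5, 10
     * and must go to 0, 8, 16: shift by 0, 3 and 6 respectively. */
    t =  (t & 0x000000001F00001FULL)
      | ((t & 0x00000003E00003E0ULL) << 3)
      | ((t & 0x0000007C00007C00ULL) << 6);

    return t;                                /* fits 0x1F1F1F1F1F1F */
}

/* Hypothetical wrapper: assumes little-endian byte order, so the low
 * six bytes of the 64-bit value are b[0]..b[5] in order. */
static void Expand64(uint32_t x, uint8_t b[6])
{
    uint64_t t = spread5to8(x);
    memcpy(b, &t, 6);
}
```

On x86 CPUs with BMI2, the `_pdep_u64` intrinsic from `<immintrin.h>` with the mask 0x1f1f1f1f1f1f performs this same bit deposit in a single instruction, if requiring that extension is acceptable.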
EDIT
I scratched my head a bit more, and this is the best I have come up with so far:
void Expand(u32 x, u8 b[6]) {
    memset(b, 31, 6);
    b[0] &= x;
    b[1] &= x >>= 5;
    b[2] &= x >>= 5;
    b[3] &= x >>= 5;
    b[4] &= x >>= 5;
    b[5] &= x >>= 5;
}
Which compiles to:
@Expand@8 PROC
mov eax, 0x1f1f1f1f
mov DWORD PTR [edx], eax
mov WORD PTR [edx+4], ax
and BYTE PTR [edx], cl
shr ecx, 5
and BYTE PTR [edx+1], cl
shr ecx, 5
and BYTE PTR [edx+2], cl
shr ecx, 5
and BYTE PTR [edx+3], cl
shr ecx, 5
and BYTE PTR [edx+4], cl
shr ecx, 5
and BYTE PTR [edx+5], cl
ret 0
@Expand@8 ENDP