As an example, my ARM64 assembly code defines a global data symbol, p256one:
DATA p256one<>+0x00(SB)/8, $0x0000000000000001
DATA p256one<>+0x08(SB)/8, $0xffffffff00000000
DATA p256one<>+0x10(SB)/8, $0xffffffffffffffff
DATA p256one<>+0x18(SB)/8, $0x00000000fffffffe
GLOBL p256one<>(SB), 8, $32
I need to load p256one<>(SB) into the V0 and V1 registers. This is how I currently do it:
LDP p256one<>+0x00(SB), (R0, R1)
LDP p256one<>+0x10(SB), (R2, R3)
VMOV R0, V0.D[0]
VMOV R1, V0.D[1]
VMOV R2, V1.D[0]
VMOV R3, V1.D[1]
That takes six instructions in total. I know that data in memory can be loaded like this:
VLD1 (R0), [V0.B16, V1.B16]
But it seems global data can't be loaded the same way.
So, is there a more efficient way to load global data into NEON registers in Go assembler code?
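To pin down what any replacement sequence has to produce, here is a minimal sketch in plain Go that models the target register contents. It is only a model, not assembler code: the vreg type and the viaGPR and viaBytes helpers are hypothetical names introduced for illustration, and it assumes the usual little-endian ARM64 memory layout. It checks that moving the four words through general-purpose registers lane by lane gives the same V0 and V1 as a single 32-byte load.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// p256one mirrors the four 64-bit words from the DATA directives,
// lowest offset first.
var p256one = [4]uint64{
	0x0000000000000001,
	0xffffffff00000000,
	0xffffffffffffffff,
	0x00000000fffffffe,
}

// vreg is a hypothetical model of one 128-bit NEON register
// viewed as two 64-bit lanes, D[0] and D[1].
type vreg struct{ D [2]uint64 }

// viaGPR models the LDP + VMOV sequence: each word passes through
// a general-purpose register and lands in one D lane.
func viaGPR(w [4]uint64) (v0, v1 vreg) {
	r0, r1, r2, r3 := w[0], w[1], w[2], w[3]
	v0.D[0], v0.D[1] = r0, r1
	v1.D[0], v1.D[1] = r2, r3
	return v0, v1
}

// viaBytes models a single 32-byte vector load, assuming little-endian
// memory: bytes 0..15 fill V0 and bytes 16..31 fill V1.
func viaBytes(w [4]uint64) (v0, v1 vreg) {
	var mem [32]byte
	for i, x := range w {
		binary.LittleEndian.PutUint64(mem[8*i:], x)
	}
	v0.D[0] = binary.LittleEndian.Uint64(mem[0:])
	v0.D[1] = binary.LittleEndian.Uint64(mem[8:])
	v1.D[0] = binary.LittleEndian.Uint64(mem[16:])
	v1.D[1] = binary.LittleEndian.Uint64(mem[24:])
	return v0, v1
}

func main() {
	a0, a1 := viaGPR(p256one)
	b0, b1 := viaBytes(p256one)
	fmt.Printf("V0 = %016x %016x\n", a0.D[1], a0.D[0])
	fmt.Printf("V1 = %016x %016x\n", a1.D[1], a1.D[0])
	fmt.Println("same result:", a0 == b0 && a1 == b1)
}
```

So the question is only about how to form the address for the vector load; the resulting register contents would be the same either way.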