I've written a small C function containing a short inline assembly statement.
Inside the inline assembly statement I need two "temporary" registers to load and compare some memory values.
To let the compiler choose "optimal temporary registers", I would like to avoid hard-coding those temp registers (and putting them into the clobber list). Instead, I created two local variables in the surrounding C function just for this purpose. I added these local variables to the output operands of the inline asm statement using "=r" and then used them for my load/compare purposes.
These local variables are not used elsewhere in the C function and (maybe because of this) the compiler assigned the same register to both output operands, which makes my code unusable (the comparison is always true).

Is the compiler allowed to use overlapping registers for different output operands, or is this a compiler bug (I tend to rate this as a bug)?
I only found information about early clobbers, which prevent registers from overlapping between inputs and outputs, but no statement about output operands alone.

A workaround is to initialize my temporary variables and to use "+r" instead of "=r" for them in the output operand specification. But in this case the compiler emits initialization instructions which I would like to avoid.
Is there any clean way to let the compiler choose optimal registers that do not overlap each other just for "internal inline assembly usage"?

Thanks a lot!

P.S.: I code for some "exotic" target using a "non-GNU" compiler that supports "GNU inline assembly".
P.P.S.: I also don't understand in the example below why the compiler doesn't generate code for "int eq=0;" (e.g. 'mov d2, 0'). Maybe I totally misunderstood the "=" constraint modifier?

The following useless and stupid example illustrates the problem:

int foo(const int *s1, const int *s2)
{
    int eq = 0;
#ifdef WORKAROUND
    int t1=0, t2=1;
#else
    int t1, t2;
#endif

    __asm__ volatile(
        "ld.w  %[t1], [%[s1]]   \n\t"
        "ld.w  %[t2], [%[s2]]   \n\t"
        "jne   %[t1], %[t2], 1f \n\t"
        "mov   %[eq], 1         \n\t" 
        "1:"
        : [eq] "=d" (eq),
          [s1] "+a" (s1), [s2] "+a" (s2),
#ifdef WORKAROUND
          [t1] "+d" (t1), [t2] "+d" (t2)
#else
          [t1] "=d" (t1), [t2] "=d" (t2)
#endif
    );

    return eq;
}

In the generated asm, the compiler uses register "d8" for both operands "t1" and "t2":

foo:
    ; 'mov d2, 0' is missing
    ld.w  d8, [a4]  ; 'd8' allocated for 't1'
    ld.w  d8, [a5]  ; 'd8' allocated for 't2' too!
    jne   d8, d8, 1f 
    mov   d2, 1         
1:
    ret16

Compiled with "-DWORKAROUND":

foo:
    ; 'mov d2, 0' is missing
    mov16 d9,1
    mov16 d8,0
    ld.w  d8, [a4]
    ld.w  d9, [a5]   
    jne   d8, d9, 1f 
    mov   d2, 1         
1:
    ret16

EABI of this machine:

  • Return registers (non-pointer/pointer): d2, a2
  • Non-pointer args: d4..d7
  • Pointer args: a4..a7

Recommended answer

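I think this is a bug in your compiler.

If it says it supports "GNU inline assembly", then one would expect it to follow GCC, whose manual is the closest thing there is to a formal specification. Now, the GCC manual doesn't seem to say explicitly that "output operands will not share registers with each other", but as o11c mentioned, it does suggest using output operands for scratch registers, and that wouldn't work if they could share registers.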
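To make the scratch-register pattern concrete, here is a minimal sketch of it on a mainstream target. It rests on several assumptions and is not code for the asker's compiler: it uses x86-64 AT&T syntax rather than the asker's instruction set, the function name words_equal_scratch is made up for illustration, and it falls back to plain C on other targets. Because s1 and s2 are plain inputs in this sketch, the two scratch outputs are marked early-clobber ("=&r") so they cannot share a register with an input.

```c
/* Sketch only: returns 1 if *s1 == *s2, else 0.
 * Assumes GCC or Clang on x86-64 (AT&T syntax); plain C elsewhere. */
int words_equal_scratch(const int *s1, const int *s2)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* "+r" rather than "=r": with "=" the operand is write-only, so the
     * compiler is free to drop the "eq = 0" initializer. */
    int eq = 0;
    int t1, t2;                 /* scratch only; never read from C */

    __asm__ volatile(
        "movl  (%[s1]), %[t1]  \n\t"
        "movl  (%[s2]), %[t2]  \n\t"
        "cmpl  %[t2], %[t1]    \n\t"
        "jne   1f              \n\t"
        "movl  $1, %[eq]       \n"
        "1:"
        : [eq] "+r" (eq),
          [t1] "=&r" (t1), [t2] "=&r" (t2)   /* early-clobber scratch outputs */
        : [s1] "r" (s1), [s2] "r" (s2)
        : "cc", "memory");      /* flags change; memory is read via pointers */
    return eq;
#else
    return *s1 == *s2;          /* portable fallback, no inline asm */
#endif
}
```

Calling words_equal_scratch(&a, &b) with two equal ints should return 1, and 0 otherwise. Whether a compiler that mis-allocates plain "=" outputs handles "=&" correctly is something only a test on that compiler can tell.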

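A workaround that might be more efficient than yours would be to follow your inline asm with a second, dummy asm statement that "uses" both outputs. Hopefully this will convince the compiler that they are potentially different values and therefore need separate registers: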

    int t1, t2;
    __asm__ volatile(" ... code ..."
          : [t1] "=d" (t1), [t2] "=d" (t2) : ...);
    __asm__ volatile("" // no code
          : : "r" (t1), "r" (t2));

With luck, this will avoid generating any extra code for unnecessary initialization, etc.
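For readers who want to try the dummy-use idea end to end, the following is a hedged, complete variant of it. The assumptions: x86-64 AT&T syntax under GCC or Clang stands in for the asker's target, the function name words_equal_dummy_use is hypothetical, and other targets get a plain C fallback. As in the question, the pointers are read-write operands, so the scratch outputs only compete with other outputs.

```c
/* Sketch only: returns 1 if *s1 == *s2, else 0.
 * Assumes GCC or Clang on x86-64 (AT&T syntax); plain C elsewhere. */
int words_equal_dummy_use(const int *s1, const int *s2)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int eq = 0;
    int t1, t2;                 /* scratch outputs, left uninitialized */

    __asm__ volatile(
        "movl  (%[s1]), %[t1]  \n\t"
        "movl  (%[s2]), %[t2]  \n\t"
        "cmpl  %[t2], %[t1]    \n\t"
        "jne   1f              \n\t"
        "movl  $1, %[eq]       \n"
        "1:"
        : [eq] "+r" (eq),
          [s1] "+r" (s1), [s2] "+r" (s2),
          [t1] "=r" (t1), [t2] "=r" (t2)
        :
        : "cc", "memory");

    /* Empty asm that consumes both scratch values, so the compiler has to
     * treat t1 and t2 as two live, possibly different values. */
    __asm__ volatile("" : : "r" (t1), "r" (t2));

    return eq;
#else
    return *s1 == *s2;          /* portable fallback, no inline asm */
#endif
}
```

The second statement emits no instructions itself; at most it may cause the compiler to keep the two values in registers slightly longer.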

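Another possibility would be to hard-code specific scratch registers and declare them as clobbered. It leaves less flexibility for the register allocator, but depending on the surrounding code and how smart the compiler is, it may not make a lot of difference.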
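The clobber-list alternative could look roughly like the sketch below. Again this is an illustration under assumptions: it targets x86-64 (AT&T syntax, GCC or Clang) rather than the asker's machine, the choice of r10 and r11 as scratch registers and the name words_equal_clobber are arbitrary, and non-x86-64 builds use a plain C fallback.

```c
/* Sketch only: returns 1 if *s1 == *s2, else 0.
 * Assumes GCC or Clang on x86-64 (AT&T syntax); plain C elsewhere. */
int words_equal_clobber(const int *s1, const int *s2)
{
#if defined(__x86_64__) && defined(__GNUC__)
    int eq = 0;

    __asm__ volatile(
        "movl  (%[s1]), %%r10d \n\t"   /* fixed scratch register 1 */
        "movl  (%[s2]), %%r11d \n\t"   /* fixed scratch register 2 */
        "cmpl  %%r11d, %%r10d  \n\t"
        "jne   1f              \n\t"
        "movl  $1, %[eq]       \n"
        "1:"
        : [eq] "+r" (eq)
        : [s1] "r" (s1), [s2] "r" (s2)
        /* Listing r10/r11 as clobbered keeps the compiler from placing any
         * operand in them and tells it their old contents are lost. */
        : "r10", "r11", "cc", "memory");
    return eq;
#else
    return *s1 == *s2;          /* portable fallback, no inline asm */
#endif
}
```

No temporaries appear in C at all here, so there is nothing for the register allocator to merge; the cost is that the two registers are fixed regardless of what the surrounding code would prefer.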
