Linux: why does printf still work with RAX lower than the number of FP args in XMM registers?

Posted on April 24

I am following the book "Beginning x64 Assembly Programming" on a 64-bit Linux system, using NASM and gcc.
In the chapter about floating-point operations, the book gives the code below for adding two floating-point numbers. In the book, and in other online sources, I have read that, according to the calling convention, register RAX specifies the number of XMM registers used.
The code in the book goes as follows:

extern printf
section .data
num1        dq  9.0
num2        dq  73.0
fmt     db  "The numbers are %f and %f",10,0
f_sum       db  "%f + %f = %f",10,0

section .text
global main
main:
    push rbp
    mov rbp, rsp
printn:
    movsd xmm0, [num1]
    movsd xmm1, [num2]
    mov rdi, fmt
    mov rax, 2      ; for printf, rax specifies the number of xmm registers
    call printf

sum:
    movsd xmm2, [num1]
    addsd xmm2, [num2]
printsum:
    movsd xmm0, [num1]
    movsd xmm1, [num2]
    mov rdi, f_sum
    mov rax, 3
    call printf
    mov rsp, rbp    ; restore the stack pointer saved in the prologue
    pop rbp
    ret

That works as expected.
Then, before the last printf call, I tried changing

mov rax, 3

to

mov rax, 1

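Then I reassembled and ran the program.

I was expecting some different, nonsensical output, but to my surprise the output was exactly the same. printf correctly prints all three floating-point values: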

The numbers are 9.000000 and 73.000000
9.000000 + 73.000000 = 82.000000

I suppose there is some kind of override when printf expects to use several XMM registers, so that as long as RAX is not 0 it will use consecutive XMM registers. I have searched for an explanation in the calling conventions and the NASM manual, but didn't find one.

Why is this?

# GCC4.5.3 -O3 -fPIC    to compile like glibc would
add_them:
        movzx   eax, al
        sub     rsp, 48                  # reserve stack space, needed either way
        lea     rdx, 0[0+rax*4]          # each movaps is 4 bytes long
        lea     rax, .L2[rip]            # code pointer to after the last movaps
        lea     rsi, -136[rsp]           # used later by va_arg. test/jz version does the same, but after the movaps stores
        sub     rax, rdx
        lea     rdx, 39[rsp]             # used later by va_arg, test/jz version also does an LEA like this
        jmp     rax                      # AL=0 case jumps to L2
        movaps  XMMWORD PTR -15[rdx], xmm7    # using RDX as a base makes each movaps 4 bytes long, vs. 5 with RSP
        movaps  XMMWORD PTR -31[rdx], xmm6
        movaps  XMMWORD PTR -47[rdx], xmm5
        movaps  XMMWORD PTR -63[rdx], xmm4
        movaps  XMMWORD PTR -79[rdx], xmm3
        movaps  XMMWORD PTR -95[rdx], xmm2
        movaps  XMMWORD PTR -111[rdx], xmm1
        movaps  XMMWORD PTR -127[rdx], xmm0   # xmm0 last, will be ready for store-forwarding last
.L2:
        lea     rax, 56[rsp]             # first stack arg (if any), I think
        ## rest of the function
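The listing above is the prologue of a variadic function called add_them; its C source is not included here. A minimal sketch of what such a function might look like, assuming a count-then-doubles signature (the signature and body are assumptions, only the name comes from the listing):

```c
#include <stdarg.h>

/* Hypothetical source for add_them: the parameter list and body are
 * assumptions, the listing only shows the function's name and prologue. */
double add_them(int count, ...)
{
    va_list ap;
    double total = 0.0;

    /* On x86-64 System V, the compiler-generated prologue has already
     * copied the argument registers into the register save area by the
     * time va_arg reads from it. */
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* each FP arg comes from a saved XMM slot */
    va_end(ap);

    return total;
}
```

For a call such as add_them(3, 9.0, 73.0, 82.0), a C compiler targeting this ABI sets AL to the number of vector registers used before the call, which is the step the hand-written NASM code has to do itself with mov rax, 3.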

# GCC11.2 -O3 -fPIC
add_them:
        sub     rsp, 48
        test    al, al
        je      .L15                          # only one test&branch macro-fused uop
        movaps  XMMWORD PTR -88[rsp], xmm0    # xmm0 first
        movaps  XMMWORD PTR -72[rsp], xmm1
        movaps  XMMWORD PTR -56[rsp], xmm2
        movaps  XMMWORD PTR -40[rsp], xmm3
        movaps  XMMWORD PTR -24[rsp], xmm4
        movaps  XMMWORD PTR -8[rsp], xmm5
        movaps  XMMWORD PTR 8[rsp], xmm6
        movaps  XMMWORD PTR 24[rsp], xmm7
.L15:
        lea     rax, [rsp+56]      # first stack arg (if any), I think
        lea     rsi, -136[rsp]     # used by va_arg. done after the movaps stores instead of before.
        ...
        lea     rdx, 56[rsp]       # used by va_arg. With a different offset than older GCC, but used somewhat similarly. Redundant with the LEA into RAX; silly compiler.
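The two prologues differ in how many XMM registers end up in the register save area for a given AL. The following C sketch models only that difference; the function names are hypothetical, and it assumes AL is in the range 0 to 8, as the x86-64 System V ABI requires.

```c
/* Computed-jump prologue (GCC 4.5.3 listing): it jumps AL*4 bytes back
 * from .L2, so only the last AL movaps stores run, i.e. xmm0..xmm(AL-1). */
int saved_by_computed_jump(unsigned char al)
{
    return al;                 /* assumes 0 <= al <= 8 */
}

/* test/je prologue (GCC 11.2 listing): any non-zero AL falls through
 * all eight movaps stores; AL == 0 skips them all. */
int saved_by_test_branch(unsigned char al)
{
    return al != 0 ? 8 : 0;
}

/* va_arg can only return a correct double for argument i if xmm(i) was
 * stored, so all n_fp_args values are valid only if enough were saved. */
int fp_args_visible(int regs_saved, int n_fp_args)
{
    return regs_saved >= n_fp_args;
}
```

With AL = 1 and three double arguments, the test/branch model still saves xmm0 through xmm7, so all three values are visible; that would be consistent with the unchanged output in the question if the printf being called has a prologue of this style. Under the computed-jump model only xmm0 would be saved, and the second and third values would be read from uninitialized stack memory.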


Recommended answer

Footnote 1: AL, not RAX, is what matters

Footnote 2: efficiency of the two strategies
