C++: How does GCC optimize an unused variable incremented inside a loop?

Posted on January 13

I wrote a simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the GCC compiler optimizes this loop (obviously, adding 1 2000000000 times should become "add 2000000000 once"). So:

gcc test.c, followed by time on a.out, gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s

gcc -O2 test.c, followed by time on a.out, gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s

Then I disassembled both with gcc -S. The first one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

.L3 does the additions, and .L2 compares -4(%rbp) with 1999999999 and jumps back to .L3 if i < 2000000000.
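To make the control flow of the unoptimized listing easier to follow, here is a rough C rendering of it using gotos. This is only a sketch: the function name loop_as_compiled and the limit parameter are made up for illustration (the real code hard-codes 2000000000), and the two locals stand in for the stack slots -8(%rbp) and -4(%rbp).

```c
/* Hypothetical helper: mirrors the jmp/.L3/.L2 structure of the
   unoptimized listing. `limit` replaces the hard-coded 2000000000
   so the sketch can be run with small values. */
int loop_as_compiled(int limit)
{
    int count = 0;          /* movl $0, -8(%rbp) */
    int i = 0;              /* movl $0, -4(%rbp) */
    goto L2;                /* jmp .L2 */
L3:
    count += 1;             /* addl $1, -8(%rbp) */
    i += 1;                 /* addl $1, -4(%rbp) */
L2:
    if (i <= limit - 1)     /* cmpl $1999999999, -4(%rbp); jle .L3 */
        goto L3;
    return count;
}
```

Calling loop_as_compiled(10) walks through .L3 ten times, just as the original loop body would run ten times for a bound of 10. Every iteration reads and writes both variables in memory, which is consistent with the roughly 7.7 seconds measured without optimization.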

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I don't understand at all what is happening there! I know very little assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.

Can someone briefly explain:

The gcc -S -O2 output.
Whether the loop was optimized as I expected (one sum instead of many sums)?

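The compiler notices that the result of the loop is never used, so it removes the entire loop. This is called dead code elimination. Printing the result keeps the loop's value in use:

#include <stdio.h>

int main(void) {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    // Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}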

The optimized assembly from MSVC (x64):

; 57   : int main(){

$LN8:
    sub     rsp, 40                 ; 00000028H

; 58   :
; 59   :
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   :
; 65   :     // Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea     rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov     edx, 2000000000         ; 77359400H
    call    QWORD PTR __imp_printf

; 67   :
; 68   :
; 69   :
; 70   :
; 71   :     return 0;

    xor     eax, eax

; 72   : }

    add     rsp, 40                 ; 00000028H
    ret     0

The optimized assembly from GCC:

    .file   "test.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "%d\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $2000000000, 8(%esp)
    movl    $.LC0, 4(%esp)
    movl    $1, (%esp)
    call    __printf_chk
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits
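Both optimized listings pass the constant 2000000000 straight to printf instead of looping, which is the "one sum instead of many" result the question was hoping for. As a rough illustration of why that substitution is valid, the sketch below puts the loop next to the value it reduces to. The names count_by_loop and count_folded are hypothetical, and the closed form is a restatement for illustration, not something taken from GCC's internals.

```c
/* Hypothetical helper: the loop as written in the question, with the
   bound passed in so that small values can be tried. */
int count_by_loop(int n)
{
    int i;
    int count = 0;
    for (i = 0; i < n; i++) {
        count = count + 1;
    }
    return count;
}

/* Hypothetical helper: the value the loop reduces to. The body runs
   n times when n is positive and not at all otherwise. */
int count_folded(int n)
{
    return n > 0 ? n : 0;
}
```

For any bound the two functions return the same value, so with the bound fixed at 2000000000 the whole loop can be replaced by that constant, which is what the mov edx, 2000000000 and movl $2000000000, 8(%esp) instructions show.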
