Why does the following loop unrolling produce incorrect results?

Posted on April 28

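I am currently trying to optimize some MIPS assembly that I wrote for a program that triangulates a 24x24 matrix. My current goal is to use delayed branching and manual loop unrolling to cut down on cycles. Note: I am using 32-bit single precision for all the matrix arithmetic.

Part of the algorithm involves the following loop, which I am trying to unroll (N is always 24):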

...
    float inv = 1 / A[k][k];
    for (j = k + 1; j < N; j++) {
        /* divide by pivot element */
        A[k][j] = A[k][j] * inv;
    }
...

I want to turn it into:

...
    float inv = 1 / A[k][k];
    for (j = k + 1; j < N; j +=2) {
        /* divide by pivot element */
        A[k][j]     = A[k][j]     * inv;
        A[k][j + 1] = A[k][j + 1] * inv;
    }
...

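But it produces incorrect results, and I don't know why. Interestingly, the unrolled version generates the first row of the matrix correctly, but the remaining rows are wrong. The version without loop unrolling triangulates the matrix correctly.

Here is my attempt.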

...

# No loop unrolling
loop_2:
    move    $a3, $t2          # column number b = j (getelem A[k][j])
    jal     getelem           # Addr of A[k][j] in $v0 and val in $f0
    addiu   $t2, $t2, 1       ## j += 1
    mul.s   $f0, $f0, $f2     # Perform A[k][j] * inv
    bltu    $t2, 24, loop_2   # if j < N, jump to loop_2
    swc1    $f0, 0($v0)       ## Perform A[k][j] := A[k][j] * inv

    # The matrix triangulates without problem with this original code.

...

...

# One loop unrolling
loop_2:
    move    $a3, $t2         # column number b = j (getelem A[k][j])
    jal     getelem          # Addr of A[k][j] in $v0 and val in $f0
    addiu   $t2, $t2, 2      ## j += 2
    lwc1    $f1, 4($v0)      # $f1 <- A[k][j + 1]
    mul.s   $f0, $f0, $f2    # Perform A[k][j] * inv
    mul.s   $f1, $f1, $f2    # Perform A[k][j+1] * inv
    swc1    $f0, 0($v0)      # Perform A[k][j] := A[k][j] * inv
    bltu    $t2, 24, loop_2  # if j < N, jump to loop_2
    swc1    $f1, 4($v0)      ## Perform A[k][j + 1] := A[k][j + 1] * inv

    # With this once-unrolled loop, the first row of the resulting matrix is correct, but the remaining rows are not.

...

# Some checking outside the loop, maybe with a bxx to the end of it.
looptop:                            # do {
    lwc1    $f2, 0($t0)
    lwc1    $f3, 4($t0)
    addiu   $t0, $t0, 4*2           #   p += 2: advance by 8 bytes, 2 floats
    ...
    swc1    something, -8($t0)      #   negative offsets, since p was already advanced
    swc1    something, -4($t0)
    bne     $t0, $t1, looptop       # } while (p != endp)
# Maybe another condition to check if you should run one last iteration.
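For comparison, here is a minimal C sketch of the same pointer-walking idea, handling the "check outside the loop" by peeling one element when the number of elements is odd. The function name `scale_row_tail` and its parameters are hypothetical and not taken from the original program; it assumes each row is stored contiguously and that `k < n`.

```c
#include <stddef.h>

/* Hypothetical helper: scales row[k+1 .. n-1] by inv, two floats per iteration.
 * Assumes k < n and that the row is contiguous in memory. */
void scale_row_tail(float *row, size_t k, size_t n, float inv)
{
    float *p = row + k + 1;   /* first element to the right of the pivot */
    float *endp = row + n;    /* one past the last element of the row */

    /* Check outside the loop: peel one element if the count is odd,
     * so the unrolled loop never touches memory past the row. */
    if ((size_t)(endp - p) & 1u) {
        *p *= inv;
        p++;
    }

    /* The remaining count is even (possibly zero). */
    while (p != endp) {
        float a = p[0];
        float b = p[1];
        p[0] = a * inv;
        p[1] = b * inv;
        p += 2;               /* advance by 2 floats, 8 bytes */
    }
}
```

Peeling the odd element before the loop, rather than after it, keeps the loop exit condition a plain pointer comparison, which maps directly onto a single `bne` in the assembly version.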


Recommended answer

efficiency of your MIPS asm
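One thing worth checking in the unrolled C loop from the question is its trip count. The loop starts at `j = k + 1`, so for `k = 0` it covers 23 elements, an odd number, and the last iteration also touches `A[k][j + 1]` with `j + 1 == N`. The sketch below only counts indices; the helper name `last_index_touched` is hypothetical. Whether this is the sole cause in the assembly version is an assumption, since `getelem` is not shown.

```c
/* Hypothetical helper: returns the largest column index that the naive
 * 2x-unrolled loop from the question accesses for pivot row k,
 * or -1 if the loop body never runs. */
int last_index_touched(int k, int n)
{
    int last = -1;
    for (int j = k + 1; j < n; j += 2) {
        last = j + 1;   /* the body accesses A[k][j] and A[k][j + 1] */
    }
    return last;
}
```

For `n = 24`, `last_index_touched(0, 24)` evaluates to 24, one past the last valid column. If the matrix is stored row-major and contiguously, `A[0][24]` occupies the same memory as `A[1][0]`, so row 0 would come out correct while the next row is modified, which is consistent with the symptom described. For odd `k` the element count is even and the loop stays inside the row.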
