我浏览了strlen代码here,我想知道代码中使用的优化是否真的需要?例如,为什么像下面这样的工作不能同样好或更好?

unsigned long strlen(char s[]) {
    unsigned long i;
    for (i = 0; s[i] != '\0'; i++)
        continue;
    return i;
}

代码越简单,编译器就越容易优化吗?

链接后页面上strlen的代码如下所示:

/* Copyright (C) 1991, 1993, 1997, 2000, 2003 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Written by Torbjorn Granlund (tege@sics.se),
   with help from Dan Sahlin (dan@sics.se);
   commentary by Jim Blandy (jimb@ai.mit.edu).

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#include <string.h>
#include <stdlib.h>

#undef strlen

/* Return the length of the null-terminated string STR.  Scan for
   the null terminator quickly by testing four bytes at a time.  */
size_t
strlen (str)
     const char *str;
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, magic_bits, himagic, lomagic;

  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
            & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;

  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */

  longword_ptr = (unsigned long int *) char_ptr;

  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
     the "holes."  Note that there is a hole just to the left of
     each byte, with an extra at the end:

     bits:  01111110 11111110 11111110 11111111
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

     The 1-bits make sure that carries propagate to the next 0-bit.
     The 0-bits provide holes for carries to fall into.  */
  magic_bits = 0x7efefeffL;
  himagic = 0x80808080L;
  lomagic = 0x01010101L;
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      magic_bits = ((0x7efefefeL << 16) << 16) | 0xfefefeffL;
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }
  if (sizeof (longword) > 8)
    abort ();

  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      /* We tentatively exit the loop if adding MAGIC_BITS to
     LONGWORD fails to change any of the hole bits of LONGWORD.

     1) Is this safe?  Will it catch all the zero bytes?
     Suppose there is a byte with all zeros.  Any carry bits
     propagating from its left will fall into the hole at its
     least significant bit and stop.  Since there will be no
     carry from its most significant bit, the LSB of the
     byte to the left will be unchanged, and the zero will be
     detected.

     2) Is this worthwhile?  Will it ignore everything except
     zero bytes?  Suppose every byte of LONGWORD has a bit set
     somewhere.  There will be a carry into bit 8.  If bit 8
     is set, this will carry into bit 16.  If bit 8 is clear,
     one of bits 9-15 must be set, so there will be a carry
     into bit 16.  Similarly, there will be a carry into bit
     24.  If one of bits 24-30 is set, there will be a carry
     into bit 31, so all of the hole bits will be changed.

     The one misfire occurs when bits 24-30 are clear and bit
     31 is set; in this case, the hole at bit 31 is not
     changed.  If we had access to the processor carry flag,
     we could close this loophole by putting the fourth hole
     at bit 32!

     So it ignores everything except 128's, when they're aligned
     properly.  */

      longword = *longword_ptr++;

      if (
#if 0
      /* Add MAGIC_BITS to LONGWORD.  */
      (((longword + magic_bits)

        /* Set those bits that were unchanged by the addition.  */
        ^ ~longword)

       /* Look at only the hole bits.  If any of the hole bits
          are unchanged, most likely one of the bytes was a
          zero.  */
       & ~magic_bits)
#else
      ((longword - lomagic) & himagic)
#endif
      != 0)
    {
      /* Which of the bytes was the zero?  If none of them were, it was
         a misfire; continue the search.  */

      const char *cp = (const char *) (longword_ptr - 1);

      if (cp[0] == 0)
        return cp - str;
      if (cp[1] == 0)
        return cp - str + 1;
      if (cp[2] == 0)
        return cp - str + 2;
      if (cp[3] == 0)
        return cp - str + 3;
      if (sizeof (longword) > 4)
        {
          if (cp[4] == 0)
        return cp - str + 4;
          if (cp[5] == 0)
        return cp - str + 5;
          if (cp[6] == 0)
        return cp - str + 6;
          if (cp[7] == 0)
        return cp - str + 7;
        }
    }
    }
}
libc_hidden_builtin_def (strlen)

为什么这个版本运行得很快?

它不是做了很多不必要的工作吗?

推荐答案

你需要don'tshould never编写这样的代码——尤其是如果你不是C编译器/标准库供应商的话.它是用于实现strlen的代码,带有一些非常可疑的速度攻击和假设(未使用断言进行测试或在 comments 中提及):

  • unsigned long是4或8字节
  • 字节是8位
  • 指针可以投射到unsigned long long,而不是uintptr_t
  • 只需判断2或3个最低阶位是否为零,就可以对齐指针
  • 一个人可以访问unsigned long个字符串
  • 可以读取数组的末尾,而不会产生任何不良影响.

更重要的是,一个好的编译器甚至可以取代按原样编写的代码

size_t stupid_strlen(const char s[]) {
    size_t i;
    for (i=0; s[i] != '\0'; i++)
        ;
    return i;
}

(请注意,它必须是与size_t兼容的类型)具有编译器内置strlen的内联版本,或者将代码矢量化;但编译器不太可能优化复杂的版本.


strlen功能由C11 7.24.6.3描述为:

Description

  1. strlen函数计算s指向的字符串的长度.

Returns

  1. strlen函数返回终止空字符之前的字符数.

现在,如果s所指向的字符串位于一个长度刚好足以包含字符串和终止NUL的字符数组中,那么如果我们通过空终止符访问字符串,例如在

char *str = "hello world";  // or
char array[] = "hello world";

因此,在完全可移植/符合标准的C中,实现这correctly的方式实际上是question中编写它的方式,除了一些琐碎的转换——你可以通过展开循环等来假装更快,但仍然需要一次完成one byte.

(正如 comments 者所指出的,当严格的可移植性是一种负担时,利用合理或已知的安全假设并不总是一件坏事.尤其是在一个特定的C实现的代码中.但在知道如何/何时可以改变规则之前,你必须了解规则.)


linked strlen实现首先逐个判断字节,直到指针指向unsigned long的自然4或8字节对齐边界.C标准表示,访问未正确对齐的指针有undefined behaviour个,因此必须这样做,才能让下一个脏把戏变得更脏.(实际上,在x86以外的一些CPU体系 struct 上,一个未对齐的字或双字负载将出现故障.C是一种可移植的汇编语言,但这段代码正以这种方式使用它).这也使得在内存保护在对齐块(例如4KB虚拟内存页)中工作的实现中,可以读取超过对象末尾的内容,而不存在出错的风险.

现在是糟糕的部分:代码breaks实现了promise ,一次读取4或8位字节(long int),并使用无符号加法的位技巧快速计算出这4或8个字节中是否有any个零字节——它使用了一个精心编制的数字,这将导致进位更改位掩码捕获的位.本质上,这将计算出掩码中的4或8个字节中是否有任何一个是零,而不是循环通过这些字节中的每个字节.最后,在末尾有一个循环,以确定which字节是第一个零(如果有的话),并返回结果.

最大的问题是,在sizeof (unsigned long)种情况中,有sizeof (unsigned long) - 1次它会读取超过字符串末尾的内容-只有当空字节在last个被访问的字节中(即,在小端为最高有效字节,在大端为最低有效字节)时,它not才会越界访问数组!


尽管用于在C标准库中实现strlen的代码是bad代码.它有几个实现定义和未定义的方面,不应使用anywhere而不是系统提供的strlen-我在这里将函数重命名为the_strlen,并添加了以下main:

int main(void) {
    char buf[12];
    printf("%zu\n", the_strlen(fgets(buf, 12, stdin)));
}

缓冲区的大小经过仔细调整,以便它能够准确地容纳hello world字符串和终止符.然而,在我的64位处理器上,unsigned long是8字节,因此对后一部分的访问将超过这个缓冲区.

如果我现在用-fsanitize=undefined-fsanitize=address编译并运行生成的程序,我会得到:

% ./a.out
hello world
=================================================================
==8355==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffffe63a3f8 at pc 0x55fbec46ab6c bp 0x7ffffe63a350 sp 0x7ffffe63a340
READ of size 8 at 0x7ffffe63a3f8 thread T0
    #0 0x55fbec46ab6b in the_strlen (.../a.out+0x1b6b)
    #1 0x55fbec46b139 in main (.../a.out+0x2139)
    #2 0x7f4f0848fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #3 0x55fbec46a949 in _start (.../a.out+0x1949)

Address 0x7ffffe63a3f8 is located in stack of thread T0 at offset 40 in frame
    #0 0x55fbec46b07c in main (.../a.out+0x207c)

  This frame has 1 object(s):
    [32, 44) 'buf' <== Memory access at offset 40 partially overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (.../a.out+0x1b6b) in the_strlen
Shadow bytes around the buggy address:
  0x10007fcbf420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007fcbf470: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00[04]
  0x10007fcbf480: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf4a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf4b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fcbf4c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==8355==ABORTING

也就是说,坏事发生了.

C++相关问答推荐

在32位处理器上优化53—32位模计算>

来自stdarg.h的c中的va_args无法正常工作<>

Tiva TM4C123GXL的I2C通信

C编译器是否遵循restrict的正式定义?

如何使用低级C++写出数值

为什么未初始化的 struct 的数组从另一个数组获取值?

For循环不会迭代所有字符串字符吗?(初学者问题)

不确定如何处理此编译错误

从不兼容的指针类型返回&&警告,但我看不出原因

从C中的函数返回静态字符串是不是一种糟糕的做法?

C++中PUTS函数的返回值

如何在C中定义指向函数的指针并将该指针赋给函数?

计算SIZE_MAX元素的长数组的大小

C中的空指针是什么(_N)?

分配给静态变量和动态变量的位置之间有区别吗?

未使用sem_open正确初始化信号量

既然我们在 if 中将 int 的值更改为 10,为什么在第二个 fork 后,子进程及其创建的子进程都会打印 33 ?

将数组中的所有元素初始化为 struct 中的相同值

可以从指针数组中的值初始化指针吗?

如何让 unlinkat(dir_fd, ".", AT_REMOVEDIR) 工作?