Python 3: How to specify the stdin encoding

Posted on May 15

While porting code from Python 2 to Python 3, I ran into this problem when reading UTF-8 text from standard input. In Python 2, this works fine:

for line in sys.stdin:
    ...

But Python 3 expects ASCII from sys.stdin, and if the input contains non-ASCII characters, I get the error:

UnicodeDecodeError: 'ascii' codec can't decode byte .. in position ..: ordinal not in range(128)

For regular files, I would specify the encoding when opening the file:

with open('filename', 'r', encoding='utf-8') as file:
    for line in file:
        ...

But how do I specify the encoding for standard input? Other SO posts (such as How to change the stdin encoding on python) suggest using

input_stream = codecs.getreader('utf-8')(sys.stdin)
for line in input_stream:
    ...

However, this doesn't work in Python 3; I still get the same error message. I'm using Ubuntu 12.04.2, and my locale is set to en_US.UTF-8.

Recommended answer

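Python 3 does not expect ASCII from sys.stdin. It opens stdin in text mode and makes an educated guess about which encoding to use. That guess may come down to ASCII, but that is not a given. See the sys.stdin documentation for how the codec is selected.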
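To check which codec Python actually picked, you can print the encoding of the existing sys.stdin wrapper next to the locale's preferred encoding. This is a minimal sketch; the function name report_stdin_encoding is hypothetical. On a Python 3 older than 3.7 running under the C/POSIX locale (for example, when LANG is not exported to that particular process), both values may report an ASCII codec such as ANSI_X3.4-1968, which could explain an error like the one in the question.

```python
import locale
import sys


def report_stdin_encoding() -> dict:
    """Collect the values that influence how Python 3 decodes stdin (hypothetical helper)."""
    return {
        # The codec the current sys.stdin text wrapper uses
        # (None if stdin is unavailable, e.g. a closed file descriptor).
        "stdin.encoding": getattr(sys.stdin, "encoding", None),
        # The locale's preferred encoding; on POSIX this is usually the default
        # stdin codec unless PYTHONIOENCODING or UTF-8 mode overrides it.
        "locale preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in report_stdin_encoding().items():
        print(f"{name}: {value}")
```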

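Like other file objects opened in text mode, sys.stdin derives from the io.TextIOBase base class; it has a .buffer attribute that points to the underlying buffered IO instance (which in turn has a .raw attribute).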
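The same three layers can be seen on any file opened in text mode. A minimal sketch, using a temporary file instead of sys.stdin so it runs anywhere; show_layers is a hypothetical name. With default buffering, CPython reports TextIOWrapper, BufferedReader and FileIO for the text, buffered and raw layers.

```python
import io
import os
import tempfile


def show_layers(path: str):
    """Return (is TextIOBase?, class names of the text, buffered and raw layers)."""
    with open(path, "r", encoding="utf-8") as text_file:
        buffered = text_file.buffer  # binary, buffered layer
        raw = buffered.raw           # unbuffered OS-level layer
        names = [type(text_file).__name__, type(buffered).__name__, type(raw).__name__]
        return isinstance(text_file, io.TextIOBase), names


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(show_layers(path))
    finally:
        os.remove(path)
```

When stdin is a normal pipe or file, sys.stdin.buffer and sys.stdin.buffer.raw usually follow the same pattern, which is why wrapping sys.stdin.buffer gives direct access to the undecoded bytes.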

To specify a different encoding, wrap the sys.stdin.buffer attribute in a new io.TextIOWrapper() instance:

import io
import sys

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
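Here is a small sketch of that wrapping in a testable form: read_utf8_lines is a hypothetical helper that takes any binary stream, so an io.BytesIO can stand in for sys.stdin.buffer. It detaches the wrapper when done because a TextIOWrapper closes its underlying buffer when it is closed or garbage-collected, which for stdin would close sys.stdin.buffer as well. It assumes nothing has already been read through sys.stdin, since any data buffered inside the original wrapper would not be seen by the new one.

```python
import io


def read_utf8_lines(binary_stream) -> list:
    """Decode a binary stream as UTF-8 and return its lines without trailing newlines."""
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
    try:
        return [line.rstrip("\n") for line in text_stream]
    finally:
        # Detach so that discarding the wrapper does not close binary_stream.
        text_stream.detach()


if __name__ == "__main__":
    sample = io.BytesIO("héllo\nwörld\n".encode("utf-8"))
    print(read_utf8_lines(sample))
```

In a real script you would call read_utf8_lines(sys.stdin.buffer).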

Alternatively, set the PYTHONIOENCODING environment variable to the desired codec when running Python.
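Because PYTHONIOENCODING must be set before the interpreter starts, the sketch below launches a child Python with the variable set, feeds it UTF-8 bytes on stdin, and reports the codec the child used plus the number of characters it decoded. The names CHILD_CODE and run_child_with_ioencoding are hypothetical.

```python
import codecs
import os
import subprocess
import sys

# Code run by the child interpreter: read all of stdin, then report
# the stdin codec and how many characters were decoded.
CHILD_CODE = (
    "import sys\n"
    "data = sys.stdin.read()\n"
    "print(sys.stdin.encoding)\n"
    "print(len(data))\n"
)


def run_child_with_ioencoding(payload: bytes, encoding: str = "utf-8"):
    """Run a child Python with PYTHONIOENCODING set; return (stdin codec, char count)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=payload,
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    codec_name, length = result.stdout.decode("ascii").split()
    # Normalize the codec name, e.g. "UTF-8" and "utf8" both become "utf-8".
    return codecs.lookup(codec_name).name, int(length)


if __name__ == "__main__":
    print(run_child_with_ioencoding("héllo".encode("utf-8")))
```

From a shell the equivalent is simply PYTHONIOENCODING=utf-8 python3 script.py < input.txt. The variable also accepts an error handler after a colon, such as utf-8:surrogateescape.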

Since Python 3.7, you can also reconfigure the existing std* wrappers, provided you do so at the start (before any data has been read):

# Python 3.7 and newer
import sys

sys.stdin.reconfigure(encoding='utf-8')
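The effect of reconfigure() can be reproduced without touching the real stdin. In this sketch (decode_with_reconfigure is a hypothetical name), an io.TextIOWrapper over an io.BytesIO starts out with the ASCII codec, mimicking a stdin set up with the wrong locale codec, and is switched to UTF-8 before anything is read.

```python
import io


def decode_with_reconfigure(raw: bytes) -> str:
    """Wrap raw bytes with the wrong codec, fix it via reconfigure(), then read."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii")
    # Switch codecs before the first read; the documentation states the
    # encoding cannot be changed once data has been read from the stream.
    stream.reconfigure(encoding="utf-8")
    return stream.read()


if __name__ == "__main__":
    print(decode_with_reconfigure("héllo".encode("utf-8")))
```

Reading the same bytes without the reconfigure() call would raise the UnicodeDecodeError from the question, because the wrapper would still be using ASCII.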