I'm trying to run a PowerShell script from Python and print its output, but the output contains the special character "é".

process = subprocess.Popen([r'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe',  'echo é'], stdout=subprocess.PIPE)
print(process.stdout.read().decode('cp1252'))

returns "‚"

process = subprocess.run(r'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe echo é', stdout=subprocess.PIPE)
print(process.stdout.decode('cp1252'))

returns "‚"

print(subprocess.check_output(r'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe echo é').decode('cp1252'))

returns "‚"

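Is there another way besides subprocess, or should I use a different encoding?

UTF-8 gives an error for é, but returns "r" for ®. UTF-16-LE gives the error "UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x0a in position 2: truncated data".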

Recommended answer

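powershell.exe, the Windows PowerShell CLI,[1] uses the active console window's code page to encode its stdout and stderr output, as reflected in the output from chcp. By default, that is the legacy system locale's OEM code page, e.g. (expressed in Python terms) cp437.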

By contrast, the code page you used, cp1252, is an ANSI code page.

  • Note: Python uses the system's ANSI code page by default for encoding its stdout and stderr output, which, however, is nonstandard behavior: console applications are expected to use the current console's output code page, which is what powershell.exe does and which, as stated, is the system's OEM code page by default.
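To see why decoding with cp1252 produces the wrong character, the mismatch can be reproduced without PowerShell at all. The sketch below assumes the console's OEM code page is 437, as in the example above; on your system it may be a different one.

```python
# Bytes that a console using OEM code page 437 would emit for "é" plus a newline.
raw = "é\r\n".encode("cp437")

# Decoding with the ANSI code page (cp1252) maps byte 0x82 to the wrong character.
wrong = raw.decode("cp1252").strip()

# Decoding with the code page that was actually used recovers "é".
right = raw.decode("cp437").strip()

print(raw, repr(wrong), repr(right))
```

The same byte, 0x82, means "é" in cp437 but "‚" (a low quotation mark that looks like a comma) in cp1252, which matches the output in the question.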

One option is to simply query the console window for its active (output) code page via the WinAPI and use the encoding returned:

import subprocess
from ctypes import windll

# Get the console's (output) code page, which the PowerShell CLI
# uses to encode its output.
cp = windll.kernel32.GetConsoleOutputCP()

process = subprocess.Popen(r'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe echo é', stdout=subprocess.PIPE)

# Decode based on the active code page.
print(process.stdout.read().decode('cp' + str(cp)))
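If you need this in more than one place, the mapping from code page number to Python codec name can be wrapped in a small helper. This is only a sketch: `codec_for_code_page` and `decode_console_bytes` are hypothetical names, and the special case for 65001 is there because, as far as I know, `cp65001` is an alias for UTF-8 only in Python 3.8 and later.

```python
def codec_for_code_page(code_page: int) -> str:
    """Map a Windows code page number to a Python codec name (hypothetical helper)."""
    # 65001 is the Windows code page number for UTF-8.
    if code_page == 65001:
        return "utf-8"
    return f"cp{code_page}"


def decode_console_bytes(data: bytes, code_page: int) -> str:
    """Decode bytes that were encoded with the given console code page."""
    # errors="replace" avoids an exception on bytes the codec does not define.
    return data.decode(codec_for_code_page(code_page), errors="replace")


# Byte 0x82 is "é" in code page 437.
print(decode_console_bytes(b"\x82\r\n", 437).strip())
```

With the snippet above, you would call `decode_console_bytes(process.stdout.read(), cp)` instead of building the codec name inline.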

However, note that an OEM code page limits you to 256 characters; for example, é can be represented in CP437, whereas other Unicode characters, such as €, cannot.
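You can check which characters survive a given code page directly with Python's codecs. The helper name `can_encode` below is made up for illustration, and cp437 stands in for whatever OEM code page your console uses.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for ch in "é€":
    print(ch, "cp437:", can_encode(ch, "cp437"), "utf-8:", can_encode(ch, "utf-8"))
```

UTF-8 can encode both characters, while cp437 can encode only é.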

Therefore, the robust option is to (temporarily) set the console output code page to 65001, which is UTF-8:

import subprocess
from ctypes import windll

# Save the current console output code page and switch to 65001 (UTF-8)
previousCp = windll.kernel32.GetConsoleOutputCP()
windll.kernel32.SetConsoleOutputCP(65001)

process = subprocess.Popen(r'C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe echo é€', stdout=subprocess.PIPE)

# Decode as UTF-8
print(process.stdout.read().decode('utf8'))

# Restore the previous output console code page.
windll.kernel32.SetConsoleOutputCP(previousCp)
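One weakness of the snippet above is that the restore step is skipped if anything in between raises. A possible refinement is a context manager with try/finally. This is a sketch, not a definitive implementation: `console_output_code_page` is a hypothetical name, the `get_cp` / `set_cp` parameters exist only so the logic can be exercised without a Windows console, and it assumes the Python process is attached to a console.

```python
import contextlib
import sys


@contextlib.contextmanager
def console_output_code_page(code_page, get_cp=None, set_cp=None):
    """Temporarily switch the console output code page (hypothetical helper)."""
    if get_cp is None or set_cp is None:
        if sys.platform != "win32":
            raise OSError("console code pages are a Windows concept")
        from ctypes import windll
        get_cp = windll.kernel32.GetConsoleOutputCP
        set_cp = windll.kernel32.SetConsoleOutputCP
    previous = get_cp()
    set_cp(code_page)
    try:
        yield previous
    finally:
        # Runs even if the body raises, so the console is always restored.
        set_cp(previous)


if sys.platform == "win32":
    import subprocess

    with console_output_code_page(65001):
        out = subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", "echo é€"],
            stdout=subprocess.PIPE,
        ).stdout
    print(out.decode("utf-8"))
```

The child process inherits the console, so it picks up code page 65001 for as long as the `with` block is active.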

Note:

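  • The code above only ensures that the PowerShell child process emits UTF-8 and that its output is decoded as UTF-8 in the Python process. This is independent of what character encoding Python itself uses for its own output streams.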

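  • To put Python v3.7+ itself in UTF-8 Mode, which makes it decode input as UTF-8 and produce UTF-8 output, either pass command-line option -X utf8 or define environment variable PYTHONUTF8 with value 1 before invocation.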

  • To make an interactive shell session use UTF-8 (code page 65001) for the remainder of the session:

    • In a cmd.exe session:

      • chcp 65001
    • In a PowerShell session:

      • $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
    • A simpler alternative via a one-time configuration step is to configure your system to use UTF-8 system-wide, in which case both the OEM and the ANSI code pages are set to 65001. However, this has far-reaching consequences; see this answer.
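Regarding the UTF-8 Mode note above: a quick way to confirm that either switch takes effect is to start a child Python process and inspect `sys.flags.utf8_mode`. The sketch below assumes Python 3.7+ and uses `sys.executable` to re-launch the current interpreter; the `probe` string is just an illustrative one-liner.

```python
import os
import subprocess
import sys

# One-liner run in the child: report the UTF-8 Mode flag and the stdout encoding.
probe = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"

# Option 1: the -X utf8 command-line option.
via_flag = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", probe],
    capture_output=True, text=True, encoding="utf-8", check=True,
).stdout.strip()

# Option 2: the PYTHONUTF8 environment variable, defined before the child starts.
env = dict(os.environ, PYTHONUTF8="1")
via_env = subprocess.run(
    [sys.executable, "-c", probe],
    capture_output=True, text=True, encoding="utf-8", env=env, check=True,
).stdout.strip()

print(via_flag)
print(via_env)
```

In both cases the first value printed should be 1, indicating that UTF-8 Mode is on in the child process.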


[1] The same applies to pwsh.exe, the CLI of the modern PowerShell (Core) 7+ edition.
