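I'm trying to understand character encoding for strings in Java. I'm working on Windows 10, where the default character encoding is windows-1251.
This is an 8-bit encoding, so each character should take exactly 1 byte. Therefore, when I call getBytes() on a string of 6 characters,
I expect to get an array of 6 bytes. But the following code snippet returns 12 instead of 6: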
"Привет".getBytes("windows-1251").length // returns 12
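At first I thought the first byte of each character must be zero, but both bytes belonging to each character have non-zero values. Can anyone explain what I'm missing?
Here is an example of how I tested it: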
import java.nio.charset.Charset;
import java.io.*;
import java.util.HexFormat;

public class Foo
{
    public static void main(String[] args) throws Exception
    {
        // Print the JVM's default charset, then encode the literal as windows-1251
        System.out.println(Charset.defaultCharset().displayName());
        String s = "Привет";
        System.out.println("bytes count in windows-1251: " + s.getBytes("windows-1251").length);
        printBytes(s.getBytes("windows-1251"), "windows-1251");
    }

    // Prints each byte of the array on its own line as a two-digit hex value
    public static void printBytes(byte[] array, String name) {
        for (int k = 0; k < array.length; k++) {
            System.out.println(name + "[" + k + "] = " + "0x" +
                    byteToHex(array[k]));
        }
    }

    // Returns the hex String representation of byte b
    public static String byteToHex(byte b) {
        char[] hexDigit = {
            '0', '1', '2', '3', '4', '5', '6', '7',
            '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
        };
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }
}
The result is:
windows-1251
bytes count in windows-1251: 12
windows-1251[0] = 0xd0
windows-1251[1] = 0x9f
windows-1251[2] = 0xd1
windows-1251[3] = 0x80
windows-1251[4] = 0xd0
windows-1251[5] = 0xb8
windows-1251[6] = 0xd0
windows-1251[7] = 0xb2
windows-1251[8] = 0xd0
windows-1251[9] = 0xb5
windows-1251[10] = 0xd1
windows-1251[11] = 0x82
But I expected:
windows-1251
bytes count in windows-1251: 6
windows-1251[0] = 0xcf
windows-1251[1] = 0xf0
windows-1251[2] = 0xe8
windows-1251[3] = 0xe2
windows-1251[4] = 0xe5
windows-1251[5] = 0xf2
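
One way to narrow this down is a minimal sketch that removes the string literal's dependence on how the source file is stored. It assumes the file's on-disk encoding might differ from the encoding the compiler assumes, which is only a guess and not something confirmed above. The class name EscapedLiteralCheck and the helper toHex are hypothetical and not part of the original test. The string is written with Unicode escapes, so it holds exactly the six intended characters however the file is saved, and it is then encoded both as windows-1251 and as UTF-8 for comparison.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscapedLiteralCheck {
    // Hypothetical helper: formats bytes as space-separated two-digit hex values.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same six Cyrillic letters as in Foo, written as Unicode escapes
        // so the literal does not depend on the source file's encoding.
        String s = "\u041f\u0440\u0438\u0432\u0435\u0442";

        System.out.println("chars: " + s.length());
        // One byte per character in the 8-bit windows-1251 code page.
        System.out.println("windows-1251: " + toHex(s.getBytes(Charset.forName("windows-1251"))));
        // Two bytes per character for these letters in UTF-8.
        System.out.println("UTF-8: " + toHex(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With the escaped literal, the windows-1251 line prints cf f0 e8 e2 e5 f2 and the UTF-8 line prints d0 9f d1 80 d0 b8 d0 b2 d0 b5 d1 82, which is the same 12-byte sequence that Foo printed. If s.length() in Foo turns out to be 12 rather than 6, it may be worth checking how Foo.java is saved and which encoding javac assumes when reading it; javac's -encoding option sets this explicitly.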