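I need to convert Universal Character Name (UCN) data from a database to UTF-8. It seems trivial, but I have spent hours reading about Unicode, UTF-8, wide strings, and so on, without any result.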
For example, the string D\u00c3\u00bcsseldorf needs to be converted to Düsseldorf.
What I tried:
char str[] = "\u00c3\u00bc"; // corresponds to ü
size_t str_len = strlen(str);
for (size_t i = 0; i < str_len; i++) // dump each byte in hex
    printf("%02hhx ", str[i]);
printf("- %zu - %s\n", str_len, str); // prints "c3 83 c2 bc - 4 - ü"
c3 is correct, but the next 3 bytes are unexpected. The compiler only considers the first part of the UCN (\u00c3).
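For reference, the four bytes c3 83 c2 bc are what a UTF-8 encoder produces when U+00C3 and U+00BC are encoded as two separate code points. A minimal sketch of that two-byte encoding rule follows; the helper name utf8_encode_small is hypothetical and only handles code points below U+0800.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one code point below U+0800 as UTF-8.
   Writes 1 or 2 bytes to out and returns the number of bytes written. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    assert(cp < 0x800); /* larger code points need 3 or 4 bytes */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp; /* ASCII is stored unchanged */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation: 10xxxxxx */
    return 2;
}
```

Calling it with 0xC3 yields the bytes c3 83, and with 0xBC yields c2 bc, which matches the dump above.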
wchar_t wcs[] = L"\u00c3\u00bc";
size_t wcs_len = wcslen(wcs);
for (size_t i = 0; i < wcs_len; i++) // dump the low byte of each wide character
    printf("%02hhx ", wcs[i]);
printf("- %zu - %ls\n", wcs_len, wcs); // prints "c3 bc - 2 - ü"
Looks better. The entire UCN is considered (c3 bc), but still no ü.
char str[] = "\xc3\xbc";
size_t str_len = strlen(str);
for (size_t i = 0; i < str_len; i++) // dump each byte in hex
    printf("%02hhx ", str[i]);
printf("- %zu %s\n", str_len, str); // prints "c3 bc - 2 ü"
This prints ü, but I changed str from UCNs to hex escapes.
What am I missing to get from \u00c3\u00bc to ü?
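One possible reading, offered as an assumption rather than a confirmed diagnosis: since c3 bc is the UTF-8 encoding of ü (U+00FC), each \u00XX escape in the data may stand for one raw UTF-8 byte rather than a code point. Under that assumption, and given that the database text arrives at run time (so the compiler never sees the escapes), a minimal sketch of the conversion could look like this. The function name ucn_bytes_to_utf8 is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: copy src to dst, replacing each literal six-character
   sequence \u00XX with the single byte 0xXX. All other text is copied as-is.
   dst must have room for strlen(src) + 1 bytes.
   Assumption: the input never contains \u0000, which would end the string early. */
void ucn_bytes_to_utf8(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src) {
        /* Short-circuit evaluation keeps the reads inside the string. */
        if (src[0] == '\\' && src[1] == 'u' && src[2] == '0' && src[3] == '0'
            && isxdigit((unsigned char)src[4])
            && isxdigit((unsigned char)src[5])) {
            char hex[3] = { src[4], src[5], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16); /* two hex digits -> one byte */
            src += 6;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Passing the run-time string D\u00c3\u00bcsseldorf (with literal backslashes) should give the bytes 44 c3 bc 73 ..., which a UTF-8 terminal displays as Düsseldorf. Escapes above \u00ff are left untouched by this sketch.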