Caveat
Whichever solution you choose, keep in mind that the JSON standard requires you to escape all control characters. Misunderstanding this requirement seems to be common; many developers get it wrong.
"All control characters" means everything from '\x00' to '\x1f', not just those with a short representation such as '\x0a' (also known as '\n'). For example, you must escape the '\x02' character as \u0002.
See also: ECMA-404 - The JSON data interchange syntax, 2nd edition, December 2017, Page 4
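The rule above can be written down as a small predicate. This is only a sketch: the names must_escape_in_json and count_chars_to_escape are made up for illustration and do not come from any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (name invented for this sketch): true if the byte
// must not appear literally inside a JSON string.
bool must_escape_in_json(char c) {
    // Compare as unsigned so that bytes >= 0x80 (UTF-8 multi-byte
    // sequences) are never mistaken for control characters on
    // platforms where char is signed.
    const unsigned char u = static_cast<unsigned char>(c);
    return u <= 0x1f || c == '"' || c == '\\';
}

// Counts how many bytes of s would need an escape sequence.
std::size_t count_chars_to_escape(const std::string &s) {
    return static_cast<std::size_t>(
        std::count_if(s.begin(), s.end(), must_escape_in_json));
}
```

Both must_escape_in_json('\x02') and must_escape_in_json('\n') return true, while '\x7f' (DEL) returns false, because the range that must be escaped ends at '\x1f'.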
Simple solution
If you know for sure that your input string is UTF-8 encoded, you can keep things simple.
Since JSON allows you to escape everything via \uXXXX, even " and \, a simple solution is:
#include <sstream>
#include <iomanip>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        // Quotes, backslashes and all control characters become \uXXXX.
        if (*c == '"' || *c == '\\' || ('\x00' <= *c && *c <= '\x1f')) {
            o << "\\u"
              << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
        } else {
            // Everything else, including UTF-8 multi-byte sequences, is copied as-is.
            o << *c;
        }
    }
    return o.str();
}
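If you would rather avoid streams, the same idea can be sketched with a hex digit lookup table. This is a swapped-in variant, not the function above; the name escape_json_u is hypothetical.

```cpp
#include <string>

// Stream-free sketch of the "escape everything as \uXXXX" approach.
// The function name is invented for this example.
std::string escape_json_u(const std::string &s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u <= 0x1f || c == '"' || c == '\\') {
            // Every escaped byte is below 0x80, so the first two hex
            // digits of the four-digit escape are always "00".
            out += "\\u00";
            out += hex[u >> 4];
            out += hex[u & 0x0f];
        } else {
            // Ordinary bytes, including UTF-8 multi-byte sequences,
            // are copied verbatim.
            out += c;
        }
    }
    return out;
}
```

For the input say "hi" followed by a newline, this returns say \u0022hi\u0022\u000a.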
Shortest representation
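For the shortest representation you may use JSON shortcuts, such as \" instead of \u0022. The following function produces the shortest JSON representation of a UTF-8 encoded string s: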
#include <sstream>
#include <iomanip>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        // Characters with a two-character shortcut.
        case '"': o << "\\\""; break;
        case '\\': o << "\\\\"; break;
        case '\b': o << "\\b"; break;
        case '\f': o << "\\f"; break;
        case '\n': o << "\\n"; break;
        case '\r': o << "\\r"; break;
        case '\t': o << "\\t"; break;
        default:
            // Remaining control characters have no shortcut and need \uXXXX.
            if ('\x00' <= *c && *c <= '\x1f') {
                o << "\\u"
                  << std::hex << std::setw(4) << std::setfill('0') << static_cast<int>(*c);
            } else {
                o << *c;
            }
        }
    }
    return o.str();
}
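To sanity-check the output of an escaping function such as the one above, a small checker can help in unit tests. The following is a minimal sketch with a hypothetical name; it only checks escaping rules and, as an explicit simplification, does not validate UTF-8 or surrogate pairs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical test helper: returns true if `body` could appear between
// the quotes of a JSON string, i.e. it contains no raw control
// characters, no bare quotes, and every backslash starts a valid escape.
bool is_valid_json_string_body(const std::string &body) {
    for (std::size_t i = 0; i < body.size(); ++i) {
        const unsigned char u = static_cast<unsigned char>(body[i]);
        if (u <= 0x1f || u == '"') return false;  // raw control char or bare quote
        if (u != '\\') continue;                   // ordinary byte
        if (++i == body.size()) return false;      // dangling backslash
        switch (body[i]) {
        case '"': case '\\': case '/':
        case 'b': case 'f': case 'n': case 'r': case 't':
            break;                                 // two-character escape
        case 'u':
            // \u must be followed by exactly four hex digits.
            if (body.size() - i < 5) return false;
            for (int k = 1; k <= 4; ++k) {
                if (!std::isxdigit(static_cast<unsigned char>(body[i + k]))) {
                    return false;
                }
            }
            i += 4;
            break;
        default:
            return false;                          // unknown escape such as \x
        }
    }
    return true;
}
```

A test could then assert that is_valid_json_string_body(escape_json(input)) holds for inputs containing quotes, backslashes and every byte from '\x00' to '\x1f'.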
Pure switch statement
It is also possible to get along with a pure switch statement, that is, without if and <iomanip>. While this is quite cumbersome, it may be preferable from a "security by simplicity and purity" point of view:
#include <sstream>
#include <string>

std::string escape_json(const std::string &s) {
    std::ostringstream o;
    for (auto c = s.cbegin(); c != s.cend(); c++) {
        switch (*c) {
        case '\x00': o << "\\u0000"; break;
        case '\x01': o << "\\u0001"; break;
        ...
        case '\x0a': o << "\\n"; break;
        ...
        case '\x1f': o << "\\u001f"; break;
        case '\x22': o << "\\\""; break;
        case '\x5c': o << "\\\\"; break;
        default: o << *c;
        }
    }
    return o.str();
}
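Typing out all the case lines by hand is error-prone, so one option is to generate them. The sketch below builds the lines as text; the function name generate_switch_cases is hypothetical, and it assumes that the two-character shortcuts are wanted for \b, \t, \n, \f and \r, in line with the '\x0a' case shown above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical one-off generator: returns the `case` lines for the pure
// switch statement as text, so they can be pasted instead of typed.
std::string generate_switch_cases() {
    std::string out;
    char line[64];
    for (int c = 0x00; c <= 0x1f; ++c) {
        // Letter of the two-character shortcut, or empty if JSON has none.
        const char *shortcut = "";
        switch (c) {
        case 0x08: shortcut = "b"; break;
        case 0x09: shortcut = "t"; break;
        case 0x0a: shortcut = "n"; break;
        case 0x0c: shortcut = "f"; break;
        case 0x0d: shortcut = "r"; break;
        }
        if (*shortcut != '\0') {
            std::snprintf(line, sizeof line,
                          "case '\\x%02x': o << \"\\\\%s\"; break;\n", c, shortcut);
        } else {
            std::snprintf(line, sizeof line,
                          "case '\\x%02x': o << \"\\\\u%04x\"; break;\n", c, c);
        }
        out += line;
    }
    // Quote and backslash are not control characters but must be escaped too.
    out += "case '\\x22': o << \"\\\\\\\"\"; break;\n";
    out += "case '\\x5c': o << \"\\\\\\\\\"; break;\n";
    return out;
}
```

Writing the returned string to standard output gives one line per control character plus the two lines for '\x22' and '\x5c'; the default branch still has to be added by hand.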
Using a library
You might want to have a look at https://github.com/nlohmann/json, which is an efficient header-only C++ library (MIT License) that seems to be very well-tested.
You can either call their escape_string() method directly (note that this is a bit tricky; see the comment by Lukas Salich), or you can take their implementation of escape_string() as a starting point for your own implementation:
https://github.com/nlohmann/json/blob/ec7a1d834773f9fee90d8ae908a0c9933c5646fc/src/json.hpp#L4604-L4697