From the 2gis API I got the following JSON string.

{
    "api_version": "1.3",
    "response_code": "200",
    "id": "3237490513229753",
    "lon": "38.969916127827",
    "lat": "45.069889625267",
    "page_url": null,
    "name": "ATB",
    "firm_group": {
        "id": "3237499103085728",
        "count": "1"
    },
    "city_name": "Krasnodar",
    "city_id": "3237585002430511",
    "address": "Turgeneva,   172/1",
    "create_time": "2008-07-22 10:02:04 07",
    "modification_time": "2013-08-09 20:04:36 07",
    "see_also": [
        {
            "id": "3237491513434577",
            "lon": 38.973110606808,
            "lat": 45.029031222211,
            "name": "Advance",
            "hash": "5698hn745A8IJ1H86177uvgn94521J3464he26763737242Cf6e654G62J0I7878e",
            "ads": {
                "sponsored_article": {
                    "title": "Center "ADVANCE"",
                    "text": "Business.English."
                },
                "warning": null
            }
        }
    ]
}

But Python doesn't recognize it:

json.loads(firm_str)

Expecting , delimiter: line 1 column 3646 (char 3645)

It looks like a problem with quotes in: "title": "Center "ADVANCE""
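The diagnosis can be checked on a much smaller input. The sketch below is an illustration only: `bad` is a hypothetical, shortened stand-in for the failing fragment, not the actual API response, and the exact wording of the message may differ between Python versions.

```python
import json

# Hypothetical, shortened stand-in for the failing fragment of the response.
bad = '{"title": "Center "ADVANCE""}'

try:
    json.loads(bad)
except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
    error_text = str(e)
    # The parser stops at the 'A' that follows the unescaped quote.
    print(error_text)
```

The reported offset points at the first character after the stray quote, which matches the guess that the inner quotes are the problem.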

How can I fix it automatically in Python?

Recommended answer

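The answer by @Michael gave me an idea. It is not a very good idea, but it seems to work, at least on your example: try to parse the JSON string, and if that fails, look up the position of the offending character in the exception string (see note 1 below) and escape the quote in front of it.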

import json
import re

# s holds the malformed JSON text, e.g. s = firm_str
while True:
    try:
        result = json.loads(s)   # try to parse...
        break                    # parsing worked -> exit loop
    except ValueError as e:      # json.JSONDecodeError is a ValueError
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind('"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of corresponding closing '"' (+2 for inserted '\')
        closg = s.find('"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]
print(result)

You may want to add some additional checks to prevent this from ending in an infinite loop (e.g., cap the number of repetitions at the number of characters in the string). Also, this will still not work if an incorrect " is actually followed by a comma, as pointed out by @gnibbler.
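One way to add such checks is sketched below. This is an assumption about how the guard could look, not part of the original answer: the function name `repair_quotes` and the `max_attempts` parameter are made up for illustration, and the repair logic is the same heuristic as above, so it has the same limitations.

```python
import json
import re

def repair_quotes(s, max_attempts=None):
    """Escape stray double quotes until s parses, or give up (hypothetical helper)."""
    if max_attempts is None:
        max_attempts = len(s)  # upper bound suggested in the text above
    for _ in range(max_attempts):
        try:
            return json.loads(s)
        except ValueError as e:
            # Extract the offset from "... (char N)"; re-raise if it is missing.
            found = re.findall(r'\(char (\d+)\)', str(e))
            if not found:
                raise
            unexp = int(found[0])
            # Last quote before the unexpected character.
            unesc = s.rfind('"', 0, unexp)
            if unesc == -1:
                raise
            s = s[:unesc] + r'\"' + s[unesc + 1:]
            # Matching closing quote (+2 skips the inserted backslash and quote).
            closg = s.find('"', unesc + 2)
            if closg == -1:
                raise
            s = s[:closg] + r'\"' + s[closg + 1:]
    raise ValueError("could not repair JSON within %d attempts" % max_attempts)

repaired = repair_quotes('{"title": "Center "ADVANCE"", "text": "x"}')
print(repaired["title"])
```

Re-raising the original error when no quote can be found keeps the failure visible instead of silently mangling the input.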

Update: This now seems to work fairly well (though still not perfectly), even if the unescaped " is followed by a comma or a closing bracket: in that case the parser will likely complain about a syntax error further on (expected property name, etc.), and the code traces back to the last ". It also automatically escapes the corresponding closing " (assuming there is one).


1) The exception's str is "Expecting , delimiter: line XXX column YYY (char ZZZ)", where ZZZ is the position in the string where the error occurred. Note, though, that this message may depend on the version of Python, the json module, the OS, or the locale, and thus this solution may have to be adapted accordingly.
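On Python 3.5 and later, the dependence on the message text can be avoided, because `json.JSONDecodeError` carries the offset in its `pos` attribute. A minimal sketch, where the helper name `error_position` is hypothetical:

```python
import json

def error_position(s):
    """Return the offset where parsing s fails, or None if s is valid JSON."""
    try:
        json.loads(s)
    except json.JSONDecodeError as e:
        return e.pos  # same number as ZZZ in "(char ZZZ)"
    return None

print(error_position('{"title": "Center "ADVANCE""}'))
```

The returned value could replace the `re.findall` lookup in the loop above; on older interpreters, which raise a plain `ValueError`, the regular expression is still needed.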
