I have a very large JSON file (1000+ MB) containing an array of identically structured JSON objects. For example:

[
    {
        "id": 1,
        "value": "hello",
        "another_value": "world",
        "value_obj": {
            "name": "obj1"
        },
        "value_list": [
            1,
            2,
            3
        ]
    },
    {
        "id": 2,
        "value": "foo",
        "another_value": "bar",
        "value_obj": {
            "name": "obj2"
        },
        "value_list": [
            4,
            5,
            6
        ]
    },
    {
        "id": 3,
        "value": "a",
        "another_value": "b",
        "value_obj": {
            "name": "obj3"
        },
        "value_list": [
            7,
            8,
            9
        ]
    },
    ...
]

Every item in the root JSON array has the same structure and can therefore be deserialized individually. I already have C# classes written to receive this data, and deserializing a JSON file that contains a single object (without the surrounding list) works as expected.
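
For reference, classes matching the sample above might look like the following minimal sketch. It assumes Json.NET (Newtonsoft.Json), the library the loops in this post use; the property names and the `ValueObj` class name are hypothetical, and the actual classes may differ.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical shape of one array element. Keys are mapped explicitly
// because the JSON uses snake_case names such as "another_value".
public class MyObject
{
    [JsonProperty("id")]
    public int Id { get; set; }

    [JsonProperty("value")]
    public string Value { get; set; }

    [JsonProperty("another_value")]
    public string AnotherValue { get; set; }

    [JsonProperty("value_obj")]
    public ValueObj ValueObj { get; set; }

    [JsonProperty("value_list")]
    public List<int> ValueList { get; set; }
}

// Hypothetical class for the nested "value_obj" object.
public class ValueObj
{
    [JsonProperty("name")]
    public string Name { get; set; }
}
```

With these classes, a single object can be read with `JsonConvert.DeserializeObject<MyObject>(json)`.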

At first, I tried to just directly deserialize my objects in a loop:

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<MyObject>(reader);
    }
}

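This didn't work: it threw an exception clearly stating that an object was expected, not a list. My understanding is that this call reads only a single object at the root level of the JSON file, but since the root is a list of objects, the request is invalid.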

My next idea was to deserialize as a C# List of objects:

JsonSerializer serializer = new JsonSerializer();
List<MyObject> o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    while (!sr.EndOfStream)
    {
        o = serializer.Deserialize<List<MyObject>>(reader);
    }
}

This does succeed. However, it only partly reduces the high RAM usage. The application does appear to deserialize items one at a time rather than reading the entire JSON file into RAM, but memory usage is still high because the C# List now holds all of the data from the JSON file. This has only displaced the problem.

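Then I decided to simply try removing one character from the start of the stream (to get rid of the [ character) by calling sr.Read() before entering the loop. The first object then deserializes successfully, but the subsequent ones don't; I just get an "unexpected token" exception. My guess is that the comma and whitespace between the objects are throwing the reader off.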

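Simply removing the square brackets won't work, because the objects contain their own primitive lists, as you can see in the example. Even trying to use }, as a delimiter won't work, because, as you can also see, the objects contain sub-objects.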

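My goal is to read the objects from the stream one at a time: read an object, process it, discard it from RAM, then read the next object, and so on. This would eliminate the need to load either the entire JSON string or all of the data into RAM as C# objects.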

What am I missing?

Recommended Answer

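This should resolve your problem. It works much like your initial code, except that it deserializes an object only when the reader hits a { character in the stream; otherwise it simply skips ahead until it finds the next start-object token. Because Deserialize consumes the entire object, including any nested objects and arrays, the { characters inside each item are never seen by the loop itself.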

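using System.IO;
using Newtonsoft.Json;

JsonSerializer serializer = new JsonSerializer();
MyObject o;
using (FileStream s = File.Open("bigfile.json", FileMode.Open))
using (StreamReader sr = new StreamReader(s))
using (JsonReader reader = new JsonTextReader(sr))
{
    // Advance token by token instead of testing sr.EndOfStream
    while (reader.Read())
    {
        // deserialize only when there's a "{" character in the stream
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Reads the whole object (nested objects included) and leaves
            // the reader on its closing "}"
            o = serializer.Deserialize<MyObject>(reader);
            // process o here, before the next iteration replaces it
        }
    }
}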
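
If several places need this, the same loop could be wrapped in an iterator so that callers pull one object at a time and nothing accumulates in a list. This is a sketch assuming Json.NET; the class name `JsonStreaming` and method name `StreamObjects` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonStreaming
{
    // Hypothetical helper: lazily yields each object found in the JSON text,
    // deserializing one at a time instead of building a List<T>.
    public static IEnumerable<T> StreamObjects<T>(TextReader textReader)
    {
        var serializer = new JsonSerializer();
        // JsonTextReader closes the underlying TextReader when disposed.
        using (var reader = new JsonTextReader(textReader))
        {
            while (reader.Read())
            {
                // Deserialize<T> consumes the whole object, including nested
                // objects and arrays, so the next Read() moves past it.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

Usage might look like `foreach (MyObject item in JsonStreaming.StreamObjects<MyObject>(new StreamReader("bigfile.json"))) { /* process item */ }`. Each item becomes eligible for garbage collection once the loop body no longer references it.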
