I have a JSON-like file with the following contents:

{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
...

I want to parse the contents above and reformat them as valid JSON, specifically in the following structure:

{
 "entries":[
  {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
  {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
  ....
]} 

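and then write the result to a file using Java.

How can I do this efficiently with Gson (preferably) or another library, given that the input file may be large?

I tried the following approach to convert the structure: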

    ....
    File jsonFile = new File("pathToJSONFile");
    FileReader fileReader = new FileReader(jsonFile);
    // Wrap the FileReader in a BufferedReader
    BufferedReader buffReader = new BufferedReader(fileReader);
    String textToAppend = null;
    String line;
    textToAppend = '{' + "\"entries\":" + '[';
    line = buffReader.readLine();
    textToAppend += line;

    while ((line = buffReader.readLine()) != null) {
        textToAppend += ',';
        textToAppend += line;
    }

    textToAppend += ']';
    textToAppend += '}';
    // then write textToAppend to the output file with a FileWriter.

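However, my solution is too slow for large JSON input files.

Recommended answer

I'm not sure how difficult it would be to parse invalid JSON with a JSON library. Even if you parsed it using the GsonBuilder.setLenient() method, you would still end up with a data structure that doesn't match the one you want.

Alternatively, you can make your current text-processing approach more efficient by writing to the output file incrementally as you process the input file, rather than accumulating everything that needs to be written.

For example: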

import java.io.File;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;

public class FixJson {
    public static void main (String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java FixJson <input-file> <output-file>");
            System.exit(1);
        }

        File inputFile = new File(args[0]);
        File outputFile = new File(args[1]);

        try (
            BufferedReader input = Files.newBufferedReader(inputFile.toPath());
            BufferedWriter output = Files.newBufferedWriter(outputFile.toPath());
        ) {
            output.write("{\n \"entries\":[\n");

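            // Hold each line back one iteration so that every entry except
            // the last is followed by a comma.
            String prevLine = input.readLine();
            String line;

            while ((line = input.readLine()) != null) {
                output.write("   " + prevLine + ",\n");
                prevLine = line;
            }
            // Guard against an empty input file, where prevLine is still null.
            if (prevLine != null) {
                output.write("   " + prevLine + "\n");
            }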

            output.write(" ]\n}\n");
        } catch (IOException ex) {
            System.err.println("file IO error: " + ex);
            System.exit(1);
        }
    }
}

In use, it looks like this:

small input

$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-small.txt output-small.txt
0:00.24 time elapsed, 31132 kB max resident memory

$ cat input-small.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}

$ cat output-small.txt
{
 "entries":[
   {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
 ]
}

$ ls -lh *-small.txt
-rw-r--r-- 1 chuckx chuckx 300 Apr 25 19:01 input-small.txt
-rw-r--r-- 1 chuckx chuckx 335 Apr 25 23:31 output-small.txt

large input (the small input simply repeated to create a 444,000-line file)

$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-large.txt output-large.txt
0:01.05 time elapsed, 130148 kB max resident memory

$ head -5 input-large.txt ; echo ... ; tail -5 input-large.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
...
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}

$ head -5 output-large.txt ; echo ... ; tail -5 output-large.txt
{
 "entries":[
   {"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
...
   {"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
   {"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
 ]
}

$ ls -lh *-large.txt
-rw-r--r-- 1 chuckx chuckx 32M Apr 25 22:12 input-large.txt
-rw-r--r-- 1 chuckx chuckx 34M Apr 25 23:31 output-large.txt
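
A variation on the same streaming idea, in case it is useful: instead of holding the previous line back, write the separator before every entry except the first. This is only a sketch under a few assumptions. The class name EntriesWrapper and the method wrap are hypothetical names of mine, each non-blank input line is assumed to be one complete JSON object, and blank lines are assumed to carry no data and are skipped. No JSON validation is done.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the name is illustrative only.
final class EntriesWrapper {

    // Copies one JSON object per input line into {"entries":[...]},
    // writing each entry as soon as it is read so nothing accumulates in memory.
    static void wrap(BufferedReader input, Writer output) throws IOException {
        output.write("{\n \"entries\":[\n");

        boolean first = true;
        String line;
        while ((line = input.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no data
            }
            if (!first) {
                output.write(",\n"); // separator precedes every entry but the first
            }
            output.write("   " + line);
            first = false;
        }
        if (!first) {
            output.write("\n"); // terminate the last entry's line
        }
        output.write(" ]\n}\n");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java EntriesWrapper <input-file> <output-file>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]));
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]))) {
            wrap(in, out);
        }
    }
}
```

Because wrap takes a BufferedReader and a Writer rather than file paths, it can be exercised with a StringReader and StringWriter in a unit test. An empty input produces an empty "entries" array, which is still valid JSON.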
