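I don't know how much trouble you would run into using a JSON library to parse invalid JSON. Even if you get it parsed with the GsonBuilder.setLenient() method, you would still end up with a data structure that doesn't match the one you need.
Alternatively, you can make your current text-processing approach more efficient by writing to the output file incrementally as you process the input file, rather than accumulating all of the data to be written.
For example: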
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FixJson {
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java FixJson <input-file> <output-file>");
            System.exit(1);
        }
        File inputFile = new File(args[0]);
        File outputFile = new File(args[1]);
        try (
            BufferedReader input = Files.newBufferedReader(inputFile.toPath());
            BufferedWriter output = Files.newBufferedWriter(outputFile.toPath());
        ) {
            // Open the wrapper object and the array that holds every input line.
            output.write("{\n \"entries\":[\n");
            // Stay one line behind the reader so that every entry except the
            // last one can be written with a trailing comma.
            String prevLine = input.readLine();
            String line;
            while ((line = input.readLine()) != null) {
                output.write(" " + prevLine + ",\n");
                prevLine = line;
            }
            // The last entry gets no trailing comma; an empty input has no entry.
            if (prevLine != null) {
                output.write(" " + prevLine + "\n");
            }
            output.write(" ]\n}\n");
        } catch (IOException ex) {
            System.err.println("file IO error: " + ex);
            System.exit(1);
        }
    }
}
In use, it looks like this:
small input
$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-small.txt output-small.txt
0:00.24 time elapsed, 31132 kB max resident memory
$ cat input-small.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
$ cat output-small.txt
{
"entries":[
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
]
}
$ ls -lh *-small.txt
-rw-r--r-- 1 chuckx chuckx 300 Apr 25 19:01 input-small.txt
-rw-r--r-- 1 chuckx chuckx 335 Apr 25 23:31 output-small.txt
large input (created by simply repeating the small input until the file has 444000 lines)
$ /usr/bin/time -f "%E time elapsed, %M kB max resident memory" java FixJson input-large.txt output-large.txt
0:01.05 time elapsed, 130148 kB max resident memory
$ head -5 input-large.txt ; echo ... ; tail -5 input-large.txt
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
...
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
$ head -5 output-large.txt ; echo ... ; tail -5 output-large.txt
{
"entries":[
{"c0":"1","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
...
{"c0":"2","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"3","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"},
{"c0":"4","c1":"2","c2":"810001000","c3":"A","c10":"A","c11":"2019-02-06"}
]
}
$ ls -lh *-large.txt
-rw-r--r-- 1 chuckx chuckx 32M Apr 25 22:12 input-large.txt
-rw-r--r-- 1 chuckx chuckx 34M Apr 25 23:31 output-large.txt
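If you want to exercise the same one-line look-behind logic without touching the disk, it can be pulled into a method that works on any Reader and Writer. The following is only a sketch: the class name JsonLinesWrapper and the method wrap are hypothetical names that are not part of FixJson, and skipping blank lines is an assumption about the input rather than something the program above does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original FixJson program.
final class JsonLinesWrapper {

    // Copies lines from `in` to `out`, wrapping them in {"entries":[ ... ]}.
    // Only one pending line is held at a time, so memory use does not grow
    // with the number of entries.
    static void wrap(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        out.write("{\n \"entries\":[\n");
        String pending = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no data
            }
            if (pending != null) {
                // Another entry follows, so this one needs a trailing comma.
                out.write(" " + pending + ",\n");
            }
            pending = line;
        }
        if (pending != null) {
            // Last entry: no trailing comma.
            out.write(" " + pending + "\n");
        }
        out.write(" ]\n}\n");
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory demonstration with a blank line in the middle.
        StringWriter result = new StringWriter();
        wrap(new StringReader("{\"c0\":\"1\"}\n\n{\"c0\":\"2\"}\n"), result);
        System.out.print(result);
    }
}
```

Passing Files.newBufferedReader(...) and Files.newBufferedWriter(...) to wrap should give the same streaming behavior as FixJson, while StringReader and StringWriter make it easy to check edge cases such as an empty input.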