Error while parsing CSV data with the Apache Commons CSV library
java.lang.IllegalStateException: IOException reading next record: java.io.IOException:
(line 46196) invalid char between encapsulated token and delimiter
I am using the following setup:
    try {
        File csvInput = getLatestFilefromDir(CSV_PATH);
        reader = new FileReader(csvInput);
        final CSVFormat csvFormat = CSVFormat.Builder.create()
                .setHeader(HEADERS)
                .setDelimiter(';')
                .setQuote('"')
                .setEscape('\\')
                .setSkipHeaderRecord(true)
                .build();
        Iterable<CSVRecord> csvRecords = csvFormat.parse(reader);
        for (CSVRecord csvRecord : csvRecords) {
            // processing
        }
    } catch (Exception e) {
        log.error("Error retrieving CSV data.");
        e.printStackTrace();
    }
As the error indicates, the data is flawed. The header and a valid entry, followed by the invalid entry:
"TABLE_NAME";"ATTRIBUTE";"VALUE"
"SWAP_LEG_TYPE";"SWAP_LEG_TYPE_DESC";"The payments (PAY or RECEIVE) of this \"Leg\" are based on the yield linked to a specific equity or an index. (or to the actual market price of the equity or the index ???)"
"CNTPTY_TYPE";"CNTPTY_TYPE_DESC";"With Local Government we mean the so called \Regional Governments or Local Authorities\\" (RGLA) as defined by the EBA (European Banking Authority).\""
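Changing the data is outside my control. I assume the backslash is meant to escape quotes, as in the other example, but it was used incorrectly in this case and written to the CSV file that way. Presumably the entry should have contained: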
...Authorities\ \" (RGLA)...
Is there a way to replace the string before parsing?
Or is there something I can do to extend the CSVFormat builder so that it accepts data like this?
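I am considering a naive approach: read the entire input and simply replace \\ with \, since this is the only such instance in a million lines, but that seems wrong.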
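For reference, the naive replacement could be sketched roughly as below, using only the JDK. The class CsvRepair and its methods repairLine and repaired are hypothetical names, not part of Commons CSV. The sketch assumes that ';' is the delimiter and that a legitimate closing quote is always followed by ';' or the end of the line, so a \\" followed by anything else is treated as a broken escape. It may well not cover every malformed case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Commons CSV.
class CsvRepair {

    // Matches \\" only when it is NOT followed by the ';' delimiter or the end
    // of the line, i.e. when the quote cannot be a legitimate closing quote.
    private static final Pattern BROKEN_ESCAPE =
            Pattern.compile("\\\\\\\\\"(?!;|$)");

    // Replaces each such \\" with \" and leaves the rest of the line untouched.
    static String repairLine(String line) {
        return BROKEN_ESCAPE.matcher(line)
                .replaceAll(Matcher.quoteReplacement("\\\""));
    }

    // Naive variant: reads the whole input into memory, repairs it line by
    // line, and returns a Reader over the result. Line endings become "\n".
    static Reader repaired(Reader in) throws IOException {
        try (BufferedReader br = new BufferedReader(in)) {
            String fixed = br.lines()
                    .map(CsvRepair::repairLine)
                    .collect(Collectors.joining("\n"));
            return new StringReader(fixed);
        }
    }
}
```

With the setup above, the repaired reader could then be handed to the parser, for example csvFormat.parse(CsvRepair.repaired(new FileReader(csvInput))). Holding a million lines in memory is the obvious cost of this variant; writing the repaired lines to a temporary file would avoid that.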