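I have a badly written csv file that I want to load with Pandas' read_csv. Below are the first few lines, showing what the file looks like and the error it produces.
File test.csv: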
feature_idx,cv_scores,avg_score,total-features
(4,),[0.71657 0.75430665 0.77866281 0.85293036 0.76370522],0.773235007449579,80
(4, 15),[0.79150981 0.82751849 0.83777517 0.9246948 0.82462535],0.8412247254527763,80
(1, 4, 15),[0.82173419 0.85052599 0.86065046 0.93704226 0.84315839],0.862622256166522,80
(1, 4, 15, 70),[0.82448556 0.86513518 0.87640778 0.93881338 0.84777784],0.8705239466728865,80
When I try to load it:
pandas.read_csv('test.csv')
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6
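I understand this is because the first item is a tuple, so the commas inside it are read as field separators. How can I tell pandas that the first field is a tuple, so that everything between (..) is treated as a single field?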
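To make the goal concrete, here is a minimal sketch of one possible way to get that behaviour: a regular-expression separator that splits only on commas outside the parentheses. It assumes the parentheses are never nested and only appear in the first field; the names raw and sep are just for illustration, and the sample rows are copied from test.csv above.

```python
import ast
import io

import pandas as pd

# Sample rows copied from test.csv above.
raw = """\
feature_idx,cv_scores,avg_score,total-features
(4,),[0.71657 0.75430665 0.77866281 0.85293036 0.76370522],0.773235007449579,80
(4, 15),[0.79150981 0.82751849 0.83777517 0.9246948 0.82462535],0.8412247254527763,80
(1, 4, 15),[0.82173419 0.85052599 0.86065046 0.93704226 0.84315839],0.862622256166522,80
(1, 4, 15, 70),[0.82448556 0.86513518 0.87640778 0.93881338 0.84777784],0.8705239466728865,80
"""

# Split on a comma only if it is NOT followed by a run of non-"(" characters
# ending in ")", i.e. only if the comma sits outside a "(...)" group.
# Assumption: parentheses are never nested and occur only in the first field.
sep = r",(?![^(]*\))"

# A regex separator requires the (slower) Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=sep, engine="python")

# The first column is still text such as "(1, 4, 15)"; turn it into real tuples.
df["feature_idx"] = df["feature_idx"].apply(ast.literal_eval)
```

With the real file, io.StringIO(raw) would be replaced by 'test.csv'. The cv_scores column stays a plain string here, since it has no commas and parses as one field either way.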
EDIT
The current answer still does not work.
df = pd.read_csv('test.csv', converters={'feature_idx': parse_tuple}) # parse_tuple as per the answer
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6
# pandas version
>>> print(pd.__version__)
1.5.3