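Following this answer, I have been trying to use Dask to read multiple CSVs from a zipped directory. However, I get a very long error message that I can't make sense of. I think the most important line is: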
msgpack.exceptions.ExtraData: unpack(b) received extra data.
The dataset is publicly available.
import numpy as np
import pandas as pd
import dask.dataframe as dd
# read data, the dask way
df = dd.read_csv('zip://BACI*.csv', sep=",", dtype={"k":str, "i":int, "j":int, "t":int}, storage_options={'fo': '../input/baci_hs92.zip'})
df.head()
I believe this kind of on-the-fly extraction should work in Dask, and I would rather not unzip all the files into a directory, as other answers suggest.
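For comparison, here is a minimal sketch of reading the matching members straight out of the archive with plain pandas and the standard-library `zipfile` module, so nothing is extracted to disk. This is not the Dask approach above and does not explain the msgpack error. The helper name `read_csvs_from_zip` is hypothetical, and it assumes all matching CSVs fit in memory at once.

```python
import fnmatch
import zipfile

import pandas as pd


def read_csvs_from_zip(zip_path, pattern="*.csv", **read_csv_kwargs):
    """Read every archive member matching `pattern` into one DataFrame.

    Hypothetical helper for illustration only; it loads everything eagerly.
    """
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        # Sort the member names so the row order is reproducible.
        for name in sorted(archive.namelist()):
            if fnmatch.fnmatch(name, pattern):
                # ZipFile.open returns a file-like object that is
                # decompressed on the fly; no file is written to disk.
                with archive.open(name) as member:
                    frames.append(pd.read_csv(member, **read_csv_kwargs))
    if not frames:
        raise FileNotFoundError(f"no member of {zip_path!r} matches {pattern!r}")
    return pd.concat(frames, ignore_index=True)
```

With the paths from my snippet, the call would look like `read_csvs_from_zip('../input/baci_hs92.zip', 'BACI*.csv', sep=",", dtype={"k": str, "i": int, "j": int, "t": int})`. It works around the problem but gives up Dask's lazy, out-of-core loading, which is the part I actually want.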