IIUC, you can:
import pandas as pd

cols_to_drop = ['score', 'evalue', 'Description', 'EC', 'PFAMs']
data = []
# Read the file 100 rows at a time, treating '-' as a missing value
for chunk in pd.read_csv('myinfile.csv', sep='\t', na_values='-', chunksize=100):
    chunk = chunk.drop(columns=cols_to_drop)
    data.append(chunk)
pd.concat(data).to_csv('my.csv', sep='\t', index=False)
If you know which columns to keep rather than which to drop, use:
cols_to_keep = ['col1', 'col2', 'col3']
data = []
# usecols makes the parser load only the listed columns
for chunk in pd.read_csv('myinfile.csv', usecols=cols_to_keep, sep='\t', na_values='-', chunksize=100):
    data.append(chunk)
pd.concat(data).to_csv('my.csv', sep='\t', index=False)
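If what you have is the list of unwanted columns, `usecols` also accepts a callable, so those columns can be skipped at parse time instead of being dropped afterwards. A minimal sketch, assuming a tab-separated input; the function name `drop_columns_on_read` and its parameters are made up for illustration and are not part of pandas:

```python
import pandas as pd

def drop_columns_on_read(in_path, out_path, cols_to_drop, chunksize=100):
    """Copy a tab-separated file, skipping the columns named in cols_to_drop."""
    unwanted = set(cols_to_drop)
    reader = pd.read_csv(
        in_path,
        sep='\t',
        na_values='-',
        # pandas calls this with each column name and keeps the column when it returns True
        usecols=lambda name: name not in unwanted,
        chunksize=chunksize,
    )
    # Collect the chunks, then write them out as a single file
    pd.concat(list(reader)).to_csv(out_path, sep='\t', index=False)
```

Unlike `drop(columns=...)`, this should not raise if one of the names in `cols_to_drop` is absent from the file, since the callable is only asked about columns that actually exist.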
Another approach, inspired by @el_Oso:
cols_to_drop = ['score', 'evalue', 'Description', 'EC', 'PFAMs']
# Parenthesized context managers require Python 3.10+
with (open('myinfile.csv') as inp,
      open('my.csv', 'w') as out):
    # Strip the trailing newline so the last column name can match cols_to_drop
    headers = inp.readline().rstrip('\n').split('\t')
    out.write('\t'.join([col for col in headers if col not in cols_to_drop]) + '\n')
    for chunk in pd.read_csv(inp, header=None, names=headers, sep='\t', na_values='-', chunksize=100):
        chunk = chunk.drop(columns=cols_to_drop)
        chunk.to_csv(out, sep='\t', index=False, header=False)
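The same streaming idea can be written without handling the header line by hand, by letting `to_csv` write the header only for the first chunk and append for the rest. This is a rough sketch under the same assumptions (tab-separated input, `-` as the missing-value marker); `stream_without_columns` is a hypothetical helper name:

```python
import pandas as pd

def stream_without_columns(in_path, out_path, cols_to_drop, chunksize=100):
    """Write each chunk to out_path as soon as it is read, so only one chunk is in memory."""
    reader = pd.read_csv(in_path, sep='\t', na_values='-', chunksize=chunksize)
    for i, chunk in enumerate(reader):
        chunk.drop(columns=cols_to_drop).to_csv(
            out_path,
            sep='\t',
            index=False,
            mode='w' if i == 0 else 'a',  # overwrite on the first chunk, append afterwards
            header=(i == 0),              # write the header line only once
        )
```

Usage would look like `stream_without_columns('myinfile.csv', 'my.csv', cols_to_drop)`.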