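Because some of the CSV files I need to read are very large (multiple GB), I am trying to implement a progress bar that shows how many of the total bytes have been read while pandas reads a CSV file from a URL.
I am trying to implement something like this: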
from tqdm import tqdm
import requests
import contextlib
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded
import pandas as pd

url = "https://public.tableau.com/views/PPBOpenDataDownloads/UseOfForce-All.csv?:showVizHome=no"
# This request is only used to look up the expected size of the download
response = requests.get(url, params=None, stream=True)
response.raise_for_status()
total_size = int(response.headers.get('Content-Length', 0))
block_size = 1000  # passed as chunksize, so this is rows per chunk, not bytes
df = []
last_position = 0
cur_position = 1
with tqdm(desc=url, total=total_size,
          unit='iB',
          unit_scale=True,
          unit_divisor=1024
          ) as bar:
    with contextlib.closing(urllib.request.urlopen(url=url)) as rd:
        # Create TextFileReader
        reader = pd.read_csv(rd, chunksize=block_size)
        for chunk in reader:
            df.append(chunk)
            # Here I would like to calculate the current file position: cur_position
            bar.update(cur_position - last_position)
            last_position = cur_position
Is there a way to get the current file position from a pandas TextFileReader? Is there perhaps an equivalent of C++'s ftell for TextFileReader?
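For illustration, one possible workaround is to count bytes in the stream that pandas reads from, rather than asking the TextFileReader for its position. The sketch below is only that, not a confirmed pandas feature: CountingReader and read_csv_with_progress are hypothetical names made up for this example, and it assumes the server sends a Content-Length header (without one the bar simply shows no total). Because of read-ahead buffering, the count reflects bytes pulled from the source, which can run slightly ahead of what pandas has parsed.

```python
import io


class CountingReader(io.RawIOBase):
    """Hypothetical helper: a raw binary stream that counts bytes pulled from `raw`."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw          # any object with a read(n) method returning bytes
        self.bytes_read = 0      # running total, plays the role of ftell()

    def readable(self):
        return True

    def readinto(self, buffer):
        # Read at most len(buffer) bytes from the source and copy them over.
        data = self._raw.read(len(buffer))
        n = len(data)
        buffer[:n] = data
        self.bytes_read += n
        return n                 # 0 signals end of stream


def read_csv_with_progress(url, chunksize=1000):
    """Sketch: read a CSV from `url` in chunks while updating a byte-based bar."""
    # Imported here so that CountingReader above stays standard-library only.
    import urllib.request
    import pandas as pd
    from tqdm import tqdm

    with urllib.request.urlopen(url) as resp:
        # Assumption: Content-Length is present; fall back to an open-ended bar.
        total = int(resp.headers.get("Content-Length", 0)) or None
        counter = CountingReader(resp)
        chunks = []
        with tqdm(total=total, unit="B", unit_scale=True, unit_divisor=1024) as bar:
            reader = pd.read_csv(io.BufferedReader(counter), chunksize=chunksize)
            for chunk in reader:
                chunks.append(chunk)
                # Advance the bar by the bytes consumed since the last update.
                bar.update(counter.bytes_read - bar.n)
    return pd.concat(chunks, ignore_index=True)
```

With this in place, the loop from the question keeps its shape: cur_position becomes counter.bytes_read and last_position becomes bar.n.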