You have no flow control on file.write(chunk). You need to pay attention to the return value of file.write(chunk): when it returns false, you must wait for the drain event before writing more. Otherwise, you can overflow the buffer on the write stream, particularly when writing large amounts of data to a slow medium such as disk. If you lack flow control while attempting to write faster than the disk can keep up, memory usage can balloon because the stream has to accumulate far more data in its buffer than intended.
Since your data is coming from a readable stream, when file.write(chunk) returns false, you must also pause the incoming read stream so it doesn't keep spewing data events at you while you're waiting for the drain event on the write stream. When the drain event fires, you can then call resume() on the read stream.
FYI, if you don't need the progress info, you can let pipeline() do all the work for you (including the flow control); you don't have to write that code yourself. You may even still be able to gather progress info by just watching write stream activity while using pipeline().
Here's one way to implement the flow control yourself, though if you can, I'd suggest using the pipeline() function from the stream module and letting it do all of this for you:
const file = fs.createWriteStream(fileName);
file.on("error", err => console.log(err));

http.get(url).on("response", function(res) {
    let downloaded = 0;
    res.on("data", function(chunk) {
        let readyForMore = file.write(chunk);
        if (!readyForMore) {
            // pause the read stream until the drain event fires
            res.pause();
            file.once('drain', () => {
                res.resume();
            });
        }
        downloaded += chunk.length;
        process.stdout.write(`Downloaded ${(downloaded / 1000000).toFixed(2)} MB of ${fileName}\r`);
    }).on("end", function() {
        file.end();
        console.log(`${fileName} downloaded successfully.`);
    }).on("error", err => console.log(err));
});
There also appears to be a timeout issue in the http request. When I added the following:
// set client timeout to 24 hours
res.setTimeout(24 * 60 * 60 * 1000);
then I was able to download the entire 7 GB ZIP file.
Here's the turnkey code that worked for me:
const fs = require('fs');
const https = require('https');

const url =
    "https://www2.census.gov/programs-surveys/acs/summary_file/2020/data/5_year_entire_sf/All_Geographies_Not_Tracts_Block_Groups.zip";
const fileName = "census-data2.zip";

const file = fs.createWriteStream(fileName);
file.on("error", err => {
    console.log(err);
});

const options = {
    headers: {
        "accept-encoding": "gzip, deflate, br",
    }
};

https.get(url, options).on("response", function(res) {
    const startTime = Date.now();

    function elapsed() {
        const delta = Date.now() - startTime;
        // convert to minutes
        const mins = delta / (1000 * 60);
        return mins;
    }

    let downloaded = 0;
    console.log(res.headers);
    const contentLength = +res.headers["content-length"];
    console.log(`Expecting download length of ${(contentLength / (1024 * 1024)).toFixed(2)} MB`);
    // set timeout to 24 hours
    res.setTimeout(24 * 60 * 60 * 1000);
    res.on("data", function(chunk) {
        let readyForMore = file.write(chunk);
        if (!readyForMore) {
            // pause the read stream until the drain event fires
            res.pause();
            file.once('drain', () => {
                res.resume();
            });
        }
        downloaded += chunk.length;
        const downloadPortion = downloaded / contentLength;
        const percent = downloadPortion * 100;
        const elapsedMins = elapsed();
        const totalEstimateMins = (1 / downloadPortion) * elapsedMins;
        const remainingMins = totalEstimateMins - elapsedMins;
        process.stdout.write(
            ` ${elapsedMins.toFixed(2)} mins, ${percent.toFixed(1)}% complete, ${Math.ceil(remainingMins)} mins remaining, downloaded ${(downloaded / (1024 * 1024)).toFixed(2)} MB of ${fileName} \r`
        );
    }).on("end", function() {
        file.end();
        console.log(`${fileName} downloaded successfully.`);
    }).on("error", err => {
        console.log(err);
    }).on("timeout", () => {
        console.log("got timeout event");
    });
});