Optimum buffer size is related to a number of things: file system block size, CPU cache size and cache latency.
Most file systems are configured to use block sizes of 4096 or 8192 bytes. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (e.g. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of the disk->RAM latency as well.
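This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.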
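To make the 4100-byte example concrete, here is a rough sketch of the arithmetic. It assumes a deliberately simplified model in which every stream read goes straight to the file system with no caching or read-ahead, and the names (`BlockSpan`, `blocksTouched`, `totalBlockTouches`) are made up for illustration.

```java
final class BlockSpan {
    /** Number of file system blocks overlapped by a read of `length` bytes at `offset`. */
    static long blocksTouched(long offset, long length, long blockSize) {
        if (length <= 0) {
            return 0;
        }
        long firstBlock = offset / blockSize;
        long lastBlock = (offset + length - 1) / blockSize;
        return lastBlock - firstBlock + 1;
    }

    /** Total block touches for sequential reads of `bufferSize` bytes over `fileSize` bytes. */
    static long totalBlockTouches(long fileSize, long bufferSize, long blockSize) {
        long total = 0;
        for (long offset = 0; offset < fileSize; offset += bufferSize) {
            long length = Math.min(bufferSize, fileSize - offset);
            total += blocksTouched(offset, length, blockSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 4096;            // assumed file system block size
        long fileSize = 4096L * 4100L;    // exactly 4100 blocks
        for (long bufferSize : new long[] {4096, 4100, 8192}) {
            System.out.println(bufferSize + " byte buffer -> "
                    + totalBlockTouches(fileSize, bufferSize, blockSize) + " block touches");
        }
    }
}
```

In this model the 4100-byte buffer touches every block roughly twice, while the 4096 and 8192 byte buffers touch each block exactly once. A real system softens this a lot, because the second touch usually hits a block that is already in memory.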
Now, this is offset quite a bit in a typical streaming scenario because the block that is read from disk is going to still be in memory when you hit the next read (we are doing sequential reads here, after all) - so you wind up paying the RAM -> L3/L2 cache latency price on the next read, but not the disk->RAM latency. In terms of order of magnitude, disk->RAM latency is so slow that it pretty much swamps any other latency you might be dealing with.
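So, I suspect that if you ran a test with different buffer sizes (I haven't done this myself), you would probably find a big impact of buffer size up to the size of the file system block. Above that, I suspect that things would level out pretty quickly.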
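If you wanted to try that test, something like the following would be a starting point. Treat it as a rough sketch rather than a proper benchmark: the class name, the list of sizes and the 16 MiB temp file are all arbitrary choices of mine, and because the file has just been written it will almost certainly be served from the OS cache, so this mostly measures the cached path rather than disk->RAM latency.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferSizeBenchmark {
    /** Reads the whole file using a byte[] of the given size; returns the bytes read. */
    static long readAll(Path file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Wall-clock time in nanoseconds for one full sequential read. */
    static long timeReadNanos(Path file, int bufferSize) throws IOException {
        long start = System.nanoTime();
        readAll(file, bufferSize);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffer-bench", ".bin");
        try {
            Files.write(file, new byte[16 * 1024 * 1024]); // 16 MiB of zeros
            int[] sizes = {512, 1024, 4096, 4100, 8192, 65536};
            for (int size : sizes) {
                readAll(file, size); // warm-up pass, not timed
                long nanos = timeReadNanos(file, size);
                System.out.printf("%6d bytes: %.2f ms%n", size, nanos / 1e6);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

For numbers you would actually trust, you would want a file larger than RAM (or a way to drop the OS cache between runs), several repetitions per size, and ideally a harness such as JMH.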
There are a ton of conditions and exceptions here - the complexities of the system are actually quite staggering (just getting a handle on L3 -> L2 cache transfers is mind-bogglingly complex, and it changes with every CPU type).
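This leads to the "real world" answer: if your app is like 99% of the apps out there, set the buffer size to 8192 and move on (even better, choose encapsulation over performance and use BufferedInputStream to hide the details). If you are in the 1% of apps that are highly dependent on disk throughput, craft your implementation so you can swap out different disk interaction strategies, and provide the knobs and dials that let your users test and optimize (or come up with some self-optimizing system).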
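As a minimal sketch of both halves of that advice: the code below hides the buffering behind `BufferedInputStream` with an 8192 default, and puts the disk interaction behind a small interface so it can be swapped. `ReadStrategy`, `BufferedReadStrategy`, `ChecksumTool` and the `read.bufferSize` system property are names I invented for the example, not anything standard.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical seam: the one place where a disk interaction strategy is chosen. */
interface ReadStrategy {
    InputStream open(Path file) throws IOException;
}

/** Default strategy: let BufferedInputStream deal with the buffering details. */
final class BufferedReadStrategy implements ReadStrategy {
    private final int bufferSize;

    BufferedReadStrategy(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    @Override
    public InputStream open(Path file) throws IOException {
        return new BufferedInputStream(Files.newInputStream(file), bufferSize);
    }
}

final class ChecksumTool {
    static final int DEFAULT_BUFFER_SIZE = 8192;

    /** Sums every byte (as 0..255) of the file; the caller never sees a buffer. */
    static long sumBytes(Path file, ReadStrategy strategy) throws IOException {
        long sum = 0;
        try (InputStream in = strategy.open(file)) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // The "knob": run with -Dread.bufferSize=65536 to override (hypothetical property name).
        int bufferSize = Integer.getInteger("read.bufferSize", DEFAULT_BUFFER_SIZE);
        Path file = Paths.get(args[0]);
        System.out.println(sumBytes(file, new BufferedReadStrategy(bufferSize)));
    }
}
```

The point of the interface is that a memory-mapped or `FileChannel` based strategy could be dropped in later without touching `sumBytes`, which is all the 1% case really needs from the design.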