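I have a large number of small files to search. I have been looking for a good de facto multi-threaded version of grep,
but could not find anything. How can I improve my use of grep? So far I am doing this:
grep -R "string" >> Strings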
If you have xargs available on a multi-core machine, you can benefit from the following, in case anyone is interested.
Environment:
Processor: Dual Quad-core 2.4GHz
Memory: 32 GB
Number of files: 584450
Total Size: ~ 35 GB
Tests:
1. Find the necessary files, pipe them to xargs and tell it to execute 8 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8
real 3m24.358s
user 1m27.654s
sys 9m40.316s
2. Find the necessary files, pipe them to xargs and tell it to execute 4 instances.
time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings
real 16m3.051s
user 0m56.012s
sys 8m42.540s
3. Suggested by @Stephen: Find the necessary files and use + instead of xargs
time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings
real 53m45.438s
user 0m5.829s
sys 0m40.778s
4. Regular recursive grep.
time grep -R "string" >> Strings
real 235m12.823s
user 38m57.763s
sys 38m8.301s
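In my case, the first command (8 parallel grep instances) worked very well: about 3.5 minutes, against nearly 4 hours for the plain recursive grep.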
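For anyone who wants to reuse the find | xargs approach, here is a minimal sketch of it wrapped in a shell function. The function name parallel_grep, its argument order, and the default batch size of 100 are my own assumptions, not something I benchmarked above. The one change from the tested commands is that it passes several files to each grep (-n 100 instead of -n1), which may reduce process start-up overhead when there are many small files. It relies on -print0, -0 and -P, which GNU and BSD find/xargs support but POSIX does not require.

```shell
#!/bin/sh
# parallel_grep PATTERN DIR [JOBS] [BATCH]
# Hypothetical helper (name and defaults are assumptions):
# search every *.ext file under DIR for PATTERN using JOBS
# concurrent grep processes, each given up to BATCH files.
parallel_grep() {
    pattern=$1
    dir=$2
    jobs=${3:-8}      # 8 matches the fastest run in the tests above
    batch=${4:-100}   # untested assumption; the runs above used 1

    # -print0 / -0 keep file names with spaces or newlines intact.
    # -H forces grep to print the file name even for a single file.
    # "--" stops a pattern that starts with "-" being read as an option.
    find "$dir" -type f -name "*.ext" -print0 |
        xargs -0 -n "$batch" -P "$jobs" grep -H -- "$pattern"
}
```

Usage would look like: parallel_grep "string" ./ 8 100 >> Strings. Two things to keep in mind: the order of the output lines is not deterministic because the grep processes run concurrently, and xargs returns a non-zero exit status whenever any batch contains no match, so do not treat that status as an error by itself.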