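I was following some code I found online and did not expect the gap to be so large. I currently have a project that needs to scan 50GB~70GB of code for 260 keywords, and I urgently need a reasonably fast approach.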
[gzhy@nearby stat]$ wc -l 1
234033 1
[gzhy@nearby stat]$ perl 1.pl
cost 1 seconds
zjtel : 32606
[gzhy@nearby stat]$ perl 2.pl
cost 111 seconds
zjtel : 32606
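1.pl
#!/usr/bin/perl
# Count the lines of file "1" that contain ":zjtel:" using Perl's own regex matching.
my $time=time();
my $zjtel=0;
open(my $fh,"<","1") or die "Can't open 1: $!";
while(<$fh>)
{
    chomp;
    if(m/:zjtel:/)
    {
        $zjtel++;
    }
}
close($fh);
$time=time()-$time;
print "cost $time seconds\n";
print "zjtel : $zjtel\n";
2.pl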
#!/usr/bin/perl
$time=time();
$count=`grep zjtel 1 | wc -l `;
$time=time()-$time;
print "cost $time seconds\n";
print "zjtel : $count\n";
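Both scripts time themselves with the built-in time(), which only has one-second resolution, so a reading like "cost 1 seconds" can stand for anything from a fraction of a second to almost two seconds. Below is a minimal sketch of finer-grained timing, assuming the core module Time::HiRes is installed. The helper name timed and the in-memory sample lines are made up for this example and are not part of the original scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces the built-in time() with a floating-point version

# Hypothetical helper: run a code reference and return (elapsed seconds, its result).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Example: count matching lines in an in-memory list instead of a real file.
my @lines = ("a:zjtel:b", "no match", "c:zjtel:d");
my ($elapsed, $count) = timed(sub { scalar grep { /:zjtel:/ } @lines });
printf "cost %.3f seconds\n", $elapsed;
print "zjtel : $count\n";
```

Wrapping the read loop of 1.pl or the backtick call of 2.pl in the same helper would report both costs with sub-second precision.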
My own code, still waiting to be tested:
pattern-match:
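use strict;
use warnings;
use File::Basename;
# Search every file in a directory for lines containing any keyword.
# Output format: <file name>:<line number>:<line content>
my ($dir,$keywords)=@ARGV;
opendir(DIRHANDLE,$dir) or die "Can't open $dir: $!";
# readdir returns bare names (including . and ..), so keep only regular files
my @filenames=sort grep { -f "$dir/$_" } readdir(DIRHANDLE);
closedir(DIRHANDLE);
open KEY,"<",$keywords or die "Can't open $keywords: $!";
# Strip the trailing newlines, otherwise no keyword can match inside a chomped line
chomp(my @keywords=<KEY>);
close KEY;
my $num_key=scalar @keywords;
my @match_lines;
my $time=time();
foreach my $file(@filenames){
    open FILE,"<","$dir/$file" or die "Can't open $dir/$file: $!";
    while(my $line=<FILE>){
        chomp $line;
        foreach my $key(@keywords){
            if($line=~m/$key/){
                # $. holds the current line number of the filehandle last read
                my $context="$file:$.:$line\n";
                push @match_lines,$context;
            }
        }
    }
    close(FILE);
}
open RS,">","result_file_pattern" or die "Can't open result_file_pattern: $!";
foreach(@match_lines){
    print RS $_;
}
close RS;
$time=time()-$time;
print "Pattern-match ($num_key keywords) end:$time seconds\n";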
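The inner foreach above runs one regex match per keyword per line, so with 260 keywords each line may be tested up to 260 times. One common alternative is to join the keywords into a single precompiled alternation so that each line is matched only once. The following is a sketch rather than a measured result. The helper names build_matcher and scan_handle are hypothetical, the keywords are treated as literal strings via quotemeta, and a line is reported once even if it contains several keywords, which differs from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled regex that matches any keyword.
# quotemeta treats each keyword as a literal string (an assumption; drop it
# if the keyword file really contains regular expressions).
sub build_matcher {
    my @keywords = grep { length } @_;          # skip empty keywords
    die "no keywords given\n" unless @keywords;
    my $alternation = join '|', map { quotemeta } @keywords;
    return qr/$alternation/;
}

# Hypothetical helper: scan an open filehandle and return
# "name:line_number:line" records for every matching line.
sub scan_handle {
    my ($fh, $name, $matcher) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        push @found, "$name:$.:$line\n" if $line =~ $matcher;
    }
    return @found;
}

# Example with an in-memory file so the sketch runs without any input files.
my $text = "foo zjtel\nnothing here\nbar a.b\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @hits = scan_handle($fh, 'demo', build_matcher('zjtel', 'a.b'));
close $fh;
print @hits;
```

In the real script the matcher would be built once, before the loop over @filenames, and reused for every file.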
# Would printing $context directly to the RS handle make any difference compared with the current approach?
grep:
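use strict;
use warnings;
use File::Basename;
# Search every file in a directory for lines containing any keyword.
# Output format: <file name>:<line number>:<line content>
my ($dir,$keywords)=@ARGV;
opendir(DIRHANDLE,$dir) or die "Can't open $dir: $!";
# readdir returns bare names (including . and ..), so keep only regular files
my @filenames=sort grep { -f "$dir/$_" } readdir(DIRHANDLE);
closedir(DIRHANDLE);
open KEY,"<",$keywords or die "Can't open $keywords: $!";
chomp(my @keywords=<KEY>);
close KEY;
my $num_key=scalar @keywords;
my @match_lines;
my $time1=time();
foreach my $file(@filenames){
    foreach my $key(@keywords){
        # -H prints the file name and -n the line number, giving the format above.
        # The single quotes assume keywords and file names contain no ' character.
        my @sub_match_lines=`grep -Hn -e '$key' '$dir/$file'`;
        push @match_lines,@sub_match_lines;
    }
}
open RS,">","result_file_grep" or die "Can't open result_file_grep: $!";
foreach(@match_lines){
    print RS $_;
}
close RS;
my $time2=time();
print "Grep ($num_key keywords) end:",$time2-$time1," seconds\n";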
# Would printing each match directly to the RS handle make any difference compared with the current approach?
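On the question of printing directly: collecting everything in @match_lines keeps every match in memory until the end, while printing as you go keeps memory use flat, which may matter with 50GB~70GB of input. The grep version above also starts one grep process per keyword per file. Here is a sketch that addresses both points, assuming GNU grep is on the PATH: it passes the whole keyword file with -f, lets grep walk the directory once with -r, and streams the output straight to the result handle. The function name grep_to_handle is hypothetical, and unlike the readdir loop above, -r also descends into subdirectories.

```perl
use strict;
use warnings;

# Hypothetical helper: run grep once over a directory with every keyword
# from $keyword_file, writing each matching line to $out as it arrives.
# Returns the number of matching lines written.
sub grep_to_handle {
    my ($dir, $keyword_file, $out) = @_;
    # List-form pipe open: no shell is involved, so no quoting problems.
    # -r recurse into $dir, -H print the file name, -n print the line number,
    # -f read one pattern per line from the keyword file.
    open my $grep, '-|', 'grep', '-rHn', '-f', $keyword_file, $dir
        or die "Can't run grep: $!";
    my $count = 0;
    while (my $line = <$grep>) {
        print {$out} $line;      # write immediately instead of buffering in an array
        $count++;
    }
    close $grep;                 # grep exits non-zero when nothing matched; ignored here
    return $count;
}

# Usage, mirroring the original scripts: perl script.pl <dir> <keyword file>
if (@ARGV == 2) {
    my ($dir, $keywords) = @ARGV;
    open my $rs, '>', 'result_file_grep' or die "Can't open result_file_grep: $!";
    my $n = grep_to_handle($dir, $keywords, $rs);
    close $rs;
    print "Grep end: $n matching lines\n";
}
```

An empty line in the keyword file acts as an empty pattern, which matches every line, so the keyword file should be checked for blank lines first.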
[linux] grep vs. a [perl] script implementing grep: the difference in running time
Original: http://my.oschina.net/u/347414/blog/352435