Word count (Perl)

From LiteratePrograms

Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx

An implementation of the UNIX wc tool in Perl.

The wc tool counts characters, words and lines in text files or stdin. When invoked without any options, it will print all three values. These options are supported:

  • -c - Only count characters
  • -w - Only count words
  • -l - Only count lines

If the tool is invoked without any file name parameters, it will use stdin.

<<wc.perl>>=
#!/usr/bin/env perl
use strict;
use warnings;
<<printwc>>
<<wc>>
<<usage>>
my $nfiles=0;
my $opts;
my ($tot_lines, $tot_words, $tot_chars);

Command line

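First we read all options (arguments starting with '-'). Several option letters may be combined in one argument, as in -lw. Any other letter is an error and makes usage() print a help message and exit. Option processing stops at the first argument that is not an option; a lone "-" is not treated as an option but as a file name standing for stdin. That argument is left in $arg for the file loop.

<<wc.perl>>=
my $arg;
# Stop at the first non-option; a lone "-" (stdin) is left for the file loop.
while(defined($arg=shift(@ARGV)) and $arg=~/^-./) {
	$arg=~/^-[lwc]+$/ or usage();
	$opts.=substr($arg, 1);
}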

If the user did not provide any options, we use the default, "lwc", so that the line, word and character counts are all printed.

<<wc.perl>>=
$opts or $opts="lwc";
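Perl also ships with the core Getopt::Std module, which handles bundled single-letter switches such as -lw and treats "--" as the end of the options. The sketch below is an alternative to the hand-written loop, not what this program uses; parse_opts() is a hypothetical helper that returns the option string (defaulting to "lwc") followed by the remaining file names, or an empty list when an unknown switch is seen.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper (not part of wc.perl): parse wc-style switches with
# Getopt::Std. Returns (option string, file names...) on success, or an
# empty list if getopts() reported an unknown switch.
sub parse_opts {
    local @ARGV = @_;                  # getopts() works on @ARGV
    my %flag;
    getopts('lwc', \%flag) or return;  # getopts() warns "Unknown option: x" itself
    my $opts = join '', grep { $flag{$_} } qw(l w c);
    return ($opts || 'lwc', @ARGV);    # no switches: print all three counts
}

my ($opts, @files) = parse_opts('-lw', 'notes.txt', '-');
print "$opts @files\n";                # prints "lw notes.txt -"
```

A lone "-" does not look like a switch to getopts(), so it stays in the file list, just as in the loop above.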

The remaining arguments are file names. wc() is used to count the values, and printwc() prints the result.

<<wc.perl>>=
while(defined $arg) {	# a file named "0" must still be counted
	++$nfiles;
	my ($lines, $words, $chars)=wc($arg);
	$tot_lines+=$lines;
	$tot_words+=$words;
	$tot_chars+=$chars;
	printwc($arg, $opts, $lines, $words, $chars);
	$arg=shift;
}

If there were no file names on the command line, we call wc() with "-" as the file name, so that stdin is counted.

If there was more than one file, the totals are printed on a final line labelled "total".

<<wc.perl>>=
if($nfiles<1) {
	my ($lines, $words, $chars)=wc("-");
	printwc("-", $opts, $lines, $words, $chars);
} elsif($nfiles>1) {
	printwc("total", $opts, $tot_lines, $tot_words, $tot_chars);
}
exit 0;
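The driver logic above (per-file counts, running totals, stdin when no names are given, and a "total" line only for two or more files) could also be written as a single loop over a default file list. This is an illustrative restructuring, not the program's code; count_all() is a hypothetical helper that takes a counting function (wc() in the real program) and returns the rows that printwc() would print.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wc.perl). $counter is any function that
# returns (lines, words, chars) for a file name, such as wc() above.
sub count_all {
    my ($counter, @names) = @_;
    @names = ('-') unless @names;          # no file names: count stdin
    my @tot = (0, 0, 0);                   # running totals: lines, words, chars
    my @rows;
    for my $name (@names) {
        my @counts = $counter->($name);
        $tot[$_] += $counts[$_] for 0 .. 2;
        push @rows, [$name, @counts];
    }
    push @rows, ['total', @tot] if @names > 1;   # totals only for 2+ files
    return @rows;
}

# Demo with a stand-in counter instead of real files.
my %fake = (a => [1, 2, 3], b => [4, 5, 6]);
for my $row (count_all(sub { @{ $fake{ $_[0] } } }, 'a', 'b')) {
    print join(' ', @$row), "\n";
}
# prints:
# a 1 2 3
# b 4 5 6
# total 5 7 9
```

Defaulting the list to ('-') removes the separate no-file branch, at the cost of hiding the stdin case inside the loop.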
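
usage() prints a short synopsis on stderr and exits with a non-zero status, since it is only called for invalid options.

<<usage>>=
sub usage
{
	print STDERR "Usage: $0 [-cwl] [<fname>*]\n";
	exit 1;
}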

wc()

wc() takes one argument, the name of the file to count. If the name is "-", stdin is counted.

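<<wc>>=
sub wc
{
	my ($fname)=@_;
	my $lines=0;
	my $words=0;
	my $chars=0;
	# "-" means stdin, duplicated onto FILE; other names use three-argument
	# open, so a name such as ">out" is never taken as an open mode.
	if($fname eq "-") {
		open(FILE, "<&", \*STDIN) or die "Couldn't read stdin: $!\n";
	} else {
		open(FILE, "<", $fname) or die "Couldn't open file $fname: $!\n";
	}
	# Read line by line instead of loading the whole file into memory.
	while(my $line=<FILE>) {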

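Counting characters is simple: we add up the length of each line, line-feed included. As long as no decoding layer is applied to the input, length() counts bytes, which is what wc -c reports.

Lines are counted only when they end in a line-feed character, so if the file ends without a line-feed, the last line is not counted. This matches UNIX wc, which counts line-feed characters.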

<<wc>>=
		$chars+=length($line);
		$line=~/\n$/ and ++$lines;
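The same two rules can be checked on an in-memory string. In this sketch, count_lines_chars() is a hypothetical helper, not part of wc.perl; it counts line-feeds with tr/// instead of matching each line, which gives the same line count, including for a final line without a line-feed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wc.perl): line and character counts for
# a whole string, following the same rules as wc() above.
sub count_lines_chars {
    my ($text) = @_;
    my $lines = ($text =~ tr/\n//);   # tr/// returns how many "\n" it found
    my $chars = length $text;          # every character, line-feeds included
    return ($lines, $chars);
}

my ($lines, $chars) = count_lines_chars("one\ntwo");
print "$lines $chars\n";               # prints "1 7": "two" has no line-feed
```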

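To count words, we split each line on runs of spaces and tabs with split() and add the number of resulting fields. split() drops trailing empty fields by itself, but leading whitespace would produce an empty first field, and the line ending could end up as a field of its own (an empty line, for instance, would give the single field "\n"). We therefore strip leading spaces and tabs, and trailing spaces, tabs, carriage returns and line-feeds, before splitting.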

<<wc>>=
		$line=~s/^[ \t]+//;
		$line=~s/[ \t\r\n]+$//;
		my @w=split(/[ \t]+/, $line);
		$words+=@w;
	}
	close(FILE);
	return ($lines, $words, $chars);
}
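Perl's split() also has a special form for exactly this job: when the pattern is the string ' ' (a single space, not a regex), it skips leading whitespace and splits on runs of any whitespace, and trailing empty fields are dropped as usual. The sketch below uses it; count_words() is a hypothetical helper. One difference from the code above: ' ' splits on every whitespace character (form feeds, for example), not only on spaces and tabs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wc.perl): count the words in one line
# using split's awk-style ' ' pattern, so no pre-trimming is needed.
sub count_words {
    my ($line) = @_;
    my @fields = split ' ', $line;     # leading/trailing whitespace ignored
    return scalar @fields;
}

print count_words("  hello,\tworld \r\n"), "\n";   # prints 2
print count_words("\n"), "\n";                     # prints 0
```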

printwc()

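printwc() uses the $opts argument to decide which values to print. The values always appear in the order lines, words, characters, regardless of the order in which the options were given. Each is printed with "% 8d", right-aligned in an 8-character field; the space flag reserves a sign position, so even a count of eight or more digits is separated from the previous column by a blank. The file name is printed after the values, unless it is "-" (stdin).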

<<printwc>>=
sub printwc
{
	my ($fname, $opts, $lines, $words, $chars)=@_;
	$opts=~/l/ && printf("% 8d", $lines);
	$opts=~/w/ && printf("% 8d", $words);
	$opts=~/c/ && printf("% 8d", $chars);
	$fname ne "-" && print " $fname";
	print "\n";
}
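To check the column layout without printing, printwc()'s logic can be expressed with sprintf(), which accepts the same formats as printf(). format_wc() is a hypothetical helper that returns the line printwc() would print; the demo shows the space flag at work on a nine-digit count.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wc.perl): build printwc()'s output line
# as a string. "% 8d" pads to width 8; the space flag adds a leading blank
# to non-negative numbers, so wide counts never touch the previous column.
sub format_wc {
    my ($fname, $opts, $lines, $words, $chars) = @_;
    my $out = '';
    $out .= sprintf('% 8d', $lines) if $opts =~ /l/;
    $out .= sprintf('% 8d', $words) if $opts =~ /w/;
    $out .= sprintf('% 8d', $chars) if $opts =~ /c/;
    $out .= " $fname" if $fname ne '-';
    return "$out\n";
}

print format_wc('total', 'lc', 3, 10, 123456789);
# prints "       3 123456789 total"
```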