Word count (C)

From LiteratePrograms

Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx

An implementation of the UNIX wc tool.

The wc tool counts characters, words and lines in text files or stdin. When invoked without any options, it will print all three values. These options are supported:

  • -c - Only count characters
  • -w - Only count words
  • -l - Only count lines

If the tool is invoked without any file name parameters, it will use stdin.

<<wc.c>>=
#include<stdio.h>
#include<ctype.h>
enum {OPT_C=4, OPT_W=2, OPT_L=1};
<<print>>
<<wc>>
<<main>>

main()

<<main>>=
int main(int argc, char *argv[])
{
	int chars=0;
	int words=0;
	int lines=0;
	int nfiles=0;
	int opt=0;
	int n;

We use the argv argument of main() to get the options and the file names, if any. Options start with a '-'. argv is advanced past argv[0] (the program name) and then past each option, until it points to an argument that does not start with '-', or to the NULL pointer that terminates argv. Either one marks the end of the option list.

<<main>>=
	while((++argv)[0] && argv[0][0]=='-') {

Each option letter sets its bit in opt, and several letters may be combined in one argument, as in -lw. If we don't support an option, we print an error message and a usage line, and quit.

<<main>>=
		n=1;
		while(argv[0][n]) {
			switch(argv[0][n++]) {
			case 'c': opt|=OPT_C; break;
			case 'w': opt|=OPT_W; break;
			case 'l': opt|=OPT_L; break;
			default:
				fprintf(stderr, "Unknown option %c\n", argv[0][n-1]);
				fprintf(stderr, "Usage: wc [-cwl] [<filename>*]\n");
				return 1;
			}
		}
	}
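The inner loop above walks over the letters of one option argument, so several options can share a single '-'. As a minimal sketch, the same logic can be pulled out into a function that is easy to test on its own. The name parse_flags() is hypothetical and not part of the program; it reuses the OPT_* values from wc.c and, unlike main(), leaves printing the usage line to its caller.

```c
#include <stdio.h>

/* Same bit values as the enum in wc.c. */
enum {OPT_C=4, OPT_W=2, OPT_L=1};

/* Hypothetical helper: returns the option bits selected by one argument
   such as "-l" or "-lw", or -1 if it contains an unsupported letter. */
int parse_flags(const char *arg)
{
	int opt = 0;
	int n;

	/* arg[0] is the leading '-'; every following letter is one option. */
	for (n = 1; arg[n]; n++) {
		switch (arg[n]) {
		case 'c': opt |= OPT_C; break;
		case 'w': opt |= OPT_W; break;
		case 'l': opt |= OPT_L; break;
		default:
			fprintf(stderr, "Unknown option %c\n", arg[n]);
			return -1;
		}
	}
	return opt;
}
```

Because the bits are OR-ed together, repeating a letter (-cc) is harmless, and a lone '-' selects nothing, so the defaulting step that follows then turns on all three counts.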

If no options were specified, we turn all three counts on by default.

<<main>>=
	if(!opt) opt=OPT_L|OPT_W|OPT_C;

Having processed all of the options, we proceed to the files. The first argument that does not start with '-' and every argument after it are treated as file names, even if a later one starts with '-'. The wc() function is called for each file to do the actual counting work.

<<main>>=
	while(argv[0]) {
		++nfiles;
		if(wc(*argv, opt, &chars, &words, &lines)==-1) {
			perror(*argv);
			return 1;
		}
		++argv;
	}
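The option loop and this file loop both walk the same argv pointer. The sketch below, assuming argv is laid out as in main() (program name first, NULL pointer last), isolates that walk in two hypothetical helpers, skip_options() and count_files(), so the split between options and file names can be checked directly.

```c
#include <stddef.h>

/* Hypothetical helper mirroring main()'s option scan: skips argv[0]
   (the program name) and every following argument that starts with
   '-', and returns a pointer to the first remaining argument, which
   may be the terminating NULL. */
char **skip_options(char **argv)
{
	while ((++argv)[0] && argv[0][0] == '-')
		;
	return argv;
}

/* Counts the remaining arguments, all of which main() treats as file
   names, whether or not they start with '-'. */
int count_files(char **argv)
{
	int nfiles = 0;
	while (argv[nfiles])
		++nfiles;
	return nfiles;
}
```

With the arguments wc -l -w a.txt -b, the scan stops at a.txt, and -b is then counted as a second file name; wc() would read stdin for it, since the name starts with '-'.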

If no files were specified, we pass "-" as the file name to indicate that we want to read from stdin.

<<main>>=
	if(nfiles==0) wc("-", opt, &chars, &words, &lines);

If more than one file was specified, the totals are printed as well.

<<main>>=
	else if(nfiles>1) print("total", opt, chars, words, lines);
	return 0;
}

print()

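The print() function uses the opt argument to decide which counts to print. They always appear in the order lines, words, characters. The conversion "% 8d" pads each value to at least 8 characters, and its space flag puts a blank before every non-negative number, so even values with 8 or more digits stay separated.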

<<print>>=
int print(const char *fname, int opt, int chars, int words, int lines)
{
	if(opt&OPT_L) printf("% 8d", lines);
	if(opt&OPT_W) printf("% 8d", words);
	if(opt&OPT_C) printf("% 8d", chars);

The file name is printed after the values, unless it starts with '-', indicating stdin.

<<print>>=
	if(fname[0]!='-') printf(" %s", fname);
	putchar('\n');
	return 0;
}
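To see the effect of the "% 8d" conversion, here is a sketch of a hypothetical variant, format_counts(), that builds the same columns in a buffer with snprintf() instead of printing them. It keeps print()'s order and format but leaves out the file name.

```c
#include <stdio.h>
#include <string.h>

/* Same bit values as the enum in wc.c. */
enum {OPT_C=4, OPT_W=2, OPT_L=1};

/* Appends one count using the same "% 8d" conversion as print(). */
static void append_count(char *buf, size_t size, int value)
{
	size_t len = strlen(buf);
	/* snprintf() writes at most size - len bytes, including the
	   terminating NUL, so the buffer cannot overflow. */
	snprintf(buf + len, size - len, "% 8d", value);
}

/* Hypothetical variant of print() that formats into buf instead of
   writing to stdout; the file name is left out. size must be > 0. */
void format_counts(char *buf, size_t size, int opt,
                   int chars, int words, int lines)
{
	buf[0] = '\0';
	if (opt & OPT_L) append_count(buf, size, lines);
	if (opt & OPT_W) append_count(buf, size, words);
	if (opt & OPT_C) append_count(buf, size, chars);
}
```

A nine-digit count such as 123456789 comes out as " 123456789", ten characters wide, so it never runs into the previous column.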

wc()

wc() does the actual counting for one file. It returns -1 if the file cannot be opened, and 0 otherwise.

<<wc>>=
int wc(const char *fname, int opt, int *tot_chars, int *tot_words, int *tot_lines)
{
	int ch;
	int chars=0;
	int words=0;
	int lines=0;
	int sp=1;
	FILE *fp;

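The file named by fname is opened, unless the name starts with '-', which indicates stdin. If fopen() fails, wc() returns -1 at once; on POSIX systems fopen() sets errno, so main() can report the reason with perror().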

<<wc>>=
	if(fname[0]!='-') fp=fopen(fname, "r");
	else fp=stdin;
	if(!fp) return -1;
	while((ch=getc(fp))!=EOF) {
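Opening and closing the input are decided by the same test on fname, once at the top of wc() and once at the end. A minimal sketch of how the two could be kept together in a pair of hypothetical helpers, open_input() and close_input(); the latter compares the stream against stdin instead of re-checking the name.

```c
#include <stdio.h>

/* Hypothetical helper: a name starting with '-' means stdin, anything
   else is opened for reading. Returns NULL if fopen() fails. */
FILE *open_input(const char *fname)
{
	if (fname[0] == '-')
		return stdin;
	return fopen(fname, "r");
}

/* Hypothetical helper: closes fp unless it is stdin, mirroring the
   check at the end of wc(). */
void close_input(FILE *fp)
{
	if (fp != stdin)
		fclose(fp);
}
```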

All bytes are counted as characters. This ignores UTF-8 and other encodings in which a single character can take up several bytes.

<<wc>>=
		++chars;
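If characters rather than bytes were wanted, one approach, assuming the input is valid UTF-8, is to count only the bytes that start a character. Every UTF-8 continuation byte has the form 10xxxxxx, so those can be skipped. The sketch below applies this to a string through a hypothetical utf8_chars() function; inside wc(), the same test could be applied to ch instead of incrementing chars unconditionally. The standard wc makes this distinction as -c (bytes) versus -m (characters), which this program does not implement.

```c
#include <stddef.h>

/* Hypothetical alternative to counting every byte, assuming valid
   UTF-8 input: count only bytes that are not continuation bytes,
   i.e. whose top two bits are not 10. */
size_t utf8_chars(const char *s)
{
	size_t count = 0;
	for (; *s; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			++count;
	}
	return count;
}
```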

sp records whether the previous character was white space. It starts at 1, so a word at the very beginning of the input is counted. Each time we hit a non-white-space character that follows white space, a new word has started.

<<wc>>=
		if(isspace(ch)) sp=1;
		else if(sp) {
			++words;
			sp=0;
		}
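The same word rule can be tried on an in-memory string. This sketch uses a hypothetical count_words() function with the same sp flag; the cast to unsigned char is needed because isspace() is only defined for values representable as unsigned char and for EOF, while a plain char may be negative. (In wc() itself no cast is needed, since getc() already returns such a value.)

```c
#include <ctype.h>

/* Hypothetical helper: counts words in a NUL-terminated string with
   the same rule as wc(). A word starts at each non-white-space
   character that follows white space or the start of the input. */
int count_words(const char *s)
{
	int words = 0;
	int sp = 1;	/* start "in white space" so a leading word counts */

	for (; *s; s++) {
		if (isspace((unsigned char)*s))
			sp = 1;
		else if (sp) {
			++words;
			sp = 0;
		}
	}
	return words;
}
```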

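Each line-feed character ends a line, so we increment the line count. As with the standard wc, which counts newline characters, a final line without a terminating newline is not counted.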

<<wc>>=
		if(ch=='\n') ++lines;
	}
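A sketch of the line rule on a string, using a hypothetical count_lines() function, shows the consequence for input whose last line has no trailing newline.

```c
/* Hypothetical helper: counts newline characters, as wc() does.
   Text after the last newline is not counted as a line. */
int count_lines(const char *s)
{
	int lines = 0;
	for (; *s; s++)
		if (*s == '\n')
			++lines;
	return lines;
}
```

So count_lines("a\nb") is 1, not 2, just as wc reports 1 for a two-line file that lacks a final newline.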

When the counting is done, we print the result for this file, close the file unless it is stdin, and add the counts to the running totals.

<<wc>>=
	print(fname, opt, chars, words, lines);
	if(fname[0]!='-') fclose(fp);
	*tot_chars+=chars;
	*tot_words+=words;
	*tot_lines+=lines;
	return 0;
}
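wc() reports its per-file counts back to main() through three int pointers. One possible alternative, sketched here with a hypothetical struct counts and add_counts() function, groups the three values so the totals travel as a single object; this is a design variation, not how the program above is written.

```c
/* Hypothetical refactoring: the three counts in one struct, so wc()
   could take a single pointer to the running totals. */
struct counts {
	int chars;
	int words;
	int lines;
};

/* Adds the counts for one file to the running totals. */
void add_counts(struct counts *total, const struct counts *file)
{
	total->chars += file->chars;
	total->words += file->words;
	total->lines += file->lines;
}
```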