Word count (Python)
From LiteratePrograms
An implementation of the UNIX wc tool, in Python.
The wc tool counts characters, words and lines in text files or stdin. When invoked without any options, it will print all three values. These options are supported:
- -c - Only count characters
- -w - Only count words
- -l - Only count lines
If the tool is invoked without any file name parameters, it will use stdin.
General structure
<<wc.py>>=
#!/usr/bin/env python
from optparse import OptionParser
import sys

<<configure parser>>
<<support functions>>
<<main>>
Configuring the command line parser
First, the supported options are defined. This also defines the long options --char, --words and --lines as alternatives to the short options -c, -w and -l. The option parser automatically defines an additional option -h, with long form --help, which prints a short summary of all options. That is why a usage string and a help string for each option are also given.
<<configure parser>>=
parser = OptionParser(usage="usage: %prog [options] [file1 file2 ...]")
parser.add_option("-c", "--char",
                  dest="characters",
                  action="store_true",
                  default=False,
                  help="Only count characters")
parser.add_option("-w", "--words",
                  dest="words",
                  action="store_true",
                  default=False,
                  help="Only count words")
parser.add_option("-l", "--lines",
                  dest="lines",
                  action="store_true",
                  default=False,
                  help="Only count lines")
After all supported options are defined, the supplied options can be parsed.
<<configure parser>>=
(options, args) = parser.parse_args()
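To see what parse_args returns, the following sketch builds a parser with the same option definitions and hands it an explicit argument list instead of letting it read sys.argv. The helper name build_parser and the sample file names are assumptions made for illustration; they are not part of wc.py.

```python
from optparse import OptionParser

def build_parser():
    # Hypothetical helper: wraps the option definitions shown above.
    parser = OptionParser(usage="usage: %prog [options] [file1 file2 ...]")
    parser.add_option("-c", "--char", dest="characters", action="store_true",
                      default=False, help="Only count characters")
    parser.add_option("-w", "--words", dest="words", action="store_true",
                      default=False, help="Only count words")
    parser.add_option("-l", "--lines", dest="lines", action="store_true",
                      default=False, help="Only count lines")
    return parser

# Passing a list to parse_args bypasses sys.argv, which is handy for experiments.
options, args = build_parser().parse_args(["-w", "--lines", "a.txt", "b.txt"])
print(options.words, options.lines, options.characters)  # True True False
print(args)  # ['a.txt', 'b.txt']
```

Short and long forms can be mixed freely, and everything that is not an option ends up in args in its original order.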
If none of the options were given, we actually want all of them.
<<configure parser>>=
if not (options.characters or options.words or options.lines):
    options.characters, options.words, options.lines = True, True, True
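The effect of this default can be checked in isolation. The sketch below applies the same test to hand-built option objects; the helper name enable_all_if_none is hypothetical, and optparse.Values is used only as a convenient stand-in for the object that parse_args returns.

```python
from optparse import Values

def enable_all_if_none(options):
    # Hypothetical helper mirroring the chunk above:
    # if no flag is set, switch all three on.
    if not (options.characters or options.words or options.lines):
        options.characters = options.words = options.lines = True
    return options

none_given = enable_all_if_none(
    Values({"characters": False, "words": False, "lines": False}))
only_words = enable_all_if_none(
    Values({"characters": False, "words": True, "lines": False}))

print(none_given.characters, none_given.words, none_given.lines)  # True True True
print(only_words.characters, only_words.words, only_words.lines)  # False True False
```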
The main program
If any file arguments are given, we loop through them, keeping track of the total number of lines, words, characters and files. If more than one file was processed, the total is also output; with a single file, the total would just repeat that file's counts and is therefore omitted. If no file arguments are given, standard input is processed.
<<main>>=
if args:
    total_lines = 0
    total_words = 0
    total_chars = 0
    file_count = 0
    for file_string in args:
        <<process single file argument>>
    if file_count > 1:
        print_count(total_lines, total_words, total_chars, "total")
else:
    <<process stdin>>
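The rule for the total line can be sketched separately from the file handling. The function rows_with_total and the sample counts below are illustrative assumptions, not part of wc.py; the function takes one tuple of counts per file and appends a total only when there is more than one.

```python
def rows_with_total(per_file):
    # per_file: list of (lines, words, chars, name) tuples, one per processed file.
    rows = list(per_file)
    if len(rows) > 1:
        # Sum each of the three count columns across all files.
        totals = [sum(row[i] for row in rows) for i in range(3)]
        rows.append((totals[0], totals[1], totals[2], "total"))
    return rows

print(rows_with_total([(2, 5, 20, "a.txt")]))
# [(2, 5, 20, 'a.txt')]
print(rows_with_total([(2, 5, 20, "a.txt"), (1, 3, 12, "b.txt")]))
# [(2, 5, 20, 'a.txt'), (1, 3, 12, 'b.txt'), (3, 8, 32, 'total')]
```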
If the file name contains a file glob (wildcard), that glob is expanded. On POSIX-compatible systems this would be the wrong thing to do, because there the shell has already expanded the wildcards. If a glob symbol still ends up in the file name on such systems, then either it was quoted (indicating that expansion was explicitly not wanted), the glob did not match any file name, or it is part of a name produced by the shell's expansion (on POSIX systems, a star can legally be part of a file name). Systems like Microsoft Windows, however, do not expand globs in the shell, so there it is the program's job. The code therefore consults a list of platforms on which file globbing has to be performed.
<<process single file argument>>=
#this comment is there to prevent a literateprograms bug messing up the indentation (which is fatal in python)
platforms_needing_glob = ["win32"] # please populate!
if sys.platform in platforms_needing_glob:
    import glob
    file_list = glob.glob(file_string)
else:
    file_list = [file_string]
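The difference between the two branches can be tried out on any platform. In the sketch below, the helper expand_argument and its needs_glob parameter are assumptions standing in for the sys.platform check, and the result is sorted only to make the output predictable, since glob.glob does not guarantee an order.

```python
import glob
import os
import shutil
import tempfile

def expand_argument(file_string, needs_glob):
    # needs_glob stands in for the sys.platform check in the chunk above.
    if needs_glob:
        return sorted(glob.glob(file_string))
    return [file_string]

# Create two throwaway files so the pattern has something to match.
tmp_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(tmp_dir, name), "w") as handle:
        handle.write("hello\n")

pattern = os.path.join(tmp_dir, "*.txt")
expanded = [os.path.basename(p) for p in expand_argument(pattern, True)]
literal = expand_argument(pattern, False)
unmatched = expand_argument(os.path.join(tmp_dir, "*.none"), True)

print(expanded)              # ['a.txt', 'b.txt']
print(literal == [pattern])  # True
print(unmatched)             # []

shutil.rmtree(tmp_dir)  # remove the throwaway directory again
```

Note that a pattern matching nothing expands to an empty list, so on a globbing platform such an argument is skipped without any message.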
Next, we loop through all file names corresponding to the argument, read the entire file content into memory, count the lines, words and characters, print them and update the running totals.
<<process single file argument>>=
for file_name in file_list:
    # Read the whole file at once; the with block closes it afterwards.
    with open(file_name) as f:
        data = f.readlines()
    lines, words, chars = get_count(data)
    print_count(lines, words, chars, file_name)
    total_lines += lines
    total_words += words
    total_chars += chars
    file_count += 1
Processing standard input works the same as processing a file, except that there are no totals to update. As file name for output we give the empty string.
<<process stdin>>=
data = sys.stdin.readlines()
lines, words, chars = get_count(data)
print_count(lines, words, chars, "")
Support functions
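We need two support functions: get_count computes the line, word and character counts for a list of lines, and print_count prints the counts selected by the options, followed by the file name.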
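<<support functions>>=
def get_count(data):
    # data is a list of lines, as returned by readlines().
    lines = len(data)
    words = sum(len(x.split()) for x in data)
    chars = sum(len(x) for x in data)
    return lines, words, chars

def print_count(lines, words, chars, filename):
    # Collect only the requested counts, then the file name.
    fields = []
    if options.lines:
        fields.append(str(lines))
    if options.words:
        fields.append(str(words))
    if options.characters:
        fields.append(str(chars))
    fields.append(filename)
    # sys.stdout.write behaves the same under Python 2 and Python 3.
    sys.stdout.write("\t" + "\t".join(fields) + "\n")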
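As a worked example, the sketch below repeats get_count and applies it to a small hand-written list of lines; the sample data is made up for illustration.

```python
def get_count(data):
    # Same logic as the support function above.
    lines = len(data)
    words = sum(len(x.split()) for x in data)
    chars = sum(len(x) for x in data)
    return lines, words, chars

sample = ["the quick brown fox\n", "\n", "jumps over\n"]
print(get_count(sample))        # (3, 6, 32)

# A final line without a trailing newline still counts as a line here.
print(get_count(["a\n", "b"]))  # (2, 2, 3)
```

The empty line contributes one character (its newline) and no words. The second call shows a small difference from the UNIX tool: this implementation counts list elements, whereas wc -l counts newline characters and would report 1 for the same input.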