Word count (Forth)

From LiteratePrograms

Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx

This program tallies the number of characters, words, and newlines in the file supplied on standard input.

Note: In Forth terminology, a word is a named piece of code that other code can invoke, what most other languages call a routine. Words generally expect their parameters to be placed on the stack in a specific order before execution, and they leave their results on the stack. As this article is about counting words in text, we avoid the confusion by calling Forth words "routines".

Word counting routine

This routine counts the characters, words, and lines in a buffer.

nc, nw, and nl are variables containing the current number of characters, words, and lines, respectively.

<<declare variables>>=
variable nc
variable nw
variable nl

This code adds the number of characters on top of the stack to our running count in the variable nc. The dup makes a copy first, so the original value is left on the stack for use by later code.

<<count the characters>>=
dup nc +!
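
The effect of this phrase can be checked on its own. The following is a minimal sketch for gforth. Here add-chars is a hypothetical name used only for this illustration, and nc is zeroed explicitly rather than relying on the system to initialize it:

```forth
variable nc
0 nc !                       \ start the running total at zero

\ Add n to the running total in nc and leave n on the stack,
\ exactly as the phrase dup nc +! does inside wc-pad.
: add-chars ( n -- n )  dup nc +! ;
```

After 10 add-chars 5 add-chars, the stack holds 10 5 and nc holds 15.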

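Given the byte count on the stack, pad + pad do ... loop iterates over all the addresses in the buffer named pad. The phrase pad + computes the address just past the last byte, which is the loop limit, and the second pad is the starting address, so the loop index takes the address of each byte in turn.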

<<iterate over buffer>>=
pad + pad
do   <<process one character>>
loop
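
The loop bounds can be checked by counting iterations. This sketch for gforth copies a string into pad with a hypothetical helper >pad and then runs the same pad + pad do ... loop as wc-pad. The name visits is likewise hypothetical:

```forth
\ Copy the string c-addr u into pad and leave its length u.
: >pad ( c-addr u -- u )  dup >r  pad swap move  r> ;

\ Run the wc-pad loop over the first n bytes of pad and count how many
\ times the body executes.  n must be nonzero: do ... loop with equal
\ limit and start would run through the whole address range.
: visits ( n -- count )
  0 swap                 \ ( count n )
  pad + pad do  1+  loop ;
```

For example, s" hello" >pad visits leaves 5, one iteration per byte of the string. Like the original, this relies on never being called with a zero count; wc guarantees that by leaving its loop as soon as a read returns zero bytes.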


Inside the loop, the index i is the address of the current byte, so i c@ fetches the current character in the buffer.

<<get current character>>=
i c@
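
Because i is an address, i c@ can drive any per-byte test. As an illustration, this sketch for gforth defines a hypothetical count-char routine that counts how often one character occurs in the first n bytes of pad. It also copies its input into pad with a hypothetical helper >pad:

```forth
\ Copy the string c-addr u into pad and leave its length u.
: >pad ( c-addr u -- u )  dup >r  pad swap move  r> ;

\ Count occurrences of character ch in the first n bytes of pad (n > 0).
: count-char ( ch n -- count )
  0 rot rot                      \ ( count ch n )
  pad + pad do                   \ ( count ch )
    i c@ over = if  swap 1+ swap  then
  loop
  drop ;                         \ discard ch, keep count
```

For example, s" banana" >pad char a swap count-char leaves 3.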

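bl > is a simple-minded test for a non-blank character. bl is the standard Forth name for the space character (ASCII 32). In ASCII, the space separates the control characters, including tab and line feed, from the printable characters. Thus any character greater than bl is treated as part of a word, and anything else is a word separator. Because the test is this simple, DEL (127) and every byte above 127, such as the bytes of UTF-8 sequences, also count as word characters.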

<<detect end-of-word>>=
<<get current character>> bl >
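
The classification can be tried directly. In this sketch for gforth, word-char? is a hypothetical name for the same bl > test. The result is a Forth flag: -1 for true and 0 for false.

```forth
\ True (-1) for a byte that wc treats as part of a word: anything above bl.
: word-char? ( c -- flag )  bl > ;
```

char a word-char? leaves -1, while bl, 9 (tab) and 10 (line feed) all leave 0. A byte such as 200 also leaves -1, so every byte above 127 counts as a word character.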

Given the flag bl? on the stack, nw +! counts the transition from blank to non-blank characters (i.e. the start of a word). Within a word the flag is 0 and the phrase has no effect, but at the start of a word the flag is 1 and the count is incremented.

<<count a word>>=
nw +!
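
The flag added to nw must be exactly 1 or 0. Standard Forth comparisons such as bl > return -1 (all bits set) for true. Adding a comparison result directly would therefore decrement the count, which is why the routine keeps its own 0/1 flag. The following small sketch for gforth shows both cases, reusing the article's variable name with hypothetical test values:

```forth
variable nw
0 nw !

\ Adding the 0/1 state flag counts word starts correctly.
1 nw +!   0 nw +!   1 nw +!     \ two word starts, one mid-word character
nw @ .                          \ prints 2

\ Adding a raw comparison flag would subtract one instead.
char a bl > nw +!
nw @ .                          \ prints 1
```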

#lf is a gforth-defined constant for the line-feed character (ASCII 10).

<<detect end-of-line>>=
<<get current character>> #lf =

The line count is incremented on each line feed. 1 nl +! is the Forth idiom for adding one to a variable in place (similar to C's ++nl or nl += 1):

<<count a line>>=
<<detect end-of-line>> if 1 nl +! then
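
The line test can be run over a small buffer. This sketch for gforth builds a four-byte buffer by hand, holding a, line feed, b, line feed, and applies the same #lf = if 1 nl +! then phrase to each byte. The names text and count-lines are hypothetical:

```forth
variable nl
0 nl !

\ The bytes  a LF b LF  (10 is the line-feed code, gforth's #lf).
create text  char a c,  10 c,  char b c,  10 c,

\ Count the line feeds in the n bytes at c-addr.
: count-lines ( c-addr n -- )
  over + swap ?do
    i c@ #lf = if 1 nl +! then
  loop ;

text 4 count-lines
nl @ .                    \ prints 2
```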

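Thus one pass of the character loop looks like this. It consumes the flag left by the previous pass and leaves an updated flag on the stack. The flag is 1 if the current character is a separator, so that the next non-blank character starts a new word, or 0 if the current character is inside a word: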

<<process one character>>=
<<detect end-of-word>>
if   <<count a word>> 0
else drop 1
     <<count a line>>
then
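
The same state machine can be written as a routine that takes the character explicitly instead of fetching it with i c@, which makes it easy to trace by hand. This is a sketch for gforth. The name step is hypothetical, and the article itself inlines this code in the loop:

```forth
variable nw   0 nw !
variable nl   0 nl !

\ One pass of the character loop.  bl? is 1 between words, 0 inside one.
: step ( bl? c -- bl? )
  dup bl > if
    drop  nw +!  0          \ word character: count a word start, now inside
  else
    #lf = if 1 nl +! then   \ separator: count it if it is a line feed
    drop  1                 \ now between words
  then ;

\ Feed "hi x" followed by a line feed, starting between words.
1  char h step  char i step  bl step  char x step  #lf step
.  nw ?  nl ?               \ prints 1 2 1
```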

To use this buffer-counting routine, we define it to the Forth interpreter under the name wc-pad. It expects two items on the stack: the flag (1 while between words, 0 inside a word) and, on top of it, the number of bytes to process. When it finishes, it leaves the updated flag on the stack, so a word split across two reads is still counted once.

<<wc-pad>>=
: wc-pad ( bl? n -- bl? )
  <<count the characters>>
  <<iterate over buffer>> ;
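
Because the flag is passed in and handed back, a word that straddles two reads is still counted once. The sketch below, for gforth, generalizes wc-pad to take any buffer address. The name wc-buf is hypothetical, and the routine uses ?do so that an empty buffer is harmless. It then feeds "hello world" in two pieces:

```forth
variable nc   0 nc !
variable nw   0 nw !
variable nl   0 nl !

\ Like wc-pad, but counts the n bytes at c-addr instead of at pad.
: wc-buf ( bl? c-addr n -- bl? )
  dup nc +!                     \ count the characters
  over + swap ?do               \ iterate over the buffer
    i c@ bl > if
      nw +! 0                   \ inside a word
    else
      drop 1                    \ between words
      i c@ #lf = if 1 nl +! then
    then
  loop ;

\ "hello world" arriving in two chunks, split inside "hello".
1  s" hel" wc-buf  s" lo world" wc-buf  drop
nl ? nw ? nc ?                  \ prints 0 2 11
```

If each chunk were instead started with a fresh flag of 1, the second chunk would count "lo" as a new word and report three words.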

Main routine

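Forth's file words do not promise any buffering, so for efficiency large blocks of the input are read at a time and the counting is done on memory buffers. Forth has a standard scratchpad buffer called pad, which we use for the data buffer. Standard Forth only guarantees that pad holds 84 characters. Reading 4096 bytes into it relies on gforth, which places pad in free dictionary space with far more room than that.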

read-file takes three items on the stack: a buffer address, the maximum number of bytes to read, and a file identifier. It leaves the number of bytes actually read and an I/O result code. Its standard stack effect is ( c-addr u1 fileid -- u2 ior ). Here the file is stdin, gforth's name for the file identifier of standard input.

<<read a chunk of data>>=
pad 4096 stdin read-file
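
The stack effect of read-file, and the way it signals end of file, can be observed with a small file. This sketch for gforth uses a hypothetical scratch file /tmp/wc-demo.txt and a deliberately tiny 4-byte buffer, buf (also hypothetical), instead of stdin and pad. It writes the file and then reads it back. Each read-file throw leaves the number of bytes actually read:

```forth
\ Hypothetical scratch file; any writable path would do.
: demo-name ( -- c-addr u )  s" /tmp/wc-demo.txt" ;

create buf 4 allot            \ 4-byte buffer, so reads come back in pieces

\ Write the 10 bytes "hello, wc!" to the scratch file.
: make-demo ( -- )
  demo-name w/o create-file throw >r
  s" hello, wc!" r@ write-file throw
  r> close-file throw ;

\ Read the file four times, 4 bytes at a time.
: read-demo ( -- u1 u2 u3 u4 )
  demo-name r/o open-file throw >r
  buf 4 r@ read-file throw
  buf 4 r@ read-file throw
  buf 4 r@ read-file throw
  buf 4 r@ read-file throw
  r> close-file throw ;

make-demo  read-demo  . . . .    \ prints 0 2 4 4
demo-name delete-file throw
```

The last two reads show the end-of-file protocol: a short read of 2 bytes, then a count of 0 with a zero result code.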

throw consumes the result code. It aborts the program if there was an I/O error (a nonzero code) and does nothing otherwise, leaving just the count of the number of characters read.

<<abort on error>>=
throw
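
What throw does with the result code can be seen by wrapping it in catch. In this sketch for gforth, checked is a hypothetical routine with the same shape as the phrase read-file throw: it takes a count and a result code and leaves only the count. The value -37 is the standard throw code for a file I/O exception:

```forth
\ Keep the count u and hand the result code ior to throw.
: checked ( u ior -- u )  throw ;

\ A zero result code: throw does nothing and the count survives.
: ok-case ( -- u 0 )  7 0 ['] checked catch ;

\ A nonzero result code: throw unwinds to catch, which returns the code.
\ The stack depth is restored, but the two cells under the code hold
\ unspecified values, so they are dropped.
: error-case ( -- ior )  7 -37 ['] checked catch  nip nip ;

ok-case . .       \ prints 0 7
error-case .      \ prints -37
```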

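The dup while exits the loop when the number of characters read is zero (end of file). The dup is needed because while consumes its test value, and the count must survive for wc-pad. At end of file, 2drop discards the flag and the zero count.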

The initial 1 put on the stack is the flag saying that we are currently scanning blanks. As a result, the first non-blank character of the input counts as the start of a word.

<<wc>>=
: wc
  1
  begin  <<read a chunk of data>> <<abort on error>>
         dup
  while  wc-pad
  repeat 2drop ;
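
The begin ... dup while ... repeat shape can be exercised without any file. In this sketch for gforth, fake-read is a hypothetical stand-in for the read-a-chunk and throw phrases; it hands out the chunk sizes 5, 3 and then 0. The routine total runs the same loop as wc but simply adds up the counts:

```forth
\ Hypothetical chunk sizes a reader might return; 0 means end of input.
create sizes  5 , 3 , 0 ,
variable next-size   0 next-size !

\ Return the next chunk size, standing in for a read-file throw phrase.
: fake-read ( -- u )  sizes next-size @ cells + @  1 next-size +! ;

\ Same loop shape as wc: keep reading until a read returns 0 bytes.
: total ( -- u )
  0
  begin  fake-read
         dup
  while  +                 \ where wc calls wc-pad
  repeat drop ;            \ drop the final 0 (wc uses 2drop for flag and 0)

total .                    \ prints 8
```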

The main routine declares the variables for the counts (initialized to zero by gforth) and defines wc-pad and wc. It then invokes wc, prints the line, word, and character counts (? is shorthand for @ .), and exits with bye.

<<wc.f>>=
<<declare variables>>
<<wc-pad>>
<<wc>>
wc  nl ? nw ? nc ? cr  bye