Word count (J)

From LiteratePrograms

Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx

This is not an implementation of the UNIX wc tool. Instead, it is a response to a paper by Gibbons, as discussed on Lambda the Ultimate, which claims that three different word count programs might all have arisen from the same high-level design, namely a single composition of functions. This being a "Word count" program, we favor naming that composition count ∘ words, and implement it in the terse but powerful J array programming language.

theory

By placing an ordering on character classes (nonblank < blank), we avoid Algol-style folds and use whole-array operations.

Some things to be aware of when reading J code:

  • the . and : do not occur alone, but are diacritic marks that modify the interpretation of the character which they follow.
  • { and } (as well as the brackets and a few others) have their own, independent meanings and usually occur unpaired.
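
As a small illustration of both points, the following jconsole sketch shows how a trailing . or : turns < into a different primitive, and how { is used on its own. The sample numbers are made up for the example.

```j
echo 3 < 5          NB. less than: 1
echo 3 <. 5         NB. lesser of (minimum): 3
echo 3 <: 5         NB. less than or equal: 1
echo <. 2.7         NB. floor: 2
echo 1 { 10 20 30   NB. from, with no closing brace: 20
```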

implementation

locating drops

Across a word and the blanks that follow it, the classification never decreases, so the crux of the program is to flag all the spots where the classification does decrease, that is, where a nonblank character follows a blank. We avoid iteration by comparing the entire boolean array with a shifted version of itself.

In J, we need not even make up a temporary name, such as bs, but can instead leave the array argument implicit.

  • }., or behead, produces the array without its first element
  • }:, or curtail, produces the array without its last element
  • < performs the obvious comparison
<<drops>>=
(}.<}:)
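
To see the comparison at work, here is a short sketch on a hand-made boolean array. The name bs and its contents are assumptions chosen for illustration (1 marks a blank, 0 a nonblank); the program itself never names this array.

```j
bs =: 1 0 0 1 0 1 1 0   NB. hypothetical sample classification
echo }. bs              NB. 0 0 1 0 1 1 0   each item except the first
echo }: bs              NB. 1 0 0 1 0 1 1   the predecessor of each of those items
echo (}.<}:) bs         NB. 1 0 0 1 0 0 1   1 where a nonblank follows a blank
```

The result is one item shorter than the input, since the first item has no predecessor to compare against.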

flagging blanks

Blanks are easily found: the test turns into a membership check.

  • {, or from, selects items from an array
  • a., or alphabet, contains the system character set (so we will select ASCII space, tab, and linefeed)
  • e., or member (in), checks whether each element of its left argument occurs somewhere in its right argument
  • ~, or passive, swaps the arguments, so here we check which characters of the right argument occur in the whitespace array given on the left.
<<blank>>=
(32 9 10{a.)e.~
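
The membership check can be tried on sample strings as follows. The name ws and the sample strings are assumptions made for this sketch only.

```j
ws =: 32 9 10 { a.                      NB. hypothetical name: space, tab, linefeed
echo # ws                               NB. 3
echo 'hi there' e. ws                   NB. 0 0 1 0 0 0 0 0
echo ws e.~ 'hi there'                  NB. 0 0 1 0 0 0 0 0, arguments swapped by ~
echo (32 9 10{a.)e.~ 'a',(9{a.),'b'     NB. 0 1 0, the tab counts as a blank
```

Writing the whitespace array on the left lets the text arrive as the right argument, which is what makes the chunk usable as the tail of a longer expression.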

indicating words

Now we have a straight-line definition for words, namely drops applied to the result of blank:

<<words>>=
<<drops>><<blank>>
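
A minimal sketch of the combined expression, wrapped in an explicit verb so that it can be applied by name. The name words and the sample text are assumptions for illustration; the script itself uses the expression anonymously.

```j
NB. hypothetical wrapper: y is the text to classify
words =: 3 : '(}.<}:) (32 9 10{a.) e.~ y'
echo words ' to be  or'    NB. 1 0 0 1 0 0 0 1 0
```

Each 1 in the result marks the first character of a word, so the number of ones is the number of words.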

wrapping up

We still need a definition for count, but this is a traditional idiom in both APL and J.

  • +/, or insert plus, sums up its argument
<<count>>=
+/
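
For readers new to the idiom, a brief sketch with made-up arrays:

```j
echo +/ 1 0 0 1 0 0 0 1 0   NB. 3, the number of flagged positions
echo +/ 1 2 3 4             NB. 10, the same as 1 + 2 + 3 + 4
echo +/ i. 0                NB. 0, the sum of an empty array
```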

Finally, we include the boilerplate for a jconsole script:

<<wc.ijs>>=
echo <<count>><<words>>' ',stdin '' NB. word count script (use jconsole)

which will result in a single-line script:

echo +/(}.<}:)(32 9 10{a.)e.~' ',stdin '' NB. word count script (use jconsole)

that can be run as follows:

$ jconsole wc.ijs < wc.ijs
12
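
The same expression can also be tried on strings inside a jconsole session rather than on standard input. The sketch below assumes a hypothetical verb name, wc, and made-up sample strings; its last line shows why the script prepends a blank to its input.

```j
NB. hypothetical wrapper around the script's expression; y is the text
wc =: 3 : '+/ (}.<}:) (32 9 10{a.) e.~ '' '' , y'
echo wc 'to be  or'                            NB. 3
echo wc 'a',(10{a.),'b'                        NB. 2, linefeed separates words
echo wc ''                                     NB. 0
echo +/ (}.<}:) (32 9 10{a.) e.~ 'to be  or'   NB. 2, first word missed without the blank
```

Because a word is detected only where a nonblank follows a blank, a word at the very start of the text has no preceding blank to trigger the flag; the prepended ' ' supplies one.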