Word count (J)
From LiteratePrograms
This is not an implementation of the UNIX wc tool. Instead, it is a response to a paper by Gibbons, as discussed on Lambda the Ultimate, which claims that three different wordcount programs might all have arisen from the same high-level design, a single function composition. This being a "Word count" program, we favor naming the two stages of that composition count and words, and implement it in the terse but powerful J array programming language.
theory
By placing an ordering on character classes (nonblank < blank), we avoid Algol-style folds and use whole-array operations.
Some things to be aware of when reading J code:
- the . and : do not occur alone, but are diacritic marks that modify the interpretation of the character which they follow.
- { and } (as well as the brackets and a few others) have their own, independent meanings and usually occur unpaired.
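As a small illustration of both points, the following lines can be typed at a jconsole prompt. The sample string is made up for this sketch, and the expected results are given in the NB. comments.

```j
2 { 'abcde'     NB. { alone is "from": the item at index 2, giving c
{. 'abcde'      NB. {. is "head", a different verb: a
{: 'abcde'      NB. {: is "tail", different again: e
}. 'abcde'      NB. }. is "behead", with no matching { in sight: bcde
```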
implementation
locating drops
Within a word, the classification increases monotonically, so the crux of the program is to flag all the spots where the classification decreases, that is, where a nonblank character follows a blank. We avoid iteration by comparing the entire boolean array with a copy of itself shifted by one position.
In J, we need not even make up a temporary name, such as bs, but can instead leave the array argument implicit.
- }., or behead, produces the array without its first element
- }:, or curtail, produces the array without its last element
- < performs the obvious comparison
<<drops>>=
(}.<}:)
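The comparison can be traced by hand in a jconsole session. This is only a sketch: the name bs and its contents are invented here (1 marks a blank, 0 a nonblank, as for the string ' ab  c'), and the program itself never names the array.

```j
bs =: 1 0 0 1 1 0      NB. hypothetical classification of ' ab  c'
}. bs                  NB. behead:  0 0 1 1 0
}: bs                  NB. curtail: 1 0 0 1 1
(}. < }:) bs           NB. (}. bs) < (}: bs), giving 1 0 0 0 1
```

The parenthesized train of three verbs is a fork: both outer verbs are applied to the argument and their results are combined by the middle one. Each 1 in the result marks a blank immediately followed by a nonblank, so this sample has two drops.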
flagging blanks
Blanks are easily found — the expression turns into a membership check.
- {, or from, selects items from an array
- a., or alphabet, contains the system character set (so we will select ASCII space, tab, and linefeed)
- e., or member (in), checks whether the elements of its left argument occur somewhere in its right
- but ~, or passive, reverses the arguments, so now we check which characters of the right argument occur in the whitespace array given on the left.
<<blank>>=
(32 9 10{a.)e.~
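To see the membership check at work, one might try the following at a jconsole prompt. The name ws and the sample string are assumptions made for this sketch only.

```j
ws =: 32 9 10 { a.        NB. the three characters space, tab, linefeed
' ab  c' e. ws            NB. plain membership: 1 0 0 1 1 0
ws e.~ ' ab  c'           NB. same test with the arguments swapped: 1 0 0 1 1 0
```

Swapping the arguments with ~ is what lets the whitespace list sit on the left, so that the text to classify arrives from the right like every other stage of the program.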
indicating words
Now we have a straight-line definition for words: flag the blanks, then locate the drops.
<<words>>=
<<drops>><<blank>>
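Because J evaluates a sentence from right to left, the two pieces can simply be written side by side. A brief sketch on an invented sample string:

```j
(32 9 10{a.) e.~ ' ab  c'             NB. blanks flagged first: 1 0 0 1 1 0
(}.<}:) (32 9 10{a.) e.~ ' ab  c'     NB. then the drops:       1 0 0 0 1
```

The result holds one 1 per word start, which is all the final counting step needs.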
wrapping up
We still need a definition for count, but this is a traditional idiom in both APL and J.
- +/, or insert plus, sums up its argument
<<count>>=
+/
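A quick check of the idiom, again with made-up sample data:

```j
+/ 1 0 0 0 1                             NB. sum of the flags: 2
+/ (}.<}:) (32 9 10{a.) e.~ ' ab  c'     NB. the whole pipeline: 2
```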
Finally, we include the boilerplate for a jconsole script:
<<wc.ijs>>=
echo <<count>><<words>>' ',stdin '' NB. word count script (use jconsole)
which will result in a single-line script:
echo +/(}.<}:)(32 9 10{a.)e.~' ',stdin '' NB. word count script (use jconsole)
that can be run as follows:
$ jconsole wc.ijs < wc.ijs
12
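For interactive experiments the same sentence can be wrapped in a named verb. This is a sketch rather than part of the original script: the name wc and the sample strings are assumptions, and the input comes from the argument y instead of stdin.

```j
NB. wc is a hypothetical name; the original script defines no verb
wc =: 3 : 0
+/ (}.<}:) (32 9 10{a.) e.~ ' ', y
)

wc 'the quick brown fox'      NB. 4
wc ''                         NB. 0

NB. without the prepended blank, a word at the very start goes uncounted
+/ (}.<}:) (32 9 10{a.) e.~ 'the quick brown fox'     NB. 3
```

The last line shows why the script prepends ' ' to its input: a drop needs a blank in front of it, so the first word is only seen once a blank has been placed before the text.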