Word count (Python, functional)
From LiteratePrograms
- Other implementations: Assembly Intel x86 Linux | C | C++ | Forth | Haskell | J | Lua | OCaml | Perl | Python | Python, functional | Rexx
This is not an implementation of the UNIX wc tool, in Python, for which see Word count (Python).
Instead, it is a response to a paper of Gibbons, as discussed on Lambda the Ultimate, which makes the claim that three different wordcount programs might all have arisen from the same high-level design, namely the composition . This being a "Word count" program, we favor renaming the composition to .
<<wc>>= wc=lambda s: count(words(s))
theory
The solutions given by Gibbons follow the normal Algol style (as represented here by the C, Haskell, Perl, and Lua examples) of iterating over an input stream while incrementing a running count. Even the Python example, although it treats whole lines as units, has a for lurking in the comprehension defining the word count for the entire input.
In this case, we make yet another design decision: by placing an ordering on character classes (nonblank < blank), we can replace Algol-style folds with APL-style whole-array operations.
implementation
Within a word, the classification increases monotonically, so the crux of the program is to flag all the spots where the classification decreases — where a nonblank character follows a blank. We avoid iteration by comparing the entire boolean array with a shifted version of itself:
<<defns>>= # Gibbons' unlengths are nondecreasing runs, eg 00001111 drops=lambda bs: bs[1:] < bs[:-1]
Blanks are easily found — the expression translates straightforwardly:
<<defns>>= blank=lambda cs: (cs==' ')+(cs=='\t')+(cs=='\n')
Now we have a straight-line definition for words: :
(Question: how do the spaces prepended to the argument help avoid conditional branching? at which arguments do they make a difference?)
<<defns>>= chars=lambda cs: array(tuple(cs)) words=lambda cs: drops(blank(chars(' '+cs)))
wrapping up
We still need a definition for count, but numpy.sum provides a suitable implementation:
<<defns>>= count=sum
Finally, we include the boilerplate:
<<wcfa.py>>= from numpy import array, sum defns wc if __name__=="__main__": import sys print wc(sys.stdin.read())
Download code |