Literate Programming (Python)

From LiteratePrograms

Jump to: navigation, search

On LiteratePrograms we use noweb to untangle literate programs. This page presents a much simpler implementation, with the aim of explaining the essence of the process.


A literate program is presented in such a manner that the structure of the explanatory text takes precedence over the structure of the program itself. To produce input useful for a compiler or interpreter, one must then do some file processing to transform this original text. This transformation takes the form of a metamorphism, a fold followed by an unfold.


The literate program alternates between explanatory material and program fragments. Each line is either explanatory material, a fragment name, or program text. We scan the program line by line, ignoring commentary and collecting program lines into a dictionary keyed by the given heads.

<<scan for code blocks>>=
def findcode(file):
    cs, cur = {}, None
    for line in file:
        if line[0] == '@':    cur = line[1:-1]
        if line[0] == '>':    cs[cur] = cs.get(cur,"") + line[1:]
    return cs

The program fragments alternate between literal text and references to other fragments (forming, one hopes, a DAG of fragments). To reweave a fragment, we copy out literal text and recursively reweave any other fragments mentioned.

<<reweave program>>=
def expand(node, cs):
    phase = True
    for x in cs[node][:-1].split('@'):
        if phase:    sys.stdout.write(x)
        else:        expand(x, cs)
        phase = not phase

wrapping up

Finally, we add a small application wrapper that, given a top-level fragment to expand, takes a "barely literate" program on standard input and produces the resulting source file on standard output.

import sys
if len(sys.argv) < 2:
    print "Usage: %s node" % sys.argv[0]
scan for code blocks
reweave program
expand(sys.argv[1], findcode(sys.stdin))

expand(..., findcode(...)) is our metamorphism, where expand is the unfold and findcode the fold.

One advantage of explicitly naming the top-level fragment is that one may then encode multiple versions of a program in the same literate file.

this is a test of barely-literate programming.
I'm afraid I can't find the developer tools CD
for my PowerBook, so none of the code here has
been tested.
:: :: ::
First, we try an old-timer program:
@k & r
>main(@formal names@)
>@formal declarations@
>	printf(@greetings@);
Not that we use them in this program (it is,
after all, a constant function, cf. the 'K'
combinator) but for nostalgia value, we have
@formal names
>argc, argv
@formal declarations
>int argc;
>char *argv[];
and, of course, we should specify the value
which the function takes at all arguments:
>"Hello, world\n"
:: :: ::
Now, let's rewrite this in a more modern style:
>@external declarations@
>int main(@inline formal declarations@)
>	printf(@greetings@);
>	@more boilerplate...@
You'll notice that we have an 'int' in there.
That's because once one has more than a few
K of memory to play with during a compile, it
turns out that silently assuming all params
are machine words isn't the best tradeoff.
Of course, now that we've declared that we're
returning an int, we'd probably better do so:
@more boilerplate...
>return 0;
We must declare the argument types as well
as the return types:
@inline formal declarations
>int argc, char **argv, char **envp
As in Pascal, the types are now specified in
with the argument names.  One might expect
that we'd need some additional bookkeeping
for printf() as well as main(), but luckily
(if one is not the preprocessor) this is a
matter of including the proper header file.
@external declarations
>#include <stdio.h>
:: :: ::
To extract either of these two versions, use
the following command lines:
% python "k & r" <
% python "ansi" <
(to get fancier, one might try piping output
through "tcc -run", but it would probably be
better to fill in the code sketched here and
improve its robustness and input format)
Download code