A rudimentary wiki, written as a Bourne shell CGI program.



A wiki is a server: it accepts requests and generates replies. Some servers must keep session state with their clients; in this case the program runs to completion, providing a single reply for each request, and the filesystem holds the only durable state.

Question: What is the difference between a single-reply server and a procedure call?


We map WikiPages directly onto files in the filesystem, so this wiki is simple enough to be straight-line code.

<<serve WikiPage>>=
<<parse request>>
<<synchronize with filesystem>>
<<reply to user>>

During a request, there are only two minor complications:

  • distinguishing WikiPages (which have content in the filesystem) from WikiWords (which have yet to be created)
  • tracking the listing state (to turn the linear lists of wikitext into the hierarchical lists of HTML)

(In fact, of the handful of program variables, namely request, wikipage, allpages, and listing, only the last actually varies after it has been initialized.)

Question: What about the filesystem? How often does it vary during a program run?

parse request

We expect the request to be packaged in the QUERY_STRING as request+wikipage, allowing us to set the positional arguments directly after url-decoding, which we then assign to the more mnemonic request (the verb) and wikipage (the direct object).

<<parse request>>=
set -- $(echo "$QUERY_STRING" | urldecode)
request=$1 wikipage=$2
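
As a rough illustration of this parsing step, the sketch below splits a sample query string into the verb and its object. The helper name parse_query is hypothetical, and it only translates '+' into a space instead of calling urldecode, so it is a simplification rather than the wiki's own code.

```sh
# parse_query is a hypothetical helper, not part of the wiki itself.
# It splits a query string of the form verb+page into two variables.
parse_query() {
    # Simplification: only '+' is decoded here; the wiki pipes through urldecode.
    # The unquoted $(...) is deliberate: word splitting does the parsing.
    set -- $(printf '%s\n' "$1" | tr '+' ' ')
    request=$1
    wikipage=$2
}

parse_query 'read+WelcomePage'
echo "request=$request wikipage=$wikipage"   # request=read wikipage=WelcomePage
```

Because the command substitution is unquoted, the shell would also expand glob characters in the decoded text, which is worth keeping in mind when considering the security question further on.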

If there is no request, we redirect the user to the WelcomePage, and need do nothing more.

<<parse request>>=
[ "$request" ] || { reply "Location: $me?read+$homepage"; exit; }

synchronize with filesystem

If we have been asked to perform a write, we update the filesystem. The data to be written arrives url-encoded on standard input.

Because there will be no further mutations, we generate the list of currently valid WikiPages now.

<<synchronize with filesystem>>=
[ "$request" = write ] && sed 's/wikitext=//' | urldecode > "$wikipage"
allpages=$(ls [A-Z][a-z]*[A-Z][a-z]*)
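
To see which file names the pattern handed to ls would select, the following sketch applies the same glob to individual names with a case statement. The helper is_wikipage and the sample names are hypothetical. Note that in a glob the * matches any run of characters, so this pattern is looser than the regular expression used later by the markup.

```sh
# is_wikipage is a hypothetical helper: it applies the same glob that the
# wiki hands to ls, but to a single name, via case pattern matching.
# The body runs in a subshell so that the locale change stays local.
is_wikipage() (
    LC_ALL=C    # keep [A-Z] and [a-z] restricted to ASCII ranges
    case $1 in
        [A-Z][a-z]*[A-Z][a-z]*) exit 0 ;;
        *)                      exit 1 ;;
    esac
)

for name in WelcomePage Welcome README wiki.cgi; do
    if is_wikipage "$name"; then echo "$name: page"; else echo "$name: ignored"; fi
done
```

Of the four sample names, only WelcomePage is reported as a page.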

reply to user

Depending upon the particular request, we respond differently.

  • for read or write we mark up the source to display the wiki page
  • for edit or create we provide an edit field with the source wikitext of the page
  • for link we generate a list of pages containing references to the current one

<<reply to user>>=
reply 'Content-Type: text/html'
case $request in
    read|write) cat <<-EOF
	(<a href=$me?edit+$wikipage>edit</a>)
	(<a href=$me?link+$wikipage>links</a>)
	$(markup < "$wikipage")
	EOF
	;;
    edit|create) cat <<-EOF
	<html><body><h1>editing: $wikipage</h1>
	(<a href=$me?read+$wikipage>page</a>)<hr>
	<form action=$me?write+$wikipage method=POST>
	<textarea name=wikitext rows=20 cols=60>
	$(escapehtml < "$wikipage")
	</textarea><br><input type=submit>
	EOF
	;;
    link) cat <<-EOF
	<html><body><h1>pages linking to: $wikipage</h1>
	(<a href=$me?read+$wikipage>page</a>)<hr>
	$(grep -l "$wikipage" $allpages | sed -e 's/^/* /' | markup)
	EOF
	;;
    *) cat <<-EOF
	<html><body><h1>unknown request</h1>
	(<a href=$me?read+$homepage>home</a>)<hr>
	didn't grok <code>$QUERY_STRING</code>
	EOF
	;;
esac

Question: This server, being a toy demonstration, is not intended to be secure enough for production environments. What changes are necessary to ensure that a malicious request can't be used to execute shell commands?

marking up

Most of the work of markup is handled by an AWK program, which has the list of current WikiPages passed to it in an environment variable, PAGE.

<<define markup functions>>=
wikify() {
    PAGE="$allpages" awk '
    <<awk patterns>>'
}
markup() { escapehtml | tr -d "\r" | wikify; }

Unfortunately, the split function generates the reverse of the relation we want, so we invert it into the page dictionary.

<<awk patterns>>=
    BEGIN { split(ENVIRON["PAGE"],rev)
            for(r in rev) { page[rev[r]] = r } }
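
The inversion can be tried in isolation. In this sketch, the helper has_page and the environment variable WANTED are hypothetical names introduced only for the demonstration; the split and the inverting loop are the same as in the chunk above.

```sh
# has_page is a hypothetical helper; the wiki keeps this logic inside wikify.
# Usage: has_page NAME "space separated list of pages"
has_page() {
    PAGE="$2" WANTED="$1" awk 'BEGIN {
        split(ENVIRON["PAGE"], rev)            # rev[1] = first name, rev[2] = second, ...
        for (r in rev) { page[rev[r]] = r }    # page[name] = its (nonzero) position
        if (page[ENVIRON["WANTED"]]) print "yes"; else print "no"
    }'
}

has_page WelcomePage 'WelcomePage SandBox'   # prints yes
has_page NoSuchPage  'WelcomePage SandBox'   # prints no
```

Since split numbers its elements from 1, every stored position is nonzero, which is what makes the plain truth test on page[...] work.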

By keeping a listing variable (holding the proper termination tag during open lists) we can handle at most one level of list elements.

<<awk patterns>>=
    listing && $0 !~ /^[#\*]/  { print listing; listing = "" }
    $0 ~ /^# /   { if(!listing) { print "<ol>"; listing = "</ol>" } }
    $0 ~ /^\* /  { if(!listing) { print "<ul>"; listing = "</ul>" } }
    listing      { sub(/./, "<li>") }
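
To watch the listing variable at work, the four patterns can be run on a few lines of sample wikitext. The wrapper list_demo is hypothetical; it adds a bare print so the result is visible, and writes the bracket expression without the backslash.

```sh
# list_demo is a hypothetical wrapper around the four list patterns,
# followed by a bare print so that each (possibly rewritten) line is shown.
list_demo() {
    awk '
    listing && $0 !~ /^[#*]/ { print listing; listing = "" }
    $0 ~ /^# /   { if (!listing) { print "<ol>"; listing = "</ol>" } }
    $0 ~ /^\* /  { if (!listing) { print "<ul>"; listing = "</ul>" } }
    listing      { sub(/./, "<li>") }
                 { print }'
}

printf '%s\n' 'intro' '* one' '* two' 'after' | list_demo
```

For this input the sketch prints intro, then <ul>, then <li> one and <li> two, then </ul>, and finally after, each on its own line. The closing tag is only emitted when a non-list line follows the list.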

Some wikitext markup is straightforward,

<<awk patterns>>=
    $0 == ""     { $0 = "<p>" }
    $0 == "----" { $0 = "<hr>" }

but for the WikiWords themselves, we check each field of each line, and when we find a WikiWord, we make the proper substitution depending upon whether or not it exists in the filesystem. External URLs are similar, but not handled so carefully.

<<awk patterns>>=
    { for(i=1; i<=NF; ++i) {
        if(match($i, /^[A-Z][a-z]+[A-Z][a-z]+/)) {
          wikiname = substr($i, RSTART, RLENGTH)
          if(page[wikiname]) l = "<a href='$me'?read+"wikiname">"wikiname"</a>"
          else               l = wikiname"<a href='$me'?create+"wikiname">?</a>"
          $i = l substr($i, RSTART+RLENGTH)
        } else if(match($i, /http:\/\//)) {
          $i = "<a href=\""$i"\">"$i"</a>"
        }
      }
      print }
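
The way match, RSTART, and RLENGTH separate a WikiWord from trailing punctuation can be seen in a small sketch. The helper split_wikiword is hypothetical; it prints the recognized WikiWord and the remainder of the field, separated by a vertical bar.

```sh
# split_wikiword is a hypothetical helper showing how a field is divided
# into a leading WikiWord and whatever follows it (such as punctuation).
split_wikiword() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        if (match($1, /^[A-Z][a-z]+[A-Z][a-z]+/))
            print substr($1, RSTART, RLENGTH) "|" substr($1, RSTART + RLENGTH)
        else
            print "|" $1
    }'
}

split_wikiword 'WelcomePage,'   # prints WelcomePage|,
split_wikiword 'plain'          # prints |plain
```

In the wiki itself the first part becomes the link text and the second part is appended after the closing anchor tag, so a comma or full stop directly after a WikiWord stays outside the link.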

Question: How could the WikiWord/URL substitution be made in a less procedural, more declarative manner? Would a subprocess help?

Exercise: Extend the implementation to multiple listing levels (and fix the bug relating to immediately consecutive lists).


We must also provide a few definitions that, in more mainstream programming languages, would be library functions.

  • reply generates a header for the response to the browser
  • urldecode decodes text received from the browser
  • escapehtml encodes text to be sent to the browser

<<define CGI functions>>=
reply()      { printf '%s\r\n\r\n' "$1"; }
urldecode()  { echo "16i[$(sed -e 's/+/ /g;s/\%\(..\)/]P\1P[/g')]P" | dc; }
escapehtml() { sed -e 's/&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g'; }
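
Two of these helpers are easier to follow with sample data. The sketch below runs an entity-escaping filter on a short string, and prints the dc program that a urldecode of this shape would construct, without actually invoking dc. The helper dc_program is hypothetical and matches the percent sign without a backslash.

```sh
# Entity escaping: the ampersand must be handled first, otherwise the
# ampersands introduced by &lt; and &gt; would be escaped a second time.
escapehtml() { sed -e 's/&/\&amp;/g;s/</\&lt;/g;s/>/\&gt;/g'; }

# dc_program is a hypothetical helper that stops one step short of
# urldecode: it prints the dc program instead of piping it into dc.
dc_program() { echo "16i[$(sed -e 's/+/ /g;s/%\(..\)/]P\1P[/g')]P"; }

printf '%s\n' '<b>AT&T</b>' | escapehtml   # &lt;b&gt;AT&amp;T&lt;/b&gt;
printf '%s\n' 'a%20b+c' | dc_program       # 16i[a]P20P[b c]P
```

Reading the generated program from left to right: 16i switches dc to hexadecimal input, each bracketed string is pushed and printed with P, and each two-digit hex number is pushed and printed by P as the corresponding character. Because dc treats lowercase letters as commands, this approach assumes the hex digits arrive in uppercase, and literal square brackets in the input would unbalance the strings.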

wrapping up

Finally, we provide a configuration variable me which should be set to reflect the URL under which the webserver runs this program,

<<define CGI functions>>
<<define markup functions>>
<<serve WikiPage>>

and provide the initial Wiki content:

Welcome to this bare-bones Wiki.
* click on WikiWords to enter new definitions,
* or to navigate to existing pages, like this WelcomePage.
a (very) few other features:
# numbered lists
# URLs such as are automatically linked.
# horizontal rules
