Literate Programming Tool (Python)
From LiteratePrograms
Contents |
AsciiLitProg : A Literate Programming Tool
This is a simple literate programming tool that uses AsciiDoc it documentation language. It was developed by Ed Keith and lives at http://developer.berlios.de/projects/asciilitprog/.
- Note: another literate programming tool written in Python and supporting ReST as documentation markup is pyWeb http://sourceforge.net/projects/pywebtool/
Motivation
There are many literate programming tools, why write another? Because IMHO none of the existing tools, except for eWeb (http://eweb.sourceforge.net/), use a mark up language as powerful and flexible and most importantly, readable as AcsiiDoc (http://www.methods.co.nz/asciidoc/). I had done quite a bit of work on AsciiLitProg before I heard of eWeb. After looking at eWeb I determined that it did not have the power and flexibility I wanted, so I continued working on AsciiLitProg.
Design Requirements
AsciiLitProg is designed to be extremely flexible and powerful.
The document is written in pure AsciiDoc, so there is no need for any additional program to pre-process the documentation. It might be desirable at some point to configure the Source Code Highlight Filter to handle chunks.
AsciiLitProg is designed to work with any programming language. The chunk tags are delimited with a set of four angle braces. Four angle brackets were chosen because it was thought to be unlikely that this would ever appear in source code.
<<define tag marker patterns>>= tag_start_pattern = '<{4}' tag_end_pattern = '>{4}'
Chunks can only be defined inside code blocks or special filter blocks. But multiple chunks can be defined in a single block. At this time the definition of a chunk must appear at the beginning of a line. The closing angle bracket can have only one optional plus sign between it and a equal sign. At this time if two chunks with the same name are defined the second will overwrite the first unless a plus sign preceding the equals sign then the new definition is appended to the end of the existing definition.
A chunk with a name that starts and ends with an asterisk represents a source code file, the chunk will be expanded into a file whose name appears between the asterisks. If the name is blank it will be expanded to the standard output.
The definition of a chunk ends at the end of the containing block or when a new chunk is defined. Any blank lines at the end of a chunk definition are ignored.
Chunks can be referenced anywhere inside the definition of a chunk. When a chunk is expanded the first non blank line of the chunk is inserted at starting at the start of the referencing tag. Additional lines are prepended with a new line and the same amount of white space as precedes the tag referencing the chunk. Nothing is appended after the last line. All tabs are expanded to spaces.
Implementation
AsciiLitProg is implemented in Python, just like AsciiDoc. It is structured
like most python programs.
<<asciilitprog.py>>= #!/usr/bin/env python """ Literate programming tool for use with AsciiDoc. Copyright (C) 2009 Ed Keith Free use of this software is granted under the terms of the GNU General Public License (GPL). """ __version__ = "1.0.0a1" __author__ = "Ed Keith" __copyright__ = "Copyright 2009, Ed Keith" __license__ = "GPL" __email__ = "edk@users.berlios.de" __status__ = "Alpha" __date__ = "16 December 2009" imports support functions define tag marker patterns do real work main function if __name__ == "__main__": main()
The Real Work
The real work is done in two phases
* Collecting the chunks * Assembling the files.
<<do real work>>= collect all the chunks from the document assemble the output files
Collect The Chunks
First all the chunks are collected from the document.
The first thing that most be done is to find the start of a block that is allowed to contain chunk definitions. By default a line consisting only of four or more dashes delimits a block. The number necessary is configurable in AsciiDoc, I may make it configurable here at some time in the future, but it is not at this time. The problem is that a line of dashes can also indicate a first level section header. To distinguish this a line of dashes will not be considered to start a block if it is within plus or minus three characters as long as the line that precedes it.
<<collect all the chunks from the document>>= support for collecting the chunks def extract_chunks(ins, outs): """ The input parameter is an open stream from which the document is read. Returns a dictionary of chunks each made up of a list of the lines of text in the chunk. """ define regex's needed for extraction main loop return chunks
A block of text is delimited by a line of at least four dashes starting in the first column, followed by nothing but white space.
<<define regex's needed for extraction>>= block_delimiter = re.compile('^(-{4,})\s*$')
The start of a Chunk definition must start at the first column.
<<define regex's needed for extraction>>= start_chunk_definition = re.compile('^' + tag_start_pattern + '(?P<name>[^<>]+)' + tag_end_pattern + '(?P<mod>\+?)=(?P<value>.*)')
The main reading loop
<<main loop>>= preceding_line_length = 0 cur_chunk_name = "" cur_chunk_value = [] chunks = {} mod = '' in_code_block = False while True: line = ins.readline() logger.debug("processing line %s", line) if not line: break m = block_delimiter.match(line) if in_code_block: if m: processing and of code block else: # in a code block c = start_chunk_definition.match(line) if c: # starting a new chunk processing start of new chunk else: cur_chunk_value.append(line.rstrip()) elif m and ((preceding_line_length == 0) or (abs(len(m.group(1)) - preceding_line_length) > 4)): logger.debug("entering code block") in_code_block = True cur_chunk_name = '' cur_chunk_value = [] mod = '' preceding_line_length = len(line.rstrip())
Before processing the new chunk the current chunk needs to be saved.
<<processing start of new chunk>>= logger.debug('starting ney chunk named "%s"', c.group('name')) if cur_chunk_name != "": add_chunk(chunks, cur_chunk_name, cur_chunk_value, mod) cur_chunk_name = '' cur_chunk_value = [] mod = '' cur_chunk_name = c.group('name') cur_chunk_value.append(c.group('value').rstrip()) mod = c.group('mod')
At the end of a code block we store the chunk being processed.
<<processing and of code block>>= logger.debug("Leaving code block") in_code_block = False if cur_chunk_name != "": add_chunk(chunks, cur_chunk_name, cur_chunk_value, mod) cur_chunk_name = '' cur_chunk_value = [] mod = ''
A couple of support functions are needed to assemble the chunks.
<<support for collecting the chunks>>= def index_of_last(seq, f): """Return the index of the last item in seq where f(item) == True.""" return next((i for i in xrange(len(seq)-1,-1,-1) if f(seq[i])), None) def add_chunk(chunks, name, value, mod): """ adds a chunk to the collection of chunks Trims trailing blank lines before storing. """ # Get index of last non empty line i = index_of_last(value, (lambda x: x.strip() != '') ) v = value[:i+1] logger.debug( "Adding chunk named %(name)s " + "with value of '%(value)s' to collection in mod %(mod)s", {'name':name, 'value':v, 'mod':mod}) if mod == '+': chunks[name] = chunks[name] + v else: chunks[name] = v
Assemble The Files
One we have all them chunks we assemble them into the output files.
<<assemble the output files>>= def assemble_chunk(chunks, chunk_name, indent_level): """ returns a string made of all the lines of the named chunk. It starts with the beginning of the first line of the chunk(nothing is prepended) Before each successive line a new line and indent_level spaces are inserted. """ logger.debug("entering assemble_chunk") logger.debug("Keys in chunks = '%s'", "".join(chunks.keys())) logger.debug("name = '%s'", chunk_name) logger.debug("indent = %d", indent_level) logger.debug( "Assembling chunk named %(name)s " + "with value of '%(value)s' to collection", {'name':chunk_name, 'value':chunks[chunk_name]}) s = '' tag_pattern = re.compile(tag_start_pattern + '(.+?)' + tag_end_pattern) for c in chunks[chunk_name]: logger.debug('adding "%s" to "%s"', c, s) ce = c.rstrip().expandtabs(tab_stop) padding = len(ce) - len(ce.lstrip()) logger.debug('new padding = %d', padding) tags = tag_pattern.findall(ce) for t in tags: p = '^(.*)' + tag_start_pattern + t + tag_end_pattern + '(.*)$' expansion = assemble_chunk(chunks, t, (indent_level + padding)) logger.debug("chunk expanded to '%s'", expansion) # re.sub can not be used because every '\n' # would be translated into a new line. # ce = re.sub(p, expansion, ce) m = re.match(p, ce) ce = m.group(1) + expansion + m.group(2) if s == '': s = ce else: s = s + "\n" + ''.ljust(indent_level) + ce logger.debug("assemble_chunk is returning '%s'", s) return s def write_files(chunks): """ Creates the output files by assembling the chunks The contents of the chunked named '*' will be written to stdout. """ logger.debug('writing files') # import re r = re.compile('\*.+\*') # [s[1:-1] for s in l if (s[:1] == s[-1:] == '*')] for n in [x for x in chunks.keys() if x.startswith("*") and x.endswith("*")]: logger.debug('Writing %s', n) # open the file for writing text c = assemble_chunk(chunks, n, 0) logger.debug("writing '%s' to %s", c , n[1:-1]) if n[1 : -1] == '': f = sys.stdout else: f = open(n[1 : -1], 'wt') f.write(c) f.close()
The Main Entry Point
The main function ties it all together. It does all the setup and breakdown
needed. It makes sure the logger is setup, the command line options are
collected, and the input stream is open, and closes it when finished.
<<main function>>= def main(): """ main entry point. Makes sure the input and output streams are open. Calls the procedure to process the file. Closes the files if necessary. """ import sys # needed for stdin setup_logging() options = get_options() if options.rfilename == "": i = sys.stdin logger.info("Reading form console") else: i=open(options.rfilename, "rt") logger.info("reading from %s", options.rfilename) chunks = extract_chunks(i, sys.stdout) logger.debug("***** All Cunks Extracted *****") logger.debug("Extracted chunks = %s", chunks) # close the files, if we opened them if i != sys.stdin: i.close() global tab_stop tab_stop = options.tab_stop write_files(chunks)
The Supporting Cast
The logging and re modules or used through out the program.
<<imports>>= import re import logging
There are support functions to set up the logger and get the command line options.
<<support functions>>= set up logging configure command line options
Command Line Options
It uses the standard OptionParser.
<<configure command line options>>= def get_options(): """ gets options from the command line and environment. """ from optparse import OptionParser logger.debug('getting options') usage = "usage: %prog [options] [input file]" + \ "if no file is given input is read from stdin." version="%prog " + __version__ + " - " + __date__ parser = OptionParser(usage=usage, version=version) add options (options, args) = parser.parse_args() process the logging option process the arguments logger.debug('done reading options %s', options) return options
You can set the logging level from the command line.
<<add options>>= parser.add_option("-l", "--logging", dest="log_level", default= -1, help= "logging level, CRITICAL, ERROR, WARNING, INFO, or DEBUG")
The logging option requires some setup.
<<process the logging option>>= if options.log_level in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG") : ll = {"CRITICAL" : logging.CRITICAL, "ERROR" : logging.ERROR, "WARNING" : logging.WARNING, "INFO" : logging.INFO, "DEBUG" : logging.DEBUG} logger.setLevel(ll[options.log_level]) elif options.log_level != -1: logger.warning("Logging Level set to illegal value '%s'", options.log_level)
The size of tabs used in tab expansion can also be set on the command line.
<<add options>>= parser.add_option("-t", "--tabs", dest="tab_stop", default= 8, help= "set tab stops for tab expansion")
Process the arguments to get the input document name.
<<process the arguments>>= if len(args) == 1: options.rfilename = args[0] elif len(args) == 0: options.rfilename = "" else : options.rfilename = args[0] logger.warning("Wrong number of argumants")
Logging
The logging set up is straight forward.
<<set up logging>>= _logging_level = logging.INFO # should be logging.INFO for released code def setup_logging(): """ Sets up the logger """ import ConfigParser # change this to configparser for Python 3 import logging.config global logger try: logging.config.fileConfig("apllog.conf") except ConfigParser.NoSectionError: # if there is no configuration file setup a default configuration logging.basicConfig(filename='asciilitprog.log',level= _logging_level, format='%(asctime)s %(levelname)s - %(message)s', datefmt='%Y %b %d, %a %H:%M:%S' ) logger = logging.getLogger('%s' % __name__) logger.debug('logger ready')
Download code |