Desk calculator (Python)
From LiteratePrograms
An interpreter for a dc dialect, written in Python. It lies somewhere between a modern dc implementation and the Bell Labs 1971 version. It lacks support for input and output radix conversion and for floating point; however, it can produce arbitrary binary output, and has been confirmed to work with the following LiteratePrograms articles:
Contents |
theory
An interpeter is basically a left fold over the source code, taking an initial state into the final state by applying each command in turn.
As dc
programs are self-tokenizing
— the source is already a byte-code —
the interpeter itself is very simple.
The complications arise in
providing the semantics of
the two major dc
datatypes
- arbitrary-precision integers
- macro text / string data
practice
data structures
There are three data structures in the interpreter:
- a stack — to store already-evaluated values
- a continuation — to store yet-to-be-evaluated code
- a dictionary — to provide the abstract registers
The decisions to implement the stack and dictionary with a list and dict are rather obvious. As dc commands are (mostly) single bytes, the continuation is a list of bytes, for which we use a string.
<<data structures>>= ############################################# s,c,d = [],"",{} stack elements continuation utilities
the inner loop
The interpreter itself is equally simple: while there are source bytecodes left in the continuation, we look up the corresponding commands, and execute them.
############################################# <<interpreter>>= auxiliary commands cmds = { command table } def interp(str): cpush(str) while len(c): cmds[cpop()]()
printing (I)
For example, the codes p and f correspond to commands to print the top of the stack and the entire stack, respectively.
<<command table>>= 'p': lambda: debugpr(s[-1:]), 'f': lambda: debugpr(s), <<auxiliary commands>>= def debugpr(s): for v in s: print v
arithmetic
Arithmetic operations are a little more complicated. To implement each operation in the target language, we perform the corresponding operation in the host language, along with some explicit bookkeeping: stack manipulation and ensuring that the results are injected into our Numeric Value objects.
<<command table>>= '+': lambda: s.append(NVal(s.pop(-2) + s.pop())), '-': lambda: s.append(NVal(s.pop(-2) - s.pop())), '*': lambda: s.append(NVal(s.pop(-2) * s.pop())), '/': lambda: s.append(NVal(s.pop(-2) //s.pop())), '%': lambda: s.append(NVal(s.pop(-2) % s.pop())), '^': lambda: s.append(NVal(s.pop(-2) **s.pop())), '~': lambda: s.extend(map(NVal,divmod(s.pop(-2),s.pop()))),
Question: what should be done to make this code more robust to changes in execution order?
stack operations
The stack operations would be relatively straightforward, if it were not for the need to avoid statements in lambda expressions. clear and reverse/swap would be clearer as statements:
c → s[:]=[] r → s[-3:]=s[-1:-3:-1]
<<command table>>= 'd': lambda: s.append(s[-1]), 'c': lambda: s.__setitem__(slice( 0,None),[]), 'r': lambda: s.__setitem__(slice(-3,None),s[-1:-3:-1]),
printing (II)
The command n to print a value, as well as z, to push the current size of the stack, are straightforward. P, to print, and Z, to determine the length of a value, have different behaviors depending upon whether or not the top of the stack is a number or a string.
<<command table>>= 'n': lambda: write("%s" % s.pop()), 'P': lambda: s.pop().prval(), 'Z': lambda: s.append(NVal(s.pop().sizeof())), 'z': lambda: s.append(len(s)),
We model this polymorphism by providing two distinct classes for stack values.
As long as we have the two classes, we can do some partial evaluation: as the result of adding a number to the continuation will simply be that it is shifted to the stack, we avoid the intermediate steps and place it there to begin with.
<<stack elements>>= class SVal(str): def execute(self): cpush(self + '\0') def sizeof(self): len(self) def prval(self): write(self) class NVal(long): def execute(self): s.append(self) def sizeof(self): if self==0: return 1 else: return int(math.log10(self))+1 def prval(self): q,str = self,"" while q > 0: q,r = divmod(q,0x100) str = chr(r) + str write(str)
Question: why is our task made easier by using long integers in Python, rather than C longs?
Question: if we did not append the extra NULL byte when executing a string, this implemenation would be properly tail-recursive. Why do we not do so?'
execution control
Supra, we saw that executing a macro was as simple as pushing it onto the continuation, "scheduling" it for evaluation by the interpreter.
"Pushing" a function onto the continuation is implemented by prepending it to the string.
Likewise, popping off the code for a command and trimming a continuation (to provide an early exit or to skip over comments) are implemented with easy string manipulations.
<<continuation utilities>>= def cpop(): global c r,c=c[0],c[1:] return r def cpush(b): global c c = b+c def ctrim(upto): global c try: c = c[c.index(upto)+1:] except: sys.exit(0)
<<command table>>= '#': lambda: ctrim('\n'), 'q': lambda: (ctrim('\0'),ctrim('\0')),
Note that we have used null bytes to delimit levels of macro invocation in the continuation.
Question: how is the double exit behavior of q related to the end-entire-method behavior of ^ in Smalltalk?
Unfortunately, to support this behavior, we cannot delete the nesting information to support tail-call optimization.
Still, each tail call only adds 1 byte, so Generating all rationals (dc) runs, if not to , at least for much longer than it does in the Unix implementation.
registers
Continuations were somewhat involved; the dictionary is trivial. load and store manipulate the interpreter dict in the manner than one expects.
<<command table>>= 'l': lambda: s.append(d[cpop()]), 's': lambda: d.update([(cpop(),s.pop())]),
conditional branches
Taken conditional branches require that the macro be looked up in the dictionary, and its value executed.
<<command table>>= '>': lambda: compare('>'), '<': lambda: compare('<'), '=': lambda: compare('='), '!': lambda: compare('!'+cpop()), <<auxiliary commands>>= def compare(how): cmpfunc = { '>': lambda a,b: b > a, '!>': lambda a,b: b <= a, '<': lambda a,b: b < a, '!<': lambda a,b: b >= a, '=': lambda a,b: b == a, '!=': lambda a,b: b != a } if cmpfunc[how](s.pop(-2),s.pop()): d[cpop()].execute() else: cpop()
other control flow
In contrast, executing from the stack, or using ? to execute user input, are both straightforward.
<<command table>>= 'x': lambda: s.pop().execute(), '?': lambda: cpush(sys.stdin.readline()),
Question: what are the advantages of evaluating user input? the disadvantages?
literals
Of course, in order to operate on values, we must have some way to specify them in the first place.
Here we process numeric and string literals. One might wish to take advantage of library routines to do the lexing and value conversion, but perhaps the character-by-character implementation here better demonstrates that the two forms of literals are their own little domain specific languages, and so we interpret them with the same strategy as the outer interpreter: a left fold.
<<command table>>= '[': dostr, '_': lambda: donum(-1,0), '0': lambda: donum( 1,0), '1': lambda: donum( 1,1), '2': lambda: donum( 1,2), '3': lambda: donum( 1,3), '4': lambda: donum( 1,4), '5': lambda: donum( 1,5), '6': lambda: donum( 1,6), '7': lambda: donum( 1,7), '8': lambda: donum( 1,8), '9': lambda: donum( 1,9), <<auxiliary commands>>= def donum(sign,acc): while c[0].isdigit(): acc = 10*acc + int(cpop()) acc *= sign s.append(NVal(acc)) def dostr(): nest,acc = 1,[] while nest: c = cpop() if c == ']': nest-=1 elif c == '[': nest+=1 if not nest: break acc.append(c) s.append(SVal("".join(acc)))
Question: in theory, is it valid to say that the semantics of an algebraic construct (such as a literal value or an entire program) are given by means of a function folding over its syntax?
the remainder
All that remains is to declare whitespace to code for the "identity" operation — ie, it does nothing.
<<command table>>= ' ': nop, '\t': nop, '\r': nop, '\n': nop, '\0': nop <<auxiliary commands>>= def nop(): pass
Strictly speaking, this is not necessary,
but giving the programmer the choice of format
allows dc
code to be (a little)
less opaque.
Compare [sa d la + sr la lr p d ls >m]
with [sadla+srlalrpdls>m]
—
in a way, free whitespace
is the poor man's semantic hilighting.
wrapping up
Finally, we wrap everything up in a module
<<dc.py>>= ############################################# import sys, math write=sys.stdout.write data structures interpreter command line interface
and provide a command line interface allowing for either interactive or file input.
<<command line interface>>= ############################################# if __name__=="__main__": arg = sys.argv[1:] if not len(arg): while True: interp(sys.stdin.readline()) while len(arg): if arg[0] == '-e': str = arg[1] arg = arg[2:] elif arg[0] == '-f': arg = arg[1:] continue else: f = open(arg[0],"rb") str = f.read() f.close() arg = arg[1:] interp(str)
so, for example, one may say
dc.py -e 6581840dnP
or, having visited Merge sort (dc),
dc.py -f mergesort.dc
Download code |