Turing machine simulator (Sed)

From LiteratePrograms

Jump to: navigation, search
Other implementations: C | C++ | C++ | Haskell | Java | LaTeX | OCaml | Ruby | Scheme | Sed | Unlambda

The sed tool allows to do manipulations on text files. The sed language consists mainly of commands to search and replace text, but also branching commands. Together with conditional execution based on "address ranges" which also can be regular expressions, the sed language is turing-complete. The obvious way to see that is to implement a turing machine simulator in sed.

This is an implementation of a turing machine running a three state busy beaver, as described in [1].

This code was successfully tested with GNU sed version 4.1.4 on Linux.


Definition of the turing machine

A turing machine consists of several parts:

  • A tape which is infinite on one or both sides, and at each position contains one of a finite set of symbols.
  • A read/write head which can move along the tape one position at a time, read the symbol at the current position, and replace that symbol with another symbol.
  • A state machine, which controls the read-write head according to a table defining the actual program.

The state machine's table contains for each combination of current state and current symbol at the head's position

  • what symbol to write at that position (replacing the current symbol)
  • in which direction (left or right) to move the head afterwards
  • the next state of the state machine

There's a special "halt" state, which causes the turing machine to stop operation. Usually that halt state is not counted, i.e. an n-state turing machine has actually n+1 states: n "working" states and the halt state.

Definition of the busy beaver

A busy beaver is turing machine with the following properties:

  • It has exactly two symbols. One of them may be called "blank" or "zero", the other symbol may be called "one".
  • It runs on a tape which is infinite on both sides.
  • If started on an empty tape (i.e. one which is filled with the "blank" symbol), it stops after a finite number of steps. After that, the tape contains as many ones as possible.

Implementation in sed

The sed implementation works directly on the visible representation of the tape.

The tape and head position is represented as follows:

  • Of course, only a finite part of the tape can be represented; this part contains the current head position and all non-blank positions. That is, the part of the tape not shown is explicitly empty.
  • At the head position, a zero is represented by "o", and a one is represented by "I".
  • At all other shown positions, a zero is represented by "*", and a one is represented by "|".
  • During the operation, at least one position left and right of the head is shown.

The implementation takes a valid tape representation in the first line of input, and runs the turing machine on it. It also accepts representations where the head is at the end of the visible tape; in that case, before running the turing machine the visible tape is extended to include the next (blank) symbol left and/or right of it. Especially, on input the empty tape can be represented with a single "o" symbolizing the zero below the head.

The sed program outputs for each step the current state of the state machine, followed by the current tape. The halt state is denoted by the letter "=".

All input lines after the first one are ignored.

Extending the visible part of the tape

The visible part of the tape has to be extended if the head happens to be at the left or right end of the currently visible part of the tape.

If the head is at the beginning of the visible part of the tape, we have to add a star left of it. This can be easily done with a regular expression replacement: An "o" (head above zero) or "I" (head above one) at the beginning (i.e. left border) of the tape representation is replaced by "*o" resp. "*I".

<<left extend visible tape>>=

The analog happens for the right hand side, where a star is added after the head's position.

<<right extend visible tape>>=

The head operations

There are four operations, the head can perform (actually five, but reading will be dealt with separately). The first one is writing a zero at the current position. Note that this has only an effect if the current position actually contains an one. Thus writing a zero actually means replacing a one at the head's position by a zero, or in the representation, replacing "I" with "o".

<<write 0>>=
# write 0

Correspondingly, writing an one at the current position just means replacing "o" with "I".

<<write 1>>=
# write 1

Moving left is a bit more complicated, because we have four different possibilities: Both the symbol at the current position and the symbol left of it may be independently zero or one. Also, after moving the head, we might have reached the left side of the visible tape, so we may have to extend it to the left. Note that the "*" has to be escaped as "\*" in the regular expressions.

<<move left>>=
# move left
left extend visible tape
<<move right>>=
# move right
right extend visible tape

The main program

At the beginning of the program the input tape is tested if it is valid. This is done by using a regular expression describing valid input tapes as address for a branch instruction jumping to an error label.

A valid input tape consists of exactly one head symbol ("o" or "I"), preceded and followed by any number (including zero) of non-head symbols ("*" or "|").

# test for valid tape:
/^[*|]*[oI][*|]*$/! b error

In the input tape, the head may be at the border of the visible part. During the algorithm, this is not allowed. Therefore the tape is extended on both sides, if necessary.

# extend visible tape
s/^o/\*o/; s/^I/\*I/; s/o$/o\*/; s/I$/I\*/

The next part is the state machine. At the beginning of each state, the current state and tape is output. This is done by copying the pattern space (holding the current tape) to the hold space, prepending the current state's name to the pattern space, output of the pattern space, and finally restoring the pattern space from the hold space.

: A
# print current state and tape
s/^/A /

Next, the actual state machine part is implemented. The condition on the symbol at the head is implemented by using a regular expression testing for the symbol under the head as address of a compound statement.

/o/ {
  write 1
  move right
  # next state: B
  b B
/I/ {
  write 1
  move left
  # next state: C
  b C

Since the validity of the tape was checked in advance, execution should never pass to this point (one of the regexp addresses must have triggered, and both blocks end with a branch instruction). However, if a program bug causes us to get here anyway, we better immediatly exit with an error.

b error

States B and C are analog.

: B
# print current state and tape
s/^/B /
/o/ {
  write 1
  move left
  # next state: A
  b A
/I/ {
  write 1
  move right
  # next state: B
  b B
b error
: C
# print current state and tape
s/^/C /
/o/ {
  write 1
  move left
  # next state: B
  b B
/I/ {
  write 1
  move right
  # next state: halt
  b halt
b error

The halt state is different in that we only print the current state and tape, and then immediatly exit. Thus we don't have to save the current tape in the hold space, and since on exit the pattern space is output anyway, we also don't need to explicitly output that, too. So all we have to do is to prepend the "name" of the halt state ("="), and exit with an exit code of 0 (no error occured).

: halt
s/^/= /
q 0

If an error occured, we prepend an error message to the current "tape", and exit with a non-zero return code. Again, the exit causes the error message to be output.

: error
# display error
s/^/error: invalid tape: /
q 1

Running the code

To run the code, one can use the command

sed -f turing.sed

and then type in the initial tape, followed by Enter. Alternatively one can pipe the output of echo into the sed command. For example, the following command runs the turing machine on an empty tape, thus properly demonstrating the busy beaver:

echo o | sed -f turing.sed

This gives the following output:

A *o*
B *|o*
A *I|*
C *o||*
B *o|||*
A *o||||*
B *|I|||*
B *||I||*
B *|||I|*
B *||||I*
B *|||||o*
A *||||I|*
C *|||I||*
= *||||I|*

Since it is hard to follow the head movements with the auto-extended tape, it's a good idea to add enough blank symbols right from the beginning (note the quoting to prevent the shell from interpreting the stars as filename globs):

echo '*****o*****' | sed -f turing.sed


A *****o*****
B *****|o****
A *****I|****
C ****o||****
B ***o|||****
A **o||||****
B **|I|||****
B **||I||****
B **|||I|****
B **||||I****
B **|||||o***
A **||||I|***
C **|||I||***
= **||||I|***

To see what happens if the turing machine is started on a non-blank tape, e.g. the following command can be used:

echo '*|I' | sed -f turing.sed

This gives the output

A *|I*
C *I|*
= *|I*
Download code