Shunting yard algorithm (Perl)

From LiteratePrograms

Jump to: navigation, search
Other implementations: C | Perl | Python

In this article, we describe an implementation of the Shunting yard algorithm in Perl. The algorithm is a simple way of parsing expressions in infix notation.

In this implementation we generate a form of RPN which is readable by the dc UNIX tool.

Contents

Tokens

The first step is to split the input string containing the expression into tokens. The only token types we support are numbers and operators (+, -, *, /) and parentheses. This is easily accomplished with the split function.

<<tokens>>=
my @tokens=split(/\ *([\+\-\*\/\(\)]|\d+\.\d+|\d+) */, $expr);

Operators

We use a hash to store the operator precedences. The unary - operator is called -u to distinguish it from the binary operator with the same name.

<<operators>>=
my %prec=('-u'=>5, '*'=>4, '/'=>3, '+'=>2, '-'=>1, '('=>0, ''=>9);

Right-associative operators must be treated specially. In this example, we only have one (the unary -), and store it in another hash.

<<operators>>=
my %right=('-u'=>1);

We provide a function to easily find the operator precedence.

<<operators>>=
sub getprec {
	my ($op)=@_;
	return exists $prec{$op}?$prec{$op}:-1;
}

The way we used the split function, will leave some empty tokens. We just ignore them.

A unary operator is the first token or any operator that is preceded by another operator (not parentheses).

<<tokens-preprocessing>>=
	!$token and next;
	if($token eq '-' and getprec($last)>=0) {$token='-u';}

The parser

The shunting yard algorithm is quite simple. All numbers are added to the output stream (here represented by @rpn). Operators are pushed on a stack. Each time we reach a new operator, we pop operators from the stack until we reach one that has lower precedence. In the case of a right associative operator, we also stop if we reach an operator of the same precedence.

All popped operators are appended to the output stream.

When we reach a right parenthesis, we pop all operators until the matching left parenthesis. The parentheses are thrown away.

<<parsing>>=
# Parsing
my @op_stack;
my @rpn;
my $last="";
foreach my $token (@tokens) {
tokens-preprocessing
	if($token=~/^\d+$|^\d+\.\d+$/) {
		if($last=~/^\d+$|^\d+\.\d+$/ || $last eq ")") {
			die "Value tokens must be separated by an operator";
		}
		push(@rpn, $token);
	} elsif($token eq '(') {
		push(@op_stack, $token);
	} elsif($token eq ')') {
		while($op_stack[-1] ne '(') {
			my $t=pop(@op_stack);
			push(@rpn, $t);
		}
		pop(@op_stack) eq '(' or die "No matching (";
	} elsif((my $pr=getprec($token))>0) {
		if(exists $right{$token}) {
			while(scalar @op_stack>0 && $pr<getprec($op_stack[-1])) {
				push(@rpn, pop(@op_stack));
			}
		} else {
			while(scalar @op_stack>0 && $pr<=getprec($op_stack[-1])) {
				push(@rpn, pop(@op_stack));
			}
		}
		push(@op_stack, $token);
	} else {
		die "Unknown token: \"$token\"";
	}
	$last=$token;
}

When we have reached the end of the input stream, all remaining operators are popped and appended to the output stream.

<<parsing>>=
while(scalar @op_stack>0) {
	push(@rpn, pop(@op_stack));
}

Output

As we want to generate code for dc, we have to do some small adjustments in the output stream.

The unary - operator doesn't exist in dc, so we have to fake it by multiplying with -1, which is encoded as _1 in dc.

We also want the result to be printed, so we append the p command.

<<output>>=
foreach my $token (@rpn) {
	if($token eq '-u') {print '_1* ';}
	else {print "$token ";}
}
print "p\n";

The program

This program expects an expression on the command line. The output can be piped to dc like this: ./shunting-yard.perl '1+2*3' | dc.

<<shunting-yard.perl>>=
#!/usr/bin/env perl
use strict;
use warnings;
my ($expr)=@ARGV;
if(!$expr) {
	print "Usage $0 expression\n";
	exit 1;
}
tokens
operators
parsing
output
Download code
Views