Shunting yard algorithm (Perl)
From LiteratePrograms
In this article, we describe an implementation of the Shunting yard algorithm in Perl. The algorithm is a simple way of parsing expressions in infix notation.
In this implementation we generate a form of RPN which is readable by the dc UNIX tool.
Contents |
Tokens
The first step is to split the input string containing the expression into tokens. The only token types we support are numbers and operators (+, -, *, /) and parentheses. This is easily accomplished with the split function.
<<tokens>>= my @tokens=split(/\ *([\+\-\*\/\(\)]|\d+\.\d+|\d+) */, $expr);
Operators
We use a hash to store the operator precedences. The unary - operator is called -u to distinguish it from the binary operator with the same name.
<<operators>>= my %prec=('-u'=>5, '*'=>4, '/'=>3, '+'=>2, '-'=>1, '('=>0, ''=>9);
Right-associative operators must be treated specially. In this example, we only have one (the unary -), and store it in another hash.
<<operators>>= my %right=('-u'=>1);
We provide a function to easily find the operator precedence.
<<operators>>= sub getprec { my ($op)=@_; return exists $prec{$op}?$prec{$op}:-1; }
The way we used the split function, will leave some empty tokens. We just ignore them.
A unary operator is the first token or any operator that is preceded by another operator (not parentheses).
<<tokens-preprocessing>>= !$token and next; if($token eq '-' and getprec($last)>=0) {$token='-u';}
The parser
The shunting yard algorithm is quite simple. All numbers are added to the output stream (here represented by @rpn). Operators are pushed on a stack. Each time we reach a new operator, we pop operators from the stack until we reach one that has lower precedence. In the case of a right associative operator, we also stop if we reach an operator of the same precedence.
All popped operators are appended to the output stream.
When we reach a right parenthesis, we pop all operators until the matching left parenthesis. The parentheses are thrown away.
<<parsing>>= # Parsing my @op_stack; my @rpn; my $last=""; foreach my $token (@tokens) { tokens-preprocessing if($token=~/^\d+$|^\d+\.\d+$/) { if($last=~/^\d+$|^\d+\.\d+$/ || $last eq ")") { die "Value tokens must be separated by an operator"; } push(@rpn, $token); } elsif($token eq '(') { push(@op_stack, $token); } elsif($token eq ')') { while($op_stack[-1] ne '(') { my $t=pop(@op_stack); push(@rpn, $t); } pop(@op_stack) eq '(' or die "No matching ("; } elsif((my $pr=getprec($token))>0) { if(exists $right{$token}) { while(scalar @op_stack>0 && $pr<getprec($op_stack[-1])) { push(@rpn, pop(@op_stack)); } } else { while(scalar @op_stack>0 && $pr<=getprec($op_stack[-1])) { push(@rpn, pop(@op_stack)); } } push(@op_stack, $token); } else { die "Unknown token: \"$token\""; } $last=$token; }
When we have reached the end of the input stream, all remaining operators are popped and appended to the output stream.
<<parsing>>= while(scalar @op_stack>0) { push(@rpn, pop(@op_stack)); }
Output
As we want to generate code for dc, we have to do some small adjustments in the output stream.
The unary - operator doesn't exist in dc, so we have to fake it by multiplying with -1, which is encoded as _1 in dc.
We also want the result to be printed, so we append the p command.
<<output>>= foreach my $token (@rpn) { if($token eq '-u') {print '_1* ';} else {print "$token ";} } print "p\n";
The program
This program expects an expression on the command line. The output can be piped to dc like this: ./shunting-yard.perl '1+2*3' | dc
.
<<shunting-yard.perl>>= #!/usr/bin/env perl use strict; use warnings; my ($expr)=@ARGV; if(!$expr) { print "Usage $0 expression\n"; exit 1; } tokens operators parsing output
Download code |