Priority Queue (Haskell)
From LiteratePrograms
The following code implements a priority search queue in the Haskell programming language. It is chiefly intended to support a better Sieve of Eratosthenes implementation and will be focused on the functionality required for that project. It may later be expanded for more complete functionality.
Please note: this code is not a complete implementation as it is intended to act as a component to another article. As it is worked on, it will be improved, but at this time do not assume this is complete and general-purpose.
A priority search queue is a queue structure that attaches a "priority" to a value. We can treat the "priority" as a key and the whole priority queue as an ordered mapping. It is an extremely useful data type used in a wide variety of circumstances (and, indeed, the motive for this article is to provide infrastructure for an efficient Sieve of Eratosthenes implementation).
Contents |
Data representation
Before we can start to code the functions, we should put some thought into how to represent the data structure. A few moments' reflection gives us this:
<<data type declaration>>= data Ord k => PriorityQueue k a = Nil | Branch k a (PriorityQueue k a) (PriorityQueue k a)
A PriorityQueue
is a recursive data structure built as a tree. The root of the tree is the highest-priority item in the queue (or Nil
) and the branches are a traditional binary tree structure for the rest of the leaves with Nil
marking the ends. This structure gives us a reasonably efficient ordering of our priority search queue with easy semantics for pedagogical purposes. Other, more efficient, representation schemes (like balanced trees) can of course be imagined.
Note that only the key is limited to being a member of the Ord
class. This means that the ordering is strictly by the keys. The data can be of any type and data items of the same priority will be extracted in an arbitrary order.
Operations
The operations on a normal priority queue are insert
, remove
and often inspect
. These, in order, insert a new key/value pair into a queue; remove (and return) the current highest key/value pairing from a queue; return the current highest key/value pairing of a queue without removing it. This particular priority queue implementation is built for a purpose and has a slightly different interface:
<<interface declarations>>= -- Return an empty priority queue. empty -- Return the highest-priority key. minKey -- Return the highest-priority key plus its associated value. minKeyValue -- Insert a key/value pair into a queue. insert -- Delete the highest-priority key/value pair and insert a new key/value pair into the queue. deleteMinAndInsert
While this non-standard interface renders the data type less than universally usable, quick inspection of the interface's implementation shows that adding the missing functionality is a trivial exercise at best.
empty
The empty
function provides an empty tree as a basis, thus removing the need to expose the Nil
constructor (read: implementation detail) to prying eyes:
<<empty>>= empty :: Ord k => PriorityQueue k a empty = Nil
It is, in effect a mere alias for the Nil
type constructor.
minKeyValue
The minKeyValue
function is the equivalent of the standard inspect
interface. Since the priority queue is a tree rooted with the highest priority, it simply returns the key/value pair of the provided tree:
<<minKeyValue>>= minKeyValue :: Ord k => PriorityQueue k a -> (k, a) minKeyValue Nil = error "empty queue" minKeyValue (Branch k a _ _) = (k, a)
Note some of the key elements. The first line is a mere type declaration. It is largely unnecessary (the types can be easily inferred by the Haskell compiler) but generally the top-level functions of an interface are best fully-typed as a courtesy to the reader. The second line uses pattern matching on an empty root node and handles the only error condition: taking the highest-priority key/value pair of an empty queue is an undefined operation. The undefined
keyword could be used here, but a more friendly error message is likely indicated. (Remember that undefined
is a special case of error
.)
The final line is the meat of the function. It takes as its pattern a PriorityQueue
instance that is not Nil
and breaks it apart into its constituents. The left and right branches are not needed and therefore ignored (_
). The key and the value are bundled together into a tuple and returned, the queue left untouched.
minKey
The minKey
function can be implemented in a similar fashion:
minKey :: Ord k => PriorityQueue k a -> k minKey Nil = error "empty queue" minKey (Branch k _ _ _) = k
This would work, but is not a good implementation. It replicates much of the minKeyValue
functionality, which means that should the implementation of the PriorityQueue
type change, this function's modifications would have to be synchronized with the changes to minKeyValue
—a highly error-prone operation. (Consider that a very complete implementation of a priority queue could have a dozen functions similar to this with only minor tweaks here and there.) A better solution is this:
<<minKey>>= minKey :: Ord k => PriorityQueue k a -> k minKey = fst . minKeyValue
Here we leverage Haskell's nigh-unparallelled ability to combine functions. Consider that the return type for minKeyValue
is (k, a)
and that the return type for minKey
is k
alone. If there were only a function we could use to convert a tuple into just its first member!
There is such a function. It is defined in the Haskell Prelude and is thus a standard function available in all Haskell implementations. It is defined thus:
fst :: (a, b) -> a fst (x, y) = x
So instead of cutting and pasting the implementation of another function and making minor tweaks to it we can instead use function application (and the "points-free style") to call minKeyValue
and extract the only part we're interested in. This way even if the implementation of minKeyValue
ever changes dramatically, the minValue
function will never have to be modified. (Of course if the interface to the function changes, all bets are off. This is why we think about our interfaces before we write our implementations.)
Helpers
Before continuing with the interface implementations, it would be best to look at and understand the functionality of three helper functions used extensively by the remaining two interface functions:
<<helper declarations>>= -- Join two queues in sorted order. union -- Join two queues without regard to order. -- (This is a helper to the union helper.) link -- Return a queue with a single item from a key/value pair. singleton
singleton
The simplest of these is singleton
. All this function does is build a PriorityQueue
root node with one key/value pair and two Nil
branches:
<<singleton>>= singleton :: Ord k => k -> a -> PriorityQueue k a singleton k a = Branch k a Nil Nil
It is, again, simply a wrapper around a type constructor (Branch
in this case) that hides nasty implementation details from prying eyes. There is room for argument that this function belongs in the interface (to avoid slightly ugly calls like insert k a empty
), but at this point it will remain down here in the helpers. As this priority queue is expanded to go beyond its original requirements, singleton
could well be elevated into the published interface.
link
The next function we'll look at—link
—is definitely not suited to a published interface. Improper use of it can badly damage a tree. Here's what it looks like:
<<link>>= link (Branch k a Nil m) r = Branch k a r m link (Branch k a ll lr) r = Branch k a lr (union ll r)
This code links two queues together. The first pattern matches a queue entry that has no left branch along with a generic queue root. The empty left branch is filled with the passed-in queue root. The second pattern matches a filled queue entry with both branches full. It, as before, ignores the right branch and unites the second queue root with the first's left branch, switching the branches. This function is dangerous to expose because it imposes no ordering whatsoever on the merging. Since a priority queue is an ordered data structure, misuse of this function can lead to severe operational problems.
union
Notice that, as defined, link
uses another helper function in its implementation: union
. This is what union
looks like:
<<union>>= union :: Ord k => PriorityQueue k a -> PriorityQueue k a -> PriorityQueue k a union l Nil = l union Nil r = r union l@(Branch kl _ _ _) r@(Branch kr _ _ _) | kl <= kr = link l r | otherwise = link r l
It's quite a mouthful, but still easy enough to work out. The first line is its type: it takes two PriorityQueue
instances and returns a PriorityQueue
. This, combined with its name, is ample clue of what it is supposed to do.
The next two lines are pattern matches and can be read as "a union of any queue with Nil
is that queue". It's the remaining three lines that are a problem.
First, let's look at the pattern matching. A representative example is l@(Branch kl _ _ _)
. What this is doing is matching an expanded Branch
node and extracting the key value from it, calling it kl
(key, left). What it is also doing, however, is calling the entire node l
as a convenience. This way we won't have to keep tossing around code like (Branch kl al ll rl)
when we're building things. We can instead just use l
and not lose the conceptual forest for the trees.
Anyway, the net effect of the pattern is that the leftmost node of the two passed in is called l
and its key kl
. The rightmost node is called r
and its key kr
. This prepares us for the next two lines.
The next two lines contain guards -- a special kind of pattern matching expression used in function declarations, case statements and list comprehensions. In the context here you can view them as a variant of conditional expressions for case expressions; for example this function could be declared thus:
union l@(Branch kl _ _ _) r@(Branch kr _ _ _) = if kl <= kr then link l r else link r l
The guards are, in this case, more readable than a conditional expression or a case expression would be, however, so they are the most appropriate form.
The first guarded line is a pattern match on a boolean condition. It simply says "if the left key is less than or equal to the right key, link the right node into the left node". Keep in mind the implementation of link above. If the left branch of the left node is Nil
, the right node is just inserted in place. If, however, the left branch of the left node is not empty, union is recursively called on it until the larger node is inserted in the right place.
The second guarded line uses the otherwise
keyword (like the conditional else
) as a catch-all. If the right key is greater than the left key, the left node is linked into the right node's left branch.
Operations (cont.)
With the helper functions out of the way, we can finally address the remaining two interface functions.
insert
The first of these is insert
:
<<insert>>= insert :: Ord k => k -> a -> PriorityQueue k a -> PriorityQueue k a insert k a q = union (singleton k a) q
It is a simple function that just builds a singleton
from the passed-in key/value pair and immediately calls union
to insert it. There's nothing surprising here.
deleteMinAndInsert
The next function, deleteMinAndInsert
, is only a little bit more involved, this mostly because it's a compound function built to combine operations into a single event. (This is, again, provided for purposes of building a proper Sieve of Eratosthenes implementation. This will later be expanded for more general purposes.)
<<deleteMinAndInsert>>= deleteMinAndInsert :: Ord k => k -> a -> PriorityQueue k a -> PriorityQueue k a deleteMinAndInsert k a Nil = singleton k a deleteMinAndInsert k a (Branch _ _ l r) = union (insert k a l) r
The first pattern matched is performing this operation on an empty queue. It could be reasoned that this is an undefined condition and, therefore, warrants an error
call. We've chosen instead, however, to assume that the deletion is optional and that an empty queue, which by definition can have nothing deleted from it, will just get the new key/value pair inserted.
The second pattern matched is to a node with actual content. We're tossing away that node's key/value pair, so we first insert
our new key/value pair into its left tree and then call union
on the resulting new tree and the right tree.
Boilerplate
The rest of the code in the file is book-keeping only.
Module declaration
First, since we want this code to be usable in other projects, we declare it as a module:
<<module declaration>>= module PriorityQueue ( PriorityQueue, empty, minKey, minKeyValue, insert, deleteMinAndInsert ) where
Note that none of the helper functions are visible in this module's export list. This is a deliberate choice and allows us to change the implementation without the interface being noticeably changed (with the possible exception, of course, of additionally-exposed functions).
Imports
Of course we're using a few functions from the Prelude in here, so we need to import it. Were we to re-implement this structure in terms of other data structures (lists, let's say), we would, of course, have to import them here.
<<import declarations>>= import Prelude
Putting it all together
The overall file is structured like this:
<<PriorityQueue.hs>>= module declaration import declarations -- Declare the data type constructors. data type declaration -- Declare the exported interface functions. interface declarations -- Declare the private helper functions. helper declarations
Background and References
The inspiration for this article came from investigating a high-performance Sieve of Eratosthenes (SoE) for Haskell. While the usual form of this algorithm, as taught in algorithm books and introductions to Haskell, is elegant and simple and highlights some strengths of the language, it fails miserably in performance because it isn't really a proper SoE. Doing some digging, I stumbled across this argument on the Haskell-cafe mailing list. This, in turn, led me to which both highlights why the traditional "SoE" implementation isn't really an SoE and provides a set of ever-more-efficient implementations itself.
Sadly, Ms. O'Neill's paper assumes the presence of a priority queue and, further, a priority queue with some unusual operations. No Haskell environment comes out of the box with a priority queue, so one had to be provided to use her solution. A quick web search revealed this implementation, ready to be adapted to the needs of the SoE. It was quickly stripped down, reorganized, altered to suit the required interface and documented here. A key bug of the implementation (one that would lead to serious performance problems down the road) was repaired in the process.
This document has no formal proof of the qualities of a priority search queue. There is to this topic, however, available online.
Download code |