Quicksort (Haskell)
From LiteratePrograms
- Other implementations: AWK | C | C++ | Eiffel | Erlang | Forth | Haskell | Java | JavaScript | Mathematica | Mercury | Oz | Python | Python, arrays | Scala | Sed | Standard ML | Visual Basic .NET | XProc
We describe several different implementations of the quicksort algorithm. All of these implementations operate on lists. Therefore, they avoid use of the randomized pivot techniques found in, for example, the C implementation of quicksort, since access to a random element of a list takes O(n) time.
Contents |
Implementation
A one liner using list comprehensions
The implementation of quick sort in one line. We exploit the fact that a pattern mismatch in list comprehensions results in the omission of that result.
qsortOneLine xs = concat [qsortOneLine [y | y <- tail xs, y < x] ++ x : qsortOneLine [y | y <- tail xs, y >= x] | x <- take 1 xs]
Using list comprehensions
The classic presentation of quicksort in Haskell is a direct implementation using list comprehensions. It generates two lists, one of elements greater than or equal to the "pivot" element (in this case the first element of the list), and one of elements less than the pivot. These two lists are then recursively sorted, before being concatenated around the pivot to form the final sorted list.
<<qsort1>>= qsort1 :: Ord a => [a] -> [a] qsort1 [] = [] qsort1 (p:xs) = qsort1 lesser ++ [p] ++ qsort1 greater where lesser = filter (< p) xs greater = filter (>= p) xs
This implementation has the advantage of being very clear and easy to understand, yet quite compact. However, it sacrifices some performance in order to achieve its remarkable clarity.
Using a partitioning function
One problem with the list comprehension implementation is that it makes two passes through the list to generate the lesser
and greater
lists. An alternative approach is to partition the list in a single pass. One way to accomplish this is through the standard List
module's partition
function, which splits a list based upon some predicate. We have instead elected to implement our own partitioning function. This choice was made partly for pedagogical reasons, but also because our custom partitioning function can accumulate list items that are equal to the pivot, as well as those that are lesser and greater than the pivot. This eliminates the need to process more than once any items that are equal to the pivot.
The new implementation looks essentially the same as the earlier list comprehension implementation, except that the lesser
and greater
lists (along with equal
) are now generated by the part
function.
<<qsort2>>= qsort2 :: Ord a => [a] -> [a] qsort2 [] = [] qsort2 (x:xs) = qsort2 lesser ++ equal ++ qsort2 greater where (lesser,equal,greater) = part x xs ([],[x],[]) partition function
The partitioning function is a straightforward recursion. The base case has an empty list in its second argument, which corresponds to the list to be partitioned. When the list being partitioned is non-empty, we compare the head of the list to the pivot, and add the head to the appropriate list (lesser, equal, or greater) based on the outcome of these comparisons.
<<partition function>>= part :: Ord a => a -> [a] -> ([a],[a],[a]) -> ([a],[a],[a]) part _ [] (l,e,g) = (l,e,g) part p (x:xs) (l,e,g) | x > p = part p xs (l,e,x:g) | x < p = part p xs (x:l,e,g) | otherwise = part p xs (l,x:e,g)
Since the part
function performs only a single pass through the list to create the the partitions, this implementation should be faster than the version using list comprehensions.
Using an accumulator
The qsort2
implementation of quicksort is more efficient than the naive implementation, but still involves many costly list traversals to concatenate the sorted list segments together. It also stores lots of sorted list segments as separate lists, which increases the amount of storage required by the implementation. A more efficient alternative is to add an accumulator to the quicksort implementation. The accumulator is a list which is used to accumulate (hence the name) sorted list elements. The qsort3
implementation initializes the accumulator to the empty list, and calls the auxiliary qsort3'
function to perform the actual sorting.
<<qsort3>>= qsort3 :: Ord a => [a] -> [a] qsort3 x = qsort3' x [] qsort3'
The qsort3'
function is the workhorse of this implementation. This function has three possible cases. If it has run out of unsorted list elements, it returns the accumulated list of sorted elements.
<<qsort3'>>= qsort3' [] y = y
If there is only one element left in the unsorted list it is simply added to the front of the accumulator list.
<<qsort3'>>= qsort3' [x] y = x:y
If there is more than one element in the unsorted list, the first element is pulled off to act as a pivot for partitioning the unsorted list. The partitioning function resembles the approach to partitioning used for the qsort2
implementation. However, unlike the previous partitioning function, when the qsort3'
partitioning function finishes partitioning the list it recursively calls qsort3'
on the "greater" portion of the list and the current value of the accumulator. The result is a sorted list of elements greater than the pivot, to which the list of elements equal to the pivot is concatenated. This new list becomes the accumulator for a recursive call of qsort3'
on the "lesser" portion of the partitioned list.
<<qsort3'>>= qsort3' (x:xs) y = part xs [] [x] [] where part [] l e g = qsort3' l (e ++ (qsort3' g y)) part (z:zs) l e g | z > x = part zs l e (z:g) | z < x = part zs (z:l) e g | otherwise = part zs l (z:e) g
The result of using the accumulator is that the list traversals required for segment concatenation in qsort2
are eliminated. Instead, the sorted list is assembled as the sorting process takes place. The only concatenation that is required involves the (hopefully short) list of items equal to the pivot.
This approach is roughly similar to that used by the quicksort implementation that was provided with the GHC Haskell compiler up until May 2002. However, the version shown here is not stable. The GHC quicksort was more generic (it could use other ordering relations than <), and included a "reverse quicksort" which took account of the way that items in the partition lists have their order reversed in order to ensure stable sorting.
Testing
The complete qsort.hs
source includes a test driver that allows the quicksort implementations to be tested on randomly generated lists.
<<qsort.hs>>= module Main where imports qsort1 qsort2 qsort3 test driver
The test driver parses the command line arguments, using the first argument to determine the length of the randomly generated list to be generated, and the second argument to determine which sorting implementation to test.
<<main>>= main :: IO () main = do (x:xs) <- getArgs let listlen = read x :: Int q = selectSort xs testQsort q listlen
The testable sorting implementations are identified by number, and consist of the three implementations described in this article as well as the default List.sort
<<select sorting implementation>>= selectSort ["1"] = qsort1 selectSort ["2"] = qsort2 selectSort ["3"] = qsort3 selectSort _ = List.sort
Testing itself involves generating a list of random integers using the randomRs
library function, sorting that list, and then printing the last element of the sorted list. The last step in this sequence is intended to ensure complete evaluation of the sorted list.
<<test driver>>= -- Testing randSeq n gen = take n (randomRs (0,n) gen) testQsort q n = do gen <- getStdGen print (last (q (randSeq n gen))) select sorting implementation main
Several library imports are required to enable testing. The System
library provides access to the command line arguments. The List
library supplies the standard sorting implementation. The Random
library provides for random number generation.
<<imports>>= import System import List( sort ) import Random
As an example, the invocation of qsort.hs
to test the naive quicksort implementation on a 1000 element list would be achieved by typing
qsort 1 1000
at the command line.
Functionality
The functionality of the quicksort implementations is most easily tested by loading qsort.hs
into an interactive Haskell environment, such as GHCi or Hugs. Here's an example of a few tests, and their results:
*Main> let testlist = [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3] *Main> qsort1 testlist [1,1,2,3,3,3,4,5,5,5,6,7,8,9,9,9] *Main> qsort2 testlist [1,1,2,3,3,3,4,5,5,5,6,7,8,9,9,9] *Main> qsort3 testlist [1,1,2,3,3,3,4,5,5,5,6,7,8,9,9,9]
*Main> let testlist = ["bob","alice","barry","zoe","charlotte","fred"] *Main> qsort1 testlist ["alice","barry","bob","charlotte","fred","zoe"] *Main> qsort2 testlist ["alice","barry","bob","charlotte","fred","zoe"] *Main> qsort3 testlist ["alice","barry","bob","charlotte","fred","zoe"]
So, our sorting functions actually sort. But how quickly? Let's take a look at the performance of the different quicksort implementations.
Performance
The performance statistics shown in the table below were collected using GHC's profiling capabilities. Each implementation was run several times on a randomly-generated 100,000-element list of integers, and the resulting profile data was aggregated. The tests were carried out using GHC 6.4.1 on a 1.5GHz Apple Powerbook with 1GB of RAM.
Implementation | Average Time (s) | Average Allocation (MB) |
---|---|---|
qsort1 | 2.53 | 139 |
qsort2 | 2.03 | 139 |
qsort3 | 1.64 | 83 |
GHC List.sort | 2.14 | 107 |
There are several interesting observations that we can make about this table:
- The move from list comprehension to a single pass partitioning function does provide a slight improvement in performance.
- The inclusion of an accumulator significantly decreases the amount of storage required, while also yielding an improvement in execution time (presumably as a result of eliminating expensive list concatenations).
- The standard GHC
sort
(which is a mergesort) does not perform quite as well in terms of time and space as theqsort3
implementation. It seems likely that the slightly lower performance of GHC'ssort
in this case can be attributed to the extra work it does to provide a stable sort (something thatqsort3
does not do).
In addition to comparing the performance of different implementations, it is interesting to consider the performance of a single implementation applied to lists of different lengths. The plot below shows how the execution time of the qsort3
implementation increases with increasing list length. As predicted by theory, the execution time is approximately O(nlogn), with a constant factor of approximately in this case.
The biggest performance issue with these functional versions of QuickSort is that their worst case performance comes when the lists are already sorted, because they select the pivot from the beginning of the list. (This is not demonstrated in the tests.) A more robust pivot selection algorithm could make the code more complicated, and could make the algorithm take longer in the average case, increasing the constant factor becuase of the linear time required to traverse to other locations in the list.
Parting thoughts
Quicksort was designed to operate on arrays, and works very well on them. However, quicksort is somewhat less suitable for sorting lists, since many of the techniques available for improving the performance of the algorithm rely on random access. Indeed, the sample implementation of List.sort
in the Haskell specification is a version of the insertion sort, rather than quicksort. The Haskell specification does permit implementations to use any sort algorithm that has the same semantics as the sample List.sort
, and at one point the GHC compiler and the Hugs interpreter implemented sort
as a stable version of quicksort. But both have since replaced it with a merge sort.
Download code |