Quicksort (Python)
From LiteratePrograms
- Other implementations: AWK | C | C++ | Eiffel | Erlang | Forth | Haskell | Java | JavaScript | Mathematica | Mercury | Oz | Python | Python, arrays | Scala | Sed | Standard ML | Visual Basic .NET | XProc
We describe several different Python implementations of the quicksort algorithm. All of these implementations operate on lists. For alternative implementations based on Python's array
standard library module see Quicksort (Python, arrays).
Contents |
Implementation
Using list comprehensions
The most straightforward way to implement quicksort in Python is to use list comprehensions. This approach generates two lists, one of elements greater than or equal to the "pivot" element (in this case the first element of the list), and one of elements less than the pivot. These two lists are then recursively sorted, before being concatenated around the pivot to form the final sorted list.
<<qsort1>>= def qsort1(list): """ Quicksort using list comprehensions >>> qsort1<<docstring test numeric input>> <<docstring test numeric output>> >>> qsort1<<docstring test string input>> <<docstring test string output>> """ if list == []: return [] else: pivot = list[0] lesser = qsort1([x for x in list[1:] if x < pivot]) greater = qsort1([x for x in list[1:] if x >= pivot]) return lesser + [pivot] + greater
This implementation has the advantage of being clear and easy to understand, yet quite compact. As it turns out, it is also faster than other list-based quicksorts in Python.
In one line of code
The implementation above can be made even more compact by combining Python's predicate if (return x if y else z
) with its sequence slicing capability (e.g. list[0], list[1:]
) to produce a version of quicksort in just one-line:
<<qsortr>>= def qsortr(list): return [] if list==[] else qsortr([x for x in list[1:] if x < list[0]]) + [list[0]] + qsortr([x for x in list[1:] if x >= list[0]])
The initial element of the list, list[0]
becomes the pivot variable around which the remainder of the list list[1:]
is filtered. When the list selected reaches size 0, the value []
(empty list) is returned and the predicate else is not invoked.
While the "one line of code" version may minimize the text of the algorithm it isn't exactly efficient: the list is fully iterated and filtered twice. It would be nice if the iteration and filtering could be done just once. This can be achieved using a partitioning function.
Using a partitioning function
An alternative to using list comprehensions is to partition the list in a single pass using a partitioning function that accumulates list items that are equal to the pivot, as well as those that are lesser and greater than the pivot. This also eliminates the need to process more than once any items that are equal to the pivot. In theory, this should result in a faster sort than the list comprehension version. However in practice it appears that the list comprehension implementation is not only faster than a single-pass partitioning approach, but approaches the performance of an in-place sort (see Quicksort (Python, arrays)).
The new implementation looks essentially the same as the earlier list comprehension implementation, except that the lesser
and greater
lists (along with equal
) are now generated by the partition
function.
<<qsort2>>= partition function def qsort2(list): """ Quicksort using a partitioning function >>> qsort2<<docstring test numeric input>> <<docstring test numeric output>> >>> qsort2<<docstring test string input>> <<docstring test string output>> """ if list == []: return [] else: pivot = list[0] lesser, equal, greater = partition(list[1:], [], [pivot], []) return qsort2(lesser) + equal + qsort2(greater)
The partitioning function can be thought of as a straightforward recursion. The base case has an empty list in its first argument, which corresponds to the list to be partitioned. When the list being partitioned is non-empty, we compare the head of the list to the pivot, and add the head to the appropriate list (lesser, equal, or greater) based on the outcome of these comparisons.
<<partition function (recursive)>>= def partition(list, l, e, g): if list == []: return (l, e, g) else: head = list[0] if head < e[0]: return partition(list[1:], l + [head], e, g) elif head > e[0]: return partition(list[1:], l, e, g + [head]) else: return partition(list[1:], l, e + [head], g)
Although this is an intuitively appealing way to perform list partitioning, Python's lack of support for recursion elimination causes the recursive implementation to suffer from linear stack usage, which is not acceptable for large lists. An iterative implementation (i.e. a manual tail call optimization) solves this problem:
<<partition function>>= def partition(list, l, e, g): while list != []: head = list.pop(0) if head < e[0]: l = [head] + l elif head > e[0]: g = [head] + g else: e = [head] + e return (l, e, g)
Adding a randomized pivot
One drawback of the implementations we've presented so far is that, while they perform well on random data, they require excessive time and spaces resources on data that is already sorted or nearly-sorted. One way to improve the robustness of the quicksort in the face of this kind of data is to randomly select a pivot, instead of just making the pivot the head of the list. Fortunately, Python supports relatively fast random access to individual list elements, which makes such a pivoting scheme feasible.
The qsort1a
function is essentially the same as qsort1
, except that it implements randomized pivots. We make use of the pop
operation to remove our chosen pivot. This has the unfortunate side effect of mutating the original list that was passed to the sort function. To prevent this unwanted mutation, we wrap the actual sorting operations in an internally defined function, qsort
, to which we pass a copy of the original list. Based on timing measurements, this layer of indirection does not appear to impose any significant performance penalty.
<<qsort1a>>= from random import randrange def qsort1a(list): """ Quicksort using list comprehensions and randomized pivot >>> qsort1a<<docstring test numeric input>> <<docstring test numeric output>> >>> qsort1a<<docstring test string input>> <<docstring test string output>> """ def qsort(list): if list == []: return [] else: pivot = list.pop(randrange(len(list))) lesser = qsort([l for l in list if l < pivot]) greater = qsort([l for l in list if l >= pivot]) return lesser + [pivot] + greater return qsort(list[:])
Testing
We provide a simple self-test for the quicksort implementations using the doctest module from the python standard library. This module analyses docstrings from functions defined in the module and carries out tests specified therein. We supply the tests in the form of an example interactive session with anticipated output. Here we define two tests in terms of function arguments and expected outputs — the function names and interactive session prompts will be added when these chunks of code are combined with the docstrings in which their chunk references are embedded.
<<docstring test numeric input>>= ([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]) <<docstring test numeric output>>= [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]
<<docstring test string input>>= (["bob","alice","barry","zoe","charlotte","fred","marvin"]) <<docstring test string output>>= ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe']
The complete qsort.py
consists of the quicksort implementations and a call to doctest to test the module:
<<qsort.py>>= qsort1 qsort1a qsort2 if __name__ == "__main__": import doctest doctest.testmod()
Building and running
If you have installed Python, then the quicksort implementations can be tested at the command line by issuing the following command (the -v notifies doctest to output results even if all tests pass):
python qsort.py -v
Running the tests should produce the following output:
Trying: qsort1([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]) Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9] ok Trying: qsort1(["bob","alice","barry","zoe","charlotte","fred","marvin"]) Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe'] ok 0 of 2 examples failed in __main__.qsort1.__doc__ Running __main__.qsort1a.__doc__ Trying: qsort1a([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]) Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9] ok Trying: qsort1a(["bob","alice","barry","zoe","charlotte","fred","marvin"]) Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe'] ok 0 of 2 examples failed in __main__.qsort1a.__doc__ Running __main__.qsort2.__doc__ Trying: qsort2([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]) Expecting: [1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9] ok Trying: qsort2(["bob","alice","barry","zoe","charlotte","fred","marvin"]) Expecting: ['alice', 'barry', 'bob', 'charlotte', 'fred', 'marvin', 'zoe'] ok 0 of 2 examples failed in __main__.qsort2.__doc__ 2 items had no tests: __main__ __main__.partition 3 items passed all tests: 2 tests in __main__.qsort1 2 tests in __main__.qsort1a 2 tests in __main__.qsort2 6 tests in 5 items. 6 passed and 0 failed. Test passed.
Performance
Although all of the quicksort implementations apparently work correctly, the question of performance remains. To determine the performance of the quicksorts we use a simple test driver which generates randomly populated integer lists that have a length defined via command-line arguments. The quicksort implementations are then applied to the random lists, using the timing
module to obtain execution-time information.
<<time-qsort.py>>= import sys import random import timing import qsort def checkSorted(a): for i in xrange(1, len(a) - 1): if a[i] < a[i-1]: return False return True numElements = int(sys.argv[1]) testlist = random.sample(xrange(999999999), numElements) print "# elements: %d"%numElements for q in qsort.qsort1, qsort.qsort1a, qsort.qsort2: print "%s"%q.__name__, list = testlist[:] timing.start() result = q(list) timing.finish() print "- %.3f secs"%(float(timing.micro()) / 1000000), if checkSorted(result): print "- passed" else: print "- failed"
The table below compares the performance of the quicksort implementations, for various sizes of list. Times were taken on a 1.5 GhZ PowerPC G4machine under Mac OS X using the standard Python package; we measured both with and without the -O flag, but the difference was not appreciable.
Randomized lists
List size | 100 | 500 | 1000 | 5000 | 10000 | 50000 |
---|---|---|---|---|---|---|
qsort1 (s) | 0.002 | 0.009 | 0.019 | 0.142 | 0.270 | 1.853 |
qsort1a (s) | 0.003 | 0.015 | 0.031 | 0.189 | 0.380 | 2.188 |
qsort2 (s) | 0.004 | 0.029 | 0.097 | 0.846 | 4.161 | 278.242 |
The qsort1
list comprehension implementation significantly outperforms the qsort2
single-pass partitioning implementation for all list sizes. This is probably the result of Python's lack of tail-call optimization, and possibly also a result of optimizations for list comprehensions within the Python runtime.
The qsort1a
implementation performs roughly as well as qsort1
, but not surprisingly pays a slight performance penalty for generating a random number during each pivot selection.
As a further point of comparison, we compare the behavior of qsort1
and qsort1a
on sorted lists of sequential integers, which is a worst-case for a naive list algorithm.
Sorted lists
List size | 100 | 500 | 1000 | 5000 | 10000 | 50000 |
---|---|---|---|---|---|---|
qsort1 (s) | 0.006 | 0.145 | — | — | — | — |
qsort1a (s) | 0.002 | 0.014 | 0.027 | 0.167 | 0.333 | 1.825 |
The simple comprehension-based implementation stack overflowed at 1000 elements. However, by adding a randomized pivot to the comprehension-based implementation it is possible to avoid the stack overflow, and still achieve good performance.
Download code |