Quicksort (Python, arrays)

From LiteratePrograms

Jump to: navigation, search
Other implementations: AWK | C | C++ | Eiffel | Erlang | Forth | Haskell | Java | JavaScript | Mathematica | Mercury | Oz | Python | Python, arrays | Scala | Sed | Standard ML | Visual Basic .NET | XProc

Quicksort can be implemented for Python lists (see Quicksort (Python)) but due to the inability to efficiently choose good pivots in a list, it suffers from poor performance on sorted or nearly sorted inputs. An alternative is to use one of the many array implementations available for Python, such as the standard library module array, the Numeric library (now called NumPy), or the numarray library. We will also require a source of random numbers as supplied by either the standard library's random module (based on the excellent Mersenne twister generator) or Numeric's RandomArray extension library.


Contents

In-place partition function

The most efficient partition functions are in-place — they just shuffle around the elements of an array. The most common approach is to take two array indexes, one starting at the beginning and moving up and one starting at the end and moving down, and the values at these indexes are sometimes exchanged. The following function partitions the subarray a[start..end] inclusive about the pivot value at a[pivotIndex]:

<<partition function>>=
def partition(a, start, end, pivotIndex):
    low = start
    high = end - 1  # After we remove pivot it will be one smaller
    pivotValue = a[pivotIndex]
    remove pivot from array
    while True:
        while low <= high and a[low] < pivotValue:
            low = low + 1
        while low <= high and a[high] >= pivotValue:
            high = high - 1
        if low > high:
            break
        a[low], a[high] = a[high], a[low]
    insert pivot into final position and return final position

In works by moving the two indexes until they point to two elements in the wrong order. It then exchanges them using tuple syntax. We move the pivot "out of the way" to ensure that it isn't moved around during the process - this would complicate things. At the end we store it where low points, which will be one past the end of the lesser elements, its final position:

<<remove pivot from array>>=
a[pivotIndex] = a[end]
<<insert pivot into final position and return final position>>=
a[end] = a[low]
a[low] = pivotValue
return low

Because the element the pivot supplants is sent to the end of the array, it's necessary to exchange it with the first element in the sublist of greater-or-equal elements rather than the last element of the lesser elements.

Main algorithm

Now that we have partition, the main algorithm is simple to write. We are given an array and inclusive start and end indexes. We select a pivot at random using random.randint(), partition about it, then recursively invoke quicksort on the two partitions.

<<quicksort function>>=
def qsortRange(a, start, end):
    if end - start + 1 < 32:
        insertionSort(a, start, end)
    else:
        pivotIndex = partition(a, start, end, randint(start, end))
        qsortRange(a, start, pivotIndex - 1)
        qsortRange(a, pivotIndex + 1, end)
    return a

The actual qsort function just sorts the whole range:

<<quicksort function>>=
def qsort(a):
    return qsortRange(a, 0, len(a) - 1)

If the list size becomes small, we fall back to insertion sort (see Insertion sort (Python) for more):

<<insertion sort function>>=
def insertionSort(a, start, end):
    for i in xrange(start, end + 1):
        # Insert a[i] into the sorted sublist
        v = a[i]
        for j in reversed(xrange(0, i)):
            if a[j] <= v:
                a[j + 1] = v
                break
            a[j + 1] = a[j]
        else:
            a[0] = v
    return a

We add 1 to end because the given range is inclusive, while xrange includes its upper limit. We use the reversed filter to iterate j from i−1 down to zero. The else after the for executes only if it did not break and handles the case where the value is moved all the way to the beginning.

Test code

We create this helper function to generate a random array for testing:

<<generate random array function>>=
def randarray(typecode, numElements, minValue, maxValue):
    a = array(typecode)
    for i in xrange(0, numElements):
        a.append(randint(minValue, maxValue))
    return a

To verify the result, we use this helper:

<<check if array is sorted function>>=
def checkArraySorted(a):
    for i in xrange(1, len(a) - 1):
        if a[i] < a[i-1]:
            return False
    return True

Now we can try it out with something like:

<<test code>>=
a = randarray('i', 50, 1, 200)
print a
print qsort(a)
if checkArraySorted(qsort(randarray('i', 10000, 0, 999999999))):
    print "Test passed"
else:
    print "Test failed"

We see that the result is correctly sorted.

Files

Finally, we wrap all this up in a Python file for distribution. We need to import the random and array modules.

<<qsort.py>>=
from array import *
from random import *
partition function
insertion sort function
quicksort function
generate random array function
check if array is sorted function
test code

Performance

Since the simplest implementations in Python are list-based, the question remains whether this implementation outperforms them well enough to justify its additional complexity. The below table compares runtime (including setup) for data sets of various sizes using this algorithm and the simple list-comprehension-based quicksort algorithm from Quicksort (Python). Times were taken on a 2.8 GhZ Pentium 4 machine under Gentoo Linux using the standard Python package; we measured both with and without the -O flag, but the difference was not appreciable.

Random lists

List size 100 1000 10000 50000 100000 250000 500000 1000000
Array quicksort (s) 0.022 0.038 0.23 1.15 2.415 6.52 13.0 27.0
List quicksort (s) 0.021 0.034 0.19 0.98 2.157 6.37 13.7 28.3

Surprisingly, the very simple list implementation remains competitive with the array implementation for all list sizes. Next, we compare their behavior on sorted lists of sequential integers, which is a worst-case for the naive list algorithm. To account for the difference in setup time we first generate and throw away a random list the same size as before.

Sorted lists

List size 100 1000 10000 50000 100000 250000 500000 1000000
Array quicksort (s) 0.028 0.033 0.17 0.867 1.72 4.46 9.58 19.1
List quicksort (s) 0.023

The simple comprehension-based implementation stack overflowed at 1000 elements. The array-based implementation performed better than either of the algorithms did on random lists. In this commonly occurring case, it's clear that the list-based quicksort is not even an option. It's also interesting to compare this with the results in Quicksort (C), which are several orders of magnitude faster than either of these implementations.

Download code
Views