Merge sort (Oz)

From LiteratePrograms

Jump to: navigation, search
Other implementations: ACL2 | C | C++ | dc | Eiffel | Erlang | Haskell | Java | JavaScript | Lisp | OCaml | Oz | Perl | Prolog | Python | Ruby | Scheme

This article showcases two implementations of the merge sort algorithm in the programming language Oz. Both implementations are generic: the sorting predicate is user-specified; use Value.'<=' to provide the usual stable sorting of numbers.

Contents

Implementation

Mergesort applies a predicate to a list of items that can be compared using that predicate. The predicate defines the sorted order produced by the mergesort algorithm. For a simple list (one element or empty), we can just return a duplicate of the current list. For longer lists, we split the list into two halves, recurse on each half, then merge the two halves in accordance with the predicate.

Mergesort using built-in functions

Oz provides a number of list-processing procedures that make the mergesort algorithm very easy to implement: List.length determines the length of a list; List.takeDrop splits a list into two smaller lists at a defined point; List.merge merges two sorted lists together in accordance with a sorting predicate. Using these functions, mergesort can be expressed as:

<<mergesort>>=
fun {Mergesort Xs P}
   case Xs
   of nil then nil
   [] [X] then [X]
   else
      Len Ys Zs
   in
      {List.length Xs Len}
      {List.takeDrop Xs (Len div 2) Ys Zs}  % split the list in two
      {List.merge {Mergesort Ys P} {Mergesort Zs P} P}
   end
end

Mergesort from first principles

The implementation using built-in functions is quick and easy, but doesn't provide much insight into how the mergesort actually works. Let's now create a mergesort implementation built from custom merge and split functions. The signature of this function is identical to our other mergesort function:

<<custom mergesort>>=
fun {MergesortCustom Xs P}
custom mergesort body
end

However the body of the custom mergesort is a little different. To begin with, it includes locally defined split and merge functions. It also contains a locally defined mergesort function, which operates in a slightly different way than the mergesort constructed from built-in functions:

<<custom mergesort body>>=
   split
   merge
   fun {Mergesort Xs}
      case Xs
      of nil then nil
      [] [X] then [X]
      else
         Ys # Zs = {Split Xs}
      in
         {Merge {Mergesort Ys} {Mergesort Zs}}
      end
   end
in
   {Mergesort Xs}

The differences in the locally defined mergesort are a result of two important facts:

  1. The local Split function is specifically designed to split a list in half, regardless of length, so there is no need to find the length of the list first.
  2. The local Merge function already has the sorting predicate P in its scope, so there is no need to pass the predicate as an argument.

Split

The workhorse for splitting a list is the SplitRec recursive function. This method takes two lists together with a work-in-progress result. In each recursion, two elements are eliminated off the first list, and one element transferred from the second list to the result list. Once the first list is exhausted, we return from the recursion. At this point, we must be n / 2 recursions deep -- where n is the length of the original list. As such, the second list will contain the last nn / 2 elements from the original list and the working list will contain the first n / 2 elements that were eliminated from the second list. This pair of lists is the ultimate return value. Due to the way the working list is built, the elements it contains are in reverse order. To ensure a stable sort, the order of the working list is reversed (to restore the original list order) before it is returned. The Split function simply kicks off this recursive chain.

<<split>>=
fun {Split Xs}
   fun {SplitRec Xs Ys Ws}
      case Xs
      of nil then {List.reverse Ws} # Ys
      [] [_] then {List.reverse Ws} # Ys
      [] _ | _ | Xss then
         Y1 | Yss = Ys
      in
         {SplitRec Xss Yss (Y1 | Ws)}
      end
   end
in {SplitRec Xs Xs nil} end

A more intuitive way of splitting the list might be to put all odd-positioned elements in one sublist, and the rest in another. For example, we might have written Split as

fun {Split Xs}
    case Xs 
    of nil then nil # nil
    [] [X] then [X] # nil
    [] X | Y | Zs then
       Xs # Ys = {Split Zs}
    in
       ( X | Xs ) # ( Y | Ys )
    end 
end

This, however, is not stable: a list that is already partially-ordered according to the predicate might be rearranged. For example, given a binary predicate string-length<= that returns true iff the first argument is not longer than the second, and the list ["aaa" "bbb" "ccc"], you get ["aaa" "ccc" "bbb"] back. The SplitRec function avoids this problem by using a copy of the list, which is consumed two elements at a time, as a counter to find the midpoint of the list.

Merge

Merging is the heart of the algorithm and operates by interleaving the elements of two ordered lists in such a way that the combined list is ordered. This process takes only linear (O(n)) time. The Merge function takes two lists. If either list is empty, we return a duplicate of the other as the merged result. Otherwise, the lead elements of each list are compared using the sorting predicate, P. A true result from the predicate indicates that the first element should precede the second in sorting order. The "winning" element is removed from its list and we recurse on the remainder of the lists. The result, prepended by the removed element, is returned:

<<merge>>=
fun {Merge Xs Ys}
   case Xs # Ys
   of nil # Ys then Ys
   [] Xs # nil then Xs
   [] ( X | Xss ) # ( Y | Yss) then
      if {P X Y} then
         X | {Merge Xss Ys}
      else
         Y | {Merge Xs Yss}
      end
   end
end

Packaging

Both of the mergesort functions are packaged inside a functor, which is an Oz module. In this case, the functor exports both merge functions, making them accessible to other Oz modules.

<<mergesort.oz>>=
functor
export
   Mergesort
   MergesortCustom
define
   mergesort
   custom mergesort
end

Testing

We provide a simple test harness for the mergesort implementations. The heart of this test harness is the TestSort procedure. This procedure accepts a sorting function, an ordering function, a testname, a stimulus, and an expected response as parameters. It applies the sorting function to the stimulus using the provided ordering function, and checks to see whether the actual result matches the expected one. The results of the test are then printed to the console.

<<TestSort>>=
proc {TestSort F Order Testname S R}
   Result
in
   if
      R == {F S Order}
   then
      Result = "passed"
   else 
      Result =  "failed"
   end
   {System.showInfo
    {System.printName F} # " - " # Result # " - " # Testname # "."}
end

The test cases provided as part of the test harness are fairly straightforward. One of the tests verifies that the sorting function actually returns a sorted output.

<<TestNumericSort>>=
proc {TestNumericSort F}
   Stimulus = [3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3]
   Response = [1 1 2 3 3 3 4 5 5 5 6 7 8 9 9 9]
in
   {TestSort F Value.'=<' "numeric sort" Stimulus Response}
end

The second test verifies that the sorting function is stable, i.e. that it does not alter the order of a list that should not be permuted under the provided ordering function. In this case, the test is performed using a list of equal-length strings as the stimulus, and an anonymous function that defines an ordering in terms of string lengths. The expected result is that the stimulus list is not permuted.

<<TestStableSort>>=
   proc {TestStableSort F}
      Stimulus = ["bbb" "aaa" "ccc" "zzz" "fff"]
      Response = ["bbb" "aaa" "ccc" "zzz" "fff"]
   in
      {TestSort F fun {$ X Y} {List.length X}=<{List.length Y} end
       "stable sort" Stimulus Response}
   end

The complete test harness loads the necessary support modules, and then applies the two test procedures to both the Mergesort and MergesortCustom sorting functions. Note that imported functions are invoked using names that have a lower case first letter (and a prefixed module name), despite the fact that they are defined with an upper case letter at the start of their names.

<<mergesort-tests.oz>>=
functor
import
   Application
   System
   Mergesort
define
    TestSort
    TestNumericSort
    TestStableSort
in   
   {TestNumericSort Mergesort.mergesort}
   {TestStableSort Mergesort.mergesort}   
   {TestNumericSort Mergesort.mergesortCustom}
   {TestStableSort Mergesort.mergesortCustom}   
   {Application.exit 0}
end

Building and running

If you have installed the Mozart Programming System, then the mergesort implementations and test harness can be compiled at the command line by issuing the following commands:

ozc -c mergesort.oz
ozc -x mergesort-tests.oz

Running mergesort-tests should produce the following output:

Mergesort - passed - numeric sort.
Mergesort - passed - stable sort.
MergesortCustom - passed - numeric sort.
MergesortCustom - passed - stable sort.
Download code
Views