Merge sort (OCaml)

From LiteratePrograms

Jump to: navigation, search
Other implementations: ACL2 | C | C++ | dc | Eiffel | Erlang | Haskell | Java | JavaScript | Lisp | OCaml | Oz | Perl | Prolog | Python | Ruby | Scheme

This is an implementation of merge sort in OCaml that takes a user specified ordering function compare and a list as arguments, and returns a copy of the list, sorted according to the ordering function.

Merge sort is a recursive algorithm that makes use of two helper functions split and merge. As one might expect, split splits a list into two halves of roughly equal length (the lengths can differ by no more than 1), while merge takes two lists, pre-sorted according to ordering function compare, and combines the two lists into one sorted list.

Given these two helpers, mergesort proceeds as follows:
If the list contains 0 or 1 elements, then t is already sorted, and we simply return the list
If the list contains two or more elements, then we split the list into two halves, recursively call mergesort to sort the two half lists, then we combine them with merge to get a sorted list. This works to sort the list, because recursively calling mergesort on the two halves will eventually reach a point where we reach the base case of lists with zero or one elements, which (being pre-sorted), we can build back up into a sorted list with merge.

<<mergesort>>=
let rec mergesort compare = 
	split
  in
	merge
	in function
  | ([] | [_]) as l -> l
  | l ->  let left,right = split l in 
          merge compare (mergesort compare left) (mergesort compare right)
;;

So now, let us look at the helper functions, starting first with split, which works in a somewhat non-obvious way. We define a recursive helper function, split_aux to do the bulk of the splitting work. split_aux takes three lists as arguments: the first is used to represent the difference in the lengths of the two half-lists we are creating, the second is the left half of the list that we will build up by puling elements from the head of the third list, which will be paired down to become the right half list.

The splitting proceeds as follows: If the first list has zero or one elements in it, then we know that the left and right halves are almost equal in length, and we return them. Otherwise, the right half list has at least 2 more elements than the left half, and so we pul the head off of the right half list, add it onto the left half, and recursively call split_aux decreasing the length of the first list by two (representing that the difference in length of the two halves is two elements closer to zero than it used to be), and passing in the longer left half and the shorter right half.

Finally, we add a third case to split_aux to catch the case where the first list has at least two elements in it, but the right half list is empty. If split_aux is called properly, as we do by supplying the original list as the first and third arguments and an empty list for the second, then this case will never be reached. However, OCaml will warn us if we don't provide patterns for all possible inputs, so we add this third, universal pattern to catch this case and use assert false as the standard way of flagging these cases.

<<split>>=
let split l =
  let rec split_aux l left right = 
    match l,left,right with
    | ([] | [_]),_,_ -> (List.rev left),right
    | (_::_::t),_,h::right_t -> split_aux t (h::left) right_t
    | _ -> assert false
  in
  split_aux l [] l

Note that we need to reverse the elements of the left half before returning it, so that they appear in the same order as they did in the original list. We do this to preserve the stability property of the merge sort algorithm, which states that any elements of equal rank will appear in the same order in the sorted list as they did in the original list.

The merge function, which is used to assemble two sorted lists into one sorted list works as follows: If either of the two lists is empty, then we can simply return the other list as we have nothing to add to it. If both of the two lists have at least one element, we compare the heads of the two lists. We select the smaller of the two to be the head of the merged list, and cons it on the the rest of the merged list which we construct with a recursive call to merge.

Note that the ordering function must follow the same rules as that of the standard OCaml ordering function compare from the Pervaives module. That is, given two values, if the first is less than the second, then the function returns a negative int (usually -1), if they are equal, then the function returns a 0, otherwise it returns a positive int (usually 1).

<<merge>>=
let rec merge compare l1 l2 =
  match l1,l2 with
  | [],l | l,[] -> l
  | h1::t1,h2::t2 ->
    if compare h1 h2 <= 0 then h1::(merge compare t1 l2)
    else h2::(merge compare l1 t2)

Testing

One can test the mergesort implementation using any list and appropriate ordering function. For example:

<<mergesort.ml>>=
mergesort
let l1 = [1; 2; 5; 9; 4; 3; 6; 7; 8;];;
let l2 = ["aaa"; "bbb"; "xxxx"; "ccc"; "ffff"; "w"];;
mergesort compare l1;;
mergesort compare l2;;
mergesort (fun x y -> compare y x) l1;;
mergesort (fun x y -> compare (String.length x) (String.length y)) l2;;
Download code
Views