Merge sort (Ruby)

From LiteratePrograms

Jump to: navigation, search
Other implementations: ACL2 | C | C++ | dc | Eiffel | Erlang | Haskell | Java | JavaScript | Lisp | OCaml | Oz | Perl | Prolog | Python | Ruby | Scheme

This is an implementation of the merge sort algorithm in Ruby. The sorting predicate is user-specified as a block to the method; use of <= provides stable sorting of numbers.

Contents

Extending Array

We implement merge sort as an extension of the built-in Array class. We reopen the class and add the method merge_sort and some helper methods. We mark the helper methods as protected so that code outside of Array cannot call them directly.

<<extend array>>=
class Array
  mergesort method
  protected
  split helper methods
  merge helper method
end

merge_sort

The merge_sort method operates on an array and takes a predicate in block form. For a simple array (one element or empty), we just return a duplicate of the current array. For longer arrays, we split the array into two halves, recurse on each half, then merge the two halves according to the predicate:

<<mergesort method>>=
def merge_sort(&predicate)
  if size <= 1
    self.dup
  else
    halves = split.map{ |half|
      half.merge_sort(&predicate)
    }
    merge(*halves, &predicate)
  end
end

split and split_rec

The workhorse for splitting is the split_rec recursive method. This method takes two lists together with the work-in-progress result. In each recursion, two elements are eliminated off the first list and one element transferred from the second list to the result list. Once the first list is exhausted, we return from the recursion. At this point, we must be n / 2 recursions deep -- where n is the length of the original list. As such, the second list will contain the last nn / 2 elements from the original list and result contains the first n / 2 elements that were eliminated from the second list. This pair of lists is the ultimate return value.

The split method simply kicks off the recursive chain:

<<split helper methods>>=
def split_rec(ls1, ls2, result)
  if ls1.size <= 1
    [result, ls2]
  else
    split_rec(ls1[2..-1], ls2[1..-1], result + [ls2.first])
  end
end
def split
  split_rec(self, self, [])
end

merge

The merge method takes two lists. If either list is empty, we return a duplicate of the other as the merged result. Otherwise, the lead elements of each list are compared. A true result from the predicate indicates that the first element should precede the second in sorting order. The "winning" element is removed from its list and we recurse on the remainder of the lists. The result, prepended by the removed element, is returned:

<<merge helper method>>=
def merge(first, second, &predicate)
  if first.empty?
    second.dup
  elsif second.empty?
    first.dup
  elsif predicate.call(first.first, second.first)
    [first.first] + merge(first[1..-1], second, &predicate)
  else
    [second.first] + merge(first, second[1..-1], &predicate)
  end
end

Driver

To demonstrate the merge sort, we will simply include our extensions to the Array class, then sort an array of Integers:

<<mergesort.rb>>=
extend array
p [1, 5, 6, 4, 3].merge_sort{ |a,b| a <= b }

Considerations

This implementation is intended to be a pure functional implementation of merge sort, similar to implementations you would find in Lisp or Scheme. This is not necessarily the most efficient implementation in Ruby, however. The functional implementation implies that lists are immutable, and as such, there is a lot of juggling by creating many new intermediate lists where a single mutable intermediate value would suffice. Ruby also does not (yet) benefit from the tail-recursion optimizations available in many other programming languages, so rewriting the recursive calls as iterative structures could provide many benefits.

Finally, in this article we extended the Array class directly. While this can be a useful technique that makes for very readable code, care must be taken to avoid namespace pollution and collisions. This is especially true of our helper methods. It is hoped that in the near future, Ruby will include a mechanism for namespace selection to prevent these problems.

Download code
Views