Cuckoo Hashing

From LiteratePrograms

Jump to: navigation, search

Contents

Cuckoo Hashing

Cuckoo hashing is a great way of making a hash table without resorting to chaining to resolve collisions. The basic idea is to use two or more hash functions for every key. This means that for every thing in the table there are two or more places in the table it could be.

Given here is a simple implementation of Cuckoo hashing using the Haskell programming language.

<<CuckooHashing>>=
module Main where
import Data.Array
import Data.List
import System.Random

Types, Eggs and Tables

Every thing in our hash table will store a value and a list of hash-functions used to place it in the table. For simplicity we're only going to insert whole numbers into the hash table. We'll call these things eggs.

<<CuckooHashing>>=
type Value        = Int
type Index        = Int
type HashFunction = (Value -> Index)
type Egg          = (Value, [HashFunction])

The Cuckoo hashtable will store a lot of these eggs. We'll use an array to make the table and we'll say that every location may (or may not) contain an egg.

<<CuckooHashing>>=
type CuckooTable = Array Index (Maybe Egg)

To make our empty table we make a big array which contains Nothing.

<<CuckooHashing>>=
createCuckooTable :: Int -> CuckooTable
createCuckooTable i = listArray (1, i) $ repeat Nothing

To make an egg we take a value and a list of hash functions. We repeat the hash functions in an endless cycle so we know the order to use them in.

<<CuckooHashing>>=
createEgg :: Value -> [HashFunction] -> Egg 
createEgg v hs = (v, cycle hs)

Adding Eggs

When we insert an egg into our table we'll take the first hash function from the list inside the egg, and use it to find the place the egg belongs.

<<CuckooHashing>>=
value :: Egg -> Value
value = fst
hashEgg :: Egg -> (Index, Egg)
hashEgg egg@(v, h:hs) = (h v, (v,hs))

If the place the egg should go is empty then we'll place the egg there. If it isn't empty then we'll kick-out the egg currently in its place and place the new-egg there instead. We'll try and place the kicked-out egg using its next hash function and repeat this until either we find an empty slot, or we reach a limit on the number of kick-outs we allow ourselves to do.

If we go over the number of kick-outs we are allowed we will have an error.

<<CuckooHashing>>=
type Error = String
addEgg :: CuckooTable -> Egg -> Either Error CuckooTable
addEgg = addEgg' kickOutsAllowed

We will make the kick-outs allowed a fixed number to make it simpler.

<<CuckooHashing>>=
  where
    kickOutsAllowed = 20
<<CuckooHashing>>=
addEgg' :: Int -> CuckooTable -> Egg -> Either Error CuckooTable
addEgg' 0 _ _              = Left "We have kicked-out too many eggs"
addEgg' kickouts table egg = let (index, egg') = hashEgg egg
                                 place         = table ! index
                              in case place of

In the case the place is empty we'll just put the egg there.

<<CuckooHashing>>=
                                Nothing       -> Right $ table // [(index, Just egg')]

If there is another egg there and it has the same value as the egg we are trying to insert we can ignore it.

<<CuckooHashing>>=
                                Just otherEgg -> if value egg == value otherEgg 
                                                 then Right table

If it isn't the same egg then we kick-out the other egg, and put our new egg in its place. We can try and add the other egg again, if our kickout limit isn't too low.

<<CuckooHashing>>=
                                                 else 
                                                   let table' = table // [(index, Just egg')]
                                                   in addEgg' (kickouts-1) table' otherEgg

Testing the Table

To show how well the Cuckoo hashtable works we will try add random numbers to the table and how many eggs we can add before we get hit our kick-out limit.

<<CuckooHashing>>=
testTilFailure :: [Egg] -> CuckooTable -> Integer -> Integer
testTilFailure (e:es) t n = 
  case (addEgg t e) of
    Left  err -> n
    Right t'  -> testTilFailure es t' (1+n)
testCuckooTable :: (RandomGen g) => [HashFunction] -> g -> Integer
testCuckooTable hashes gen =
  let eggs = map (flip createEgg $ hashes) $ randoms gen
  in testTilFailure eggs (createCuckooTable tableSize) 0

To test the table we'll use a size of 8243: a prime number. We'll also make three hash functions for the testing.

<<CuckooHashing>>=
tableSize = 8243
hash1 :: HashFunction
hash1 x = 1+ ((58193 + 1258207*y + 2808193*y^2)`mod` 308383) `mod` tableSize
  where y = abs x
hash2 :: HashFunction
hash2 x = 1+ ((711929 + 1810057*y + 9767*y^2)  `mod` 608459) `mod` tableSize
  where y = abs x
hash3 :: HashFunction
hash3 x = 1+ ((459257 + 808481*y + 2958259*y^2)`mod` 1358333) `mod` tableSize
  where y = abs x

We can run our tests like this. We'll repeat them a million times and take the average so we get good data.

<<CuckooHashing>>=
testNumber = 1000000
average :: [Integer] -> Integer
average ns = sum `div` length
  where
    sum    = foldl1' (+) ns
    length = foldl' (\n _ -> 1+n) 0 ns 
main :: IO()
main = do
  gen <- newStdGen
  let test = replicate testNumber $ testCuckooTable [hash1,hash2,hash3] gen
  print $ average test

Results

  • Using three hash functions: 6076 insertions.
  • Using two hash functions: 3993 insertions.
  • Using one hash function: 95 insertions.
Download code
Views