Cuckoo Hashing
From LiteratePrograms
Contents |
Cuckoo Hashing
Cuckoo hashing is a great way of making a hash table without resorting to chaining to resolve collisions. The basic idea is to use two or more hash functions for every key. This means that for every thing in the table there are two or more places in the table it could be.
Given here is a simple implementation of Cuckoo hashing using the Haskell programming language.
<<CuckooHashing>>= module Main where import Data.Array import Data.List import System.Random
Types, Eggs and Tables
Every thing in our hash table will store a value and a list of hash-functions used to place it in the table. For simplicity we're only going to insert whole numbers into the hash table. We'll call these things eggs.
<<CuckooHashing>>= type Value = Int type Index = Int type HashFunction = (Value -> Index) type Egg = (Value, [HashFunction])
The Cuckoo hashtable will store a lot of these eggs. We'll use an array to make the table and we'll say that every location may (or may not) contain an egg.
<<CuckooHashing>>= type CuckooTable = Array Index (Maybe Egg)
To make our empty table we make a big array which contains Nothing.
<<CuckooHashing>>= createCuckooTable :: Int -> CuckooTable createCuckooTable i = listArray (1, i) $ repeat Nothing
To make an egg we take a value and a list of hash functions. We repeat the hash functions in an endless cycle so we know the order to use them in.
<<CuckooHashing>>= createEgg :: Value -> [HashFunction] -> Egg createEgg v hs = (v, cycle hs)
Adding Eggs
When we insert an egg into our table we'll take the first hash function from the list inside the egg, and use it to find the place the egg belongs.
<<CuckooHashing>>= value :: Egg -> Value value = fst hashEgg :: Egg -> (Index, Egg) hashEgg egg@(v, h:hs) = (h v, (v,hs))
If the place the egg should go is empty then we'll place the egg there. If it isn't empty then we'll kick-out the egg currently in its place and place the new-egg there instead. We'll try and place the kicked-out egg using its next hash function and repeat this until either we find an empty slot, or we reach a limit on the number of kick-outs we allow ourselves to do.
If we go over the number of kick-outs we are allowed we will have an error.
<<CuckooHashing>>= type Error = String addEgg :: CuckooTable -> Egg -> Either Error CuckooTable addEgg = addEgg' kickOutsAllowed
We will make the kick-outs allowed a fixed number to make it simpler.
<<CuckooHashing>>= where kickOutsAllowed = 20
<<CuckooHashing>>= addEgg' :: Int -> CuckooTable -> Egg -> Either Error CuckooTable addEgg' 0 _ _ = Left "We have kicked-out too many eggs" addEgg' kickouts table egg = let (index, egg') = hashEgg egg place = table ! index in case place of
In the case the place is empty we'll just put the egg there.
<<CuckooHashing>>= Nothing -> Right $ table // [(index, Just egg')]
If there is another egg there and it has the same value as the egg we are trying to insert we can ignore it.
<<CuckooHashing>>= Just otherEgg -> if value egg == value otherEgg then Right table
If it isn't the same egg then we kick-out the other egg, and put our new egg in its place. We can try and add the other egg again, if our kickout limit isn't too low.
<<CuckooHashing>>= else let table' = table // [(index, Just egg')] in addEgg' (kickouts-1) table' otherEgg
Testing the Table
To show how well the Cuckoo hashtable works we will try add random numbers to the table and how many eggs we can add before we get hit our kick-out limit.
<<CuckooHashing>>= testTilFailure :: [Egg] -> CuckooTable -> Integer -> Integer testTilFailure (e:es) t n = case (addEgg t e) of Left err -> n Right t' -> testTilFailure es t' (1+n) testCuckooTable :: (RandomGen g) => [HashFunction] -> g -> Integer testCuckooTable hashes gen = let eggs = map (flip createEgg $ hashes) $ randoms gen in testTilFailure eggs (createCuckooTable tableSize) 0
To test the table we'll use a size of 8243: a prime number. We'll also make three hash functions for the testing.
<<CuckooHashing>>= tableSize = 8243 hash1 :: HashFunction hash1 x = 1+ ((58193 + 1258207*y + 2808193*y^2)`mod` 308383) `mod` tableSize where y = abs x hash2 :: HashFunction hash2 x = 1+ ((711929 + 1810057*y + 9767*y^2) `mod` 608459) `mod` tableSize where y = abs x hash3 :: HashFunction hash3 x = 1+ ((459257 + 808481*y + 2958259*y^2)`mod` 1358333) `mod` tableSize where y = abs x
We can run our tests like this. We'll repeat them a million times and take the average so we get good data.
<<CuckooHashing>>= testNumber = 1000000 average :: [Integer] -> Integer average ns = sum `div` length where sum = foldl1' (+) ns length = foldl' (\n _ -> 1+n) 0 ns main :: IO() main = do gen <- newStdGen let test = replicate testNumber $ testCuckooTable [hash1,hash2,hash3] gen print $ average test
Results
- Using three hash functions: 6076 insertions.
- Using two hash functions: 3993 insertions.
- Using one hash function: 95 insertions.
Download code |