Hash table (C)

From LiteratePrograms

Jump to: navigation, search
Other implementations: C | Java

This is an implementation of a hash table.

Contents

Description

This implementation uses strings as keys and maps them to a user specified void pointer. The size and the hash function are specified when creating the hash table. To handle hash collisions, each entry in the table is a pointer to a linked list, so that multiple values with the same hash can be stored.

Automatic resizing is not included in this implementation. A function for manual resizing is provided.

Structures and functions

HASHTBL is the primary reference to the hash table. It contains all information necessary to manipulate and look up the elements. All hash table functions expect a pointer to HASHTBL.

<<HASHTBL>>=
typedef struct hashtbl {
	hash_size size;
	struct hashnode_s **nodes;
	hash_size (*hashfunc)(const char *);
} HASHTBL;

The nodes member of HASHTBL is an array of pointers to the first element of a linked list. This element is represented by struct hashnode_s:

<<node>>=
struct hashnode_s {
	char *key;
	void *data;
	struct hashnode_s *next;
};

These are the functions used to manipulate and look up the hash table:

<<functions>>=
HASHTBL *hashtbl_create(hash_size size, hash_size (*hashfunc)(const char *));
void hashtbl_destroy(HASHTBL *hashtbl);
int hashtbl_insert(HASHTBL *hashtbl, const char *key, void *data);
int hashtbl_remove(HASHTBL *hashtbl, const char *key);
void *hashtbl_get(HASHTBL *hashtbl, const char *key);
int hashtbl_resize(HASHTBL *hashtbl, hash_size size);

The HASHTBL pointer returned from hashtbl_create() should be used as the first argument to all other functions. To release all allocated memory, hashtbl_destroy() should be called.

The memory pointed to by the data argument to hashtbl_insert() is managed by the user code, and will not be released by hashtbl_destroy().


<<hashtbl.h>>=
#ifndef HASHTBL_H_INCLUDE_GUARD
#define HASHTBL_H_INCLUDE_GUARD
#include<stdlib.h>
typedef size_t hash_size;
node
HASHTBL
functions
#endif

Implementation

<<hashtbl.c>>=
#include "hashtbl.h"
#include <string.h>
#include <stdio.h>
mystrdup
def_hash_func

Initialisation

hashtbl_create() sets up the initial structure of the hash table. The user specified size will be allocated and initialized to NULL. The user can also specify a hash function. If the hashfunc argument is NULL, a default hash function is used.

If an error occurred, NULL is returned. All other values in the returned HASHTBL pointer should be released with hashtbl_destroy().

<<hashtbl.c>>=
HASHTBL *hashtbl_create(hash_size size, hash_size (*hashfunc)(const char *))
{
	HASHTBL *hashtbl;
	if(!(hashtbl=malloc(sizeof(HASHTBL)))) return NULL;
	if(!(hashtbl->nodes=calloc(size, sizeof(struct hashnode_s*)))) {
		free(hashtbl);
		return NULL;
	}
	hashtbl->size=size;
	if(hashfunc) hashtbl->hashfunc=hashfunc;
	else hashtbl->hashfunc=def_hashfunc;
	return hashtbl;
}

Cleanup

The hashtbl_destroy() walks through the linked lists for each possible hash value, and releases the elements. It also releases the nodes array and the HASHTBL.

<<hashtbl.c>>=
void hashtbl_destroy(HASHTBL *hashtbl)
{
	hash_size n;
	struct hashnode_s *node, *oldnode;
	for(n=0; n<hashtbl->size; ++n) {
		node=hashtbl->nodes[n];
		while(node) {
			free(node->key);
			oldnode=node;
			node=node->next;
			free(oldnode);
		}
	}
	free(hashtbl->nodes);
	free(hashtbl);
}

Adding a new element

To make sure the hash value is not bigger than size, the result of the user provided hash function is used modulo size.

<<hashtbl.c>>=
int hashtbl_insert(HASHTBL *hashtbl, const char *key, void *data)
{
	struct hashnode_s *node;
	hash_size hash=hashtbl->hashfunc(key)%hashtbl->size;

To get the correct position in the nodes table, hash is used as an index.

We have to search the linked list to see if data with the same key has beeen inserted before, in which case we just replace the data member of the hashnode_s struct.

<<hashtbl.c>>=
/*	fprintf(stderr, "hashtbl_insert() key=%s, hash=%d, data=%s\n", key, hash, (char*)data);*/
	node=hashtbl->nodes[hash];
	while(node) {
		if(!strcmp(node->key, key)) {
			node->data=data;
			return 0;
		}
		node=node->next;
	}

If we didn't find the key, we insert a new element at the start of the linked list.-

<<hashtbl.c>>=
	if(!(node=malloc(sizeof(struct hashnode_s)))) return -1;
	if(!(node->key=mystrdup(key))) {
		free(node);
		return -1;
	}
	node->data=data;
	node->next=hashtbl->nodes[hash];
	hashtbl->nodes[hash]=node;

We use 0 to indicate success and -1 for error.

<<hashtbl.c>>=
	return 0;
}

Removing an element

To remove an element from the hash table, we just search for it in the linked list for that hash value, and remove it if it is found. If it was not found, it is an error and -1 is returned.

<<hashtbl.c>>=
int hashtbl_remove(HASHTBL *hashtbl, const char *key)
{
	struct hashnode_s *node, *prevnode=NULL;
	hash_size hash=hashtbl->hashfunc(key)%hashtbl->size;
	node=hashtbl->nodes[hash];
	while(node) {
		if(!strcmp(node->key, key)) {
			free(node->key);
			if(prevnode) prevnode->next=node->next;
			else hashtbl->nodes[hash]=node->next;
			free(node);
			return 0;
		}
		prevnode=node;
		node=node->next;
	}
	return -1;
}

Searching

Searching for an element is easy. We just search through the linked list for the corresponding hash value. NULL is returned if we didn't find it.

<<hashtbl.c>>=
void *hashtbl_get(HASHTBL *hashtbl, const char *key)
{
	struct hashnode_s *node;
	hash_size hash=hashtbl->hashfunc(key)%hashtbl->size;
/*	fprintf(stderr, "hashtbl_get() key=%s, hash=%d\n", key, hash);*/
	node=hashtbl->nodes[hash];
	while(node) {
		if(!strcmp(node->key, key)) return node->data;
		node=node->next;
	}
	return NULL;
}

Resizing

The number of elements in a hash table is not always known when creating the table. If the number of elements grows too large, it will seriously reduce the performance of most hash table operations. If the number of elements are reduced, the hash table will waste memory. That is why we provide a function for resizing the table.

Resizing a hash table is not as easy as a realloc(). All hash values must be recalculated and each element must be inserted into its new position.

We create a temporary HASHTBL object (newtbl) to be used while building the new hashes. This allows us to reuse hashtbl_inset() and hashtbl_remove(), when moving the elements to the new table. After that, we can just free the old table and copy the elements from newtbl to hashtbl.

<<hashtbl.c>>=
int hashtbl_resize(HASHTBL *hashtbl, hash_size size)
{
	HASHTBL newtbl;
	hash_size n;
	struct hashnode_s *node,*next;
	newtbl.size=size;
	newtbl.hashfunc=hashtbl->hashfunc;
	if(!(newtbl.nodes=calloc(size, sizeof(struct hashnode_s*)))) return -1;
	for(n=0; n<hashtbl->size; ++n) {
		for(node=hashtbl->nodes[n]; node; node=next) {
			next = node->next;
			hashtbl_insert(&newtbl, node->key, node->data);
			hashtbl_remove(hashtbl, node->key);
		}
	}
	free(hashtbl->nodes);
	hashtbl->size=newtbl.size;
	hashtbl->nodes=newtbl.nodes;
	return 0;
}

Support functions

In the hashtbl_insert() function, we used a function called mystrdup(). This is just an implementation of strdup() which is provided by many systems, but is not part of the standard C library.

<<mystrdup>>=
static char *mystrdup(const char *s)
{
	char *b;
	if(!(b=malloc(strlen(s)+1))) return NULL;
	strcpy(b, s);
	return b;
}

Default hash function

def_hashfunc() is the default used by hashtbl_create() when the user didn't specify one. This is a simple/naive hash function which adds the key's ASCII char values. It will probably generate lots of collisions on large hash tables.

<<def_hash_func>>=
static hash_size def_hashfunc(const char *key)
{
	hash_size hash=0;
	while(*key) hash+=(unsigned char)*key++;
	return hash;
}

Test

To test that it works, we create a hash table containing some European countries mapped to their respective capitals.

<<main.c>>=
#include"hashtbl.h"
#include<stdio.h>
int main()
{
	HASHTBL *hashtbl;
	char *spain, *italy;
	if(!(hashtbl=hashtbl_create(16, NULL))) {
		fprintf(stderr, "ERROR: hashtbl_create() failed\n");
		exit(EXIT_FAILURE);
	}
	hashtbl_insert(hashtbl, "France", "Paris");
	hashtbl_insert(hashtbl, "England", "London");
	hashtbl_insert(hashtbl, "Sweden", "Stockholm");
	hashtbl_insert(hashtbl, "Germany", "Berlin");
	hashtbl_insert(hashtbl, "Norway", "Oslo");
	hashtbl_insert(hashtbl, "Italy", "Rome");
	hashtbl_insert(hashtbl, "Spain", "Madrid");
	hashtbl_insert(hashtbl, "Estonia", "Tallinn");
	hashtbl_insert(hashtbl, "Netherlands", "Amsterdam");
	hashtbl_insert(hashtbl, "Ireland", "Dublin");
	printf("After insert:\n");
	italy=hashtbl_get(hashtbl, "Italy");
	printf("Italy: %s\n", italy?italy:"-");
	spain=hashtbl_get(hashtbl, "Spain");
	printf("Spain: %s\n", spain?spain:"-");

We use hashtbl_remove() on the "Spain" element, and use hashtbl_get() to ensure it has actually been removed.

<<main.c>>=
	hashtbl_remove(hashtbl, "Spain");
	printf("After remove:\n");
	spain=hashtbl_get(hashtbl, "Spain");
	printf("Spain: %s\n", spain?spain:"-");

To test resizing, we call hashtbl_resize(), and then hashtbl_get() to see if the hash table still works.

<<main.c>>=
	hashtbl_resize(hashtbl, 8);
	printf("After resize:\n");
	italy=hashtbl_get(hashtbl, "Italy");
	printf("Italy: %s\n", italy?italy:"-");
	hashtbl_destroy(hashtbl);
	return 0;
}

The output is, as expected:

After insert:
Italy: Rome
Spain: Madrid
After remove:
Spain: -
After resize:
Italy: Rome

Building

<<Makefile>>=
all: hashtbl

hashtbl: hashtbl.o main.o
	cc -o hashtbl -Wall -pedantic -g hashtbl.o main.o

hashtbl.o: hashtbl.c hashtbl.h
	cc -o hashtbl.o -Wall -pedantic -g -c hashtbl.c

main.o: main.c hashtbl.h
	cc -o main.o -Wall -pedantic -g -c main.c
Download code
Views
Personal tools