Binary search tree (C)

From LiteratePrograms

Jump to: navigation, search
Other implementations: C | C++ | Haskell | Java | Scheme | Standard ML

This article implements a simple binary search tree data structure in C with search, insert, and delete operations. A binary search tree is a tree where each node contains a value, and for each node, the left subtree only contains values less than the value of the node, and the right subtree only contains greater values. This allows to find the node for any value in O(logN) average time.

The binary search tree shall work with any data type, therefore the code takes pointers to void for the data, and relies on an user-supplied comparison function for sorting.

Contents

Node structure

We begin by defining a node structure, which contains left and right child pointers and a pointer to void referring to the node's data:

<<node structure>>=
struct bst_node {
    void* data;
    struct bst_node* left;
    struct bst_node* right;
};

We represent an entire tree using a pointer to its root node; thus this structure doubles as a tree representation. Constructing and destroying nodes is straightforward. Newly constructed nodes initially are leaf nodes, i.e. they have no left or right child:

<<declarations>>=
struct bst_node* new_node(void* data);
<<node constructor>>=
struct bst_node* new_node(void* data) {
    struct bst_node* result = malloc(sizeof(struct bst_node));
    result->data = data;
    result->left = result->right = NULL;
    return result;
}

NULL requires stdlib.h:

<<header files>>=
#include <stdlib.h>

On destroying nodes, it is assumed that the node has already been removed from the tree. Especially there are no subtrees to destroy. Moreover, this search tree implementation doesn't take over ownership of the data values, therefore the data pointer doesn't have to be destroyed either.

<<declarations>>=
void free_node(struct bst_node* node);
<<node destructor>>=
void free_node(struct bst_node* node) {
    free(node);
}

Since our data is generic, the comparison function is supplied by the user via a function pointer and must be the same for all operations. We choose to use a three-way comparator:

<<type definitions>>=
/* Returns negative (left<right), zero (left==right), or positive (left>right). */
typedef int comparator(void* left, void* right);

Search

The most important operation on the tree is searching, which uses the properties of the binary search tree to traverse from the root down towards a given node. In the end it will either find the requested node or prove that it is not in the tree, in which case it returns a pointer to NULL. We return the address of the node pointer instead of just the node pointer to facilitate in-place updates.

<<declarations>>=
struct bst_node** search(struct bst_node** root, comparator compare, void* data);
<<search operation>>=
struct bst_node** search(struct bst_node** root, comparator compare, void* data) {
    struct bst_node** node = root;
    while (*node != NULL) {
        int compare_result = compare(data, (*node)->data);
        if (compare_result < 0)
            node = &(*node)->left;
        else if (compare_result > 0)
            node = &(*node)->right;
        else
            break;
    }
    return node;
}

Insert

Inserting is now simple. We search for the node, and if it is not found, we create a new node and put it in the place the search function expected to find it, which is where it belongs in the tree. This particular insertion function does not allow duplicate values in the tree (set semantics), but one that does would be similar.

<<declarations>>=
void insert(struct bst_node** root, comparator compare, void* data);
<<insert operation>>=
void insert(struct bst_node** root, comparator compare, void* data) {
    struct bst_node** node = search(root, compare, data);
    if (*node == NULL) {
        *node = new_node(data);
    }
}

Deletion

Deletion is the most complicated operation. We focus on deleting nodes rather than values, since search() can be used to find a node based on a value if necessary. Deleting a node with one child or less is simple; we just replace it with its child:

<<declarations>>=
void delete(struct bst_node** node);
<<delete operation>>=
void delete(struct bst_node** node) {
    struct bst_node* old_node = *node;
    if ((*node)->left == NULL) {
        *node = (*node)->right;
        free_node(old_node);
    } else if ((*node)->right == NULL) {
        *node = (*node)->left;
        free_node(old_node);
    } else {
        delete node with two children
    }
}

A node's inorder predecessor is the node whose value immediately precedes the node's value in sorted order. We can find this node by following right child pointers repeatedly starting at the node's left child. Similarly, the inorder successor immediately follows it in sorted order, and can be found by following left child pointers starting at the node's right child.

<<find inorder predecessor>>=
struct bst_node** pred = &(*node)->left;
while ((*pred)->right != NULL) {
    pred = &(*pred)->right;
}

To delete a node with two children, as shown below, we simply exchange the value of the node with either its inorder predecessor or successor, then delete the node that we exchanged with. This preserves the order properties of the tree, and the predecessor/successor nodes always have at most one child, so we already know how to delete them. In this implementation, we choose to consistently use the predecessor.

<<delete node with two children>>=
find inorder predecessor
/* Swap values */
void* temp = (*pred)->data;
(*pred)->data = (*node)->data;
(*node)->data = temp;
delete(pred);

Files

We can place this implementation in a source and header file for reuse:

<<bst.h>>=
#ifndef _BST_H_
type definitions
node structure
declarations
#endif
<<bst.c>>=
header files
#include "bst.h"
search operation
insert operation
delete operation

This completes the implementation.

Views