# Binary search tree (Standard ML)

Other implementations: C | C++ | Haskell | Java | Scheme | Standard ML

This article implements a simple binary search tree data structure in Standard ML with search, insert, and delete operations.

## Node structure

We begin by defining a polymorphic recursive datatype to represent a binary tree of any desired type:

```<<tree datatype>>=
datatype 'a tree = Node of {data:'a, left:'a tree, right:'a tree}
| Leaf
```

Here, "Leaf" nodes do not contain any data or references and merely serve to indicate where the tree ends. They function much like NULL values in a C implementation. An example instance is this int tree containing 1, 2, and 3 with 2 at the root:

```Node{data=2, left=Node{data=1,left=Leaf,right=Leaf},
right=Node{data=3,left=Leaf,right=Leaf}};
```

For each operation, the user will have to supply a comparison function of type `'a * 'a -> order`, which takes two data elements and determines whether the first is less, greater, or equal to the second by returning `LESS`, `GREATER`, or `EQUAL`. The built-in types already have functions like this, such as `Int.compare` and `String.compare`.

## Search

The most important operation on the tree is searching, which uses the properties of the binary search tree to traverse from the root down towards a given node. In the end it will either find the requested node and return `SOME value`, where value is the stored value, or prove that it is not in the tree, in which case it returns `NONE`. These values have type α option. We will have a case for leaves and a case for internal nodes, and use a nested function to avoid unnecessary argument passing.

```<<search operation>>=
fun search(tree : 'a tree, compare, data : 'a) =
let fun s(Leaf) = NONE
| s(Node{data=nodedata,left=left,right=right}) =
(case compare(data, nodedata) of
LESS    => s(left)
| GREATER => s(right)
| EQUAL   => SOME nodedata)
in
s(tree)
end
```

This function is primarily useful in two cases: if you want to know if a value is in the tree, as when using it as a set data structure, or when using a comparator that only uses part of the value, such as the first index of a tuple. We specify some types above because otherwise the type assigned for this function by type inference would be slightly more general than we intended:

```val search = fn : 'a tree * ('b * 'a -> order) * 'b -> 'a option
```

## Insert

Inserting is very similar to searching. Unlike in a procedural language, we can't literally update a child pointer to insert the node, because these references are immutable. Instead, we will create a new version of its parent node which points to the new child. But then since we've modified the parent, we have to create a new version of its parent, and so on up to the root. The effect is that the insertion operation is persistent, meaning that both the old and new state are available through two different root references (but these two versions share most of their state).

```<<insert operation>>=
fun insert(tree : 'a tree, compare, data : 'a) =
let fun i(Leaf) = Node{data=data, left=Leaf, right=Leaf}
| i(Node{data=nodedata,left=left,right=right}) =
(case compare(data, nodedata) of
LESS    => Node{data=nodedata, left=i(left), right=right}
| GREATER => Node{data=nodedata, left=left, right=i(right)}
| EQUAL   => Node{data=nodedata, left=left, right=right})
in
i(tree)
end
```

This is easiest to understand in an inductive way: if we take a tree and add a new node to the correct child subtree, then build a new root node from the updated subtree and the old version of the other child subtree, the result will be a new tree with the node correctly inserted.

This particular implementation does not allow duplicate values in the tree, but simply modifying the "EQUAL" branch above to look like the "LESS" branch would allow this.

## Deletion

Deletion is the most complicated operation, and is also a persistent operation. We begin by locating the subtree rooted at the node we wish to delete, in a manner very similar to the preceding functions, then invoke a `deleteNode()` helper function on it:

```<<delete operation>>=
fun delete(tree : 'a tree, compare, data : 'a) =
let maximum helpers
deleteNode helper function
| d(node as Node{data=nodedata,left=left,right=right}) =
(case compare(data, nodedata) of
LESS    => Node{data=nodedata,left=d(left),right=right}
| GREATER => Node{data=nodedata,left=left,right=d(right)}
| EQUAL   => deleteNode(node))
in
d(tree)
end
```

Deleting a node with no children is easy: we just replace it with a leaf. If it has one child, we just replace it with its child:

```<<deleteNode helper function>>=
and deleteNode(Node{data=nodedata,left=Leaf,right=Leaf})  = Leaf
| deleteNode(Node{data=nodedata,left=Leaf,right=right}) = right
| deleteNode(Node{data=nodedata,left=left,right=Leaf})  = left
| two children case
```

We have no branch for the Leaf case, which should not occur. Some Standard ML compilers will warn about this. We also use "and" instead of "fun" because it's mutually recursive with deleteMax, described later, which precedes it lexically.

A node's inorder predecessor is the node whose value immediately precedes the node's value in sorted order. We can find this node by simply finding the maximum node in the node's left subtree. Similarly, the inorder successor immediately follows it in sorted order, and can be found by finding the minimum node in the node's left subtree. We add two helper functions, one to get the data from the maximum node of a subtree, and one to remove the maximum node, which can safely invoke `deleteNode()` because the maximum node always has at most one child:

```<<maximum helpers>>=
fun valueMax(Node{data=nodedata,right=Leaf,...}) = nodedata
| valueMax(Node{right=right,...})              = valueMax(right)
fun deleteMax(node as Node{data=nodedata,left=left,right=Leaf}) =
deleteNode(node)
| deleteMax(Node{data=nodedata,left=left,right=right}) =
Node{data=nodedata,left=left,right=deleteMax(right)}
```

To delete a node with two children, as shown below, we simply replace the value of the node with that of its inorder predecessor, simultaneously deleting the predecessor. This preserves the order properties of the tree. One can similarly use the successor.

```<<two children case>>=
deleteNode(Node{data=nodedata,left=left,right=right}) =
Node{data=valueMax(left), left=deleteMax(left), right=right}
```

## Files

We can simply place all this code into an .sml file, which when loaded will add the above declarations to the top-level (for now we won't bother with a module):

```<<bst.sml>>=
tree datatype
search operation
insert operation
delete operation
```

This completes the implementation.