# Kruskal's algorithm (Pascal)

This article describes the implementation of a Minimum Spanning Tree graph algorithm implemented in Pascal.

Kruskal's algorithm is a well-known technique for finding the minimum-cost spanning tree of a graph. This report develops a simple implementation using Pascal, and exhibits the techniques of literate programming, modularization, and robustness.

## Algorithm Description

Kruskal's algorithm works as follows. Initially, the N vertices of a graph G are in sets by themselves (I.E. singleton sets). The edges of the graph are sorted by their weights, in non-decreasing order. The algorithm then examines one edge at a time, and makes a greedy choice as to whether or not the edge should be included in the spanning tree. The idea is that, initially, all N vertices are in trees by themselves, and so adding an edge between two such trees could not introduce a cycle. But if the edge connected two vertices belonging to the same tree, then a cycle would result. Assuming that the edge would not introduce a cycle, it is then added to the spanning tree, and the sets of the two vertices (U,V) constituting the edge are merged.

## Outline of Implementation

The literate approach will be used to develop the implementation, but before that, an idea of the general concepts is in order. The implementation will not sacrifice efficiency, yet at the same time be easy to understand and robust for the user. We'll rely on arrays instead of lists, to make some operations on sets faster, and to simplify the implementation. This will necessitate that the vertices be represented by numbers, and must not exceed a compile-time constant. Since the algorithm simply requires outputting the spanning tree, and the algorithm always makes correct choices (I.E. it never backtracks), we will simply output the constituents of the tree instead of frivolously storing them in an elaborate data structure. The code was compiled and tested with Free Pascal 2.00. Aho et al. provide a faster implementation, but it requires the use of trees, and path-compression: while these are simple ideas, the implementation of the tree algorithms would dominate the development, and so the simplified linked-list representation was chosen.

## The Merge-Find Set

Obviously, an important data structure used in the algorithm is the disjoint-set data structure, which allows us to attain the set to which a vertex is a member, and to join two of these sets into one. There should also be a way to initialize the data structure, so that all vertices are in a set of their own. Since our implementation will be based on arrays, it seems reasonable to represent the disjoint sets as integers, and also to represent the vertices as integers. This leads us to the following outline of our merge-find set:

```<<ModSet.pas>>=
unit ModSet;{\$MODE DELPHI}{\$r+}
interface
const
nVertices = 10;
procedure InitFirst (n : integer);
procedure Merge (a, b : integer);
function Find (x : integer) : integer;
implementation
ModSetImplementation
```

There are several ways to implement this data structure. The most basic would be to simply use an array, where element i contains the number of the set to which i belongs. This would not easily facilitate the merge operation, as the entire array must be scanned to find the elements to move. Our approach is almost as simple (and closely follows that proposed by Aho et al., 1984); it elegantly solves this problem, and is asymptotically faster. The idea is to link the elements belonging to a set, and to have header elements pointing to the first member of a set. In order that the merge operation moves as few elements as possible, a count of the number of elements in the sets is kept as well, and the smaller set is always merged into the larger.

```<<ModSetImplementation>>=
type
MFSetRec = record
count : integer;
firstElement : integer;
end;
names : array[1..nVertices] of record
setName : integer;
nextElement : integer;
end;
end;
var
MFSet : MFSetRec;
ModSetProcedures
begin
end.
```

We'll now develop the implementation of the operations on the MFSet. One of the simplest operations is to initialize a new set to contain just one element. It will be called by the interface procedure InitFirst. The idea is simple: set the name of the set to which the supplied element, will belong, and make its following element null (represented by 0). The header of the new set must then be modified to resemble the situation: its element count should be 1, and its first element should (of course) be the element just inserted.

```<<ModSetProcedures>>=
procedure Initial (a, x : integer);
begin
MFSet.names[x].setName := a;
MFSet.names[x].nextElement := 0;
end;
```

Another trivial operation is Find: since the elements all have fields representing the set to which they belong, this number can be returned immediately:

```<<ModSetProcedures>>=
function Find (x : integer) : integer;
begin
find := MFSet.names[x].setName
end;
```

Let us now deal with Merge. The point of keeping an accurate count of set cardinality was so the smaller set could always be merged into the larger. The first thing our Merge operation will do, then, is to make sure that the bigger set is represented by a and the smaller by b. The elements of b are then successively moved into set a, the two sets are linked, and the cardinality of the set that remains is updated to include the elements that were just merged.

```<<ModSetProcedures>>=
procedure Merge (a, b : integer);
var
i, temp : integer;
begin
begin temp := a; a := b; b := temp end;
while MFSet.names[i].nextElement <> 0 do
begin
MFSet.names[i].setName := a;
i := MFSet.names[i].nextElement
end;
MFSet.names[i].setName := a;
end;
```

The idea is that, once the main program knows how many vertices belong in the graph, it will first initialize the MFSet, and then call various Merge operations. The initialization of the merge-find set simply relies on the prior Initial procedure:

```<<ModSetProcedures>>=
procedure InitFirst (n : integer);
var i : integer;
begin
for i := 1 to n do
Initial (i, i)
end;
```

## Edge Storage

Instead of using a traditional graph data structure (such as an adjacency list or adjacency matrix), my idea is simply to store the edges, entered by the user, into an array, and couple the two vertices and the associated weight into the same record. This would be effectively useless for most graph algorithms, but is practically perfect for our purposes. The reason is that the only operations that must be performed on edges is sorting them according to weight, and examining the edges in this order. The module should also provide mechanisms for returning the number of edges, and also the highest vertex number that was used in an edge (for further communication with ModSet. I provide a SortEdges procedure, which should be called after all edges have been added. This decision is arbitrary--alternatively the new edges could have been inserted in sorted order in the first place. Regardless, the structure of the module is clear:

```<<ModEdges.pas>>=
unit ModEdges;{\$MODE DELPHI}{\$r+}
interface
const nEdges = 1024;
procedure AddEdge (a, b, w : integer);
function GetEdgeU (i : integer) : integer;
function GetEdgeV (i : integer) : integer;
function EdgeCount : integer;
function HighestVertex : integer;
procedure SortEdges;
implementation
ModEdgesImplementation
```

The implementation will mainly consist of the edge type declaration, and the afforementioned functions of the interface:

```<<ModEdgesImplementation>>=
EdgeType
ModEdgesProcedures
begin
numEdges := 0
end.
```

As described previously, the edge data structure will simply be an array, consisting of records with three fields: the two vertices of the edge, and the associated weight.

```<<EdgeType>>=
type
edge = record
u, v, weight : integer;
end;
edgesRec = array[1..nEdges] of edge;
var
edges : edgesRec;
numEdges, vHighest : integer;
```

There is nothing surprising about the procedures below, and so little comment is necessary. Insertion sort was chosen for the sorting algorithm, and operations on edges have been split into two complimentary parts so that integers can be passed instead of edges, removing the dependency on the chosen edge representation from other modules.

```<<ModEdgesProcedures>>=
procedure AddEdge (a, b, w : integer);
begin
numEdges := numEdges + 1;
if a > vHighest then vHighest := a;
if b > vHighest then vHighest := b;
edges[numEdges].u := a;
edges[numEdges].v := b;
edges[numEdges].weight := w
end;
function GetEdgeU (i : integer) : integer;
begin
GetEdgeU := edges[i].u
end;
function GetEdgeV (i : integer) : integer;
begin
GetEdgeV := edges[i].v
end;
function EdgeCount : integer;
begin
EdgeCount := NumEdges
end;
procedure SortEdges;
var i, j : integer;
key : edge;
begin
for j := 2 to EdgeCount do
begin
key := edges[j];
i := j - 1;
while (i > 0) and (edges[i].weight > key.weight) do
begin
edges[i+1] := edges[i];
i := i - 1
end;
edges[i+1] := key
end
end;
function HighestVertex : integer;
begin
HighestVertex := vHighest
end;
```

## Obtaining Input

We'll develop a simple module for handling the input to the program. The input will consist of triples of integers: the first integer represents the first vertex of an edge, the second integer represents the second vertex, and the third integer represents the weight of the edge. The input module should read such triples until EOF, and pass the information to ModEdges for storage. Note that if erroneous input is entered, then Freepascal will raise an exception, which we'll catch in the mainline.

```<<ModInput.pas>>=
unit ModInput;{\$MODE DELPHI}{\$r+}
interface
implementation
uses ModEdges;
var u, v, w : integer;
begin
while not eof do
begin
end
end;
begin
end.
```

## The Mainline

All of the necessary mechanisms are now in place, and so the main routine of the algorithm is now quite simple. It does house the essential step of Kruskal's algorithm though: if two vertices are in disjoint sets, then the edge should be added to the spanning tree (or, in our case, be output on the screen), and the two sets should be merged. Note how try...except blocks are present to catch any exception errors that could be thrown by supporting modules. The input module could throw input errors, the edge-storage module could throw array access errors, and the MFSet module could throw errors related to index violations during set operations.

```<<Mainline.pas>>=
program kruskal (input, output);{\$MODE DELPHI}{\$r+}
uses sysutils, ModSet, ModEdges, ModInput;
var i, u, v : integer;
begin
try
except
exit
end;
try
InitFirst (HighestVertex);
except
writeln ('Error initializing vertices');
exit
end;
SortEdges;
for i := 1 to EdgeCount do
begin
u := GetEdgeU (i);
v := GetEdgeV (i);
if Find(u) <> Find (V) then
begin
writeln (u, '->', v);
Merge (find (u), Find (v))
end
end
end.
```