Depth-first search (Python)
From LiteratePrograms
Depth-first search is simple to implement in a procedural language like Python because it always expands the node most recently seen, making it ideal to implement with a stack (such as the procedural call stack).
Contents |
Graph representation
Since we'll need to traverse through all the children of each node we visit, we choose an adjacency-list representation for the graph, creating a list of successor vertices. Each vertex is initialized with its vertex data or label.
<<vertex class>>= class Vertex: def __init__(self, data): self.data = data self.successors = []
Depth first search
The search will take three parameters:
- A start node, indicating the first node to expand
- A goal predicate, which determines whether a given vertex is a goal vertex (a vertex on which the search stops successfully)
- A partial result stack, describing the path (sequence of vertices) that the search has followed so far
In outline, the search looks like this:
<<depth first search>>= def depthFirstSearch(start, isGoal, result): ensure we're not stuck in a cycle result.append(start) check if we've found the goal expand each child node in order, returning if we find the goal # No path was found result.pop() return False
To expand each child, we simply call depth first search recursively on each child in order. If any of them return true, the final path will be in result
and we also return True:
<<expand each child node in order, returning if we find the goal>>= for v in start.successors: if depthFirstSearch(v, isGoal, result): return True
As our terminating condition, we check if the start node is a goal vertex by applying the goal predicate to it. If so, we return True immediately and the result is propagated back up to the initial caller:
<<check if we've found the goal>>= if isGoal(start): return True
Finally, if the graph we're processing might contain directed cycles (it's not a directed acyclic graph), we need to ensure we don't get stuck in an infinite recursion following a cycle. To do this, we simply search the list of visited nodes in result
for the start node and return False immediately if we've followed a cycle:
<<ensure we're not stuck in a cycle>>= if start in result: return False
It's important of course that this be done before appending start
to result
. This completes the implementation.
Sample code
This small test driver demonstrates use of the search function on the Petersen graph, a small graph with 10 vertices and diameter 2. We'll see that the path found by depth-first search can be much longer than the minimum 2 vertices.
First we have the code that generates the graph as a simple vector of vertices, labelling the vertices with sequential integers:
<<generate Petersen graph>>= def petersenGraph(): v = [Vertex(i) for i in xrange(10)] add Petersen graph edges return v
Now we add the edges. We simply create a table of the edges and iterate over it. Note that since our data structure describes directed graphs, we need an edge in each direction for each edge of the graph:
<<add Petersen graph edges>>= edges = \ [(0,1), (1,0), (1,2), (2,1), (2,3), (3,2), (3,4), (4,3), (4,0), (0,4), (5,6), (6,5), (6,7), (7,6), (7,8), (8,7), (8,9), (9,8), (9,5), (5,9), (5,0), (0,5), (6,2), (2,6), (7,4), (4,7), (8,1), (1,8), (9,3), (3,9)] for a, b in edges: v[a].successors.append(v[b])
Now we can complete our sample:
<<main method>>= if __name__ == "__main__": v = petersenGraph() path = [] if depthFirstSearch(v[0], (lambda v: v.data == 7), path): print "Found path:", [u.data for u in path] else: print "No path found" <<vertex.py>>= vertex class depth first search generate Petersen graph main method
When run, this produces the following path:
Found path: [0, 1, 2, 3, 4, 7]
This is far from the optimal path, which is "0 4 7".
Improvements and tradeoffs
On large graphs, the cycle-checking step can become expensive, because the result list may become very long. One solution is to also maintain a set data structure holding the nodes visited so far. Another more common solution is to give each node a visited flag, which has the advantage that no node is ever expanded twice, but has the disadvantage that we need to make an initial pass to clear the flag on every node of the graph. Visited flags also make it difficult to make the algorithm threadsafe.
Another issue on large graphs is that depth-first search can easily form a very long call chain following a useless path. It's useful to include a threshold on how deep we want to search. By searching with incrementally larger thresholds, we get iterative deepening depth-first search, which finds much shorter paths in general than ordinary depth-first search.
Download code |