Betti numbers (Python)
From LiteratePrograms
This article is an exercise inspired by SIGFPE's exploration of Algebraic Topology in Haskell. Please feel free to expand upon it.
Contents |
theory
The Betti numbers are a topological invariant that allow us to measure either the number of holes of various dimensions present in a structure, or equivalently, the number of times the structure loops back upon itself.
practice
boundaries
The boundary of a n-simplex consists of a chain of oriented (n-1)-simplices. We represent these chains as a dictionary of orientations keyed by the simplices.
- matrix constructs an outer product, connecting each basis element in b0 to a basis element in b1
- boundary is a straightforward translation of
- dmatrix builds the matrix connecting the boundary of the n-simplices in a complex to the (n-1)-simplices.
<<boundary>>= matrix = lambda b1,b0,op: [[op(x,y) for x in b1] for y in b0] boundary = lambda s: dict([(tuple(s[:i]+s[i+1:]), (-1)**i) for i in range(len(s))]) dmatrix = lambda c,n: matrix(grade(c,n), (grade(c,n-1) or [[None]]), lambda x,y: boundary(x).get(tuple(y),0))
The boundary of any n-simplex is an n-1 dimensional cycle.
Betti numbers
However, there may also be cycles which do not arise as a boundary of higher-dimensional elements of the complex. (a topological space contains all unions of its subsets. Complexes may be missing some unions (alternatively, containing albeit with multiplicity 0), and it is the presence of these holes (alternatively, the absence of connectivity), which the Betti numbers measure)
This is also a straightforward calculation comparing the kernel of each with the image of — any rank deficiency in the -matrices contributes to the final result.
<<betti numbers>>= def betti(c): d = ctdim(c) ranks = array([rank(dmatrix(c,n)) for n in range(d)]+[0]) dims = array([ len(grade(c,n)) for n in range(d)]+[0]) return dims[:-1] - ranks[:-1] - ranks[1:]
At 0-dimensions, the Betti number counts connected components, and higher values count the n-holes. These are topologically invariant which means that if one changes the mesh without altering its connectivity, the Betti numbers remain unchanged.
To test this, we include some barycentric subdivision code. The new members of a simplex in the subdivision are formed by combining the existing vertices. In a geometric context, this would be done using affine coordinates, here we simply pack old vertices into tuples to form new vertices. Each new element of the simplex forms a chain, consisting of one old vertex and ascending to the largest combination of vertices in that dimension: eg. [0, (0,1), (0,1,2)] might be a typical 2-simplex in a subdivision.
<<barycentric subdivision>>= def bary(c): def chains(xs): if len(xs) < 2: return [xs] else: return [cs+[tuple(xs)] for ps in subsets(xs)[:-1] for cs in chains(ps)] return reduce(concat,map(chains,c))
The Haskell implementation is notable here for the use of the List monad to succinctly generate multiple values at each dimension. Not having monadic dos in Python, we must make the reduce(concat,map(...,...))
explicit.
wrapping up
We will require a few supporting definitions.
miscellanea
Following Iverson, we use the binary expansion to generate masks for all possible subsets of d items, at which point subset is simply a selection with each of those masks. Being in counting order, the result always starts with the empty subset and ends with the improper subset.
<<ancillary>>= binary = lambda d: (lambda a: (a[0]/(2**a[1])) % 2)(indices((2**d,d))) subsets = lambda l: [list(repeat(l,r)) for r in binary(len(l))]
We are lazy, and outsource rank (the mathematical rank, ignoring dependent vectors, not the physical size of the matrix) to the LinearAlgebra module. grade picks only the simplices of a given dimension from the complex, and ctdim computes the dimension of the simplex as a whole. (we follow the category theorists here, and hence are off-by-one from the topologists' view of dimension)
<<ancillary>>= rank = lambda m: sum(singular_value_decomposition(array(m))[1] > 1e-12) grade = lambda c, n: [s for s in c if len(s)==n+1] ctdim = lambda c: maximum.reduce(map(len,c))
A few extra definitions:
- the n-ball — completely connected, but ignoring the single empty element, at (topologists') dimension -1.
- the n-sphere — as above, but with a hollow interior (missing the maximal simplex). Again, we take advantage of the ordering of the binary subsets to trim instead of search.
- the euler number, calculated the hard way as the alternating sum of betti coefficients.
<<supplements>>= ball = lambda n: subsets(range(n+1))[1:] sphere = lambda n: subsets(range(n+2))[1:-1] euler = lambda c: (lambda b: sum(b[0::2])-sum(b[1::2]))(betti(c))
a module
Finally we put together a module, using examples and problems from the handout mentioned by SIGFPE as test vectors.
<<betti.py>>= from Numeric import * from LinearAlgebra import singular_value_decomposition from operator import add as concat # after http://sigfpe.blogspot.com/2006/08/algebraic-topology-in-haskell.html ancillary boundary betti numbers barycentric subdivision supplements if __name__ == "__main__": # examples from http://www.reed.edu/~davidp/411/handouts/simplicial.pdf example = [ [1,2,3], [1,2],[1,3],[2,3],[2,4],[3,4], [1],[2],[3],[4],[5]] print "example.", betti(example) print "exercise 1.", betti(example[1:]) point = [[0]] line = [[0],[1],[0,1]] tri = ball(2) torus = [[1],[2],[3],[4],[5],[6],[7],[8],[9], [1,2],[2,3],[1,3], [5,9],[8,9],[5,8], [4,6],[6,7],[4,7], [1,5],[4,5],[1,4], [2,9],[6,9],[2,6], [3,8],[7,8],[3,7], [1,9],[3,9],[1,8], [4,9],[6,8],[5,7], [1,6],[2,7],[1,7], [1,5,9],[1,2,9],[2,3,9],[3,8,9],[1,3,8],[1,5,8], [4,5,9],[4,6,9],[6,8,9],[6,7,8],[5,7,8],[4,5,7], [1,4,6],[1,2,6],[2,6,7],[2,3,7],[1,3,7],[1,4,7]] example2 = [[1],[2],[3],[4],[5],[6], [1,4],[1,6],[1,5],[1,3], [3,4],[2,3],[3,5], [2,4],[4,6], [2,5],[2,6],[5,6], [1,4,6],[2,4,6],[2,3,4],[2,3,5],[1,3,5],[1,5,6],[2,5,6],[1,3,4]] print "exercise 5.", betti(example2), betti(sphere(2)) print "point", betti(point) print "line", betti(line) print "3-sphere", betti(sphere(3)) print "0-sphere", betti(sphere(0)) print "exercise 6.", betti(torus) print "bary(point), point", betti(bary(point)), betti(point) print "bary(line), line", betti(bary(line)), betti(line) print "bary(tri), tri", betti(bary(tri)), betti(tri) btorus=bary(torus) print "size of bary(torus)", len(btorus) print "bary(torus), torus", betti(btorus), betti(torus) print "euler(2-sphere)", euler(sphere(2)) print "euler(torus)", euler(torus)
and, after waiting a second or so for the subdivided torus, it yields the following.
example. [2 1 0] exercise 1. [2 2] exercise 5. [1 0 1] [1 0 1] point [1] line [1 0] 3-sphere [1 0 0 1] 0-sphere [2] exercise 6. [1 2 1] bary(point), point [1] [1] bary(line), line [1 0] [1 0] bary(tri), tri [1 0 0] [1 0 0] size of bary(torus) 324 bary(torus), torus [1 2 1] [1 2 1] euler(2-sphere) 2 euler(torus) 0
Not exactly fast enough for production datasets, but not bad for roughly 30 lines of definitions.