Parallel prefix (Python)
From LiteratePrograms
A quick sketch of monoidal and segmented monoidal parallel prefix operations. cf.
- Fast incremental regular expression matching with monoids
theory
Given an associative operator, ⊕, and an array of data a0, a1, ..., an, we can calculate the prefix sum in logarithmically many parallel steps, grouping pairwise as ((a0 ⊕ a1) ⊕ (a2 ⊕ a3)) ⊕ ..., instead of linearly, as (((a0 ⊕ a1) ⊕ a2) ⊕ a3) ⊕ ....
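To make the regrouping concrete, here is a small Python 3 sketch (illustrative only, not one of the literate chunks below) of a tree-shaped reduction: each round combines adjacent pairs, so n elements need only about log2(n) rounds, and every combination within a round is independent and could run in parallel:

```python
from functools import reduce

def tree_reduce(op, xs):
    # combine adjacent pairs, halving the list each round;
    # the pairwise ops within one round are independent (parallelizable)
    while len(xs) > 1:
        pairs = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:            # an odd leftover rides along to the next round
            pairs.append(xs[-1])
        xs = pairs
    return xs[0]

a = [0, 1, 2, 3, 4, 5, 6, 7]
print(reduce(lambda x, y: x + y, a))       # linear grouping: 28
print(tree_reduce(lambda x, y: x + y, a))  # tree grouping:   28
```

Associativity is exactly what licenses moving the parentheses from the linear grouping to the tree grouping.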
practice
Lacking a vectorized Python implementation, we provide a virtual vectorization:
<<vector op>>=
from itertools import imap
vect = imap
and then code the parallel prefix scan. Note the interplay between the pairwise application of the operators and the recursion. Instead of tackling the problem linearly, element by element, we are tackling it logarithmically, tackling pairs of subproblems with each recursive step.
<<scan>>=
def scan(op,a):
    print a
    if len(a) > 1:
        a[1::2] = vect(op,a[0::2],a[1::2])
        a[1::2] = scan(op,a[1::2])
        a[2::2] = vect(op,a[1::2],a[2::2])
    print a
    return a
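The chunk above is Python 2 (the print statement, itertools.imap). For readers who want to run it today, a hedged Python 3 transliteration, with the tracing prints dropped and the built-in map standing in for vect; it can be checked against itertools.accumulate:

```python
import operator

def scan(op, a):
    # odd positions receive the pairwise combinations, a recursive scan
    # completes those prefixes, and the even positions (from index 2 on)
    # then pick up the finished prefix to their left
    if len(a) > 1:
        a[1::2] = list(map(op, a[0::2], a[1::2]))
        a[1::2] = scan(op, a[1::2])
        a[2::2] = list(map(op, a[1::2], a[2::2]))
    return a

print(scan(operator.add, list(range(16))))
# [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120]
```

Note that map in Python 3 stops at the shorter of its argument slices, which is exactly what the odd-length case needs.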
wrapping up
Note that this technique works for arbitrary associative operators: addition, multiplication, string concatenation, etc.
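For instance, string concatenation is associative but not commutative, and max is associative too; both flow through the same scan unchanged. (A Python 3 sketch; the scan is repeated, minus its tracing prints, so the snippet stands alone.)

```python
import operator

def scan(op, a):
    # the logarithmic scan from above, minus the tracing prints
    if len(a) > 1:
        a[1::2] = list(map(op, a[0::2], a[1::2]))
        a[1::2] = scan(op, a[1::2])
        a[2::2] = list(map(op, a[1::2], a[2::2]))
    return a

print(scan(operator.add, list("hi world")))
# ['h', 'hi', 'hi ', 'hi w', 'hi wo', 'hi wor', 'hi worl', 'hi world']
print(scan(max, [3, 1, 4, 1, 5, 9, 2, 6]))
# [3, 3, 4, 4, 5, 9, 9, 9]
```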
<<parprefix.py>>=
<<vector op>>
<<scan>>
<<segmented>>
if __name__ == "__main__":
    import operator
    scan(operator.add,range(0,16))
    scan(operator.mul,range(1,17))
    scan(operator.add,list("hi world"))
and the pairwise operation of the scan is clear in the output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[1, 5, 9, 13, 17, 21, 25, 29]
[6, 22, 38, 54]
[28, 92]
[120]
[120]
[28, 120]
[6, 28, 66, 120]
[1, 6, 15, 28, 45, 66, 91, 120]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
[2, 12, 30, 56, 90, 132, 182, 240]
[24, 1680, 11880, 43680]
[40320, 518918400]
[20922789888000L]
[20922789888000L]
[40320, 20922789888000L]
[24, 40320, 479001600, 20922789888000L]
[2, 24, 720, 40320, 3628800, 479001600, 87178291200L, 20922789888000L]
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800L, 87178291200L, 1307674368000L, 20922789888000L]
['h', 'i', ' ', 'w', 'o', 'r', 'l', 'd']
['hi', ' w', 'or', 'ld']
['hi w', 'orld']
['hi world']
['hi world']
['hi w', 'hi world']
['hi', 'hi w', 'hi wor', 'hi world']
['h', 'hi', 'hi ', 'hi w', 'hi wo', 'hi wor', 'hi worl', 'hi world']
but wait, there's more
Prefix scans also support nested parallelism; adding the capability to "reset" the prefix doesn't affect the underlying associativity of the operator. By adding a boolean flag indicating the start of a subsequence:
<<segmented>>=
def segmented_sum(xps,yps):
    x,xf = xps
    y,yf = yps
    if yf:
        return (  y,yf)
    else:
        return (x+y,xf)
we can sum in parallel over partial prefixes, thus performing a parallel parallel prefix:
<<parprefix.py>>=
print [x for (x,f) in scan(segmented_sum,zip([7,6,5,4,3,2,1,0],[1,0,1,0,0,1,0,0]))]
Essentially, the flag provides a barrier preventing the scan from propagating:
[(7, 1), (6, 0), (5, 1), (4, 0), (3, 0), (2, 1), (1, 0), (0, 0)]
[(13, 1), (9, 1), (2, 1), (1, 0)]
[(9, 1), (3, 1)]
[(3, 1)]
[(3, 1)]
[(9, 1), (3, 1)]
[(13, 1), (9, 1), (2, 1), (3, 1)]
[(7, 1), (13, 1), (5, 1), (9, 1), (12, 1), (2, 1), (3, 1), (3, 1)]
[7, 13, 5, 9, 12, 2, 3, 3]
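To double-check the barrier behaviour (again as a Python 3 sketch), the segmented scan can be compared against a straightforward sequential pass that resets its running sum at every flag:

```python
def scan(op, a):
    # the logarithmic scan from above, minus the tracing prints
    if len(a) > 1:
        a[1::2] = list(map(op, a[0::2], a[1::2]))
        a[1::2] = scan(op, a[1::2])
        a[2::2] = list(map(op, a[1::2], a[2::2]))
    return a

def segmented_sum(xps, yps):
    x, xf = xps
    y, yf = yps
    return (y, yf) if yf else (x + y, xf)

xs    = [7, 6, 5, 4, 3, 2, 1, 0]
flags = [1, 0, 1, 0, 0, 1, 0, 0]   # 1 marks the start of a segment

par = [x for (x, f) in scan(segmented_sum, list(zip(xs, flags)))]

# sequential reference: reset the running sum whenever a flag is set
seq, total = [], 0
for x, f in zip(xs, flags):
    total = x if f else total + x
    seq.append(total)

print(par)         # [7, 13, 5, 9, 12, 2, 3, 3]
print(par == seq)  # True
```

The three segments [7,6], [5,4,3], [2,1,0] each get an independent prefix sum, exactly as the flattened output above shows.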
Question: How does the nested parallelism of a segmented scan compare to the vectorized Sieve of Eratosthenes (Bash)?