Parallel prefix (Python)

From LiteratePrograms

Jump to: navigation, search

A quick sketch of monoidal and segmented monoidal parallel prefix operations. cf.

  • Fast incremental regular expression matching with monoids,

Contents

theory

Given an associative operator, , and an array of data a0,a1,...,an, we can calculate the prefix sum logarithmically, as , instead of linearly, as .

cf

practice

Lacking a vectorized python implementation, we provide a virtual vectorization:

<<vector op>>=
from itertools import imap
vect = imap

and then code the parallel prefix scan. Note the interplay between the pairwise application of the operators and the recursion. Instead of tackling the problem linearly, element by element, we are tackling it logarithmically, tackling pairs of subproblems with each recursive step.

<<scan>>=
def scan(op,a):
        print a
        if len(a) > 1:
                a[1::2] = vect(op,a[0::2],a[1::2])
                a[1::2] = scan(op,a[1::2])
                a[2::2] = vect(op,a[1::2],a[2::2])
        print a
        return a

wrapping up

Note that this technique works for arbitrary associative operators: addition, multiplication, string concatenation, etc.

<<parprefix.py>>=
vector op
scan
segmented
if __name__ == "__main__":
        import operator
        scan(operator.add,range(0,16))
        scan(operator.mul,range(1,17))
        scan(operator.add,list("hi world"))

and the pairwise operation of the scan is clear in the output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
[1, 5, 9, 13, 17, 21, 25, 29]
[6, 22, 38, 54]
[28, 92]
[120]
[120]
[28, 120]
[6, 28, 66, 120]
[1, 6, 15, 28, 45, 66, 91, 120]
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
[2, 12, 30, 56, 90, 132, 182, 240]
[24, 1680, 11880, 43680]
[40320, 518918400]
[20922789888000L]
[20922789888000L]
[40320, 20922789888000L]
[24, 40320, 479001600, 20922789888000L]
[2, 24, 720, 40320, 3628800, 479001600, 87178291200L, 20922789888000L]
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800L, 87178291200L, 1307674368000L, 20922789888000L]
['h', 'i', ' ', 'w', 'o', 'r', 'l', 'd']
['hi', ' w', 'or', 'ld']
['hi w', 'orld']
['hi world']
['hi world']
['hi w', 'hi world']
['hi', 'hi w', 'hi wor', 'hi world']
['h', 'hi', 'hi ', 'hi w', 'hi wo', 'hi wor', 'hi worl', 'hi world']

but wait, there's more

Prefix scans also support nested parallelism; adding the capability to "reset" the prefix doesn't affect the underlying associativity of the operator. By adding a boolean flag indicating the start of a subsequence:

<<segmented>>=
def segmented_sum(xps,yps):
        x,xf = xps
        y,yf = yps
        if yf:  return (  y,yf)
        else:   return (x+y,xf)

we can sum in parallel over partial prefixes, thus performing a parallel parallel prefix:

<<parprefix.py>>=
        print [x for (x,f) in
                scan(segmented_sum,zip([7,6,5,4,3,2,1,0],
                                       [1,0,1,0,0,1,0,0]))]

Essentially, the flag provides a barrier preventing the scan from propagating:

[(7, 1), (6, 0), (5, 1), (4, 0), (3, 0), (2, 1), (1, 0), (0, 0)]
[(13, 1), (9, 1), (2, 1), (1, 0)]
[(9, 1), (3, 1)]
[(3, 1)]
[(3, 1)]
[(9, 1), (3, 1)]
[(13, 1), (9, 1), (2, 1), (3, 1)]
[(7, 1), (13, 1), (5, 1), (9, 1), (12, 1), (2, 1), (3, 1), (3, 1)]
[7, 13, 5, 9, 12, 2, 3, 3]

Question: How does the nested parallelism of a segmented scan compare to the vectorized Sieve of Eratosthenes (Bash)?

Download code
Views