Median cut algorithm (J)
From LiteratePrograms
- Other implementations: C++ | J
- If you're looking to find the median of a list of elements, see the nth_element standard library function or Category:Selection_algorithm.
The median cut algorithm is a popular algorithm for color quantization, but can be applied to virtually any point clustering problem. In outline, it works as follows:
- Let B be a set of boxes containing points, initially containing only a single box containing all points.
- While the number of boxes in B is less than desired number of clusters...
- Find the largest side length of any side of any box.
- Cut that box into two boxes along its largest side in such a way that half the contained points fall into each new box.
- Shrink the two new boxes so that they are just large enough to contain their points.
Theory
We follow the above algorithm nearly exactly — although instead of shrinking bounding boxes, we just recalculate them on each pass.
Exercise: Our code is specialized for 3-dimensional color data; which definition can be easily generalized for n-dimensional clustering?
Practice
Because each step of the algorithm produces exactly one more box, and we start with one, we just need to step one fewer times than the number (x) of boxes desired. In J, this is easily accomplished via the iteration operator (^:). We prepare the initial box from an unboxed array (<), and, having taken the mean of each box, > removes the boxes afterwards to produce an unboxed array of the final palette entries.
<<mediancut.ijs>>= mediancut =: dyad : '> mean each step^:(x-1) <y' mean =: +/%#
At each step, we must find the longest side among all our boxes (fmax) and use that information to split the largest box (smax).
<<mediancut.ijs>>= step =: monad : '(fmax y) smax y'
To find the longest side, we first compute the bounding box (bbox) by subtracting the minimum value (<./) from the maximum value (>./) along each dimension and for each box. We then collapse these lengths to a 1D list using the boxed equivalent of ravel (;), sort them (/:) and take the largest ({:).
<<mediancut.ijs>>= fmax =: monad : '{:/: ; bbox each y' bbox =: monad : '(>./y)-(<./y)'
By rotating (|.) the box list (y) according to the box with the maximum dimension (<.x%3), we reduce the problem of splitting the largest box (smax) to that of splitting the first box (sfst) along its largest dimension (3|x).
In turn, the problem of splitting the first box can be reduced to splitting a single box (sbox), operating on the head ({.y) of the box list, and appending the tail (}.y) of the box list to the result.
Finally, we split a single box by ranking its entries according to their values along the desired dimension (x), chopping the entries into those below and those above the median, and using the arrax indexing operator ({) to select the appropriate values from the original list (y).
<<mediancut.ijs>>= sbox =: dyad : '({&y) each chop x rank y' sfst =: dyad : '(x sbox >{.y),(}.y)' smax =: dyad : '(3|x) sfst (<.x%3) |. y'
ranking the entries is easy: we generate the necessary permutation (/:) from the values along the desired dimension (x {"1 y).
chopping is also straightforward: we divide (%) the number of entries (#) in half, and use the resulting index to produce two boxes: one with entries before ({.) the other with entries after ({.) the chopping point. These boxes are joined (;) to produce a single return value.
<<mediancut.ijs>>= rank =: dyad : '/: x {"1 y' chop =: monad : '(<.2%~#y) ({.;}.) y'
Wrapping up
We still need to describe file I/O, but assuming one has the raw rgb values in the array pixeldata, the following command generates a 16-color palette. (<. is the floor function, converting floating-point values to integers for the palette)
<. 16 mediancut pixeldata
The output will be:
127 126 112 135 139 139 167 119 112 172 184 101 188 212 112 180 153 158 211 207 190 243 246 220 15 15 13 45 25 32 44 59 26 88 63 46 98 108 54 76 64 89 126 79 102 125 133 84
I reduced color depth in Gimp using this custom palette. With Gimp's dither, the resulting image is a little nicer than the Photoshop/C++ color reduction:
Download code |