Base64 (C)

From LiteratePrograms

Some working (albeit rather minimal) code for Base 64 decoding (RFC 3548), originally hacked together as a quick fix on a box without a decent mail reader.

theory

This program is almost not worth writing: the core of the Base 64 conversion is an isomorphism on 24-bit bitstrings, which are presented on input as 4 groups of 6 bits and emitted on output as 3 groups of 8 bits.
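As a worked illustration of that regrouping, the following sketch packs four 6-bit values into a single 24-bit integer and reads it back as three bytes. The helper name pack24 is hypothetical and is not part of the program developed below.

```c
/* Hypothetical helper (not part of the original program): regroup
 * four 6-bit values as three 8-bit bytes via one 24-bit integer. */
void pack24(const unsigned char six[4], unsigned char out[3])
{
    unsigned long bits = ((unsigned long)six[0] << 18)
                       | ((unsigned long)six[1] << 12)
                       | ((unsigned long)six[2] << 6)
                       |  (unsigned long)six[3];

    out[0] = (unsigned char)((bits >> 16) & 0xFF);  /* top 8 bits    */
    out[1] = (unsigned char)((bits >> 8) & 0xFF);   /* middle 8 bits */
    out[2] = (unsigned char)(bits & 0xFF);          /* low 8 bits    */
}
```

For the values 19, 6, 37, 52 (the positions of T, G, l and 0 in the standard Base 64 alphabet), pack24 yields the bytes 76, 105, 116, which spell "Lit".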

The slight complications are that:

  • strings to be coded are not necessarily multiples of 24 bits; we must be prepared to handle a runt conversion for the last few characters.
  • the 6-bit groups are carried in regular 8-bit characters, and non-coding characters (such as line breaks) may be interspersed, so it may take an arbitrary number of source characters to accumulate 4 coding characters.
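The size of a runt follows directly from the bit counts: n leftover coding characters carry 6n bits, of which only whole bytes are kept. A minimal sketch, using a hypothetical function name decoded_len:

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes obtained from ncoding coding
 * characters (padding and non-coding characters are not counted).
 * Each character carries 6 bits; only whole bytes are kept. */
size_t decoded_len(size_t ncoding)
{
    size_t full = ncoding / 4;      /* complete 24-bit groups   */
    size_t runt = ncoding % 4;      /* 0..3 leftover characters */

    return full * 3 + runt * 6 / 8; /* runt gives 0, 0, 1 or 2 bytes */
}
```

Two leftover characters give one byte and three give two; a single leftover character holds only 6 bits and so contributes nothing.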

Note that this is easier than the general metamorphism; we may not have a 1:1 correspondence between source characters and groups, but each group is translated independently: we never have to "carry" bits from the computation of one group to that of the next.

practice

Translating a 24-bit chunk is a simple matter of shifting the bits into the appropriate places. We use nbytes[phase] to handle runt conversions: it gives the number of output bytes to write for the current phase.

<<translate chunk>>=
xlate(in,phase);

<<define translation>>=
/* Translate one chunk of four 6-bit values into up to three bytes;
   nbytes[phase] is the number of bytes actually written. */
void xlate(unsigned char *in, int phase)
{
    unsigned char out[3];
    out[0] = in[0] << 2 | in[1] >> 4;
    out[1] = in[1] << 4 | in[2] >> 2;
    out[2] = in[2] << 6 | in[3] >> 0;
    fwrite(out, nbytes[phase], 1, stdout);
}
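To try the translation step in isolation, here is a sketch of a variant that writes into a buffer instead of stdout. The names xlate_buf and runt_bytes, and the return value, are assumptions made for illustration. The table is this sketch's own: phase 0 stands for a full chunk, phases 2 and 3 for runts, and phase 1 produces nothing because a single 6-bit group cannot fill a byte.

```c
/* Bytes produced for each phase; phase 0 means a full four-character
 * chunk, phases 2 and 3 are runts. A lone 6-bit group (phase 1)
 * cannot fill a byte, so this sketch yields nothing for it. */
static const int runt_bytes[4] = { 3, 0, 1, 2 };

/* Hypothetical variant of xlate: writes into out, returns the count.
 * phase is assumed to be in the range 0..3. */
int xlate_buf(const unsigned char in[4], int phase, unsigned char out[3])
{
    unsigned char full[3];
    int i, n = runt_bytes[phase];

    full[0] = (unsigned char)(in[0] << 2 | in[1] >> 4);
    full[1] = (unsigned char)(in[1] << 4 | in[2] >> 2);
    full[2] = (unsigned char)(in[2] << 6 | in[3]);
    for (i = 0; i < n; i++)     /* copy only the bytes that are valid */
        out[i] = full[i];
    return n;
}
```

With in = {19, 6, 37, 52} and phase 0 the result is the three bytes "Lit"; with in = {28, 48, 0, 0} and phase 2 (the runt "cw==") it is the single byte 's'.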

We must process the input a character, c, at a time even though most of the time we will be translating it in four-character chunks.

This is because, as mentioned above,

  • the last chunk may be fewer than four characters (signalled by an equals sign, '=')
  • arbitrary non-coding characters are ignored, so accumulating a chunk may require more than four characters of source text.

<<process input>>=
if(c == '=')    {
    /* padding: flush the runt, if any, and stop */
    if(phase != 0)
        <<translate chunk>>
    break;
}
/* strchr would also match the terminating NUL of b64, so exclude c == 0 */
p = c ? strchr(b64, c) : NULL;
if(p)    {
    in[phase] = p - b64;
    phase = (phase + 1) % 4;
    if(phase == 0)    {
        <<translate chunk>>
        in[0]=in[1]=in[2]=in[3]=0;
    }
}
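The same state machine can be exercised on an in-memory string, which makes the skipping of non-coding characters easy to observe. This is a sketch under assumptions: the function name b64_decode_str is hypothetical, the caller is assumed to supply a destination buffer of at least 3/4 of the source length, and decoding stops at the first '=' or at the end of the string, flushing any runt in both cases.

```c
#include <stddef.h>
#include <string.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: decode src into dst, return bytes written.
 * dst must have room for at least strlen(src) * 3 / 4 bytes. */
size_t b64_decode_str(const char *src, unsigned char *dst)
{
    unsigned char in[4] = { 0, 0, 0, 0 };
    size_t n = 0;
    int phase = 0;

    for (; *src != '\0' && *src != '='; src++) {
        const char *p = strchr(alphabet, *src);
        if (p == NULL)
            continue;               /* non-coding character: skip it */
        in[phase] = (unsigned char)(p - alphabet);
        phase = (phase + 1) % 4;
        if (phase == 0) {           /* full chunk: emit three bytes */
            dst[n++] = (unsigned char)(in[0] << 2 | in[1] >> 4);
            dst[n++] = (unsigned char)(in[1] << 4 | in[2] >> 2);
            dst[n++] = (unsigned char)(in[2] << 6 | in[3]);
            in[0] = in[1] = in[2] = in[3] = 0;
        }
    }
    /* runt: two characters give one byte, three give two */
    if (phase >= 2)
        dst[n++] = (unsigned char)(in[0] << 2 | in[1] >> 4);
    if (phase == 3)
        dst[n++] = (unsigned char)(in[1] << 4 | in[2] >> 2);
    return n;
}
```

Calling it on "TGl0ZXJh\ndGVQ cm9ncmFtcw==" (note the embedded newline and space) writes the 16 bytes of "LiteratePrograms".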

wrapping up

Finally, we put it all together in a small filter program:

<<base64.c>>=
#include <stdio.h>
#include <string.h>
char b64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/* output bytes per phase: 0 = full chunk; 2 and 3 = runts;
   a lone character (phase 1) cannot fill a byte */
int nbytes[] = { 3, 0, 1, 2 };
<<define translation>>
int main(void)
{
    int c, phase;
    unsigned char in[4] = { 0, 0, 0, 0 };
    char *p;
    phase = 0;
    while((c = getchar()) != EOF)    {
        <<process input>>
    }
    return 0;
}

and verify that

TGl0ZXJhdGVQcm9ncmFtcw==

decodes to

LiteratePrograms