MD5 sum (C, OpenSSL)

From LiteratePrograms


Hash digests are an important tool for security. Here is an example of how to use OpenSSL on Unix to calculate the MD5 hash of a file.

Requirements

You need a recent Unix and the OpenSSL libraries and headers. You can get them either from your Unix distribution or directly from http://www.openssl.org/.

The only place where Unix comes into play is reading in the whole content of a file. If you replace that part with its Windows counterpart, you can use the code to calculate the MD5 hash on Windows as well.

Implementation

<< includes >>= 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <openssl/evp.h>

As you can see, the includes are really Unix-centric. One include sticks out a bit, though: openssl/evp.h. EVP is an abstraction for handling diverse hashes (and beyond). So we work one level above the core facilities, and that's usually a good thing (tm) ;-)

The following part can be used to turn the digest into a hexadecimal string; here it simply prints each byte as two hex digits. It would in fact deserve a bit more attention.

<< print_as_hex >>=
static void print_as_hex (const unsigned char *digest, int len) {
  int i;
  for(i = 0; i < len; i++){
    printf ("%02x", digest[i]);
  }
}
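If the hexadecimal form is needed as a string rather than on stdout, the same loop can fill a caller-supplied buffer. The following is only a sketch: digest_to_hex is a hypothetical helper that is not part of the original program, and the choice to return -1 for a too-small buffer is an assumption.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the original program): writes the digest as
   lowercase hex into out, which must hold at least 2 * len + 1 bytes.
   Returns 0 on success and -1 if the buffer is too small. */
static int digest_to_hex(const unsigned char *digest, size_t len,
                         char *out, size_t out_size) {
  static const char hex_digits[] = "0123456789abcdef";
  size_t i;
  if (out_size < 2 * len + 1) {
    return -1;
  }
  for (i = 0; i < len; i++) {
    out[2 * i]     = hex_digits[digest[i] >> 4];   /* high nibble */
    out[2 * i + 1] = hex_digits[digest[i] & 0x0f]; /* low nibble */
  }
  out[2 * len] = '\0';
  return 0;
}
```

An MD5 digest is 16 bytes long, so a buffer of 33 chars is enough for it; 2 * EVP_MAX_MD_SIZE + 1 covers every digest that EVP can produce.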

Now we need the block that actually calculates the MD5 digest.

<< calculate_md5_hash >>=
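static void calculate_md5_of(const void *content, ssize_t len){
  /* EVP_MD_CTX is an opaque type since OpenSSL 1.1.0, so allocate it on the heap */
  EVP_MD_CTX *mdctx = EVP_MD_CTX_new();
  unsigned char md_value[EVP_MAX_MD_SIZE];
  unsigned int md_len = 0;
  if (NULL == mdctx){
    fprintf(stderr, "could not allocate a digest context\n");
    return;
  }
  EVP_DigestInit_ex(mdctx, EVP_md5(), NULL);
  EVP_DigestUpdate(mdctx, content, (size_t) len);
  EVP_DigestFinal_ex(mdctx, md_value, &md_len);
  EVP_MD_CTX_free(mdctx);
  printf("md5 sum of t1.txt = ");
  print_as_hex(md_value, (int) md_len);
  printf("\n");
}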


If you use the EVP abstraction, all the functions need an EVP_MD_CTX context "object". As you can see, the name of a hash function consists of the EVP_ prefix and the name of the hash in lower-case letters. So I could just as easily have written EVP_sha1.

Maybe you are wondering why there is both an EVP_DigestUpdate call and an EVP_DigestFinal_ex call. If you have to build the hash of a very large file, you probably cannot allocate all the needed memory at once. So you pick some limit and read the content in block by block. The digest has to cover all of the content, so you can call the first function more than once, once per block, to build the full hash.

After the latter call the digest can no longer be updated.

Now just one thing is left, the main function:

<<md5sum.c>>=
<< includes >>
<< print_as_hex >>
<< calculate_md5_hash >>
int main(void) {
  const char *file_name = "t1.txt";
  struct stat stat_buf;
  void *file_content = NULL;
  int i_rval = 0;
  int in_fd = -1;
  off_t size_of_file;
  ssize_t read_bytes;
  i_rval = stat(file_name, &stat_buf);
  if (i_rval < 0){
    perror(file_name);
    goto clean;
  }
  size_of_file = stat_buf.st_size;
  /* ask for at least one byte so that an empty file is not mistaken for a failed malloc */
  file_content = malloc(size_of_file > 0 ? (size_t) size_of_file : 1);
  if (NULL == file_content){
    goto clean;
  }
  /* open takes the flags as its second argument */
  in_fd = open(file_name, O_RDONLY);
  if (in_fd < 0 ){
    perror(file_name);
    goto clean;
  }
  /* slurp in all from the file at once; a short read counts as an error */
  read_bytes = read(in_fd, file_content, (size_t) size_of_file);
  if ( read_bytes != (ssize_t) size_of_file ) {
    fprintf(stderr, "something has gone wrong while reading from the file\n");
    goto clean;
  }
  close(in_fd);
  calculate_md5_of(file_content, read_bytes);
  free(file_content);
  return 0;
 clean:
  free(file_content);
  /* 0 is a valid file descriptor, so compare with >= */
  if (in_fd >= 0) close(in_fd);
  exit(EXIT_FAILURE);
}

That is simple POSIX C programming. As written before, the Unix stuff is just used to get the size of the file and to read it. Because I know it's a small file, I can read it in all at once and calculate the hash on it.
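For comparison, the Unix-specific reading step might be replaced with standard C I/O, which is also available on Windows. This is only a sketch: slurp_file is a hypothetical helper that is not part of the original program, and determining the size with fseek and ftell is an assumption that holds for small regular files on common platforms but is not guaranteed by the C standard for every stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file into a malloc'ed buffer using
   only standard C I/O. Stores the number of bytes in *len_out and returns
   the buffer, or NULL on failure. The caller frees the buffer. */
static void *slurp_file(const char *file_name, size_t *len_out) {
  FILE *in = fopen(file_name, "rb"); /* binary mode matters on Windows */
  void *content = NULL;
  long size;
  if (NULL == in) {
    return NULL;
  }
  /* find the size by seeking to the end, then go back to the start */
  if (fseek(in, 0L, SEEK_END) == 0 && (size = ftell(in)) >= 0 &&
      fseek(in, 0L, SEEK_SET) == 0) {
    content = malloc(size > 0 ? (size_t) size : 1);
    if (content != NULL &&
        fread(content, 1, (size_t) size, in) != (size_t) size) {
      free(content); /* short read: treat as failure */
      content = NULL;
    }
    if (content != NULL) {
      *len_out = (size_t) size;
    }
  }
  fclose(in);
  return content;
}
```

The returned buffer and length can be handed to the hashing code in place of the result of stat, open and read.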

With the following content of t1.txt:

bla      this or that my or may
End or start free or not
Out or in 
away for now
sense this doesn't make
what's fud?
Should I go?

I get the following output:

./a.out

md5 sum of t1.txt = f8f0cd185c115b9dbbc5c72ada9f8e4b

Now we can cross check with another utility:

md5sum t1.txt

f8f0cd185c115b9dbbc5c72ada9f8e4b t1.txt
