What All This MD5 Hash Stuff Actually Means [Technology Explained]

In a recent article about checking whether you were affected by Gawker's hacking incident, one of the steps involved converting your email address into an MD5 hash.

We had a few questions from readers asking exactly what was going on, and why this process was necessary. It's not our style to leave you guys asking questions, so here's a full run-down of MD5, hashing and a small overview of computers and cryptography.

Cryptographic Hashing

MD5 stands for Message Digest algorithm 5. It is a cryptographic hashing function invented in 1991 by celebrated US cryptographer Professor Ronald Rivest to replace the older MD4 standard.

The idea behind cryptographic hashing is to take an arbitrary block of data and return a fixed-size "hash" value. The input can be any data, of any size, but the hash value will always be the same length.
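To see the fixed-size idea in action, here is a minimal Python sketch using the standard library's hashlib module. The sample inputs are made up for illustration; the only point is that the length of the output never changes.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Illustrative inputs of very different sizes (assumed, not from the article)
samples = [b"", b"a", b"makeuseof.com", b"x" * 1_000_000]

for sample in samples:
    digest = md5_hex(sample)
    # Every digest has the same number of hex digits, whatever the input size
    print(len(sample), "bytes in ->", len(digest), "hex digits out:", digest)
```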

Cryptographic hashing has a number of uses, and there are a vast number of algorithms (other than MD5) designed to do a similar job. One of the main uses for cryptographic hashing is for verifying the contents of a message or file after transfer.

If you've ever downloaded a particularly large file (Linux distributions, that sort of thing) you'll probably have noticed the hash value that accompanies it. Once this file has been downloaded, you can use the hash to verify that the file you downloaded is in no way different to the file advertised.

The same method works for messages, with the hash verifying that the message received matches the message sent. On a very basic level, if you and a friend have a large file each and wish to verify they're exactly the same without the hefty transfer, the hash code will do it for you.
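As a rough sketch of that verification step, the Python below hashes a file in chunks and compares the result with a published value. The function names, the chunk size and the idea of passing the published hash in as a string are all assumptions made for this example.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file piece by piece so a large download need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the computed hash with the one advertised alongside the file."""
    return file_md5(path) == published_hex.strip().lower()
```

Two people holding copies of the same large file could each run file_md5 and compare the short results instead of sending the file itself.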

Hashing algorithms also play a part in data or file identification. A good example of this is peer-to-peer file sharing networks, such as eDonkey2000. The system used a variant of the MD4 algorithm, which also combined the file's size into a hash, to quickly point to files on the network.

Another example is the ability to quickly find data in hash tables, a method commonly used by search engines.
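A hash table lookup can be sketched in a few lines of Python. This toy class uses MD5 purely to decide which bucket a key lives in; real hash tables normally use much faster non-cryptographic hashes, and the class name and bucket count here are invented for the example.

```python
import hashlib


class ToyHashTable:
    """A tiny fixed-size hash table that uses MD5 only to pick a bucket."""

    def __init__(self, bucket_count: int = 8) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key: str) -> list:
        # Turn the digest into a number, then wrap it to the bucket count
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % len(self.buckets)
        return self.buckets[index]

    def put(self, key: str, value) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str, default=None):
        # Only one bucket is searched, not the whole table
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default
```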

Another use for hashes is in the storage of passwords. Storing passwords as clear text is a bad idea for obvious reasons, so instead they are converted to hash values. When a user inputs a password, it is converted to a hash value and checked against the stored hash. As hashing is a one-way process, provided the algorithm is sound there is theoretically little chance of the original password being recovered from the hash.
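The store-then-compare flow might look something like the Python sketch below. The in-memory dictionary standing in for a user database, the function names and the choice of SHA-256 are all assumptions for illustration; this is not a recommendation for a production password scheme.

```python
import hashlib
import hmac

# Hypothetical user database: username -> password hash (never the password)
stored_hashes = {}


def hash_password(password: str) -> str:
    """One-way conversion of a password into a hex hash value."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    stored_hashes[username] = hash_password(password)


def check_login(username: str, password: str) -> bool:
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # Hash the attempt and compare it with the stored hash
    return hmac.compare_digest(expected, hash_password(password))
```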

Cryptographic hashing is also often used in the generation of passwords, and derivative passwords from a single phrase.
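One way to picture derivative passwords is the sketch below, which turns a single master phrase plus a site name into a different password for each site. It uses PBKDF2 from Python's hashlib; the iteration count, output length and the idea of using the site name as the salt are assumptions chosen for this example, not a description of any particular tool.

```python
import base64
import hashlib


def derive_site_password(master_phrase: str, site: str) -> str:
    """Derive a repeatable, site-specific password from one master phrase."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        master_phrase.encode("utf-8"),
        site.encode("utf-8"),  # the site name acts as the salt here
        100_000,               # iteration count (assumed value)
        dklen=12,              # 12 bytes encode to 16 base64 characters
    )
    return base64.urlsafe_b64encode(raw).decode("ascii")
```

The same phrase and site always give the same result, so nothing needs to be stored, while different sites get unrelated-looking passwords.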

Message Digest algorithm 5

The MD5 function produces a 128-bit value, usually written as a 32-digit hexadecimal number. If we were to turn 'makeuseof.com' into an MD5 hash value then it would look like: 64399513b7d734ca90181b27a62134dc. It was built upon a method called the Merkle–Damgård structure, which is used to build what are known as "collision-resistant" hash functions.

No security is everything-proof, however, and in 1996 potential flaws were found within the MD5 hashing algorithm. At the time these were not seen as fatal, and MD5 continued to be used. In 2004 a far more serious problem was discovered after a group of researchers described how to make two separate files share the same MD5 hash value. This was the first instance of a collision attack being used against the MD5 hashing algorithm. A collision attack attempts to find two arbitrary inputs which produce the same hash value - hence, a collision (two files existing with the same value).
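Finding a genuine MD5 collision takes specialised techniques, but the idea can be illustrated with a deliberately weakened toy: keep only the first few hex digits of the MD5 hash and search for two inputs that share them. The truncation to 6 digits and the use of counting numbers as messages are assumptions made so the search finishes quickly; this is not the attack the researchers used.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, hex_digits: int = 6) -> str:
    """A deliberately weak hash: only the first few hex digits of MD5."""
    return hashlib.md5(data).hexdigest()[:hex_digits]


def find_toy_collision(hex_digits: int = 6):
    """Try inputs in turn until two different ones share a truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    for n in count():
        message = str(n).encode("ascii")
        tag = truncated_md5(message, hex_digits)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message


first, second = find_toy_collision()
print(first, second, truncated_md5(first))
```

The shorter the hash, the sooner two inputs collide, which is why a full-length hash function is expected to make this kind of search impractical.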

Over the next few years attempts to find further security problems within MD5 took place, and in 2008 another research group managed to use the collision attack method to fake SSL certificate validity. This could dupe users into thinking they are browsing securely when they are not. The US Department of Homeland Security announced that "users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use."

Despite the government warning, many services still use MD5 and as such are technically at risk. It is, however, possible to "salt" passwords to make dictionary attacks (testing known words) against the system much harder. If a hacker has a list of often-used passwords and your user account database, they can hash each password on the list and check the results against the hashes in the database. A salt is a random string which is combined with an existing password hash and then hashed again. The salt value and resulting hash are then stored in the database.

If a hacker wanted to find out your users' passwords, a ready-made list of common-password hashes would no longer match anything in the database. They would have to redo the work for every individual salt, and this makes a dictionary attack far less practical. Salt does not affect the password itself, so you must still choose a hard-to-guess password.
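The effect of salting on a ready-made list of hashes can be sketched in Python as follows. The scheme follows the description above (the salt is combined with the password's hash and hashed again); the function names, the 8-byte salt and the short list of common passwords are assumptions for illustration only.

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: str) -> str:
    """Combine the salt with the password's hash, then hash again."""
    return md5_hex(salt + md5_hex(password))


def new_record(password: str) -> tuple[str, str]:
    """Return the (salt, hash) pair that would be stored in the database."""
    salt = secrets.token_hex(8)  # fresh random salt for every account
    return salt, salted_hash(password, salt)


# An attacker's precomputed list: hash -> common password
common_passwords = ["123456", "password", "letmein"]
precomputed = {md5_hex(word): word for word in common_passwords}

unsalted = md5_hex("password")
salt, salted = new_record("password")

print(precomputed.get(unsalted))  # the plain hash is found in the list
print(precomputed.get(salted))    # the salted hash should not be
```

Note that the weak password is still weak: an attacker who has the salt can hash each guess with it, which is why a hard-to-guess password still matters.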

Conclusion

MD5 is one of many different methods of identifying, securing and verifying data. Cryptographic hashing is a vital chapter in the history of security, and keeping things hidden. As with many things designed with security in mind, someone's gone and broken it.

You probably won't have to worry too much about hashing and MD5 checksums in your daily surfing habits, but at least now you know what they do and how they do it.

Ever needed to hash anything? Do you verify the files you download? Do you know of any good MD5 web apps? Let us know in the comments!
