Before you start
Objectives: Learn what hashing is, why we use it, and which hashing algorithms are common.
Prerequisites: You should understand what asymmetric cryptography is.
Key terms: hash, value, function, hashing, message, key, receiver, algorithm
What is Hashing
Hashing uses a one-way hash function to protect the integrity of data. A one-way hash function is a mathematical function that takes a variable-length string (a message or a file) and transforms it into a fixed-length value. The fixed-length output is typically referred to as a hash value. The algorithm that performs the hashing is not secret, and a plain hash function uses no secret key. Its security rests on the fact that the function can be computed in one direction only: given a hash value, it is computationally infeasible to recreate the original data.
The hash value is sometimes also called a fingerprint, message digest, or checksum, depending on the context in which the hash function is used.
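The properties described above can be observed with a short Python sketch that uses the standard hashlib module. SHA-1 is chosen here only because it is one of the algorithms covered in this article; the helper name sha1_hex and the sample messages are made up for illustration.

```python
import hashlib


def sha1_hex(message: bytes) -> str:
    """Return the SHA-1 hash value of message as a hexadecimal string."""
    return hashlib.sha1(message).hexdigest()


short_digest = sha1_hex(b"hi")
long_digest = sha1_hex(b"a much longer message " * 1000)
changed_digest = sha1_hex(b"hi!")

# The output length is fixed (160 bits, written as 40 hexadecimal
# characters), no matter how long the input is.
print(len(short_digest), len(long_digest))

# The same input always gives the same hash value.
print(short_digest == sha1_hex(b"hi"))

# A slightly different input gives a different hash value.
print(short_digest != changed_digest)
```

Note that nothing in the sketch is secret: anyone can run the same function on the same input and obtain the same hash value.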
Hashing Algorithms
The most popular hashing algorithms are:
- Message Digest (MD-5, MD-4, MD-2) – each produces a 128-bit hash value.
- SHA-1 – stands for Secure Hash Algorithm; produces a 160-bit hash value.
- HAVAL – builds on the MD design; produces a hash value of 128, 160, 192, 224, or 256 bits.
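The hash value lengths listed above can be checked against Python's hashlib module. This is only a sketch: it covers MD5 and SHA-1, because MD-2, MD-4, and HAVAL are not guaranteed to be available in hashlib, and the helper name digest_bits is hypothetical.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the length in bits of the hash value produced by the named algorithm."""
    # digest_size is reported in bytes, so multiply by 8 to get bits.
    return hashlib.new(name).digest_size * 8


for name in ("md5", "sha1"):
    print(name, digest_bits(name))
```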
Usage
Hash values are often used to detect whether the integrity of data has been compromised. The same file (or message) always produces the same hash value when the same hash function is used. At the same time, two different files (or messages) should in practice never produce the same hash value. Such a collision is possible in theory, because there are more possible inputs than hash values, but for a good hash function the probability is very low. To check the integrity of files we often use a Message Digest function.
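A file integrity check along these lines might look like the following Python sketch. The function names file_digest and is_unchanged, the chunk size, and the default choice of MD5 are assumptions made for illustration; the expected hash value would have to come from a trusted source.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_unchanged(path: str, expected_digest: str, algorithm: str = "md5") -> bool:
    """Return True if the file still produces the expected hash value."""
    return file_digest(path, algorithm) == expected_digest
```

A typical use is to record file_digest(path) when the file is known to be good and to call is_unchanged(path, recorded_digest) later, for example after a download.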
For example, to check whether a file was changed during transmission we can combine an asymmetric cryptography system with a hashing function. The sender has his own private key and public key, and so does the receiver. The sender applies a hash function to his message to obtain a hash value. He then encrypts that hash value with his private key to obtain the encrypted hash value. The encrypted hash value is attached to the end of the original message, and both are sent to the receiver.
The receiver takes the message, strips off the encrypted hash value, and decrypts it with the sender's public key to recover the actual hash value. The receiver then applies the same hash function to the received message. If the result matches the decrypted hash value, the receiver can be sure that the message was not modified. Because only the sender's private key could have produced the encrypted hash value, the process also shows that the sender is the one who sent the message.
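The exchange between sender and receiver can be sketched in Python with a toy RSA key pair. Everything about the key is an assumption made for illustration: the two primes are well-known Mersenne primes rather than randomly generated ones, no padding is applied, and the key is far too small for real use. The function names are hypothetical, and a real system should rely on a vetted cryptography library instead.

```python
import hashlib

# Toy key pair (NOT secure): two Mersenne primes stand in for the large
# random primes that a real key generator would pick.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def hash_value(message: bytes) -> int:
    """SHA-1 hash value of the message, interpreted as an integer."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")


def sign(message: bytes) -> int:
    """Sender: hash the message, then encrypt the hash value with the private key."""
    return pow(hash_value(message), D, N)


def verify(message: bytes, encrypted_hash: int) -> bool:
    """Receiver: decrypt with the sender's public key and compare with a fresh hash."""
    return pow(encrypted_hash, E, N) == hash_value(message)


message = b"Transfer 10 EUR to Bob"
encrypted_hash = sign(message)
print(verify(message, encrypted_hash))
print(verify(b"Transfer 1000 EUR to Bob", encrypted_hash))
```

The first check succeeds because the message is unchanged; the second fails because the modified message hashes to a different value than the one the sender encrypted.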
Hashing is also used to store user passwords securely in databases. However, plain MD and SHA hash algorithms are not well suited to this, since they are fast to compute and therefore susceptible to brute-force attacks. MD hash algorithms should be used to check whether two messages (or files) are the same, not to store user passwords. For password storage we should only use salted hashes. A salt is a piece of additional data, unique to each password, that is combined with the password before hashing and makes the hashes more difficult to crack. Beware that a number of online services provide extensive lists of pre-computed hashes together with the original input for those hashes. With a salt, the resulting hash will in practice not be found in any of these lists.
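A minimal sketch of salted password hashing in Python follows. The article does not name a specific scheme, so the use of PBKDF2 (hashlib.pbkdf2_hmac), the 16-byte salt, and the iteration count are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, chosen for this sketch only


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(digest, expected)
```

The database stores the salt next to the hash. Because every user gets a different salt, two users with the same password end up with different stored hashes, and a list of pre-computed hashes of common passwords no longer matches them.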