Hashes and Computer Security
What is a Cryptographic Hash?
A hash function is a one-way function that takes an input of any length and returns an output of a fixed length (e.g. 16 bytes), regardless of the size of the input. Because it is one-way, it should be very hard to work out the input when you are given only the output. Cryptographic hash functions aim to make it impossible to identify the input from the output. In reality, recovering the input may not be strictly impossible; a good hash just makes it computationally infeasible. Because of this property, hashes are often used to store sensitive information (such as passwords). If someone stole a database of password hashes, they still shouldn't be able to identify the passwords.
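As a small illustration of the fixed-length property, the sketch below hashes inputs of very different sizes with SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the sample inputs are assumptions made for the example only.

```python
import hashlib

# Inputs of very different sizes: empty, one byte, one million bytes.
inputs = [b"", b"a", b"a" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).digest()
    # The digest is always 32 bytes (256 bits), whatever the input length.
    print(f"input: {len(data):>7} bytes -> digest: {len(digest)} bytes")

# The same input always produces the same digest.
print(hashlib.sha256(b"a").digest() == hashlib.sha256(b"a").digest())  # True
```

Note that the code can only go from input to digest; `hashlib` offers no function for the reverse direction.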
The fact that hashes map data of different sizes to a fixed length is also useful and means that we can use them to confirm the integrity of data. We can do this by sending a hash of the data with the data itself so that the recipient can perform the same hash function over the received data and confirm that the hash they get matches the hash calculated by the sender.
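The integrity check described above could look roughly like this. It is a minimal sketch using SHA-256 from `hashlib`; the function names `sha256_hex` and `verify` and the sample messages are hypothetical, chosen only for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """Recompute the hash of the received data and compare it with the sender's."""
    return sha256_hex(data) == expected_hex

# Sender side: compute the hash and send it along with the data.
message = b"transfer 100 to account 42"
sent_hash = sha256_hex(message)

# Recipient side: the check passes for intact data and fails if the data changed.
print(verify(message, sent_hash))                        # True
print(verify(b"transfer 900 to account 42", sent_hash))  # False
```

A plain hash like this detects accidental corruption. If an attacker could replace both the data and the hash in transit, the check on its own would not notice.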
Hash Function Examples
- SHA-256, SHA-512
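Not all hashes are secure. For example, MD5 is now considered broken: it is practical on ordinary hardware to construct two different inputs that produce the same MD5 hash, and MD5 is fast enough that an attacker can often recover a weak input (such as a common password) by hashing guesses until one matches a known output.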
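Situations where different inputs produce the same output are called collisions. Because the output has a fixed length, collisions must exist for every hash function; with less secure hashes, an attacker can actually find them.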
One weakness of hashes is that a given input always generates the same output. For example, if multiple users have the same password (e.g. passw0rd), it will always hash to the same value, which may help an attacker with access to a hash database work out passwords. It also means that an attacker can build a reference table (a rainbow table) in advance by hashing all common passwords.
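The sketch below shows why this matters. It assumes unsalted SHA-256 hashes and a made-up set of users and passwords. The lookup dictionary is a plain precomputed table, a simplification of the idea behind rainbow tables, not an implementation of one.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """Hash a password with no salt (shown only to illustrate the weakness)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical stolen database: user -> unsalted password hash.
stolen = {
    "alice": unsalted_hash("passw0rd"),
    "bob": unsalted_hash("passw0rd"),
    "carol": unsalted_hash("correct horse battery staple"),
}

# Identical passwords produce identical hashes, so shared passwords are visible.
print(stolen["alice"] == stolen["bob"])  # True

# The attacker hashes a list of common passwords once, ahead of time.
common_passwords = ["123456", "password", "passw0rd", "qwerty"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# Every stolen hash found in the table is cracked instantly.
recovered = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(recovered)  # {'alice': 'passw0rd', 'bob': 'passw0rd'}
```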
To overcome this, we can combine a random ‘salt’ with each individual password before hashing it. So that we can validate the password later, we simply store the salt alongside the hash value.
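A minimal sketch of salting follows, assuming a 16-byte random salt prepended to the password and hashed with SHA-256. The function names, the salt length and the way salt and password are combined are illustrative assumptions; the fast hash is used here only to show the effect of the salt.

```python
import hashlib
import hmac
import secrets

def hash_with_salt(password: str, salt: bytes) -> bytes:
    """Hash the salt followed by the password (assumed combination scheme)."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def create_record(password: str) -> tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = secrets.token_bytes(16)
    return salt, hash_with_salt(password, salt)

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_with_salt(password, salt), stored)

# Two users with the same password now get different stored hashes.
salt_a, hash_a = create_record("passw0rd")
salt_b, hash_b = create_record("passw0rd")
print(hash_a == hash_b)                             # False
print(check_password("passw0rd", salt_a, hash_a))   # True
print(check_password("password", salt_a, hash_a))   # False
```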
However, for production systems, this still may not be enough. If your database is compromised then it is expected that an attacker will also have access to your salts. Although they won't have an existing rainbow table for this scenario, the power of modern computers (and particularly graphics cards) means that it can be relatively quick to calculate a lot of hashes, for example by starting with a list of common passwords.
Therefore it is best to use a 'Key Derivation Function' which cannot be quickly computed because of the amount of memory or CPU cycles required by the algorithm. The algorithm should be tuned so that it isn't too slow for users, but it is slow enough that an attacker couldn't retrieve passwords at scale.
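As one possible illustration, the sketch below uses PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac` from the Python standard library. The text above does not name a particular function, so the choice of PBKDF2, the iteration count and the record layout are all assumptions; PBKDF2 is CPU-bound rather than memory-hard, and the work factor would need tuning for real hardware.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (an assumption): higher is slower for users and attackers alike.
ITERATIONS = 200_000

def derive(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from the password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def create_record(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key); storing the count allows later tuning."""
    salt = secrets.token_bytes(16)
    return salt, iterations, derive(password, salt, iterations)

def check_password(password: str, record: tuple[bytes, int, bytes]) -> bool:
    """Re-derive with the stored salt and iteration count, then compare in constant time."""
    salt, iterations, stored = record
    return hmac.compare_digest(derive(password, salt, iterations), stored)

record = create_record("passw0rd")
print(check_password("passw0rd", record))  # True
print(check_password("password", record))  # False
```

Each guess costs the attacker the same number of iterations as a legitimate login, which is what makes large-scale guessing expensive.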
Different hashes have different properties. For example, MD5 hashes are 128 bits long whilst SHA-256 hashes are 256 bits. Even different systems using the same hashing algorithm (e.g. SHA-256) may represent the stored hash differently, for example because of how the salt is combined with or recorded alongside it.
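The output sizes can be checked directly with `hashlib`. This short sketch prints the digest length in bits and the length of the usual hexadecimal representation; the sample input is arbitrary, and `usedforsecurity=False` (Python 3.9+) is passed only so that MD5 remains available on restricted builds.

```python
import hashlib

for name in ("md5", "sha256", "sha512"):
    h = hashlib.new(name, b"passw0rd", usedforsecurity=False)
    # digest_size is in bytes; each byte becomes two hex characters.
    print(name, h.digest_size * 8, "bits,", len(h.hexdigest()), "hex characters")

# md5 128 bits, 32 hex characters
# sha256 256 bits, 64 hex characters
# sha512 512 bits, 128 hex characters
```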
You can find information about hash formats on the internet (e.g. the WordPress documentation details the format used by WordPress). There are also tools which will analyse a given hash and suggest what may have generated it. These include hash-identifier and hashID.
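To give a feel for the idea, here is a deliberately naive sketch that guesses candidate algorithms from the length of a hexadecimal hash. It is not how hash-identifier or hashID are implemented; the table and the function name `guess_hash_type` are hypothetical, and several real algorithms share the same output length, so a length match is only a hint.

```python
import hashlib
import re

# Hex digest lengths for a few common algorithms (not an exhaustive list).
CANDIDATES_BY_HEX_LENGTH = {
    32: ["MD5"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}

def guess_hash_type(value: str) -> list[str]:
    """Return candidate algorithm names for a hex-encoded hash, or [] if unknown."""
    value = value.strip()
    if not re.fullmatch(r"[0-9a-fA-F]+", value):
        return []  # not plain hex, e.g. a salted or otherwise encoded format
    return CANDIDATES_BY_HEX_LENGTH.get(len(value), [])

print(guess_hash_type(hashlib.sha256(b"passw0rd").hexdigest()))  # ['SHA-256']
print(guess_hash_type("not-a-hash"))                             # []
```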