The Basics of Crpyto 6: Hashing Functions
- andy1265
- Jun 20, 2022
- 3 min read
Hashing functions are everywhere, if you have ever seen the MD5 checksum for a download or SHA-1/256 written next to a random string these are hashes produced by hashing functions. Hashing functions are ubiquitous in modern technology, they are used in SSH, TLS, digital signatures, crypto mining (hashing functions are the core of the proof of work systems) and numerous other places.
The main difference between hash functions and the other cryptographic systems discussed so far is that hash functions are one way. As such if you take a hash of something the hash alone can not be converted back into it's original format. The only way to recover the value is to try every combination of inputs until you get the correct output. However some hashing functions do have weaknesses which will be discussed later.
There are both cryptographic and non cryptographic hash functions. We will only be covering cryptographic hash functions in the instalment as non cryptographic hash functions are not relevant to the series. However it should be noted that non cryptographic hash functions provide no security and should never be used in place of a cryptographic hash function.
Security In Hash Functions
Hash functions should (ideally) work like a black box, every input returns a unique and unpredictable output. I.e. hashing the value "1" should always return the same result however nothing else should produce that result and knowing the hashed values of "2", "3", "4", "5" etc etc should give no indication of what the hash of "1" will be and any alteration to the value of "1" no matter how minor will alter the hashing functions output.
Preimage Resistance
Preimage resistance essentially means that given a hashed value you can not find the plaintext value. It is important to note that a hash function can not be reversed as any hash can potentially have multiple inputs that would generate that hash, the challenge is in creating a function with so many potential outputs it is impractical to find a collision for any hash. Preimage resistance has another category tied to the initial statement "second preimage resistance". Second preimage resistance ensures that if given a plaintext that results in a specific hash it is not possible to find a different plaintext that produces the same hash.
Collision Resistance
No hashing function is truly collision proof simply due to the fact that hashing functions take inputs longer than the hashes they output. As such if you fed the function every possible hash it can produce and then something 1 byte longer then definitionally there would be a collision. A hash function can be considered collision resistant if a collision is as hard to find as the original message itself.
Building a Hash Function
Hashing is carried out by taking an input and breaking it into smaller chunks and then individually hashing each chunk. Most hashing functions (MD4, MD5, SHA-1, SHA-2, Whirlpool etc) are compression based hashing functions. A compression based hashing function works by splitting the input down into blocks of a set size and then hashing them by combining the previous hash with the next block and hashing the pair, the output hash is then provided as input along with the next block. Most algorithms have a set size for which they hash blocks (e.g. sha-254 has a 512 bit block size) so if an input is less that 512 bits or is not a multiple of 512 bits one of the blocks will need to be padded to fit. Padding is done by appending 1, then 0's then the total length of the input in binary.
Hint: If the binary input 01010 is provided to a hashing function with a block size of 16 the input will be padded to: 0101010000000011 1 Denotes the start of the padding. 0 0's are used to fill up the majority of space. 11 is the total length of the input in binary.
All modern hashing functions in use today are based on block ciphers. In this context the block cipher is taken and used with the plaintext block as the key and the previous hash as the plaintext. As long as the block cipher is secure the resulting compression function will also be secure.
Hash Length Extension Attack
In cryptography and computer security, a length extension attack is a type of attack where an attacker can use the hash of a message block and the length of a message block to calculate the hash of the second message block for an attacker-controlled message block. Algorithms like MD5, SHA-1, and SHA-2 that are based on the compressions systems described above are susceptible to this kind of attack. The SHA-3 algorithm is not susceptible. More information can be found here,
In short, SHA-256 and SHA-512 should be used if you need a hashing function.
Comentarios