Hashing: Proving Security Strength

G5 Cyber Security

4 months ago

TL;DR

Hashing is a one-way function that turns data of any size into a fixed-size ‘fingerprint’. The fingerprint is not strictly unique, because infinitely many inputs map onto a finite set of outputs, but a good hash makes duplicates infeasible to find. Showing it’s secure means showing that it is hard to reverse (find the original data from the fingerprint) and hard to find two different pieces of data that produce the same fingerprint (collisions). None of this can be proven outright; confidence comes from analysing three properties (collision resistance, pre-image resistance, and second pre-image resistance) and from years of failed attacks. Strong hashing algorithms like SHA-256 are widely used because they’ve been extensively tested.

How to Prove a Hashing Scheme is Sufficient

Understand the Core Properties: A good hashing scheme needs three main strengths:
- Pre-image resistance: Given a hash value, it should be practically impossible to find *any* input that produces that hash. Think of it like smashing something – easy to break, hard to rebuild exactly from the pieces.
- Second pre-image resistance: Given an input, it should be practically impossible to find a different input that produces the same hash value as the original.
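- Collision resistance: It should be practically impossible to find *any* two different inputs that produce the same hash value. This is the strongest of the three requirements: the attacker is free to choose both inputs, so it is usually the first property to fail, and a function that is collision resistant is also second pre-image resistant.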
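None of these three properties can be demonstrated by running a hash a few times, but a quick experiment gives a feel for why reversing one is hard. The sketch below uses Python's standard hashlib; the helper names `sha256_hex` and `differing_bits` and the sample inputs are illustrative assumptions. It shows that hashing is deterministic and that a one-character change to the input scrambles roughly half of the output bits.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Same input, same fingerprint: hashing is deterministic.
print(sha256_hex(b"abc") == sha256_hex(b"abc"))  # True

# Changing one character flips roughly half of the 256 output bits,
# so similar inputs give no hint about each other's hashes.
print(differing_bits(b"hello world", b"hello worle"))
```

Because nearby inputs produce unrelated-looking outputs, an attacker cannot home in on a target hash by making small adjustments to a guess.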
Algorithm Choice Matters: Not all hashing algorithms are created equal.
- MD5 & SHA-1: These are now considered broken due to discovered vulnerabilities and should *not* be used for cyber security purposes. MD5 collisions can be generated in seconds on an ordinary computer, and a practical SHA-1 collision was published in 2017.
- SHA-2 Family (SHA-256, SHA-384, SHA-512): These are currently considered secure. SHA-256 is a common choice.
- SHA-3: Built on a different design from SHA-2 (the Keccak sponge construction), so it offers an alternative if weaknesses are ever found in SHA-2.
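In Python, the algorithms listed above are all available through the standard hashlib module. The following is a minimal sketch; the `digest_bits` helper and the sample message are assumptions made for illustration. It prints the output length and the start of the digest for each algorithm.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output length in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

message = b"example message"  # hypothetical input

# SHA-2 family members plus one SHA-3 variant, by their hashlib names.
for name in ("sha256", "sha384", "sha512", "sha3_256"):
    digest = hashlib.new(name, message).hexdigest()
    print(f"{name:9} {digest_bits(name):4} bits  {digest[:16]}...")
```

SHA-256 and SHA3-256 both produce 256-bit outputs, but because their internal designs differ they give completely different digests for the same input.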
Hash Length is Critical: The longer the hash output (e.g., 256 bits for SHA-256), the harder it is to find collisions.
- A shorter hash length means a higher probability of accidental collisions, even with a good algorithm.
- For example, a hash function that produces a 32-bit hash has only 2³² (about 4.3 billion) possible hash values. Hash more than 2³² distinct inputs and a collision is guaranteed by the pigeonhole principle; in practice one becomes likely long before that.
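To see how quickly a short hash runs out of room, the sketch below simulates a 32-bit hash by keeping only the first 32 bits of SHA-256, then hashes counter-based inputs until two of them collide. The function names and the `input-N` strings are assumptions for illustration; truncating SHA-256 is just a convenient way to get a well-behaved short hash.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256, simulating a short hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash counter-based inputs until two share a truncated hash.

    Returns (first_input, second_input, number_of_inputs_hashed).
    """
    seen = {}  # truncated hash value -> input that produced it
    for i in count():
        data = f"input-{i}".encode()
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, i + 1
        seen[h] = data

a, b, attempts = find_collision(32)
print(f"32-bit collision after {attempts} inputs: {a!r} and {b!r}")
```

On a typical laptop this finishes in well under a second, which is why 32 bits is nowhere near enough for a security-relevant hash.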
Birthday Paradox: This shows how collisions become more likely faster than you might think.
- It states that in a group of just 23 randomly chosen people, there’s a better than 50% chance that two of them share a birthday, even though there are 365 possible days. The same applies to hash values.
- For an n-bit hash function, you only need approximately 2^(n/2) inputs before collisions become likely. For SHA-256 (256 bits), this is around 2¹²⁸ inputs, which is far beyond any foreseeable computing power. This is why a 256-bit hash is said to offer 128 bits of collision security.
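The birthday bound can be checked numerically. The sketch below uses the standard approximation p ≈ 1 − exp(−k(k−1) / 2N) for k random values drawn from N possibilities; the function name `collision_probability` is an assumption, and the formula treats hash outputs as uniformly random, which is an idealisation.

```python
import math

def collision_probability(num_inputs: int, num_outputs: int) -> float:
    """Approximate chance that at least two of num_inputs uniformly random
    values collide when there are num_outputs possible values.

    Uses p ~= 1 - exp(-k(k-1) / (2N)).
    """
    exponent = num_inputs * (num_inputs - 1) / (2 * num_outputs)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x

print(collision_probability(23, 365))         # classic birthday case, about 0.5
print(collision_probability(2**16, 2**32))    # 32-bit hash, 65,536 inputs
print(collision_probability(2**128, 2**256))  # SHA-256 at the birthday bound
print(collision_probability(2**64, 2**256))   # SHA-256 after 2^64 inputs
```

The last line is the telling one: even after 2⁶⁴ hashed inputs, the chance of an accidental SHA-256 collision is vanishingly small.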
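Testing and Analysis: No widely used hash function comes with a mathematical proof of security, so confidence rests on years of public attack attempts by the cyber security community. The main lines of attack are: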
- Differential Cryptanalysis: Attempts to find patterns in how small changes to the input affect the output hash.
- Collision Attacks: Researchers try to deliberately create two different inputs that produce the same hash value. Success indicates a weakness in the algorithm.
- Pre-image attacks: Attempts to reverse engineer the hashing process, finding an input for a given hash.
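The crudest pre-image attack is simply guessing inputs until one matches. The sketch below does this against a deliberately tiny 16-bit hash (SHA-256 truncated to its first 16 bits); the function names and the `guess-N` strings are assumptions for illustration. For an ideal n-bit hash this generic search takes about 2ⁿ guesses, compared with about 2^(n/2) for a collision, and an algorithm is considered weakened when researchers find a method that beats these generic costs.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, bits: int = 16) -> int:
    """Keep only the top `bits` bits of SHA-256, simulating a tiny hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def brute_force_preimage(target: int, bits: int = 16):
    """Try counter-based guesses until one hashes to target.

    Returns (matching_input, number_of_guesses).
    """
    for i in count():
        guess = f"guess-{i}".encode()
        if short_hash(guess, bits) == target:
            return guess, i + 1

target = short_hash(b"original message")
guess, attempts = brute_force_preimage(target)
print(f"found {guess!r} after {attempts} guesses")
```

Each extra output bit doubles the expected number of guesses, so the same loop against the full 256-bit output would never finish in practice.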

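Salt and Pepper (for Password Hashing): Never store passwords directly! Hash each one with its own salt, and optionally with a pepper (a secret value shared by all passwords and kept outside the database).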

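Salting: Add a random string (the ‘salt’) to each password *before* hashing it, and store the salt alongside the hash. Because every password gets its own salt, identical passwords produce different hashes, and a pre-computed rainbow table is of little use: the attacker would have to rebuild it for every salt value.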

Example using Python:

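import hashlib
import os

def hash_password(password):
    # Illustrates salting only: a single fast SHA-256 pass is too cheap
    # to resist brute force, so production systems should use a KDF.
    salt = os.urandom(16)  # fresh 16-byte random salt for every password
    salted_password = salt + password.encode('utf-8')
    hashed_password = hashlib.sha256(salted_password).hexdigest()
    # Store the salt next to the hash; it is needed again to verify a login.
    return salt.hex() + ':' + hashed_password

# Example usage:
pwd = 'mysecretpassword'
hashed = hash_password(pwd)
print(hashed)  # output differs on every run because the salt is random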
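Storing a salted hash is only half the job; at login the same salt has to be reused to check the password. The sketch below assumes the `salt_hex:hash_hex` format produced by the example above; `verify_password` is a hypothetical helper name, and `hash_password` is repeated so that the block runs on its own.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> str:
    """Same scheme as the example above: 'salt_hex:sha256_hex'."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex() + ":" + digest

def verify_password(password: str, stored: str) -> bool:
    """Recompute the salted hash and compare it with the stored one."""
    salt_hex, expected = stored.split(":")
    salt = bytes.fromhex(salt_hex)
    actual = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    # compare_digest takes the same time wherever the strings differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(actual, expected)

stored = hash_password("mysecretpassword")
print(verify_password("mysecretpassword", stored))  # True
print(verify_password("wrongpassword", stored))     # False
```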

Key Derivation Functions (KDFs): For more robust password storage, use KDFs like Argon2, bcrypt or scrypt.
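- These are designed specifically for password hashing and are deliberately slow (and, in the case of Argon2 and scrypt, memory-hungry), which makes brute-force attacks far more expensive.
- Most library implementations generate a salt and embed it in the output for you; lower-level APIs may expect you to supply one.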
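As a rough illustration, Python's standard library exposes scrypt as `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). In the sketch below, the function names, the `salt_hex:key_hex` storage format, and the cost parameters are all assumptions chosen for the example, not tuned recommendations. Note that this low-level API expects the caller to supply the salt.

```python
import hashlib
import hmac
import os

# Illustrative cost settings (assumed, not a recommendation): n is the
# CPU/memory cost, r the block size, p the parallelism factor.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password_scrypt(password: str) -> str:
    """Derive a key with scrypt and return 'salt_hex:key_hex'."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt.hex() + ":" + key.hex()

def verify_password_scrypt(password: str, stored: str) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    salt_hex, key_hex = stored.split(":")
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex), **SCRYPT_PARAMS
    )
    return hmac.compare_digest(key, bytes.fromhex(key_hex))

stored = hash_password_scrypt("mysecretpassword")
print(verify_password_scrypt("mysecretpassword", stored))  # True
print(verify_password_scrypt("wrongpassword", stored))     # False
```

Each call deliberately costs noticeable time and memory. A legitimate login barely notices, but an attacker making billions of guesses pays that cost on every one.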