Blog | G5 Cyber Security

MD5 Collision Attacks

TL;DR

You can create two different inputs that produce the same MD5 hash (a collision). This is possible because MD5 is cryptographically broken: collisions can be found far faster than by brute force. If you control part of the inputs, finding a collision is much easier than searching for one at random.

Understanding the Problem

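MD5 produces a 128-bit hash value from input data of any length. A ‘collision’ happens when two different pieces of data produce the same MD5 hash. MD5 collisions are found with differential cryptanalysis of its compression function rather than by brute force: identical-prefix collisions can be generated in seconds on ordinary hardware, and chosen-prefix collisions, where the two messages start with different attacker-chosen data, are also practical. Length extension attacks and rainbow tables are often mentioned alongside collisions, but they are different techniques: length extension forges the hash of an extended message, and rainbow tables look up the inputs behind hashes of guessable data such as passwords. Partial control over the inputs, for example being able to place attacker-chosen blocks inside a file, is what makes collision attacks usable in practice.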
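
To make the definition of a collision concrete, here is a minimal sketch that finds two inputs agreeing on only the first 24 bits of their MD5 hashes, using a simple birthday search. This is an illustration under a stated assumption: real MD5 collisions cover all 128 bits and are found with differential cryptanalysis, not with this search. The function names and the `message-N` input format are hypothetical.

```python
import hashlib
from itertools import count


def truncated_md5(data, hex_chars=6):
    """Return the first hex_chars hex digits (4 bits each) of the MD5 digest."""
    return hashlib.md5(data).hexdigest()[:hex_chars]


def find_truncated_collision(hex_chars=6):
    """Birthday search: remember every truncated hash until one repeats."""
    seen = {}
    for i in count():
        candidate = f"message-{i}".encode("utf-8")
        prefix = truncated_md5(candidate, hex_chars)
        if prefix in seen:
            return seen[prefix], candidate, prefix
        seen[prefix] = candidate


first, second, prefix = find_truncated_collision()
print(f"Input A: {first!r} -> {hashlib.md5(first).hexdigest()}")
print(f"Input B: {second!r} -> {hashlib.md5(second).hexdigest()}")
print(f"Shared 24-bit prefix: {prefix}")
```

A search over a 24-bit truncation needs only a few thousand attempts. The same generic search against the full 128-bit hash would need on the order of 2^64 attempts, which is why practical MD5 collision attacks rely on structural weaknesses instead.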

Related Technique: Length Extension Attack

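When you know a message’s MD5 hash and its length, the most practical related attack is a length extension attack. It lets you compute the valid hash of the original message with extra data appended, without knowing the original message. Note that this does not produce a collision: the extended message has a new, different hash. It is covered here because it exploits MD5’s block-by-block (Merkle-Damgård) construction.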

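  1. Understand the Setup: You know the MD5 hash (hash1) of a message (message1) and the length of message1, but not necessarily its content. You also control some additional data you want to append (append_data). The goal is to compute the MD5 hash of message2 = message1 + padding + append_data. message2 has a different hash from message1, so the result is a forged hash, not a collision.
  2. Padding: MD5 pads every message before hashing. It appends a single ‘1’ bit, then enough ‘0’ bits to bring the length to 448 bits modulo 512, then the original message length in bits as a 64-bit little-endian integer, so the total becomes a multiple of 512 bits. Libraries apply this padding automatically when hashing, but for a length extension attack you must reconstruct it yourself, because it becomes part of the extended message.
  3. Length Extension Calculation: The core idea is to load hash1 into MD5’s internal state (four 32-bit words) and continue hashing append_data as if the padded original message had already been processed. The result is the hash of message1 + padding + append_data. This works because MD5 processes data in 512-bit blocks and outputs its final internal state as the hash. hashlib does not expose that state, so dedicated tools such as HashPump or hash_extender are normally used.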
  4. Implementation Example (Python): The following example uses the hashlib library to build the extended message and its hash. Note: This is a simplified illustration. hashlib cannot resume from an existing digest, so the example hashes the full extended message directly; a real attack starts from hash1 and the message length alone, and needs an MD5 implementation whose internal state can be set.
    import hashlib
    
    def md5_padding(length_in_bytes):
      # A '1' bit (byte 0x80), zero bytes up to 56 mod 64 bytes, then the
      # message length in bits as a 64-bit little-endian integer
      zero_count = (55 - length_in_bytes) % 64
      bit_length = (length_in_bytes * 8) % 2**64
      return b'\x80' + b'\x00' * zero_count + bit_length.to_bytes(8, 'little')
    
    def length_extension_attack(message1, hash1, append_data):
      data = message1.encode('utf-8')
      # Confirm that hash1 really is the MD5 hash of message1
      assert hashlib.md5(data).hexdigest() == hash1
      # Extended message: the original, the padding MD5 added to it, then the new data
      new_message = data + md5_padding(len(data)) + append_data.encode('utf-8')
      # hashlib cannot resume from hash1, so hash the whole extended message
      final_hash = hashlib.md5(new_message).hexdigest()
      return new_message, final_hash
    
    message1 = "hello"
    hash1 = hashlib.md5(message1.encode('utf-8')).hexdigest()
    append_data = "world"
    new_message, new_hash = length_extension_attack(message1, hash1, append_data)
    print(f"Original Message: {message1}")
    print(f"Original Hash: {hash1}")
    print(f"Appended Data: {append_data}")
    print(f"Extended Message: {new_message}")
    print(f"New Hash (differs from the original): {new_hash}")
    
  5. Verification: After calculating the new hash, verify it against the MD5 hash of the full extended message (message1 + padding + append_data) computed the normal way. If they match, the extension worked. The new hash will not match the original hash (hash1), so this does not create a collision. Finding actual collisions requires dedicated tools such as HashClash or fastcoll, which use differential cryptanalysis. The example above is for demonstration only.
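
The steps above can be sketched end to end with a small pure-Python MD5 whose internal state can be set, which is the part hashlib does not offer. This is a minimal sketch, not a hardened implementation: it assumes the attacker knows only hash1 and the length of the original message, and the names `compress`, `md5_from_state` and `length_extend` are hypothetical helpers written for this illustration. The constants follow RFC 1321.

```python
import hashlib
import math
import struct

# Per-round rotation amounts and sine-derived constants (RFC 1321)
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def compress(state, block):
    """Apply the MD5 compression function to one 64-byte block."""
    a, b, c, d = state
    m = struct.unpack("<16I", block)
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
        a, d, c, b = d, c, b, (b + rotl(f, S[i])) & 0xFFFFFFFF
    return tuple((x + y) & 0xFFFFFFFF for x, y in zip(state, (a, b, c, d)))


def md5_padding(length_in_bytes):
    """Padding MD5 appends to a message of the given length."""
    zero_count = (55 - length_in_bytes) % 64
    return b"\x80" + b"\x00" * zero_count + struct.pack("<Q", (length_in_bytes * 8) % 2**64)


def md5_from_state(state, data, prior_length):
    """Hash data starting from state, as if prior_length bytes
    (a multiple of 64) had already been processed."""
    data += md5_padding(prior_length + len(data))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack("<4I", *state).hex()


def length_extend(hash1_hex, original_length, append_data):
    """Return the glue padding and the MD5 of original + glue + append_data."""
    state = struct.unpack("<4I", bytes.fromhex(hash1_hex))  # digest -> internal state
    glue = md5_padding(original_length)
    return glue, md5_from_state(state, append_data, original_length + len(glue))


# The attacker sees only hash1 and the message length, never the message itself.
secret_message = b"hello"
hash1 = hashlib.md5(secret_message).hexdigest()
glue, forged_hash = length_extend(hash1, len(secret_message), b"world")

# Verification, done here with the secret only to check the sketch.
expected = hashlib.md5(secret_message + glue + b"world").hexdigest()
print(f"Forged hash matches real hash: {forged_hash == expected}")
print(f"Forged hash equals original hash: {forged_hash == hash1}")
```

The forged hash is compared with the hash of the full extended message, as in the verification step, and it differs from hash1. Calling `md5_from_state(IV, data, 0)` gives the ordinary MD5 of `data`, which is a convenient way to check the sketch against hashlib.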

Important Considerations
