TL;DR
Markov chains and Hidden Markov Models (HMMs) can be surprisingly effective for password auditing and cracking, especially against weaker passwords or those following common patterns. They work by learning the probability of character sequences from a sample dataset (e.g., previously cracked passwords). This guide explains how to use them practically.
1. Understanding the Basics
Before we start, let’s quickly cover what these models are:
- Markov Chains: Predict the next character based *only* on the current character (or a small sequence of previous characters). Think of it like this: if you see ‘q’, there’s a high chance the next letter will be ‘u’.
- Hidden Markov Models: More complex. They assume an underlying, hidden state that influences the observed characters. This is useful for passwords with different “modes” – e.g., a name followed by numbers.
2. Gathering Password Data
You’ll need a dataset of cracked or known passwords to train your model. Larger, more diverse datasets are better. Sources include:
- Publicly available password dumps (use responsibly and legally!).
- Hashes from penetration testing engagements (with permission).
- Generated password lists based on common patterns.
3. Building a Markov Chain
Here’s how you can build a simple Markov chain in Python:
import collections
def build_markov_chain(passwords):
chain = collections.defaultdict(list)
for password in passwords:
for i in range(len(password) - 1):
chain[password[i]].append(password[i+1])
return chain
# Example Usage
passwords = ['password', 'passw0rd', 'p@ssword']
markov_chain = build_markov_chain(passwords)
print(markov_chain)
This code creates a dictionary where keys are characters and values are lists of characters that follow them in the training data. The more passwords you feed it, the better the probabilities will be.
4. Generating Passwords from the Markov Chain
Now let’s generate potential passwords:
import random
def generate_password(chain, length):
start_char = random.choice(list(chain.keys())) # Start with a random character
password = [start_char]
for _ in range(length - 1):
next_chars = chain[password[-1]]
if next_chars:
next_char = random.choice(next_chars)
password.append(next_char)
else:
break # Stop if no possible next characters
return ''.join(password)
# Example Usage
length = 8
generated_password = generate_password(markov_chain, length)
print(generated_password)
This function starts with a random character and iteratively adds characters based on the probabilities learned from the chain. It stops when it can’t find a valid next character.
5. Using Hidden Markov Models (HMMs)
HMMs are more complex to implement but offer better results for structured passwords. You’ll need a library like hmmlearn:
- Install:
pip install hmmlearn - Data Preparation: Convert your passwords into sequences of observations (characters). You’ll also need to define the hidden states (e.g., ‘name’, ‘number’, ‘symbol’).
- Training: Use
hmmlearn.HMMto train a model on your prepared data. - Generation: Use the trained model’s
sample()method to generate passwords.
A full HMM example is beyond the scope of this guide due to its complexity, but the hmmlearn documentation provides excellent tutorials.
6. Password Auditing Tools
Several tools integrate Markov chain-based password cracking:
- John the Ripper: Supports custom wordlists generated from Markov chains.
- Hashcat: Can use rule-based attacks that incorporate character frequency and sequence probabilities (similar to Markov chains).
7. Important Considerations
- Dataset Quality: The quality of your training data is crucial.
- Password Length: Longer passwords are harder to crack with these methods.
- Complexity: Passwords with diverse character sets (upper/lowercase, numbers, symbols) are more resistant.
- Legal and Ethical Use: Only use these techniques on systems you have explicit permission to test.