TL;DR
Yes. Letting users provide both the regular expression and the input to be matched against it is dangerous. The main risk is a Denial of Service (DoS) attack. Arbitrary code execution is only a concern in engines that can embed code inside a pattern (for example, Perl's (?{...}) construct); Python's re module has no such feature. This guide explains why and how to fix it.
Why it’s risky
Regular expressions (regex) are powerful tools for pattern matching. However, on a backtracking regex engine (the kind used by Python, JavaScript, Java and many others), some patterns take a very long time to process against certain inputs. Exploiting this is called ‘ReDoS’ (Regular Expression Denial of Service). When an attacker controls both the regex and the input, they can craft a combination that pins a CPU core and makes your server hang or stop responding.
How attackers exploit this
Imagine you have a function like this (example in Python):
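    import re

    def match_string(regex, input_string):
        # Unsafe: runs a user-supplied pattern with no limits at all.
        try:
            # re.search returns None when nothing matches.
            return re.search(regex, input_string) is not None
        except Exception:
            # For example, re.error when the pattern is not valid regex.
            return False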
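An attacker could supply a regex like ^(a+)+$ and an input string such as "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" (a run of a's followed by one character that cannot match). Both look harmless, but when the overall match fails, the regex engine tries every way of splitting the run of a's between the inner and outer quantifiers. The work roughly doubles with each extra a, leading to exponential processing time.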
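The exponential growth described above can be observed with a small timing sketch. The helper name time_failed_match and the chosen lengths are illustrative assumptions, not part of any library. The lengths are kept small so the script finishes quickly, and exact timings will vary by machine and Python version.

```python
import re
import time

# The nested-quantifier pattern discussed above.
EVIL_PATTERN = re.compile(r"^(a+)+$")

def time_failed_match(n):
    """Return the seconds spent failing to match n 'a's followed by one 'b'."""
    text = "a" * n + "b"  # the trailing 'b' forces the overall match to fail
    start = time.perf_counter()
    result = EVIL_PATTERN.search(text)
    elapsed = time.perf_counter() - start
    assert result is None  # this input can never match the pattern
    return elapsed

if __name__ == "__main__":
    # Keep n small: every extra 'a' roughly doubles the running time.
    for n in (8, 12, 16, 20):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```

On a backtracking engine such as CPython's re module, the printed times should grow by a large factor at each step, even though the input only gets four characters longer.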
How to protect yourself
- Avoid letting users supply regex directly whenever possible. The best solution is to avoid this entirely. Use pre-defined patterns or a limited set of options that you control.
- If you *must* accept user-supplied regex, validate it rigorously and bound the work it can do:
  - Limit complexity: Restrict the length of the regex string. A short cap does not make a pattern safe on its own (^(a+)+$ is only seven characters), but it limits how much an attacker can pack in.
  - Disallow backtracking-prone features: Backreferences (\1), nested quantifiers such as (a+)+ and overlapping alternations such as (a|aa)+ are common causes of ReDoS. Reject these in user-supplied regex. Possessive quantifiers (++, *+) and atomic groups are not a cause; they reduce backtracking.
  - Character class restrictions: Limit the characters allowed in the regex to a safe set. Avoid allowing metacharacters like [, ], ^, $ without careful escaping and validation.
  - Timeouts: Set a maximum execution time for the regex matching operation. This will prevent long-running attacks from taking down your server.
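- Use a safe regex engine or library: Some regex engines are more resistant to ReDoS than others. Engines such as Google's RE2, and the standard regex packages in Go and Rust, guarantee matching time that is linear in the input length. They achieve this by not supporting backreferences or lookaround.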
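The first option in the list above, pre-defined patterns that you control, might look like the following minimal sketch. The names SAFE_PATTERNS, MAX_INPUT_LENGTH, match_named and contains_literal are hypothetical, and the length cap is an arbitrary assumption. Users choose a pattern by name or supply plain text to search for; they never supply regex syntax.

```python
import re

MAX_INPUT_LENGTH = 1000  # assumed cap; tune this for your application

# Patterns are written and reviewed by developers, never supplied by users.
SAFE_PATTERNS = {
    "digits": re.compile(r"^[0-9]+$"),
    "slug": re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$"),
}

def match_named(pattern_name, text):
    """Match text against a pre-defined pattern selected by name."""
    if pattern_name not in SAFE_PATTERNS:
        raise ValueError(f"unknown pattern: {pattern_name!r}")
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError("input too long")
    return SAFE_PATTERNS[pattern_name].search(text) is not None

def contains_literal(needle, haystack):
    """Search for user-supplied text as a literal string, not as a regex."""
    if len(needle) > MAX_INPUT_LENGTH or len(haystack) > MAX_INPUT_LENGTH:
        raise ValueError("input too long")
    # re.escape neutralises every metacharacter in the user's text.
    return re.search(re.escape(needle), haystack) is not None
```

With this design, a request such as contains_literal("(a+)+", text) searches for those five characters literally, so the nested quantifier never reaches the regex engine as syntax.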
Example Timeout Implementation (Python)
Here’s how you can add a timeout to the Python example:
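    import re
    import signal

    def match_string(regex, input_string, timeout=1):
        def handler(signum, frame):
            raise TimeoutError("Regex execution timed out")

        # SIGALRM exists only on Unix and must be set from the main thread.
        previous = signal.signal(signal.SIGALRM, handler)
        signal.alarm(timeout)  # Set the alarm for 'timeout' whole seconds
        try:
            # re.search returns None when nothing matches.
            return re.search(regex, input_string) is not None
        except TimeoutError:
            print("Regex timed out!")
            return False
        except Exception:
            # For example, re.error when the pattern is not valid regex.
            return False
        finally:
            signal.alarm(0)  # Disable the alarm
            signal.signal(signal.SIGALRM, previous)  # Restore the old handler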
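This code sets a 1-second timeout for the regex execution. If matching takes longer than that, a TimeoutError is raised and the function returns False. Note that signal.alarm is available only on Unix and only works in the main thread, so a Windows or multi-threaded server needs a different mechanism, such as running the match in a separate process that can be killed.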
Testing
- ReDoS testing tools: Use online ReDoS testers (search for ‘redos tester’) to check if your allowed regex patterns are vulnerable with various inputs.
- Fuzzing: Generate random regex and input combinations to try and find problematic cases.
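A fuzzing harness along those lines could be sketched as follows. The function name find_slow_inputs, its parameters and its defaults are all illustrative assumptions. It fuzzes only the input side against one candidate pattern, and it caps the input length because the harness itself has no timeout; a real harness would wrap each match in one.

```python
import random
import re
import time

def find_slow_inputs(pattern, alphabet="ab", max_len=16, trials=200,
                     threshold=0.05, seed=0):
    """Return (input, seconds) pairs whose match time exceeds threshold."""
    rng = random.Random(seed)  # a fixed seed keeps the inputs reproducible
    compiled = re.compile(pattern)
    slow = []
    for _ in range(trials):
        # Build a random string of 1..max_len characters from the alphabet.
        length = rng.randint(1, max_len)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        start = time.perf_counter()
        compiled.search(text)
        elapsed = time.perf_counter() - start
        if elapsed > threshold:
            slow.append((text, elapsed))
    return slow
```

Random short strings rarely hit the worst case, so an empty result does not prove a pattern is safe. Treat this as a cheap first check alongside a dedicated ReDoS tester.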

