Regex & Input Risks: A Security Guide

TL;DR

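Letting users supply both a regular expression and the input it is matched against is dangerous. The main, well-documented risk is Denial of Service (DoS): a malicious combination can keep your server's CPU busy for a very long time. Arbitrary code execution is not a normal consequence of this. It would require a separate memory-safety bug in the regex engine itself. This guide explains why the risk exists and how to reduce it.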

Why it’s risky

Regular expressions (regex) are powerful tools for pattern matching. However, most mainstream regex engines, including those in Python, Java, JavaScript and PCRE, work by backtracking. With these engines, a badly written pattern can take time that grows exponentially with the length of certain inputs. This is called ‘ReDoS’ (Regular Expression Denial of Service). When an attacker controls both the regex and the input, they can craft a combination that ties up your server's CPU and makes it hang or time out.

How attackers exploit this

Imagine you have a function like this (example in Python):

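import re

def match_string(regex, input_string):
  # Vulnerable: both the pattern and the subject come from the user,
  # and nothing limits how long re.search() may run.
  try:
    return re.search(regex, input_string) is not None
  except re.error:
    # Only catches syntactically invalid patterns, not slow ones
    return False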

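An attacker could supply the regex ^(a+)+$ together with an input such as "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" (30 copies of "a" followed by "!"). The pattern looks harmless. However, the trailing "!" means $ can never match, so a backtracking engine tries every way of dividing the run of "a" characters between the inner and outer +. That is on the order of 2^n combinations for n characters, so each extra "a" roughly doubles the processing time.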
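The exponential growth is easy to observe. The sketch below times Python's re module on the pattern ^(a+)+$ against n copies of "a" followed by "!", a character that stops $ from ever matching. The helper name time_search and the range of n are illustrative choices only. On a typical CPython build the time should roughly double with each extra character, although absolute numbers depend on your machine.

```python
import re
import time

# The nested-quantifier pattern discussed above.
NESTED = re.compile(r"^(a+)+$")


def time_search(n: int) -> float:
    """Time one search of n 'a' characters followed by '!' (which can never match)."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = NESTED.search(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' means $ always fails
    return elapsed


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the run time.
    for n in range(14, 24, 2):
        print(f"n={n:2d}  {time_search(n):.4f} s")
```

Try increasing the upper bound of the range a little at a time rather than jumping straight to 30 characters.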

How to protect yourself

  1. Avoid letting users supply regex directly whenever possible. The safest option is to offer pre-defined patterns, or a limited set of options, that you control and have reviewed.
  2. If you *must* accept user-supplied patterns, validate them rigorously:
    • Limit complexity: Restrict the length of the regex string. Longer patterns leave more room for problematic constructs.
    • Reject dangerous constructs: Nested quantifiers such as (a+)+ or (a*)*, quantified groups with overlapping alternatives such as (a|aa)*, and backreferences (\1) are common causes of ReDoS. Reject them in user-supplied patterns. Possessive quantifiers (a++, a*+) and atomic groups do not cause the problem. Where your engine supports them, they limit backtracking and so act as a mitigation.
    • Restrict the allowed syntax: Limit the characters allowed in the regex to a safe set. Grouping and repetition metacharacters such as (, ), *, + and { deserve the most scrutiny. If you allow them at all, check how they are combined.
    • Timeouts: Set a maximum execution time for the regex matching operation so that long-running attacks cannot take down your server. Python's built-in re module has no timeout parameter, so you have to enforce one yourself.
  3. Use a safe regex engine or library: Engines based on finite automata, such as Google's RE2, Go's regexp package and Rust's regex crate, guarantee that matching time grows linearly with the input length. They achieve this by not supporting backtracking-only features such as backreferences and lookaround.
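The first two recommendations can be sketched together. The first part keeps a server-side table of reviewed patterns that users select by name. The second part is a deliberately conservative filter for user-supplied patterns: it applies a length limit and rejects quantified groups, backreferences and lookaround. All names, the preset patterns and the 100-character limit are hypothetical. The filter is also only a heuristic. It rejects some harmless patterns and cannot catch every slow one; for example, .*.*.*x passes the filter but can still backtrack polynomially on long inputs.

```python
import re

# Option 1 (preferred): users pick from patterns you wrote and reviewed.
# These names and patterns are hypothetical examples.
PRESET_PATTERNS = {
    "digits": re.compile(r"^\d+$"),
    "lowercase_word": re.compile(r"^[a-z]+$"),
}


def match_preset(name: str, subject: str) -> bool:
    """Match `subject` against a server-side pattern chosen by name."""
    pattern = PRESET_PATTERNS.get(name)
    if pattern is None:
        raise ValueError(f"unknown pattern name: {name!r}")
    return pattern.search(subject) is not None


# Option 2: validate user-supplied patterns before compiling them.
MAX_PATTERN_LENGTH = 100  # hypothetical limit; tune for your application

# A quantifier applied directly to a group, e.g. (a+)+ or (a|aa)*.
_QUANTIFIED_GROUP = re.compile(r"\)(?:[*+?]|\{\d*,?\d*\})")
# Numbered or named backreferences, e.g. \1 or (?P=name).
_BACKREFERENCE = re.compile(r"\\[1-9]|\(\?P=")
# Lookahead and lookbehind assertions: (?= (?! (?<= (?<!
_LOOKAROUND = re.compile(r"\(\?<?[=!]")


def validate_user_regex(pattern: str) -> re.Pattern:
    """Compile a user-supplied pattern, or raise ValueError if it looks risky.

    Conservative heuristic, not a proof of safety.
    """
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    for check, reason in (
        (_QUANTIFIED_GROUP, "quantified groups are not allowed"),
        (_BACKREFERENCE, "backreferences are not allowed"),
        (_LOOKAROUND, "lookaround assertions are not allowed"),
    ):
        if check.search(pattern):
            raise ValueError(reason)
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid pattern: {exc}") from exc
```

For example, validate_user_regex(r"^[a-z]+$") returns a compiled pattern, while validate_user_regex(r"^(a+)+$") raises ValueError before any matching happens.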

Example Timeout Implementation (Python)

Here’s how you can add a timeout to the Python example:

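import re
import signal

def match_string(regex, input_string, timeout=1):
  # SIGALRM-based timeout: Unix only, and only usable from the main thread.
  def handler(signum, frame):
    raise TimeoutError("Regex execution timed out")

  previous = signal.signal(signal.SIGALRM, handler)  # remember the old handler
  signal.alarm(timeout)  # deliver SIGALRM after 'timeout' whole seconds

  try:
    return re.search(regex, input_string) is not None
  except TimeoutError:
    print("Regex timed out!")
    return False
  except re.error:
    # Invalid pattern supplied by the user
    return False
  finally:
    signal.alarm(0)  # cancel any pending alarm
    # Restore the previous handler (None means it was not set from Python)
    signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)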

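This code sets a 1-second timeout for the regex execution. If the search takes longer, the handler raises a TimeoutError, which aborts the search and is caught. This approach has limits. signal.alarm() only accepts whole seconds, SIGALRM does not exist on Windows, and signal handlers can only be set from the main thread, so it cannot be used from worker threads (for example, in a threaded web server).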
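Where SIGALRM is not an option, one alternative is to run the search in a separate process and terminate that process if it overruns. The sketch below uses the standard multiprocessing module. The function names are hypothetical, and the timeout covers process start-up as well as the search itself. Starting a process for every match adds noticeable overhead. On platforms that use the "spawn" start method (Windows, macOS), call it only from code guarded by if __name__ == "__main__":.

```python
import multiprocessing as mp
import re


def _search_worker(pattern, subject, conn):
    # Runs in a child process so the parent can terminate it if it overruns.
    try:
        conn.send(("ok", re.search(pattern, subject) is not None))
    except re.error as exc:
        conn.send(("error", str(exc)))
    finally:
        conn.close()


def match_with_process_timeout(pattern, subject, timeout=1.0):
    """Return True if `pattern` matches `subject`.

    Raises TimeoutError if no answer arrives within `timeout` seconds
    (including process start-up) and ValueError for invalid patterns.
    """
    recv_conn, send_conn = mp.Pipe(duplex=False)  # one-way: child -> parent
    proc = mp.Process(target=_search_worker, args=(pattern, subject, send_conn))
    proc.start()
    send_conn.close()  # the parent only reads; the child holds its own copy
    try:
        if not recv_conn.poll(timeout):
            raise TimeoutError("regex search timed out")
        status, value = recv_conn.recv()
    finally:
        if proc.is_alive():
            proc.terminate()  # kill the child, e.g. a runaway search
        proc.join()
        recv_conn.close()
    if status == "error":
        raise ValueError(f"invalid pattern: {value}")
    return value


if __name__ == "__main__":
    print(match_with_process_timeout(r"^a+$", "aaaa"))
    try:
        match_with_process_timeout(r"^(a+)+$", "a" * 40 + "!", timeout=0.5)
    except TimeoutError as exc:
        print(exc)
```

Unlike the signal-based version, this works from any thread and on Windows, because the runaway search is stopped by killing the child process rather than by interrupting the current one.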

Testing

  • ReDoS testing tools: Use a ReDoS checker (several online testers and static analysis tools exist; search for ‘redos tester’) to check whether the regex patterns you allow are vulnerable, trying them against a variety of inputs.
  • Fuzzing: Generate random regex and input combinations, time each match, and flag the combinations that are unusually slow.
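A very small fuzzing loop could look like the sketch below. It builds anchored patterns from a hypothetical list of tokens and pairs them with inputs shaped like a run of "a" characters plus one non-matching character. It then records any search slower than a threshold. The token list, lengths, seed and threshold are arbitrary assumptions. Keep inputs short, because a single bad combination can otherwise run for a very long time, or run the loop behind one of the timeouts described earlier.

```python
import random
import re
import time

# Hypothetical building blocks; a real fuzzer would use a richer grammar.
TOKENS = ["a", "a+", "a*", "(a+)+", "(a|aa)*", "b?", "."]


def random_pattern(rng: random.Random, max_tokens: int = 2) -> str:
    """Concatenate a few random tokens into an anchored pattern."""
    body = "".join(rng.choice(TOKENS) for _ in range(rng.randint(1, max_tokens)))
    return f"^{body}$"


def random_subject(rng: random.Random, max_len: int = 16) -> str:
    """A run of 'a' characters plus one final character, a common ReDoS trigger shape."""
    return "a" * rng.randint(1, max_len) + rng.choice("!b")


def find_slow_cases(pairs, threshold: float):
    """Return (pattern, subject, seconds) for every pair whose search exceeded `threshold`."""
    slow = []
    for pattern, subject in pairs:
        compiled = re.compile(pattern)
        start = time.perf_counter()
        compiled.search(subject)
        elapsed = time.perf_counter() - start
        if elapsed > threshold:
            slow.append((pattern, subject, elapsed))
    return slow


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so runs are reproducible
    pairs = [(random_pattern(rng), random_subject(rng)) for _ in range(200)]
    for pattern, subject, secs in find_slow_cases(pairs, threshold=0.01):
        print(f"{secs:.3f}s  {pattern!r} against {len(subject)} chars")
```

Patterns that this loop flags are good candidates to add to a validation blocklist or to rewrite.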