Blog | G5 Cyber Security

Regex & Input Risks: A Security Guide

TL;DR

Yes. Letting users supply both the regular expression and the input it is matched against is dangerous. It can lead to Denial of Service (DoS) attacks and, in regex engines that allow code to be embedded in a pattern (Perl's (?{ ... }) construct, for example), may let attackers execute arbitrary code on your server. This guide explains why and how to reduce the risk.

Why it’s risky

Regular expressions (regex) are powerful tools for pattern matching. However, on a backtracking engine a poorly constructed regex can take a very long time to process certain inputs, because the engine tries every way of dividing the input between overlapping quantifiers before it gives up. This is called ‘ReDoS’ (Regular Expression Denial of Service). When an attacker controls both the regex and the input, they can craft a combination that causes your server to hang or exhaust its CPU.

How attackers exploit this

Imagine you have a function like this (example in Python):

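import re

def match_string(regex, input_string):
  try:
    # re.search returns a match object or None, so convert it to a bool
    return re.search(regex, input_string) is not None
  except re.error:
    # The user-supplied pattern is not a valid regex
    return False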

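An attacker could supply a regex like ^(a+)+$ and an input string of 30 "a" characters followed by a single "b". Both look harmless, but the trailing "b" guarantees that the match fails, and before giving up the engine tries every way of dividing the run of "a"s between the inner and outer +. The number of attempts roughly doubles with each additional "a", so processing time grows exponentially.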
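To make the growth visible, here is a minimal sketch that times the pattern above against failing inputs of increasing length. The helper name time_match and the chosen sizes are illustrative assumptions, and the sizes are deliberately small so the script finishes quickly; exact timings depend on your machine and Python version.

```python
import re
import time

# The vulnerable pattern from the text: one quantifier nested inside another.
EVIL = re.compile(r"^(a+)+$")

def time_match(n):
    """Match n 'a' characters followed by 'b'; return (matched, seconds)."""
    text = "a" * n + "b"  # the trailing 'b' forces the match to fail
    start = time.perf_counter()
    matched = EVIL.search(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    # Sizes kept small on purpose: each extra 'a' roughly doubles the work.
    for n in (10, 14, 18):
        matched, seconds = time_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```

Raising n a few characters at a time should show the running time climbing steeply, which is why a short attacker-supplied string is enough to tie up a worker.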

How to protect yourself

  1. Avoid letting users supply regex directly. This is the best solution: use pre-defined patterns or a limited set of options that you control.
  2. If you *must* accept user-supplied regex, constrain how it runs:
  • Use a safe regex engine or library: Some regex engines are more resistant to ReDoS than others. Engines such as RE2 do not backtrack and guarantee matching time that is linear in the length of the input.
  • Enforce a timeout on matching: Example Timeout Implementation (Python)

    Here’s how you can add a timeout to the Python example:

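    import re
    import signal

    def match_string(regex, input_string, timeout=1):
      def handler(signum, frame):
        raise TimeoutError("Regex execution timed out")

      # SIGALRM is Unix-only and can only be installed from the main thread
      old_handler = signal.signal(signal.SIGALRM, handler)
      signal.alarm(timeout) # Set the alarm for 'timeout' whole seconds

      try:
        # re.search returns a match object or None, so convert it to a bool
        return re.search(regex, input_string) is not None
      except TimeoutError:
        print("Regex timed out!")
        return False
      except re.error:
        return False # The pattern is not a valid regex
      finally:
        signal.alarm(0) # Disable the alarm
        signal.signal(signal.SIGALRM, old_handler) # Restore the previous handler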

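    This code sets a 1-second timeout for the regex execution. If matching takes longer than that, the handler raises a TimeoutError, which match_string catches and reports by returning False. Note that signal.alarm is available only on Unix and works only in the main thread, so this approach does not carry over to Windows or to worker threads.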
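The first recommendation in the list above, offering pre-defined patterns instead of accepting raw regex, could look something like the following sketch. The pattern names, the patterns themselves, the length cap and the function name match_named are all hypothetical choices for illustration. The point is only that the user picks a name, never the expression.

```python
import re

# Hypothetical allowlist: names and patterns are illustrative only.
# Each pattern is written by the developer, and none of them nests
# one quantifier inside another.
ALLOWED_PATTERNS = {
    "digits": re.compile(r"[0-9]+"),
    "hex_colour": re.compile(r"#[0-9a-fA-F]{6}"),
    "iso_date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
}

MAX_INPUT_LENGTH = 1000  # assumed cap; tune for your application

def match_named(pattern_name, input_string):
    """Match the whole input against a pre-defined pattern chosen by name."""
    pattern = ALLOWED_PATTERNS.get(pattern_name)
    if pattern is None:
        raise ValueError(f"Unknown pattern name: {pattern_name!r}")
    if len(input_string) > MAX_INPUT_LENGTH:
        return False  # refuse oversized input before matching
    return pattern.fullmatch(input_string) is not None
```

For example, match_named("digits", "12345") returns True, while an unrecognised name raises ValueError instead of being treated as a regex.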

    Testing
