TL;DR
This guide shows how attackers bypass input validation and what you can do to prevent it. We’ll cover common techniques and practical fixes.
What is Input Validation?
Input validation checks user-supplied data before your application uses it. It makes sure the data is in the expected format, length, and type. Without it, attackers can send malicious input that could cause problems like:
- SQL Injection: Running unwanted database commands.
- Cross-Site Scripting (XSS): Injecting harmful scripts into websites.
- Command Injection: Executing system commands on your server.
How Attackers Bypass Validation
Attackers use various methods to trick applications into accepting bad data. Here are some common techniques:
1. Encoding and Obfuscation
- URL Encoding: Replacing characters with their percent-encoded equivalents (e.g., space becomes %20).
- HTML Encoding: Using HTML entities (e.g., &lt; for <, &gt; for >).
- Unicode Encoding: Using different Unicode representations of the same character.
Example: An application might block the literal ‘<’ character but let through an encoded or alternative Unicode form of it, such as the fullwidth ‘＜’ (U+FF1C).
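As a rough illustration, the sketch below shows three encoded forms of ‘<’ slipping past a naive filter that only looks for the literal character. The `canonicalize` helper is hypothetical (it is not from any library), and the choice of decoders is an assumption for this example; a real application should decode only the layers it actually expects.

```python
import html
import unicodedata
from urllib.parse import unquote


def naive_filter(value: str) -> bool:
    """Return True if the input looks safe to a filter that only blocks a literal '<'."""
    return "<" not in value


def canonicalize(value: str) -> str:
    """Hypothetical helper: reduce common encodings to one standard form."""
    value = unquote(value)                        # %3C -> <
    value = html.unescape(value)                  # &lt; or &#60; -> <
    value = unicodedata.normalize("NFKC", value)  # fullwidth U+FF1C -> <
    return value


payloads = ["%3Cscript%3E", "&lt;script&gt;", "\uff1cscript\uff1e"]

for p in payloads:
    # Second column: verdict on the raw input. Third column: verdict after canonicalization.
    print(p, naive_filter(p), naive_filter(canonicalize(p)))
```

Each payload passes the raw check and fails it once canonicalized, which is why validation should run on the canonical form.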
2. Case Manipulation
Some validation rules are case-sensitive. Attackers can try different casing to bypass checks.
Example: If a rule blocks ‘SELECT’, an attacker might use ‘Select’ or ‘sElEcT’.
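A minimal sketch of the difference, assuming a filter built around a single blocked keyword (the function names are made up for this example):

```python
import re

BLOCKED_KEYWORD = "SELECT"


def case_sensitive_check(value: str) -> bool:
    """Return True if the blocked keyword is found, comparing case-sensitively."""
    return BLOCKED_KEYWORD in value


def case_insensitive_check(value: str) -> bool:
    """Return True if the blocked keyword is found in any casing."""
    return re.search(r"\bselect\b", value, flags=re.IGNORECASE) is not None


attempt = "sElEcT password FROM users"
print(case_sensitive_check(attempt))    # False: the mixed-case keyword slips through
print(case_insensitive_check(attempt))  # True: casing no longer matters
```

Even the case-insensitive version is still a blocklist, so it only closes this one gap.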
3. Whitespace and Comments
- Extra Spaces: Adding extra spaces before, after, or within the input.
- SQL Comments: Using SQL comments (e.g., -- or /* ... */) to hide malicious code.
Example: An attacker might use ‘SELECT * FROM users --’ to comment out the rest of a query.
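To make the comment trick concrete, here is a small sketch using an in-memory SQLite database. The table, the sample row, and both login functions are invented for illustration. It compares a query built by string concatenation with one that binds its inputs as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_concatenated(username: str, password: str) -> bool:
    """Unsafe: user input is pasted directly into the SQL text."""
    query = (
        "SELECT 1 FROM users WHERE username = '" + username + "' "
        "AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone() is not None


def login_parameterized(username: str, password: str) -> bool:
    """Safer: input is bound as data and never parsed as SQL."""
    query = "SELECT 1 FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


malicious_username = "alice' --"
# The trailing -- comments out the password check in the concatenated query.
print(login_concatenated(malicious_username, "wrong"))
print(login_parameterized(malicious_username, "wrong"))
```

With the concatenated query the wrong password is accepted, because everything after `--` is ignored. The parameterized version treats the whole string as a username that does not exist.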
4. Using Alternative Syntax
Some languages or databases allow multiple ways to achieve the same result. Attackers can exploit these alternatives.
Example: In SQL, using hexadecimal representation for strings (e.g., 0x73656c656374 instead of ‘select’).
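The following sketch uses a hypothetical `keyword_filter` to show why a text-based keyword check misses the hexadecimal form. Whether a database then treats such a literal as a string depends on the database (MySQL does in string contexts), so treat this only as an illustration of the filter's blind spot.

```python
KEYWORD = "select"


def keyword_filter(value: str) -> bool:
    """Return True if the literal keyword appears in the input (case-insensitive)."""
    return KEYWORD in value.lower()


hex_literal = "0x73656c656374"

# The filter sees only hex digits, so the keyword goes unnoticed.
print(keyword_filter(hex_literal))

# Decoding the hex digits (everything after the '0x' prefix) recovers the keyword.
decoded = bytes.fromhex(hex_literal[2:]).decode("ascii")
print(decoded)
```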
5. Input Length Manipulation
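- Truncation: If input is validated at full length but a later layer (for example, a fixed-width database column) silently shortens it, the stored value can differ from the value that was checked.
- Padding: Adding filler characters so that the malicious part lands where a filter that inspects only part of the input does not look, or so that truncation cuts the input at a point the attacker chooses.
Example: An application might cap input at 50 characters but never check what those characters contain, so a payload that fits within the limit passes.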
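One way truncation goes wrong is when a check runs on the full input but a later layer stores a shortened value. The sketch below simulates that with a made-up `store` function and an assumed 10-character limit; the reserved-name rule is also an assumption for the example.

```python
MAX_LEN = 10  # hypothetical column width


def store(value: str) -> str:
    """Simulate a storage layer that silently truncates and trims trailing spaces."""
    return value[:MAX_LEN].rstrip()


def validate_then_truncate(username: str) -> str:
    """Unsafe order: the check runs on the full input, not on what gets stored."""
    if username == "admin":
        raise ValueError("reserved username")
    return store(username)


def reject_overlong(username: str) -> str:
    """Safer: refuse over-length input, then check the exact value that is stored."""
    if len(username) > MAX_LEN:
        raise ValueError("username too long")
    stored = store(username)
    if stored == "admin":
        raise ValueError("reserved username")
    return stored


padded = "admin" + " " * 5 + "x"  # 11 characters; the 'x' falls past the limit
print(repr(validate_then_truncate(padded)))
```

The padded name passes the reserved-name check and is then stored as plain `admin`. Rejecting over-length input, rather than trimming it, removes the gap between what was checked and what was kept.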
6. Double Encoding
Encoding the same character multiple times can sometimes bypass filters that only decode once.
Example: URL-encoding a space twice (space -> %20 -> %2520). A filter that decodes once sees %20, not the space.
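A short sketch of the problem, using Python's `urllib.parse.unquote`. The `fully_decode` helper and its round limit are assumptions for this example. Many applications take the stricter route of rejecting any input that still contains percent-escapes after one decode.

```python
from urllib.parse import unquote


def decode_once_filter(value: str) -> bool:
    """Return True if the input looks safe after a single round of URL decoding."""
    return "<" not in unquote(value)


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Hypothetical helper: decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("input is encoded too many times")


double_encoded = "%253Cscript%253E"
print(decode_once_filter(double_encoded))  # the single-decode filter is fooled
print(fully_decode(double_encoded))        # the payload after all layers are removed
```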
How to Prevent Input Validation Bypass
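- Use Whitelisting, Not Blacklisting: Define what is allowed instead of trying to block everything bad. An allowlist rejects anything you did not anticipate, while a blocklist lets through every variant you did not think to list.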
- Proper Encoding/Escaping: Encode data based on the context where it’s used (HTML, URL, SQL, etc.). Use built-in functions for this; don’t write your own!
  - PHP Example: htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8') for HTML.
  - Python Example: Use libraries like bleach for safe HTML sanitization.
- Parameterized Queries (Prepared Statements): For SQL, always use parameterized queries to prevent SQL injection. This separates data from the query structure.
- Input Length Limits: Enforce reasonable length limits on all inputs.
- Data Type Validation: Ensure the input is of the expected data type (e.g., integer, string, email).
- Regular Expressions: Use regular expressions carefully to validate complex patterns. Match against the whole input rather than a substring, and avoid patterns prone to catastrophic backtracking (ReDoS), since a careless regex is a vulnerability in itself.
- Canonicalization: Convert inputs to a standard form before validation. This helps prevent bypasses using different representations of the same data.
- Contextual Validation: Validate input differently depending on where it will be used.
- Regular Security Audits and Penetration Testing: Regularly test your application for vulnerabilities, including input validation issues.
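# Python example using sqlite3: a parameterized query
import sqlite3
conn = sqlite3.connect("app.db")  # assumed database file containing a users table
username = "alice"  # stands in for user-supplied input
c = conn.cursor()
c.execute("SELECT * FROM users WHERE username = ?", (username,))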
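Several of the points above (allowlisting, length limits, type checks, and context-specific escaping) can be sketched together. The username policy, the age range, and all function names here are assumptions chosen for the example, not recommendations for any particular application.

```python
import html
import re

# Allowlist: 3 to 20 ASCII letters, digits, or underscores (an assumed policy).
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(value: str) -> str:
    """Accept only values that match the allowlist in full."""
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("invalid username")
    return value


def parse_age(value: str) -> int:
    """Type and range check: the input must be an integer between 0 and 150."""
    if not re.fullmatch(r"[0-9]{1,3}", value):
        raise ValueError("age must be a whole number")
    age = int(value)
    if age > 150:
        raise ValueError("age out of range")
    return age


def render_greeting(display_name: str) -> str:
    """Escape for the HTML context at output time, whatever validation ran earlier."""
    return "<p>Hello, " + html.escape(display_name, quote=True) + "</p>"


print(validate_username("alice_01"))
print(parse_age("42"))
print(render_greeting("<script>alert(1)</script>"))
```

Using `fullmatch` instead of `search` means the pattern has to cover the entire input, so a valid prefix followed by a payload is rejected.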
Resources
- OWASP Input Validation Cheat Sheet: https://owasp.org/www-project-input-validation-cheat-sheet

