TL;DR
Input filters block or escape characters that attackers could use in injection attacks such as cross-site scripting (XSS) and SQL injection. This guide shows common ways attackers get around these filters and how to protect against them. It’s aimed at anyone who manages web applications or their security.
Understanding Metacharacter Filtering
Metacharacters are characters that a parser treats as syntax rather than data (e.g., <, >, and " in HTML, or ' in SQL). Filters try to remove or escape these characters to prevent attacks like cross-site scripting (XSS) and SQL injection.
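As a minimal sketch of the "escape" half of that idea, Python's standard html.escape converts HTML metacharacters into entities so a browser renders them as text instead of markup. The function name render_comment is hypothetical and exists only for this illustration.

```python
import html


def render_comment(user_input: str) -> str:
    """Escape HTML metacharacters before placing untrusted text in a page body."""
    # html.escape replaces &, < and >, and (with the default quote=True) " and '
    return "<p>" + html.escape(user_input) + "</p>"


print(render_comment('<script>alert("XSS")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;</p>
```

Escaping like this is only appropriate for the HTML body context; attribute, JavaScript, and SQL contexts each need their own encoding or parameterization.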
Bypass Techniques
- Character Encoding:
  - HTML Entities: Replace characters with their HTML entity codes. For example, &lt; for <, &gt; for >, &amp; for &.
  - URL Encoding: Use percent encoding (e.g., %3C for <). Useful when submitting data in URLs.
  - Unicode Encoding: Use Unicode characters that are visually similar to the blocked character. For example, using a full-width angle bracket (＜, U+FF1C) instead of a standard one.
  - Double Encoding: Encode a value that is already encoded, for example writing &lt; as %26lt;.
  - JavaScript Encoding: If the input is used in JavaScript, try using JavaScript-specific escape sequences (e.g., \x3C for <).
- SQL Injection Alternatives: For SQL injection, explore different ways to achieve the same result without using common keywords like SELECT or UNION.
- Adding Whitespace: Insert spaces around metacharacters (e.g., < tag >). Some filters don’t handle whitespace correctly.
- Using Comments: Inject comments to break up the filter’s pattern matching (e.g., <!-- comment -->tag<!-- comment -->).
- Attribute Context: If the input goes into an HTML attribute, you might need to use different techniques than if it’s placed directly in the body of the page. For example, using single quotes instead of double quotes within an attribute.
- JavaScript Context: Different encoding methods are needed for JavaScript code.
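Most of the encoding tricks above succeed for the same reason: the filter inspects the raw input, and some later stage decodes it. The sketch below illustrates that ordering problem with a deliberately weak blacklist and a canonicalization step built from Python's standard library. The names naive_filter and canonicalize, the payload list, and the five-round limit are assumptions made for this example, not part of any particular framework.

```python
import html
import unicodedata
from urllib.parse import unquote

BLOCKED = ("<", ">")


def naive_filter(value: str) -> bool:
    """Return True if the value passes a blacklist that only looks for raw < and >."""
    return not any(ch in value for ch in BLOCKED)


def canonicalize(value: str, max_rounds: int = 5) -> str:
    """Decode URL escapes, HTML entities and compatibility characters until stable."""
    for _ in range(max_rounds):
        # One round: percent-decoding, then entity decoding, then NFKC folding
        decoded = unicodedata.normalize("NFKC", html.unescape(unquote(value)))
        if decoded == value:
            return value
        value = decoded
    # Refuse input that is still changing after max_rounds decoding passes
    raise ValueError("too many encoding layers")


payloads = [
    "%3Cscript%3E",        # URL encoding
    "%253Cscript%253E",    # double URL encoding
    "&lt;script&gt;",      # HTML entities
    "%26lt;script%26gt;",  # URL-encoded entity
    "\uff1cscript\uff1e",  # full-width angle brackets
]

for p in payloads:
    # Columns: payload, passes raw check, passes check after canonicalization
    print(p, naive_filter(p), naive_filter(canonicalize(p)))
```

Every payload passes the raw check and is rejected once it has been canonicalized. In practice an application would decode only the layers it actually applies downstream, since aggressive normalization such as NFKC can also alter legitimate text.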
Example: Bypassing a Simple HTML Tag Filter
Let’s say the filter blocks <script>.
Input: <script>alert('XSS')</script> (Blocked)
Bypass 1 (HTML Entities): &lt;script&gt;alert('XSS')&lt;/script&gt; (May work if the application decodes entities after the filter runs)
Bypass 2 (Case Variation): <Script>alert('XSS')</Script> (May work if the filter is case-sensitive)
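The case-variation bypass can be reproduced with a few lines of Python. Both filters below are hypothetical stand-ins for the kind of check described in this example, not code from a real product.

```python
import re


def case_sensitive_filter(value: str) -> bool:
    """Weak filter: rejects only the exact lowercase string '<script>'."""
    return "<script>" not in value


# Matches '<script' in any letter case, with or without attributes
SCRIPT_TAG = re.compile(r"<script\b", re.IGNORECASE)


def case_insensitive_filter(value: str) -> bool:
    """Slightly stronger filter: rejects '<script' regardless of case."""
    return SCRIPT_TAG.search(value) is None


payload = "<Script>alert('XSS')</Script>"
print(case_sensitive_filter(payload))    # True: the payload slips through
print(case_insensitive_filter(payload))  # False: the payload is rejected
```

The second filter closes this one gap, but it is still a blacklist: markup that never uses a script tag, such as an event-handler attribute, passes it unchanged.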
Protecting Against Bypasses
- Input Validation:
  - Whitelist Approach: Only allow specific, known-good characters or patterns. This is the most secure method.
  - Blacklist Avoidance: Blacklists are easily bypassed. Use them only as a secondary defense.
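A whitelist check can be as small as one anchored regular expression. The sketch below assumes a hypothetical username field limited to 3 to 20 ASCII letters, digits, and underscores; the pattern and the function name is_valid_username are illustrative choices, and the right rule depends on the field being validated.

```python
import re

# Allow only ASCII letters, digits and underscore, 3 to 20 characters long
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(value: str) -> bool:
    """Accept the value only if the entire string matches the allowed pattern."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or any extra character causes rejection
    return USERNAME_PATTERN.fullmatch(value) is not None


print(is_valid_username("alice_01"))               # True
print(is_valid_username("<script>"))               # False
print(is_valid_username("%3Cscript%3E"))           # False
print(is_valid_username("\uff1cscript\uff1e"))     # False
```

Because the check names what is allowed rather than what is forbidden, the encoded and full-width variants are rejected without the filter needing to know about them.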