Translation Files: A Security Risk?

TL;DR

User-provided translation strings can be a serious cyber security risk. Malicious code can be hidden within these strings, leading to cross-site scripting (XSS), remote code execution (RCE) or other attacks. This guide explains how to identify and mitigate this threat.

Understanding the Problem

Many applications allow users to contribute translations for different languages. While helpful for global reach, these submissions aren’t always vetted properly. Attackers can exploit this by injecting harmful code into translation strings, which then runs in the browser when the strings are displayed to other users.
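
To make the risk concrete, here is a minimal sketch of the vulnerable pattern. The catalogue, the `greeting` key and the `render_unsafe` helper are hypothetical names chosen for illustration; the point is only that plain string concatenation passes attacker-supplied markup straight into the page.

```python
# Hypothetical translation catalogue; the "greeting" entry was submitted by an attacker.
translations = {
    "greeting": '<img src=x onerror="alert(document.cookie)">',
}

def render_unsafe(key: str) -> str:
    """Build HTML by plain concatenation, with no escaping (the vulnerable pattern)."""
    return "<p>" + translations[key] + "</p>"

page = render_unsafe("greeting")
print(page)  # the onerror handler reaches the browser intact
```

Any visitor whose browser renders this page would run the injected handler, which is why the steps below treat every translation string as untrusted input.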

Solution: Protecting Against Malicious Translations

  1. Input Validation & Sanitisation: This is your first line of defence.
    • Character Restrictions: Limit the allowed character set. Don’t allow HTML tags or special characters unless absolutely necessary and properly escaped (see step 3).
    • Length Limits: Impose reasonable length limits on translation strings to prevent overly long, potentially malicious inputs.
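    • Regular Expression Filtering: Use regular expressions to flag common attack patterns, such as <script> tags, javascript: URLs or inline event handlers like onerror=. Blocklists like this are easy to bypass, so treat them as a supplement to output escaping (see step 3), not a replacement.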
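      # Example Python regex (basic) - flags an opening <script> tag
      import re
      translation = '<SCRIPT src="https://evil.example/x.js">'  # sample input
      # <script\b also matches tags that carry attributes, unlike a literal <script>
      if re.search(r'<script\b', translation, re.IGNORECASE):
        print("Malicious script detected!")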
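
The character and length restrictions above could be combined into a single check along the following lines. This is a sketch under assumptions: the 500-character limit, the choice to reject `<` and `>` outright, and the function name `validate_translation` are illustrative, not values the article prescribes.

```python
import unicodedata

MAX_LENGTH = 500        # assumed limit; tune per application
FORBIDDEN = set("<>")   # markup delimiters are rejected outright in this sketch

def validate_translation(text: str) -> list[str]:
    """Return a list of problems; an empty list means the string passed."""
    problems = []
    if len(text) > MAX_LENGTH:
        problems.append("too long")
    if any(ch in FORBIDDEN for ch in text):
        problems.append("forbidden character")
    # Unicode category "Cc" covers control characters such as NUL or ESC.
    if any(unicodedata.category(ch) == "Cc" and ch not in "\n\t" for ch in text):
        problems.append("control character")
    return problems
```

Returning a list of problems rather than a single boolean lets the submission form tell a contributor exactly why a string was rejected.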
      
  2. Content Security Policy (CSP): Implement a strong CSP to control the resources your application is allowed to load.
    • Restrict Script Sources: Specifically disallow inline scripts and only allow scripts from trusted domains. This significantly reduces the impact of XSS attacks, even if malicious code makes it through input validation.
      # Example CSP header:
      Content-Security-Policy: default-src 'self'; script-src 'self' https://trusted-cdn.example.com;
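
If the header is assembled in application code, a small helper can keep the policy readable. The sketch below is framework-agnostic and assumes a hypothetical `build_csp` function; the `object-src 'none'` directive is an extra hardening choice added here for illustration. Inline scripts are blocked because `script-src` does not list `'unsafe-inline'`.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        name + " " + " ".join(sources) for name, sources in directives.items()
    )

policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://trusted-cdn.example.com"],
    "object-src": ["'none'"],   # assumption: the application needs no plugins
})

# How the header is attached to a response depends on the web framework in use.
headers = {"Content-Security-Policy": policy}
```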
      
  3. Output Encoding/Escaping: Always encode or escape user-provided data before displaying it on your website.
    • HTML Escaping: Convert characters like <, >, &, " and ' into their HTML entities. This prevents the browser from interpreting them as code.
      # Example PHP HTML escaping:
      <?php
      echo htmlspecialchars($translation, ENT_QUOTES, 'UTF-8');
      ?>
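
For applications written in Python, the standard library's `html.escape` plays a similar role. The `render_translation` wrapper below is a hypothetical helper used only to show the call in context.

```python
import html

def render_translation(text: str) -> str:
    """Escape a translation string before placing it inside HTML."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p>" + html.escape(text, quote=True) + "</p>"

print(render_translation('<script>alert("x")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

Most template engines apply this kind of escaping automatically, so in practice the main thing to check is that translated strings are not marked as "safe" or rendered raw.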
      
  4. Translation File Format Restrictions: Limit the accepted file formats.
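    • Avoid Executable Formats: Don't accept formats that can carry executable code or opaque binary data (e.g., compiled gettext .mo files, or translation files written as PHP or JavaScript source). Stick to simple text-based formats like JSON or YAML, and parse YAML with a safe loader, since some YAML loaders can instantiate arbitrary objects.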
    • Schema Validation: If using JSON or YAML, define a strict schema and validate all submitted translation files against it.
      # Example JSON Schema snippet:
      {
        "type": "object",
        "properties": {
          "greeting": {"type": "string"}
        },
        "required": ["greeting"]
      }
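
A schema like the one above is normally enforced with a JSON Schema validator library. As a dependency-free sketch, the same constraints can be checked by hand with the standard `json` module. The function name `load_translation_file` and the decision to reject unexpected keys are assumptions made for this example; the schema snippet itself does not forbid extra keys.

```python
import json

REQUIRED_KEYS = {"greeting"}   # mirrors "required" in the schema snippet
ALLOWED_KEYS = {"greeting"}    # assumption: reject keys the schema does not list

def load_translation_file(raw: str) -> dict[str, str]:
    """Parse a submitted JSON file and reject anything outside the expected shape."""
    data = json.loads(raw)   # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("top level must be an object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    extra = set(data) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"{key!r} must be a string")
    return data
```

Note that shape validation only confirms every value is a string; the strings themselves still need the input checks from step 1 and the escaping from step 3.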
      
  5. Regular Security Audits: Regularly review your translation files for suspicious content.
    • Automated Scanning: Use automated tools to scan for known attack patterns and vulnerabilities.
    • Manual Review: Periodically have a human reviewer examine the translations, especially those from untrusted sources.
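
An automated scan can be as simple as running a handful of patterns over every string in the catalogue and queueing the hits for manual review. The rule set and the `scan_catalogue` function below are illustrative assumptions, not a complete or authoritative list of attack signatures.

```python
import re

# Illustrative patterns only; a real scanner would need a maintained rule set.
SUSPICIOUS = {
    "script tag": re.compile(r"<script\b", re.IGNORECASE),
    "javascript: URL": re.compile(r"javascript\s*:", re.IGNORECASE),
    "inline event handler": re.compile(r"\bon\w+\s*=", re.IGNORECASE),
}

def scan_catalogue(translations: dict[str, str]) -> list[tuple[str, str]]:
    """Return (key, rule name) pairs for every string that matches a rule."""
    findings = []
    for key, text in translations.items():
        for rule, pattern in SUSPICIOUS.items():
            if pattern.search(text):
                findings.append((key, rule))
    return findings
```

Patterns this broad will produce false positives on ordinary text, so the findings are best treated as a review queue rather than an automatic rejection list.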
  6. Principle of Least Privilege: Ensure that the user account submitting translations has only the necessary permissions.
    • Limited Access: Don't grant translation contributors administrative access to your application or server.