Clean PDF: Remove Embedded Malware

TL;DR

This guide shows you how to strip potentially malicious content from a PDF file by ‘bursting’ it into single-page files and then rebuilding it with Ghostscript. Rewriting the file this way can discard active content, such as embedded JavaScript, that is hidden within the file structure.

Steps

  1. Install Required Tools
    • PDFtk Server: A command-line tool for manipulating PDFs. Download it from PDF Labs (choose the correct version for your operating system), or install it with a package manager such as apt on Linux or the installer on Windows.
    • Ghostscript: A PostScript and PDF interpreter. Download it from the Ghostscript website (again, choose the version for your OS). Ensure it’s added to your system’s PATH environment variable so you can run gs commands from any directory. On Windows, the command-line executable is named gswin64c (or gswin32c) rather than gs.
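
Before moving on, it can help to confirm that both tools are actually reachable from your shell. The following is a minimal sketch for a POSIX shell; the function name missing_tools is made up for this example, and on Windows you would pass gswin64c instead of gs.

```shell
# Print the name of every tool that cannot be found on PATH.
# Returns 0 when all tools are present, 1 otherwise.
# (missing_tools is a hypothetical helper name, not part of either tool.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

Running missing_tools pdftk gs prints nothing when both are installed; any name it prints still needs installing or adding to PATH.
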
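  2. Burst the PDF
    The goal here is to split the PDF into separate single-page PDF files. Use PDFtk Server for this. Note that burst splits the document by page; it does not extract images, fonts, or scripts as standalone files.

    pdftk input.pdf burst output output_folder/pg_%04d.pdf

    Replace input.pdf with the name of your potentially infected file and output_folder with a new, empty folder where you want to save the pages (create the folder first). This will create files like pg_0001.pdf, pg_0002.pdf, etc. PDFtk also writes a doc_data.txt file containing the document's metadata.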

  3. Inspect the Contents (Optional but Recommended)
    Before rebuilding, it’s a good idea to check the contents for anything suspicious. This is especially important if you have reason to believe specific types of malware might be present. Because burst produces whole pages rather than separate images, fonts, or scripts, inspecting those parts individually requires a separate extraction tool.

    • Images: Open extracted images in an image editor and look for hidden data or unusual patterns.
    • Fonts: Be wary of fonts from unknown sources. You can use font inspection tools to check their properties.
    • JavaScript (if any): Examine any embedded JavaScript carefully for malicious code. Use a text editor or a JavaScript analyser.
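
As a rough first pass on the JavaScript check, you can count the PDF name tokens that usually accompany active content. This is only a sketch: scan_pdf_tokens is a made-up helper name, the token list is an assumption about what is worth flagging, and the -a and -o options are supported by GNU and BSD grep but are not part of POSIX. Tokens stored inside compressed object streams, or written with hex escapes, will not be found, so a count of zero does not prove the file is clean.

```shell
# Count occurrences of PDF name tokens often associated with active content.
# Usage: scan_pdf_tokens file.pdf
# (scan_pdf_tokens and the token list are illustrative assumptions.)
scan_pdf_tokens() {
    for token in /JavaScript /JS /OpenAction /AA /Launch /EmbeddedFile; do
        # -a reads the binary PDF as text; -o prints each match on its own line.
        # The trailing group stops /JS from matching a longer name such as /JSx.
        count=$({ grep -a -o -E "${token}([^A-Za-z0-9]|\$)" "$1" || true; } | wc -l | tr -d ' ')
        echo "$token $count"
    done
}
```

Any non-zero count for /JavaScript, /JS, /OpenAction or /Launch is a reason to look more closely before opening the file in a normal viewer.
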
  4. Rebuild the PDF
    Now, rebuild the PDF using Ghostscript. Ghostscript re-interprets each page and writes a new PDF file, which typically leaves out active content such as embedded JavaScript.

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -dQUIET -sOutputFile=new.pdf output_folder/pg_*.pdf

    Replace new.pdf with the desired name for your cleaned PDF file. Adjust output_folder/pg_*.pdf to match the single-page files that pdftk burst produced.

    Important: The order of files passed to Ghostscript matters, because pages are written in the order given. Zero-padded names (pg_0001.pdf, pg_0002.pdf, ...) keep the shell's alphabetical expansion in page order.
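
To check the page order before running Ghostscript, you can print the command first and read it over. The sketch below assumes the pages were saved as pg_0001.pdf, pg_0002.pdf, and so on (PDFtk's default burst naming); rebuild_command is a made-up helper name, and its output is meant for reading, not for pasting if your paths contain spaces.

```shell
# Print, without running it, the Ghostscript command for a folder of pages.
# Usage: rebuild_command output_folder new.pdf
# (rebuild_command is a hypothetical helper; pg_*.pdf naming is assumed.)
rebuild_command() {
    dir=$1
    target=$2
    printf 'gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -dQUIET -sOutputFile=%s' "$target"
    # The glob expands in sorted order, so zero-padded names stay in page order.
    for page in "$dir"/pg_*.pdf; do
        [ -e "$page" ] || continue   # skip the literal pattern if nothing matched
        printf ' %s' "$page"
    done
    printf '\n'
}
```

If the pages had been named without zero padding, pg_10.pdf would sort before pg_2.pdf and the rebuilt document would come out shuffled.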

  5. Verify the Rebuilt PDF
    • Open and Test: Open the rebuilt PDF in a PDF viewer and check that the pages render correctly, then test its features (forms, links, buttons). Some interactive features may be lost in the rebuild, since Ghostscript does not preserve all of them.
    • Scan with Anti-Virus: Scan the rebuilt PDF file with your anti-virus software for any remaining threats.
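
One more quick check is to confirm that no pages were lost. PDFtk's dump_data operation prints a NumberOfPages line, which the sketch below pulls out; page_count is a made-up helper name, and it reads the dump_data text on standard input.

```shell
# Read `pdftk <file> dump_data` output on stdin and print the page count.
# (page_count is a hypothetical helper name.)
page_count() {
    awk -F': ' '$1 == "NumberOfPages" { print $2 }'
}

# Example usage (requires pdftk):
#   pdftk input.pdf dump_data | page_count
#   pdftk new.pdf dump_data | page_count
```

The two numbers should match; if they differ, a page file was probably missed or misnamed during the rebuild.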

Disclaimer: While this method can help remove embedded malware, it is not foolproof. Sophisticated malware may still be present. Always exercise caution when opening PDFs from untrusted sources.
