Blog | G5 Cyber Security

Clean PDF: Remove Embedded Malware

TL;DR

This guide shows you how to remove potential malware from a PDF file by ‘bursting’ it into single-page files and then rebuilding it with Ghostscript. Because the rebuilt file is written from scratch, this process can discard malicious code that might be hidden within the original file structure.

Steps

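  1. Install Required Tools
  • This guide uses PDFtk Server (to split the PDF) and Ghostscript (to rebuild it). Install both and make sure the pdftk and gs commands are available on your PATH.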
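A quick way to confirm the install step worked is to ask the shell whether each command can be found. The helper below is a minimal sketch; the function name missing_tools is hypothetical, and it assumes the tools are installed under their usual command names (pdftk and gs).

```shell
# Print the name of every listed tool that is not found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running missing_tools pdftk gs before you start lists whichever of the two still needs installing.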
  • Burst the PDF
  • The goal here is to split the PDF into single-page PDF files. Use PDFtk Server for this. Note that burst splits the document by page; it does not extract images or fonts as separate files.

    pdftk input.pdf burst output output_folder/pg_%04d.pdf

    Replace input.pdf with the name of your potentially infected file and output_folder with an existing, empty folder where you want to save the pages. This will create files like pg_0001.pdf, pg_0002.pdf, etc., along with a doc_data.txt report describing the document.

  • Inspect Extracted Components (Optional but Recommended)
  • Before rebuilding, it’s a good idea to check the extracted files for anything suspicious, such as embedded JavaScript, automatic actions, or attached files. This is especially important if you have reason to believe specific types of malware might be present.
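One rough way to do this inspection is to search each file for PDF names that are commonly associated with active content. The sketch below is only a heuristic: the function name scan_pdf and the list of names are assumptions made for illustration, and a raw text search will miss names that are obfuscated or stored inside compressed streams. It also assumes a grep that supports the -a option (GNU and BSD grep do).

```shell
# For each file, print "<matching lines><TAB><file>" for PDF names that
# often indicate active content. A count of 0 is not proof of safety.
scan_pdf() {
  for f in "$@"; do
    # grep -c exits non-zero when the count is 0, so ignore its status.
    hits=$(LC_ALL=C grep -a -c -E \
      '/(JavaScript|JS|OpenAction|AA|Launch|EmbeddedFiles?)([^A-Za-z]|$)' "$f" || true)
    printf '%s\t%s\n' "$hits" "$f"
  done
}
```

For example, scan_pdf output_folder/*.pdf prints one line per file; any file with a non-zero count deserves a closer look before you continue.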

  • Rebuild the PDF
  • Now, rebuild the PDF using Ghostscript. Ghostscript re-interprets each page and writes a new PDF file, which generally leaves out active content such as scripts instead of copying it across.

    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dSAFER -dNOPAUSE -dBATCH -dQUIET -sOutputFile=new.pdf output_folder/pg_*.pdf

    Replace new.pdf with the desired name for your cleaned PDF file. Adjust output_folder/pg_*.pdf to match the single-page PDF files in your output_folder. The -dSAFER option restricts Ghostscript's file access while it processes the untrusted pages.

    Important: The order of files passed to Ghostscript determines the page order of the result. The shell expands pg_*.pdf in sorted order, so zero-padded names such as pg_0001.pdf keep the pages in sequence.
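The burst and rebuild steps can be chained into one helper. This is a minimal sketch, not a hardened tool: the function name clean_pdf is hypothetical, it assumes pdftk and gs are on your PATH, and it uses a temporary folder for the page files instead of a folder you name yourself.

```shell
# Burst a PDF into single pages, then rebuild it with Ghostscript.
# Usage: clean_pdf input.pdf cleaned.pdf
clean_pdf() {
  in=$1
  out=$2
  [ -f "$in" ] || { echo "clean_pdf: no such file: $in" >&2; return 1; }
  work=$(mktemp -d) || return 1

  # Step 1: one PDF per page, named pg_0001.pdf, pg_0002.pdf, ...
  pdftk "$in" burst output "$work/pg_%04d.pdf" || { rm -rf "$work"; return 1; }

  # Step 2: re-render the pages, in sorted glob order, into a new file.
  status=0
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dSAFER -dNOPAUSE -dBATCH -dQUIET \
     -sOutputFile="$out" "$work"/pg_*.pdf || status=$?

  rm -rf "$work"
  return "$status"
}
```

Because both tools still have to parse the untrusted file, it is sensible to run a helper like this on an isolated or disposable machine.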

  • Verify the Rebuilt PDF
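One simple check, offered here as an assumption and not as part of the original method, is to confirm that the rebuilt file has the same number of pages as the original. The sketch below reads the NumberOfPages line from the pdftk dump_data report; the function names page_count and same_page_count are hypothetical.

```shell
# Read `pdftk FILE dump_data` output on stdin and print the page count.
page_count() {
  awk '/^NumberOfPages:/ { print $2; exit }'
}

# Succeed only if both PDFs report the same, non-empty page count.
same_page_count() {
  a=$(pdftk "$1" dump_data | page_count)
  b=$(pdftk "$2" dump_data | page_count)
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A matching page count only shows that no pages were lost; it says nothing about whether the content is safe, so it is worth also opening the rebuilt file in a viewer with JavaScript disabled.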
  • Disclaimer: While this method can help remove embedded malware, it is not foolproof. Sophisticated malware may still be present. Always exercise caution when opening PDFs from untrusted sources.
