Compressed File Size Spoofing: Why It Happens

TL;DR

Yes, the size reported for a compressed file can differ from the size of its uncompressed content, and archive metadata can make a file *appear* larger or smaller than it really is. This isn’t usually malicious, but it’s important to understand why it happens and how to check the true size. Common causes include differences in how tools report size, archive structure overhead, and metadata manipulation.

Understanding Compressed File Size Discrepancies

Compressed files (like ZIP, GZIP, TAR.GZ) reduce file sizes for storage and transfer. However, several reasons can lead to a mismatch between the reported size of the compressed file and the actual size of the data when uncompressed.

How It Happens: Step-by-Step Guide

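  1. Incorrect Tool Reporting: Different operating systems and archiving tools may measure file sizes differently.
    • Windows File Explorer shows both ‘Size’ and ‘Size on disk’. The latter is the allocated space, which is rounded up to whole clusters and is therefore often larger than the actual data size.
    • On Linux/macOS, ls -l reports the exact file length in bytes, while du reports allocated disk blocks by default, so the two can disagree for the same file.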
  2. Archive Structure Overhead: Compressed archives aren’t just the compressed data; they include metadata (file names, timestamps, permissions) and directory structures. This adds overhead.
    • ZIP files, for example, have central directories and local file headers that contribute to the overall size.
  3. Compression Level: Higher compression levels generally result in smaller compressed files but take longer to compress. The compressed size varies with this setting; the uncompressed size does not.
  4. Metadata Manipulation (Less Common, Potential Security Risk): While rare, it’s possible to alter the metadata within an archive to *falsely* report a larger or smaller file size than reality. This is often associated with malicious intent.
    • This doesn’t change the actual compressed data but can mislead users.
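  5. Sparse Files: Some files are ‘sparse’, meaning they contain large blocks of zeroed data that aren’t physically stored on disk, only indicated as such in metadata. An archiving tool that does not preserve sparseness writes those zeros out in full on extraction, so the extracted file can occupy far more disk space than the original did.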
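
The effect of archive overhead (point 2) and compression level (point 3) can be seen with a small experiment. The following is a minimal Python sketch, not a measurement tool: it builds ZIP archives in memory with the standard zipfile module and returns their size in bytes. The helper name zip_size and the sample payload are made up for this illustration.

```python
import io
import zipfile
from typing import Optional


def zip_size(payload: bytes, level: Optional[int] = None) -> int:
    """Return the size in bytes of an in-memory ZIP holding one member.

    level=None stores the member uncompressed; 1 to 9 selects a DEFLATE level.
    (zip_size is a hypothetical helper name used only for this illustration.)
    """
    method = zipfile.ZIP_STORED if level is None else zipfile.ZIP_DEFLATED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method, compresslevel=level) as archive:
        archive.writestr("example.txt", payload)
    return len(buffer.getvalue())


if __name__ == "__main__":
    sample = b"compressed file size example\n" * 2000  # illustrative, repetitive payload
    print("empty member, stored:", zip_size(b""))      # headers only, still more than 0 bytes
    print("payload length:      ", len(sample))
    print("stored (no deflate): ", zip_size(sample))   # payload plus header overhead
    print("deflate level 1:     ", zip_size(sample, 1))
    print("deflate level 9:     ", zip_size(sample, 9))
```

Even an archive holding a single empty file has a non-zero size, because the local file header, central directory and end-of-archive record are always written. Exact figures for the compressed variants depend on the payload and the zlib version in use.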

Checking the True File Size

Here’s how to verify the actual uncompressed size:

  1. Uncompress the file: The most reliable method is to fully extract/decompress the archive. Then, check the total size of the extracted files and folders.
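    • Linux/macOS (using tar):
    • mkdir extracted && tar -xzf filename.tar.gz -C extracted && du -sh extracted
    • This extracts the archive into a new, empty directory and then uses du -sh to show the total disk usage of the extracted contents in a human-readable format. Extracting into the current directory and running du -sh . would also count the archive itself and any other files already there.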
    • Windows (using 7-Zip): Right-click the file, select ‘7-Zip’ -> ‘Extract Here’. Then check the folder properties for the total size.
  2. Use File Manager Properties: After extraction, use your operating system’s file manager to view the combined size of all files within the extracted directory.
  3. Command Line (Linux/macOS – using zipinfo): You can read the sizes recorded in a ZIP archive without extracting it. These figures come from the archive’s own metadata, so they are less reliable than measuring the extracted files.
    zipinfo filename.zip

    This lists each file in the ZIP archive with its uncompressed size, followed by a summary line giving the total uncompressed and compressed sizes.
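
The same metadata can be read programmatically. Below is a rough sketch using Python’s standard zipfile module; declared_sizes is a hypothetical helper name, and the throwaway archive built in the example exists only so the snippet runs on its own. It sums the sizes declared in the ZIP central directory without decompressing anything, so the result is a claim made by the archive rather than a measurement.

```python
import io
import zipfile


def declared_sizes(archive):
    """Return (compressed, uncompressed) byte totals as recorded in the ZIP metadata.

    `archive` may be a path or a binary file object. Nothing is decompressed,
    so the totals are only as trustworthy as the archive's central directory.
    """
    with zipfile.ZipFile(archive) as zf:
        members = zf.infolist()
        compressed = sum(member.compress_size for member in members)
        uncompressed = sum(member.file_size for member in members)
    return compressed, uncompressed


if __name__ == "__main__":
    # Build a small throwaway archive in memory so the example runs on its own.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"A" * 5000)
        zf.writestr("b.txt", b"B" * 3000)
    compressed, uncompressed = declared_sizes(buffer)
    print("declared compressed bytes:  ", compressed)
    print("declared uncompressed bytes:", uncompressed)
```

For a real file, the call would look like declared_sizes("filename.zip"). A large gap between the two totals is a reason to be careful before extracting.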

Cyber Security Implications

While usually benign, size spoofing can be a tactic used in cyber security attacks:

  • Malware Disguise: A small compressed file that expands to a large malicious payload.
  • Denial of Service (DoS): Sending a seemingly harmless archive that uncompresses into an enormous file, consuming disk space and resources. This is commonly called a decompression bomb or ‘zip bomb’.
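
To give a feel for how lopsided the ratio between compressed and uncompressed size can be, here is a small Python sketch using the standard zlib module. The helper name expansion_ratio and the 10 MiB block of zero bytes are assumptions chosen purely for illustration; real payloads compress far less predictably.

```python
import zlib


def expansion_ratio(data: bytes) -> float:
    """Return uncompressed length divided by zlib-compressed length (level 9)."""
    return len(data) / len(zlib.compress(data, 9))


if __name__ == "__main__":
    zeros = bytes(10 * 1024 * 1024)  # 10 MiB of zero bytes, chosen only for illustration
    packed = zlib.compress(zeros, 9)
    print(f"{len(zeros)} bytes compress to {len(packed)} bytes "
          f"(about {expansion_ratio(zeros):.0f}:1)")
```

Highly repetitive data such as this shrinks by several hundred times or more, which is why the size of the compressed file alone says little about how much space extraction will need.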

Prevention

  • Always scan compressed files with updated antivirus software before extracting them.
  • Be cautious about opening archives from untrusted sources.
  • Verify the uncompressed size as described above, especially for unexpected file sizes.
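
One way to verify the uncompressed size without risking a full disk is to decompress in small chunks and stop as soon as a limit is passed. The following is a minimal sketch under stated assumptions, not a hardened implementation: it handles a single gzip or zlib stream (such as a .gz file) already loaded into memory, and the names bounded_decompressed_size and DecompressedTooLarge, as well as the limit values, are invented for this example. The same idea would need to be applied member by member for multi-file archives.

```python
import zlib


class DecompressedTooLarge(Exception):
    """Raised when the decompressed data exceeds the allowed limit."""


def bounded_decompressed_size(compressed: bytes, limit: int, chunk_size: int = 64 * 1024) -> int:
    """Count the real decompressed bytes of one gzip/zlib stream, up to `limit`.

    Output is produced at most `chunk_size` bytes at a time and discarded,
    so memory use stays small and nothing is written to disk.
    """
    # MAX_WBITS | 32 makes zlib accept either a zlib or a gzip header.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
    total = 0
    pending = compressed
    while not decompressor.eof:
        chunk = decompressor.decompress(pending, chunk_size)
        pending = decompressor.unconsumed_tail
        if not chunk and not pending:
            # All input consumed, no output, and no end marker: stream is truncated.
            raise ValueError("compressed stream ended early")
        total += len(chunk)
        if total > limit:
            raise DecompressedTooLarge(f"more than {limit} bytes after decompression")
    # Only the first stream is measured; any trailing data is ignored.
    return total


if __name__ == "__main__":
    bomb_like = zlib.compress(bytes(5 * 1024 * 1024), 9)  # illustrative input
    print("actual size:", bounded_decompressed_size(bomb_like, limit=10 * 1024 * 1024))
    try:
        bounded_decompressed_size(bomb_like, limit=1024 * 1024)
    except DecompressedTooLarge as error:
        print("rejected:", error)
```

Because the count comes from the decompressor’s real output rather than from a size field in the file, manipulated metadata cannot make the result look smaller than it is.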