Code Verification: Proof of Source

TL;DR

It’s very difficult to cryptographically prove that a running interpreted program is exactly the same as its published source code. Perfect proof usually isn’t possible, but you can get strong evidence by combining hashing and digital signatures with careful build processes and dependency management.

How it Works: The Challenges

Interpreted languages (Python, JavaScript, Ruby, etc.) differ from compiled ones. Compiling source code produces a fixed executable file that can be hashed and signed as a single artifact. Interpreted code is run by an interpreter, often after an internal, version-specific translation to bytecode, and there is much more flexibility in how, and with what, that code is executed.

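  • Dynamic Nature: Programs in interpreted languages can change their own behaviour at runtime, for example through eval, dynamic imports or monkey-patching.
  • Environment Differences: The same code can behave differently depending on the environment (installed library versions, interpreter version, configuration).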
  • Obfuscation: Source code can be intentionally made harder to read without changing its functionality.

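These factors make a simple hash comparison unreliable: a matching hash shows that a file is identical to the published one, not that the interpreter is running only that file, unmodified, with the same dependencies.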
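To see why a matching source hash says little about runtime behaviour, here is a minimal Python sketch. It assumes a hypothetical `greeter` module built in memory for the demo. The module’s source text, and therefore its hash, never changes, yet other code in the same process replaces one of its functions at runtime (monkey-patching).

```python
import hashlib
import types

# Hypothetical module source for this demo; in practice it would be a published file
SOURCE = 'def greet(name):\n    return "Hello, " + name\n'


def source_hash(text: str) -> str:
    """SHA-256 of the source text, as a publisher would compute it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Load the published source into a fresh module object
greeter = types.ModuleType("greeter")
exec(SOURCE, greeter.__dict__)

published_hash = source_hash(SOURCE)
before = greeter.greet("Alice")

# Other code running in the same process replaces the function (monkey-patching);
# the source text, and therefore its hash, is untouched
greeter.greet = lambda name: "Goodbye, " + name

after = greeter.greet("Alice")

print("hash still matches:", source_hash(SOURCE) == published_hash)  # True
print("before patch:", before)  # Hello, Alice
print("after patch:", after)    # Goodbye, Alice
```

The hash check passes both times, but the behaviour has changed. That is the gap a source hash alone cannot close.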

Steps to Verify Code Integrity

  1. Hashing the Source Code: Create a cryptographic hash of your original source code.
    sha256sum my_script.py

    This gives you a practically unique ‘fingerprint’ of the file: changing even one byte produces a completely different hash.

  2. Dependency Management: Use a dependency manager (e.g., pip for Python, npm for JavaScript) to lock down specific versions of all libraries your code uses.
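    • Python: Create a requirements.txt file with pinned versions:
      requests==2.28.1
      Pinning a version does not pin the file contents. To have pip reject any downloaded archive that doesn’t match, add hashes (requests==2.28.1 --hash=sha256:<hash>) and install with pip install --require-hashes -r requirements.txt.
    • JavaScript: Use package-lock.json or yarn.lock to record exact dependency versions; both also record an integrity hash for each package.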
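  3. Build Process (if applicable): If you use a build step (e.g., bundling JavaScript), hash the *output* of that process as well as the source, since the output is what actually runs. For others to confirm that the output really came from the published source, the build must be reproducible: the same source and dependencies must produce byte-identical output.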
    sha256sum dist/bundle.js
  4. Digital Signatures: Sign your source code and dependency lock file with a digital signature using a private key.
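    • This shows that the files were signed by whoever holds the private key, and that they haven’t been tampered with since. Tools like GPG can be used for this.
    • Example (GPG detached signature, written to my_script.py.asc and published alongside the file):
      gpg --armor --detach-sign my_script.py
    • Anyone with your public key can then verify the file:
      gpg --verify my_script.py.asc my_script.py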
  5. Runtime Hashing: At runtime, recalculate the hash of the source code and compare it to the original.

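    This is of limited value in interpreted languages. The source files may not be accessible at runtime (e.g., in bundled apps), anyone who can modify the code can also remove the check, and the file on disk may differ from what the interpreter actually loaded (it can change after import, or code can be patched in memory).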

    import hashlib

    # Read the file as raw bytes so the result matches sha256sum's output
    with open('my_script.py', 'rb') as f:
        source_code = f.read()
    hash_value = hashlib.sha256(source_code).hexdigest()
    print(hash_value)

    Compare this hash to the original hash you calculated in step 1.

  6. Code Integrity Checks: Implement checks within your application to verify dependencies and code hashes.
    • If a check fails, refuse to run or display a clear warning.
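Steps 1, 5 and 6 can be combined into a startup check. The following is a minimal Python sketch, not a complete solution. It assumes a hypothetical `manifest` dictionary that maps file paths to the SHA-256 hashes recorded at release time (in practice the manifest itself would be signed, as in step 4). It hashes each file in chunks and refuses to continue on any mismatch. The function names are illustrative.

```python
import hashlib
import hmac
import sys
import tempfile
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # read in 64 KiB chunks so large files don't fill memory


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file's bytes; the same value sha256sum prints."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict[str, str], base_dir: Path) -> list[str]:
    """Return the paths that are missing or whose hash differs from the manifest."""
    failures = []
    for rel_path, expected in manifest.items():
        path = base_dir / rel_path
        if not path.is_file():
            failures.append(rel_path)
            continue
        # compare_digest is constant-time; not strictly needed for public hashes,
        # but a sensible default when comparing digests
        if not hmac.compare_digest(sha256_file(path), expected.lower()):
            failures.append(rel_path)
    return failures


def enforce_integrity(manifest: dict[str, str], base_dir: Path) -> None:
    """Refuse to run (exit with status 1) if any file fails verification."""
    failures = verify_manifest(manifest, base_dir)
    if failures:
        sys.exit("Integrity check failed for: " + ", ".join(failures))


if __name__ == "__main__":
    # Demo in a throwaway directory; the manifest here is hypothetical
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        script = base / "my_script.py"
        script.write_text("print('hello')\n")
        manifest = {"my_script.py": sha256_file(script)}  # recorded "at release"
        print(verify_manifest(manifest, base))  # []
        script.write_text("print('tampered')\n")
        print(verify_manifest(manifest, base))  # ['my_script.py']
```

Calling enforce_integrity() early in the program implements the "refuse to run" option. Printing the failures and continuing would implement the "clear warning" option instead. Either way, the check shares the weaknesses described in step 5.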

Important Considerations

  • Secure Storage of Keys: Protect the private key used for digital signatures. Anyone who obtains it can sign tampered code that will pass verification.
  • Supply Chain Security: Be aware of risks in the dependencies you use. Compromised dependencies can undermine your verification efforts.
  • Tamper Detection, Not Prevention: These methods primarily detect tampering; they don’t prevent it.
  • Complexity: Full end-to-end code verification is complex and requires significant effort.
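For the dependency side of step 6, a startup check can at least confirm that the installed versions match the pinned ones. This minimal Python sketch assumes hypothetical `name==version` lines in requirements.txt format (with no spaces around `==`). It uses the standard library’s `importlib.metadata`. It only compares version strings and does not verify package contents, which is what hash-pinned installs are for. The function names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines: list[str]) -> dict[str, str]:
    """Parse requirements.txt-style 'name==version' lines into {name: version}.

    Simplified: skips blank lines, comments and option lines (starting with '-'),
    drops extras, environment markers and trailing options such as --hash.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        spec = line.split(";", 1)[0].split()[0]  # drop markers and options
        if "==" not in spec:
            raise ValueError(f"not pinned to an exact version: {spec!r}")
        name, pinned = spec.split("==", 1)
        pins[name.split("[", 1)[0]] = pinned  # 'pkg[extra]' -> 'pkg'
    return pins


def check_pins(pins: dict[str, str]) -> list[str]:
    """Return one message per mismatch; an empty list means every pin matches.

    Versions are compared as plain strings, without PEP 440 normalisation.
    """
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {pinned})")
            continue
        if installed != pinned:
            problems.append(f"{name}: installed {installed}, pinned {pinned}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins; in practice, read them from your requirements.txt
    pins = parse_pins(["requests==2.28.1"])
    for problem in check_pins(pins):
        print("WARNING:", problem)
```

A version check catches accidental drift, such as a developer machine with a newer library. It will not catch a compromised package that was published under the expected version number.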