TL;DR
It’s very difficult to cryptographically prove a running interpreted program is exactly the same as published source code. While perfect proof isn’t usually possible, you can get strong evidence using techniques like hashing and digital signatures combined with careful build processes and dependency management.
How it Works: The Challenges
Interpreted languages (Python, JavaScript, Ruby, etc.) differ from compiled ones. A compiler turns source code into a fixed executable file that you can hash and distribute. Interpreted code is read and executed by an interpreter at runtime, which leaves more room for the code that actually runs to differ from the file on disk.
- Dynamic Nature: Programs written in interpreted languages can modify their own code at runtime (for example, through eval or monkey-patching).
- Environment Differences: The same code can behave differently based on the environment (libraries, versions).
- Obfuscation: Source code can be intentionally made harder to read without changing its functionality.
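These factors make a simple hash comparison unreliable: a matching hash shows that the file on disk is unchanged, not that the running program behaves as the published source describes.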
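To make the "dynamic nature" point concrete, here is a minimal Python sketch. The module source and the greet function are made up for illustration. It hashes a small file, loads it, then swaps out a function in memory. The file hash is identical before and after, even though the behaviour of the running code has changed.

```python
import hashlib
import os
import tempfile

# Hypothetical module source, used only for this demonstration.
SOURCE = "def greet():\n    return 'hello'\n"


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_module.py")
    with open(path, "w") as f:
        f.write(SOURCE)

    hash_before = sha256_of(path)

    # Load the file's code into a fresh namespace, much as an import would.
    namespace = {}
    with open(path, "rb") as f:
        exec(compile(f.read(), path, "exec"), namespace)

    original_result = namespace["greet"]()

    # Replace the function in memory; the file on disk is never touched.
    namespace["greet"] = lambda: "tampered"
    patched_result = namespace["greet"]()

    hash_after = sha256_of(path)

print("hash unchanged:", hash_before == hash_after)
print(original_result, "->", patched_result)
```

A hash of the file alone cannot see this kind of change, which is why the steps below combine several kinds of evidence.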
Steps to Verify Code Integrity
- Hashing the Source Code: Create a cryptographic hash of your original source code.
  sha256sum my_script.py
  This gives you a unique ‘fingerprint’ of the file.
- Dependency Management: Use a dependency manager (e.g., pip for Python, npm for JavaScript) to lock down specific versions of all libraries your code uses.
  - Python: Create a requirements.txt file with pinned versions:
    requests==2.28.1
  - JavaScript: Use package-lock.json or yarn.lock to record exact dependencies.
- Build Process (if applicable): If you use a build step (e.g., bundling JavaScript), hash the *output* of that process, not just the source.
  sha256sum dist/bundle.js
- Digital Signatures: Sign your source code and dependency lock file with a digital signature using a private key.
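  - A valid signature shows that the files were signed by whoever holds the private key, and that they haven’t been tampered with since. Tools like GPG can be used for this.
  - Example (GPG signing):
    gpg --sign my_script.py
    This writes a signed copy to my_script.py.gpg. To leave the script unchanged and ship a separate signature file (my_script.py.sig), use gpg --detach-sign my_script.py instead.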
- Runtime Hashing: At runtime, recalculate the hash of the source code and compare it to the original.
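  This is tricky in interpreted languages because the program has to locate and read its own source files at runtime, and a check embedded in the script can be removed by anyone able to edit that script. On its own, it’s often not practical.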
  import hashlib

  with open('my_script.py', 'rb') as f:
      source_code = f.read()
  hash_value = hashlib.sha256(source_code).hexdigest()
  print(hash_value)

  Compare this hash to the original hash you calculated in step 1.
- Code Integrity Checks: Implement checks within your application to verify dependencies and code hashes.
  - If a check fails, refuse to run or display a clear warning.
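The hashing and integrity-check steps above can be tied together roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the function names (sha256_file, build_manifest, verify_manifest) and the JSON manifest format are invented for illustration, and the demo runs in a throwaway directory with placeholder files. In practice the manifest itself would need to be signed (for example with GPG), since an attacker who can edit the script can usually edit an unsigned manifest too.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, relative_paths):
    """At publish time: map each relative path to the SHA-256 of its contents."""
    return {rel: sha256_file(Path(root) / rel) for rel in relative_paths}


def verify_manifest(root, manifest):
    """At startup: return the paths that are missing or no longer match."""
    failures = []
    for rel, expected in manifest.items():
        path = Path(root) / rel
        if not path.is_file() or sha256_file(path) != expected:
            failures.append(rel)
    return failures


# Demo in a temporary directory; the file names are placeholders.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "my_script.py").write_text("print('hello')\n")
    (Path(root) / "requirements.txt").write_text("requests==2.28.1\n")

    manifest = build_manifest(root, ["my_script.py", "requirements.txt"])
    manifest_json = json.dumps(manifest, indent=2)  # the text you would sign

    clean_failures = verify_manifest(root, json.loads(manifest_json))

    # Simulate tampering with the script after the manifest was created.
    (Path(root) / "my_script.py").write_text("print('tampered')\n")
    tampered_failures = verify_manifest(root, json.loads(manifest_json))

if tampered_failures:
    # A real launcher would refuse to run here instead of just printing.
    print("Integrity check failed for:", ", ".join(tampered_failures))
```

Running the check from a separate launcher, rather than from inside the script being verified, makes it harder to remove the check together with the tampered code.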
Important Considerations
- Secure Storage of Keys: Protect your private key used for digital signatures!
- Supply Chain Security: Be aware of risks in the dependencies you use. Compromised dependencies can undermine your verification efforts.
- Tamper Detection, Not Prevention: These methods primarily detect tampering; they don’t prevent it.
- Complexity: Full end-to-end code verification is complex and requires significant effort.
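As a small illustration of the supply-chain point, the sketch below compares installed package versions against pins in the style of the requirements.txt example (name==version). The helper names parse_pins and check_pins are hypothetical, and the parser is deliberately simplified: it ignores extras, environment markers, and any line that is not a plain name==version pin. A matching version number also says nothing about whether the package contents were altered; pip's hash-checking mode (--require-hashes) is aimed at that problem.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Extract {name: version} from simple 'name==version' lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not a plain pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, installed)} for every pin that does not match.

    'installed' is None when the package is not installed at all.
    get_version is injectable so the check can be tested without
    depending on what happens to be installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    pins = parse_pins("requests==2.28.1\n")
    for name, (expected, installed) in check_pins(pins).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result from check_pins means every pinned package is installed at exactly the pinned version.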

