TL;DR
Yes, to a degree. A webpage cannot directly observe your browser’s ‘Save Page As…’ or ‘View Page Source’ action, but a site can sometimes infer that a copy was made from related events, extra network requests, and code that runs when the saved copy is opened. There are ways to reduce this tracking.
How Webpages Track Downloads
- JavaScript Events: Browsers do not expose an event for ‘Save Page As…’, so scripts cannot detect a save directly. What a site can do is listen for related events, such as the Ctrl+S keyboard shortcut or the context menu opening, and log them.
- beforeunload event: This event fires when the page is about to be unloaded (navigating away, reloading, or closing the tab). It does not fire when a page is saved, so at best it is a weak, indirect indicator that a visit ended.
- Script-initiated downloads: If the website uses JavaScript to start file downloads (e.g., dynamically generated files), the site controls that code path and can log those downloads directly.
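- Network Requests: When you save a page, your browser may re-request some or all of its resources (HTML, CSS, images, etc.) instead of reading them from its cache. The server logs these requests, and a repeated full fetch from the same client can be correlated with a save, although it looks much like an ordinary reload.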
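- Web Archive Services: Saving in your browser’s own web archive format (such as MHTML or Safari’s .webarchive) happens locally and sends nothing to a third party. If you instead use a third-party archiving service (like Archive.org), that service fetches the page itself, keeps a record of the saved page and its origin, and shows up in the website’s server logs.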
- Content Integrity Checks: Some websites embed scripts or unique identifiers within their HTML. If a saved copy is later opened in a browser with scripts enabled, that code can notice it is running from a local file or an unexpected address and report back, and a unique identifier can link the copy to the original visit.
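To make the log-correlation idea above concrete, here is a minimal sketch in Python of how a server operator might look for clients that fetch a page's full resource set more than once. The record format (client IP, timestamp, path), the function name `find_full_fetches`, and the 10-second window are all assumptions for illustration; real access logs carry more fields, and a full fetch on its own is indistinguishable from a normal first visit or a hard reload.

```python
from datetime import datetime, timedelta

def find_full_fetches(records, page_resources, window=timedelta(seconds=10)):
    """Group requests by client and report each burst in which one client
    requested every path in page_resources within `window`.

    records: iterable of (client_ip, timestamp, path) tuples (a hypothetical,
    simplified log format). Returns {client_ip: [burst_start_time, ...]}.
    """
    wanted = set(page_resources)
    by_ip = {}
    for ip, ts, path in records:
        if path in wanted:
            by_ip.setdefault(ip, []).append((ts, path))

    bursts = {}
    for ip, reqs in by_ip.items():
        reqs.sort()
        start = 0
        counts = {}  # path -> occurrences inside the current window
        for end, (ts, path) in enumerate(reqs):
            counts[path] = counts.get(path, 0) + 1
            # Shrink the window from the left until it spans at most `window`.
            while ts - reqs[start][0] > window:
                old = reqs[start][1]
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
                start += 1
            if len(counts) == len(wanted):
                bursts.setdefault(ip, []).append(reqs[start][0])
                counts.clear()
                start = end + 1  # begin looking for the next burst
    return bursts
```

A client that appears with two or more bursts is only a candidate for "saved or reloaded the page", not proof of either; caching behaviour, proxies, and shared IP addresses all blur this signal.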
How to Reduce Tracking
- Disable JavaScript (Use with Caution): Disabling JavaScript will prevent many tracking methods but may break website functionality.
- In your browser settings, find the JavaScript options and disable it for all sites or specific sites you suspect are tracking downloads.
- Browser Extensions: Use privacy-focused browser extensions like uBlock Origin or Privacy Badger to block trackers and scripts. These can often prevent download tracking code from running.
- Save as Text Only: Instead of ‘Save Page As…’, view the page source (usually right-click -> ‘View Page Source’) and copy it into a plain text editor. The JavaScript is still present in the copied text, but nothing executes while the file is treated as plain text.
- Right-click on the webpage, select ‘View Page Source’.
- Select all the content in the source code window (Ctrl+A or Cmd+A).
- Copy the content (Ctrl+C or Cmd+C).
- Paste it into a plain text editor like Notepad (Windows) or TextEdit (Mac).
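- Use Command-Line Tools: Use tools like wget or curl to download the page content. These tools fetch only what you ask for and do not execute JavaScript, although the request itself still appears in the server’s logs. For example: wget -q --no-check-certificate -O filename.html <URL>, where <URL> stands for the page address. Note that --no-check-certificate turns off TLS certificate verification, so omit it unless you know the site’s certificate is broken.
- Incognito/Private Browsing: While not a perfect solution, using incognito mode or a private browsing window can limit the amount of tracking data associated with your session.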
- VPN and Tor: Using a VPN (Virtual Private Network) or the Tor network can mask your IP address and make it harder to track your downloads.
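Whichever method you use, a saved HTML file can still reference remote scripts and images that a browser would fetch if you later opened the file. The following Python sketch lists those references so you can inspect them first. It uses only the standard library; the class and function names are illustrative, and the tag-to-attribute table is deliberately incomplete (it ignores CSS `url(...)` references, `srcset`, and protocol-relative URLs, for example).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose listed attribute makes a browser fetch something on open.
# This table is a simplifying assumption, not an exhaustive list.
FETCHING_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

class RemoteResourceLister(HTMLParser):
    def __init__(self):
        super().__init__()
        self.remote = []          # (tag, url) pairs with an http(s) URL
        self.inline_scripts = 0   # <script> blocks without a src attribute

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" not in attrs:
            self.inline_scripts += 1
        attr_name = FETCHING_ATTRS.get(tag)
        url = attrs.get(attr_name) if attr_name else None
        if url and urlparse(url).scheme in ("http", "https"):
            self.remote.append((tag, url))

def list_remote_resources(html_text):
    """Return (remote_references, inline_script_count) for an HTML string."""
    parser = RemoteResourceLister()
    parser.feed(html_text)
    parser.close()
    return parser.remote, parser.inline_scripts
```

For a saved file you might call `list_remote_resources(open("filename.html", encoding="utf-8").read())` and review the hosts it reports before opening the file in a browser.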
Important Considerations
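- Saving as PDF: Printing a page to PDF renders it locally, so the page’s scripts are not carried into the file, and the result is less likely to be tracked than a saved HTML copy. It is not invisible, though: scripts can observe the print dialog opening through the beforeprint event, links in the PDF keep any tracking parameters, and a PDF generated by the website itself can contain unique identifiers.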
- Dynamic Content: If the webpage relies heavily on dynamic content loaded after the initial page load, simply downloading the HTML source might not capture everything you see on the screen.
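One rough way to check whether a downloaded file captured what you saw is to test for phrases you remember from the screen in the file's visible text. The Python sketch below does this with the standard library only; the names `VisibleText` and `missing_phrases` are illustrative, and the definition of "visible" (everything outside script, style, and template elements) is a simplifying assumption.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    # Elements whose text content is not rendered as page text.
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def missing_phrases(html_text, phrases):
    """Return the phrases that do not occur in the HTML's visible text."""
    parser = VisibleText()
    parser.feed(html_text)
    parser.close()
    # Collapse whitespace and compare case-insensitively.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in phrases if p.lower() not in text]
```

If a phrase you saw in the browser comes back as missing, it was probably inserted by JavaScript after the initial load, and the raw HTML alone will not reproduce the page.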

