TL;DR
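Sometimes, but only indirectly. Browsers do not notify a webpage when you view or download its source code or save it as a web archive (like using your browser’s ‘Save Page As…’), so a site cannot observe those actions directly. It can watch for related signals instead, such as the keyboard shortcuts for saving or viewing the source, or a saved copy loading scripts and images from its servers when you open it later. However, there are ways to reduce this tracking.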
How Webpages Track Downloads
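- JavaScript Events: Some websites use JavaScript to watch for actions that often accompany saving a page or viewing its source. No event fires when you choose ‘Save Page As…’ or ‘View Page Source’ from the browser menu, so these signals are indirect.
- Keyboard shortcuts: A script can listen for the keydown event and notice shortcuts such as Ctrl+S (Cmd+S on a Mac) for saving or, in some browsers, Ctrl+U for viewing the source. Choosing the same commands from the browser menu does not trigger these events.
- beforeunload event: This event fires when the browser is about to leave or close the page. It does not fire when you save a page, so at most it tells a site that you left, not that you saved anything.
- Downloads started by the site: If the website itself starts a download with JavaScript (e.g., a dynamically generated file), its own code runs at that moment and can log the download directly.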
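One rough way to check whether a page's own HTML hooks events like these is to search a saved copy of it. The sketch below is a hypothetical bash function, find_event_hooks (not a standard tool), that looks for literal addEventListener calls and onevent = assignments for beforeunload, keydown (used to catch shortcuts such as Ctrl+S), beforeprint and afterprint (printing, including print-to-PDF) and copy. It only sees code written directly into the HTML; scripts loaded from separate files, or minified code that builds event names at runtime, will slip past it, so an empty result does not prove the page is not watching.

```shell
# Hypothetical helper: list which save/print/copy-related events a saved
# HTML file hooks in its inline code. Literal patterns only; scripts in
# external files or minified code may not be found.
find_event_hooks() {
  local file="$1"
  # Match addEventListener("name" / addEventListener('name' and onname = forms
  grep -oE "addEventListener\([[:space:]]*[\"'](beforeunload|keydown|beforeprint|afterprint|copy)[\"']|on(beforeunload|keydown|beforeprint|afterprint|copy)[[:space:]]*=" "$file" \
    | grep -oE "beforeunload|keydown|beforeprint|afterprint|copy" \
    | LC_ALL=C sort -u   # print each event name found once, sorted
}

# Usage: find_event_hooks filename.html
```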
How to Reduce Tracking
- Disable JavaScript (Use with Caution): Disabling JavaScript stops every script-based detection method, including the event listeners above, but may break website functionality.
- In your browser settings, find the JavaScript options and disable it for all sites or specific sites you suspect are tracking downloads.
- Copy the source manually instead of using ‘Save Page As…’:
- Right-click on the webpage, select ‘View Page Source’.
- Select all the content in the source code window (Ctrl+A or Cmd+A).
- Copy the content (Ctrl+C or Cmd+C).
- Paste it into a plain text editor like Notepad (Windows) or TextEdit (Mac; choose Format > Make Plain Text first), then save the file.
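- Use command-line tools: Use wget or curl to download the page content (replace https://example.com/page with the page's address). These tools fetch the raw HTML without running any JavaScript, so none of the page's script-based detection can run, although the server still logs the request like any other page load.
wget -q -O filename.html "https://example.com/page"
curl -sL -o filename.html "https://example.com/page"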
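To see what the command-line route skips, the sketch below wraps curl in a hypothetical save_raw_html function that saves the page and counts the opening script tags in it: a browser would have run every one of them, but curl only writes the bytes to disk. The function name and message format are illustrative; the curl options are -f (fail on HTTP errors), -sS (quiet, but still report errors) and -L (follow redirects).

```shell
# Hypothetical helper: save a page's raw HTML with curl and report how many
# <script> tags it contains. curl downloads the file but never runs scripts.
save_raw_html() {
  local url="$1" out="$2" count
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  curl -fsSL -o "$out" "$url" || return 1
  # Count opening <script tags case-insensitively; </script> does not match
  count=$(grep -oi '<script' "$out" | wc -l)
  # $((count)) strips the padding some wc versions add
  echo "saved $out ($((count)) <script> tags, none executed)"
}

# Usage (example.com stands in for the real address):
# save_raw_html "https://example.com/page" filename.html
```

Because curl never runs those scripts, keyboard and unload listeners have no chance to fire. The server still sees an ordinary request, which by default carries a curl User-Agent header.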
Important Considerations
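- Saving as PDF: Browsers create PDFs through the print dialog, which fires the page's beforeprint and afterprint events, so scripts can log it. Printing to PDF is therefore not necessarily less detectable than saving the HTML source, although a PDF produced by the browser does not carry the page's scripts. A PDF offered for download by the website itself is different: the site controls its contents.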
- Dynamic Content: If the webpage relies heavily on dynamic content loaded after the initial page load, simply downloading the HTML source might not capture everything you see on the screen.
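A saved copy can also give itself away later: if it still references scripts, stylesheets or images on remote servers, opening it loads them, and the site or an analytics provider may log that request. The sketch below is a hypothetical list_external_refs bash function that prints absolute (http://, https://) and protocol-relative (//) URLs found in src and href attributes of a saved file. It is a rough check: ordinary links (a href) are included even though they only load when clicked, and URLs inside CSS or JavaScript code are missed.

```shell
# Hypothetical helper: print remote URLs referenced from src/href attributes
# in a saved HTML file. Over-reports <a> links, misses URLs in CSS/JS code.
list_external_refs() {
  local file="$1"
  # Case-insensitive match on src="…" / href='…' values that start with // or http(s)://
  grep -oiE "(src|href)=[\"'](https?:)?//[^\"']+" "$file" \
    | sed -E "s/^[^=]+=[\"']//" \
    | LC_ALL=C sort -u   # strip the attribute prefix above, then de-duplicate
}

# Usage: list_external_refs filename.html
```

Opening the saved copy while offline keeps those requests from leaving your machine; disabling JavaScript stops only the scripts, not images or stylesheets.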