TL;DR
Yes, machine learning systems are vulnerable to attack. These aren’t just theoretical problems – there have been real-world incidents. This guide explains common threats and how to protect your models.
1. Data Poisoning Attacks
Data poisoning happens when attackers inject bad data into the training set of a machine learning model. This causes the model to learn incorrect patterns, leading to misclassifications or other unwanted behaviour. It’s like teaching someone wrong facts.
- How it works: Attackers might submit malicious examples through a web form (if your model retrains on user data) or compromise the data source itself.
- Real-world example: In 2017, researchers showed how to poison traffic sign recognition systems by subtly altering training images. This could cause self-driving cars to misinterpret signs.
- Mitigation:
- Data validation: Carefully check all incoming data for anomalies and inconsistencies.
- Anomaly detection: Use separate models or statistical methods to identify potentially poisoned samples before they’re used for training (see the sketch after this list).
- Robust training techniques: Some algorithms are less sensitive to noisy data than others. Consider using these where possible.
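A minimal sketch of the anomaly-detection step, assuming scikit-learn is available; `X_train` and `y_train` below are stand-ins for your real (possibly poisoned) training arrays, and the 1% contamination rate is an assumption you would tune:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))       # stand-in for real feature vectors
y_train = rng.integers(0, 2, size=1000)     # stand-in for real labels

# Flag suspicious samples before the real model ever sees them.
detector = IsolationForest(contamination=0.01, random_state=0)
keep = detector.fit_predict(X_train) == 1   # +1 = inlier, -1 = flagged as outlier

X_clean, y_clean = X_train[keep], y_train[keep]
print(f"Dropped {np.sum(~keep)} suspicious samples before training")
```

Flagged samples are worth inspecting by hand rather than silently discarding, since the detector will also catch rare-but-legitimate inputs.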
2. Adversarial Examples
Adversarial examples are inputs specifically crafted to fool a machine learning model, even though they look almost identical to legitimate inputs to a human.
- How it works: Attackers add small, carefully calculated perturbations to an input image (or other data type) that cause the model to misclassify it.
- Real-world example: Researchers have created stickers that can be placed on stop signs that cause self-driving cars to interpret them as speed limit signs.
- Mitigation:
- Adversarial training: Train the model with adversarial examples included in the dataset so it learns to be more robust (see the FGSM sketch after this list).
- Input sanitisation: Limit the range of possible input values or apply smoothing filters to reduce the impact of small perturbations.
- Defensive distillation: Train a second model on the output probabilities of the first, which makes naive adversarial examples harder to craft (though stronger attacks have since bypassed it, so don’t rely on it alone).
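To make both the attack and adversarial training concrete, here is a minimal FGSM (fast gradient sign method) sketch in PyTorch. `model`, `images`, and `labels` are assumed to already exist in your training loop, and the epsilon value is illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.01):
    """Nudge each pixel by +/- epsilon in the direction that increases the loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Adversarial training, in essence: mix perturbed copies into each batch.
# adv_images = fgsm_perturb(model, images, labels)
# loss = F.cross_entropy(model(images), labels) + F.cross_entropy(model(adv_images), labels)
```

Single-step FGSM training is a baseline rather than a guarantee; iterative attacks such as PGD are stronger, so test against those too.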
3. Model Extraction Attacks
Model extraction attacks involve stealing the knowledge embedded within a machine learning model through query access alone, without ever seeing its parameters or training data.
- How it works: Attackers repeatedly query the model with different inputs and observe the outputs. They then use this information to train their own, similar model.
- Real-world example: Researchers have successfully extracted models from cloud-based image recognition services. This allows them to replicate the service’s functionality or find vulnerabilities in the original model.
- Mitigation:
- Rate limiting: Limit the number of queries a single user can make within a given timeframe.
- Output perturbation: Add noise to the model’s outputs to make it harder for attackers to reconstruct the original model (see the sketch after this list). Be careful not to degrade performance too much!
- Watermarking: Embed a unique identifier into the model’s predictions, allowing you to detect if someone is using a stolen copy.
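A minimal sketch of output perturbation for a prediction API, assuming the model returns a softmax probability vector; the rounding and noise scale are illustrative values you would tune against your accuracy budget:

```python
import numpy as np

def harden_output(probs, noise_scale=0.01, decimals=2, seed=None):
    """Round and lightly noise a probability vector before returning it to the caller."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(probs, dtype=float) + rng.normal(0.0, noise_scale, size=len(probs))
    noisy = np.clip(noisy, 0.0, None)
    noisy /= noisy.sum()            # renormalise so it is still a distribution
    return np.round(noisy, decimals)

print(harden_output([0.72, 0.20, 0.08]))   # e.g. [0.73 0.19 0.08]
```

Returning only the top label, with no probabilities at all, leaks even less, at the cost of any downstream feature that needs confidence scores.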
4. Evasion Attacks
Evasion attacks occur at inference time – after the model has been trained and deployed. Attackers modify inputs to bypass security checks or misclassify data.
- How it works: Similar to adversarial examples, but focused on bypassing a specific system rather than generally fooling the model.
- Real-world example: Malware authors have used evasion techniques to create malicious files that are misclassified as benign by machine learning-based antivirus systems.
- Mitigation:
- Regular retraining: Update your model frequently with new data, including examples of recent attacks.
- Ensemble methods: Use multiple models in combination to make it harder for attackers to bypass all of them.
- Feature squeezing: Reduce the precision of the input data (for example, by lowering colour bit depth or applying smoothing) so the fine-grained features attackers exploit are removed, as sketched below.
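A minimal feature-squeezing sketch, assuming image inputs scaled to [0, 1]; the 3-bit depth is an example setting, not a recommendation:

```python
import numpy as np

def squeeze_bit_depth(image, bits=3):
    """Round pixel values to 2**bits levels so tiny perturbations collapse away."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(image, dtype=float) * levels) / levels

x = np.array([0.501, 0.503, 0.990])   # two almost-identical pixels, one bright pixel
print(squeeze_bit_depth(x))           # the almost-identical pixels land on the same level
```

In the original feature-squeezing proposal, a large disagreement between the model’s prediction on the raw input and on the squeezed input is itself used as a signal that the input may be adversarial.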
5. Supply Chain Attacks
Supply chain attacks compromise the third-party libraries or datasets that feed your machine learning pipeline, rather than the model itself.
- How it works: Attackers publish malicious packages to public registries (often typosquatted look-alikes of popular libraries such as TensorFlow or PyTorch) or tamper with publicly available datasets.
- Real-world example: Several incidents have involved compromised Python packages containing malware, affecting users who downloaded and installed them.
- Mitigation:
- Dependency management: Use tools like pipenv or conda to pin your project’s dependencies and ensure you’re pulling from trusted sources.
- Package verification: Check the integrity of downloaded packages before installing them. pip’s hash-checking mode does this against a requirements file that pins an expected hash for every dependency (see also the sketch after this list):

  pip install --require-hashes -r requirements.txt

- Regular audits: Regularly review your supply chain for potential vulnerabilities.
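If you want a belt-and-braces check outside of pip, here is a short sketch of verifying a downloaded artefact against a published digest; the file name and digest below are placeholders, not real values:

```python
import hashlib

def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder: take the real digest from the package index or the vendor's
# release notes, never from the same place you downloaded the file.
EXPECTED_SHA256 = "replace-with-published-digest"

if sha256_of("package-1.0-py3-none-any.whl") != EXPECTED_SHA256:
    raise RuntimeError("Downloaded file does not match the published hash")
```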

