TL;DR
This guide walks through a simple Caffe-Latte attack in Python, demonstrating how malicious data injected into image files can influence model predictions when the loading pipeline interprets embedded metadata. We'll cover creating the poisoned images, loading them with a standard image library (Pillow), and observing the impact on a pre-trained model.
Prerequisites
- Python 3 installed
- Libraries: Pillow (PIL fork), NumPy, TensorFlow/Keras (or PyTorch – adapt code accordingly)
Install with pip:
pip install pillow numpy tensorflow
Step 1: Understanding the Caffe-Latte Attack
The Caffe-Latte attack exploits how image libraries and loading pipelines handle metadata. It involves embedding malicious data (often in EXIF tags) into an image file; if the loading or preprocessing code reads and acts on that metadata, the payload can influence the input the model ultimately sees and cause misclassifications. Pipelines that decode only the pixel data are not directly exposed.
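To see where such a payload could live, you can inspect the metadata Pillow exposes for any image file on disk (cat.jpg here is just a placeholder name):
from PIL import Image

# Open an example image and list the metadata Pillow exposes
img = Image.open('cat.jpg')
print('Format:', img.format)
print('info keys:', list(img.info.keys()))  # e.g. JPEG comment, ICC profile

# EXIF tags, if present, are available as a tag-id -> value mapping
for tag_id, value in img.getexif().items():
    print(tag_id, value)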
Step 2: Creating the Poisoned Images
We’ll use Pillow to modify image metadata. For simplicity, we’ll write our payload into an EXIF tag (ImageDescription); in a real attack, the payload could be far more sophisticated and targeted.
from PIL import Image

# Load an example image
img = Image.open('cat.jpg')

# Create the malicious payload
malicious_comment = "This is a test payload!"

# Embed the payload in the EXIF data. Note that values placed in img.info
# are not reliably written back for JPEG, so pass the EXIF block to save().
exif = img.getexif()
exif[0x010E] = malicious_comment  # 0x010E is the ImageDescription tag

# Save the poisoned image
saved_image = 'cat_poisoned.jpg'
img.save(saved_image, exif=exif.tobytes())
print(f'Poisoned image saved as {saved_image}')
Replace 'cat.jpg' with a suitable image file.
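To confirm the payload was actually written, reopen the saved file and read the tag back:
from PIL import Image

# Reopen the poisoned file and verify the embedded payload
check = Image.open('cat_poisoned.jpg')
print('ImageDescription:', check.getexif().get(0x010E))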
Step 3: Loading the Model
Load your pre-trained model using TensorFlow/Keras (or PyTorch). This example uses Keras:
from tensorflow.keras.models import load_model
# Load the pre-trained model
model = load_model('my_image_classifier.h5') # Replace with your model file
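If you don’t have a saved model handy, a pretrained ImageNet classifier from keras.applications works just as well for this walkthrough; MobileNetV2 below is only an illustrative choice, not a requirement:
from tensorflow.keras.applications import MobileNetV2

# Pretrained ImageNet classifier; expects 224x224 RGB inputs
model = MobileNetV2(weights='imagenet')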
Step 4: Loading and Processing Images
Load both the original and poisoned images, preprocess them exactly as your model requires (resizing, normalization, etc.), and make predictions.
from tensorflow.keras.preprocessing import image
import numpy as np

# Load and preprocess the original image
original_img = image.load_img('cat.jpg', target_size=(224, 224))
original_img = image.img_to_array(original_img)
original_img = np.expand_dims(original_img, axis=0)

# Load and preprocess the poisoned image
poisoned_img = image.load_img('cat_poisoned.jpg', target_size=(224, 224))
poisoned_img = image.img_to_array(poisoned_img)
poisoned_img = np.expand_dims(poisoned_img, axis=0)

# Make predictions on both inputs
original_prediction = model.predict(original_img)
poisoned_prediction = model.predict(poisoned_img)
Adjust target_size=(224, 224) to match your model’s input requirements.
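If you went with one of the keras.applications models suggested in Step 3, also apply its matching preprocess_input to the arrays right after img_to_array and before the model.predict calls above (shown here for the illustrative MobileNetV2 choice):
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Scale pixel values the way MobileNetV2 expects (roughly [-1, 1])
original_img = preprocess_input(original_img)
poisoned_img = preprocess_input(poisoned_img)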
Step 5: Analyzing the Results
Compare the predictions for the original and poisoned images. Whether the probabilities shift depends on your pipeline: if the loading or preprocessing code reads and acts on the embedded metadata, the poisoned input can change the model’s output; if it decodes only pixel data (as the standard Keras loaders above do), the two predictions will typically be identical, which tells you this particular vector does not reach your model. Print the prediction results:
# Assuming your model outputs a probability per class
print('Original Image Prediction:', original_prediction)
print('Poisoned Image Prediction:', poisoned_prediction)
Look for significant changes in the predicted probabilities. A successful attack will cause a misclassification or reduced confidence in the correct prediction.
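As a concrete check, compare the top-1 class index and its confidence for both inputs; if the index changes, the prediction flipped:
import numpy as np

# Compare the top-1 class and its confidence for both inputs
orig_class = int(np.argmax(original_prediction[0]))
pois_class = int(np.argmax(poisoned_prediction[0]))
print(f'Original top-1: class {orig_class} ({original_prediction[0][orig_class]:.4f})')
print(f'Poisoned top-1: class {pois_class} ({poisoned_prediction[0][pois_class]:.4f})')
print('Prediction changed.' if orig_class != pois_class else 'Prediction unchanged.')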
Important Considerations
- Model Specificity: The effectiveness of this attack depends heavily on how your model handles image loading and metadata.
- Preprocessing: Image preprocessing steps can mitigate the attack if they strip or sanitize metadata.
- Security Measures: Validate and sanitize untrusted image inputs before they reach your model pipeline; a minimal metadata-stripping sketch follows this list.
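One simple sanitization step, sketched below on the assumption that your pipeline accepts common formats such as JPEG or PNG, is to re-encode only the pixel data and discard everything else before the image reaches the model:
from PIL import Image

def strip_metadata(src_path, dst_path):
    # Re-encode only the pixel data, dropping EXIF, comments and other metadata
    with Image.open(src_path) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst_path)

# Example: sanitize the poisoned file before it is fed to the model
strip_metadata('cat_poisoned.jpg', 'cat_sanitized.jpg')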

