TL;DR
If you have a lot of known-good inputs for your system, use them to create a baseline test suite. It helps catch unexpected changes and regressions when you make updates.
1. Why Use Good Data for Testing?
Testing with bad data (deliberately trying to break things) is important, but it doesn’t tell you whether your system still works normally. Known-good data confirms that core functionality hasn’t been accidentally broken by your changes.
- Regression Testing: Ensures new code doesn’t break existing features.
- Baseline Performance: Establishes how the system should behave, and how fast it should run, when everything is working correctly.
- Confidence in Updates: Gives you more confidence that your changes are safe to deploy.
2. Gathering Your Benign Inputs
You need a collection of inputs that you know will work correctly with your system. Where do these come from?
- Existing Data: Use real-world data that has been successfully processed before (ensure it doesn’t contain sensitive information!).
- Sample Files: Create a set of representative sample files covering different valid scenarios.
- Automated Generation: If possible, write scripts to automatically generate good inputs based on your system’s specifications.
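For example, here is a minimal sketch of such a generator. It assumes a deliberately simplified “specification” (non-empty lowercase text) and a hypothetical tests/benign_inputs/ directory; substitute your system’s real input rules and layout.

```python
import random
import string
from pathlib import Path


def generate_benign_inputs(output_dir, count=10, seed=42):
    """Write simple, valid text inputs to output_dir.

    Hypothetical generator: "valid" here just means non-empty lowercase
    words, standing in for whatever your system's real spec requires.
    """
    rng = random.Random(seed)  # fixed seed keeps the suite reproducible
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i in range(count):
        words = [
            ''.join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 8)))
            for _ in range(rng.randint(5, 20))
        ]
        (output_dir / f'benign_{i:03d}.txt').write_text(' '.join(words))


if __name__ == '__main__':
    generate_benign_inputs('tests/benign_inputs')
```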
3. Creating Your Baseline Test Suite
Now you’ll turn those inputs into a repeatable test suite.
- Choose a Testing Framework: Select a framework appropriate for your language and system (e.g., pytest for Python, JUnit for Java).
- Write Test Cases: For each good input, write a test case that does the following (see the parametrized sketch after this list):
- Loads the input data.
- Runs it through your system.
- Verifies the expected output.
- Automate Execution: Configure your framework to run all tests automatically (e.g., as part of a build process).
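Putting those steps together, one common pattern is to keep each known-good input next to a file holding its expected output and let pytest parametrize over the pairs. The layout below (tests/benign_inputs/*.txt with matching *.expected files) and the myproject.process_data import are assumptions for illustration; process_data itself is the kind of function shown in the next section.

```python
from pathlib import Path

import pytest

from myproject import process_data  # hypothetical module under test

# Each known-good input sits next to a '<name>.expected' file holding
# the output we expect process_data to produce for it.
INPUT_DIR = Path(__file__).parent / 'benign_inputs'
INPUT_FILES = sorted(INPUT_DIR.glob('*.txt'))


@pytest.mark.parametrize('input_file', INPUT_FILES, ids=lambda p: p.name)
def test_benign_input(input_file):
    expected = input_file.with_suffix('.expected').read_text()
    actual = process_data(str(input_file))
    assert actual == expected
```

Because each input becomes its own parametrized test, a single regression shows up as one named failure rather than hiding inside a loop.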
4. Example Test Case (Python with pytest)
Let’s say you have a function process_data(input_file) that reads data from a file and returns a result.
```python
def process_data(input_file):
    # Your actual processing logic goes here.
    with open(input_file, 'r') as f:
        data = f.read()
    return data.upper()  # Example: convert to uppercase


def test_good_input(tmp_path):
    # Create a sample input file in pytest's temporary directory
    # so the test leaves nothing behind.
    input_file = tmp_path / 'test_input.txt'
    input_file.write_text('hello world')

    expected_output = 'HELLO WORLD'
    actual_output = process_data(str(input_file))

    assert actual_output == expected_output
```
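Save this in a file whose name starts with test_ (for example test_process_data.py, a name chosen here for illustration) and run pytest from the project root: by default pytest collects files matching test_*.py and functions whose names start with test, so the test is discovered and executed automatically.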
5. Running and Interpreting Results
Run your test suite regularly, ideally on every commit or at least before each release.
- All Tests Pass: Great! Your system is likely still working as expected.
- Tests Fail: Investigate immediately to find the cause of the failure. This could be a bug in new code, or an unexpected change in your environment.
6. Maintaining Your Test Suite
Good data tests aren’t ‘set it and forget it’.
- Add New Tests: As you add features, create new test cases to cover them with good inputs.
- Update Existing Tests: If the expected output changes legitimately (because of a valid feature update), update your tests, or regenerate the stored expected outputs as sketched below.
- Regular Review: Periodically review your tests to ensure they are still relevant and effective.
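One way to keep legitimate output changes from becoming a chore is to store expected outputs as files and regenerate them with a small, deliberately explicit script. This sketch assumes the same hypothetical tests/benign_inputs layout and process_data function used above; the point is that you review the diff of the regenerated files, not hand-edit every test.

```python
from pathlib import Path

from myproject import process_data  # hypothetical module under test


def regenerate_expected_outputs(input_dir='tests/benign_inputs'):
    """Re-run the system over every known-good input and rewrite the
    stored expected outputs.

    Run this only after a deliberate, reviewed behavior change, then
    review the resulting diff before committing.
    """
    for input_file in sorted(Path(input_dir).glob('*.txt')):
        expected_file = input_file.with_suffix('.expected')
        expected_file.write_text(process_data(str(input_file)))
        print(f'updated {expected_file}')


if __name__ == '__main__':
    regenerate_expected_outputs()
```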