# System Tests

## Entry point
From the repo root, the canonical way to run system tests is via the `just` recipe defined in the Justfile. You can also invoke the runner script (`run_system_tests.sh`) directly.
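As a sketch (the recipe name `system_tests` and the script path are assumptions; check the Justfile and `tools/system_tests/` for the exact names):

```shell
# Via the Justfile recipe (assumed name):
just system_tests

# Or by invoking the runner script directly:
./tools/system_tests/run_system_tests.sh
```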
The runner script is responsible for:
- Starting the system under test (e.g. API server) before any tests run.
- Discovering test functions and executing them sequentially.
- Setting an appropriate process exit code based on pass/fail.
- Stopping the system under test again at the end.
## Test harness behaviour
The harness (`run_system_tests.sh`) behaves roughly like this:

1. **Start the system under test.**
   It uses `just` commands (for example `just api_start` / `just api_stop`) to bring up and tear down the environment the tests rely on.
2. **Load helpers and test scripts.**
   It sources:
    - `test_helpers.sh` – common helpers and configuration.
    - All `*_test.sh` files in the same directory, where each file can define one or more test functions.
3. **Discover test functions.**
   It collects all shell functions with a name ending in `_test`, sorts them, and treats each as an individual test case. That means:
    - You don’t need to register a new test anywhere.
    - As long as the function name ends in `_test` and the file matches `*_test.sh`, it will be picked up.
4. **Execute tests.**
   Each test function is executed in a subshell. The harness:
    - Prints which test is running.
    - Considers the test passed if the function returns exit code `0`.
    - Considers the test failed if the function returns a non-zero exit code.
    - Continues to the next test even if one fails.
   At the end it exits with `0` if all tests passed, or `1` if any test failed.
5. **Stop the system under test.**
   After all tests, the harness calls the appropriate `just` command to stop the system (e.g. shutting down the API process).
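The discovery and execution steps above can be sketched as follows (this is not the actual harness; the function and variable names are illustrative):

```shell
#!/usr/bin/env bash
# Example test functions; in the real suite these live in *_test.sh
# files and are sourced by the runner along with test_helpers.sh.
always_passes_test() { echo "checking something"; }
always_fails_test()  { echo "checking something else"; return 1; }

run_all_tests() {
    local failed=0
    # Discover every shell function whose name ends in _test, sorted.
    local tests
    tests=$(declare -F | awk '{print $3}' | grep '_test$' | sort)

    local t
    for t in $tests; do
        echo "=== Running $t ==="
        # Run in a subshell so one test cannot affect the next.
        if ( "$t" ); then
            echo "PASS: $t"
        else
            echo "FAIL: $t"
            failed=1
        fi
    done
    return "$failed"
}

run_all_tests && rc=0 || rc=$?
echo "suite exit code: $rc"
```

Running each test in a subshell means a test can `exit`, `cd`, or change variables without affecting the rest of the suite.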
## Shared helpers

`test_helpers.sh` provides shared functionality for all system tests, for example:
- Simple formatting helpers (bold / coloured output).
- Common configuration variables (e.g. log directory, timeouts).
- Routines to wait for the system or API to be “ready” (polling a status endpoint until it responds, with a timeout).
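A readiness poll of that kind might look like the following sketch (the helper name `wait_for_ready` and its signature are assumptions; the real helper lives in `test_helpers.sh`):

```shell
#!/usr/bin/env bash
# Poll a command until it succeeds or a timeout expires.
# Usage: wait_for_ready <timeout_seconds> <command> [args...]
wait_for_ready() {
    local timeout=$1; shift
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            echo "timed out waiting for: $*" >&2
            return 1
        fi
        sleep 1
    done
}

# Example: wait up to 30s for a status endpoint to respond
# (hypothetical URL):
# wait_for_ready 30 curl --silent --fail http://localhost:8080/status
```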
When writing new tests:
- You don’t need to source `test_helpers.sh` manually; the runner already sources it for every test run.
- Prefer using the existing helpers rather than re-implementing things like “wait for the API to be ready”.
- If you need reusable logic for multiple system tests, extend `test_helpers.sh` rather than copying functions between `*_test.sh` files.
## Naming and structure conventions

### File naming

System test scripts follow this pattern:

`NN_descriptive_name_test.sh`

Guidelines:

- `NN_` is a two-digit prefix (e.g. `01_`, `02_`, …) that controls ordering. Use it if you care about approximate execution order (e.g. smoke tests first).
- `descriptive_name` should describe the theme of the script (“api_health”, “scenario_simulation”, “x11_integration”, etc.).
- The `_test.sh` suffix marks the file as a test script.
You can have more than one test function in a single file if it makes sense to group related checks.
### Function naming

A function is treated as a test if its name ends in `_test`, e.g. `api_health_test` or `scenario_completes_test`.
Guidelines:
- Use a unique, descriptive name — it will appear in the harness output.
- Prefer one “main” test function per file unless you have a clear grouping.
- Helper functions should not end in `_test` (so they aren’t auto-executed).
## Test function contract
Each test function:
- Takes no arguments.
- Performs whatever actions it needs (HTTP calls, CLI invocations, etc.).
- Prints human-readable output describing what it’s doing and what it found.
- Returns:
    - `0` on success.
    - A non-zero exit code on failure.
The harness uses this exit code to decide whether the test passed or failed.
## What a new system test typically looks like
Here is a recommended skeleton for a new test function:
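A minimal sketch of such a skeleton (the endpoint, variable names, and the stubbed response are illustrative assumptions, not the real API):

```shell
#!/usr/bin/env bash

# Hypothetical example: verify the status endpoint responds with valid JSON.
api_status_smoke_test() {
    echo "Checking that the status endpoint responds..."

    # Arrange: configuration, with an overridable default (assumed variable name).
    local base_url="${API_BASE_URL:-http://localhost:8080}"

    # Act: call the endpoint (stubbed here so the sketch is self-contained;
    # a real test would use something like: curl --silent --fail "$base_url/status").
    local response
    response='{"status":"ok"}'

    # Assert: fail the test with a clear message if the check does not hold.
    if [[ "$response" != *'"status":"ok"'* ]]; then
        echo "FAIL: unexpected status response: $response"
        return 1
    fi

    echo "OK: status endpoint healthy"
    return 0
}
```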
Feel free to adapt the formatting, as long as:
- The function’s exit code correctly represents pass/fail.
- Output is understandable by a human reading CI logs.
## Logging and diagnostics
System tests should be debuggable when they fail. Some general recommendations:
- Write richer machine-readable logs (JSON, structured text) into a shared log directory; keep console output concise.
- Include enough context in the log file names (e.g. the domain of the check).
- When a test fails, print the tail or a short summary of the relevant log(s) to stdout so CI logs contain a hint without having to dig into artefacts.
The helpers script defines where logs are stored by default and may allow overriding the log directory via an environment variable (e.g. for CI vs local runs). Check test_helpers.sh for the exact variable names and behaviour.
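Printing a short tail of the relevant log on failure might look like this sketch (the `LOG_DIR` variable name and the stubbed check are assumptions; check `test_helpers.sh` for the real conventions):

```shell
#!/usr/bin/env bash
# Assumed convention: helpers export LOG_DIR, overridable for CI vs local runs.
LOG_DIR="${LOG_DIR:-$(mktemp -d)}"

# Stub check so the sketch is self-contained; a real test would call the API.
some_check() { echo "detailed diagnostic output"; false; }

run_check_with_log() {
    local log="$LOG_DIR/api_health.log"
    # Write verbose diagnostics to the log file, keep console output concise.
    if ! some_check > "$log" 2>&1; then
        echo "FAIL: api_health (last log lines follow)"
        tail -n 5 "$log"
        return 1
    fi
    echo "OK: api_health"
}
```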
## What to test at the system level
Some ideas for system-level checks you might want to add:

- API behaviour
    - Smoke tests for key endpoints (status, scenario control, configuration).
    - Schema / field sanity checks (e.g. unique IDs, expected ranges).
    - Error handling for invalid input.
- Scenario execution
    - Start a scenario, let it run for some time, check that:
        - Expected outputs (reports, artefacts, logs) are generated.
        - No obvious error states are present in responses.
    - Round-trip across multiple API calls (start, query progress, stop, retrieve results).
- Integration with external tools
    - Running internal scanning / analysis tools and checking they succeed.
    - Verifying required OS-level functionality (e.g. presence of system utilities, access to required resources).
- GUI / X11 integration
    - Launch a minimal GUI app inside the dev/CI environment and verify it can talk to the host X server.
    - Confirm a window appears in the X11 tree.
- Non-functional behaviour
    - Very coarse performance sanity checks (e.g. “scenario finishes within N seconds on CI hardware”).
    - Resource usage thresholds (if those can be tested reliably).
When deciding whether something belongs in a system test versus a unit / integration test, ask:
“Does this require the whole stack to be running and realistic user workflows to be exercised?”
If the answer is yes, it probably belongs here.
## Design guidelines for new system tests
- **Idempotent and repeatable.** Tests should be safe to run multiple times in a row, and in any order. Avoid leaving behind state that breaks subsequent runs.
- **Isolated.** Avoid relying on side effects from previous tests; set up the state you need in each test.
- **Deterministic.** Avoid flakiness (e.g. random sleep durations with no checks, reliance on external services that may be down, unbounded timeouts).
- **Configurable.** Use / extend environment variables for things like:
    - API base URL
    - timeouts
    - log locations

  so that CI and local runs can tweak behaviour without changing code.
- **Fast enough for CI.** Individual system tests can be slower than unit tests, but should still complete in a reasonable time. If something needs several minutes, consider splitting it or enabling it only in certain pipelines.
- **Use shared helpers.** If you find yourself copying the same snippets (curl patterns, JSON extraction, polling loops), move them into `test_helpers.sh`.
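Such configuration typically uses the shell’s default-expansion pattern (all variable names here are assumptions, not necessarily the suite’s actual ones):

```shell
#!/usr/bin/env bash
# Defaults that CI or a local developer can override, e.g.:
#   API_BASE_URL=http://localhost:9090 ./run_system_tests.sh
API_BASE_URL="${API_BASE_URL:-http://localhost:8080}"
READY_TIMEOUT_SECS="${READY_TIMEOUT_SECS:-30}"
LOG_DIR="${LOG_DIR:-/tmp/system_test_logs}"

echo "base url: $API_BASE_URL"
echo "timeout:  $READY_TIMEOUT_SECS"
echo "logs:     $LOG_DIR"
```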
## Adding a new system test: checklist
- Create a new file under `tools/system_tests/` with the pattern `NN_descriptive_name_test.sh`.
- Add one or more functions in that file whose names end in `_test`.
- Use or extend shared helpers in `test_helpers.sh` instead of duplicating logic.
- Make the function self-contained:
    - Arrange, act, assert.
    - Print a clear message on success and on failure.
    - Return `0` for success, non-zero for failure.
- Run the suite locally (see Entry point above) and confirm your new test is discovered and passes.