Testing

Diffulex uses pytest for Python tests. The test suite includes lightweight unit tests, engine behavior tests, server tests, model tests, and GPU-heavy kernel or smoke tests.

pytest

Run the full Python test suite from the repository root:

pytest

Run a focused test file while developing:

pytest test/python/engine/test_generation_outputs.py

Prefer focused commands while iterating. Full-suite runs are useful before integration, but they can hide the signal from a single failing area.

Test Layout

Common test areas include:

  • test/python/engine/ for scheduler, request, config, cache, and generation behavior;

  • test/python/server/ for HTTP server and frontend/backend coordination;

  • test/python/model/ for model-family behavior;

  • test/python/layer/ for reusable layers;

  • test/python/kernel/ for kernel correctness and performance checks;

  • test/python/moe/ for MoE dispatcher and routing behavior.

Use the nearest existing test as the template for new coverage.

GPU Tests

Many engine and kernel tests require CUDA devices. Keep GPU tests small and make their resource assumptions explicit through markers, filenames, or test docs.

When diagnosing a GPU-only failure:

  1. reproduce with the smallest prompt or tensor shape;

  2. reduce tensor and data parallelism to 1;

  3. disable eager/CUDA graph toggles only after correctness is established;

  4. compare against a reference implementation when available.

Adding Regression Coverage

For bug fixes, add a test that fails on the original behavior when practical. If a full reproduction requires large models or private checkpoints, add the smallest unit-level check that guards the broken contract.

For docs-only changes, a Sphinx build is the relevant verification command:

./.venv/bin/python -m sphinx -E -b html docs docs/_build/html