Developer Troubleshooting

Use focused checks before running the full suite. Most development failures are caused by registry imports, invalid config combinations, CUDA availability, or shape/layout mismatches between model runner, attention metadata, and kernels.

Imports

Confirm lightweight imports first:

python -c "import diffulex, diffulex_kernel; print('ok')"

Then import the package that should register your component:

python -c "from diffulex import strategy; print(strategy.__all__)"

If a strategy is missing, check that its directory has __init__.py and that the package import does not raise.
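When an import does raise, the full traceback usually points at the nested module that failed. The following is a minimal sketch using only the standard library; `check_imports` and `report` are hypothetical helper names, not part of Diffulex.

```python
import importlib
import traceback


def check_imports(module_names):
    """Try to import each module; return {name: None or formatted traceback}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None  # import succeeded
        except Exception:
            # Keep the full traceback: the root cause is often a nested import.
            results[name] = traceback.format_exc()
    return results


def report(results):
    """Print one status line per module, followed by the traceback on failure."""
    for name, error in results.items():
        print(f"{name}: {'ok' if error is None else 'FAILED'}")
        if error is not None:
            print(error)
```

For example, `report(check_imports(["diffulex", "diffulex_kernel", "diffulex.strategy"]))` runs the same checks as the one-liners above, but continues past the first failure.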

Strategy Registration

When a strategy is not found, confirm that each registry decorator uses the same strategy key. Request, scheduler, KV cache manager, and model runner must all be registered under the decoding strategy name selected by Config.

Also check that the strategy package imports registered classes from its __init__.py. Registration happens as an import side effect.
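The effect of a mismatched key can be reproduced with a toy registry. This is an illustrative sketch only: `Registry`, `REQUESTS`, `SCHEDULERS`, and `missing_components` are hypothetical stand-ins, and the real Diffulex registries may be structured differently. The decorator runs when the class body is executed, which is why a module that is never imported never registers anything.

```python
# Hypothetical stand-in for a registry; Diffulex's real registries may differ.
class Registry:
    def __init__(self, kind):
        self.kind = kind
        self._entries = {}

    def register(self, key):
        """Return a class decorator that stores the class under `key`."""
        def decorator(cls):
            self._entries[key] = cls
            return cls
        return decorator

    def keys(self):
        return set(self._entries)


REQUESTS = Registry("request")
SCHEDULERS = Registry("scheduler")


@REQUESTS.register("my_strategy")
class MyRequest:
    pass


# Typo in the key: the scheduler is registered, but under a different name.
@SCHEDULERS.register("my-strategy")
class MyScheduler:
    pass


def missing_components(strategy_key, registries):
    """List the registry kinds that have no entry for `strategy_key`."""
    return [r.kind for r in registries if strategy_key not in r.keys()]


print(missing_components("my_strategy", [REQUESTS, SCHEDULERS]))  # ['scheduler']
```

Comparing the registered keys side by side like this tends to surface typos faster than reading each decorator in isolation.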

Model and Sampler Registration

When a model or sampler is not found, import the relevant module directly and inspect available keys:

from diffulex.model.auto_model import AutoModelForDiffusionLM
from diffulex.sampler.auto_sampler import AutoSampler

print(AutoModelForDiffusionLM.available_models())
print(AutoSampler.available_samplers())

When a custom sampler is required, the model_name in the config must match a registered key in both the model registry and the sampler registry.
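A small helper can make this check explicit. The sketch below is hypothetical (`check_model_name` is not a Diffulex function) and assumes that `available_models()` and `available_samplers()` return collections of string keys.

```python
def check_model_name(model_name, model_keys, sampler_keys, needs_custom_sampler=True):
    """Return human-readable problems; an empty list means the name resolves."""
    problems = []
    if model_name not in model_keys:
        problems.append(f"model {model_name!r} not in {sorted(model_keys)}")
    # Only require a sampler entry when the model needs a custom sampler.
    if needs_custom_sampler and model_name not in sampler_keys:
        problems.append(f"sampler {model_name!r} not in {sorted(sampler_keys)}")
    return problems
```

Under that assumption, it could be called as `check_model_name(name, AutoModelForDiffusionLM.available_models(), AutoSampler.available_samplers())`, where `name` is the value you set for model_name.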

GPU-Only Failures

Reduce the model size, request count, token budget, and batch token budget to separate logic errors from memory pressure.

Useful temporary changes:

| Setting | Temporary value |
| --- | --- |
| `tensor_parallel_size` | Set to `1`. |
| `data_parallel_size` | Set to `1`. |
| `max_num_reqs` | Lower it to reduce active request pressure. |
| `max_num_batched_tokens` | Lower it to reduce scheduler and memory pressure. |
| `enforce_eager` | Set to `True` while isolating correctness issues. |

Re-enable optimized paths only after the small case is correct.
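One way to keep these temporary changes easy to revert is to hold them in a single dictionary and layer them over the normal settings. This is a sketch under assumptions: it supposes the settings are passed as keyword arguments to whatever builds your Config, the helper names are hypothetical, and the numeric values for `max_num_reqs` and `max_num_batched_tokens` are placeholders rather than recommendations.

```python
# Temporary overrides taken from the table above. The numeric values for
# max_num_reqs and max_num_batched_tokens are placeholders, not recommendations.
DEBUG_OVERRIDES = {
    "tensor_parallel_size": 1,
    "data_parallel_size": 1,
    "max_num_reqs": 1,
    "max_num_batched_tokens": 256,
    "enforce_eager": True,
}


def with_debug_overrides(base_kwargs, overrides=DEBUG_OVERRIDES):
    """Return a copy of base_kwargs with the debug overrides applied on top."""
    merged = dict(base_kwargs)  # copy so the original settings stay untouched
    merged.update(overrides)
    return merged


def changed_settings(base_kwargs, merged_kwargs):
    """Map each changed key to (old, new) so the overrides are easy to revert."""
    return {
        key: (base_kwargs.get(key), value)
        for key, value in merged_kwargs.items()
        if base_kwargs.get(key) != value
    }
```

Printing `changed_settings(base, with_debug_overrides(base))` before a run records exactly which optimized paths were disabled, which helps when re-enabling them one at a time.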

Cache and Attention Issues

Cache bugs often appear as incorrect tokens, CUDA memory errors, or shape mismatches. Check these fields together:

| Field or component | Why it matters |
| --- | --- |
| `block_size` | Must match the decoding block layout expected by the strategy. |
| `page_size` | Must stay compatible with `block_size` and the KV cache manager. |
| `kv_cache_layout` | Must match attention metadata and kernel layout assumptions. |
| Strategy attention metadata | Defines the tensors passed into attention kernels. |
| Model runner tensor preparation | Produces the shapes and layouts consumed by kernels. |

Do not change kernel code until the metadata and layout assumptions are clear.
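To make those assumptions visible, it can help to dump the shape and dtype of every tensor on the metadata object just before the kernel call. The helper below is a hypothetical sketch, not a Diffulex API. It assumes the metadata object stores its fields as plain instance attributes, and it treats anything with a `.shape` attribute as a tensor so it needs no framework import.

```python
def describe_tensors(obj):
    """Map attribute name -> (shape, dtype) for every tensor-like attribute.

    "Tensor-like" here just means the value has a `.shape` attribute, so this
    works for torch tensors and NumPy arrays without importing either.
    """
    summary = {}
    for name, value in vars(obj).items():
        if hasattr(value, "shape"):
            # Fall back to "?" when the value exposes a shape but no dtype.
            summary[name] = (tuple(value.shape), str(getattr(value, "dtype", "?")))
    return summary
```

Comparing this output between a working configuration and a failing one often narrows the problem to a single field before any kernel code is touched.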