Developer Troubleshooting¶
Use focused checks before running the full suite. Most development failures are caused by registry imports, invalid config combinations, CUDA availability, or shape/layout mismatches between model runner, attention metadata, and kernels.
Imports¶
Confirm lightweight imports first:
python -c "import diffulex, diffulex_kernel; print('ok')"
Then import the package that should register your component:
python -c "from diffulex import strategy; print(strategy.__all__)"
If a strategy is missing, check that its directory has __init__.py and that
the package import does not raise.
Strategy Registration¶
When a strategy is not found, confirm that each registry decorator uses the same
strategy key. Request, scheduler, KV cache manager, and model runner must all be
registered under the decoding strategy name selected by Config.
Also check that the strategy package imports registered classes from its
__init__.py. Registration happens as an import side effect.
Model and Sampler Registration¶
When a model or sampler is not found, import the relevant module directly and inspect available keys:
from diffulex.model.auto_model import AutoModelForDiffusionLM
from diffulex.sampler.auto_sampler import AutoSampler
print(AutoModelForDiffusionLM.available_models())
print(AutoSampler.available_samplers())
The model_name in config must match both registrations when a custom sampler
is required.
GPU-Only Failures¶
Reduce the model size, request count, token budget, and batch token budget to separate logic errors from memory pressure.
Useful temporary changes:
Setting |
Temporary value |
|---|---|
|
Set to |
|
Set to |
|
Lower it to reduce active request pressure. |
|
Lower it to reduce scheduler and memory pressure. |
|
Set to |
Re-enable optimized paths only after the small case is correct.
Cache and Attention Issues¶
Cache bugs often appear as incorrect tokens, CUDA memory errors, or shape mismatches. Check these fields together:
Field or component |
Why it matters |
|---|---|
|
Must match the decoding block layout expected by the strategy. |
|
Must stay compatible with |
|
Must match attention metadata and kernel layout assumptions. |
Strategy attention metadata |
Defines the tensors passed into attention kernels. |
Model runner tensor preparation |
Produces the shapes and layouts consumed by kernels. |
Do not change kernel code until the metadata and layout assumptions are clear.