Developer Troubleshooting

Use focused checks before running the full suite. Most development failures are caused by registry imports, invalid config combinations, CUDA availability, or shape/layout mismatches between the model runner, attention metadata, and kernels.

Imports

Confirm lightweight imports first:

python -c "import diffulex, diffulex_kernel; print('ok')"

Then import the package that should register your component:

python -c "from diffulex import strategy; print(strategy.__all__)"

If a strategy is missing, check that its directory has an __init__.py and that importing the package does not raise an exception.
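When an import fails inside a one-liner, the useful part of the traceback is easy to lose. The following is a minimal sketch, using only the standard library, that imports each module in turn and prints the full traceback for any that fail. The helper name try_import is hypothetical, and the module list simply repeats the names used in the commands above.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name and return (ok, traceback_text)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Keep the whole traceback: the real cause is often a nested import.
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    # Cheapest imports first, then the package that performs registration.
    for name in ["diffulex", "diffulex_kernel", "diffulex.strategy"]:
        ok, detail = try_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            print(detail)
```

The first module that reports FAILED is usually the one to fix; later failures are often just consequences of it.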

Strategy Registration

When a strategy is not found, confirm that each registry decorator uses the same strategy key. Request, scheduler, KV cache manager, and model runner must all be registered under the decoding strategy name selected by Config.

Also check that the strategy package imports its registered classes from its __init__.py. Registration happens as an import side effect, so a class whose module is never imported is never registered.

Model and Sampler Registration

When a model or sampler is not found, import the relevant module directly and inspect available keys:

from diffulex.model.auto_model import AutoModelForDiffusionLM
from diffulex.sampler.auto_sampler import AutoSampler

print(AutoModelForDiffusionLM.available_models())
print(AutoSampler.available_samplers())

When a custom sampler is required, the model_name in the config must match the key used for both the model registration and the sampler registration.
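The comparison can be automated with a small helper. This sketch assumes that available_models() and available_samplers() each return an iterable of string keys, which the text above does not confirm. The helper check_model_name and the sample keys are hypothetical.

```python
def check_model_name(model_name, model_keys, sampler_keys):
    """Return a list of problems; an empty list means both registries know the name."""
    model_keys = set(model_keys)
    sampler_keys = set(sampler_keys)
    problems = []
    if model_name not in model_keys:
        problems.append(
            f"model {model_name!r} is not registered; available: {sorted(model_keys)}"
        )
    if model_name not in sampler_keys:
        problems.append(
            f"sampler {model_name!r} is not registered; available: {sorted(sampler_keys)}"
        )
    return problems


# Made-up keys for illustration. In a real session the two collections could
# come from AutoModelForDiffusionLM.available_models() and
# AutoSampler.available_samplers().
for problem in check_model_name("my_model", ["my_model", "other"], ["other"]):
    print(problem)
```

If only the sampler line is reported and no custom sampler is needed for that model, the mismatch may be harmless.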

GPU-Only Failures

Reduce the model size, request count, token budget, and batch token budget to separate logic errors from memory pressure.

Useful temporary changes:

| Setting | Temporary value |
| --- | --- |
| tensor_parallel_size | Set to 1. |
| data_parallel_size | Set to 1. |
| max_num_reqs | Lower it to reduce active request pressure. |
| max_num_batched_tokens | Lower it to reduce scheduler and memory pressure. |
| enforce_eager | Set to True while isolating correctness issues. |

Re-enable optimized paths only after the small case is correct.
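One way to keep these temporary values out of the real configuration is to derive a debug copy of the arguments. The sketch below assumes the settings are passed as keyword arguments with the names listed above. The helper debug_kwargs and the cap values 4 and 512 are arbitrary choices for illustration, not recommended values.

```python
# Settings that are pinned to a fixed value while debugging.
DEBUG_OVERRIDES = {
    "tensor_parallel_size": 1,
    "data_parallel_size": 1,
    "enforce_eager": True,
}

# Settings that are lowered rather than pinned; the caps are illustrative.
DEBUG_CAPS = {
    "max_num_reqs": 4,
    "max_num_batched_tokens": 512,
}


def debug_kwargs(base):
    """Return a copy of `base` with the temporary debugging values applied."""
    out = dict(base)  # never mutate the caller's configuration
    out.update(DEBUG_OVERRIDES)
    for name, cap in DEBUG_CAPS.items():
        # Only ever lower a value; keep it if it is already below the cap.
        out[name] = min(out.get(name, cap), cap)
    return out


original = {"tensor_parallel_size": 4, "max_num_reqs": 128, "max_num_batched_tokens": 256}
print(debug_kwargs(original))
```

Because the original dictionary is left untouched, switching back to the optimized configuration is a matter of passing the original arguments again.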

Cache and Attention Issues

Cache bugs often appear as incorrect tokens, CUDA memory errors, or shape mismatches. Check these fields together:

| Field or component | Why it matters |
| --- | --- |
| block_size | Must match the decoding block layout expected by the strategy. |
| page_size | Must stay compatible with block_size and the KV cache manager. |
| kv_cache_layout | Must match attention metadata and kernel layout assumptions. |
| Strategy attention metadata | Defines the tensors passed into attention kernels. |
| Model runner tensor preparation | Produces the shapes and layouts consumed by kernels. |

Do not change kernel code until the metadata and layout assumptions are clear.
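A quick way to make the metadata assumptions visible is to print the shape and dtype of every tensor a metadata object carries, just before the kernel call. The sketch below is duck-typed: it treats any public attribute with a .shape as a tensor, and it assumes the metadata object stores its fields in __dict__ (true for a plain class or a dataclass without slots). The helper describe_tensors and the field names in the example are hypothetical stand-ins, not Diffulex's actual metadata fields.

```python
from types import SimpleNamespace


def describe_tensors(obj):
    """Return {attribute: (shape, dtype)} for each public attribute that has a .shape."""
    summary = {}
    for name, value in vars(obj).items():
        if name.startswith("_") or not hasattr(value, "shape"):
            continue
        summary[name] = (tuple(value.shape), str(getattr(value, "dtype", "?")))
    return summary


# Stand-in objects so the example runs without a GPU or a tensor library.
fake_metadata = SimpleNamespace(
    block_tables=SimpleNamespace(shape=(2, 8), dtype="int32"),  # hypothetical field
    slot_mapping=SimpleNamespace(shape=(64,), dtype="int64"),   # hypothetical field
    block_size=32,                                              # not a tensor, skipped
)

for name, (shape, dtype) in describe_tensors(fake_metadata).items():
    print(f"{name}: shape={shape} dtype={dtype}")
```

Comparing this output between a passing small case and a failing case often shows which dimension disagrees with block_size, page_size, or kv_cache_layout.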