# Developer Troubleshooting

Use focused checks before running the full suite. Most development failures are
caused by registry imports, invalid config combinations, CUDA availability, or
shape/layout mismatches between model runner, attention metadata, and kernels.

## Imports

Confirm lightweight imports first:

```bash
python -c "import diffulex, diffulex_kernel; print('ok')"
```

Then import the package that should register your component:

```bash
python -c "from diffulex import strategy; print(strategy.__all__)"
```

If a strategy is missing, check that its directory has `__init__.py` and that
the package import does not raise.

## Strategy Registration

When a strategy is not found, confirm that each registry decorator uses the same
strategy key. Request, scheduler, KV cache manager, and model runner must all be
registered under the decoding strategy name selected by `Config`.

Also check that the strategy package imports registered classes from its
`__init__.py`. Registration happens as an import side effect.

## Model and Sampler Registration

When a model or sampler is not found, import the relevant module directly and
inspect available keys:

```python
from diffulex.model.auto_model import AutoModelForDiffusionLM
from diffulex.sampler.auto_sampler import AutoSampler

print(AutoModelForDiffusionLM.available_models())
print(AutoSampler.available_samplers())
```

The `model_name` in config must match both registrations when a custom sampler
is required.

## GPU-Only Failures

Reduce the model size, request count, token budget, and batch token budget to
separate logic errors from memory pressure.

Useful temporary changes:

| Setting | Temporary value |
| --- | --- |
| `tensor_parallel_size` | Set to `1`. |
| `data_parallel_size` | Set to `1`. |
| `max_num_reqs` | Lower it to reduce active request pressure. |
| `max_num_batched_tokens` | Lower it to reduce scheduler and memory pressure. |
| `enforce_eager` | Set to `True` while isolating correctness issues. |

Re-enable optimized paths only after the small case is correct.

## Cache and Attention Issues

Cache bugs often appear as incorrect tokens, CUDA memory errors, or shape
mismatches. Check these fields together:

| Field or component | Why it matters |
| --- | --- |
| `block_size` | Must match the decoding block layout expected by the strategy. |
| `page_size` | Must stay compatible with `block_size` and the KV cache manager. |
| `kv_cache_layout` | Must match attention metadata and kernel layout assumptions. |
| Strategy attention metadata | Defines the tensors passed into attention kernels. |
| Model runner tensor preparation | Produces the shapes and layouts consumed by kernels. |

Do not change kernel code until the metadata and layout assumptions are clear.