# diffulex.strategy.templates

`diffulex.strategy.templates` contains reusable strategy building blocks. A
concrete strategy package usually subclasses or directly reuses one of these
templates, then registers request, scheduler, KV cache manager, model runner,
and attention metadata classes under a strategy name.

The templates are organized by decoding layout:

| Template package | Use it when | Core classes |
| --- | --- | --- |
| `diffulex.strategy.templates.dual_cache` | A strategy needs the multi-block lifecycle but wants a separate cache-layout variant. | Dual-cache extension points. |
| `diffulex.mixin.multi_block` | A strategy decodes with a rolling buffer of fixed-size diffusion blocks. | Multi-block request, scheduler, KV cache, runner, attention metadata, and full-static runner templates. |
| `diffulex.strategy.templates.token_merge` | A strategy is multi-block and also passes token-merge distributions into attention. | Token-merge request, scheduler, KV cache, runner, and attention metadata templates. |

For extension work, start from the smallest template that already matches the
decoding lifecycle. Most new block-diffusion strategies should begin with the
multi-block template; only use token-merge templates when attention truly needs
per-token merge descriptors.

## dual_cache

The dual-cache template package is currently a set of named extension points for
future strategies that need separate cache views or cache lifecycles. The
planned Dual Cache mechanism is tracked separately from the standard multi-block
runtime.

## multi_block

The multi-block template represents each request as a chain of `DllmBlock`
objects, keeps a rolling block buffer, schedules prefill/decode work against the
KV cache, and prepares attention metadata for paged attention.

The request template owns most lifecycle semantics: waiting, prefilling,
decoding, completed, preempted, EOS, `max_tokens`, `max_model_len`, `max_nfe`,
and `max_repetition_run`.

The scheduler template remains narrow: it chooses which requests run this step,
asks the cache manager whether pages are available, and applies sampler writes.

## token_merge

Token-merge templates extend multi-block decoding with per-position
token-merge metadata. The DMax strategy uses this family to pass top-k token
IDs, top-k probabilities, residual probabilities, merge mode, weight, and
renormalization settings into attention metadata.