diffulex.strategy.templates
diffulex.strategy.templates contains reusable strategy building blocks. A
concrete strategy package usually subclasses or directly reuses one of these
templates, then registers request, scheduler, KV cache manager, model runner,
and attention metadata classes under a strategy name.
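The registration step can be pictured as a name-to-classes lookup. The sketch below is illustrative only: `StrategyBundle`, `register_strategy`, `get_strategy`, and the five placeholder base classes are hypothetical names invented for this example, not the actual diffulex API.

```python
from dataclasses import dataclass
from typing import Dict, Type


# Hypothetical stand-ins for template base classes; the real diffulex
# class names and import paths may differ.
class RequestTemplate: ...
class SchedulerTemplate: ...
class KVCacheManagerTemplate: ...
class ModelRunnerTemplate: ...
class AttnMetadataTemplate: ...


@dataclass(frozen=True)
class StrategyBundle:
    """The five component classes that one strategy name resolves to."""
    request: Type[RequestTemplate]
    scheduler: Type[SchedulerTemplate]
    kv_cache_manager: Type[KVCacheManagerTemplate]
    model_runner: Type[ModelRunnerTemplate]
    attn_metadata: Type[AttnMetadataTemplate]


# Illustrative registry: strategy name -> bundle of component classes.
_STRATEGIES: Dict[str, StrategyBundle] = {}


def register_strategy(name: str, bundle: StrategyBundle) -> None:
    """Register a bundle, refusing to overwrite an existing name."""
    if name in _STRATEGIES:
        raise ValueError(f"strategy {name!r} is already registered")
    _STRATEGIES[name] = bundle


def get_strategy(name: str) -> StrategyBundle:
    return _STRATEGIES[name]


# A concrete strategy subclasses only the piece it needs to change...
class MyRequest(RequestTemplate):
    pass


# ...and reuses the remaining templates directly.
register_strategy(
    "my_strategy",
    StrategyBundle(
        request=MyRequest,
        scheduler=SchedulerTemplate,
        kv_cache_manager=KVCacheManagerTemplate,
        model_runner=ModelRunnerTemplate,
        attn_metadata=AttnMetadataTemplate,
    ),
)
```

Bundling the five classes under one name keeps a strategy swap to a single lookup, which is the property the paragraph above describes.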
The templates are organized by decoding layout:
| Template package | Use it when | Core classes |
|---|---|---|
| dual_cache | A strategy needs the multi-block lifecycle but wants a separate cache-layout variant. | Dual-cache extension points. |
| multi_block | A strategy decodes with a rolling buffer of fixed-size diffusion blocks. | Multi-block request, scheduler, KV cache, runner, attention metadata, and full-static runner templates. |
| token_merge | A strategy is multi-block and also passes token-merge distributions into attention. | Token-merge request, scheduler, KV cache, runner, and attention metadata templates. |
For extension work, start from the smallest template that already matches the decoding lifecycle. Most new block-diffusion strategies should begin with the multi-block template; only use token-merge templates when attention truly needs per-token merge descriptors.
dual_cache
The dual-cache template package is currently a set of named extension points for future strategies that need separate cache views or cache lifecycles. The planned Dual Cache mechanism is tracked separately from the standard multi-block runtime.
multi_block
The multi-block template represents each request as a chain of DllmBlock
objects, keeps a rolling block buffer, schedules prefill/decode work against the
KV cache, and prepares attention metadata for paged attention.
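As a rough illustration of the block chain and rolling buffer, the sketch below models each block as a fixed-size span linked to its predecessor. `Block` and `BlockChain` are hypothetical stand-ins written for this example; the real DllmBlock carries more state, and the retire-and-refill policy shown is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for DllmBlock: a fixed-size span of token slots."""
    start: int
    size: int
    prev: Optional["Block"] = None
    done: bool = False


@dataclass
class BlockChain:
    """All blocks of one request plus the rolling buffer of active blocks."""
    block_size: int
    buffer_size: int
    blocks: List[Block] = field(default_factory=list)
    active: Deque[Block] = field(default_factory=deque)

    def append_block(self) -> Block:
        # Link the new block to the current tail of the chain.
        prev = self.blocks[-1] if self.blocks else None
        start = prev.start + prev.size if prev else 0
        block = Block(start=start, size=self.block_size, prev=prev)
        self.blocks.append(block)
        self.active.append(block)
        return block

    def roll(self) -> None:
        """Retire finished blocks from the front, then refill the buffer."""
        while self.active and self.active[0].done:
            self.active.popleft()
        while len(self.active) < self.buffer_size:
            self.append_block()


chain = BlockChain(block_size=4, buffer_size=2)
chain.roll()                 # buffer holds the blocks starting at 0 and 4
chain.active[0].done = True  # pretend the first block is fully decoded
chain.roll()                 # buffer now holds the blocks starting at 4 and 8
```

The chain keeps every block for the lifetime of the request, while the buffer only ever exposes `buffer_size` blocks to the decoder at once.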
The request template owns most lifecycle semantics: the request states
(waiting, prefilling, decoding, completed, preempted) and the stop conditions
(EOS, max_tokens, max_model_len, max_nfe, and max_repetition_run).
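One way to picture how the request-level limits interact is a single check that returns the first limit hit. This is a sketch under stated assumptions: `Status`, `Limits`, `stop_reason`, and `next_status` are hypothetical names, and the meaning given to each limit (for example, max_nfe as a cap on forward evaluations and max_repetition_run as a cap on a trailing run of identical tokens) is inferred from the names rather than confirmed by the source.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Status(Enum):
    WAITING = auto()
    PREFILLING = auto()
    DECODING = auto()
    COMPLETED = auto()
    PREEMPTED = auto()


@dataclass
class Limits:
    eos_token_id: int
    max_tokens: int
    max_model_len: int
    max_nfe: int             # assumed: cap on forward evaluations
    max_repetition_run: int  # assumed: cap on a trailing run of one token


def tail_run(tokens: List[int]) -> int:
    """Length of the run of identical tokens at the end of the list."""
    run = 0
    for tok in reversed(tokens):
        if tok != tokens[-1]:
            break
        run += 1
    return run


def stop_reason(prompt_len: int, generated: List[int], nfe: int,
                limits: Limits) -> Optional[str]:
    """Return the name of the first limit hit, or None to keep decoding."""
    if limits.eos_token_id in generated:
        return "eos"
    if len(generated) >= limits.max_tokens:
        return "max_tokens"
    if prompt_len + len(generated) >= limits.max_model_len:
        return "max_model_len"
    if nfe >= limits.max_nfe:
        return "max_nfe"
    if tail_run(generated) >= limits.max_repetition_run:
        return "max_repetition_run"
    return None


def next_status(reason: Optional[str]) -> Status:
    """A decoding request completes as soon as any stop condition fires."""
    return Status.COMPLETED if reason else Status.DECODING


limits = Limits(eos_token_id=2, max_tokens=8, max_model_len=16,
                max_nfe=4, max_repetition_run=3)
reason = stop_reason(prompt_len=4, generated=[7, 7, 7], nfe=1, limits=limits)
status = next_status(reason)
```

Keeping these checks on the request, rather than in the scheduler, matches the division of responsibility described above.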
The scheduler template remains narrow: it chooses which requests run this step, asks the cache manager whether pages are available, and applies sampler writes.
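The three scheduler duties can be sketched as two small functions over a page pool. Everything named here (`Req`, `PagePool`, `schedule`, `apply_sampler_writes`) is hypothetical, and the FIFO admission policy is an assumption chosen for brevity, not the policy diffulex necessarily uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Req:
    req_id: int
    pages_needed: int
    tokens: List[int]


class PagePool:
    """Hypothetical cache manager that only tracks a free-page count."""

    def __init__(self, free_pages: int) -> None:
        self.free_pages = free_pages

    def can_allocate(self, n: int) -> bool:
        return n <= self.free_pages

    def allocate(self, n: int) -> None:
        if not self.can_allocate(n):
            raise RuntimeError("not enough free pages")
        self.free_pages -= n


def schedule(waiting: List[Req], pool: PagePool) -> List[Req]:
    """Admit requests in FIFO order until one does not fit in the cache."""
    batch: List[Req] = []
    for req in waiting:
        if not pool.can_allocate(req.pages_needed):
            break
        pool.allocate(req.pages_needed)
        batch.append(req)
    return batch


def apply_sampler_writes(batch: List[Req],
                         sampled: Dict[int, List[int]]) -> None:
    """Write the sampler's tokens back into each scheduled request."""
    for req in batch:
        req.tokens.extend(sampled.get(req.req_id, []))


pool = PagePool(free_pages=3)
waiting = [Req(0, 2, []), Req(1, 2, []), Req(2, 1, [])]
batch = schedule(waiting, pool)             # only request 0 fits before the stop
apply_sampler_writes(batch, {0: [11, 12]})  # request 0 receives two tokens
```

Note that the scheduler never inspects block state or stop conditions here; it only selects, checks page availability, and writes results back.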
token_merge
Token-merge templates extend multi-block decoding with per-position token-merge metadata. The DMax strategy uses this family to pass top-k token IDs, top-k probabilities, residual probabilities, merge mode, weight, and renormalization settings into attention metadata.
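To make the listed fields concrete, the sketch below builds one per-position descriptor from a probability vector. `TokenMergeDescriptor` and `make_descriptor` are hypothetical names, the field layout is a guess based on the list above, and treating the residual as the probability mass outside the top-k is an assumption; the actual DMax metadata may be laid out as batched tensors instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenMergeDescriptor:
    """Hypothetical per-position merge metadata handed to attention."""
    topk_ids: List[int]
    topk_probs: List[float]
    residual_prob: float      # assumed: probability mass outside the top-k
    mode: str = "weighted"    # placeholder value; carried through, unused here
    weight: float = 1.0
    renormalize: bool = True

    def merge_weights(self) -> List[float]:
        """Per-candidate weights, optionally renormalized to sum to `weight`."""
        probs = list(self.topk_probs)
        if self.renormalize:
            total = sum(probs)
            if total > 0:
                probs = [p / total for p in probs]
        return [self.weight * p for p in probs]


def make_descriptor(probs: List[float], k: int,
                    **settings) -> TokenMergeDescriptor:
    """Keep the k most probable token IDs and record the leftover mass."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    topk_probs = [probs[i] for i in ranked]
    return TokenMergeDescriptor(
        topk_ids=ranked,
        topk_probs=topk_probs,
        residual_prob=1.0 - sum(topk_probs),
        **settings,
    )


desc = make_descriptor([0.1, 0.6, 0.3], k=2)
weights = desc.merge_weights()  # top-2 probabilities rescaled to sum to 1
```

With renormalization off, the weights would instead be the raw top-k probabilities scaled by `weight`, leaving the residual mass unassigned.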