diffulex.strategy.templates

diffulex.strategy.templates contains reusable strategy building blocks. A concrete strategy package usually subclasses or directly reuses one of these templates, then registers request, scheduler, KV cache manager, model runner, and attention metadata classes under a strategy name.
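The registration step can be pictured as a name-to-bundle registry. The sketch below is hypothetical. The placeholder template classes, `StrategySpec`, `STRATEGY_REGISTRY`, and `register_strategy` are illustrative names, not the diffulex API. The sketch only shows a concrete strategy subclassing one template and reusing the others under a single strategy name.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical placeholders for template classes; the real names in
# diffulex.strategy.templates may differ.
class MultiBlockRequestTemplate: ...
class MultiBlockSchedulerTemplate: ...
class MultiBlockKVCacheManagerTemplate: ...
class MultiBlockModelRunnerTemplate: ...
class MultiBlockAttnMetadataTemplate: ...

@dataclass(frozen=True)
class StrategySpec:
    """The five component classes a strategy bundles under one name (hypothetical)."""
    request: type
    scheduler: type
    kv_cache_manager: type
    model_runner: type
    attn_metadata: type

# Hypothetical registry mapping a strategy name to its component bundle.
STRATEGY_REGISTRY: Dict[str, StrategySpec] = {}

def register_strategy(name: str, spec: StrategySpec) -> None:
    """Register a bundle; refuse to silently overwrite an existing strategy."""
    if name in STRATEGY_REGISTRY:
        raise ValueError(f"strategy {name!r} is already registered")
    STRATEGY_REGISTRY[name] = spec

# A concrete strategy subclasses the request template and reuses the rest directly.
class MyRequest(MultiBlockRequestTemplate):
    pass

register_strategy(
    "my_block_diffusion",
    StrategySpec(
        request=MyRequest,
        scheduler=MultiBlockSchedulerTemplate,
        kv_cache_manager=MultiBlockKVCacheManagerTemplate,
        model_runner=MultiBlockModelRunnerTemplate,
        attn_metadata=MultiBlockAttnMetadataTemplate,
    ),
)
```

Rejecting duplicate names is a design choice of this sketch. It prevents a second strategy package from silently replacing an earlier one.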

The templates are organized by decoding layout:

diffulex.strategy.templates.dual_cache
  Use it when: a strategy needs the multi-block lifecycle but wants a separate cache-layout variant.
  Core classes: dual-cache extension points.

diffulex.mixin.multi_block
  Use it when: a strategy decodes with a rolling buffer of fixed-size diffusion blocks.
  Core classes: multi-block request, scheduler, KV cache, runner, attention metadata, and full-static runner templates.

diffulex.strategy.templates.token_merge
  Use it when: a strategy is multi-block and also passes token-merge distributions into attention.
  Core classes: token-merge request, scheduler, KV cache, runner, and attention metadata templates.

For extension work, start from the smallest template that already matches the decoding lifecycle. Most new block-diffusion strategies should begin with the multi-block template; only use token-merge templates when attention truly needs per-token merge descriptors.

dual_cache

The dual-cache template package is currently a set of named extension points for future strategies that need separate cache views or cache lifecycles. The planned Dual Cache mechanism is tracked separately from the standard multi-block runtime.

multi_block

The multi-block template represents each request as a chain of DllmBlock objects, keeps a rolling block buffer, schedules prefill/decode work against the KV cache, and prepares attention metadata for paged attention.
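One way to picture a chain of blocks held in a rolling buffer is the minimal sketch below. It makes several assumptions. `DllmBlock` here is a stand-in, not the real class. `MASK = -1` marks undecoded positions. `window` is a hypothetical cap on active blocks. The sketch also assumes that completed blocks retire only from the head of the buffer, which keeps the committed output in left-to-right order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

MASK = -1  # hypothetical id marking a not-yet-decoded position

@dataclass
class DllmBlock:
    """Hypothetical stand-in for DllmBlock: one fixed-size diffusion block."""
    block_id: int
    size: int
    prev: Optional["DllmBlock"] = None  # previous block in the request's chain
    tokens: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.tokens:
            self.tokens = [MASK] * self.size  # every position starts masked

    @property
    def is_complete(self) -> bool:
        return MASK not in self.tokens

class RollingBlockBuffer:
    """Keeps at most `window` active blocks; finished head blocks are retired in order."""

    def __init__(self, block_size: int, window: int) -> None:
        self.block_size = block_size
        self.window = window
        self.active: Deque[DllmBlock] = deque()
        self.retired: List[DllmBlock] = []
        self._next_id = 0
        self._fill()

    def _fill(self) -> None:
        # Append fresh masked blocks, each linked to the block before it.
        while len(self.active) < self.window:
            if self.active:
                prev = self.active[-1]
            else:
                prev = self.retired[-1] if self.retired else None
            self.active.append(DllmBlock(self._next_id, self.block_size, prev=prev))
            self._next_id += 1

    def write(self, block_id: int, pos: int, token: int) -> None:
        """Unmask one position, then retire completed blocks from the head and refill."""
        block = next((b for b in self.active if b.block_id == block_id), None)
        if block is None:
            raise KeyError(block_id)
        block.tokens[pos] = token
        # Retire only from the head so committed output stays in left-to-right order.
        while self.active and self.active[0].is_complete:
            self.retired.append(self.active.popleft())
        self._fill()
```

In this sketch, a later block can finish before an earlier one. It stays in the buffer until every block ahead of it has also completed.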

The request template owns most of the lifecycle semantics: the request states (waiting, prefilling, decoding, completed, preempted) and the termination checks (EOS, max_tokens, max_model_len, max_nfe, and max_repetition_run).
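The split between states and termination checks can be sketched as a small state machine. This is a hypothetical illustration, not the template's code. The allowed transitions, the order in which the checks run, and the field meanings are all assumptions:

- `max_nfe` is read as a cap on model forward passes.
- `max_model_len` is read as a cap on prompt plus generated tokens.
- `max_repetition_run` is read as a cap on the trailing run of identical tokens.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional, Set

class RequestStatus(Enum):
    WAITING = auto()
    PREFILLING = auto()
    DECODING = auto()
    PREEMPTED = auto()
    COMPLETED = auto()

# Hypothetical transition table; the real template may allow other moves.
ALLOWED: Dict[RequestStatus, Set[RequestStatus]] = {
    RequestStatus.WAITING: {RequestStatus.PREFILLING},
    RequestStatus.PREFILLING: {RequestStatus.DECODING, RequestStatus.PREEMPTED},
    RequestStatus.DECODING: {RequestStatus.COMPLETED, RequestStatus.PREEMPTED},
    RequestStatus.PREEMPTED: {RequestStatus.WAITING},
    RequestStatus.COMPLETED: set(),
}

@dataclass
class RequestSketch:
    prompt_len: int
    eos_id: int
    max_tokens: int
    max_model_len: int
    max_nfe: int
    max_repetition_run: int
    status: RequestStatus = RequestStatus.WAITING
    output: List[int] = field(default_factory=list)
    nfe: int = 0  # forward passes consumed so far
    finish_reason: Optional[str] = None

    def transition(self, new: RequestStatus) -> None:
        if new not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.name} -> {new.name}")
        self.status = new

    def trailing_run(self) -> int:
        """Length of the run of identical tokens at the end of the output."""
        run = 0
        for tok in reversed(self.output):
            if tok != self.output[-1]:
                break
            run += 1
        return run

    def stop_reason(self) -> Optional[str]:
        """Return the first termination check that fires (order is an assumption)."""
        if self.eos_id in self.output:
            return "eos"
        if len(self.output) >= self.max_tokens:
            return "max_tokens"
        if self.prompt_len + len(self.output) >= self.max_model_len:
            return "max_model_len"
        if self.nfe >= self.max_nfe:
            return "max_nfe"
        if self.trailing_run() >= self.max_repetition_run:
            return "max_repetition_run"
        return None

    def step(self, new_tokens: List[int]) -> None:
        """Record one decoding step and complete the request if any check fires."""
        self.nfe += 1
        self.output.extend(new_tokens)
        reason = self.stop_reason()
        if reason is not None:
            self.finish_reason = reason
            self.transition(RequestStatus.COMPLETED)
```

Because the stop checks live on the request, a scheduler only needs to read `status` to know whether a request is finished.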

The scheduler template remains narrow: it chooses which requests run this step, asks the cache manager whether pages are available, and applies sampler writes.
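A narrow scheduler of this kind can be sketched as follows. This is hypothetical throughout. `PagePool` stands in for the KV cache manager and gives each page `page_size` token slots. `Req.num_tokens` is the number of tokens that must be resident after the step. "Sampler writes" is modeled as appending sampled token ids to each scheduled request. Requests that do not fit are skipped (first-fit), which is an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PagePool:
    """Hypothetical KV cache manager: a fixed pool of pages of `page_size` tokens."""
    num_pages: int
    page_size: int
    allocated: Dict[int, int] = field(default_factory=dict)  # req_id -> pages held

    def free_pages(self) -> int:
        return self.num_pages - sum(self.allocated.values())

    def pages_needed(self, req_id: int, total_tokens: int) -> int:
        required = -(-total_tokens // self.page_size)  # ceiling division
        return max(0, required - self.allocated.get(req_id, 0))

    def can_allocate(self, req_id: int, total_tokens: int) -> bool:
        return self.pages_needed(req_id, total_tokens) <= self.free_pages()

    def allocate(self, req_id: int, total_tokens: int) -> None:
        extra = self.pages_needed(req_id, total_tokens)
        self.allocated[req_id] = self.allocated.get(req_id, 0) + extra

@dataclass
class Req:
    req_id: int
    num_tokens: int  # tokens that must be resident in the KV cache this step
    tokens: List[int] = field(default_factory=list)

class NarrowScheduler:
    """Picks requests, asks the cache for pages, and applies sampled tokens; nothing else."""

    def __init__(self, cache: PagePool, max_batch: int) -> None:
        self.cache = cache
        self.max_batch = max_batch

    def schedule(self, queue: List[Req]) -> List[Req]:
        batch: List[Req] = []
        for req in queue:
            if len(batch) == self.max_batch:
                break
            # Lifecycle decisions stay on the request; the scheduler only checks capacity.
            if self.cache.can_allocate(req.req_id, req.num_tokens):
                self.cache.allocate(req.req_id, req.num_tokens)
                batch.append(req)
        return batch

    def apply_sampler_writes(self, batch: List[Req], writes: Dict[int, List[int]]) -> None:
        for req in batch:
            req.tokens.extend(writes.get(req.req_id, []))
```

Keeping the scheduler this thin means a new strategy can usually change request behavior without touching scheduling.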

token_merge

Token-merge templates extend multi-block decoding with per-position token-merge metadata. The DMax strategy uses this family to pass top-k token IDs, top-k probabilities, residual probabilities, merge mode, weight, and renormalization settings into attention metadata.
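To make the listed fields concrete, the sketch below builds one per-position descriptor from a probability vector. It is hypothetical. `TokenMergeEntry`, `build_entry`, and the `"weighted"` default mode are illustrative names, not DMax's. The residual is taken as the probability mass outside the top-k, computed before any renormalization.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TokenMergeEntry:
    """Hypothetical per-position token-merge descriptor (field names are illustrative)."""
    topk_ids: Tuple[int, ...]
    topk_probs: Tuple[float, ...]
    residual_prob: float  # mass outside the top-k, measured before renormalization
    merge_mode: str
    weight: float
    renormalize: bool

def build_entry(probs: List[float], k: int, merge_mode: str = "weighted",
                weight: float = 1.0, renormalize: bool = True) -> TokenMergeEntry:
    """Select the k most probable tokens at one position and package them for attention."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    top = [probs[i] for i in order]
    residual = max(0.0, 1.0 - sum(top))
    if renormalize:
        total = sum(top)
        top = [p / total for p in top]  # top-k probabilities now sum to 1
    return TokenMergeEntry(tuple(order), tuple(top), residual, merge_mode, weight, renormalize)

entry = build_entry([0.1, 0.5, 0.3, 0.1], k=2)
```

With `renormalize=False`, the top-k probabilities keep their original values and still sum with `residual_prob` to roughly 1. Which convention attention expects is up to the strategy.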