Decoding Strategies
Diffulex selects strategy-specific request, scheduler, KV cache manager, model
runner, and attention metadata components through registries. The strategy is
chosen by decoding_strategy.
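As a rough illustration of registry-based selection, the sketch below keys one component type (a scheduler) by strategy name. The names `SCHEDULER_REGISTRY`, `register_scheduler`, `build_scheduler`, and the two scheduler classes are hypothetical and only show the general pattern. They are not taken from Diffulex's code.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping a decoding_strategy name to a scheduler class.
SCHEDULER_REGISTRY: Dict[str, Type] = {}


def register_scheduler(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a scheduler under a strategy name."""
    def decorator(cls: Type) -> Type:
        if name in SCHEDULER_REGISTRY:
            raise ValueError(f"scheduler already registered for {name!r}")
        SCHEDULER_REGISTRY[name] = cls
        return cls
    return decorator


@register_scheduler("d2f")
class D2FScheduler:  # illustrative placeholder class
    pass


@register_scheduler("multi_bd")
class MultiBDScheduler:  # illustrative placeholder class
    pass


def build_scheduler(decoding_strategy: str):
    """Look up and instantiate the scheduler for the given strategy name."""
    try:
        cls = SCHEDULER_REGISTRY[decoding_strategy]
    except KeyError:
        known = ", ".join(sorted(SCHEDULER_REGISTRY))
        raise ValueError(
            f"unknown decoding_strategy {decoding_strategy!r}; known: {known}"
        ) from None
    return cls()
```

The same pattern would presumably repeat for each component type (request, KV cache manager, model runner, attention metadata), so that one `decoding_strategy` value resolves a consistent set of components.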
Decoding Strategy
Set decoding_strategy to one of d2f, multi_bd, fast_dllm_v2, dmax, or
diffusion_gemma.
Benchmark config input also normalizes older aliases multi_block_diffusion,
block_diffusion, and fast_dllm to multi_bd.
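The alias handling described above can be sketched as a small lookup. The function name `normalize_decoding_strategy` and the two constants are assumptions made for illustration. Only the strategy names and alias mappings come from the text.

```python
# Canonical strategy names listed in this section.
VALID_STRATEGIES = ("d2f", "multi_bd", "fast_dllm_v2", "dmax", "diffusion_gemma")

# Older aliases, all of which normalize to multi_bd.
# Note that fast_dllm maps to multi_bd, not to fast_dllm_v2.
LEGACY_ALIASES = {
    "multi_block_diffusion": "multi_bd",
    "block_diffusion": "multi_bd",
    "fast_dllm": "multi_bd",
}


def normalize_decoding_strategy(name: str) -> str:
    """Map a legacy alias to its canonical name and reject unknown values."""
    canonical = LEGACY_ALIASES.get(name, name)
    if canonical not in VALID_STRATEGIES:
        raise ValueError(f"unsupported decoding_strategy: {name!r}")
    return canonical
```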
The choice changes more than the sampler name:
| Strategy | Behavior |
|---|---|
| d2f | Forces full-prefix multi-block behavior and disables prefix caching. |
| multi_bd | Implements Multi-Block Diffusion (MultiBD): a bounded active block set with block-causal visibility, with prefix caching enabled when compatible. |
| fast_dllm_v2 | Implements Fast-dLLM-v2 dual-cache decoding: a three-mode FSM (full-buffer init, sub-block refine, final commit) with per-mode CUDA graphs. |
| dmax | Enables DMax-style token merging on supported edit-sampling models. |
| diffusion_gemma | Uses the native DiffusionGemma canvas/block runtime. |
Decoding Thresholds
Thresholds tune when a strategy adds, releases, accepts, edits, or remasks tokens and blocks.
| Key | How to set it | What it does |
|---|---|---|
|  | Start from the default | Controls when another decoding block can be added. |
|  | Start from the default | Controls when semi-complete block state can advance. |
|  | Use a confidence value from | Accepts mask-to-token updates once confidence is high enough. |
|  | Use a confidence value from | Accepts token-to-token edits in edit-style decoding. |
|  | Use a confidence value from | Remasks filled tokens that fall below the confidence threshold. |
|  | Use a stability ratio from | Controls DMax-style edit-block progress. |
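To make the accept and remask rows concrete, here is a minimal sketch of how two confidence thresholds might gate a single decoding step. Everything in it is an assumption for illustration: the function `apply_thresholds`, the parameter names `accept_threshold` and `remask_threshold`, and the use of `None` as a mask marker are not Diffulex configuration keys or APIs.

```python
from typing import List, Optional

MASK = None  # hypothetical marker for a still-masked position


def apply_thresholds(
    tokens: List[Optional[int]],
    proposals: List[int],
    confidences: List[float],
    accept_threshold: float,
    remask_threshold: float,
) -> List[Optional[int]]:
    """Return the token list after one accept/remask pass.

    Masked positions take the proposed token only when its confidence
    reaches accept_threshold. Filled positions are reset to MASK when
    their confidence falls below remask_threshold.
    """
    updated = list(tokens)
    for i, (tok, prop, conf) in enumerate(zip(tokens, proposals, confidences)):
        if tok is MASK:
            if conf >= accept_threshold:
                updated[i] = prop  # mask-to-token update accepted
        elif conf < remask_threshold:
            updated[i] = MASK  # filled token remasked
    return updated
```

In this sketch, raising `accept_threshold` commits fewer tokens per step, and raising `remask_threshold` reopens more already-filled positions.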
Keep thresholds in YAML when comparing experiments. Use CLI overrides for short ad hoc runs.
Sampling Mode
Set sampling_mode to naive for the standard sampler path or edit for
edit-style decoding.
sampling_mode="edit" is restricted to edit-sampling model names:
Compatible |
|---|
|
|
|
|
|
decoding_strategy="dmax" requires sampling_mode="edit" and one of the
compatible model names.
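The constraints above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `validate_sampling_config` is a hypothetical helper, and the set of compatible model names is passed in by the caller because the names are not reproduced here.

```python
from typing import Collection


def validate_sampling_config(
    decoding_strategy: str,
    sampling_mode: str,
    model_name: str,
    edit_models: Collection[str],
) -> None:
    """Raise ValueError if the combination violates the rules in this section."""
    if sampling_mode not in ("naive", "edit"):
        raise ValueError(f"sampling_mode must be 'naive' or 'edit', got {sampling_mode!r}")
    # Edit sampling is restricted to edit-sampling model names.
    if sampling_mode == "edit" and model_name not in edit_models:
        raise ValueError(f"model {model_name!r} does not support sampling_mode='edit'")
    # dmax requires edit sampling; the model check above then covers the model name.
    if decoding_strategy == "dmax" and sampling_mode != "edit":
        raise ValueError("decoding_strategy='dmax' requires sampling_mode='edit'")
```

For example, with a placeholder set such as `{"example-edit-model"}` (a made-up name), `validate_sampling_config("dmax", "naive", "example-edit-model", {"example-edit-model"})` raises `ValueError`, while the same call with `sampling_mode="edit"` passes.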