Decoding Strategies

Diffulex selects strategy-specific request, scheduler, KV cache manager, model runner, and attention metadata components through registries. The `decoding_strategy` setting determines which set of components is used.
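
The registry lookup described above can be pictured with a minimal sketch. The `Registry` class, the `SCHEDULERS` instance, and the two scheduler classes are hypothetical names invented for illustration; they are not Diffulex's actual classes, and the real registries may be organized differently.

```python
from typing import Callable, Dict


class Registry:
    """Hypothetical registry mapping a decoding_strategy name to a component class."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(cls: type) -> type:
            if name in self._entries:
                raise ValueError(f"{self.kind} already registered for {name!r}")
            self._entries[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> type:
        """Look up the class registered for `name`, or raise KeyError."""
        try:
            return self._entries[name]
        except KeyError:
            raise KeyError(f"no {self.kind} registered for {name!r}") from None


# One registry per component kind; only a scheduler registry is shown here.
SCHEDULERS = Registry("scheduler")


@SCHEDULERS.register("d2f")
class D2FScheduler:  # placeholder class, assumption for illustration
    pass


@SCHEDULERS.register("multi_bd")
class MultiBDScheduler:  # placeholder class, assumption for illustration
    pass


# The strategy name picks the component class.
scheduler_cls = SCHEDULERS.get("multi_bd")
```

In this sketch, one strategy name resolves a matching class from each registry, which is why a single setting can swap the request, scheduler, cache manager, runner, and attention metadata together.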

Decoding Strategy

Set `decoding_strategy` to one of `d2f`, `multi_bd`, `fast_dllm_v2`, `dmax`, or `diffusion_gemma`.

Benchmark config input also normalizes the older aliases `multi_block_diffusion`, `block_diffusion`, and `fast_dllm` to `multi_bd`.
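
The alias handling can be sketched as a small lookup. The function name `normalize_decoding_strategy` and the two constants are hypothetical; only the strategy names and the alias-to-`multi_bd` mapping come from this page, and rejecting unknown names is an assumption of the sketch.

```python
# Canonical strategy names listed on this page.
SUPPORTED_STRATEGIES = ("d2f", "multi_bd", "fast_dllm_v2", "dmax", "diffusion_gemma")

# Older aliases that benchmark config input maps to multi_bd.
LEGACY_ALIASES = {
    "multi_block_diffusion": "multi_bd",
    "block_diffusion": "multi_bd",
    "fast_dllm": "multi_bd",
}


def normalize_decoding_strategy(name: str) -> str:
    """Map a legacy alias to its canonical name and reject unknown names."""
    canonical = LEGACY_ALIASES.get(name, name)
    if canonical not in SUPPORTED_STRATEGIES:
        raise ValueError(
            f"unknown decoding_strategy {name!r}; expected one of {SUPPORTED_STRATEGIES}"
        )
    return canonical
```

Note that `fast_dllm` is an alias for `multi_bd`, while `fast_dllm_v2` is a separate strategy and passes through unchanged.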

The choice changes more than the sampler name:

| Strategy | Behavior |
| --- | --- |
| `d2f` | Forces full-prefix multi-block behavior and disables prefix caching. |
| `multi_bd` | Implements Multi-Block Diffusion (MultiBD): a bounded active block set with block-causal visibility, with prefix caching enabled when compatible. |
| `fast_dllm_v2` | Implements Fast-dLLM-v2 dual-cache decoding: a three-mode finite-state machine (full-buffer init, sub-block refine, final commit) with a CUDA graph per mode. |
| `dmax` | Enables DMax-style token merging on supported edit-sampling models. |
| `diffusion_gemma` | Uses the native DiffusionGemma canvas/block runtime. |

Decoding Thresholds

Thresholds tune when a strategy adds, releases, accepts, edits, or remasks tokens and blocks.

| Key | How to set it | What it does |
| --- | --- | --- |
| `add_block_threshold` | Start from the default `0.1`; tune as a float for block-add behavior. | Controls when another decoding block can be added. |
| `semi_complete_threshold` | Start from the default `0.9`; tune as a float for block advancement. | Controls when semi-complete block state can advance. |
| `accept_threshold` | Use a confidence value from `0` to `1`. The default is `0.9`. | Accepts mask-to-token updates once confidence is high enough. |
| `edit_threshold` | Use a confidence value from `0` to `1`. The default is `0.0`. | Accepts token-to-token edits in edit-style decoding. |
| `remask_threshold` | Use a confidence value from `0` to `1`. The default is `0.4`. | Remasks filled tokens that fall below the confidence threshold. |
| `token_stability_threshold` | Use a stability ratio from `0` to `1`. The default is `0.0`. | Controls DMax-style edit-block progress. |
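
As a rough illustration, the defaults and ranges in the table could be captured in a small container. The `DecodingThresholds` class and the two helper functions are hypothetical, not Diffulex's API. The sketch assumes that "high enough" means `confidence >= accept_threshold`, and it range-checks only the four keys the table describes as `0` to `1` values.

```python
from dataclasses import dataclass

# Keys the table describes as values from 0 to 1.
_UNIT_RANGE_KEYS = (
    "accept_threshold",
    "edit_threshold",
    "remask_threshold",
    "token_stability_threshold",
)


@dataclass(frozen=True)
class DecodingThresholds:
    """Hypothetical container holding the documented default values."""

    add_block_threshold: float = 0.1
    semi_complete_threshold: float = 0.9
    accept_threshold: float = 0.9
    edit_threshold: float = 0.0
    remask_threshold: float = 0.4
    token_stability_threshold: float = 0.0

    def __post_init__(self) -> None:
        # Reject out-of-range values for the keys documented as 0-to-1 values.
        for name in _UNIT_RANGE_KEYS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def should_accept(confidence: float, thresholds: DecodingThresholds) -> bool:
    """Accept a mask-to-token update (assumes an inclusive comparison)."""
    return confidence >= thresholds.accept_threshold


def should_remask(confidence: float, thresholds: DecodingThresholds) -> bool:
    """Remask a filled token whose confidence falls below the threshold."""
    return confidence < thresholds.remask_threshold
```

With the defaults, a token at confidence `0.95` would be accepted, one at `0.3` would be remasked, and one at `0.6` would be neither accepted nor remasked under these assumptions.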

Keep thresholds in YAML when comparing experiments. Use CLI overrides for short ad hoc runs.

Sampling Mode

Set `sampling_mode` to `naive` for the standard sampler path or `edit` for edit-style decoding.

`sampling_mode="edit"` is restricted to the following edit-sampling model names:

Compatible `model_name`
`llada2`
`llada2_moe`
`llada2_mini`
`llada2dot1_mini`
`llada2_mini_dmax`

`decoding_strategy="dmax"` requires `sampling_mode="edit"` and one of the compatible model names listed above.
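
These compatibility rules can be written as a short check. The function `validate_sampling` and its constants are hypothetical names for illustration; the rules themselves are the ones stated in this section, and how Diffulex actually reports violations is not covered here.

```python
# Model names this page lists as compatible with edit sampling.
EDIT_SAMPLING_MODELS = frozenset({
    "llada2",
    "llada2_moe",
    "llada2_mini",
    "llada2dot1_mini",
    "llada2_mini_dmax",
})

SAMPLING_MODES = ("naive", "edit")


def validate_sampling(decoding_strategy: str, sampling_mode: str, model_name: str) -> None:
    """Raise ValueError if the combination violates the documented rules."""
    if sampling_mode not in SAMPLING_MODES:
        raise ValueError(f"sampling_mode must be one of {SAMPLING_MODES}, got {sampling_mode!r}")
    # Edit sampling is limited to the edit-sampling model names.
    if sampling_mode == "edit" and model_name not in EDIT_SAMPLING_MODELS:
        raise ValueError(f"sampling_mode='edit' is not supported for model_name {model_name!r}")
    # dmax needs edit sampling; the model check above then covers the model name.
    if decoding_strategy == "dmax" and sampling_mode != "edit":
        raise ValueError("decoding_strategy='dmax' requires sampling_mode='edit'")
```

Running a check like this before building the engine would surface an incompatible combination early, rather than partway through a run.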

Related Arguments

| Surface | Names | Notes |
| --- | --- | --- |
| Python/config | `decoding_strategy`, `sampling_mode`, `decoding_thresholds` | Use these in `Config`, YAML, or Python construction. |
| CLI | `--decoding-strategy`, `--sampling-mode`, `--add-block-threshold`, `--semi-complete-threshold`, `--accept-threshold`, `--edit-threshold`, `--remask-threshold`, `--token-stability-threshold` | Use CLI overrides for short experiment runs. |
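
To show how the CLI flag names line up with the config keys, here is a minimal `argparse` sketch. It is not Diffulex's actual command-line parser: `build_parser` and `threshold_overrides` are hypothetical helpers, and collecting the overrides into a dictionary keyed like `decoding_thresholds` is an assumption about how the two surfaces relate.

```python
import argparse
from typing import Dict

# Threshold flags listed in the table above.
THRESHOLD_FLAGS = (
    "--add-block-threshold",
    "--semi-complete-threshold",
    "--accept-threshold",
    "--edit-threshold",
    "--remask-threshold",
    "--token-stability-threshold",
)


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser that accepts the documented flag names."""
    parser = argparse.ArgumentParser(description="Illustrative decoding options")
    parser.add_argument("--decoding-strategy", type=str, default=None)
    parser.add_argument("--sampling-mode", choices=["naive", "edit"], default=None)
    for flag in THRESHOLD_FLAGS:
        # default=None lets us tell "not given" apart from an explicit value.
        parser.add_argument(flag, type=float, default=None)
    return parser


def threshold_overrides(args: argparse.Namespace) -> Dict[str, float]:
    """Collect only the thresholds that were set on the command line."""
    overrides = {}
    for flag in THRESHOLD_FLAGS:
        key = flag.lstrip("-").replace("-", "_")  # e.g. accept_threshold
        value = getattr(args, key)
        if value is not None:
            overrides[key] = value
    return overrides


args = build_parser().parse_args(
    ["--decoding-strategy", "multi_bd", "--accept-threshold", "0.95"]
)
overrides = threshold_overrides(args)
```

In this sketch, flags left unset stay out of the override dictionary, so values from a YAML file would remain in effect for them.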