Decoding Strategies

Diffulex selects strategy-specific request, scheduler, KV cache manager, model runner, and attention metadata components through registries. The strategy is chosen by decoding_strategy.

Decoding Strategy

Set decoding_strategy to one of d2f, multi_bd, fast_dllm_v2, dmax, or diffusion_gemma.

Benchmark config input also normalizes the older aliases multi_block_diffusion, block_diffusion, and fast_dllm to multi_bd.
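
The alias handling described here can be pictured with a minimal sketch. The names normalize_strategy, CANONICAL_STRATEGIES, and LEGACY_ALIASES are hypothetical and chosen for illustration; they are not taken from the Diffulex code base, and the real normalization may differ in detail.

```python
# Hypothetical sketch: names are illustrative, not Diffulex's API.
CANONICAL_STRATEGIES = {"d2f", "multi_bd", "fast_dllm_v2", "dmax", "diffusion_gemma"}

# Older aliases that benchmark config input maps to multi_bd.
LEGACY_ALIASES = {
    "multi_block_diffusion": "multi_bd",
    "block_diffusion": "multi_bd",
    "fast_dllm": "multi_bd",
}


def normalize_strategy(name: str) -> str:
    """Return the canonical strategy name, resolving legacy aliases."""
    resolved = LEGACY_ALIASES.get(name, name)
    if resolved not in CANONICAL_STRATEGIES:
        raise ValueError(f"unknown decoding_strategy: {name!r}")
    return resolved
```

Under these assumptions, normalize_strategy("block_diffusion") resolves to multi_bd, while fast_dllm_v2 passes through unchanged because it is already a canonical name.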

The choice changes more than the sampler name:

| Strategy | Behavior |
| --- | --- |
| d2f | Forces full-prefix multi-block behavior and disables prefix caching. |
| multi_bd | Implements Multi-Block Diffusion (MultiBD): a bounded active block set with block-causal visibility and prefix caching enabled when compatible. |
| fast_dllm_v2 | Implements Fast-dLLM-v2 dual-cache decoding: 3-mode FSM (full-buffer init, sub-block refine, final commit) with per-mode CUDA graphs. |
| dmax | Enables DMax-style token merging on supported edit-sampling models. |
| diffusion_gemma | Uses the native DiffusionGemma canvas/block runtime. |

Decoding Thresholds

Thresholds tune when a strategy adds, releases, accepts, edits, or remasks tokens and blocks.

| Key | How to set it | What it does |
| --- | --- | --- |
| add_block_threshold | Start from the default 0.1; tune as a float for block-add behavior. | Controls when another decoding block can be added. |
| semi_complete_threshold | Start from the default 0.9; tune as a float for block advancement. | Controls when semi-complete block state can advance. |
| accept_threshold | Use a confidence value from 0 to 1. The default is 0.9. | Accepts mask-to-token updates once confidence is high enough. |
| edit_threshold | Use a confidence value from 0 to 1. The default is 0.0. | Accepts token-to-token edits in edit-style decoding. |
| remask_threshold | Use a confidence value from 0 to 1. The default is 0.4. | Remasks filled tokens that fall below the confidence threshold. |
| token_stability_threshold | Use a stability ratio from 0 to 1. The default is 0.0. | Controls DMax-style edit-block progress. |

Keep thresholds in YAML when comparing experiments. Use CLI overrides for short ad hoc runs.
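
As a rough illustration of the defaults and ranges listed for these keys, the sketch below collects them in a dataclass and layers ad hoc overrides on top of a base configuration. DecodingThresholds, UNIT_RANGE_KEYS, and apply_overrides are hypothetical names introduced for this example, not Diffulex's own types. The range check covers only the four keys documented as values from 0 to 1.

```python
from dataclasses import dataclass, replace

# Keys documented above as confidence or stability values from 0 to 1.
UNIT_RANGE_KEYS = (
    "accept_threshold",
    "edit_threshold",
    "remask_threshold",
    "token_stability_threshold",
)


@dataclass(frozen=True)
class DecodingThresholds:
    """Hypothetical container; defaults mirror the documented ones."""

    add_block_threshold: float = 0.1
    semi_complete_threshold: float = 0.9
    accept_threshold: float = 0.9
    edit_threshold: float = 0.0
    remask_threshold: float = 0.4
    token_stability_threshold: float = 0.0

    def __post_init__(self) -> None:
        # Reject out-of-range values for the keys with a documented 0..1 range.
        for key in UNIT_RANGE_KEYS:
            value = getattr(self, key)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{key} must be in [0, 1], got {value}")


def apply_overrides(base: DecodingThresholds, overrides: dict) -> DecodingThresholds:
    """Return a copy of base with overrides applied (e.g. from a CLI flag)."""
    # replace() re-runs __post_init__, so overrides are validated too,
    # and it raises TypeError for keys that are not threshold fields.
    return replace(base, **overrides)
```

For example, apply_overrides(DecodingThresholds(), {"remask_threshold": 0.3}) leaves the base values untouched and returns a new object, which keeps the values loaded from YAML available for comparison across experiments.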

Sampling Mode

Set sampling_mode to naive for the standard sampler path or edit for edit-style decoding.

sampling_mode="edit" is restricted to edit-sampling model names:

Compatible model_name

llada2

llada2_moe

llada2_mini

llada2dot1_mini

llada2_mini_dmax

decoding_strategy="dmax" requires sampling_mode="edit" and one of the compatible model names.
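
The compatibility rules in this section can be written down as a small validation sketch. The function check_sampling_config and the constant EDIT_SAMPLING_MODELS are hypothetical names used for illustration; how Diffulex actually enforces these rules is not shown here.

```python
# Model names listed above as compatible with sampling_mode="edit".
EDIT_SAMPLING_MODELS = {
    "llada2",
    "llada2_moe",
    "llada2_mini",
    "llada2dot1_mini",
    "llada2_mini_dmax",
}


def check_sampling_config(decoding_strategy: str, sampling_mode: str, model_name: str) -> None:
    """Raise ValueError if the combination violates the documented rules."""
    if sampling_mode not in ("naive", "edit"):
        raise ValueError(f"unknown sampling_mode: {sampling_mode!r}")
    # dmax requires edit sampling.
    if decoding_strategy == "dmax" and sampling_mode != "edit":
        raise ValueError('decoding_strategy="dmax" requires sampling_mode="edit"')
    # Edit sampling is restricted to the edit-sampling model names, which
    # also covers the model requirement for dmax.
    if sampling_mode == "edit" and model_name not in EDIT_SAMPLING_MODELS:
        raise ValueError(f'sampling_mode="edit" is not supported for model_name={model_name!r}')
```

With this sketch, check_sampling_config("dmax", "edit", "llada2_mini_dmax") passes silently, while check_sampling_config("dmax", "naive", "llada2") raises ValueError.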