Models¶

Diffulex model support is defined by three choices:

model_name: selects the registered model implementation and sampler factory;
decoding_strategy: selects request state, scheduler, KV cache manager, and model runner;
sampling_mode: selects standard or edit-style sampler behavior.

The config validator normalizes some model/strategy combinations and rejects known invalid combinations.

Supported Model Families¶

Model family	`model_name` values	Typical strategy	Notes
Dream / D2F-Dream	`dream`	`d2f`	D2F-style full-prefix block decoding.
DiffuCoder / D2F-DiffuCoder	`diffucoder`	`d2f`	Uses shifted sampler behavior.
Dream reasoner	`dream_reasoner`	`multi_bd`	Block-causal MultiBD path.
Stable-DiffCoder	`stable_diffcoder`	`multi_bd`	Block-causal MultiBD path.
LLaDA / D2F-LLaDA	`llada`	`d2f`	Use D2F LoRA-style configs when applicable.
Fast-dLLM-v2	`fast_dllm_v2`	`multi_bd` or `fast_dllm_v2`	Dual-Cache sub-block refinement; `fast_dllm_v2` strategy enables the native FDv2 decoding path.
SDAR	`sdar`	`multi_bd`	Dense SDAR path.
SDAR-MoE	`sdar_moe`	`multi_bd`	MoE path; keep expert parallel at `1` unless extending the runtime.
LLaDA2 family	`llada2`, `llada2_mini`, `llada2_moe`, `llada2dot1_mini`, `llada2_mini_dmax`	`multi_bd`, `dmax`, or `fast_dllm_v2`	`llada2_mini_dmax` enables DMax sampler defaults.
DiffusionGemma	`diffusion_gemma`	`diffusion_gemma`	Native 256-token canvas/block decoder.

The original full-attention inference implementations from some upstream dLLM projects are not the target runtime. Diffulex adds support model by model through block-wise adapters, samplers, and strategy registrations.

Strategy Compatibility¶

Strategy	Use it for	Important behavior
`d2f`	D2F-style LLaDA, Dream, and DiffuCoder paths	Forces full-prefix multi-block behavior and disables prefix caching.
`multi_bd`	LLaDA2, SDAR, stable DiffuCoder/Dream reasoner paths	Implements Multi-Block Diffusion with block-causal visibility and prefix caching.
`fast_dllm_v2`	Fast-dLLM-v2 native dual-cache decoding	3-mode sub-block refinement FSM with dedicated CUDA graphs per mode.
`dmax`	Supported LLaDA2 edit-sampling experiments	Token merge + edit sampling; requires `sampling_mode="edit"`.
`diffusion_gemma`	DiffusionGemma	Uses DiffusionGemma request, sampler, block/page size, and canvas defaults.

Sampling Modes¶

Sampling mode	Use it for
`naive`	Standard confidence-based diffusion sampling. This is the default for most supported models.
`edit`	LLaDA2-family edit/remask sampling. Required by DMax-style decoding.

Model Path Requirements¶

Model and tokenizer paths should point to local directories. During startup, Diffulex loads tokenizer metadata, Hugging Face config, and then the registered Diffulex model implementation.

Before a full benchmark, verify:

Requirement	Check
Checkpoint path	The model directory exists and contains the expected config and weights.
Tokenizer	The tokenizer loads from `tokenizer_path` or the model path.
`model_name`	The model name is listed in the table above and registered under `diffulex/model/`.
Strategy	The strategy is compatible with the model family.
Mask token	`mask_token_id` matches the tokenizer for the selected model.
Page/block size	`block_size <= page_size`; DiffusionGemma uses `256/256`.

Maintained Benchmark Configs¶

Common starting points live under diffulex_bench/configs/:

Config	Purpose
`llada2_mini_gsm8k.yml`	LLaDA2-mini, `multi_bd`, GSM8K.
`llada2_mini_dmax_gsm8k.yml`	LLaDA2-mini DMax/edit sampling.
`diffusion_gemma_gsm8k.yml`	DiffusionGemma native Diffulex benchmark.
`diffucoder_instruct_gsm8k.yml`	DiffuCoder D2F-style benchmark.
`dream_base_gsm8k.yml`	Dream D2F-style benchmark.
`fast_dllm_v2_gsm8k.yml`	Fast-dLLM-v2 multi-block benchmark.
`sdar_chat_gsm8k.yml`	SDAR dense benchmark.
`sdar_moe_chat_gsm8k.yml`	SDAR-MoE benchmark.
`fast_dllm_v2_multibd_gsm8k.yml`	Fast-dLLM-v2 training-free MultiBD variant (blksz=4, bufsz=6).
`llada_instruct_gsm8k.yml`	LLaDA D2F-style benchmark.
`llada2_mini_gsm8k_tp1_limit200_maxnfe2048.yml`	Parity/evaluation variant with TP=1 and 200-sample limit.

Start with --dataset-limit before running a full dataset.