Models
Diffulex model support is defined by three choices:
- model_name: selects the registered model implementation and sampler factory;
- decoding_strategy: selects request state, scheduler, KV cache manager, and model runner;
- sampling_mode: selects standard or edit-style sampler behavior.
The config validator normalizes some model/strategy combinations and rejects known invalid combinations.
Supported Model Families
| Model family |  | Typical strategy | Notes |
|---|---|---|---|
| Dream / D2F-Dream |  |  | D2F-style full-prefix block decoding. |
| DiffuCoder / D2F-DiffuCoder |  |  | Uses shifted sampler behavior. |
| Dream reasoner |  |  | Block-causal MultiBD path. |
| Stable-DiffCoder |  |  | Block-causal MultiBD path. |
| LLaDA / D2F-LLaDA |  |  | Use D2F LoRA-style configs when applicable. |
| Fast-dLLM-v2 |  |  | Dual-Cache sub-block refinement; |
| SDAR |  |  | Dense SDAR path. |
| SDAR-MoE |  |  | MoE path; keep expert parallel at |
| LLaDA2 family |  |  |  |
| DiffusionGemma |  |  | Native 256-token canvas/block decoder. |
The original full-attention inference implementations from some upstream dLLM projects are not the target runtime. Diffulex adds support model by model through block-wise adapters, samplers, and strategy registrations.
Strategy Compatibility
| Strategy | Use it for | Important behavior |
|---|---|---|
|  | D2F-style LLaDA, Dream, and DiffuCoder paths | Forces full-prefix multi-block behavior and disables prefix caching. |
|  | LLaDA2, SDAR, stable DiffuCoder/Dream reasoner paths | Implements Multi-Block Diffusion with block-causal visibility and prefix caching. |
|  | Fast-dLLM-v2 native dual-cache decoding | 3-mode sub-block refinement FSM with dedicated CUDA graphs per mode. |
|  | Supported LLaDA2 edit-sampling experiments | Token merge + edit sampling; requires |
|  | DiffusionGemma | Uses DiffusionGemma request, sampler, block/page size, and canvas defaults. |
Sampling Modes
| Sampling mode | Use it for |
|---|---|
|  | Standard confidence-based diffusion sampling. This is the default for most supported models. |
|  | LLaDA2-family edit/remask sampling. Required by DMax-style decoding. |
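The normalize-or-reject behavior described on this page can be illustrated with a minimal sketch. It is not the Diffulex validator: the class `EngineChoices`, the function `validate`, the flag `enable_prefix_caching`, and every string identifier (`"d2f"`, `"dmax"`, `"standard"`, `"edit"`) are hypothetical names chosen for illustration. Only the two rules themselves come from the tables above: the D2F-style strategy disables prefix caching, and DMax-style decoding requires edit sampling.

```python
from dataclasses import dataclass, replace

# Hypothetical identifiers; the real registry keys may differ.
D2F = "d2f"
DMAX = "dmax"
STANDARD = "standard"
EDIT = "edit"


@dataclass(frozen=True)
class EngineChoices:
    """The three choices from this page, plus one illustrative cache flag."""
    model_name: str
    decoding_strategy: str
    sampling_mode: str = STANDARD
    enable_prefix_caching: bool = True  # hypothetical flag name


def validate(cfg: EngineChoices) -> EngineChoices:
    """Return a normalized copy of cfg, or raise ValueError if it is invalid."""
    # Normalize: the D2F-style strategy runs without prefix caching.
    if cfg.decoding_strategy == D2F and cfg.enable_prefix_caching:
        cfg = replace(cfg, enable_prefix_caching=False)
    # Reject: DMax-style decoding only makes sense with edit sampling.
    if cfg.decoding_strategy == DMAX and cfg.sampling_mode != EDIT:
        raise ValueError("DMax-style decoding requires edit sampling")
    return cfg
```

Returning a new frozen instance rather than mutating the input keeps the user-supplied configuration available for error messages and logging.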
Model Path Requirements
Model and tokenizer paths should point to local directories. During startup, Diffulex loads tokenizer metadata, Hugging Face config, and then the registered Diffulex model implementation.
Before a full benchmark, verify:
| Requirement | Check |
|---|---|
| Checkpoint path | The model directory exists and contains the expected config and weights. |
| Tokenizer | The tokenizer loads from |
|  | The model name is listed in the table above and registered under |
| Strategy | The strategy is compatible with the model family. |
| Mask token |  |
| Page/block size |  |
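The checkpoint-path check can be scripted before launching a long run. The sketch below is not part of Diffulex; `check_model_dir` is a hypothetical helper, and the file names it looks for (`config.json`, `*.safetensors`, `*.bin`) are assumptions based on the usual Hugging Face checkpoint layout.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"model directory does not exist: {root}"]

    problems = []
    # Assumption: the Hugging Face config is stored as config.json.
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Assumption: weights are stored as *.safetensors or *.bin files.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no weight files (*.safetensors or *.bin) found")
    return problems
```

Calling `check_model_dir("/path/to/model")` and printing the result gives a quick pass/fail signal; it does not confirm that the weights match the registered model implementation.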
Maintained Benchmark Configs
Common starting points live under diffulex_bench/configs/:
| Config | Purpose |
|---|---|
|  | LLaDA2-mini, |
|  | LLaDA2-mini DMax/edit sampling. |
|  | DiffusionGemma native Diffulex benchmark. |
|  | DiffuCoder D2F-style benchmark. |
|  | Dream D2F-style benchmark. |
|  | Fast-dLLM-v2 multi-block benchmark. |
|  | SDAR dense benchmark. |
|  | SDAR-MoE benchmark. |
|  | Fast-dLLM-v2 training-free MultiBD variant (blksz=4, bufsz=6). |
|  | LLaDA D2F-style benchmark. |
|  | Parity/evaluation variant with TP=1 and 200-sample limit. |
Start with --dataset-limit before running a full dataset.