# Models

Diffulex model support is defined by three choices:

- `model_name`: selects the registered model implementation and sampler factory;
- `decoding_strategy`: selects request state, scheduler, KV cache manager, and model runner;
- `sampling_mode`: selects standard or edit-style sampler behavior.

The config validator normalizes some model/strategy combinations and rejects known invalid combinations.

## Supported Model Families

| Model family | `model_name` values | Typical strategy | Notes |
| --- | --- | --- | --- |
| Dream / D2F-Dream | `dream` | `d2f` | D2F-style full-prefix block decoding. |
| DiffuCoder / D2F-DiffuCoder | `diffucoder` | `d2f` | Uses shifted sampler behavior. |
| Dream reasoner | `dream_reasoner` | `multi_bd` | Block-causal MultiBD path. |
| Stable-DiffCoder | `stable_diffcoder` | `multi_bd` | Block-causal MultiBD path. |
| LLaDA / D2F-LLaDA | `llada` | `d2f` | Use D2F LoRA-style configs when applicable. |
| Fast-dLLM-v2 | `fast_dllm_v2` | `multi_bd` or `fast_dllm_v2` | Dual-Cache sub-block refinement; `fast_dllm_v2` strategy enables the native FDv2 decoding path. |
| SDAR | `sdar` | `multi_bd` | Dense SDAR path. |
| SDAR-MoE | `sdar_moe` | `multi_bd` | MoE path; keep expert parallel at `1` unless extending the runtime. |
| LLaDA2 family | `llada2`, `llada2_mini`, `llada2_moe`, `llada2dot1_mini`, `llada2_mini_dmax` | `multi_bd`, `dmax`, or `fast_dllm_v2` | `llada2_mini_dmax` enables DMax sampler defaults. |
| DiffusionGemma | `diffusion_gemma` | `diffusion_gemma` | Native 256-token canvas/block decoder. |

The original full-attention inference implementations from some upstream dLLM projects are not the target runtime. Diffulex adds support model by model through block-wise adapters, samplers, and strategy registrations.

## Strategy Compatibility

| Strategy | Use it for | Important behavior |
| --- | --- | --- |
| `d2f` | D2F-style LLaDA, Dream, and DiffuCoder paths | Forces full-prefix multi-block behavior and disables prefix caching. |
| `multi_bd` | LLaDA2, SDAR, Stable-DiffCoder, and Dream reasoner paths | Implements Multi-Block Diffusion with block-causal visibility and prefix caching. |
| `fast_dllm_v2` | Fast-dLLM-v2 native dual-cache decoding | 3-mode sub-block refinement FSM with dedicated CUDA graphs per mode. |
| `dmax` | Supported LLaDA2 edit-sampling experiments | Token merge + edit sampling; requires `sampling_mode="edit"`. |
| `diffusion_gemma` | DiffusionGemma | Uses DiffusionGemma request, sampler, block/page size, and canvas defaults. |

## Sampling Modes

| Sampling mode | Use it for |
| --- | --- |
| `naive` | Standard confidence-based diffusion sampling. This is the default for most supported models. |
| `edit` | LLaDA2-family edit/remask sampling. Required by DMax-style decoding. |

## Model Path Requirements

Model and tokenizer paths should point to local directories. During startup, Diffulex loads the tokenizer metadata, then the Hugging Face config, and then the registered Diffulex model implementation.

Before a full benchmark, verify:

| Requirement | Check |
| --- | --- |
| Checkpoint path | The model directory exists and contains the expected config and weights. |
| Tokenizer | The tokenizer loads from `tokenizer_path` or the model path. |
| `model_name` | The model name is listed in the Supported Model Families table and registered under `diffulex/model/`. |
| Strategy | The strategy is compatible with the model family. |
| Mask token | `mask_token_id` matches the tokenizer for the selected model. |
| Page/block size | `block_size <= page_size`; DiffusionGemma uses `256/256`. |

## Maintained Benchmark Configs

Common starting points live under `diffulex_bench/configs/`:

| Config | Purpose |
| --- | --- |
| `llada2_mini_gsm8k.yml` | LLaDA2-mini, `multi_bd`, GSM8K. |
| `llada2_mini_dmax_gsm8k.yml` | LLaDA2-mini DMax/edit sampling. |
| `diffusion_gemma_gsm8k.yml` | DiffusionGemma native Diffulex benchmark. |
| `diffucoder_instruct_gsm8k.yml` | DiffuCoder D2F-style benchmark. |
| `dream_base_gsm8k.yml` | Dream D2F-style benchmark. |
| `fast_dllm_v2_gsm8k.yml` | Fast-dLLM-v2 multi-block benchmark. |
| `sdar_chat_gsm8k.yml` | SDAR dense benchmark. |
| `sdar_moe_chat_gsm8k.yml` | SDAR-MoE benchmark. |
| `fast_dllm_v2_multibd_gsm8k.yml` | Fast-dLLM-v2 training-free MultiBD variant (blksz=4, bufsz=6). |
| `llada_instruct_gsm8k.yml` | LLaDA D2F-style benchmark. |
| `llada2_mini_gsm8k_tp1_limit200_maxnfe2048.yml` | Parity/evaluation variant with TP=1 and 200-sample limit. |

Start with `--dataset-limit` before running a full dataset.
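
Before launching one of these configs, some of the checks listed under Model Path Requirements can be scripted. The following is a minimal sketch, not Diffulex's own config validator: the function name `preflight_check` and the `TYPICAL_STRATEGIES` mapping are hypothetical and simply transcribe the tables on this page. The real validator also normalizes some combinations, so it may accept inputs that this stricter sketch flags.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping transcribed from the "Supported Model Families" table.
# Diffulex's registered models and validator remain the source of truth.
_LLADA2_STRATEGIES = {"multi_bd", "dmax", "fast_dllm_v2"}
TYPICAL_STRATEGIES = {
    "dream": {"d2f"},
    "diffucoder": {"d2f"},
    "dream_reasoner": {"multi_bd"},
    "stable_diffcoder": {"multi_bd"},
    "llada": {"d2f"},
    "fast_dllm_v2": {"multi_bd", "fast_dllm_v2"},
    "sdar": {"multi_bd"},
    "sdar_moe": {"multi_bd"},
    "llada2": _LLADA2_STRATEGIES,
    "llada2_mini": _LLADA2_STRATEGIES,
    "llada2_moe": _LLADA2_STRATEGIES,
    "llada2dot1_mini": _LLADA2_STRATEGIES,
    "llada2_mini_dmax": _LLADA2_STRATEGIES,
    "diffusion_gemma": {"diffusion_gemma"},
}
SAMPLING_MODES = {"naive", "edit"}


def preflight_check(
    model_name: str,
    decoding_strategy: str,
    sampling_mode: str = "naive",
    block_size: Optional[int] = None,
    page_size: Optional[int] = None,
    model_path: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []

    # model_name must appear in the table, and the strategy must be listed for it.
    allowed = TYPICAL_STRATEGIES.get(model_name)
    if allowed is None:
        problems.append(f"unknown model_name: {model_name!r}")
    elif decoding_strategy not in allowed:
        problems.append(
            f"{decoding_strategy!r} is not a listed strategy for {model_name!r}; "
            f"expected one of {sorted(allowed)}"
        )

    # Sampling mode rules from the "Sampling Modes" and strategy tables.
    if sampling_mode not in SAMPLING_MODES:
        problems.append(f"unknown sampling_mode: {sampling_mode!r}")
    if decoding_strategy == "dmax" and sampling_mode != "edit":
        problems.append('dmax requires sampling_mode="edit"')

    # Page/block size rules; only checked when both values are supplied.
    if block_size is not None and page_size is not None:
        if block_size > page_size:
            problems.append(
                f"block_size ({block_size}) must be <= page_size ({page_size})"
            )
        if decoding_strategy == "diffusion_gemma" and (block_size, page_size) != (256, 256):
            problems.append("DiffusionGemma expects block_size/page_size of 256/256")

    # Checkpoint path must be a local directory.
    if model_path is not None and not Path(model_path).is_dir():
        problems.append(f"model path is not a local directory: {model_path}")

    return problems


# Example: a DMax run that forgot to switch the sampler to edit mode.
for problem in preflight_check("llada2_mini", "dmax", sampling_mode="naive"):
    print("config problem:", problem)
```

The sketch does not cover the tokenizer or `mask_token_id` checks, since those require loading the tokenizer for the selected model.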