# Models

Diffulex model support is defined by three choices:

- `model_name`: selects the registered model implementation and sampler factory;
- `decoding_strategy`: selects request state, scheduler, KV cache manager, and
  model runner;
- `sampling_mode`: selects standard or edit-style sampler behavior.

The config validator normalizes some model/strategy combinations and rejects
known invalid combinations.

## Supported Model Families

| Model family | `model_name` values | Typical strategy | Notes |
| --- | --- | --- | --- |
| Dream / D2F-Dream | `dream` | `d2f` | D2F-style full-prefix block decoding. |
| DiffuCoder / D2F-DiffuCoder | `diffucoder` | `d2f` | Uses shifted sampler behavior. |
| Dream reasoner | `dream_reasoner` | `multi_bd` | Block-causal MultiBD path. |
| Stable-DiffCoder | `stable_diffcoder` | `multi_bd` | Block-causal MultiBD path. |
| LLaDA / D2F-LLaDA | `llada` | `d2f` | Use D2F LoRA-style configs when applicable. |
| Fast-dLLM-v2 | `fast_dllm_v2` | `multi_bd` or `fast_dllm_v2` | Dual-Cache sub-block refinement; `fast_dllm_v2` strategy enables the native FDv2 decoding path. |
| SDAR | `sdar` | `multi_bd` | Dense SDAR path. |
| SDAR-MoE | `sdar_moe` | `multi_bd` | MoE path; keep expert parallel at `1` unless extending the runtime. |
| LLaDA2 family | `llada2`, `llada2_mini`, `llada2_moe`, `llada2dot1_mini`, `llada2_mini_dmax` | `multi_bd`, `dmax`, or `fast_dllm_v2` | `llada2_mini_dmax` enables DMax sampler defaults. |
| DiffusionGemma | `diffusion_gemma` | `diffusion_gemma` | Native 256-token canvas/block decoder. |
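
As a quick sanity check before launching a run, the table above can be transcribed into a small lookup. The sketch below is illustrative only: `TYPICAL_STRATEGIES` and `is_typical_pairing` are hypothetical names, not part of Diffulex, and the table lists typical strategies rather than every combination the config validator accepts.

```python
# Typical strategies per model_name, transcribed from the table above.
# Hypothetical helper for illustration; this is not Diffulex's validator.
TYPICAL_STRATEGIES = {
    "dream": ("d2f",),
    "diffucoder": ("d2f",),
    "dream_reasoner": ("multi_bd",),
    "stable_diffcoder": ("multi_bd",),
    "llada": ("d2f",),
    "fast_dllm_v2": ("multi_bd", "fast_dllm_v2"),
    "sdar": ("multi_bd",),
    "sdar_moe": ("multi_bd",),
    "diffusion_gemma": ("diffusion_gemma",),
}

# The LLaDA2 family shares one row in the table.
for _name in (
    "llada2",
    "llada2_mini",
    "llada2_moe",
    "llada2dot1_mini",
    "llada2_mini_dmax",
):
    TYPICAL_STRATEGIES[_name] = ("multi_bd", "dmax", "fast_dllm_v2")


def is_typical_pairing(model_name: str, decoding_strategy: str) -> bool:
    """Return True if the pairing appears in the table above."""
    if model_name not in TYPICAL_STRATEGIES:
        raise KeyError(f"unknown model_name: {model_name!r}")
    return decoding_strategy in TYPICAL_STRATEGIES[model_name]
```

A `False` result only means the pairing is not listed here; the config validator remains the authority on what is accepted.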

Diffulex does not target the original full-attention inference implementations
from upstream dLLM projects. Instead, it adds support model by model through
block-wise adapters, samplers, and strategy registrations.

## Strategy Compatibility

| Strategy | Use it for | Important behavior |
| --- | --- | --- |
| `d2f` | D2F-style LLaDA, Dream, and DiffuCoder paths | Forces full-prefix multi-block behavior and disables prefix caching. |
| `multi_bd` | LLaDA2, SDAR, Stable-DiffCoder, Dream reasoner, and training-free Fast-dLLM-v2 paths | Implements Multi-Block Diffusion with block-causal visibility and prefix caching. |
| `fast_dllm_v2` | Fast-dLLM-v2 native dual-cache decoding | 3-mode sub-block refinement FSM with dedicated CUDA graphs per mode. |
| `dmax` | Supported LLaDA2 edit-sampling experiments | Token merge + edit sampling; requires `sampling_mode="edit"`. |
| `diffusion_gemma` | DiffusionGemma | Uses DiffusionGemma request, sampler, block/page size, and canvas defaults. |

## Sampling Modes

| Sampling mode | Use it for |
| --- | --- |
| `naive` | Standard confidence-based diffusion sampling. This is the default for most supported models. |
| `edit` | LLaDA2-family edit/remask sampling. Required by DMax-style decoding. |
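
The one hard constraint stated in these tables is that `dmax` requires `sampling_mode="edit"`. A minimal sketch of that rule follows; `check_sampling_mode` is a hypothetical helper, not the Diffulex config validator, which may enforce additional rules.

```python
SAMPLING_MODES = ("naive", "edit")


def check_sampling_mode(decoding_strategy: str, sampling_mode: str) -> None:
    """Raise ValueError for pairings the tables above rule out.

    Hypothetical helper for illustration; it only encodes the rules
    written in this document.
    """
    if sampling_mode not in SAMPLING_MODES:
        raise ValueError(f"unknown sampling_mode: {sampling_mode!r}")
    if decoding_strategy == "dmax" and sampling_mode != "edit":
        raise ValueError('decoding_strategy "dmax" requires sampling_mode="edit"')
```

For example, `check_sampling_mode("dmax", "naive")` raises `ValueError`, while `check_sampling_mode("multi_bd", "naive")` returns without error.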

## Model Path Requirements

Model and tokenizer paths should point to local directories. During startup,
Diffulex loads tokenizer metadata, Hugging Face config, and then the registered
Diffulex model implementation.

Before a full benchmark, verify:

| Requirement | Check |
| --- | --- |
| Checkpoint path | The model directory exists and contains the expected config and weights. |
| Tokenizer | The tokenizer loads from `tokenizer_path` or the model path. |
| `model_name` | The model name is listed under Supported Model Families and registered under `diffulex/model/`. |
| Strategy | The strategy is compatible with the model family. |
| Mask token | `mask_token_id` matches the tokenizer for the selected model. |
| Page/block size | `block_size <= page_size`; DiffusionGemma uses 256 for both `block_size` and `page_size`. |
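
Some of these checks can be scripted before a long run. The sketch below covers only the filesystem and size checks. `preflight` and its parameters are hypothetical names, and looking for `config.json` assumes the standard Hugging Face checkpoint layout. It does not load the tokenizer or verify `mask_token_id`.

```python
from pathlib import Path
from typing import List, Optional


def preflight(
    model_path: str,
    block_size: int,
    page_size: int,
    tokenizer_path: Optional[str] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper for illustration, not part of Diffulex.
    """
    problems = []

    model_dir = Path(model_path)
    if not model_dir.is_dir():
        problems.append(f"model directory not found: {model_dir}")
    elif not (model_dir / "config.json").is_file():
        # Assumption: the checkpoint follows the Hugging Face layout.
        problems.append(f"no config.json in {model_dir}")

    # When tokenizer_path is unset, the tokenizer comes from the model path,
    # which was already checked above.
    if tokenizer_path is not None and not Path(tokenizer_path).is_dir():
        problems.append(f"tokenizer directory not found: {tokenizer_path}")

    if block_size > page_size:
        problems.append(
            f"block_size ({block_size}) must be <= page_size ({page_size})"
        )

    return problems
```

Returning a list instead of raising on the first failure lets a single pass report every problem at once.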

## Maintained Benchmark Configs

Common starting points live under `diffulex_bench/configs/`:

| Config | Purpose |
| --- | --- |
| `llada2_mini_gsm8k.yml` | LLaDA2-mini, `multi_bd`, GSM8K. |
| `llada2_mini_dmax_gsm8k.yml` | LLaDA2-mini DMax/edit sampling. |
| `diffusion_gemma_gsm8k.yml` | DiffusionGemma native Diffulex benchmark. |
| `diffucoder_instruct_gsm8k.yml` | DiffuCoder D2F-style benchmark. |
| `dream_base_gsm8k.yml` | Dream D2F-style benchmark. |
| `fast_dllm_v2_gsm8k.yml` | Fast-dLLM-v2 multi-block benchmark. |
| `sdar_chat_gsm8k.yml` | SDAR dense benchmark. |
| `sdar_moe_chat_gsm8k.yml` | SDAR-MoE benchmark. |
| `fast_dllm_v2_multibd_gsm8k.yml` | Fast-dLLM-v2 training-free MultiBD variant (blksz=4, bufsz=6). |
| `llada_instruct_gsm8k.yml` | LLaDA D2F-style benchmark. |
| `llada2_mini_gsm8k_tp1_limit200_maxnfe2048.yml` | Parity/evaluation variant with TP=1 and 200-sample limit. |

Start with a small `--dataset-limit` run to validate the setup before running a full dataset.