Adding a New Model Family
This tutorial walks through adding a model family to Diffulex. The goal is to make the model load through the standard engine path, decode with a compatible strategy, and pass a focused smoke test before broad benchmarking.
Choose a Model Name
Pick a stable model_name string. This key connects configuration, model
registration, sampler registration, CLI choices, and benchmark configs. Keep the
name lowercase and consistent with existing names such as llada, sdar, and
fast_dllm_v2.
If the model should be benchmarked from the CLI, add the name to
MODEL_NAME_CHOICES in diffulex_bench/arg_parser.py.
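The effect of listing a name in a choices list can be sketched with plain argparse. This is only an illustration: the list below is a local stand-in for MODEL_NAME_CHOICES, and the --model-name flag and the my_model name are assumptions, not taken from diffulex_bench/arg_parser.py.

```python
import argparse

# Local stand-in for MODEL_NAME_CHOICES; "my_model" is a hypothetical new family.
MODEL_NAME_CHOICES = ["llada", "sdar", "fast_dllm_v2", "my_model"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bench-sketch")
    # The flag name is an assumption; argparse rejects values outside `choices`.
    parser.add_argument("--model-name", choices=MODEL_NAME_CHOICES, required=True)
    return parser


args = build_parser().parse_args(["--model-name", "my_model"])
print(args.model_name)
```

A name missing from the list fails at argument parsing, before any model code runs, which makes a forgotten entry easy to spot.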
Model Implementation
Add model code under diffulex/model/. Register the model with
AutoModelForDiffusionLM.register. Most factories receive config.hf_config;
pass use_full_config=True only when model construction needs the full Diffulex
runtime settings.
The model should match the interface expected by the selected model runner and sampler. Start from the closest existing model family and keep the first version minimal.
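The registration pattern described above can be sketched with a toy registry. Everything here is a stand-in written for illustration: SketchModelRegistry, SketchConfig, and MyModel are hypothetical, and the real signature of AutoModelForDiffusionLM.register may differ. The sketch only shows how a use_full_config flag could decide what the factory receives.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


class SketchModelRegistry:
    """Toy name-keyed registry; a stand-in, not Diffulex's actual class."""

    _factories: Dict[str, Tuple[Callable[[Any], Any], bool]] = {}

    @classmethod
    def register(cls, model_name: str, factory: Callable[[Any], Any],
                 use_full_config: bool = False) -> None:
        if model_name in cls._factories:
            raise ValueError(f"model_name {model_name!r} is already registered")
        cls._factories[model_name] = (factory, use_full_config)

    @classmethod
    def from_config(cls, config: "SketchConfig") -> Any:
        factory, use_full_config = cls._factories[config.model_name]
        # By default the factory sees only hf_config; the full config is opt-in.
        return factory(config if use_full_config else config.hf_config)


@dataclass
class SketchConfig:
    model_name: str
    hf_config: Dict[str, Any] = field(default_factory=dict)


class MyModel:
    """Hypothetical minimal model built from hf_config alone."""

    def __init__(self, hf_config: Dict[str, Any]) -> None:
        self.hidden_size = hf_config["hidden_size"]


SketchModelRegistry.register("my_model", MyModel)
model = SketchModelRegistry.from_config(SketchConfig("my_model", {"hidden_size": 64}))
print(type(model).__name__, model.hidden_size)
```

Keeping the factory on hf_config by default means the model does not depend on runtime settings it never reads.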
Sampler Implementation
Add a matching sampler under diffulex/sampler/ when the model needs
family-specific token update logic. Register it with AutoSampler.register.
Use sampling_mode="naive" unless the model needs edit-style updates. Edit
sampling is currently restricted to specific LLaDA2-family model names in
Config._validate_sampling_mode.
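A restriction of this kind might look roughly like the following. This is a sketch under stated assumptions, not the body of Config._validate_sampling_mode: the "edit" mode string, the allow-list contents, and the function name are all hypothetical.

```python
# Hypothetical allow-list; the real set of LLaDA2-family names lives in Diffulex.
EDIT_SAMPLING_MODELS = frozenset({"llada2_example"})

# "naive" comes from the text above; "edit" is an assumed spelling.
KNOWN_SAMPLING_MODES = ("naive", "edit")


def validate_sampling_mode(model_name: str, sampling_mode: str = "naive") -> str:
    """Return the mode if it is allowed for this model, else raise ValueError."""
    if sampling_mode not in KNOWN_SAMPLING_MODES:
        raise ValueError(f"unknown sampling_mode {sampling_mode!r}")
    if sampling_mode == "edit" and model_name not in EDIT_SAMPLING_MODELS:
        raise ValueError(
            f"edit sampling is not enabled for model_name {model_name!r}"
        )
    return sampling_mode


print(validate_sampling_mode("my_model"))  # default mode passes for any model
```

With a check like this, a new family that sticks to the default mode needs no validation changes at all.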
Configuration Defaults
Only add model-specific defaults when the generic engine arguments are not enough. Examples already in the config:
| Condition | Existing config behavior |
|---|---|
| DiffusionGemma | Uses the native |
| DMax | Requires edit sampling and a DMax-compatible model name. |
| D2F | Disables prefix caching and uses full-prefix multi-block behavior. |
Avoid broad validation until a real invalid state has been observed.
Benchmark and Serving Configs
After the model loads, add a small benchmark config under
diffulex_bench/configs/ if the model is meant to be evaluated regularly. Use
placeholder paths instead of machine-specific ones, and keep model-specific
settings explicit.
For serving, document a minimal command with low request and token limits first. Users can scale limits after the command succeeds.
Verification
A staged verification path keeps wiring issues easy to isolate:
1. Import the model and sampler modules.
2. Construct a Config with the new model_name.
3. Run one tiny offline generation.
4. Run a benchmark with --dataset-limit.
5. Add focused tests for model loading, sampler behavior, or config validation.
Do not start with a full benchmark. Full evaluations hide basic wiring problems behind long runtimes and higher GPU memory pressure.
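One way to keep the stages separate is a small runner that stops at the first failure. The sketch below is generic Python, not a Diffulex tool: run_stages and the check functions are hypothetical, and the stage bodies are placeholders to be replaced with the real import, Config construction, and generation calls.

```python
import importlib
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    passed: List[str] = []
    for name, check in stages:
        try:
            check()
        except Exception as exc:
            print(f"FAILED at stage '{name}': {exc!r}")
            break
        passed.append(name)
        print(f"ok: {name}")
    return passed


def check_imports() -> None:
    # Placeholder: swap "json" for the new model and sampler module paths.
    importlib.import_module("json")


def check_config() -> None:
    # Placeholder: construct the real Config with the new model_name here.
    config = {"model_name": "my_model"}
    assert config["model_name"] == "my_model"


def check_tiny_generation() -> None:
    # Placeholder: wire in one tiny offline generation.
    raise NotImplementedError("tiny offline generation not wired in yet")


passed = run_stages([
    ("import modules", check_imports),
    ("construct config", check_config),
    ("tiny generation", check_tiny_generation),
])
```

Because the runner stops at the first failing stage, the last "ok" line shows how far the wiring currently reaches.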