Adding a New Model Family

This tutorial walks through adding a model family to Diffulex. The goal is to make the model load through the standard engine path, decode with a compatible strategy, and pass a focused smoke test before broad benchmarking.

Choose a Model Name

Pick a stable model_name string. This key connects configuration, model registration, sampler registration, CLI choices, and benchmark configs. Keep the name lowercase and consistent with existing names such as llada, sdar, and fast_dllm_v2.
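The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the convention is "lowercase letters, digits, and underscores, starting with a letter", which matches llada, sdar, and fast_dllm_v2. The helper name is_valid_model_name is hypothetical and is not part of Diffulex.

```python
import re

# Assumed convention: lowercase snake_case that starts with a letter.
# This pattern is inferred from the existing names, not taken from Diffulex.
_MODEL_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_model_name(name: str) -> bool:
    """Return True if the name follows the assumed lowercase convention."""
    return _MODEL_NAME_RE.fullmatch(name) is not None


for candidate in ("llada", "fast_dllm_v2", "LLaDA", "my-model"):
    print(candidate, is_valid_model_name(candidate))
```

A check like this is most useful in a unit test, so a mixed-case or hyphenated name is caught before it spreads into configs and registrations.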

If the model should be benchmarked from the CLI, add the name to MODEL_NAME_CHOICES in diffulex_bench/arg_parser.py.
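As an illustration of how a choices list gates CLI input, here is a self-contained sketch using the standard argparse module. The flag name --model-name, the function build_parser, and the entry my_model are assumptions for the example; the actual contents and usage of MODEL_NAME_CHOICES in diffulex_bench/arg_parser.py may differ.

```python
import argparse

# Illustrative list only; "my_model" stands for the newly added family.
MODEL_NAME_CHOICES = ["llada", "sdar", "fast_dllm_v2", "my_model"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that rejects model names outside the choices list."""
    parser = argparse.ArgumentParser(description="Illustrative benchmark CLI")
    # Hypothetical flag name; argparse exits with an error for unknown values.
    parser.add_argument("--model-name", choices=MODEL_NAME_CHOICES, required=True)
    return parser


args = build_parser().parse_args(["--model-name", "my_model"])
print(args.model_name)
```

With this pattern, a name that is missing from the list fails at argument parsing rather than deep inside model loading.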

Model Implementation

Add model code under diffulex/model/. Register the model with AutoModelForDiffusionLM.register. Most factories receive config.hf_config; pass use_full_config=True only when model construction needs the full Diffulex runtime settings.

The model should match the interface expected by the selected model runner and sampler. Start from the closest existing model family and keep the first version minimal.
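The registration pattern described in this section can be pictured with a toy registry. This is a sketch of the general idea only: ToyModelRegistry, its method signatures, and the build step are hypothetical stand-ins, and the real AutoModelForDiffusionLM.register signature should be taken from the Diffulex source.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Tuple


class ToyModelRegistry:
    """Stand-in for a name-keyed factory registry; not the Diffulex class."""

    _factories: Dict[str, Tuple[Callable[[Any], Any], bool]] = {}

    @classmethod
    def register(cls, model_name: str, factory: Callable[[Any], Any],
                 use_full_config: bool = False) -> None:
        # Refuse duplicates so two families cannot share one key silently.
        if model_name in cls._factories:
            raise ValueError(f"model_name already registered: {model_name}")
        cls._factories[model_name] = (factory, use_full_config)

    @classmethod
    def build(cls, config: Any) -> Any:
        factory, use_full_config = cls._factories[config.model_name]
        # Most factories only need the Hugging Face config; pass the full
        # runtime config only when the factory asked for it.
        return factory(config if use_full_config else config.hf_config)


# Hypothetical factory that just echoes what it received.
ToyModelRegistry.register("my_model", lambda hf_config: {"built_from": hf_config})

config = SimpleNamespace(model_name="my_model", hf_config={"hidden_size": 8})
print(ToyModelRegistry.build(config))
```

Keeping the default at the narrower hf_config input limits how much of the runtime configuration a model constructor can come to depend on.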

Sampler Implementation

Add a matching sampler under diffulex/sampler/ when the model needs family-specific token update logic. Register it with AutoSampler.register.

Use sampling_mode="naive" unless the model needs edit-style updates. Edit sampling is currently restricted to specific LLaDA2-family model names in Config._validate_sampling_mode.
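The restriction on edit sampling can be sketched as a small allowlist check. This is an illustration of the kind of rule described, not the body of Config._validate_sampling_mode: the mode string "edit", the set EDIT_CAPABLE_MODELS, and the name example_edit_model are all assumptions.

```python
# Hypothetical allowlist; the real LLaDA2-family names live in Diffulex's Config.
EDIT_CAPABLE_MODELS = frozenset({"example_edit_model"})

# "naive" comes from the text above; "edit" is an assumed spelling.
KNOWN_SAMPLING_MODES = ("naive", "edit")


def validate_sampling_mode(model_name: str, sampling_mode: str = "naive") -> str:
    """Return the mode if it is allowed for this model, else raise ValueError."""
    if sampling_mode not in KNOWN_SAMPLING_MODES:
        raise ValueError(f"unknown sampling_mode: {sampling_mode!r}")
    if sampling_mode == "edit" and model_name not in EDIT_CAPABLE_MODELS:
        raise ValueError(
            f"sampling_mode='edit' is not enabled for model {model_name!r}"
        )
    return sampling_mode


print(validate_sampling_mode("my_model"))  # the default mode works for any model
```

Under a rule like this, a new family that needs edit-style updates has to be added to the allowlist explicitly; otherwise only the naive mode passes validation.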

Configuration Defaults

Only add model-specific defaults when the generic engine arguments are not enough. Examples already in the config:

| Condition | Existing config behavior |
| --- | --- |
| DiffusionGemma | Uses the native diffusion_gemma strategy defaults, block_size=256, page_size=256, and buffer_size=1. |
| DMax | Requires edit sampling and a DMax-compatible model name. |
| D2F | Disables prefix caching and uses full-prefix multi-block behavior. |

Avoid adding broad validation up front. Add a check only after a real invalid state has been observed.
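One way to picture model-specific defaults is as a thin layer between generic engine defaults and user overrides. The sketch below assumes that layering and assumes defaults are keyed by a diffusion_gemma name. The generic values and the resolve_settings helper are hypothetical; only the 256/256/1 values come from the examples above.

```python
from typing import Dict, Optional

# Hypothetical generic values, chosen only to make the layering visible.
GENERIC_DEFAULTS: Dict[str, int] = {"block_size": 32, "page_size": 32, "buffer_size": 4}

# Model-specific layer; the diffusion_gemma values mirror the examples above.
MODEL_DEFAULTS: Dict[str, Dict[str, int]] = {
    "diffusion_gemma": {"block_size": 256, "page_size": 256, "buffer_size": 1},
}


def resolve_settings(model_name: str,
                     user_overrides: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Merge generic defaults, then model defaults, then explicit user values."""
    settings = dict(GENERIC_DEFAULTS)
    settings.update(MODEL_DEFAULTS.get(model_name, {}))
    settings.update(user_overrides or {})
    return settings


print(resolve_settings("diffusion_gemma"))
print(resolve_settings("my_model", {"block_size": 64}))
```

A family with no entry in the model-specific layer simply inherits the generic values, which is the preferred state for a first version.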

Benchmark and Serving Configs

After the model loads, add a small benchmark config under diffulex_bench/configs/ if the model is meant to be evaluated regularly. Use placeholder paths and keep model-specific settings explicit.

For serving, document a minimal command with low request and token limits first. Users can scale limits after the command succeeds.

Verification

A staged verification path keeps wiring issues easy to isolate:

  1. Import the model and sampler modules.

  2. Construct a Config with the new model_name.

  3. Run one tiny offline generation.

  4. Run a benchmark with --dataset-limit.

  5. Add focused tests for model loading, sampler behavior, or config validation.

Do not start with a full benchmark. Full evaluations hide basic wiring problems behind long runtimes and higher GPU memory pressure.