# Adding a New Model Family

This tutorial walks through adding a model family to Diffulex. The goal is to
make the model load through the standard engine path, decode with a compatible
strategy, and pass a focused smoke test before broad benchmarking.

## Choose a Model Name

Pick a stable `model_name` string. This key connects configuration, model
registration, sampler registration, CLI choices, and benchmark configs. Keep the
name lowercase and consistent with existing names such as `llada`, `sdar`, and
`fast_dllm_v2`.

If the model should be benchmarked from the CLI, add the name to
`MODEL_NAME_CHOICES` in `diffulex_bench/arg_parser.py`.

## Model Implementation

Add model code under `diffulex/model/`. Register the model with
`AutoModelForDiffusionLM.register`. Most factories receive `config.hf_config`;
use `use_full_config=True` only when model construction needs full Diffulex
runtime settings.

The model should match the interface expected by the selected model runner and
sampler. Start from the closest existing model family and keep the first version
minimal.

## Sampler Implementation

Add a matching sampler under `diffulex/sampler/` when the model needs
family-specific token update logic. Register it with `AutoSampler.register`.

Use `sampling_mode="naive"` unless the model needs edit-style updates. Edit
sampling is currently restricted to specific LLaDA2-family model names in
`Config._validate_sampling_mode`.

## Configuration Defaults

Only add model-specific defaults when the generic engine arguments are not
enough. Examples already in the config:

| Condition | Existing config behavior |
| --- | --- |
| DiffusionGemma | Uses the native `diffusion_gemma` strategy defaults, `block_size=256`, `page_size=256`, and `buffer_size=1`. |
| DMax | Requires edit sampling and a DMax-compatible model name. |
| D2F | Disables prefix caching and uses full-prefix multi-block behavior. |

Avoid broad validation until a real invalid state has been observed.

## Benchmark and Serving Configs

After the model loads, add a small benchmark config under
`diffulex_bench/configs/` if the model is meant to be evaluated regularly. Use
paths as placeholders and keep model-specific settings explicit.

For serving, document a minimal command with low request and token limits first.
Users can scale limits after the command succeeds.

## Verification

A staged verification path keeps wiring issues easy to isolate:

1. Import the model and sampler modules.
2. Construct a `Config` with the new `model_name`.
3. Run one tiny offline generation.
4. Run a benchmark with `--dataset-limit`.
5. Add focused tests for model loading, sampler behavior, or config validation.

Do not start with a full benchmark. Full evaluations hide basic wiring problems
behind long runtime and larger GPU memory pressure.