# diffulex.utils

`diffulex.utils` contains shared helpers that do not belong to one model
family, strategy, or serving path. The modules here are still part of the
runtime path: checkpoint loading, tokenizer construction, and output accounting
all flow through this package.

| Module | Main responsibility |
| --- | --- |
| `diffulex.utils.checkpoint` | Small dataclasses used by checkpoint weight resolution. |
| `diffulex.utils.loader` | Base-model and LoRA weight loading. |
| `diffulex.utils.output` | Generation trajectories, text conversion, and benchmark metrics. |
| `diffulex.utils.registry` | Display helpers for registry factories. |
| `diffulex.utils.tokenizer` | Robust Hugging Face tokenizer construction. |

## diffulex.utils.checkpoint

`checkpoint` defines the value objects used by model and layer code when a
checkpoint tensor needs custom loading behavior.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `LoadContext` | Receive it inside a module's `resolve_checkpoint_weight` hook. | Carries the engine `Config` and the full checkpoint tensor name being resolved. |
| `ResolvedWeight` | Return it from `resolve_checkpoint_weight`. | Describes where a tensor should go: a parameter, a buffer, a custom loader, a transform, a shard id, or a skip marker. |

Use `ResolvedWeight` when checkpoint names do not map cleanly to PyTorch
parameter names. It keeps family-specific mapping logic near the model or layer
that owns the weight, while `loader` handles the actual copy.

## diffulex.utils.loader

`loader` is responsible for reading `.safetensors` checkpoints, applying custom
weight resolvers, handling packed module mappings, and optionally loading LoRA
adapters.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `load_lora_config` | Pass a LoRA adapter directory. | Reads `adapter_config.json` when present; otherwise returns an empty dict. |
| `enable_lora_for_model` | Pass a model and LoRA config before loading weights. | Calls `__init_lora__` on matching modules so LoRA tensors exist. |
| `default_weight_loader` | Use as the fallback parameter loader. | Copies the loaded tensor into `param.data`. |
| `resolve_weight_spec` | Pass model, checkpoint tensor name, and config. | Walks modules from most specific prefix to root and asks `resolve_checkpoint_weight` hooks for a `ResolvedWeight`. |
| `apply_resolved_weight` | Pass a `ResolvedWeight` and loaded tensor. | Applies transforms, custom loaders, parameter shards, buffers, or skip behavior. |
| `try_load_direct` | Pass model, tensor name, and loaded tensor. | Attempts direct parameter or buffer loading by exact name. |
| `try_load_via_packed_mapping` | Pass model, packed mapping, tensor name, loaded tensor, and config. | Handles packed projections such as merged QKV or model-family-specific aliases. |
| `load_model` | Pass an initialized model and `Config`. | Loads base `.safetensors` files, enables LoRA when requested, then loads LoRA weights. |
| `load_lora_weights` | Pass a LoRA-enabled model and adapter path. | Finds LoRA A/B tensors, handles TP sharding for supported layers, and optionally pre-merges adapters. |

The load order is deliberate. Custom resolvers get the first chance to map a
checkpoint tensor, packed-module mappings run second, and exact parameter/buffer
names are tried last. This makes unusual model-family layouts explicit without
breaking the simple case.

## diffulex.utils.output

`output` stores generation results and computes the metrics shown by offline
inference and benchmarks. It keeps both token-level trajectories and aggregate
throughput counters.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `decode_token_ids_robust` | Pass a tokenizer and token IDs. | Decodes normally first, then falls back to token conversion for tokenizers with stricter decode signatures. |
| `ReqStep` | Created for each scheduled engine step. | Records step time, prefill/decode mode, generated token count, running tokens, buffer block IDs, and optional block trace. |
| `ReqTrajectory` | One item per prompt. | Stores final token IDs, full response token IDs, truncation flags, completion reason, text, and per-step trajectory. |
| `GenerationOutputs` | Created by the engine for a batch. | Accumulates trajectories and exposes metrics such as TPF, TTFT, TPOT, throughput, prefill throughput, and decode throughput. |
| `GenerationOutputs.record_step` | Call after each engine step. | Updates batch counters and appends `ReqStep` records for each active request. |
| `GenerationOutputs.convert_to_text` | Pass the tokenizer after generation finishes. | Decodes truncated and full token responses into text. |
| `GenerationOutputs.to_benchmark_format` | Call before returning benchmark-compatible data. | Produces `{text, full_text, token_ids, nfe}` dictionaries. |

Set `DIFFULEX_SAVE_TRACE=0` when block-level traces are not needed. Leaving it
enabled records per-block status and mask ratios, which is useful for debugging
decoding behavior but adds more data to each trajectory.

## diffulex.utils.registry

`registry` contains small helpers used by the registry classes for readable
errors and diagnostics.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `fetch_factory_name` | Pass a class, function, `functools.partial`, or callable object. | Returns a stable module-qualified display name after unwrapping decorators and partials. |

Use this helper when a registry needs to describe which factory is currently
bound without assuming the factory is a plain class.

## diffulex.utils.tokenizer

`tokenizer` wraps Hugging Face `AutoTokenizer.from_pretrained` with a fallback
for tokenizer configs that store `extra_special_tokens` as a list. Some
tokenizer versions expect a dict, so Diffulex coerces the list into stable
generated token names before retrying.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `auto_tokenizer_from_pretrained` | Use instead of calling `AutoTokenizer.from_pretrained` directly in Diffulex code. | Loads the tokenizer, and if necessary retries with coerced `extra_special_tokens`. |

The fallback only runs for the known `extra_special_tokens` shape error. Other
tokenizer loading failures are re-raised so configuration problems remain
visible.