diffulex.utils

`diffulex.utils` contains shared helpers that do not belong to any single model family, strategy, or serving path. They are still part of the runtime path: checkpoint loading, tokenizer construction, and output accounting all flow through this package.

| Module | Main responsibility |
| --- | --- |
| `diffulex.utils.checkpoint` | Small dataclasses used by checkpoint weight resolution. |
| `diffulex.utils.loader` | Base-model and LoRA weight loading. |
| `diffulex.utils.output` | Generation trajectories, text conversion, and benchmark metrics. |
| `diffulex.utils.registry` | Display helpers for registry factories. |
| `diffulex.utils.tokenizer` | Robust Hugging Face tokenizer construction. |

diffulex.utils.checkpoint

`checkpoint` defines the value objects that model and layer code use when a checkpoint tensor needs custom loading behavior.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `LoadContext` | Receive it inside a module’s `resolve_checkpoint_weight` hook. | Carries the engine `Config` and the full name of the checkpoint tensor being resolved. |
| `ResolvedWeight` | Return it from `resolve_checkpoint_weight`. | Describes where a tensor should go: a parameter, a buffer, a custom loader, a transform, a shard ID, or a skip marker. |

Use `ResolvedWeight` when checkpoint tensor names do not map cleanly onto PyTorch parameter names. It keeps family-specific mapping logic next to the model or layer that owns the weight, while `loader` performs the actual copy.
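As a minimal sketch of such a hook, the example below maps a renamed output projection and skips a tensor that is rebuilt at init time. The real field names of `LoadContext` and `ResolvedWeight`, and the exact arguments the loader passes to `resolve_checkpoint_weight`, are not documented here, so the two dataclasses, `ToyAttention`, and the `(local_name, ctx)` signature are hypothetical stand-ins that only mirror the roles described above.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for diffulex.utils.checkpoint types; the real
# classes may use different field names and carry more options.
@dataclass
class LoadContext:
    config: Any            # engine Config
    checkpoint_name: str   # full checkpoint tensor name being resolved


@dataclass
class ResolvedWeight:
    param_name: Optional[str] = None  # destination parameter, if any
    skip: bool = False                # True: ignore this checkpoint tensor


class ToyAttention:
    """Toy layer whose checkpoint calls the output projection 'c_proj'."""

    # Hypothetical hook signature: name relative to this module, plus context.
    def resolve_checkpoint_weight(
        self, local_name: str, ctx: LoadContext
    ) -> Optional[ResolvedWeight]:
        if local_name == "rotary_emb.inv_freq":
            # Recomputed when the layer is built, so the stored copy is skipped.
            return ResolvedWeight(skip=True)
        if local_name == "c_proj.weight":
            # Checkpoint name differs from the PyTorch attribute name.
            return ResolvedWeight(param_name="o_proj.weight")
        return None  # not handled here; other loading strategies may apply
```

In this sketch, returning `None` means the layer does not claim the tensor, so the mapping rules stay inside the layer that owns the weight and everything else is left to the generic loading path.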

diffulex.utils.loader

`loader` reads `.safetensors` checkpoints, applies custom weight resolvers, handles packed-module mappings, and optionally loads LoRA adapters.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `load_lora_config` | Pass a LoRA adapter directory. | Reads `adapter_config.json` when present; otherwise returns an empty dict. |
| `enable_lora_for_model` | Pass a model and LoRA config before loading weights. | Calls `__init_lora__` on matching modules so that their LoRA tensors exist. |
| `default_weight_loader` | Use as the fallback parameter loader. | Copies the loaded tensor into `param.data`. |
| `resolve_weight_spec` | Pass a model, a checkpoint tensor name, and a config. | Walks modules from the most specific prefix up to the root, asking each `resolve_checkpoint_weight` hook for a `ResolvedWeight`. |
| `apply_resolved_weight` | Pass a `ResolvedWeight` and the loaded tensor. | Applies transforms, custom loaders, parameter shards, buffers, or skip behavior. |
| `try_load_direct` | Pass a model, a tensor name, and the loaded tensor. | Attempts direct parameter or buffer loading by exact name. |
| `try_load_via_packed_mapping` | Pass a model, a packed mapping, a tensor name, the loaded tensor, and a config. | Handles packed projections such as merged QKV, or model-family-specific aliases. |
| `load_model` | Pass an initialized model and a `Config`. | Loads the base `.safetensors` files, enables LoRA when requested, then loads the LoRA weights. |
| `load_lora_weights` | Pass a LoRA-enabled model and an adapter path. | Finds LoRA A/B tensors, handles tensor-parallel (TP) sharding for supported layers, and optionally pre-merges adapters. |
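The `load_lora_config` behavior in the table is simple enough to sketch with the standard library. `read_adapter_config` is a hypothetical re-creation from that one-line description, not the Diffulex source, and it assumes the adapter directory follows the PEFT layout, where `adapter_config.json` sits at the top level.

```python
import json
from pathlib import Path
from typing import Any, Dict


def read_adapter_config(adapter_dir: str) -> Dict[str, Any]:
    """Return the parsed adapter_config.json, or {} when the file is absent."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        # A missing config is not an error: callers get an empty dict.
        return {}
    with config_path.open("r", encoding="utf-8") as f:
        return json.load(f)
```

PEFT adapter configs typically include keys such as `r`, `lora_alpha`, and `target_modules`. Returning an empty dict instead of raising lets a caller detect "no config" with a simple truthiness check.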

The load order is deliberate: custom resolvers get the first chance to map a checkpoint tensor, packed-module mappings are tried second, and exact parameter or buffer names are tried last. This makes unusual model-family layouts explicit while the simple case, where checkpoint names already match parameter names, keeps working unchanged.
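The ordering can be illustrated with a toy loader that works on a flat dict of parameters instead of `torch.nn.Module` objects. Everything in this sketch is hypothetical (`load_tensor`, the resolver dict keyed by module prefix, the packed-mapping format, the preallocated shard slots); it only demonstrates resolvers first (walking from the most specific module prefix to the root), packed mappings second, and exact names last.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Toy types: a resolver maps a name relative to its module to a full
# parameter name (or None); a packed entry maps a checkpoint name segment
# to (packed segment, shard index).
Resolver = Callable[[str], Optional[str]]
PackedMapping = Dict[str, Tuple[str, int]]


def candidate_prefixes(name: str) -> List[str]:
    """'a.b.c.weight' -> ['a.b.c', 'a.b', 'a', ''] (most specific first)."""
    parts = name.split(".")[:-1]
    return [".".join(parts[:i]) for i in range(len(parts), -1, -1)]


def load_tensor(
    params: Dict[str, Any],
    resolvers: Dict[str, Resolver],
    packed: PackedMapping,
    name: str,
    tensor: Any,
) -> str:
    """Store `tensor` in `params` and report which strategy handled it."""
    # 1. Custom resolvers, from the most specific module prefix to the root.
    for prefix in candidate_prefixes(name):
        hook = resolvers.get(prefix)
        if hook is None:
            continue
        local_name = name[len(prefix) + 1:] if prefix else name
        target = hook(local_name)
        if target is not None:
            params[target] = tensor
            return "resolver"
    # 2. Packed mappings, e.g. separate q/k/v tensors -> one fused qkv_proj.
    parts = name.split(".")
    for ckpt_seg, (packed_seg, shard_id) in packed.items():
        if ckpt_seg in parts:  # match whole segments, not substrings
            packed_name = ".".join(packed_seg if p == ckpt_seg else p for p in parts)
            if packed_name in params:
                params[packed_name][shard_id] = tensor
                return "packed"
    # 3. Exact parameter or buffer name.
    if name in params:
        params[name] = tensor
        return "direct"
    raise KeyError(f"no destination for checkpoint tensor {name!r}")
```

Because the exact-name path runs last in this sketch, a resolver or packed mapping can claim a tensor even when a parameter with a similar name exists, which is what keeps the unusual cases explicit.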

diffulex.utils.output

`output` stores generation results and computes the metrics reported by offline inference and benchmarks. It keeps both token-level trajectories and aggregate throughput counters.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `decode_token_ids_robust` | Pass a tokenizer and token IDs. | Decodes normally first, then falls back to token conversion for tokenizers with stricter decode signatures. |
| `ReqStep` | Created for each scheduled engine step. | Records step time, prefill/decode mode, generated token count, running tokens, buffer block IDs, and an optional block trace. |
| `ReqTrajectory` | One item per prompt. | Stores final token IDs, full response token IDs, truncation flags, completion reason, text, and the per-step trajectory. |
| `GenerationOutputs` | Created by the engine for a batch. | Accumulates trajectories and exposes metrics such as TPF, TTFT (time to first token), TPOT (time per output token), throughput, prefill throughput, and decode throughput. |
| `GenerationOutputs.record_step` | Call after each engine step. | Updates batch counters and appends a `ReqStep` record for each active request. |
| `GenerationOutputs.convert_to_text` | Pass the tokenizer after generation finishes. | Decodes both the truncated and the full token responses into text. |
| `GenerationOutputs.to_benchmark_format` | Call before returning benchmark-compatible data. | Produces `{text, full_text, token_ids, nfe}` dictionaries. |
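The decode-then-fallback idea behind `decode_token_ids_robust` can be sketched as follows. Which exception Diffulex treats as a stricter signature is not stated, so catching `TypeError` is an assumption, as is falling back through the Hugging Face tokenizer methods `convert_ids_to_tokens` and `convert_tokens_to_string`; `decode_ids_with_fallback` is a hypothetical name.

```python
from typing import Any, Sequence


def decode_ids_with_fallback(tokenizer: Any, token_ids: Sequence[int]) -> str:
    ids = [int(t) for t in token_ids]  # plain ints, in case IDs come from a tensor
    try:
        # Normal path: let the tokenizer decode the ID list directly.
        return tokenizer.decode(ids)
    except TypeError:
        # Assumed fallback: map IDs to token strings, then join them using the
        # tokenizer's own detokenization rules.
        tokens = tokenizer.convert_ids_to_tokens(ids)
        return tokenizer.convert_tokens_to_string(tokens)
```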

Set `DIFFULEX_SAVE_TRACE=0` when block-level traces are not needed. With tracing left on, each trajectory records per-block status and mask ratios; this is useful for debugging decoding behavior but adds data to every trajectory.
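To make the trade-off concrete, here is a simplified, hypothetical step recorder that honors `DIFFULEX_SAVE_TRACE`. `StepRecord` and `Trajectory` are toy analogues of `ReqStep` and `ReqTrajectory` with far fewer fields, and treating an unset variable as "tracing on" is an assumption based on the wording above.

```python
import os
import time
from dataclasses import dataclass, field
from typing import List, Optional


def trace_enabled() -> bool:
    # Assumption: only the exact value "0" disables tracing; unset means on.
    return os.environ.get("DIFFULEX_SAVE_TRACE", "1") != "0"


@dataclass
class StepRecord:  # toy analogue of ReqStep
    step_time: float
    is_prefill: bool
    num_generated: int
    block_trace: Optional[List[dict]] = None


@dataclass
class Trajectory:  # toy analogue of ReqTrajectory
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, is_prefill: bool, num_generated: int, blocks: List[dict]) -> None:
        self.steps.append(
            StepRecord(
                step_time=time.perf_counter(),
                is_prefill=is_prefill,
                num_generated=num_generated,
                # Copying every block's status and mask ratio is what grows the
                # trajectory, so it is skipped when tracing is off.
                block_trace=[dict(b) for b in blocks] if trace_enabled() else None,
            )
        )
```

In this sketch only the block trace is dropped when tracing is off; the per-step counters that metrics depend on are still recorded.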

diffulex.utils.registry

`registry` contains small helpers that the registry classes use for readable errors and diagnostics.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `fetch_factory_name` | Pass a class, function, `functools.partial`, or callable object. | Returns a stable module-qualified display name after unwrapping decorators and partials. |

Use this helper when a registry needs to describe which factory is currently bound without assuming the factory is a plain class.
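A standard-library sketch of the unwrapping that `fetch_factory_name` describes is shown here. The function name `factory_display_name`, the order of unwrapping, and the `<unknown>` fallback are assumptions; only the supported input kinds and the module-qualified output come from the table.

```python
import functools
import inspect
from typing import Any


def factory_display_name(factory: Any) -> str:
    """Module-qualified display name for a registry factory (sketch)."""
    while True:
        if isinstance(factory, functools.partial):
            factory = factory.func  # partials may wrap other partials
            continue
        unwrapped = inspect.unwrap(factory)  # follows __wrapped__ from functools.wraps
        if unwrapped is factory:
            break
        factory = unwrapped
    if not (inspect.isclass(factory) or inspect.isroutine(factory)):
        # Callable instances are described by their class.
        factory = type(factory)
    module = getattr(factory, "__module__", None) or "<unknown>"
    qualname = getattr(factory, "__qualname__", None) or repr(factory)
    return f"{module}.{qualname}"
```

Describing a callable instance by its class keeps the name stable across instances, which is what makes it useful in error messages.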

diffulex.utils.tokenizer

`tokenizer` wraps Hugging Face `AutoTokenizer.from_pretrained` with a fallback for tokenizer configs that store `extra_special_tokens` as a list. Some tokenizer versions expect a dict instead, so Diffulex coerces the list into a dict keyed by stable, generated token names before retrying.

| Symbol | How to use it | What it does |
| --- | --- | --- |
| `auto_tokenizer_from_pretrained` | Use instead of calling `AutoTokenizer.from_pretrained` directly in Diffulex code. | Loads the tokenizer and, if necessary, retries with coerced `extra_special_tokens`. |

The fallback runs only for the known `extra_special_tokens` shape error. Any other tokenizer loading failure is re-raised, so configuration problems remain visible.
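A minimal sketch of that retry logic, with the loader injected so it runs without Hugging Face installed. The names `coerce_extra_special_tokens` and `load_tokenizer_with_fallback`, the generated key format, the error-matching heuristic, and passing the coerced dict back as an `extra_special_tokens` keyword are all assumptions for illustration; in practice `load_fn` might be `AutoTokenizer.from_pretrained` and `read_config` might parse the checkpoint's `tokenizer_config.json`.

```python
from typing import Any, Callable, Dict, List


def coerce_extra_special_tokens(tokens: List[str]) -> Dict[str, str]:
    # Hypothetical deterministic naming, so repeated loads yield identical keys.
    return {f"extra_special_token_{i}": tok for i, tok in enumerate(tokens)}


def _is_list_shape_error(err: Exception) -> bool:
    # Hypothetical heuristic for the "list where a dict was expected" failure.
    msg = str(err)
    return "extra_special_tokens" in msg or "'list' object has no attribute" in msg


def load_tokenizer_with_fallback(
    load_fn: Callable[..., Any],
    path: str,
    read_config: Callable[[str], Dict[str, Any]],
    **kwargs: Any,
) -> Any:
    try:
        return load_fn(path, **kwargs)
    except (AttributeError, TypeError) as err:
        if not _is_list_shape_error(err):
            raise  # unrelated failure: surface it unchanged
        tokens = read_config(path).get("extra_special_tokens")
        if not isinstance(tokens, list):
            raise  # not the shape this fallback knows how to fix
        retry_kwargs = {**kwargs, "extra_special_tokens": coerce_extra_special_tokens(tokens)}
        return load_fn(path, **retry_kwargs)
```

Keeping the error check narrow mirrors the behavior described above: anything other than the known shape problem propagates unchanged.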