diffulex

The diffulex package is the public Python entry point for in-process inference. Importing the package is intentionally lightweight. The root module uses lazy imports, so `from diffulex import Diffulex, SamplingParams` does not eagerly import the full engine, CUDA kernels, or model stack. Those load only when the engine is constructed.
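Lazy imports of this kind are commonly built on the module-level `__getattr__` hook from PEP 562. Below is a minimal sketch of that pattern. It builds a throwaway module named `lazy_demo` and maps public names to standard-library targets, so it runs without diffulex installed. The mapping and `make_lazy_module` are hypothetical and do not reproduce the contents of `diffulex/__init__.py`.

```python
import importlib
import sys
import types

# Hypothetical name -> (module, attribute) table. A package such as diffulex
# would map names like "Diffulex" to its own submodules instead.
_LAZY_ATTRS = {
    "dumps": ("json", "dumps"),
    "OrderedDict": ("collections", "OrderedDict"),
}


def make_lazy_module(name, lazy_attrs):
    """Create a module whose public attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # PEP 562: only called when normal lookup in the module dict fails.
        try:
            module_path, target = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_path), target)
        setattr(module, attr, value)  # cache: later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    module.__all__ = list(lazy_attrs)
    return module


lazy_demo = make_lazy_module("lazy_demo", _LAZY_ATTRS)
sys.modules["lazy_demo"] = lazy_demo

print("dumps" in vars(lazy_demo))  # prints False: nothing resolved yet
from lazy_demo import dumps  # first access runs __getattr__
print(dumps({"lazy": True}))  # prints {"lazy": true}
```

The cost of each heavy import is paid once, on first attribute access, and cached in the module dictionary after that.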

Public Symbols

- Diffulex
- SamplingParams
- get_logger
- setup_logger
- LoggerMixin

Diffulex

Diffulex is a thin public alias for diffulex.engine.engine.DiffulexEngine. Constructing it validates the engine configuration, loads tokenizer metadata, initializes the model runner, and creates the strategy-specific scheduler.

```python
from diffulex import Diffulex

llm = Diffulex(
    model="/path/to/model",  # must be a local model directory
    model_name="llada",
    decoding_strategy="d2f",
    tensor_parallel_size=1,
    data_parallel_size=1,
)
```

The model argument must point to a local model directory. Most keyword arguments are passed through to diffulex.config.Config; unsupported keyword arguments are ignored by the engine constructor.
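Unsupported keyword arguments are ignored rather than rejected, so a misspelled option can simply have no effect. The sketch below shows one common way pass-through filtering of this kind works. It uses a hypothetical `ToyConfig` dataclass in place of `diffulex.config.Config` and returns the names it dropped. The real constructor may filter differently and may not report ignored names at all.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    # Hypothetical stand-in for diffulex.config.Config. Field names mirror the
    # constructor example above; the defaults are illustrative only.
    model: str
    model_name: str = "llada"
    decoding_strategy: str = "d2f"
    tensor_parallel_size: int = 1
    data_parallel_size: int = 1


def build_config(model, **kwargs):
    """Keep kwargs that match ToyConfig fields; return the rest separately."""
    known = {f.name for f in fields(ToyConfig)}
    accepted = {k: v for k, v in kwargs.items() if k in known}
    ignored = sorted(k for k in kwargs if k not in known)
    return ToyConfig(model=model, **accepted), ignored


cfg, ignored = build_config(
    "/path/to/model",
    decoding_strategy="d2f",
    tensor_paralel_size=2,  # typo: dropped, so the default of 1 is kept
)
print(cfg.tensor_parallel_size, ignored)  # prints 1 ['tensor_paralel_size']
```

When a setting seems to have no effect, check it against the fields that `diffulex.config.Config` actually defines.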

Generation

Use generate for offline batched inference:

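```python
from diffulex import SamplingParams

# Reuses `llm` from the constructor example.
# temperature=0.0 gives a deterministic run; max_tokens caps the output length.
outputs = llm.generate(
    ["Solve 2 + 2."],
    SamplingParams(temperature=0.0, max_tokens=32),
)

for item in outputs.trajectories:
    print(item.text)
```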

generate accepts a list of strings or a list of token ID lists. When prompts are strings, the engine tokenizes them with the model tokenizer. The return value records generated text, token IDs, request trajectories, and timing data.
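The two accepted prompt forms can be normalized to token IDs before scheduling. The sketch below is a minimal version. It assumes a tokenizer with an `encode(text) -> list[int]` method. It also rejects lists that mix strings and token lists, which is an assumption since the source describes only the two uniform forms. `normalize_prompts` and `ToyTokenizer` are hypothetical names, not part of the diffulex API.

```python
from typing import List, Sequence, Union

Prompt = Union[str, Sequence[int]]


class ToyTokenizer:
    """Hypothetical tokenizer: assigns each new whitespace-separated word an ID."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text: str) -> List[int]:
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]


def normalize_prompts(prompts: Sequence[Prompt], tokenizer) -> List[List[int]]:
    """Return one token-ID list per prompt, tokenizing strings as needed."""
    if not prompts:
        return []
    if all(isinstance(p, str) for p in prompts):
        return [tokenizer.encode(p) for p in prompts]
    if all(
        not isinstance(p, str) and all(isinstance(t, int) for t in p)
        for p in prompts
    ):
        return [list(p) for p in prompts]
    raise TypeError("prompts must be all strings or all lists of token IDs")


tok = ToyTokenizer()
print(normalize_prompts(["Solve 2 + 2.", "Solve 3"], tok))  # prints [[0, 1, 2, 3], [0, 4]]
print(normalize_prompts([[5, 6, 7]], tok))  # prints [[5, 6, 7]]
```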

Request Lifecycle

Lower-level callers can use the step API:

- `add_request(prompt, sampling_params)` adds a request and returns its request ID.
- `step()` advances the scheduler and model runner by one engine step.
- `is_finished()` reports whether all queued requests are complete.
- `abort_request(req_id)` asks the scheduler to stop a request.
- `exit()` tears down model workers and profiling sessions.
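A typical driver over these calls adds requests, steps until `is_finished()` returns true, and calls `exit()` in a `finally` block. The sketch uses a hypothetical `ToyEngine` with the same method names so that it runs without a model. Its one-token-per-step behavior and the plain dict standing in for `SamplingParams` are placeholders, not Diffulex semantics. With the real engine, `llm` from the constructor example would take the place of `engine`.

```python
import itertools


class ToyEngine:
    """Hypothetical stand-in exposing the step-API method names.

    Each step "generates" one token per active request; real Diffulex steps
    run the scheduler and model runner instead.
    """

    def __init__(self):
        self._ids = itertools.count()
        self.pending = {}  # req_id -> tokens still to produce
        self.done = {}     # req_id -> tokens produced so far
        self.closed = False

    def add_request(self, prompt, sampling_params):
        # sampling_params is a plain dict here, standing in for SamplingParams.
        req_id = next(self._ids)
        self.pending[req_id] = sampling_params["max_tokens"]
        self.done[req_id] = 0
        return req_id

    def step(self):
        for req_id in list(self.pending):
            self.done[req_id] += 1
            self.pending[req_id] -= 1
            if self.pending[req_id] == 0:
                del self.pending[req_id]

    def is_finished(self):
        return not self.pending

    def abort_request(self, req_id):
        self.pending.pop(req_id, None)

    def exit(self):
        self.closed = True


engine = ToyEngine()
try:
    a = engine.add_request("Solve 2 + 2.", {"max_tokens": 3})
    b = engine.add_request("Write a poem.", {"max_tokens": 100})
    steps = 0
    while not engine.is_finished():
        engine.step()
        steps += 1
        if steps == 5:
            engine.abort_request(b)  # stop the long request early
finally:
    engine.exit()  # always release resources, even if a step raises

print(steps, engine.done)  # prints 5 {0: 3, 1: 5}
```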

Call exit() when embedding Diffulex in a long-running process. The engine also registers an atexit hook and signal handlers, but explicit shutdown makes resource ownership clearer in tests and services.
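One way to make that ownership explicit is a context manager that always calls `exit()`, even when generation raises. The sketch below uses `contextlib.contextmanager`. `engine_session` and `StubEngine` are hypothetical names. In real use, the argument would be a `Diffulex` instance.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield the engine and guarantee engine.exit() runs on the way out."""
    try:
        yield engine
    finally:
        engine.exit()


class StubEngine:
    """Hypothetical stand-in that only records whether exit() was called."""

    def __init__(self):
        self.exited = False

    def exit(self):
        self.exited = True


stub = StubEngine()
try:
    with engine_session(stub):
        raise RuntimeError("simulated failure during generate()")
except RuntimeError:
    pass
print(stub.exited)  # prints True
```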

SamplingParams

SamplingParams controls generation behavior for each request:

| Parameter | How to set it | What it does |
| --- | --- | --- |
| `temperature` | Use `0.0` for deterministic runs, or a higher float for sampling. | Controls generation randomness. |
| `max_tokens` | Use a positive output-token limit. | Caps the number of generated tokens. |
| `max_nfe` | Use a positive integer, or leave it unset. | Caps the number of forward evaluations when the strategy supports that limit. |
| `max_repetition_run` | Use a positive integer, or leave it unset. | Stops generation after a long repeated-token run. |
| `ignore_eos` | Leave `False` for normal generation; set `True` only when a task requires it. | Lets generation continue after EOS. |

When set, max_nfe and max_repetition_run must be positive.
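The positivity rules can be enforced when the object is constructed. The dataclass below is a hypothetical re-implementation for illustration only. Its field names come from the table. The defaults, the non-negative check on `temperature`, and the choice of `ValueError` are assumptions, not the behavior of `diffulex.sampling_params`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySamplingParams:
    # Field names follow the table above; defaults are assumptions.
    temperature: float = 1.0
    max_tokens: int = 64
    max_nfe: Optional[int] = None
    max_repetition_run: Optional[int] = None
    ignore_eos: bool = False

    def __post_init__(self):
        if self.temperature < 0:
            raise ValueError("temperature must be >= 0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        # Optional limits: None means "unset"; any set value must be positive.
        for name in ("max_nfe", "max_repetition_run"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive when set, got {value}")


ToySamplingParams(temperature=0.0, max_tokens=32)  # valid: limits left unset
try:
    ToySamplingParams(max_nfe=0)
except ValueError as err:
    print(err)  # prints max_nfe must be positive when set, got 0
```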

Logging Helpers

get_logger, setup_logger, and LoggerMixin are re-exported for callers that want logging behavior consistent with Diffulex internals.
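This page does not document the signatures of these helpers. As an illustration of the usual pattern only, the sketch below uses the standard `logging` module to build a logger mixin. The mixin gives each class a logger named after its module and class. `ToyLoggerMixin` and `Worker` are hypothetical and should not be read as the Diffulex implementation.

```python
import logging


class ToyLoggerMixin:
    """Hypothetical mixin: exposes self.logger named '<module>.<ClassName>'."""

    @property
    def logger(self) -> logging.Logger:
        cls = type(self)
        # getLogger returns the same Logger object for the same name.
        return logging.getLogger(f"{cls.__module__}.{cls.__qualname__}")


class Worker(ToyLoggerMixin):
    def run(self):
        self.logger.info("worker started")
        return self.logger.name


logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")
print(Worker().run())
```

Because each logger is named after its module and class, logging configuration can target one component without affecting the rest.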

Root Modules

| Module | Source |
| --- | --- |
| `diffulex.config` | `diffulex/config.py` |
| `diffulex.diffulex` | `diffulex/diffulex.py` |
| `diffulex.logger` | `diffulex/logger.py` |
| `diffulex.profiling` | `diffulex/profiling.py` |
| `diffulex.sampling_params` | `diffulex/sampling_params.py` |
| `diffulex.vllm_compat` | `diffulex/vllm_compat.py` |

Subpackages

The package map below is intentionally limited to two levels: each page covers one direct diffulex.* package and lists only its direct children.
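The same direct-children view can be produced for any installed package with `pkgutil.iter_modules`. Called on a package's `__path__`, it yields only the immediate children and does not recurse. The sketch below runs against the standard-library `email` package so that it works without Diffulex. `direct_children` is a hypothetical helper name. Passing `"diffulex"` instead assumes the package is installed.

```python
import importlib
import pkgutil


def direct_children(package_name):
    """Return sorted (name, is_package) pairs for a package's direct children."""
    package = importlib.import_module(package_name)
    # iter_modules scans only the given path entries; it does not recurse
    # into subpackages (pkgutil.walk_packages would).
    return sorted(
        (info.name, info.ispkg) for info in pkgutil.iter_modules(package.__path__)
    )


for name, is_pkg in direct_children("email"):
    print(f"email.{name}" + (" (package)" if is_pkg else ""))
```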