diffulex¶
The diffulex package is the public Python entry point for in-process
inference. Importing the package is intentionally lightweight: the root module
uses lazy imports so that from diffulex import Diffulex, SamplingParams does
not eagerly import the full engine, CUDA kernels, or model stack until the
engine is constructed.
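A root module can defer heavy imports in several ways; one common mechanism is a module-level `__getattr__` (PEP 562). The sketch below is not taken from the Diffulex source. `make_lazy_package`, the `lazy_demo` name, and the use of standard-library modules as stand-ins for heavy submodules are all assumptions made for illustration.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module path, attribute name).
# Standard-library modules stand in for heavy engine submodules.
LAZY_SYMBOLS = {
    "Fraction": ("fractions", "Fraction"),
    "JSONDecoder": ("json.decoder", "JSONDecoder"),
}


def make_lazy_package(name, lazy_symbols):
    """Build a module object that imports each symbol on first access."""
    package = types.ModuleType(name)

    def lazy_getattr(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr not in lazy_symbols:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, symbol = lazy_symbols[attr]
        value = getattr(importlib.import_module(module_path), symbol)
        setattr(package, attr, value)  # cache so later lookups skip this hook
        return value

    # PEP 562: a module-level __getattr__ handles missing attributes.
    package.__getattr__ = lazy_getattr
    return package


demo = make_lazy_package("lazy_demo", LAZY_SYMBOLS)
half = demo.Fraction(1, 2)  # the symbol is resolved on this first access
```

After the first access the symbol is cached on the module object, so the hook runs at most once per name.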
Public Symbols¶
- Diffulex
- SamplingParams
- get_logger
- setup_logger
- LoggerMixin
Diffulex¶
Diffulex is a thin public alias for diffulex.engine.engine.DiffulexEngine.
Constructing it validates the engine configuration, loads tokenizer metadata,
initializes the model runner, and creates the strategy-specific scheduler.
    from diffulex import Diffulex

    llm = Diffulex(
        model="/path/to/model",
        model_name="llada",
        decoding_strategy="d2f",
        tensor_parallel_size=1,
        data_parallel_size=1,
    )
The model argument must point to a local model directory. Most keyword
arguments are passed through to diffulex.config.Config; unsupported keyword
arguments are ignored by the engine constructor.
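One practical consequence of ignoring unsupported keyword arguments is that a misspelled option fails silently and the default value is used instead. The sketch below illustrates that filtering pattern. `DemoConfig` and its fields are hypothetical stand-ins for `diffulex.config.Config`, and `build_config` and `unknown_kwargs` are illustrative helpers, not part of the package.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical stand-in for diffulex.config.Config; fields are assumptions."""
    model: str
    tensor_parallel_size: int = 1
    data_parallel_size: int = 1


def build_config(model, **kwargs):
    """Keep only keyword arguments that name a DemoConfig field."""
    known = {f.name for f in fields(DemoConfig)}
    accepted = {key: value for key, value in kwargs.items() if key in known}
    return DemoConfig(model=model, **accepted)


def unknown_kwargs(**kwargs):
    """Report keyword arguments that build_config would drop."""
    known = {f.name for f in fields(DemoConfig)}
    return sorted(set(kwargs) - known)


# The misspelled key is dropped silently, so the default of 1 is used.
config = build_config("/path/to/model", tensor_paralel_size=2)
dropped = unknown_kwargs(tensor_paralel_size=2)
```

A pre-flight check like `unknown_kwargs` is one way to catch typos before constructing the engine.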
Generation¶
Use generate for offline batched inference:
    from diffulex import SamplingParams

    outputs = llm.generate(
        ["Solve 2 + 2."],
        SamplingParams(temperature=0.0, max_tokens=32),
    )

    for item in outputs.trajectories:
        print(item.text)
generate accepts a list of strings or a list of token ID lists. When prompts
are strings, the engine tokenizes them with the model tokenizer. The return
value records generated text, token IDs, request trajectories, and timing data.
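The two accepted prompt forms can be pictured as a normalization step that tokenizes strings and passes token ID lists through unchanged. The sketch below is an assumption about the general shape of that step, not the engine's code. `normalize_prompts` and `toy_encode` are hypothetical, and the toy tokenizer exists only to make the example runnable.

```python
def normalize_prompts(prompts, encode):
    """Return token ID lists, tokenizing only the string prompts."""
    token_lists = []
    for prompt in prompts:
        if isinstance(prompt, str):
            token_lists.append(list(encode(prompt)))
        else:
            # Already token IDs: copy them through unchanged.
            token_lists.append([int(token_id) for token_id in prompt])
    return token_lists


def toy_encode(text):
    """Toy tokenizer for illustration only: one ID per character."""
    return [ord(char) for char in text]


text_batch = normalize_prompts(["Hi"], toy_encode)
id_batch = normalize_prompts([[101, 102]], toy_encode)
```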
Request Lifecycle¶
Lower-level callers can use the step API:
- add_request(prompt, sampling_params) adds a request and returns its request ID.
- step() advances the scheduler and model runner by one engine step.
- is_finished() reports whether all queued requests are complete.
- abort_request(req_id) asks the scheduler to stop a request.
- exit() tears down model workers and profiling sessions.
Call exit() when embedding Diffulex in a long-running process. The engine also
registers an atexit hook and signal handlers, but explicit shutdown makes
resource ownership clearer in tests and services.
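A driving loop over the step API might look like the sketch below. It uses only the method names listed above. The return value of `step()` is not described here, so the loop does not rely on it. `run_to_completion` is an illustrative helper, and `FakeEngine` is a hypothetical stand-in that lets the loop run without a model.

```python
def run_to_completion(engine, prompts, sampling_params):
    """Queue every prompt, step until all requests finish, then shut down."""
    try:
        request_ids = [engine.add_request(p, sampling_params) for p in prompts]
        while not engine.is_finished():
            engine.step()
        return request_ids
    finally:
        engine.exit()  # explicit shutdown, even if a step raises


class FakeEngine:
    """Hypothetical stand-in so the loop can run without a model."""

    def __init__(self, steps_per_request=2):
        self.steps_per_request = steps_per_request
        self.remaining = {}  # request ID -> steps left
        self.exited = False

    def add_request(self, prompt, sampling_params):
        req_id = len(self.remaining)
        self.remaining[req_id] = self.steps_per_request
        return req_id

    def step(self):
        for req_id in self.remaining:
            self.remaining[req_id] = max(0, self.remaining[req_id] - 1)

    def is_finished(self):
        return all(left == 0 for left in self.remaining.values())

    def exit(self):
        self.exited = True


engine = FakeEngine()
ids = run_to_completion(engine, ["Solve 2 + 2.", "Solve 3 + 3."], None)
```

Placing `exit()` in a `finally` block keeps shutdown explicit in tests and services, independent of any atexit hook.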
SamplingParams¶
SamplingParams controls generation behavior for each request:
| Parameter | How to set it | What it does |
|---|---|---|
| | Use | Controls generation randomness. |
| | Use a positive output-token limit. | Caps the number of generated tokens. |
| | Use a positive integer, or leave it unset. | Caps the number of forward evaluations when the strategy supports that limit. |
| | Use a positive integer, or leave it unset. | Stops generation after a long repeated-token run. |
| | Leave | Lets generation continue after EOS. |
When set, max_nfe and max_repetition_run must be positive.
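The "positive when set" rule can be expressed as a small validation hook. The sketch below uses a hypothetical `DemoSamplingParams` dataclass; its defaults and error messages are assumptions and may differ from the real SamplingParams.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoSamplingParams:
    """Hypothetical stand-in; field defaults are assumptions, not Diffulex's."""
    temperature: float = 1.0
    max_tokens: int = 64
    max_nfe: Optional[int] = None
    max_repetition_run: Optional[int] = None

    def __post_init__(self):
        # Optional limits may stay unset, but a set value must be positive.
        for name in ("max_nfe", "max_repetition_run"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive when set, got {value}")


def is_valid(**kwargs):
    """Return True when the arguments pass validation."""
    try:
        DemoSamplingParams(**kwargs)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid(max_nfe=8)` passes while `is_valid(max_nfe=0)` fails, and leaving both limits unset is accepted.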
Logging Helpers¶
get_logger, setup_logger, and LoggerMixin are re-exported for callers that
want logging behavior consistent with Diffulex internals.
Root Modules¶
| Module | Source |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
Subpackages¶
The package map below is intentionally limited to two levels: each page covers
one direct diffulex.* package and lists only its direct children.