diffulex

The diffulex package is the public Python entry point for in-process inference. Importing the package is intentionally lightweight. The root module uses lazy imports, so the statement from diffulex import Diffulex, SamplingParams does not load the full engine, CUDA kernels, or model stack; that work is deferred until the engine is constructed.
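For readers unfamiliar with lazy imports, the sketch below shows one common way to defer an import until a name is first accessed: a module-level __getattr__ hook (PEP 562). It is an illustration only, not the diffulex implementation. The helper make_lazy_package, the package name demo_pkg, and the use of fractions.Fraction as a stand-in for a heavy dependency are all assumptions, and diffulex may defer its work differently.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps a public name to a (module_name, symbol) pair.
    Hypothetical helper for illustration; not part of diffulex.
    """
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        try:
            module_name, symbol = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), symbol)
        # Cache on the module so the hook is not called again for this name.
        setattr(pkg, attr, value)
        return value

    # Modules consult __getattr__ in their own namespace when normal
    # attribute lookup fails (PEP 562, Python 3.7+).
    pkg.__getattr__ = __getattr__
    return pkg


# fractions.Fraction stands in for an expensive dependency.
demo = make_lazy_package("demo_pkg", {"Fraction": ("fractions", "Fraction")})
loaded_before = "Fraction" in vars(demo)   # nothing resolved yet
value = demo.Fraction(1, 2)                # first access triggers the import
loaded_after = "Fraction" in vars(demo)    # now cached on the module
```

In a real package the same hook would normally be written as a plain function named __getattr__ in the package's __init__.py.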

Public Symbols

  • Diffulex

  • SamplingParams

  • get_logger

  • setup_logger

  • LoggerMixin

Diffulex

Diffulex is a thin public alias for diffulex.engine.engine.DiffulexEngine. Constructing it validates the engine configuration, loads tokenizer metadata, initializes the model runner, and creates the strategy-specific scheduler.

from diffulex import Diffulex

llm = Diffulex(
    model="/path/to/model",
    model_name="llada",
    decoding_strategy="d2f",
    tensor_parallel_size=1,
    data_parallel_size=1,
)

The model argument must point to a local model directory. Most keyword arguments are passed through to diffulex.config.Config; unsupported keyword arguments are ignored by the engine constructor.
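Because unsupported keyword arguments are ignored rather than rejected, a misspelled option can go unnoticed. The sketch below shows one way a constructor could forward only known keywords to a configuration object and report the rest. DemoConfig and build_config are hypothetical stand-ins, not the actual diffulex.config.Config or engine constructor, and the fields shown are only those that appear in the example above.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical stand-in for a configuration class."""
    model: str
    tensor_parallel_size: int = 1
    data_parallel_size: int = 1


def build_config(model, **kwargs):
    """Forward recognized keywords to DemoConfig and collect the others."""
    known = {f.name for f in fields(DemoConfig)}
    accepted = {k: v for k, v in kwargs.items() if k in known}
    ignored = sorted(set(kwargs) - known)
    return DemoConfig(model=model, **accepted), ignored


# "tensor_paralel_size" is deliberately misspelled to show it being dropped.
cfg, ignored = build_config(
    "/path/to/model",
    data_parallel_size=2,
    tensor_paralel_size=4,
)
```

A caller who wants stricter behavior could apply a check like this before constructing the engine and raise if the ignored list is not empty.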

Generation

Use generate for offline batched inference:

from diffulex import SamplingParams

outputs = llm.generate(
    ["Solve 2 + 2."],
    SamplingParams(temperature=0.0, max_tokens=32),
)

for item in outputs.trajectories:
    print(item.text)

generate accepts a list of strings or a list of token ID lists. When prompts are strings, the engine tokenizes them with the model tokenizer. The return value records generated text, token IDs, request trajectories, and timing data.
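The two accepted prompt forms can be pictured as a normalization step that leaves token ID lists untouched and encodes strings. The sketch below is a simplified illustration under that assumption. The helper to_token_ids and the character-code toy_encode function are hypothetical and do not reflect the model tokenizer or the engine's internal code.

```python
def to_token_ids(prompts, encode):
    """Return one token ID list per prompt.

    Strings are passed through encode; sequences of integers are copied as-is.
    """
    batch = []
    for prompt in prompts:
        if isinstance(prompt, str):
            batch.append(list(encode(prompt)))
        else:
            batch.append([int(token) for token in prompt])
    return batch


def toy_encode(text):
    """Toy encoder for demonstration: one ID per character."""
    return [ord(ch) for ch in text]


from_text = to_token_ids(["hi"], toy_encode)
from_ids = to_token_ids([[101, 7, 42]], toy_encode)
```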

Request Lifecycle

Lower-level callers can use the step API:

  • add_request(prompt, sampling_params) adds a request and returns its request ID.

  • step() advances the scheduler and model runner by one engine step.

  • is_finished() reports whether all queued requests are complete.

  • abort_request(req_id) asks the scheduler to stop a request.

  • exit() tears down model workers and profiling sessions.

Call exit() when embedding Diffulex in a long-running process. The engine also registers an atexit hook and signal handlers, but explicit shutdown makes resource ownership clearer in tests and services.
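The methods listed above can be combined into a simple driver loop. The sketch below assumes only the method names given in the list; the return value of step() is not used because it is not described here. StubEngine is a hypothetical stand-in so the loop can run without a model, and the max_steps guard is an added safety measure, not an engine feature.

```python
class StubEngine:
    """Hypothetical stand-in exposing the method names listed above."""

    def __init__(self, steps_per_request=3):
        self._remaining = {}
        self._next_id = 0
        self._steps_per_request = steps_per_request
        self.closed = False

    def add_request(self, prompt, sampling_params):
        req_id = self._next_id
        self._next_id += 1
        self._remaining[req_id] = self._steps_per_request
        return req_id

    def step(self):
        # Advance every pending request and drop the ones that are done.
        for req_id in list(self._remaining):
            self._remaining[req_id] -= 1
            if self._remaining[req_id] <= 0:
                del self._remaining[req_id]

    def is_finished(self):
        return not self._remaining

    def abort_request(self, req_id):
        self._remaining.pop(req_id, None)

    def exit(self):
        self.closed = True


def drain(engine, prompts, sampling_params, max_steps=10_000):
    """Queue prompts, then step until the engine reports completion."""
    req_ids = [engine.add_request(p, sampling_params) for p in prompts]
    steps = 0
    while not engine.is_finished():
        if steps >= max_steps:
            # Give up on whatever is still running.
            for req_id in req_ids:
                engine.abort_request(req_id)
            break
        engine.step()
        steps += 1
    return req_ids, steps


engine = StubEngine()
try:
    req_ids, steps = drain(engine, ["first prompt", "second prompt"], None)
finally:
    engine.exit()  # explicit shutdown, even if the loop raises
```

Wrapping the loop in try/finally keeps shutdown explicit, which matches the recommendation above for tests and services.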

SamplingParams

SamplingParams controls generation behavior for each request:

| Parameter | How to set it | What it does |
| --- | --- | --- |
| temperature | Use 0.0 for deterministic runs, or a higher float for sampling. | Controls generation randomness. |
| max_tokens | Use a positive output-token limit. | Caps the number of generated tokens. |
| max_nfe | Use a positive integer, or leave it unset. | Caps the number of forward evaluations when the strategy supports that limit. |
| max_repetition_run | Use a positive integer, or leave it unset. | Stops generation after a long repeated-token run. |
| ignore_eos | Leave False for normal generation; set True only when a task requires it. | Lets generation continue after EOS. |

When set, max_nfe and max_repetition_run must be positive.
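The constraints above can be expressed as a small validation routine. The sketch below is a hypothetical stand-in, not the actual SamplingParams class: the field names follow the table, but the default values and error messages are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoSamplingParams:
    """Hypothetical stand-in; defaults are assumed, not taken from diffulex."""
    temperature: float = 1.0
    max_tokens: int = 64
    max_nfe: Optional[int] = None
    max_repetition_run: Optional[int] = None
    ignore_eos: bool = False

    def __post_init__(self):
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        # Optional limits are only checked when the caller sets them.
        for name in ("max_nfe", "max_repetition_run"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive when set, got {value}")


def is_rejected(**kwargs):
    """Return True if the given arguments fail validation."""
    try:
        DemoSamplingParams(**kwargs)
    except ValueError:
        return True
    return False


ok = DemoSamplingParams(temperature=0.0, max_tokens=32, max_nfe=128)
```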

Logging Helpers

get_logger, setup_logger, and LoggerMixin are re-exported for callers that want logging behavior consistent with Diffulex internals.

Root Modules

| Module | Source |
| --- | --- |
| diffulex.config | diffulex/config.py |
| diffulex.diffulex | diffulex/diffulex.py |
| diffulex.logger | diffulex/logger.py |
| diffulex.profiling | diffulex/profiling.py |
| diffulex.sampling_params | diffulex/sampling_params.py |
| diffulex.vllm_compat | diffulex/vllm_compat.py |

Subpackages

The package map below is intentionally limited to two levels: each page covers one direct diffulex.* package and lists only its direct children.