# diffulex

The `diffulex` package is the public Python entry point for in-process inference. Importing the package is intentionally lightweight: the root module uses lazy imports so that `from diffulex import Diffulex, SamplingParams` does not eagerly import the full engine, CUDA kernels, or model stack until the engine is constructed.

## Public Symbols

- `Diffulex`
- `SamplingParams`
- `get_logger`
- `setup_logger`
- `LoggerMixin`

## Diffulex

`Diffulex` is a thin public alias for `diffulex.engine.engine.DiffulexEngine`. Constructing it validates the engine configuration, loads tokenizer metadata, initializes the model runner, and creates the strategy-specific scheduler.

```python
from diffulex import Diffulex

llm = Diffulex(
    model="/path/to/model",
    model_name="llada",
    decoding_strategy="d2f",
    tensor_parallel_size=1,
    data_parallel_size=1,
)
```

The `model` argument must point to a local model directory. Most keyword arguments are passed through to `diffulex.config.Config`; unsupported keyword arguments are ignored by the engine constructor.

## Generation

Use `generate` for offline batched inference:

```python
from diffulex import SamplingParams

outputs = llm.generate(
    ["Solve 2 + 2."],
    SamplingParams(temperature=0.0, max_tokens=32),
)

for item in outputs.trajectories:
    print(item.text)
```

`generate` accepts a list of strings or a list of token ID lists. When prompts are strings, the engine tokenizes them with the model tokenizer. The return value records generated text, token IDs, request trajectories, and timing data.

## Request Lifecycle

Lower-level callers can use the step API:

- `add_request(prompt, sampling_params)` adds a request and returns its request ID.
- `step()` advances the scheduler and model runner by one engine step.
- `is_finished()` reports whether all queued requests are complete.
- `abort_request(req_id)` asks the scheduler to stop a request.
- `exit()` tears down model workers and profiling sessions.

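
The step API listed above can be driven with a small loop. The sketch below is illustrative only: `run_to_completion` and `FakeEngine` are hypothetical names that are not part of `diffulex`, and `FakeEngine` is a stand-in that only mimics the method names so the loop can run without a model. Because the return value of `step()` is not described here, the loop ignores it. The integer request IDs are likewise an assumption of the stand-in.

```python
from typing import Any, Iterable, List


def run_to_completion(engine: Any, prompts: Iterable[Any], sampling_params: Any) -> List[Any]:
    """Queue every prompt, step until the engine reports completion, then shut down.

    `engine` is any object exposing add_request / step / is_finished / exit.
    """
    try:
        req_ids = [engine.add_request(prompt, sampling_params) for prompt in prompts]
        while not engine.is_finished():
            engine.step()  # return value is undocumented here, so it is ignored
    finally:
        engine.exit()  # explicit shutdown, even if a step raises
    return req_ids


class FakeEngine:
    """Hypothetical stand-in, NOT the real engine.

    Every request finishes after a fixed number of steps.
    """

    def __init__(self, steps_per_request: int = 2) -> None:
        self._steps = steps_per_request
        self._remaining = {}  # request ID -> steps left
        self._next_id = 0
        self.step_count = 0
        self.closed = False

    def add_request(self, prompt: Any, sampling_params: Any) -> int:
        req_id = self._next_id  # assumption: integer IDs, for illustration only
        self._next_id += 1
        self._remaining[req_id] = self._steps
        return req_id

    def step(self) -> None:
        self.step_count += 1
        for req_id in list(self._remaining):
            self._remaining[req_id] -= 1
            if self._remaining[req_id] <= 0:
                del self._remaining[req_id]

    def is_finished(self) -> bool:
        return not self._remaining

    def abort_request(self, req_id: int) -> None:
        self._remaining.pop(req_id, None)

    def exit(self) -> None:
        self.closed = True


if __name__ == "__main__":
    engine = FakeEngine(steps_per_request=3)
    ids = run_to_completion(engine, ["Solve 2 + 2.", "Name a prime."], None)
    print(ids, engine.step_count, engine.closed)
```

Placing `exit()` in a `finally` block means teardown still happens when a step raises. A service that reuses one engine across many batches would instead keep the loop and move the `exit()` call to process shutdown.
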
Call `exit()` when embedding Diffulex in a long-running process. The engine also registers an `atexit` hook and signal handlers, but explicit shutdown makes resource ownership clearer in tests and services.

## SamplingParams

`SamplingParams` controls generation behavior for each request:

| Parameter | How to set it | What it does |
| --- | --- | --- |
| `temperature` | Use `0.0` for deterministic runs, or a higher float for sampling. | Controls generation randomness. |
| `max_tokens` | Use a positive output-token limit. | Caps the number of generated tokens. |
| `max_nfe` | Use a positive integer, or leave it unset. | Caps the number of forward evaluations when the strategy supports that limit. |
| `max_repetition_run` | Use a positive integer, or leave it unset. | Stops generation after a long repeated-token run. |
| `ignore_eos` | Leave `False` for normal generation; set `True` only when a task requires it. | Lets generation continue after EOS. |

When set, `max_nfe` and `max_repetition_run` must be positive.

## Logging Helpers

`get_logger`, `setup_logger`, and `LoggerMixin` are re-exported for callers that want logging behavior consistent with Diffulex internals.

## Root Modules

| Module | Source |
| --- | --- |
| `diffulex.config` | `diffulex/config.py` |
| `diffulex.diffulex` | `diffulex/diffulex.py` |
| `diffulex.logger` | `diffulex/logger.py` |
| `diffulex.profiling` | `diffulex/profiling.py` |
| `diffulex.sampling_params` | `diffulex/sampling_params.py` |
| `diffulex.vllm_compat` | `diffulex/vllm_compat.py` |

## Subpackages

The package map below is intentionally limited to two levels: each page covers one direct `diffulex.*` package and lists only its direct children.

:::{toctree}
:maxdepth: 1

diffulex.attention
diffulex.distributed
diffulex.engine
diffulex.layer
diffulex.mixin
diffulex.model
diffulex.moe
diffulex.sampler
diffulex.server
diffulex.strategy
diffulex.strategy.templates
diffulex.utils
:::