diffulex.strategy
diffulex.strategy contains the built-in decoding strategies. Importing the
package imports each strategy subpackage so its registry decorators run. The
engine then resolves the selected strategy name through the request, scheduler,
KV cache manager, model runner, sampler, and attention metadata registries.
| Strategy | Template family | Registered components |
|---|---|---|
| | Full-prefix block diffusion | |
| | Token-merging multi-block diffusion | |
| | Multi-Block Diffusion | |
| | DiffusionGemma canvas/block decoding | |
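The import-then-resolve flow described above can be sketched with a small decorator-based registry. This is an illustration only, not the Diffulex implementation: `Registry`, `register`, `resolve`, `resolve_components`, and the `toy` classes are hypothetical names. The only parts taken from the text are that decorators fill the registries at import time and that the engine looks up one strategy name in several registries.

```python
from typing import Callable, Dict, Type


class Registry:
    """Hypothetical name -> class registry, filled by decorators at import time."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: Dict[str, Type] = {}

    def register(self, name: str) -> Callable[[Type], Type]:
        def decorator(cls: Type) -> Type:
            if name in self._classes:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._classes[name] = cls
            return cls

        return decorator

    def resolve(self, name: str) -> Type:
        try:
            return self._classes[name]
        except KeyError:
            raise KeyError(f"no {self.kind} registered for strategy {name!r}") from None


# Two of the several registries an engine would consult (illustrative only).
REQUESTS = Registry("request")
SCHEDULERS = Registry("scheduler")


# In a real package these classes would live in a strategy subpackage,
# so importing that subpackage is what makes the decorators run.
@REQUESTS.register("toy")
class ToyRequest:
    pass


@SCHEDULERS.register("toy")
class ToyScheduler:
    pass


def resolve_components(strategy: str) -> Dict[str, Type]:
    """Look up the same strategy name in every registry."""
    return {reg.kind: reg.resolve(strategy) for reg in (REQUESTS, SCHEDULERS)}


components = resolve_components("toy")
print(sorted(components))
```

In this sketch, a name that no subpackage has registered fails at lookup time with a `KeyError`, which is one reason the import side effect matters.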
The package-level helpers keep the currently selected strategy name:
| Symbol | How to use it | What it does |
|---|---|---|
| | Call from code that needs to inspect the active strategy. | Returns the current strategy name or |
| | Pass a strategy name before strategy-dependent setup. | Stores the active decoding strategy globally. |
| | Call during teardown or tests. | Clears the global strategy name. |
These helpers do not register components by themselves. Registration happens
inside the strategy subpackages through AutoReq, AutoScheduler,
AutoKVCacheManager, and AutoModelRunner decorators.
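The helper behavior described above amounts to a module-level variable with a setter, a getter, and a reset. A minimal sketch follows. The function names `set_strategy_name`, `get_strategy_name`, and `clear_strategy_name` are hypothetical, and returning `None` when no strategy is set is an assumption.

```python
from typing import Optional

# Module-level state: a hypothetical stand-in for the package's global strategy name.
_strategy_name: Optional[str] = None


def set_strategy_name(name: str) -> None:
    """Store the active decoding strategy globally."""
    global _strategy_name
    _strategy_name = name


def get_strategy_name() -> Optional[str]:
    """Return the active strategy name (None when unset is an assumption)."""
    return _strategy_name


def clear_strategy_name() -> None:
    """Clear the global name, for example in test teardown."""
    global _strategy_name
    _strategy_name = None


set_strategy_name("d2f")            # before any strategy-dependent setup
active = get_strategy_name()
clear_strategy_name()               # teardown so later tests start clean
after_clear = get_strategy_name()
```

Nothing in this sketch touches a registry: setting the name only records which strategy to look up later.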
diffulex.strategy.d2f
d2f is the default block-diffusion strategy. It uses the multi-block template
stack and keeps prefix attention full by setting is_prefix_full=True in the
model runner.
| Symbol | How to use it | What it does |
|---|---|---|
| | Created through | Inherits |
| | Created through | Adds requests with |
| | Created through | Uses multi-block page allocation and append behavior. |
| | Created through | Prepares chunked multi-block prefill/decode batches, runs the model, samples outputs, and captures multi-block CUDA graphs. |
| | Fetched by the attention backend. | Extends |
| | Called through the global attention metadata fetch hook. | Returns the current D2F metadata instance. |
| | Called by the model runner before attention. | Replaces the current D2F metadata with request/page/sequence tensors for the next forward pass. |
| | Called after a forward pass or capture. | Restores an empty D2F metadata object. |
Use d2f as the reference when adding a strategy that follows the standard
multi-block request lifecycle without token-merging metadata.
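The fetch, set, and reset rows of the table describe a lifecycle that the other strategies repeat with their own metadata classes. A minimal sketch of that lifecycle follows. `ToyAttnMetadata`, `fetch_metadata`, `set_metadata`, `reset_metadata`, and `forward_step` are hypothetical names, and plain lists stand in for the request/page/sequence tensors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyAttnMetadata:
    """Hypothetical per-forward-pass metadata; the real fields are tensors."""

    page_tables: List[List[int]] = field(default_factory=list)
    seq_lens: List[int] = field(default_factory=list)


# One module-level instance that the attention backend reads.
_current = ToyAttnMetadata()


def fetch_metadata() -> ToyAttnMetadata:
    """Return the current metadata instance (the fetch hook)."""
    return _current


def set_metadata(page_tables: List[List[int]], seq_lens: List[int]) -> None:
    """Replace the current metadata before attention runs."""
    global _current
    _current = ToyAttnMetadata(page_tables=page_tables, seq_lens=seq_lens)


def reset_metadata() -> None:
    """Restore an empty metadata object after a forward pass or capture."""
    global _current
    _current = ToyAttnMetadata()


def forward_step(page_tables: List[List[int]], seq_lens: List[int]) -> int:
    """Set metadata, run a stand-in for attention, then always reset."""
    set_metadata(page_tables, seq_lens)
    try:
        return sum(fetch_metadata().seq_lens)  # placeholder for the model call
    finally:
        reset_metadata()


total = forward_step([[0, 1], [2]], [48, 16])
```

The `try`/`finally` is a design choice of the sketch: it guarantees that stale metadata from one pass cannot leak into the next, even if the model call raises.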
diffulex.strategy.dmax
dmax is the built-in token-merging strategy. It starts from the token-merging
multi-block template and adds DMax-specific request activation behavior,
attention metadata, and graph capture buffers for token merge descriptors.
| Symbol | How to use it | What it does |
|---|---|---|
| | Created through | Inherits |
| | Created through | Uses token-merge request state, schedules multi-block work, and postprocesses token-merge sampler output. |
| | Created through | Uses token-merging multi-block cache behavior on top of the multi-block page allocator. |
| | Created through | Binds DMax attention metadata, prepares token-merging metadata tensors, and captures token-merging CUDA graphs. |
| | Fetched by the attention backend. | Extends |
| | Called through the global attention metadata fetch hook. | Returns the current DMax metadata instance. |
| | Called by the model runner before attention. | Replaces the current DMax metadata with request/page/sequence tensors for the next forward pass. |
| | Called after a forward pass or capture. | Restores an empty DMax metadata object. |
DMaxReq also honors DIFFULEX_DMAX_FORCE_PREFILL_ACTIVE=1. That environment
switch keeps active-block iterations on the prefill-style path, which is useful
when comparing against reference DMax behavior.
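The environment switch above can be illustrated with a small reader. Only the variable name and the value `1` come from the text. The helper names are hypothetical, and so are the assumptions that any other value leaves the switch off and that the alternative is a decode-style path.

```python
import os
from typing import Mapping

ENV_FLAG = "DIFFULEX_DMAX_FORCE_PREFILL_ACTIVE"


def force_prefill_active(environ: Mapping[str, str] = os.environ) -> bool:
    """True only for the exact value "1" (other accepted spellings are unknown)."""
    return environ.get(ENV_FLAG) == "1"


def active_block_path(environ: Mapping[str, str] = os.environ) -> str:
    """Label the path an active-block iteration would take in this sketch."""
    # "decode" as the default path is an assumption, not documented behavior.
    return "prefill" if force_prefill_active(environ) else "decode"


print(active_block_path({ENV_FLAG: "1"}))
```

Passing the environment as a mapping keeps the sketch testable without mutating `os.environ`.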
diffulex.strategy.multi_bd
multi_bd is the built-in Multi-Block Diffusion strategy. It shares most of
its template surface with d2f, but uses block-causal prefix behavior so
prefix caching can remain enabled.
| Symbol | How to use it | What it does |
|---|---|---|
| | Created through | Inherits |
| | Created through | Adds, schedules, preempts, and postprocesses multi-block requests. |
| | Created through | Uses multi-block cache allocation and append behavior. |
| | Created through | Prepares multi-block batches and sets attention flags from |
| | Fetched by the attention backend. | Extends |
| | Called through the global attention metadata fetch hook. | Returns the current MultiBD metadata instance. |
| | Called by the model runner before attention. | Replaces the current MultiBD metadata with request/page/sequence tensors for the next forward pass. |
| | Called after a forward pass or capture. | Restores an empty MultiBD metadata object. |
Use multi_bd rather than d2f as the reference when a new strategy needs
block-causal prefix behavior and prefix caching.
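Why block-causal prefix attention is compatible with prefix caching can be seen from a toy mask. The sketch assumes the usual meaning of block-causal, where a token attends to every token in its own block and in earlier blocks. `prefix_mask` and `visible_keys` are hypothetical helpers. `is_prefix_full` mirrors the flag named in the d2f section, but how Diffulex builds its masks is not shown here.

```python
from typing import List


def prefix_mask(num_tokens: int, block_size: int, is_prefix_full: bool) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j."""
    if is_prefix_full:
        # Full prefix: every prefix token sees every other prefix token.
        return [[True] * num_tokens for _ in range(num_tokens)]
    # Block-causal: a token sees its own block and all earlier blocks.
    return [
        [j // block_size <= i // block_size for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def visible_keys(mask: List[List[bool]], query: int) -> List[int]:
    """Indices of the key tokens that one query token can attend to."""
    return [j for j, allowed in enumerate(mask[query]) if allowed]


# Token 0 in a 4-token prefix and in the same prefix extended to 6 tokens.
causal_short = visible_keys(prefix_mask(4, 2, is_prefix_full=False), 0)
causal_long = visible_keys(prefix_mask(6, 2, is_prefix_full=False), 0)
full_short = visible_keys(prefix_mask(4, 2, is_prefix_full=True), 0)
full_long = visible_keys(prefix_mask(6, 2, is_prefix_full=True), 0)

print(causal_short == causal_long)  # the first block is unaffected by the extension
print(full_short == full_long)      # full attention changes what the first block sees
```

Under the block-causal mask the first block attends to the same keys however long the prefix grows, so its cached keys and values stay valid for any request sharing that prefix. Under the full mask the first block also sees the appended tokens, so a cached entry would no longer match.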
diffulex.strategy.diffusion_gemma
diffusion_gemma is the native DiffusionGemma strategy. It uses a
DiffusionGemma-specific request, sampler, model runner, and attention metadata
path with 256-token block/page defaults.
| Symbol | How to use it | What it does |
|---|---|---|
| | Created through | Tracks DiffusionGemma canvas state and commit timing. |
| | Created through | Uses the DiffusionGemma request lifecycle with standard scheduling hooks. |
| | Created through | Uses the DiffusionGemma page/block layout. |
| | Created through | Prepares DiffusionGemma prefill/decode tensors, self-conditioning context, and model forward calls. |
| | Fetched by the attention backend. | Stores DiffusionGemma attention metadata for the current forward pass. |
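The 256-token block/page default has a simple sizing consequence, sketched below. Only the value 256 comes from the text. The helper names are hypothetical, and rounding a sequence up to a whole number of blocks is an assumption about the layout rather than documented behavior.

```python
BLOCK_SIZE = 256  # default block size stated in the text
PAGE_SIZE = 256   # default page size stated in the text


def pages_needed(num_tokens: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages required to hold num_tokens (ceiling division)."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return -(-num_tokens // page_size)


def block_aligned_length(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Length after rounding up to a whole number of blocks (assumed layout)."""
    return pages_needed(num_tokens, block_size) * block_size


print(pages_needed(300), block_aligned_length(300))
```

With equal block and page sizes, one block fills exactly one page in this sketch, so block boundaries and page boundaries coincide.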