# diffulex.strategy `diffulex.strategy` contains the built-in decoding strategies. Importing the package imports each strategy subpackage so its registry decorators run. The engine then resolves the selected strategy name through the request, scheduler, KV cache manager, model runner, sampler, and attention metadata registries. | Strategy | Template family | Registered components | | --- | --- | --- | | `d2f` | Full-prefix block diffusion | `D2fReq`, `D2fScheduler`, `D2fKVCacheManager`, `D2fModelRunner`, `D2fAttnMetaData` | | `dmax` | Token-merging multi-block diffusion | `DMaxReq`, `DMaxScheduler`, `DMaxKVCacheManager`, `DMaxModelRunner`, `DMaxAttnMetaData` | | `multi_bd` | Multi-Block Diffusion | `MultiBDReq`, `MultiBDScheduler`, `MultiBDKVCacheManager`, `MultiBDModelRunner`, `MultiBDAttnMetaData` | | `diffusion_gemma` | DiffusionGemma canvas/block decoding | `DiffusionGemmaReq`, `DiffusionGemmaScheduler`, `DiffusionGemmaKVCacheManager`, `DiffusionGemmaModelRunner`, `DiffusionGemmaAttnMetaData` | The package-level helpers keep the currently selected strategy name: | Symbol | How to use it | What it does | | --- | --- | --- | | `fetch_decoding_strategy` | Call from code that needs to inspect the active strategy. | Returns the current strategy name or `None`. | | `set_decoding_strategy` | Pass a strategy name before strategy-dependent setup. | Stores the active decoding strategy globally. | | `reset_decoding_strategy` | Call during teardown or tests. | Clears the global strategy name. | These helpers do not register components by themselves. Registration happens inside the strategy subpackages through `AutoReq`, `AutoScheduler`, `AutoKVCacheManager`, and `AutoModelRunner` decorators. ## diffulex.strategy.d2f `d2f` is the default block-diffusion strategy. It uses the multi-block template stack and keeps prefix attention full by setting `is_prefix_full=True` in the model runner. | Symbol | How to use it | What it does | | --- | --- | --- | | `D2fReq` | Created through `AutoReq.create(..., decoding_strategy="d2f")`. | Inherits `MultiBlockReqTemplate`; strategy-specific state is initialized later by the scheduler. | | `D2fScheduler` | Created through `AutoScheduler`. | Adds requests with `init_multi_block`, schedules prefill/decode blocks, handles preemption, and postprocesses accepted token writes. | | `D2fKVCacheManager` | Created through `AutoKVCacheManager`. | Uses multi-block page allocation and append behavior. | | `D2fModelRunner` | Created through `AutoModelRunner`. | Prepares chunked multi-block prefill/decode batches, runs the model, samples outputs, and captures multi-block CUDA graphs. | | `D2fAttnMetaData` | Fetched by the attention backend. | Extends `MultiBlockAttnMetaDataTemplate` and stores per-batch multi-block attention fields. | | `fetch_d2f_attn_metadata` | Called through the global attention metadata fetch hook. | Returns the current D2F metadata instance. | | `set_d2f_attn_metadata` | Called by the model runner before attention. | Replaces the current D2F metadata with request/page/sequence tensors for the next forward pass. | | `reset_d2f_attn_metadata` | Called after a forward pass or capture. | Restores an empty D2F metadata object. | Use `d2f` as the reference when adding a strategy that follows the standard multi-block request lifecycle without token-merging metadata. ## diffulex.strategy.dmax `dmax` is the built-in token-merging strategy. It starts from the token-merging multi-block template and adds DMax-specific request activation behavior, attention metadata, and graph capture buffers for token merge descriptors. | Symbol | How to use it | What it does | | --- | --- | --- | | `DMaxReq` | Created through `AutoReq.create(..., decoding_strategy="dmax")`. | Inherits `TokenMergingMultiBlockReqTemplate` and initializes token-merge state from `Config`. | | `DMaxScheduler` | Created through `AutoScheduler`. | Uses token-merge request state, schedules multi-block work, and postprocesses token-merge sampler output. | | `DMaxKVCacheManager` | Created through `AutoKVCacheManager`. | Uses token-merging multi-block cache behavior on top of the multi-block page allocator. | | `DMaxModelRunner` | Created through `AutoModelRunner`. | Binds DMax attention metadata, prepares token-merging metadata tensors, and captures token-merging CUDA graphs. | | `DMaxAttnMetaData` | Fetched by the attention backend. | Extends `TokenMergingMultiBlockAttnMetaDataTemplate` and resets token-merging fields on construction. | | `fetch_dmax_attn_metadata` | Called through the global attention metadata fetch hook. | Returns the current DMax metadata instance. | | `set_dmax_attn_metadata` | Called by the model runner before attention. | Replaces the current DMax metadata with request/page/sequence tensors for the next forward pass. | | `reset_dmax_attn_metadata` | Called after a forward pass or capture. | Restores an empty DMax metadata object. | `DMaxReq` also honors `DIFFULEX_DMAX_FORCE_PREFILL_ACTIVE=1`. That environment switch keeps active-block iterations on the prefill-style path, which is useful when comparing against reference DMax behavior. ## diffulex.strategy.multi_bd `multi_bd` is the built-in Multi-Block Diffusion strategy. It shares most of the same template surface as `d2f`, but uses block-causal prefix behavior so prefix caching can remain enabled. | Symbol | How to use it | What it does | | --- | --- | --- | | `MultiBDReq` | Created through `AutoReq.create(..., decoding_strategy="multi_bd")`. | Inherits `MultiBlockReqTemplate`; request attributes are filled by `init_multi_block`. | | `MultiBDScheduler` | Created through `AutoScheduler`. | Adds, schedules, preempts, and postprocesses multi-block requests. | | `MultiBDKVCacheManager` | Created through `AutoKVCacheManager`. | Uses multi-block cache allocation and append behavior. | | `MultiBDModelRunner` | Created through `AutoModelRunner`. | Prepares multi-block batches and sets attention flags from `Config`, including DiffusionGemma-specific prefix handling. | | `MultiBDAttnMetaData` | Fetched by the attention backend. | Extends `MultiBlockAttnMetaDataTemplate` for the MultiBD strategy. | | `fetch_multi_bd_attn_metadata` | Called through the global attention metadata fetch hook. | Returns the current MultiBD metadata instance. | | `set_multi_bd_attn_metadata` | Called by the model runner before attention. | Replaces the current MultiBD metadata with request/page/sequence tensors for the next forward pass. | | `reset_multi_bd_attn_metadata` | Called after a forward pass or capture. | Restores an empty MultiBD metadata object. | Use `multi_bd` as the closer reference when a new strategy needs block-causal prefix behavior and prefix caching. ## diffulex.strategy.diffusion_gemma `diffusion_gemma` is the native DiffusionGemma strategy. It uses a DiffusionGemma-specific request, sampler, model runner, and attention metadata path with 256-token block/page defaults. | Symbol | How to use it | What it does | | --- | --- | --- | | `DiffusionGemmaReq` | Created through `AutoReq.create(..., decoding_strategy="diffusion_gemma")`. | Tracks DiffusionGemma canvas state and commit timing. | | `DiffusionGemmaScheduler` | Created through `AutoScheduler`. | Uses the DiffusionGemma request lifecycle with standard scheduling hooks. | | `DiffusionGemmaKVCacheManager` | Created through `AutoKVCacheManager`. | Uses the DiffusionGemma page/block layout. | | `DiffusionGemmaModelRunner` | Created through `AutoModelRunner`. | Prepares DiffusionGemma prefill/decode tensors, self-conditioning context, and model forward calls. | | `DiffusionGemmaAttnMetaData` | Fetched by the attention backend. | Stores DiffusionGemma attention metadata for the current forward pass. |