Add a Decoding Strategy
Diffulex selects strategy-specific engine components through registries. A new
strategy usually registers these pieces under one decoding_strategy name:
- request state;
- scheduler;
- KV cache manager;
- model runner;
- attention metadata.
Use this guide when the built-in d2f, multi_bd, dmax, and
diffusion_gemma strategies do not match the decoding behavior you need.
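The registry idea can be illustrated with a small, self-contained sketch. This is not Diffulex's implementation: the Registry class, its resolve method, and the ToyAutoReq and ToyAutoScheduler names are hypothetical stand-ins that only mimic the decorator-style register(name) call used throughout this guide.

```python
from typing import Callable


class Registry:
    """Toy name-to-class registry (a stand-in, not the Diffulex code)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""

        def decorator(cls: type) -> type:
            if name in self._classes:
                raise ValueError(f"{self.kind} '{name}' is already registered")
            self._classes[name] = cls
            return cls

        return decorator

    def resolve(self, name: str) -> type:
        """Look up the class registered for a strategy name."""
        if name not in self._classes:
            known = ", ".join(sorted(self._classes)) or "<none>"
            raise KeyError(f"no {self.kind} for '{name}' (known: {known})")
        return self._classes[name]


# Hypothetical stand-ins for two of the per-component registries.
ToyAutoReq = Registry("request")
ToyAutoScheduler = Registry("scheduler")


@ToyAutoReq.register("my_strategy")
class MyStrategyReq:
    pass


@ToyAutoScheduler.register("my_strategy")
class MyStrategyScheduler:
    pass


# One strategy name selects the matching class from every registry.
req_cls = ToyAutoReq.resolve("my_strategy")
scheduler_cls = ToyAutoScheduler.resolve("my_strategy")
```

Because every component is looked up by the same name, a piece that is not registered would only show up as a lookup failure when the engine asks for it, which is one reason to register all five pieces together.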
Start from Current Templates
Current reusable pieces live in these places:
| Area | Path |
|---|---|
| Core multi-block request/scheduler/cache/runner aliases | |
| Multi-block runner helpers and attention metadata mixin | |
| Token-merge templates | |
| Dual-cache extension points | |
Use multi_bd or d2f as the smallest references for normal multi-block
strategies. Use dmax as the reference when attention needs token-merge
metadata. Use diffusion_gemma as the reference for a model-specific strategy
with custom request and sampler semantics.
Directory Layout
Create a package under diffulex/strategy/<strategy_name>/:
    diffulex/strategy/my_strategy/
      __init__.py
      attention/
        metadata.py
      engine/
        kv_cache_manager.py
        model_runner.py
        request.py
        scheduler.py
Import the package from diffulex/strategy/__init__.py so registry decorators
run during engine startup.
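The reason the import matters is that a registry decorator only runs when the module containing the decorated class is executed. The sketch below demonstrates that with throwaway names: toy_registry, toy_my_strategy, and the register helper are all hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Toy registry module, injected into sys.modules so the generated
# package below can import it. Not part of Diffulex.
toy_registry = types.ModuleType("toy_registry")
toy_registry.REGISTERED = {}


def _register(name):
    def decorator(cls):
        toy_registry.REGISTERED[name] = cls
        return cls

    return decorator


toy_registry.register = _register
sys.modules["toy_registry"] = toy_registry

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a throwaway strategy package whose module body uses the decorator.
    pkg_dir = Path(tmp_dir) / "toy_my_strategy"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(
        "from toy_registry import register\n"
        "\n"
        "@register('my_strategy')\n"
        "class MyStrategyReq:\n"
        "    pass\n"
    )
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        # The file exists on disk, but nothing is registered yet.
        registered_before = "my_strategy" in toy_registry.REGISTERED
        # Importing the package executes the decorator.
        importlib.import_module("toy_my_strategy")
        registered_after = "my_strategy" in toy_registry.REGISTERED
    finally:
        sys.path.remove(tmp_dir)
```

Here registered_before is False and registered_after is True: the class is only known to the registry once something imports the package.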
Request
Register a request class with AutoReq:
    from diffulex.config import Config
    from diffulex.engine.request import AutoReq, MultiBlockReqTemplate
    from diffulex.sampling_params import SamplingParams

    @AutoReq.register("my_strategy")
    class MyStrategyReq(MultiBlockReqTemplate):
        def __init__(
            self,
            token_ids: list[int],
            sampling_params: SamplingParams = SamplingParams(),
            config: Config | None = None,
        ):
            super().__init__(token_ids, sampling_params, config)
For normal multi-block behavior, the base request is enough. Add fields only when the strategy needs additional per-request state.
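When a strategy does need extra per-request state, a minimal sketch of the pattern might look like the following. ToyReqTemplate and ToySamplingParams are stand-ins so the example runs without Diffulex installed; they only mirror the constructor shape shown above. The fields denoise_steps_done and remask_history and the method record_step are hypothetical names, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ToySamplingParams:
    """Stand-in for SamplingParams; the field is illustrative only."""

    temperature: float = 1.0


class ToyReqTemplate:
    """Stand-in for MultiBlockReqTemplate (constructor shape only)."""

    def __init__(self, token_ids, sampling_params=None, config=None):
        self.token_ids = list(token_ids)
        self.sampling_params = sampling_params or ToySamplingParams()
        self.config = config


class MyStrategyReq(ToyReqTemplate):
    def __init__(self, token_ids, sampling_params=None, config=None):
        super().__init__(token_ids, sampling_params, config)
        # Hypothetical extra state. Create it in __init__ so that every
        # request gets its own objects instead of sharing class attributes.
        self.denoise_steps_done = 0
        self.remask_history: list[list[int]] = []

    def record_step(self, remasked_positions) -> None:
        """Track one decoding step and which positions were re-masked."""
        self.denoise_steps_done += 1
        self.remask_history.append(list(remasked_positions))


first = MyStrategyReq([1, 2, 3])
second = MyStrategyReq([4, 5])
first.record_step([0, 2])
```

After this runs, only first has recorded a step; second still has an empty history because the state lives on the instance.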
Scheduler
Register a scheduler with AutoScheduler:
    from diffulex.config import Config
    from diffulex.engine.scheduler import AutoScheduler, MultiBlockSchedulerTemplate

    @AutoScheduler.register("my_strategy")
    class MyStrategyScheduler(MultiBlockSchedulerTemplate):
        def __init__(self, config: Config):
            super().__init__(config)
MultiBlockSchedulerTemplate is a compatibility alias for the core block-aware
scheduler. Override add, schedule, preempt, or postprocess only when the
default lifecycle is not correct for the new strategy.
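As an illustration of overriding a single lifecycle hook while keeping the rest of the default behavior, consider the sketch below. ToySchedulerTemplate is a stand-in with deliberately simplified method signatures; the real add, schedule, and postprocess signatures in Diffulex may differ, and finished_count is a hypothetical field.

```python
class ToySchedulerTemplate:
    """Stand-in base scheduler; signatures are assumptions for this sketch."""

    def __init__(self, config=None):
        self.config = config
        self.waiting: list[str] = []
        self.running: list[str] = []

    def add(self, req: str) -> None:
        self.waiting.append(req)

    def schedule(self) -> list[str]:
        # Move every waiting request into the running set.
        while self.waiting:
            self.running.append(self.waiting.pop(0))
        return list(self.running)

    def postprocess(self, finished: list[str]) -> None:
        # Drop finished requests from the running set.
        self.running = [r for r in self.running if r not in finished]


class MyStrategyScheduler(ToySchedulerTemplate):
    def __init__(self, config=None):
        super().__init__(config)
        self.finished_count = 0  # hypothetical extra bookkeeping

    def postprocess(self, finished: list[str]) -> None:
        # Keep the default lifecycle, then add strategy-specific work.
        super().postprocess(finished)
        self.finished_count += len(finished)


scheduler = MyStrategyScheduler()
scheduler.add("req-a")
scheduler.add("req-b")
batch = scheduler.schedule()
scheduler.postprocess(["req-a"])
```

Calling super() first keeps the base bookkeeping intact, so the override only has to describe what is different about the new strategy.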
KV Cache Manager
Register a KV cache manager with AutoKVCacheManager:
    from diffulex.config import Config
    from diffulex.engine.kv_cache_manager import AutoKVCacheManager, MultiBlockKVCacheManagerTemplate

    @AutoKVCacheManager.register("my_strategy")
    class MyStrategyKVCacheManager(MultiBlockKVCacheManagerTemplate):
        def __init__(self, config: Config):
            super().__init__(config)
Override this class when cache growth, prefix reuse, page hashing, or append rules differ from the standard multi-block manager.
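To make "page hashing" and "prefix reuse" concrete, here is a generic sketch of chained page hashes, a common way to detect that two requests share a cached prefix. It is not taken from Diffulex's manager: page_hashes and shared_prefix_pages are hypothetical helpers, and the real hashing scheme may be different.

```python
import hashlib


def page_hashes(token_ids: list[int], page_size: int) -> list[str]:
    """Hash each full page together with the hash of the page before it.

    Chaining means a page hash identifies the whole prefix up to that page,
    not just the tokens inside the page. A trailing partial page is skipped.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    hashes: list[str] = []
    previous = b""
    for start in range(0, len(token_ids) - page_size + 1, page_size):
        page = token_ids[start:start + page_size]
        payload = previous + ",".join(map(str, page)).encode()
        digest = hashlib.sha256(payload).digest()
        hashes.append(digest.hex())
        previous = digest
    return hashes


def shared_prefix_pages(a: list[int], b: list[int], page_size: int) -> int:
    """Count the leading pages whose chained hashes match."""
    count = 0
    for hash_a, hash_b in zip(page_hashes(a, page_size), page_hashes(b, page_size)):
        if hash_a != hash_b:
            break
        count += 1
    return count


prompt_a = [1, 2, 3, 4, 5, 6, 7, 8]
prompt_b = [1, 2, 3, 4, 9, 9, 9, 9, 1]
reusable = shared_prefix_pages(prompt_a, prompt_b, page_size=4)
```

With a page size of 4, the two prompts share exactly one reusable page. A strategy that edits tokens in place or merges tokens would need different rules for when a page hash stays valid, which is the kind of difference that justifies overriding the manager.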
Attention Metadata
The model runner sets the global attention metadata fetch function used by the attention layers. A minimal metadata module can extend the multi-block metadata mixin:
    from dataclasses import dataclass

    from diffulex.attention.metadata import AttnMetaDataBase
    from diffulex.mixin.multi_block.attention_metadata import MultiBlockAttnMetaDataMixin

    @dataclass
    class MyStrategyAttnMetaData(MultiBlockAttnMetaDataMixin, AttnMetaDataBase):
        def __post_init__(self):
            self.init_multi_block()

    MY_STRATEGY_ATTN_METADATA = MyStrategyAttnMetaData()

    def fetch_my_strategy_attn_metadata() -> MyStrategyAttnMetaData:
        return MY_STRATEGY_ATTN_METADATA

    def set_my_strategy_attn_metadata(**kwargs) -> None:
        global MY_STRATEGY_ATTN_METADATA
        MY_STRATEGY_ATTN_METADATA = MyStrategyAttnMetaData(**kwargs)

    def reset_my_strategy_attn_metadata() -> None:
        global MY_STRATEGY_ATTN_METADATA
        MY_STRATEGY_ATTN_METADATA = MyStrategyAttnMetaData()
Use a separate metadata object when the strategy changes attention layout, prefix handling, or page table interpretation.
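One consequence of the set/fetch/reset pattern above is that set and reset rebind the module-level object, so callers should go through the fetch function each time rather than hold on to an earlier result. The sketch below shows this with a stand-in dataclass; ToyAttnMetaData and its block_size and is_prefill fields are hypothetical and do not reflect the real metadata fields.

```python
from dataclasses import dataclass


@dataclass
class ToyAttnMetaData:
    """Stand-in metadata object; both fields are illustrative only."""

    block_size: int = 32
    is_prefill: bool = False


_TOY_ATTN_METADATA = ToyAttnMetaData()


def fetch_toy_attn_metadata() -> ToyAttnMetaData:
    return _TOY_ATTN_METADATA


def set_toy_attn_metadata(**kwargs) -> None:
    # Rebinds the global to a brand-new object.
    global _TOY_ATTN_METADATA
    _TOY_ATTN_METADATA = ToyAttnMetaData(**kwargs)


def reset_toy_attn_metadata() -> None:
    global _TOY_ATTN_METADATA
    _TOY_ATTN_METADATA = ToyAttnMetaData()


# A reference taken before set_...() keeps pointing at the old object.
stale = fetch_toy_attn_metadata()
set_toy_attn_metadata(block_size=64, is_prefill=True)
fresh = fetch_toy_attn_metadata()

stale_block_size = stale.block_size  # still the default
fresh_block_size = fresh.block_size  # the value passed to set_...()

reset_toy_attn_metadata()
after_reset = fetch_toy_attn_metadata()
```

This is presumably why the runner registers a fetch function instead of passing the metadata object itself to the attention layers: each forward pass then sees whatever was set most recently.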
Model Runner
Register a model runner with AutoModelRunner:
    from multiprocessing.synchronize import Event

    from diffulex.attention.metadata import set_fetch_fn_for_attn_metadata
    from diffulex.config import Config
    from diffulex.engine.model_runner import AutoModelRunner, MultiBlockModelRunnerTemplate
    from diffulex.strategy.my_strategy.attention.metadata import (
        fetch_my_strategy_attn_metadata,
        reset_my_strategy_attn_metadata,
        set_my_strategy_attn_metadata,
    )

    @AutoModelRunner.register("my_strategy")
    class MyStrategyModelRunner(MultiBlockModelRunnerTemplate):
        def __init__(self, config: Config, rank: int, event: Event | list[Event]):
            set_fetch_fn_for_attn_metadata(fetch_my_strategy_attn_metadata)
            self.init_attn_metadata_fn(
                set_my_strategy_attn_metadata,
                reset_my_strategy_attn_metadata,
                fetch_my_strategy_attn_metadata,
            )
            super().__init__(config, rank, event)
The base runner already provides normal multi-block prepare, run, and graph capture behavior. Override runner methods only when the strategy needs custom tensor preparation, model execution, sampler interaction, or graph capture.
Export and Import
Export the registered classes from the strategy package __init__.py:
    from .engine.kv_cache_manager import MyStrategyKVCacheManager
    from .engine.model_runner import MyStrategyModelRunner
    from .engine.request import MyStrategyReq
    from .engine.scheduler import MyStrategyScheduler

    __all__ = [
        "MyStrategyKVCacheManager",
        "MyStrategyModelRunner",
        "MyStrategyReq",
        "MyStrategyScheduler",
    ]
Then import the package from diffulex/strategy/__init__.py.
Config Validation
Add validation in diffulex.config.Config only when the strategy would
otherwise run in an invalid state.
Examples already in the config:
| Strategy | Config behavior |
|---|---|
| | Forces |
| | Forces |
| | Forces |
| | Uses DiffusionGemma-specific block/page/sampler defaults. |
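A strategy-specific check of this kind could be sketched as follows. ToyConfig is a stand-in for diffulex.config.Config, the block_size and page_size fields are hypothetical, and the rule that page_size must be a multiple of block_size is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Stand-in config; field names and the rule below are assumptions."""

    decoding_strategy: str = "d2f"
    block_size: int = 32
    page_size: int = 32

    def __post_init__(self) -> None:
        # Only validate what the new strategy cannot run without, and leave
        # every other strategy's settings untouched.
        if self.decoding_strategy != "my_strategy":
            return
        if self.block_size <= 0:
            raise ValueError("my_strategy requires a positive block_size")
        if self.page_size % self.block_size != 0:
            raise ValueError(
                f"my_strategy requires page_size ({self.page_size}) to be "
                f"a multiple of block_size ({self.block_size})"
            )


valid = ToyConfig(decoding_strategy="my_strategy", block_size=32, page_size=64)
other = ToyConfig(decoding_strategy="d2f", block_size=32, page_size=48)
```

Failing at construction time with a message that names the strategy and the offending values is usually easier to debug than a shape mismatch that surfaces later inside the model runner.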
Verification Checklist
Before opening a pull request:
- Import diffulex.strategy and confirm the new package imports.
- Construct a Config with the new decoding_strategy.
- Run one tiny in-process generation.
- Run a benchmark with --dataset-limit.
- Add focused tests for request state, scheduler behavior, sampler output, or attention metadata.