# diffulex.attention

`diffulex.attention` is the boundary between strategy-specific attention metadata and the attention kernels used by model layers. Strategy model runners prepare metadata for each engine step, then the attention layer reads that metadata through the package-level fetch hook.

This package should stay small. New decoding strategies usually add their own metadata subclasses under `diffulex.strategy.*.attention`; shared backend selection and metadata plumbing belong here.

| Module | Role |
| --- | --- |
| `diffulex.attention.attn_impl` | Implements the `Attention` module and dispatches to reference, Triton, or grouped Triton attention paths. |
| `diffulex.attention.metadata` | Defines the shared metadata base class and global fetch/warmup helpers used by attention layers. |

## diffulex.attention.attn_impl

This module owns the common attention layer interface used by model implementations. It keeps backend-specific calls behind a single `Attention` module so model code can pass hidden states, QKV projections, and cache tensors without directly choosing a kernel.

| Symbol | Purpose |
| --- | --- |
| `Attention` | PyTorch module that selects the configured attention implementation and consumes the current attention metadata. |
| `reference_torch_attention` | Debug/reference implementation for correctness checks. |
| `triton_attention` | Optimized attention path for the standard metadata layout. |
| `triton_grouped_attention` | Optimized grouped attention path for grouped metadata layouts. |

Use `triton_grouped` when measuring throughput or reporting performance. The plain `triton` and reference paths are retained for compatibility and debugging.

## diffulex.attention.metadata

This module defines the metadata contract shared by attention layers and strategy model runners. A strategy-specific runner installs a fetch function before execution; the attention layer reads the current metadata through that function during forward passes.
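
The install-then-fetch handoff can be illustrated with a small, self-contained sketch. It shows the pattern only and is not the package's code: `StepMetadata`, its fields, the `fetch_attn_metadata` reader, the `_fetch_fn` variable, and `ToyRunner` are all hypothetical names introduced for this example, and the signature given to `set_fetch_fn_for_attn_metadata` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepMetadata:
    """Hypothetical stand-in for a strategy's attention metadata."""
    is_prefill: bool
    context_lens: list[int]


# Module-level hook (assumption: the real package may store this differently).
_fetch_fn: Optional[Callable[[], StepMetadata]] = None


def set_fetch_fn_for_attn_metadata(fn: Callable[[], StepMetadata]) -> None:
    """Install the callable that attention layers use to get current metadata."""
    global _fetch_fn
    _fetch_fn = fn


def fetch_attn_metadata() -> StepMetadata:
    """Hypothetical reader an attention layer would call in its forward pass."""
    if _fetch_fn is None:
        raise RuntimeError("no attention metadata fetch function installed")
    return _fetch_fn()


class ToyRunner:
    """Hypothetical runner that refreshes metadata before each engine step."""

    def __init__(self) -> None:
        self.current = StepMetadata(is_prefill=True, context_lens=[])
        # The lambda reads self.current at call time, so later updates are seen.
        set_fetch_fn_for_attn_metadata(lambda: self.current)

    def prepare_step(self, is_prefill: bool, context_lens: list[int]) -> None:
        self.current = StepMetadata(is_prefill, context_lens)


runner = ToyRunner()
runner.prepare_step(is_prefill=False, context_lens=[7, 12])
meta = fetch_attn_metadata()
print(meta.is_prefill, meta.context_lens)
```

Routing the read through a single installed callable means the attention layer never imports a strategy package; it only sees whatever metadata object the active runner exposes for the current step.
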

| Symbol | Purpose |
| --- | --- |
| `AttnMetaDataBase` | Base dataclass for prefill/decode lengths, page tables, slot mapping, context lengths, page size, block size, and cache layout. |
| `set_fetch_fn_for_attn_metadata` | Installs the strategy-specific metadata fetch function. |
| `set_warming_up` / `is_warming_up` / `reset_warming_up` | Track CUDA graph warmup state so attention code can distinguish warmup from normal execution. |

When adding a new strategy, define the strategy-specific metadata subclass in the strategy package and keep only shared metadata mechanics in this module.
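
As a rough illustration of that split, the sketch below extends a reduced stand-in for the base class with one strategy-only field. Everything here is an assumption made for the example: the stand-in `AttnMetaDataBase` carries only three illustrative fields with guessed names, and `ToyStrategyAttnMetaData`, `active_block_ids`, and `num_pages` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AttnMetaDataBase:
    """Reduced stand-in for the shared base; field names are illustrative."""
    is_prefill: bool = False
    page_size: int = 32
    context_lens: list[int] = field(default_factory=list)


@dataclass
class ToyStrategyAttnMetaData(AttnMetaDataBase):
    """Hypothetical subclass that would live in the strategy package."""
    # Extra state that only this strategy's kernels need.
    active_block_ids: list[int] = field(default_factory=list)

    def num_pages(self) -> list[int]:
        # Pages needed per sequence, rounding up to a whole page.
        return [-(-n // self.page_size) for n in self.context_lens]


meta = ToyStrategyAttnMetaData(context_lens=[7, 40], active_block_ids=[0])
print(isinstance(meta, AttnMetaDataBase), meta.num_pages())
```

Code written against the base type keeps working with the subclass, while the strategy-only field stays out of the shared module.
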