diffulex.attention

`diffulex.attention` is the boundary between strategy-specific attention metadata and the attention kernels used by model layers. Strategy model runners prepare metadata for each engine step, then the attention layer reads that metadata through the package-level fetch hook.

This package should stay small. New decoding strategies usually add their own metadata subclasses under `diffulex.strategy.*.attention`; shared backend selection and metadata plumbing belong here.

| Module | Role |
| --- | --- |
| `diffulex.attention.attn_impl` | Implements the `Attention` module and dispatches to reference, Triton, or grouped Triton attention paths. |
| `diffulex.attention.metadata` | Defines the shared metadata base class and global fetch/warmup helpers used by attention layers. |

diffulex.attention.attn_impl

This module owns the common attention layer interface used by model implementations. It keeps backend-specific calls behind a single `Attention` module so model code can pass hidden states, QKV projections, and cache tensors without directly choosing a kernel.

| Symbol | Purpose |
| --- | --- |
| `Attention` | PyTorch module that selects the configured attention implementation and consumes the current attention metadata. |
| `reference_torch_attention` | Debug/reference implementation for correctness checks. |
| `triton_attention` | Optimized attention path for the standard metadata layout. |
| `triton_grouped_attention` | Optimized grouped attention path for grouped metadata layouts. |

Use `triton_grouped` when measuring throughput or reporting performance. The plain `triton` and `reference` paths are retained for compatibility and debugging.
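
The idea of hiding kernel choice behind one entry point can be sketched without the real kernels. The example below is a minimal, hypothetical illustration in plain Python: the registry, `register_backend`, `get_backend`, and the toy single-query `reference_attention` are names invented for this sketch and are not the signatures of the actual `Attention` module or its backends.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
AttnFn = Callable[[Vector, List[Vector], List[Vector]], Vector]

# Hypothetical registry; the real package's selection mechanism may differ.
_BACKENDS: Dict[str, AttnFn] = {}


def register_backend(name: str) -> Callable[[AttnFn], AttnFn]:
    """Decorator that records an attention function under a backend name."""
    def decorator(fn: AttnFn) -> AttnFn:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("reference")
def reference_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query vector (toy, pure Python)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def get_backend(name: str) -> AttnFn:
    """Look up a backend by name, failing loudly on unknown names."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown attention backend: {name!r}") from None


# Caller code selects by name and never branches on the backend itself.
attn = get_backend("reference")
out = attn([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
```

In a layout like this, an optimized path would register under its own name, and a slow reference path would remain available for comparing outputs during debugging.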

diffulex.attention.metadata

This module defines the metadata contract shared by attention layers and strategy model runners. A strategy-specific runner installs a fetch function before execution; the attention layer reads the current metadata through that function during forward passes.
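
The install-then-fetch pattern described above can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `StepMetadata`, `set_fetch_fn`, `fetch_metadata`, and `ToyRunner` are hypothetical names, and the real functions in this module may take different arguments.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepMetadata:
    """Hypothetical stand-in for a strategy's per-step attention metadata."""
    is_prefill: bool
    page_size: int


# Module-level hook; nothing is installed until a runner sets it.
_fetch_fn: Optional[Callable[[], StepMetadata]] = None


def set_fetch_fn(fn: Callable[[], StepMetadata]) -> None:
    """Install the function that attention layers call to get metadata."""
    global _fetch_fn
    _fetch_fn = fn


def fetch_metadata() -> StepMetadata:
    """Return the current metadata, or fail if no runner installed a hook."""
    if _fetch_fn is None:
        raise RuntimeError("no attention metadata fetch function installed")
    return _fetch_fn()


class ToyRunner:
    """Hypothetical runner that owns the metadata and refreshes it per step."""

    def __init__(self) -> None:
        self.current = StepMetadata(is_prefill=True, page_size=16)
        # The lambda reads self.current at call time, so later steps are visible.
        set_fetch_fn(lambda: self.current)

    def step(self, is_prefill: bool) -> None:
        self.current = StepMetadata(is_prefill=is_prefill, page_size=16)


runner = ToyRunner()
first = fetch_metadata().is_prefill   # metadata prepared at construction
runner.step(is_prefill=False)
second = fetch_metadata().is_prefill  # metadata prepared by the latest step
```

The indirection means the attention layer needs no reference to any particular runner or strategy; it only depends on the hook being installed before the first forward pass.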

| Symbol | Purpose |
| --- | --- |
| `AttnMetaDataBase` | Base dataclass for prefill/decode lengths, page tables, slot mapping, context lengths, page size, block size, and cache layout. |
| `set_fetch_fn_for_attn_metadata` | Installs the strategy-specific metadata fetch function. |
| `set_warming_up` / `is_warming_up` / `reset_warming_up` | Track CUDA graph warmup state so attention code can distinguish warmup from normal execution. |
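
A warmup flag of this kind can be pictured as a module-level boolean with three small accessors. The sketch below reuses the helper names from the table, but the signatures (a boolean argument for `set_warming_up`, no arguments for the other two) and the `toy_forward` function are assumptions, not the module's actual code.

```python
from typing import List

# Hypothetical module-level flag; False means normal execution.
_warming_up = False


def set_warming_up(value: bool) -> None:
    """Mark whether warmup is in progress (assumed signature)."""
    global _warming_up
    _warming_up = value


def is_warming_up() -> bool:
    """Report the current warmup state."""
    return _warming_up


def reset_warming_up() -> None:
    """Return to normal execution."""
    global _warming_up
    _warming_up = False


def toy_forward(log: List[str]) -> None:
    """Hypothetical layer body that only records which mode it observed."""
    log.append("warmup" if is_warming_up() else "normal")


log: List[str] = []
set_warming_up(True)
try:
    toy_forward(log)      # runs while the flag is set
finally:
    reset_warming_up()    # always restore normal mode, even on error
toy_forward(log)          # runs in normal mode
```

Wrapping the warmup pass in `try`/`finally` is one way to make sure the flag cannot stay set if warmup raises.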

When adding a new strategy, define the strategy-specific metadata subclass in the strategy package and keep only shared metadata mechanics in this module.
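
As a rough picture of that split, a strategy package might extend a shared base dataclass with its own fields. The sketch below uses hypothetical stand-ins throughout: `ToyMetaDataBase` is not `AttnMetaDataBase`, and the field names, including `strategy_window`, are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMetaDataBase:
    """Hypothetical shared base; field names are illustrative only."""
    is_prefill: bool = False
    page_size: int = 16
    context_lens: List[int] = field(default_factory=list)


@dataclass
class ToyStrategyMetaData(ToyMetaDataBase):
    """Hypothetical strategy subclass that adds one strategy-only field."""
    strategy_window: int = 4

    def total_context(self) -> int:
        """Sum of per-sequence context lengths for this step."""
        return sum(self.context_lens)


meta = ToyStrategyMetaData(is_prefill=True, context_lens=[3, 5], strategy_window=8)
total = meta.total_context()
```

Code written against the base type still accepts the subclass, while the strategy-only field stays out of the shared module.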