# diffulex_kernel

The `diffulex_kernel` package exposes optional optimized kernel entry points
used by attention, KV cache, top-k, and MoE execution paths. The package root is
lazy: importing `diffulex_kernel` does not load Triton or other heavy optional
dependencies until a specific kernel symbol is requested.
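One common way to get this behavior is a module-level `__getattr__` (PEP 562). The sketch below illustrates the pattern only; it is not the package's actual implementation. The `make_lazy_package` helper and the symbol table are hypothetical, and standard-library functions stand in for the heavy kernel modules.

```python
import importlib
import types

# Hypothetical mapping: public symbol -> (module to import, attribute name).
# Standard-library targets stand in for Triton-backed kernel modules.
_LAZY_SYMBOLS = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}


def make_lazy_package(name, symbols):
    """Build a module object that imports each symbol on first access."""
    pkg = types.ModuleType(name)
    loaded = []  # records which symbols were resolved, for demonstration

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        try:
            module_name, target = symbols[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), target)
        setattr(pkg, attr, value)  # cache so later lookups skip __getattr__
        loaded.append(attr)
        return value

    pkg.__getattr__ = __getattr__
    pkg.__all__ = sorted(symbols)
    return pkg, loaded


lazy_pkg, loaded = make_lazy_package("lazy_demo", _LAZY_SYMBOLS)
```

Nothing is imported when `lazy_pkg` is created; the first access to `lazy_pkg.sqrt` triggers the import and caches the result on the module.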

## Public Symbols

- `dllm_chunked_prefill`
- `chunked_prefill_attn_unified`
- `chunked_prefill_attn_grouped_unified`
- `store_kv_cache_unified_layout`
- `store_kv_cache_distinct_layout`
- `load_kv_cache`
- `fused_moe`
- `vllm_fused_moe`
- `fused_expert_packed`
- `fused_topk`
- `fused_group_limited_topk`
- `fused_grouped_topk`

## Attention Kernels

`dllm_chunked_prefill` and `chunked_prefill_attn_unified` both refer to the
unified chunked prefill attention implementation. The grouped variant is
available as `chunked_prefill_attn_grouped_unified`. Engine code selects between
these paths through strategy-specific model runner and attention metadata logic.

## KV Cache Kernels

KV cache helpers support both configured cache layouts:

- `store_kv_cache_unified_layout`
- `store_kv_cache_distinct_layout`
- `load_kv_cache`

Use the store kernel that matches `Config.kv_cache_layout`. If the engine
configuration and a direct kernel call disagree on the layout, the cache
contents are misinterpreted.
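A direct caller can guard against a layout mismatch by selecting the store kernel from the configured layout in one place. The following is a minimal sketch under stated assumptions: the layout strings `"unified"` and `"distinct"` are inferred from the kernel names and may not match the values `Config.kv_cache_layout` actually accepts, `select_store_kernel` is a hypothetical helper, and a stand-in namespace replaces the real package so the example runs anywhere.

```python
from types import SimpleNamespace

# Assumed layout names; verify against the values of Config.kv_cache_layout.
_STORE_KERNEL_BY_LAYOUT = {
    "unified": "store_kv_cache_unified_layout",
    "distinct": "store_kv_cache_distinct_layout",
}


def select_store_kernel(kernel_module, kv_cache_layout):
    """Return the store kernel matching the configured layout, or fail loudly."""
    try:
        symbol = _STORE_KERNEL_BY_LAYOUT[kv_cache_layout]
    except KeyError:
        raise ValueError(
            f"unsupported kv_cache_layout: {kv_cache_layout!r}"
        ) from None
    return getattr(kernel_module, symbol)


# Stand-in for diffulex_kernel so the sketch runs without the package.
fake_kernels = SimpleNamespace(
    store_kv_cache_unified_layout=lambda *args, **kwargs: "unified",
    store_kv_cache_distinct_layout=lambda *args, **kwargs: "distinct",
)

store = select_store_kernel(fake_kernels, "distinct")
```

Failing with `ValueError` on an unknown layout keeps a misconfiguration from silently falling through to the wrong kernel.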

## MoE and Top-k Kernels

Fused MoE and top-k helpers accelerate routing and expert execution for
supported models. Available entry points include `fused_moe`,
`vllm_fused_moe`, `fused_expert_packed`, `fused_topk`, and the grouped top-k
aliases `fused_group_limited_topk` and `fused_grouped_topk`.
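For orientation, the routing step that a fused top-k kernel accelerates can be written as a slow reference in plain Python. This sketch assumes the conventional MoE recipe (softmax over router logits, keep the `k` largest probabilities per token, optionally renormalize them); whether `fused_topk` follows exactly these semantics, and what its signature looks like, is not specified here. `reference_topk` is a hypothetical name.

```python
import math


def reference_topk(logits, k, renormalize=True):
    """Per-token softmax followed by top-k expert selection.

    logits: list of rows, one row of router logits per token.
    Returns (weights, expert_ids), each a list of length-k lists.
    """
    weights_out, ids_out = [], []
    for row in logits:
        # Numerically stable softmax over the expert dimension.
        row_max = max(row)
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Highest probability first; ties resolved by lower expert index.
        ids = sorted(range(len(row)), key=lambda i: (-probs[i], i))[:k]
        weights = [probs[i] for i in ids]

        if renormalize:
            # Make the selected weights sum to 1 for each token.
            selected_sum = sum(weights)
            weights = [w / selected_sum for w in weights]

        weights_out.append(weights)
        ids_out.append(ids)
    return weights_out, ids_out


weights, expert_ids = reference_topk([[0.0, 2.0, 1.0, -1.0]], k=2)
```

In the example, experts 1 and 2 hold the two largest logits, so they are selected in that order.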

## Direct Use

Most users should not call these kernels directly. They are lower-level
building blocks expected to receive tensors in layouts prepared by Diffulex
model runners and layers. Direct calls are useful for focused kernel tests,
profiling scripts, and numerical comparisons against reference implementations.
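A numerical comparison usually reduces to checking kernel output against a reference within a tolerance. The sketch below shows that check on flat sequences of floats using only the standard library; the helper names and the default tolerances are illustrative assumptions, and the sample values are made up for the demonstration rather than taken from any kernel.

```python
def max_abs_error(actual, expected):
    """Largest elementwise absolute difference between two flat sequences."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def is_close(actual, expected, rtol=1e-3, atol=1e-5):
    """True if every element satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


# Illustrative values only: pretend `kernel_out` came from an optimized kernel
# and `reference_out` from a straightforward reference implementation.
reference_out = [1.0, -2.0, 0.5]
kernel_out = [1.0004, -2.0001, 0.5]

error = max_abs_error(kernel_out, reference_out)
ok = is_close(kernel_out, reference_out)
```

With tensors, `torch.testing.assert_close` performs the same kind of check and reports the mismatching elements.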

When adding a new kernel, include a focused test under `test/python/kernel/` or
the relevant third-party kernel test directory.