# diffulex_kernel

The `diffulex_kernel` package exposes optional optimized kernel entry points used by attention, KV cache, top-k, and MoE execution paths.

The package root is lazy: importing `diffulex_kernel` does not load Triton or other heavy optional dependencies until a specific kernel symbol is requested.

## Public Symbols

- `dllm_chunked_prefill`
- `chunked_prefill_attn_unified`
- `chunked_prefill_attn_grouped_unified`
- `store_kv_cache_unified_layout`
- `store_kv_cache_distinct_layout`
- `load_kv_cache`
- `fused_moe`
- `vllm_fused_moe`
- `fused_expert_packed`
- `fused_topk`
- `fused_group_limited_topk`
- `fused_grouped_topk`

## Attention Kernels

`dllm_chunked_prefill` and `chunked_prefill_attn_unified` refer to the unified chunked prefill attention implementation. The grouped variant is available as `chunked_prefill_attn_grouped_unified`.

Engine code chooses these paths through strategy-specific model runner and attention metadata logic.

## KV Cache Kernels

KV cache helpers support both configured cache layouts:

- `store_kv_cache_unified_layout`
- `store_kv_cache_distinct_layout`
- `load_kv_cache`

Use the layout that matches `Config.kv_cache_layout`. Mixing layouts between engine configuration and direct kernel calls will produce incorrect cache interpretation.

## MoE and Top-k Kernels

Fused MoE and top-k helpers accelerate routing and expert execution for supported models. Available entry points include `fused_moe`, `vllm_fused_moe`, `fused_expert_packed`, `fused_topk`, and grouped top-k aliases.

## Direct Use

Most users should not call these kernels directly. They are lower-level building blocks expected to receive tensors in layouts prepared by Diffulex model runners and layers.

Direct calls are useful for focused kernel tests, profiling scripts, and numerical comparisons against reference implementations. When adding a new kernel, include a focused test under `test/python/kernel/` or the relevant third-party kernel test directory.
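
For readers curious how a lazy package root of the kind described at the top of this page can work, the following is a minimal sketch using Python's module-level `__getattr__` hook (PEP 562). It is not taken from the Diffulex source. The package name `lazy_kernel_demo`, the symbols `demo_topk` and `demo_loader`, and the helper `make_lazy_package` are all hypothetical, and standard-library modules stand in for heavy kernel submodules so the example runs without Triton.

```python
import heapq  # imported only to compare against the lazily resolved symbol
import importlib
import types

# Hypothetical symbol table: public name -> (defining module, attribute name).
# Standard-library modules stand in for heavy kernel submodules.
_LAZY_SYMBOLS = {
    "demo_topk": ("heapq", "nlargest"),
    "demo_loader": ("json", "loads"),
}


def make_lazy_package(name, symbols):
    """Build a module object whose public symbols are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Python calls this hook only when normal attribute lookup fails.
        if attr not in symbols:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_name, target = symbols[attr]
        value = getattr(importlib.import_module(module_name), target)
        # Cache the resolved symbol so later lookups skip the hook.
        setattr(pkg, attr, value)
        return value

    pkg.__getattr__ = __getattr__
    pkg.__all__ = sorted(symbols)
    return pkg


lazy_pkg = make_lazy_package("lazy_kernel_demo", _LAZY_SYMBOLS)

# Nothing has been resolved yet; the symbol is absent from the module namespace.
assert "demo_topk" not in vars(lazy_pkg)

# First access triggers the import and caches the result on the module.
top_two = lazy_pkg.demo_topk(2, [5, 1, 9, 3])
assert lazy_pkg.demo_topk is heapq.nlargest
assert "demo_topk" in vars(lazy_pkg)
```

In a real package the same hook would typically be written as a top-level `__getattr__` function in `__init__.py` rather than attached to a synthetic module. Caching the resolved symbol keeps the cost of the deferred import to the first access only.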