diffulex_kernel

The diffulex_kernel package exposes optional optimized kernel entry points used by attention, KV cache, top-k, and MoE execution paths. The package root is lazy: importing diffulex_kernel does not load Triton or other heavy optional dependencies until a specific kernel symbol is requested.
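
The lazy behavior described above can be sketched with the module-level __getattr__ hook from PEP 562. This is an illustration of the general technique, not the diffulex_kernel source: the helper make_lazy_package, the demo package name, and the symbol table are hypothetical, and standard-library functions stand in for kernel symbols so the sketch runs without Triton.

```python
import importlib
import types

# Hypothetical table: public symbol -> (module to import, attribute to fetch).
# Stdlib modules stand in for heavy kernel modules in this sketch.
_LAZY_SYMBOLS = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def make_lazy_package(name, symbols):
    """Build a module object that resolves its public symbols on first access."""
    pkg = types.ModuleType(name)
    loaded = []  # records which symbols have actually been resolved

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            module_name, symbol = symbols[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(module_name), symbol)
        setattr(pkg, attr, value)  # cache so later lookups skip this hook
        loaded.append(attr)
        return value

    pkg.__getattr__ = __getattr__
    pkg.__all__ = sorted(symbols)
    pkg._loaded = loaded
    return pkg


lazy_pkg = make_lazy_package("lazy_kernel_demo", _LAZY_SYMBOLS)
```

Creating lazy_pkg resolves nothing; the first access to lazy_pkg.sqrt triggers the import and caches the result on the module.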

Public Symbols

  • dllm_chunked_prefill

  • chunked_prefill_attn_unified

  • chunked_prefill_attn_grouped_unified

  • store_kv_cache_unified_layout

  • store_kv_cache_distinct_layout

  • load_kv_cache

  • fused_moe

  • vllm_fused_moe

  • fused_expert_packed

  • fused_topk

  • fused_group_limited_topk

  • fused_grouped_topk

Attention Kernels

dllm_chunked_prefill and chunked_prefill_attn_unified both refer to the unified chunked prefill attention implementation; the grouped variant is exposed as chunked_prefill_attn_grouped_unified. Engine code selects among these paths through strategy-specific model runner and attention metadata logic.

KV Cache Kernels

KV cache helpers support both configured cache layouts:

  • store_kv_cache_unified_layout

  • store_kv_cache_distinct_layout

  • load_kv_cache

Use the layout that matches Config.kv_cache_layout. If a direct kernel call uses a different layout from the one in the engine configuration, the cache contents will be misinterpreted.
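
Why a layout mismatch matters can be shown with a toy flat buffer. The two layouts below ("token_major" and "head_major") are hypothetical stand-ins chosen for illustration; they are not the Diffulex unified or distinct layouts, and the store and load helpers here are not the package's kernels. The sketch only demonstrates that reading with a different indexing scheme from the one used for writing returns plausible but wrong data without raising an error.

```python
def offset(layout, token, head, num_tokens, num_heads, head_dim):
    """Return the start offset of one (token, head) vector in a flat buffer."""
    if layout == "token_major":  # hypothetical layout A
        return (token * num_heads + head) * head_dim
    if layout == "head_major":  # hypothetical layout B
        return (head * num_tokens + token) * head_dim
    raise ValueError(f"unknown layout: {layout!r}")


def store(buffer, layout, token, head, vec, num_tokens, num_heads):
    """Write one vector into the buffer using the given layout."""
    start = offset(layout, token, head, num_tokens, num_heads, len(vec))
    buffer[start:start + len(vec)] = vec


def load(buffer, layout, token, head, num_tokens, num_heads, head_dim):
    """Read one vector from the buffer using the given layout."""
    start = offset(layout, token, head, num_tokens, num_heads, head_dim)
    return buffer[start:start + head_dim]


NUM_TOKENS, NUM_HEADS, HEAD_DIM = 3, 2, 2
cache = [0.0] * (NUM_TOKENS * NUM_HEADS * HEAD_DIM)

# Write every (token, head) vector as [token, head] using layout A.
for t in range(NUM_TOKENS):
    for h in range(NUM_HEADS):
        store(cache, "token_major", t, h, [float(t), float(h)],
              NUM_TOKENS, NUM_HEADS)

# Reading token 1, head 0 with the same layout returns what was written.
matched = load(cache, "token_major", 1, 0, NUM_TOKENS, NUM_HEADS, HEAD_DIM)
# Reading with the other layout silently returns a different entry.
mismatched = load(cache, "head_major", 1, 0, NUM_TOKENS, NUM_HEADS, HEAD_DIM)
```

Here matched holds the vector written for token 1, head 0, while mismatched holds the vector that was written for token 0, head 1.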

MoE and Top-k Kernels

Fused MoE and top-k helpers accelerate routing and expert execution for supported models. Available entry points include fused_moe, vllm_fused_moe, fused_expert_packed, fused_topk, and the grouped top-k aliases fused_group_limited_topk and fused_grouped_topk.
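
For orientation, one common routing scheme that a fused top-k kernel replaces is softmax over router logits followed by per-token top-k selection. The pure-Python reference below is a sketch under that assumption. The function name reference_topk and the renormalize parameter are hypothetical and do not describe the fused_topk signature, and supported models may use a different scoring function.

```python
import math


def reference_topk(router_logits, top_k, renormalize=True):
    """Softmax each token's logits, then keep the top_k experts.

    router_logits: list of per-token lists, one logit per expert.
    Returns (weights, expert_ids), each a list of per-token lists.
    """
    all_weights, all_ids = [], []
    for logits in router_logits:
        # Numerically stable softmax over the expert dimension.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Highest probability first; ties broken by lower expert index.
        ids = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:top_k]
        weights = [probs[i] for i in ids]

        if renormalize:
            # Rescale the selected weights so they sum to 1 per token.
            norm = sum(weights)
            weights = [w / norm for w in weights]

        all_weights.append(weights)
        all_ids.append(ids)
    return all_weights, all_ids


weights, expert_ids = reference_topk([[0.0, 2.0, 1.0, -1.0]], top_k=2)
```

With these logits the selected experts are 1 and 2, and the renormalized weights sum to 1.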

Direct Use

Most users should not call these kernels directly. They are lower-level building blocks expected to receive tensors in layouts prepared by Diffulex model runners and layers. Direct calls are useful for focused kernel tests, profiling scripts, and numerical comparisons against reference implementations.
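
A numerical comparison of the kind mentioned above usually checks a candidate result against a reference within relative and absolute tolerances. The sketch below uses only the standard library. The names compare, reference_dot, and blocked_dot are hypothetical, and blocked_dot is a plain-Python stand-in for an optimized kernel that accumulates in tiles, not a Diffulex kernel.

```python
import math


def flatten(values):
    """Flatten arbitrarily nested lists or tuples of numbers into one list."""
    if isinstance(values, (list, tuple)):
        return [x for item in values for x in flatten(item)]
    return [float(values)]


def compare(candidate, reference, rtol=1e-5, atol=1e-8):
    """Return (ok, max_abs_err) for two equally sized nested lists."""
    got, want = flatten(candidate), flatten(reference)
    if len(got) != len(want):
        raise ValueError(f"size mismatch: {len(got)} vs {len(want)}")
    max_err = max((abs(g - w) for g, w in zip(got, want)), default=0.0)
    ok = all(abs(g - w) <= atol + rtol * abs(w) for g, w in zip(got, want))
    return ok, max_err


def reference_dot(a, b):
    """Reference: accurately summed dot product."""
    return math.fsum(x * y for x, y in zip(a, b))


def blocked_dot(a, b, block=4):
    """Stand-in candidate: accumulate partial sums block by block."""
    acc = 0.0
    for start in range(0, len(a), block):
        partial = 0.0
        for x, y in zip(a[start:start + block], b[start:start + block]):
            partial += x * y
        acc += partial
    return acc


a = [0.1 * i for i in range(10)]
b = [1.0 - 0.05 * i for i in range(10)]
ok, max_err = compare([blocked_dot(a, b)], [reference_dot(a, b)])
```

The two results differ only by floating-point rounding, so ok is true under the default tolerances. For a real kernel, the tolerances should reflect the data type under test.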

When adding a new kernel, include a focused test under test/python/kernel/ or the relevant third-party kernel test directory.