Fused MoE

Fused MoE paths accelerate routing and expert execution for supported Mixture-of-Experts models. The public tuning surface focuses on the GEMM implementation used inside the validated single-node MoE path.

GEMM Implementation

moe_gemm_impl can be triton, vllm, vllm_modular, or naive. The default is triton.

Use naive for debugging and optimized implementations for performance checks.