Diffulex is a Paged-Attention-based inference framework for accelerated dLLM decoding, designed to be easy to develop with and to extend. Its design hides the complexity of underlying KV cache management, parallel-strategy scheduling, and memory optimization. By providing a clean, unified API along with flexible inference-strategy configurations (e.g., D2F, Block Diffusion, Fast-dLLM), Diffulex lets developers focus on model inference logic and business requirements while maintaining production-level inference performance and resource-utilization efficiency.
:::{toctree}
:maxdepth: 2
:caption: GET STARTED
:::

:::{toctree}
:maxdepth: 1
:caption: TUTORIALS
:::

:::{toctree}
:maxdepth: 1
:caption: PROGRAMMING GUIDES
:::

:::{toctree}
:maxdepth: 1
:caption: DEEP LEARNING OPERATORS
:::

:::{toctree}
:maxdepth: 1
:caption: COMPILER INTERNALS
:::

:::{toctree}
:maxdepth: 1
:caption: API Reference
:::

:::{toctree}
:maxdepth: 1
:caption: Privacy

privacy
:::