Watch Diffulex Decode in Motion
From vanilla LLaDA2-Mini to MBD-LLaDA2-Mini to MBD-LLaDA2-Mini-DMax: the same prompts and the same model backbone, with a large decoding speedup at each step, all on a single A100. See the full progression across four model variants on the videos page.
MBD-LLaDA2-Mini-DMax Demo
This selected trace uses MBD-LLaDA2-Mini-DMax, the fastest model we trained, running on a single NVIDIA A100-SXM4-80GB GPU through the Diffulex engine.
Playback note. The demo videos pass through a Streamlit frontend, which can consume much of the engine-side throughput advantage. Use the aggregate TPS numbers on the Diffulex page to judge the actual engine path.
See All 16 Demos
Explore the Project
MBD-LMs spans three parts: demonstrated decoding results, the model-side paradigm, and the inference engine that makes it runnable.
Watch Diffulex in Motion
16 traces across four model variants — LLaDA2-Mini, MBD-LLaDA2, MBD-LLaDA2-DMax, and DiffusionGemma — all generated by the Diffulex inference engine.
Method
MBD-LMs Paradigm
MultiBD formulation, MultiTF post-training, Block Buffer runtime, interactive decode traces, full evaluation results, and throughput analysis.
Inference Engine
Diffulex Engine
The runnable inference path for block-style diffusion LMs. Benchmarked on GSM8K at throughput comparable to mainstream engines, with support for MultiBD, prefix caching, and CUDA Graph.