Multi-Block Diffusion Language Models

Featured Demo

Watch Diffulex Decode in Motion

From vanilla LLaDA2-Mini to MBD-LLaDA2-Mini to MBD-LLaDA2-Mini-DMax: the same prompts and the same model backbone, with faster decoding at each step, all on a single A100. The videos page shows the full progression across four model variants.

Featured Diffulex trace

MBD-LLaDA2-Mini-DMax Demo

This selected trace uses MBD-LLaDA2-Mini-DMax, the fastest model we trained, running on a single NVIDIA A100-SXM4-80GB GPU through the Diffulex engine.

Playback note. The demo videos are rendered through a Streamlit frontend, whose overhead can mask much of the engine-side throughput advantage. To judge the actual engine path, use the aggregate TPS numbers on the Diffulex page.

See All 16 Demos

Explore the Project

MBD-LMs spans three parts: demonstrated decoding results, the model-side paradigm, and the inference engine that makes it runnable.

Demo Videos

Watch Diffulex in Motion

MBD-LMs Paradigm

Diffulex Engine