Benchmark

Use the benchmark entry point, diffulex_bench.main, to run evaluation workloads such as GSM8K, HumanEval, or MBPP.

Generic form:

python -m diffulex_bench.main \
  --config path/to/config.yml \
  --log-file path/to/run.log \
  --log-level DEBUG

The config file provides the engine and evaluation settings. In the repository, the common pattern combines two parts:

--config diffulex_bench/configs/<name>.yml to select a config from the repository
Optional overrides such as --model-path, --tokenizer-path, --model-name, --decoding-strategy, --tensor-parallel-size, --data-parallel-size, --dataset, --dataset-limit, --temperature, --max-tokens, and --output-dir to adjust individual settings for a run
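
A minimal sketch of this pattern, assuming a POSIX shell: the helper below joins the entry point, a config path, and any overrides into one command line and prints it rather than running it. bench_cmd is a hypothetical name introduced here for illustration and is not part of diffulex_bench; the config path and flags are the ones shown on this page.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the benchmark command for a config file
# plus any override flags. It only prints; it does not start a run.
# Usage: bench_cmd <config.yml> [override flags...]
bench_cmd() {
  config="$1"
  shift
  # Put the entry point and config first, then the overrides as given.
  set -- python -m diffulex_bench.main --config "$config" "$@"
  echo "$*"
}

# Example: a repository config with three overrides.
bench_cmd diffulex_bench/configs/llada_instruct_gsm8k.yml \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0
```

Printing the command first makes it easy to check which overrides a run will receive before committing GPU time. Because the output is a plain space-joined string, paths that contain spaces would need extra quoting before the line is pasted into a shell.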

Supported models

D2F-LLaDA

python -m diffulex_bench.main \
  --config diffulex_bench/configs/llada_instruct_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
  --tokenizer-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
  --model-name llada \
  --decoding-strategy d2f \
  --use-lora \
  --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_LLaDA_Instruct_8B_Lora \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256

D2F-Dream

python -m diffulex_bench.main \
  --config diffulex_bench/configs/dream_base_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
  --tokenizer-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
  --model-name dream \
  --decoding-strategy d2f \
  --use-lora \
  --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_Dream_Base_7B_Lora \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256

Fast-dLLM-v2

python -m diffulex_bench.main \
  --config diffulex_bench/configs/fast_dllm_v2_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
  --tokenizer-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
  --model-name fast_dllm_v2 \
  --decoding-strategy multi_bd \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256

SDAR

python -m diffulex_bench.main \
  --config diffulex_bench/configs/sdar_chat_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/JetLM/SDAR-1.7B-Chat-b32 \
  --model-name sdar \
  --decoding-strategy multi_bd \
  --tensor-parallel-size 1 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --temperature 0.0 \
  --max-tokens 256

SDAR-MoE

Use the same benchmark entry point with a matching sdar_moe config. The repository already treats sdar_moe as a supported model family, so keep the same command structure as the SDAR example and set --model-path to your SDAR-MoE checkpoint.
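
As a rough sketch of what such a command might look like, the script below reuses the flag set from the SDAR example and prints the result instead of running it. Several parts are assumptions, not confirmed by this page: the value sdar_moe for --model-name, the reuse of the multi_bd decoding strategy, and the parallelism sizes. The config and checkpoint paths are placeholders to replace, and SDAR_MOE_CONFIG, SDAR_MOE_CKPT, and sdar_moe_cmd are hypothetical names introduced here.

```shell
#!/bin/sh
set -eu

# Placeholders: this page names neither an SDAR-MoE config file nor a
# checkpoint, so set both variables to real paths before use.
SDAR_MOE_CONFIG="${SDAR_MOE_CONFIG:-diffulex_bench/configs/<name>.yml}"
SDAR_MOE_CKPT="${SDAR_MOE_CKPT:-/YOUR-CKPT-PATH/<sdar-moe-checkpoint>}"

# Print (do not run) a command shaped like the SDAR example.
# Assumed values: --model-name sdar_moe, --decoding-strategy multi_bd,
# and the parallel sizes; check them against the matching config.
sdar_moe_cmd() {
  echo "python -m diffulex_bench.main" \
    "--config $SDAR_MOE_CONFIG" \
    "--model-path $SDAR_MOE_CKPT" \
    "--model-name sdar_moe" \
    "--decoding-strategy multi_bd" \
    "--tensor-parallel-size 1" \
    "--data-parallel-size 1" \
    "--dataset gsm8k_diffulex" \
    "--temperature 0.0" \
    "--max-tokens 256"
}

sdar_moe_cmd
```

Once the printed line matches your setup, run it directly. A larger MoE checkpoint may need a higher --tensor-parallel-size than the value assumed here.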