# Benchmark

Use the benchmark entry point to run evaluation workloads such as GSM8K, HumanEval, or MBPP through `diffulex_bench`.
Generic form:

```bash
python -m diffulex_bench.main \
    --config path/to/config.yml \
    --log-file path/to/run.log \
    --log-level DEBUG
```
The config file provides the engine and evaluation settings. In the repository, the common pattern is:

- `--config diffulex_bench/configs/<name>.yml`
- optional overrides such as `--model-path`, `--tokenizer-path`, `--model-name`, `--decoding-strategy`, `--tensor-parallel-size`, `--data-parallel-size`, `--dataset`, `--dataset-limit`, `--temperature`, `--max-tokens`, and `--output-dir`
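The exact schema of these config files is defined by the repository's `diffulex_bench/configs/` directory. Purely as an illustration, a config covering the settings above might look like the sketch below; every key name here is an assumption that mirrors the CLI flags, not the verified schema, so check an actual file such as `diffulex_bench/configs/llada_instruct_gsm8k.yml` for the real keys:

```yaml
# Hypothetical config sketch -- key names mirror the CLI flags above
# and are assumptions, not the verified diffulex_bench schema.
model_path: /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct
tokenizer_path: /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct
model_name: llada
decoding_strategy: d2f
tensor_parallel_size: 2
data_parallel_size: 1
dataset: gsm8k_diffulex
dataset_limit: 100
temperature: 0.0
max_tokens: 256
output_dir: ./results
```

The per-model commands below then pass the same settings as command-line overrides on top of a config file.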
## Supported models

### D2F-LLaDA

```bash
python -m diffulex_bench.main \
    --config diffulex_bench/configs/llada_instruct_gsm8k.yml \
    --model-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
    --tokenizer-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
    --model-name llada \
    --decoding-strategy d2f \
    --use-lora \
    --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_LLaDA_Instruct_8B_Lora \
    --tensor-parallel-size 2 \
    --data-parallel-size 1 \
    --dataset gsm8k_diffulex \
    --dataset-limit 100 \
    --temperature 0.0 \
    --max-tokens 256
```
### D2F-Dream

```bash
python -m diffulex_bench.main \
    --config diffulex_bench/configs/dream_base_gsm8k.yml \
    --model-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
    --tokenizer-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
    --model-name dream \
    --decoding-strategy d2f \
    --use-lora \
    --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_Dream_Base_7B_Lora \
    --tensor-parallel-size 2 \
    --data-parallel-size 1 \
    --dataset gsm8k_diffulex \
    --dataset-limit 100 \
    --temperature 0.0 \
    --max-tokens 256
```
### Fast-dLLM-v2

```bash
python -m diffulex_bench.main \
    --config diffulex_bench/configs/fast_dllm_v2_gsm8k.yml \
    --model-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
    --tokenizer-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
    --model-name fast_dllm_v2 \
    --decoding-strategy multi_bd \
    --tensor-parallel-size 2 \
    --data-parallel-size 1 \
    --dataset gsm8k_diffulex \
    --dataset-limit 100 \
    --temperature 0.0 \
    --max-tokens 256
```
### SDAR

```bash
python -m diffulex_bench.main \
    --config diffulex_bench/configs/sdar_chat_gsm8k.yml \
    --model-path /YOUR-CKPT-PATH/JetLM/SDAR-1.7B-Chat-b32 \
    --model-name sdar \
    --decoding-strategy multi_bd \
    --tensor-parallel-size 1 \
    --data-parallel-size 1 \
    --dataset gsm8k_diffulex \
    --temperature 0.0 \
    --max-tokens 256
```
### SDAR-MoE

Use the same benchmark entry point with a matching `sdar_moe` config. The repository already treats `sdar_moe` as a supported model family; keep the same benchmark structure as SDAR and point the model path at your SDAR-MoE checkpoint.
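As a sketch only, such a run might mirror the SDAR command above with the model name swapped. The config filename and checkpoint path below are placeholders, not verified repository files; substitute the actual `sdar_moe` config shipped in `diffulex_bench/configs/` and your own checkpoint location:

```bash
# Hypothetical invocation mirroring the SDAR command above.
# The config name and checkpoint path are assumptions, not verified files.
python -m diffulex_bench.main \
    --config diffulex_bench/configs/<your-sdar-moe-config>.yml \
    --model-path /YOUR-CKPT-PATH/<your-sdar-moe-checkpoint> \
    --model-name sdar_moe \
    --decoding-strategy multi_bd \
    --tensor-parallel-size 1 \
    --data-parallel-size 1 \
    --dataset gsm8k_diffulex \
    --temperature 0.0 \
    --max-tokens 256
```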