# Benchmark

Use the benchmark entry point when you want to run evaluation workloads such as GSM8K, HumanEval, or MBPP through `diffulex_bench`.

Generic form:

```bash
python -m diffulex_bench.main \
  --config path/to/config.yml \
  --log-file path/to/run.log \
  --log-level DEBUG
```

The config file provides the engine and evaluation settings. In the repository, the common pattern is:

- `--config diffulex_bench/configs/.yml`
- optional overrides like `--model-path`, `--tokenizer-path`, `--model-name`, `--decoding-strategy`, `--tensor-parallel-size`, `--data-parallel-size`, `--dataset`, `--dataset-limit`, `--temperature`, `--max-tokens`, and `--output-dir`

## Supported models

### D2F-LLaDA

```bash
python -m diffulex_bench.main \
  --config diffulex_bench/configs/llada_instruct_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
  --tokenizer-path /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \
  --model-name llada \
  --decoding-strategy d2f \
  --use-lora \
  --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_LLaDA_Instruct_8B_Lora \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256
```

### D2F-Dream

```bash
python -m diffulex_bench.main \
  --config diffulex_bench/configs/dream_base_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
  --tokenizer-path /YOUR-CKPT-PATH/Dream-org/Dream-v0-Base-7B \
  --model-name dream \
  --decoding-strategy d2f \
  --use-lora \
  --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_Dream_Base_7B_Lora \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256
```

### Fast-dLLM-v2

```bash
python -m diffulex_bench.main \
  --config diffulex_bench/configs/fast_dllm_v2_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
  --tokenizer-path /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \
  --model-name fast_dllm_v2 \
  --decoding-strategy multi_bd \
  --tensor-parallel-size 2 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --dataset-limit 100 \
  --temperature 0.0 \
  --max-tokens 256
```

### SDAR

```bash
python -m diffulex_bench.main \
  --config diffulex_bench/configs/sdar_chat_gsm8k.yml \
  --model-path /YOUR-CKPT-PATH/JetLM/SDAR-1.7B-Chat-b32 \
  --model-name sdar \
  --decoding-strategy multi_bd \
  --tensor-parallel-size 1 \
  --data-parallel-size 1 \
  --dataset gsm8k_diffulex \
  --temperature 0.0 \
  --max-tokens 256
```

### SDAR-MoE

Use the same benchmark entry point and a matching `sdar_moe` config. The repository already treats `sdar_moe` as a supported model family; keep the same benchmark structure as SDAR and set the model path to your SDAR-MoE checkpoint.
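
As a rough sketch of what "the same benchmark structure as SDAR" could look like, the script below assembles an SDAR-MoE command from the SDAR flags and prints it for review before anything is launched. Several parts are assumptions rather than documented values: the config filename and checkpoint path are placeholders you must replace, and `--model-name sdar_moe` and `--decoding-strategy multi_bd` are inferred from the model family name and the SDAR example. The `CONFIG`, `MODEL_PATH`, and `RUN` environment variables are hypothetical conveniences of this script, not options of `diffulex_bench`.

```bash
#!/usr/bin/env bash
# Sketch only: builds an SDAR-MoE benchmark command modeled on the SDAR example.
set -euo pipefail

# Placeholders (assumptions): point these at your real config and checkpoint.
CONFIG="${CONFIG:-diffulex_bench/configs/YOUR_SDAR_MOE_CONFIG.yml}"
MODEL_PATH="${MODEL_PATH:-/YOUR-CKPT-PATH/YOUR-SDAR-MOE-CHECKPOINT}"

# Flags mirror the SDAR command; the sdar_moe model name is an assumption.
cmd=(
  python -m diffulex_bench.main
  --config "$CONFIG"
  --model-path "$MODEL_PATH"
  --model-name sdar_moe
  --decoding-strategy multi_bd
  --tensor-parallel-size 1
  --data-parallel-size 1
  --dataset gsm8k_diffulex
  --temperature 0.0
  --max-tokens 256
)

# Always print the assembled command so it can be checked first.
printf '%s ' "${cmd[@]}"
echo

# Execute only when explicitly requested with RUN=1.
if [[ "${RUN:-0}" == "1" ]]; then
  "${cmd[@]}"
fi
```

Running the script with no variables set only prints the command. Once the config and checkpoint paths are real, `RUN=1 CONFIG=... MODEL_PATH=... bash run_sdar_moe.sh` (the script name is arbitrary) would launch the benchmark. Keeping the flags in a Bash array avoids quoting problems when paths contain spaces.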