# Server Use the HTTP server entry point when you want an interactive service instead of a benchmark run. Generic form: ```bash python -m diffulex.server.launch \ --model /path/to/model \ --model-name \ --decoding-strategy \ --tensor-parallel-size 1 \ --data-parallel-size 1 \ --max-model-len 2048 \ --max-num-batched-tokens 4096 \ --max-num-reqs 128 \ --gpu-memory-utilization 0.9 ``` The server process accepts the same core engine arguments as the benchmark path, plus HTTP-specific flags: - `--host` - `--port` - `--log-level` - `--device-ids` - `--zmq-command-addr` - `--zmq-event-addr` ## Supported models ### Fast-dLLM-v2 ```bash python -m diffulex.server.launch \ --model /YOUR-CKPT-PATH/Efficient-Large-Model/Fast_dLLM_v2_7B \ --model-name fast_dllm_v2 \ --decoding-strategy multi_bd \ --sampling-mode naive \ --tensor-parallel-size 2 \ --data-parallel-size 1 \ --max-model-len 1024 \ --max-num-batched-tokens 1024 \ --max-num-reqs 24 \ --gpu-memory-utilization 0.4 \ --block-size 32 \ --buffer-size 1 \ --accept-threshold 0.95 \ --semi-complete-threshold 0.9 \ --add-block-threshold 0.1 \ --enforce-eager ``` ### D2F-LLaDA ```bash python -m diffulex.server.launch \ --model /YOUR-CKPT-PATH/GSAI-ML/LLaDA-8B-Instruct \ --model-name llada \ --decoding-strategy d2f \ --tensor-parallel-size 2 \ --data-parallel-size 1 \ --use-lora \ --lora-path /YOUR-CKPT-PATH/SJTU-DENG-Lab/D2F_LLaDA_Instruct_8B_Lora \ --pre-merge-lora \ --max-model-len 2048 \ --max-num-batched-tokens 2048 \ --max-num-reqs 32 \ --accept-threshold 0.95 \ --semi-complete-threshold 0.9 \ --add-block-threshold 0.1 \ --enforce-eager ``` ### SDAR ```bash python -m diffulex.server.launch \ --model /YOUR-CKPT-PATH/JetLM/SDAR-1.7B-Chat-b32 \ --host 0.0.0.0 \ --port 8000 \ --model-name sdar \ --decoding-strategy multi_bd \ --tensor-parallel-size 1 \ --data-parallel-size 1 \ --device-ids 1 \ --block-size 32 \ --buffer-size 4 \ --page-size 32 \ --max-num-batched-tokens 4096 \ --max-num-reqs 128 \ --max-model-len 2048 \ --gpu-memory-utilization 0.5 \ --kv-cache-layout unified \ --add-block-threshold 0.1 \ --semi-complete-threshold 0.9 \ --accept-threshold 0.95 ```