User Guide
Use these pages as reference material while editing configs, command lines, or deployment settings.
Configuration: engine, sampler, benchmark, and runtime parameter reference.
Models: model family, strategy, and sampling compatibility.
Benchmark: config-first evaluation workflow.
Server: HTTP serving commands, request formats, and local demo visualization.
Features: focused pages for prefix caching, CUDA Graph, LoRA, parallelism, and kernels.
Troubleshooting: common environment, CUDA, config, and serving failures.
Start with Configuration when a command line is unclear.
Use Models before combining a new model_name, decoding_strategy, and sampling_mode.