User Guide

Use these pages as reference material while editing configs, command lines, or deployment settings.

  • Configuration: engine, sampler, benchmark, and runtime parameter reference.

  • Models: model family, strategy, and sampling compatibility.

  • Benchmark: config-first evaluation workflow.

  • Server: HTTP serving commands, request formats, and local demo visualization.

  • Features: focused pages for prefix caching, CUDA Graph, LoRA, parallelism, and kernels.

  • Troubleshooting: common environment, CUDA, config, and serving failures.

Start with Configuration when a command line is unclear. Check Models for compatibility before combining a new model_name, decoding_strategy, and sampling_mode.