# Prefix Caching

Prefix caching reuses KV cache state already computed for a compatible prompt
prefix across requests, so the shared portion does not have to be recomputed. It
is a capacity and latency optimization for workloads with shared prompts or
repeated prefixes.
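The toy sketch below shows the general idea behind prefix reuse. It uses the common technique of hashing fixed-size token blocks in a chain, so a block only matches when everything before it also matches. This is a generic illustration under stated assumptions, not this engine's implementation. `ToyPrefixCache`, `BLOCK_SIZE`, and the placeholder string standing in for real KV tensors are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical block size: number of tokens whose KV state is cached as one unit.
BLOCK_SIZE = 4


class ToyPrefixCache:
    """Toy prefix cache keyed by a hash chain over full token blocks.

    Stores a placeholder string instead of real KV tensors (hypothetical).
    """

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: Dict[int, str] = {}

    def _block_keys(self, tokens: Sequence[int]) -> List[int]:
        keys: List[int] = []
        parent: Optional[int] = None
        n_full = len(tokens) // self.block_size  # trailing partial block is never cached
        for i in range(n_full):
            block = tuple(tokens[i * self.block_size:(i + 1) * self.block_size])
            # Chain with the parent hash so a block matches only if its whole prefix matches.
            parent = hash((parent, block))
            keys.append(parent)
        return keys

    def lookup_and_fill(self, tokens: Sequence[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        keys = self._block_keys(tokens)
        reused_blocks = 0
        for key in keys:
            if key not in self.blocks:
                break  # first miss ends the reusable prefix
            reused_blocks += 1
        # "Compute" the missing blocks and store them for later requests.
        for key in keys[reused_blocks:]:
            self.blocks[key] = "kv-placeholder"
        reused = reused_blocks * self.block_size
        return reused, len(tokens) - reused


cache = ToyPrefixCache()
shared = [1, 2, 3, 4, 5, 6, 7, 8]  # shared 8-token prompt prefix
print(cache.lookup_and_fill(shared + [9, 10, 11, 12]))  # (0, 12): nothing cached yet
print(cache.lookup_and_fill(shared + [20, 21]))         # (8, 2): shared prefix reused
```

The second request reuses both full blocks of the shared prefix and only computes its two new tokens; this reuse is where the capacity and latency savings come from.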

## Configuration

`enable_prefix_caching` controls whether compatible strategies may use prefix
caching.

The value is a boolean and defaults to `True`. Strategy normalization still has
the final say: `decoding_strategy="d2f"` forces prefix caching off, while
`multi_bd` and `dmax` keep it enabled as long as the rest of the cache layout is
compatible.
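The normalization rule above can be sketched as a small pure function. This is a hedged illustration, not the project's code. `CacheConfig`, `normalize_prefix_caching`, and the `layout_compatible` flag are hypothetical; `layout_compatible` stands in for whatever check the real engine performs on settings such as `kv_cache_layout`. The sketch also assumes an incompatible layout turns prefix caching off for `multi_bd` and `dmax`, and it refuses to guess for strategies this page does not describe.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical subset of the engine config relevant to prefix caching."""
    decoding_strategy: str
    enable_prefix_caching: bool = True  # documented default
    # Hypothetical stand-in for "the rest of the cache layout is compatible".
    layout_compatible: bool = True


def normalize_prefix_caching(cfg: CacheConfig) -> CacheConfig:
    """Return a copy of cfg with enable_prefix_caching adjusted for the strategy."""
    if cfg.decoding_strategy == "d2f":
        # d2f always forces prefix caching off.
        return replace(cfg, enable_prefix_caching=False)
    if cfg.decoding_strategy in ("multi_bd", "dmax"):
        # Keep the requested value only while the cache layout is compatible (assumption).
        return replace(
            cfg,
            enable_prefix_caching=cfg.enable_prefix_caching and cfg.layout_compatible,
        )
    # Other strategies are not described on this page, so the sketch does not guess.
    raise ValueError(f"strategy {cfg.decoding_strategy!r} is not covered by this sketch")


print(normalize_prefix_caching(CacheConfig("d2f")).enable_prefix_caching)   # False
print(normalize_prefix_caching(CacheConfig("dmax")).enable_prefix_caching)  # True
```

Returning a new frozen config, rather than mutating the input, keeps the user's requested value visible next to the normalized one when debugging.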

| Surface | How to set it | Notes |
| --- | --- | --- |
| Server CLI | Prefix caching is enabled by default; add `--disable-prefix-caching` to turn it off. | Useful when debugging cache layout or request state. |
| Benchmark CLI | Use `--enable-prefix-caching` or `--no-enable-prefix-caching`. | Makes prefix-cache behavior explicit in experiment commands. |
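The two flag styles in the table can both map onto the same boolean field. The sketch below shows one way to do this with Python's standard `argparse`. It is an assumption for illustration: the real server and benchmark CLIs may not use `argparse`, and the benchmark default of `True` is assumed to match the config default. `build_server_parser` and `build_benchmark_parser` are hypothetical names.

```python
import argparse


def build_server_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="server")
    # store_false: the field defaults to True and the flag can only switch it off.
    p.add_argument("--disable-prefix-caching", dest="enable_prefix_caching",
                   action="store_false")
    return p


def build_benchmark_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="benchmark")
    # BooleanOptionalAction (Python 3.9+) generates both --enable-prefix-caching
    # and --no-enable-prefix-caching. Default assumed to match the config default.
    p.add_argument("--enable-prefix-caching", action=argparse.BooleanOptionalAction,
                   default=True)
    return p


print(build_server_parser().parse_args([]).enable_prefix_caching)  # True
print(build_server_parser().parse_args(["--disable-prefix-caching"]).enable_prefix_caching)  # False
print(build_benchmark_parser().parse_args(["--no-enable-prefix-caching"]).enable_prefix_caching)  # False
```

The paired `--enable`/`--no-enable` form lets an experiment command state the setting explicitly either way, while the single `--disable` flag suits a server where the default is already on.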

## When to Disable It

Disable prefix caching while debugging cache layout, request state, or strategy
changes, so that no request depends on KV state reused from another request.
Once outputs are confirmed correct, re-enable it for throughput and latency
measurements.
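One way to follow this workflow is to run the same prompts with prefix caching off and on and compare the results. The sketch below assumes a hypothetical hook `run(prompts, enable_prefix_caching)` that returns one output per prompt using deterministic (for example, greedy) decoding. Exact-match comparison is a strict check: small numeric differences between cached and uncached paths can make it flag benign divergences, so treat mismatches as leads to investigate rather than proof of a bug.

```python
from typing import Callable, List, Sequence

# Hypothetical engine hook: (prompts, enable_prefix_caching) -> one output per prompt.
RunFn = Callable[[Sequence[str], bool], List[str]]


def prefix_cache_parity(run: RunFn, prompts: Sequence[str]) -> List[int]:
    """Return indices of prompts whose outputs differ with prefix caching on vs off."""
    baseline = run(prompts, False)  # correctness reference: no cross-request reuse
    cached = run(prompts, True)     # same requests with prefix reuse enabled
    if len(baseline) != len(prompts) or len(cached) != len(prompts):
        raise ValueError("run() must return exactly one output per prompt")
    return [i for i, (a, b) in enumerate(zip(baseline, cached)) if a != b]


def fake_run(prompts: Sequence[str], enable_prefix_caching: bool) -> List[str]:
    # Stand-in for a real engine call; ignores the flag, so outputs always match.
    return [p.upper() for p in prompts]


print(prefix_cache_parity(fake_run, ["shared prefix a", "shared prefix b"]))  # []
```

An empty list means every prompt produced the same output in both modes, which is the point at which the throughput and latency comparison becomes meaningful.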

## Related Arguments

| Surface | Names | Notes |
| --- | --- | --- |
| Python/config | `enable_prefix_caching` | Main config field before strategy normalization. |
| CLI | `--disable-prefix-caching`, `--enable-prefix-caching`, `--no-enable-prefix-caching` | Server and benchmark CLIs expose the toggle with different flag names. |
| Related config | `decoding_strategy`, `multi_block_prefix_full`, `kv_cache_layout` | Strategy and cache layout decide whether prefix reuse is actually compatible. |