E2E encryption for tensor forwards and control messages
enable_autonat
boolean
true
NAT detection. Disable on WSL2 to reduce noise
enable_dcutr
boolean
true
Hole punching. Disable on WSL2 to reduce noise
tensor_compression
boolean
true
Zstd compression for tensor payloads
prefix_kv_compression
boolean
false
Zstd compression for cross-node prefix-KV snapshot wire frames. Off by default: it is a meaningful win on WAN links, where wire size is the bottleneck, and roughly neutral on localhost. Receivers always decompress regardless of this flag.
tensor_compress_level
integer
1
Zstd compression level (1-22, 1 = fastest). Shared between tensor and prefix-KV.
tensor_compress_threshold
integer
1024
Min payload bytes before compression. Shared between tensor and prefix-KV.
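To illustrate how tensor_compression, tensor_compress_level, and tensor_compress_threshold might interact, here is a minimal Python sketch. It is not the project's implementation: zlib stands in for Zstd so the example runs on the standard library alone, the one-byte frame flag and the names encode_payload and decode_payload are hypothetical, and treating the threshold as inclusive is an assumption.

```python
import zlib


def encode_payload(payload: bytes, *, enabled: bool = True,
                   level: int = 1, threshold: int = 1024) -> bytes:
    """Return a frame: one flag byte (1 = compressed) followed by the body."""
    # Assumption: payloads of at least `threshold` bytes are compressed.
    if enabled and len(payload) >= threshold:
        # zlib accepts levels 1-9, whereas Zstd accepts 1-22, so clamp here.
        zlib_level = min(max(level, 1), 9)
        return b"\x01" + zlib.compress(payload, zlib_level)
    # Small payloads (or compression disabled) are sent as-is.
    return b"\x00" + payload


def decode_payload(frame: bytes) -> bytes:
    """Decode based on the frame's flag byte, not on local configuration."""
    if frame[:1] == b"\x01":
        return zlib.decompress(frame[1:])
    return frame[1:]
```

Because the flag travels with the frame, a receiver in this sketch decodes correctly whatever its own compression setting is, which mirrors the "receivers always decompress" behavior described for prefix_kv_compression.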
Maximum request batch size; 1 disables batching. When greater than 1, local and remote forward requests are batched together via BatchForwarder, which fills pipeline bubbles in distributed inference.
batch_timeout_ms
integer
50
Milliseconds to wait for additional requests before dispatching a partial batch. 0 = dispatch immediately (purely opportunistic batching).
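The interplay between the batch size limit and batch_timeout_ms can be sketched as follows. This is a minimal illustration using Python's standard queue module, not the BatchForwarder implementation; the function name collect_batch and the parameter name max_batch_size are hypothetical.

```python
import queue
import time


def collect_batch(q: queue.Queue, max_batch_size: int, batch_timeout_ms: int) -> list:
    """Gather up to max_batch_size requests, waiting at most batch_timeout_ms
    after the first request arrives."""
    batch = [q.get()]  # block until at least one request is available
    deadline = time.monotonic() + batch_timeout_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        try:
            if remaining <= 0:
                # Deadline passed (or timeout is 0): take only what is already queued.
                batch.append(q.get_nowait())
            else:
                batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived in time; dispatch a partial batch
    return batch
```

With a timeout of 0 the loop never waits, so a batch contains only requests that were already queued when the first one was picked up.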
speculative_decoding
boolean
false
Enable speculative decoding
speculative_gamma
integer
4
Draft tokens per verification step
draft_model_path
path
none
Path to draft model
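As a rough picture of what speculative_gamma controls, the sketch below shows one step of a greedy variant of speculative decoding: the draft model proposes gamma tokens, and the target model keeps the longest agreeing prefix and then contributes one token of its own. This is an assumption-laden illustration, not the project's implementation; speculative_step, draft_next, and target_next are hypothetical names, and a real system would verify all proposals in a single target forward pass rather than one call per token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, gamma: int = 4) -> List[int]:
    """Return the tokens accepted in one draft-then-verify step."""
    # 1. The draft model proposes gamma tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        token = draft_next(ctx)
        proposed.append(token)
        ctx.append(token)

    # 2. The target model checks each proposal in order.
    accepted: List[int] = []
    ctx = list(prefix)
    for token in proposed:
        expected = target_next(ctx)
        if expected != token:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. All proposals matched: the target adds one extra token.
    accepted.append(target_next(ctx))
    return accepted
```

Each step therefore yields between 1 and gamma + 1 tokens, which is why a larger gamma only pays off when the draft model agrees with the target often.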
max_split_model_memory_mb
integer
none
Max GPU memory (MB) for the split-model cache
tp_max_latency_ms
integer
10
Max peer latency (ms) for tensor parallelism groups
local_embedding_privacy
boolean
false
Embed tokens locally before sending to first segment. Remote nodes never see raw token IDs
encrypted_pipeline
boolean
false
Forces the first and last segments onto the local node (boomerang topology), so no remote node sees plaintext. Adds about one round trip per token. Can be overridden per model via the API. Requires shard 0 and the final shard to be available locally.
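The shard requirement for encrypted_pipeline can be expressed as a small check. The sketch below is illustrative only: plan_boomerang_route is a hypothetical helper, shards are assumed to be numbered from 0 to num_shards - 1, and for simplicity every middle shard is labeled remote even if the local node also holds it.

```python
from typing import List, Set


def plan_boomerang_route(local_shards: Set[int], num_shards: int) -> List[str]:
    """Return a per-shard placement for a boomerang topology, or raise
    ValueError if the local node lacks the first or final shard."""
    first, last = 0, num_shards - 1
    missing = sorted({first, last} - set(local_shards))
    if missing:
        raise ValueError(f"encrypted_pipeline requires shards {missing} locally")
    # The first and last segments run locally; everything in between is remote.
    return ["local" if s in (first, last) else "remote" for s in range(num_shards)]
```

For example, a node holding shards 0 and 3 of a four-shard model gets the route local, remote, remote, local, so the request leaves the node and comes back once per token.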
Requires --features claude-subscription at build time. Managed via the dashboard or PUT /api/admin/providers.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Route claude-* model requests through the local CLI |
| claude_binary | string | "claude" | Path to the claude binary |
| default_model | string | none | Override model for all requests |
| max_concurrent | integer | 3 | Maximum concurrent subprocess invocations |
| timeout_secs | integer | 300 | Per-request timeout in seconds |
| working_dir | string | (temp dir) | Working directory for the subprocess. Empty or "none" uses the system temp dir (recommended for API proxy use). Set to a project path for context-aware responses. |
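The defaults above, and the working_dir fallback in particular, can be modeled as follows. This is a sketch for illustration rather than the project's code: the class name ClaudeProviderConfig and the method resolved_working_dir are hypothetical, and matching "none" case-insensitively is an assumption.

```python
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaudeProviderConfig:
    # Field names and defaults follow the option table; the class itself is hypothetical.
    enabled: bool = False
    claude_binary: str = "claude"
    default_model: Optional[str] = None
    max_concurrent: int = 3
    timeout_secs: int = 300
    working_dir: str = ""

    def resolved_working_dir(self) -> str:
        """Fall back to the system temp dir when working_dir is empty or "none"."""
        if self.working_dir.strip().lower() in ("", "none"):
            return tempfile.gettempdir()
        return self.working_dir
```

Keeping the fallback in one place means callers never need to special-case the empty and "none" spellings themselves.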