Prometheus Metrics

SwarmLLM exposes a Prometheus-compatible metrics endpoint at GET /metrics. The endpoint requires no authentication, following the common convention for metrics endpoints.
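The endpoint serves the Prometheus text exposition format. As a minimal sketch, the snippet below parses unlabeled samples from a payload into a name-to-value dict; the sample text is illustrative, not actual SwarmLLM output (a real client would fetch it from localhost:8800/metrics).

```python
# Hypothetical sample of the text exposition format served at GET /metrics.
SAMPLE = """\
# HELP swarmllm_peers_connected Number of connected peers
# TYPE swarmllm_peers_connected gauge
swarmllm_peers_connected 12
# TYPE swarmllm_inference_requests_total counter
swarmllm_inference_requests_total 4821
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled samples; lines starting with '#' are comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # each sample line is "<metric_name> <value>"
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

print(parse_metrics(SAMPLE))
```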

Available Metrics

Core Metrics

| Metric | Type | Description |
| --- | --- | --- |
| swarmllm_peers_connected | gauge | Number of connected peers |
| swarmllm_inference_requests_total | counter | Total inference requests processed |
| swarmllm_credits_balance | gauge | Current credit balance |
| swarmllm_shards_hosted | gauge | Number of locally hosted shards |
| swarmllm_inference_latency_seconds | histogram | Inference request latency |

Channel Metrics

Internal channel health metrics for monitoring backpressure:

| Metric | Type | Description |
| --- | --- | --- |
| swarmllm_channel_capacity{channel="..."} | gauge | Channel buffer capacity |
| swarmllm_channel_sent_total{channel="..."} | counter | Messages sent through channel |
| swarmllm_channel_dropped_total{channel="..."} | counter | Messages dropped due to backpressure |
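Backpressure is easiest to reason about as the fraction of messages dropped over a scrape interval, computed from the deltas of the two counters above. A minimal sketch, with made-up sample numbers:

```python
def drop_ratio(sent_prev: int, sent_now: int,
               dropped_prev: int, dropped_now: int) -> float:
    """Fraction of messages dropped between two scrapes of the counters."""
    sent_delta = sent_now - sent_prev
    dropped_delta = dropped_now - dropped_prev
    total = sent_delta + dropped_delta
    return dropped_delta / total if total else 0.0

# e.g. one channel saw 980 sends and 20 drops since the previous scrape
ratio = drop_ratio(sent_prev=10_000, sent_now=10_980,
                   dropped_prev=50, dropped_now=70)
print(f"{ratio:.1%}")  # → 2.0%
```

A sustained nonzero ratio indicates the channel's consumer is not keeping up with its producers.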

Histogram Buckets

The latency histogram uses these bucket boundaries (in seconds): 0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, +Inf
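To make the bucket layout concrete, the sketch below mirrors how Prometheus's histogram_quantile() estimates a quantile from cumulative bucket counts: find the bucket containing the target rank, then interpolate linearly within it. The counts are illustrative.

```python
# Bucket upper bounds for swarmllm_inference_latency_seconds (from the docs).
BOUNDS = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, float("inf")]

def quantile(q: float, cumulative_counts: list[int]) -> float:
    """Estimate the q-th quantile by linear interpolation within the
    bucket containing the q-th sample, like histogram_quantile()."""
    total = cumulative_counts[-1]
    rank = q * total
    lower, prev_count = 0.0, 0
    for bound, count in zip(BOUNDS, cumulative_counts):
        if count >= rank:
            if bound == float("inf"):
                return lower  # fall back to the last finite bound
            return lower + (bound - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = bound, count
    return lower

# Cumulative observation counts per bucket (le=0.01, le=0.05, ..., le=+Inf)
counts = [10, 40, 70, 85, 92, 96, 98, 99, 100, 100]
print(quantile(0.50, counts))
```

With these counts the median falls in the (0.05, 0.1] bucket, so the estimate lands between those bounds rather than at an exact observed value; this is the same interpolation error inherent in the PromQL queries below.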

Scraping Configuration

Add to your prometheus.yml:

scrape_configs:
  - job_name: "swarmllm"
    static_configs:
      - targets: ["localhost:8800"]

Example Queries

# Request rate (requests per second over 5 minutes)
rate(swarmllm_inference_requests_total[5m])

# P50 latency
histogram_quantile(0.50, rate(swarmllm_inference_latency_seconds_bucket[5m]))

# P99 latency
histogram_quantile(0.99, rate(swarmllm_inference_latency_seconds_bucket[5m]))

# Average latency
rate(swarmllm_inference_latency_seconds_sum[5m]) / rate(swarmllm_inference_latency_seconds_count[5m])

Health Check

GET /health/ready

Readiness probe reporting per-subsystem status. Returns 200 when all subsystems are ready and 503 otherwise. Like /metrics, it requires no authentication.

{
  "ready": true,
  "subsystems": {
    "network": true,
    "inference_router": true,
    "api_server": true,
    ...
  }
}
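A deployment script can gate on this payload by listing the subsystems that are still down. The sketch below uses a hypothetical sample response; a real check would fetch http://localhost:8800/health/ready and also treat a 503 status code as not ready.

```python
import json

# Hypothetical /health/ready response for illustration.
SAMPLE = ('{"ready": false, "subsystems": '
          '{"network": true, "inference_router": true, "api_server": false}}')

def not_ready_subsystems(payload: str) -> list[str]:
    """Return the names of subsystems reporting not-ready."""
    body = json.loads(payload)
    return [name for name, ok in body["subsystems"].items() if not ok]

print(not_ready_subsystems(SAMPLE))  # → ['api_server']
```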