# Prometheus Metrics
SwarmLLM exposes a Prometheus-compatible metrics endpoint at `GET /metrics`. No authentication is required, following the standard convention for metrics endpoints.
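Metric values can be read from the plain-text exposition format with nothing beyond the standard library. The sketch below parses a sample scrape; the parser is illustrative (it ignores timestamps and assumes no spaces inside label values), not part of SwarmLLM itself.

```python
def metric_value(payload: str, name: str) -> float:
    """Return the value of the first sample matching `name`.

    Minimal sketch: skips # HELP/# TYPE comment lines, matches bare
    metric names and labeled ones like name{channel="..."}.
    """
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            return float(value)
    raise KeyError(name)

# Sample scrape payload in Prometheus text exposition format
sample = """\
# HELP swarmllm_peers_connected Number of connected peers
# TYPE swarmllm_peers_connected gauge
swarmllm_peers_connected 17
"""
print(metric_value(sample, "swarmllm_peers_connected"))  # 17.0
```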
## Available Metrics

### Core Metrics
| Metric | Type | Description |
|---|---|---|
| `swarmllm_peers_connected` | gauge | Number of connected peers |
| `swarmllm_inference_requests_total` | counter | Total inference requests processed |
| `swarmllm_credits_balance` | gauge | Current credit balance |
| `swarmllm_shards_hosted` | gauge | Number of locally hosted shards |
| `swarmllm_inference_latency_seconds` | histogram | Inference request latency |
### Channel Metrics
Internal channel health metrics for monitoring backpressure:
| Metric | Type | Description |
|---|---|---|
| `swarmllm_channel_capacity{channel="..."}` | gauge | Channel buffer capacity |
| `swarmllm_channel_sent_total{channel="..."}` | counter | Messages sent through channel |
| `swarmllm_channel_dropped_total{channel="..."}` | counter | Messages dropped due to backpressure |
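The sent/dropped counters follow the usual drop-on-full backpressure pattern: a full buffer rejects new messages rather than blocking the producer. A minimal sketch of that pattern (illustrative only, not SwarmLLM's actual channel implementation):

```python
import queue

class DroppingChannel:
    """Bounded channel that drops new messages when full,
    mirroring the capacity/sent/dropped metrics above."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)
        self.capacity = capacity
        self.sent_total = 0
        self.dropped_total = 0

    def send(self, msg) -> bool:
        """Try to enqueue; count a drop instead of blocking when full."""
        try:
            self._q.put_nowait(msg)
            self.sent_total += 1
            return True
        except queue.Full:
            self.dropped_total += 1
            return False

ch = DroppingChannel(capacity=2)
results = [ch.send(i) for i in range(3)]  # third message is dropped
print(ch.sent_total, ch.dropped_total)  # 2 1
```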
## Histogram Buckets
The latency histogram uses these bucket boundaries (in seconds):
```text
0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, +Inf
```
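PromQL's `histogram_quantile` estimates quantiles by linear interpolation within these cumulative buckets, which is why quantile accuracy depends on bucket placement. A sketch of that estimation, using made-up cumulative counts for illustration:

```python
def estimate_quantile(q: float, buckets: list) -> float:
    """Approximate a quantile from cumulative histogram buckets via
    linear interpolation, in the spirit of PromQL's histogram_quantile.
    `buckets` is a list of (upper_bound, cumulative_count) ending at +Inf.
    """
    total = buckets[-1][1]
    rank = q * total
    lower, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return lower  # fall back to the highest finite bound
            span = count - prev_count
            frac = (rank - prev_count) / span if span else 0.0
            return lower + (bound - lower) * frac
        lower, prev_count = bound, count
    return lower

# Hypothetical scrape: 100 observations, mostly fast requests
buckets = [(0.01, 20), (0.05, 60), (0.1, 85), (0.25, 95),
           (0.5, 98), (1.0, 99), (2.5, 100), (5.0, 100),
           (10.0, 100), (float("inf"), 100)]
p50 = estimate_quantile(0.50, buckets)
print(round(p50, 4))  # 0.04
```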
## Scraping Configuration
Add to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: "swarmllm"
    static_configs:
      - targets: ["localhost:8800"]
```
## Example Queries

```promql
# Request rate (requests per second over 5 minutes)
rate(swarmllm_inference_requests_total[5m])

# P50 latency
histogram_quantile(0.50, rate(swarmllm_inference_latency_seconds_bucket[5m]))

# P99 latency
histogram_quantile(0.99, rate(swarmllm_inference_latency_seconds_bucket[5m]))

# Average latency
rate(swarmllm_inference_latency_seconds_sum[5m]) / rate(swarmllm_inference_latency_seconds_count[5m])
```
## Health Check

```text
GET /health/ready
```

Readiness probe reporting per-subsystem status. Returns `200` when ready, `503` otherwise. No authentication required.
```json
{
  "ready": true,
  "subsystems": {
    "network": true,
    "inference_router": true,
    "api_server": true,
    ...
  }
}
```
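A deployment script might gate on this endpoint by combining the status code with the response body. `is_ready` below is a hypothetical helper, shown here against a sample body rather than a live request:

```python
import json

def is_ready(status_code: int, body: str) -> bool:
    """Readiness gate: HTTP 200 and every subsystem reporting true."""
    if status_code != 200:
        return False
    payload = json.loads(body)
    return payload.get("ready", False) and all(payload.get("subsystems", {}).values())

# Sample response body matching the shape documented above
body = json.dumps({"ready": True,
                   "subsystems": {"network": True,
                                  "inference_router": True,
                                  "api_server": True}})
print(is_ready(200, body))  # True
print(is_ready(503, body))  # False
```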