Monitoring & Observability
Track performance, costs, and errors with comprehensive monitoring for your Korad.AI deployment.
Metrics
Key Metrics to Monitor
| Metric | Description | Alert Threshold |
|---|---|---|
| Request Rate | Requests per second | < 1 req/s for 5 min |
| Error Rate | Failed requests | > 5% |
| Latency (p50) | Median response time | > 1s |
| Latency (p95) | 95th percentile | > 5s |
| Optimization Rate | % of requests optimized | < 50% |
| Cost per Request | Average cost per request | > $0.10 |
| Cache Hit Rate | % requests cached | < 20% |
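The thresholds in the table above can be checked programmatically against a metrics snapshot. A minimal sketch; the metric names and helper are illustrative, not a Korad.AI API:

```python
# Alert thresholds from the table above; each predicate returns True
# when the metric crosses its threshold. Names are illustrative.
ALERT_THRESHOLDS = {
    "error_rate": lambda v: v > 0.05,           # > 5%
    "latency_p50_ms": lambda v: v > 1000,       # > 1s
    "latency_p95_ms": lambda v: v > 5000,       # > 5s
    "optimization_rate": lambda v: v < 0.50,    # < 50%
    "cost_per_request_usd": lambda v: v > 0.10, # > $0.10
    "cache_hit_rate": lambda v: v < 0.20,       # < 20%
}

def breached_alerts(metrics: dict) -> list:
    """Return the names of metrics that cross their alert threshold."""
    return [name for name, breached in ALERT_THRESHOLDS.items()
            if name in metrics and breached(metrics[name])]
```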
Built-in Metrics Endpoint
curl http://localhost:8081/metrics
Response:
{
  "optimizer": {
    "requests_total": 15420,
    "requests_per_second": 2.5,
    "error_rate": 0.01,
    "latency_p50_ms": 450,
    "latency_p95_ms": 1200,
    "latency_p99_ms": 3500
  },
  "optimization": {
    "tier_1_cache_hits": 3250,
    "tier_2_vanishing_context": 1200,
    "tier_3_rlm": 450,
    "tier_4_family_locked": 8900,
    "tier_5_savings_slider": 1620,
    "total_savings_usd": 125.50
  },
  "billing": {
    "total_cost_usd": 45.20,
    "theoretical_cost_usd": 170.70,
    "savings_percentage": 0.74
  }
}
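The billing figures are internally consistent: `savings_percentage` is the theoretical cost minus the actual cost, divided by the theoretical cost. A quick illustrative check (the helper name is not part of any Korad.AI SDK):

```python
# savings_percentage = (theoretical - actual) / theoretical,
# rounded to two decimals as in the /metrics payload above.
def savings_percentage(theoretical_usd: float, actual_usd: float) -> float:
    if theoretical_usd <= 0:
        return 0.0
    return round((theoretical_usd - actual_usd) / theoretical_usd, 2)
```

With the values above, 170.70 theoretical vs 45.20 actual yields 0.74.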
Logging
Log Levels
- ERROR - Errors requiring attention
- WARN - Warnings (rate limits, budget alerts)
- INFO - Request/response logging
- DEBUG - Detailed diagnostics
View Logs
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f optimizer
# Last 100 lines
docker-compose logs --tail=100 optimizer
Log Format
{
  "timestamp": "2025-02-08T10:30:00Z",
  "level": "INFO",
  "service": "optimizer",
  "request_id": "req-123",
  "virtual_key_id": "key-1",
  "model": "claude-sonnet-4-5-20250929",
  "original_tokens": 50000,
  "optimized_tokens": 5000,
  "strategy": "Vanishing-Context",
  "cost_cents": 15,
  "latency_ms": 1200
}
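Because each log line is structured JSON, derived metrics can be computed directly from the log stream. A minimal sketch using the field names above (the helper is illustrative):

```python
import json

# Compute the token reduction implied by one structured log line
# in the format shown above.
def token_reduction(log_line: str) -> float:
    entry = json.loads(log_line)
    original = entry["original_tokens"]
    return (original - entry["optimized_tokens"]) / original
```

For the sample entry above (50,000 tokens reduced to 5,000), this yields a 90% reduction.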
Dashboards
Grafana Dashboard
Import pre-built dashboards for:
- Overview - Request rate, error rate, latency
- Optimization - Savings by tier, optimization rate
- Billing - Cost breakdown, savings over time
- Performance - P50/P95/P99 latency
Example Dashboard JSON
{
  "dashboard": {
    "title": "Korad.AI Optimizer",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [{
          "expr": "rate(optimizer_requests_total[1m])"
        }]
      },
      {
        "title": "Optimization Savings",
        "targets": [{
          "expr": "sum(optimizer_savings_usd)"
        }]
      }
    ]
  }
}
Alerting
Configure Alerts
High Error Rate
alert: HighErrorRate
expr: rate(optimizer_errors_total[5m]) / rate(optimizer_requests_total[5m]) > 0.05
for: 5m
annotations:
  summary: "High error rate detected"
  description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"
Budget Alert
alert: BudgetAlert
expr: billing_current_month_usd > billing_budget_limit * 0.9
annotations:
  summary: "90% of budget used"
  description: "Current spend: {{ $value }} of budget"
Low Optimization Rate
alert: LowOptimizationRate
expr: rate(optimizer_optimized_requests_total[5m]) / rate(optimizer_requests_total[5m]) < 0.5
annotations:
  summary: "Low optimization rate"
  description: "Only {{ $value | humanizePercentage }} of requests are being optimized"
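These snippets are rule fragments; in Prometheus they belong in a rule file referenced from `prometheus.yml` via `rule_files`. A minimal sketch showing one of the rules in place (file and group names are illustrative; the other rules slot in alongside):

```yaml
# alerts.yml -- illustrative rule file; reference it from prometheus.yml
# with:  rule_files: ["alerts.yml"]
groups:
  - name: korad-billing
    rules:
      - alert: BudgetAlert
        expr: billing_current_month_usd > billing_budget_limit * 0.9
        annotations:
          summary: "90% of budget used"
```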
Notification Channels
- Slack - Send alerts to Slack channel
- Email - Email alerts on-call engineer
- PagerDuty - Create incidents for critical alerts
- Webhook - Custom webhook integration
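For the Slack channel, a minimal Alertmanager sketch, assuming Alertmanager is deployed alongside Prometheus; the webhook URL and channel name are placeholders:

```yaml
# alertmanager.yml -- minimal sketch; URL and channel are placeholders
route:
  receiver: slack-oncall
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T000/B000/XXXX"
        channel: "#korad-alerts"
```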
Tracing
Distributed Tracing
Enable tracing for request flow:
# In optimizer
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("optimize_request") as span:
    span.set_attribute("model", model)
    span.set_attribute("original_tokens", original_tokens)

    # Apply optimization
    with tracer.start_as_current_span("apply_tier_2"):
        # Vanishing Context logic
        pass
View Traces
# Run Jaeger (UI on 16686, OTLP gRPC ingest on 4317)
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest
# Open browser
open http://localhost:16686
Cost Monitoring
Real-Time Cost Tracking
def track_costs(response):
    """Track costs for monitoring."""
    return {
        "request_id": response.id,
        "model": response.model,
        "original_tokens": response.response_headers.get('X-Korad-Original-Tokens'),
        "optimized_tokens": response.response_headers.get('X-Korad-Optimized-Tokens'),
        "theoretical_cost": response.response_headers.get('X-Korad-Theoretical-Cost'),
        "actual_cost": response.response_headers.get('X-Korad-Actual-Cost'),
        "billed_amount": response.response_headers.get('X-Korad-Billed-Amount'),
        "savings": response.response_headers.get('X-Korad-Savings-USD'),
        "strategy": response.response_headers.get('X-Korad-Strategy')
    }
# Send to monitoring
response = client.chat.completions.create(...)
metrics = track_costs(response)
send_to_prometheus(metrics)
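Note that HTTP header values arrive as strings, so cast them before aggregating. A sketch that rolls up a batch of `track_costs()` records (the helper name is illustrative):

```python
# Aggregate per-request cost records into totals; values are cast
# from the string form returned by the X-Korad-* response headers.
def aggregate_savings(cost_records):
    total_billed = sum(float(r["billed_amount"]) for r in cost_records)
    total_saved = sum(float(r["savings"]) for r in cost_records)
    return {"billed_usd": round(total_billed, 2),
            "saved_usd": round(total_saved, 2)}
```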
Budget Alerts
def check_budget_spend():
    """Check if budget is exceeded."""
    current_spend = get_monthly_spend()
    budget_limit = get_budget_limit()

    if current_spend > budget_limit * 0.9:
        send_alert("90% of budget used")
    if current_spend >= budget_limit:
        send_alert("Budget exceeded!")
        # Optionally, block requests
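The threshold logic above can be factored into a pure function, which makes it easy to unit-test without stubbing the billing and alerting helpers (the function name is illustrative):

```python
# Pure-function variant of check_budget_spend(): mirrors the 90%
# warning and hard limit, returning a status instead of alerting.
def budget_status(current_spend: float, budget_limit: float) -> str:
    if current_spend >= budget_limit:
        return "exceeded"
    if current_spend > budget_limit * 0.9:
        return "warning"
    return "ok"
```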
Performance Monitoring
Latency Tracking
from prometheus_client import Histogram

# Create histogram with model/strategy labels
request_latency = Histogram(
    'optimizer_request_latency_seconds',
    'Request latency',
    ['model', 'strategy']
)

# Measure latency: a labelled metric must be bound with .labels()
# before .time() can be used as a decorator
@request_latency.labels(model='claude-sonnet', strategy='tier_2').time()
def process_request(request):
    # Process request
    return response
Optimization Rate
from prometheus_client import Counter

optimized_requests = Counter(
    'optimizer_optimized_requests_total',
    'Total optimized requests',
    ['tier']
)

# Increment counter
optimized_requests.labels(tier='tier_2').inc()
Integration Examples
Prometheus
# prometheus.yml
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['localhost:8081']
    metrics_path: '/metrics'
DataDog
from datadog import statsd
# Send metric
statsd.increment('optimizer.requests', tags=['model:claude-sonnet'])
statsd.gauge('optimizer.savings', 0.78)
New Relic
from newrelic import agent
# Custom metric (New Relic custom metric names use a Custom/ prefix)
agent.record_custom_metric('Custom/Optimizer/Savings', 0.78)
Health Checks
Liveness Probe
curl http://localhost:8084/health
Response:
{
  "status": "healthy",
  "timestamp": "2025-02-08T10:30:00Z",
  "services": {
    "optimizer": "healthy",
    "bifrost": "healthy",
    "redis": "healthy",
    "mcp_tools": "healthy"
  }
}
Readiness Probe
curl http://localhost:8084/ready
Response:
{
  "ready": true,
  "dependencies": {
    "bifrost": true,
    "redis": true,
    "providers": {
      "anthropic": true,
      "openai": true
    }
  }
}
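A load balancer or orchestrator can gate traffic on this payload. A minimal sketch of that decision, assuming readiness requires both core dependencies plus at least one healthy provider (the gating policy here is illustrative, not mandated by Korad.AI):

```python
# Decide whether to route traffic based on a /ready payload in the
# format shown above.
def is_ready(payload: dict) -> bool:
    deps = payload.get("dependencies", {})
    core_ok = deps.get("bifrost") and deps.get("redis")
    provider_ok = any(deps.get("providers", {}).values())
    return bool(payload.get("ready") and core_ok and provider_ok)
```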
Monitor every aspect of your Korad.AI deployment.