Monitoring & Observability

Track performance, costs, and errors with comprehensive monitoring for your Korad.AI deployment.

Metrics

Key Metrics to Monitor

Metric            | Description               | Alert Threshold
------------------|---------------------------|---------------------
Request Rate      | Requests per second       | < 1 req/s for 5 min
Error Rate        | Failed requests           | > 5%
Latency (p50)     | Median response time      | > 1s
Latency (p95)     | 95th percentile           | > 5s
Optimization Rate | % of requests optimized   | < 50%
Cost per Request  | Average cost per request  | > $0.10
Cache Hit Rate    | % requests cached         | < 20%

Built-in Metrics Endpoint

curl http://localhost:8081/metrics

Response:

{
  "optimizer": {
    "requests_total": 15420,
    "requests_per_second": 2.5,
    "error_rate": 0.01,
    "latency_p50_ms": 450,
    "latency_p95_ms": 1200,
    "latency_p99_ms": 3500
  },
  "optimization": {
    "tier_1_cache_hits": 3250,
    "tier_2_vanishing_context": 1200,
    "tier_3_rlm": 450,
    "tier_4_family_locked": 8900,
    "tier_5_savings_slider": 1620,
    "total_savings_usd": 125.50
  },
  "billing": {
    "total_cost_usd": 45.20,
    "theoretical_cost_usd": 170.70,
    "savings_percentage": 0.74
  }
}
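
If you are not yet running a full monitoring stack, the JSON endpoint is easy to poll directly. The sketch below is illustrative only: the field names come from the sample response above, and the thresholds mirror the table at the top of this page.

import requests

# Thresholds taken from the "Key Metrics to Monitor" table above.
THRESHOLDS = {
    "error_rate": 0.05,       # alert when error rate exceeds 5%
    "latency_p95_ms": 5000,   # alert when p95 latency exceeds 5 s
}

def check_metrics(base_url="http://localhost:8081"):
    """Fetch the built-in metrics endpoint and return values past their thresholds."""
    metrics = requests.get(f"{base_url}/metrics", timeout=5).json()
    optimizer = metrics["optimizer"]
    return {
        name: optimizer[name]
        for name, limit in THRESHOLDS.items()
        if optimizer.get(name, 0) > limit
    }

if __name__ == "__main__":
    for name, value in check_metrics().items():
        print(f"ALERT: {name} = {value} exceeds threshold")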

Logging

Log Levels

  • ERROR - Errors requiring attention
  • WARN - Warnings (rate limits, budget alerts)
  • INFO - Request/response logging
  • DEBUG - Detailed diagnostics
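
How the level is set is deployment-specific; a common pattern, assumed here rather than a documented Korad.AI setting, is a LOG_LEVEL environment variable read at startup:

import logging
import os

# Hypothetical: read the level from a LOG_LEVEL environment variable.
# Adjust to however your deployment actually configures logging.
level = os.environ.get("LOG_LEVEL", "INFO").upper()

logging.basicConfig(
    level=getattr(logging, level, logging.INFO),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)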

View Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f optimizer

# Last 100 lines
docker-compose logs --tail=100 optimizer

Log Format

{
  "timestamp": "2025-02-08T10:30:00Z",
  "level": "INFO",
  "service": "optimizer",
  "request_id": "req-123",
  "virtual_key_id": "key-1",
  "model": "claude-sonnet-4-5-20250929",
  "original_tokens": 50000,
  "optimized_tokens": 5000,
  "strategy": "Vanishing-Context",
  "cost_cents": 15,
  "latency_ms": 1200
}
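
Because each entry is a single JSON object, ad hoc analysis is straightforward. A small sketch (assuming you have exported the optimizer logs to a file with one JSON object per line) that totals cost and average latency per optimization strategy:

import json
from collections import defaultdict

def summarize_logs(path="optimizer.log"):
    """Aggregate cost and latency per strategy from JSON-lines logs."""
    cost_cents = defaultdict(int)
    latencies = defaultdict(list)

    with open(path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines (e.g. docker-compose prefixes)
            strategy = entry.get("strategy", "none")
            cost_cents[strategy] += entry.get("cost_cents", 0)
            latencies[strategy].append(entry.get("latency_ms", 0))

    for strategy, total in cost_cents.items():
        avg_latency = sum(latencies[strategy]) / max(len(latencies[strategy]), 1)
        print(f"{strategy}: ${total / 100:.2f}, avg latency {avg_latency:.0f} ms")

summarize_logs()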

Dashboards

Grafana Dashboard

Import pre-built dashboards for:

  1. Overview - Request rate, error rate, latency
  2. Optimization - Savings by tier, optimization rate
  3. Billing - Cost breakdown, savings over time
  4. Performance - P50/P95/P99 latency

Example Dashboard JSON

{
  "dashboard": {
    "title": "Korad.AI Optimizer",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [{
          "expr": "rate(optimizer_requests_total[1m])"
        }]
      },
      {
        "title": "Optimization Savings",
        "targets": [{
          "expr": "sum(optimizer_savings_usd)"
        }]
      }
    ]
  }
}
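
You can also load the dashboard through Grafana's HTTP API instead of the UI. A minimal sketch follows; it assumes Grafana at localhost:3000, a service-account token in GRAFANA_TOKEN, and the JSON above saved to korad_optimizer_dashboard.json.

import json
import os
import requests

GRAFANA_URL = "http://localhost:3000"        # assumed Grafana address
TOKEN = os.environ["GRAFANA_TOKEN"]          # assumed API / service-account token

with open("korad_optimizer_dashboard.json") as f:  # the dashboard JSON shown above
    payload = json.load(f)

resp = requests.post(
    f"{GRAFANA_URL}/api/dashboards/db",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"dashboard": payload["dashboard"], "overwrite": True},
    timeout=10,
)
resp.raise_for_status()
print("Imported:", resp.json().get("url"))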

Alerting

Configure Alerts

High Error Rate

alert: HighErrorRate
expr: rate(optimizer_errors_total[5m]) / rate(optimizer_requests_total[5m]) > 0.05
for: 5m
annotations:
  summary: "High error rate detected"
  description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"

Budget Alert

alert: BudgetAlert
expr: billing_current_month_usd > billing_budget_limit * 0.9
annotations:
  summary: "90% of budget used"
  description: "Current spend: {{ $value }} of budget"

Low Optimization Rate

alert: LowOptimizationRate
expr: rate(optimizer_optimized_requests_total[5m]) / rate(optimizer_requests_total[5m]) < 0.5
annotations:
  summary: "Low optimization rate"
  description: "Only {{ $value | humanizePercentage }} of requests are being optimized"

Notification Channels

  • Slack - Send alerts to a Slack channel (see the webhook sketch after this list)
  • Email - Email alerts on-call engineer
  • PagerDuty - Create incidents for critical alerts
  • Webhook - Custom webhook integration
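
For example, a minimal webhook-based Slack notification. This assumes you have created a Slack incoming-webhook URL and exported it as SLACK_WEBHOOK_URL; wire it up as the receiver for whichever alerting backend you use.

import os
import requests

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # assumed incoming-webhook URL

def notify_slack(summary: str, description: str) -> None:
    """Post an alert message to a Slack channel via an incoming webhook."""
    payload = {"text": f":rotating_light: {summary}\n{description}"}
    resp = requests.post(SLACK_WEBHOOK_URL, json=payload, timeout=5)
    resp.raise_for_status()

notify_slack("High error rate detected", "Error rate is 7% over the last 5 minutes")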

Tracing

Distributed Tracing

Enable tracing for request flow:

# In optimizer
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("optimize_request") as span:
    span.set_attribute("model", model)
    span.set_attribute("original_tokens", original_tokens)

    # Apply optimization
    with tracer.start_as_current_span("apply_tier_2"):
        # Vanishing Context logic
        pass

View Traces

# View traces in Jaeger
docker run -d -p 16686:16686 jaegertracing/all-in-one:latest

# Open browser
open http://localhost:16686
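
The tracing snippet above creates spans but does not say where they go; for them to appear in Jaeger, the optimizer process also needs an exporter. A minimal sketch using the OpenTelemetry SDK's OTLP exporter, assuming Jaeger's OTLP gRPC port 4317 is published alongside 16686:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send spans to Jaeger's OTLP gRPC endpoint (add -p 4317:4317 to the docker run above).
provider = TracerProvider(resource=Resource.create({"service.name": "optimizer"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)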

Cost Monitoring

Real-Time Cost Tracking

def track_costs(response):
    """Track costs for monitoring."""
    return {
        "request_id": response.id,
        "model": response.model,
        "original_tokens": response.response_headers.get('X-Korad-Original-Tokens'),
        "optimized_tokens": response.response_headers.get('X-Korad-Optimized-Tokens'),
        "theoretical_cost": response.response_headers.get('X-Korad-Theoretical-Cost'),
        "actual_cost": response.response_headers.get('X-Korad-Actual-Cost'),
        "billed_amount": response.response_headers.get('X-Korad-Billed-Amount'),
        "savings": response.response_headers.get('X-Korad-Savings-USD'),
        "strategy": response.response_headers.get('X-Korad-Strategy')
    }

# Send to monitoring
response = client.chat.completions.create(...)
metrics = track_costs(response)
send_to_prometheus(metrics)

Budget Alerts

def check_budget_spend():
    """Check if budget is exceeded."""
    current_spend = get_monthly_spend()
    budget_limit = get_budget_limit()

    if current_spend > budget_limit * 0.9:
        send_alert("90% of budget used")

    if current_spend >= budget_limit:
        send_alert("Budget exceeded!")
        # Optionally, block requests

Performance Monitoring

Latency Tracking

from prometheus_client import Histogram

# Create histogram with per-model / per-strategy labels
request_latency = Histogram(
    'optimizer_request_latency_seconds',
    'Request latency',
    ['model', 'strategy']
)

# Measure latency (a labelled histogram must be resolved with .labels() before timing)
def process_request(request):
    with request_latency.labels(model=request.model, strategy=request.strategy).time():
        response = ...  # process the request here
    return response

Optimization Rate

from prometheus_client import Counter

optimized_requests = Counter(
    'optimizer_optimized_requests_total',
    'Total optimized requests',
    ['tier']
)

# Increment counter
optimized_requests.labels(tier='tier_2').inc()
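
If these collectors live in a sidecar or standalone service rather than inside the optimizer, prometheus_client can expose them over HTTP for Prometheus to scrape (the port below is arbitrary; add the matching target to the scrape config in the next section):

from prometheus_client import start_http_server

# Serve the default registry at http://localhost:8000/metrics
start_http_server(8000)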

Integration Examples

Prometheus

# prometheus.yml
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['localhost:8081']
    metrics_path: '/metrics'

DataDog

from datadog import statsd

# Send metric
statsd.increment('optimizer.requests', tags=['model:claude-sonnet'])
statsd.gauge('optimizer.savings', 0.78)

New Relic

from newrelic import agent

# Custom metric
agent.record_custom_metric('Optimizer/Savings', 0.78)

Health Checks

Liveness Probe

curl http://localhost:8084/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-02-08T10:30:00Z",
  "services": {
    "optimizer": "healthy",
    "bifrost": "healthy",
    "redis": "healthy",
    "mcp_tools": "healthy"
  }
}

Readiness Probe

curl http://localhost:8084/ready

Response:

{
  "ready": true,
  "dependencies": {
    "bifrost": true,
    "redis": true,
    "providers": {
      "anthropic": true,
      "openai": true
    }
  }
}
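
In a deployment script or orchestration hook you can gate traffic on this endpoint. A small sketch; the URL and timeout are illustrative:

import time
import requests

def wait_until_ready(url="http://localhost:8084/ready", timeout_s=120):
    """Poll the readiness endpoint until all dependencies report ready."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).json().get("ready"):
                return True
        except requests.RequestException:
            pass  # endpoint not up yet
        time.sleep(2)
    return False

if not wait_until_ready():
    raise SystemExit("Korad.AI deployment did not become ready in time")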

Monitor every aspect of your Korad.AI deployment.