Monitoring & Observability
Track performance, costs, and errors with comprehensive monitoring for your Korad.AI deployment.
Metrics
Key Metrics to Monitor
| Metric | Description | Alert Threshold |
|---|---|---|
| Request Rate | Requests per second | < 1 req/s for 5 min |
| Error Rate | Failed requests | > 5% |
| Latency (p50) | Median response time | > 1s |
| Latency (p95) | 95th percentile | > 5s |
| Optimization Rate | % of requests optimized | < 50% |
| Cost per Request | Average cost per request | > $0.10 |
| Cache Hit Rate | % requests cached | < 20% |
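The thresholds in the table above can be checked programmatically against a metrics snapshot. A minimal sketch; the metric names and helper are illustrative, not a Korad.AI API:

```python
# Alert thresholds from the table above; each predicate returns True
# when the metric crosses its threshold. Names are illustrative.
ALERT_THRESHOLDS = {
    "error_rate": lambda v: v > 0.05,           # > 5%
    "latency_p50_ms": lambda v: v > 1000,       # > 1s
    "latency_p95_ms": lambda v: v > 5000,       # > 5s
    "optimization_rate": lambda v: v < 0.50,    # < 50%
    "cost_per_request_usd": lambda v: v > 0.10, # > $0.10
    "cache_hit_rate": lambda v: v < 0.20,       # < 20%
}

def breached_alerts(metrics: dict) -> list:
    """Return the names of metrics that cross their alert threshold."""
    return [name for name, breached in ALERT_THRESHOLDS.items()
            if name in metrics and breached(metrics[name])]
```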
Built-in Metrics Endpoint
curl http://localhost:8081/metrics
Response:
{
  "optimizer": {
    "requests_total": 15420,
    "requests_per_second": 2.5,
    "error_rate": 0.01,
    "latency_p50_ms": 450,
    "latency_p95_ms": 1200,
    "latency_p99_ms": 3500
  },
  "optimization": {
    "tier_1_cache_hits": 3250,
    "tier_2_vanishing_context": 1200,
    "tier_3_rlm": 450,
    "tier_4_family_locked": 8900,
    "tier_5_savings_slider": 1620,
    "total_savings_usd": 125.50
  },
  "billing": {
    "total_cost_usd": 45.20,
    "theoretical_cost_usd": 170.70,
    "savings_percentage": 0.74
  }
}
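The billing figures are internally consistent: `savings_percentage` is the theoretical cost minus the actual cost, divided by the theoretical cost. A quick illustrative check (the helper name is not part of any Korad.AI SDK):

```python
# savings_percentage = (theoretical - actual) / theoretical,
# rounded to two decimals as in the /metrics payload above.
def savings_percentage(theoretical_usd: float, actual_usd: float) -> float:
    if theoretical_usd <= 0:
        return 0.0
    return round((theoretical_usd - actual_usd) / theoretical_usd, 2)
```

With the values above, 170.70 theoretical vs 45.20 actual yields 0.74.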
Logging
Log Levels
- ERROR - Errors requiring attention
- WARN - Warnings (rate limits, budget alerts)
- INFO - Request/response logging
- DEBUG - Detailed diagnostics
View Logs
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f optimizer
# Last 100 lines
docker-compose logs --tail=100 optimizer
Log Format
{
  "timestamp": "2025-02-08T10:30:00Z",
  "level": "INFO",
  "service": "optimizer",
  "request_id": "req-123",
  "virtual_key_id": "key-1",
  "model": "claude-sonnet-4-5-20250929",
  "original_tokens": 50000,
  "optimized_tokens": 5000,
  "strategy": "Vanishing-Context",
  "cost_cents": 15,
  "latency_ms": 1200
}
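Because each log line is structured JSON, derived metrics can be computed directly from the log stream. A minimal sketch using the field names above (the helper is illustrative):

```python
import json

# Compute the token reduction implied by one structured log line
# in the format shown above.
def token_reduction(log_line: str) -> float:
    entry = json.loads(log_line)
    original = entry["original_tokens"]
    return (original - entry["optimized_tokens"]) / original
```

For the sample entry above (50,000 tokens reduced to 5,000), this yields a 90% reduction.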
Dashboards
Grafana Dashboard
Import pre-built dashboards for:
- Overview - Request rate, error rate, latency
- Optimization - Savings by tier, optimization rate
- Billing - Cost breakdown, savings over time
- Performance - P50/P95/P99 latency
Example Dashboard JSON
{
  "dashboard": {
    "title": "Korad.AI Optimizer",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [{
          "expr": "rate(optimizer_requests_total[1m])"
        }]
      },
      {
        "title": "Optimization Savings",
        "targets": [{
          "expr": "sum(optimizer_savings_usd)"
        }]
      }
    ]
  }
}
Alerting
Configure Alerts
High Error Rate
alert: HighErrorRate
expr: rate(optimizer_errors_total[5m]) / rate(optimizer_requests_total[5m]) > 0.05
for: 5m
annotations:
  summary: "High error rate detected"
  description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"
Budget Alert
alert: BudgetAlert
expr: billing_current_month_usd > billing_budget_limit * 0.9
annotations:
  summary: "90% of budget used"
  description: "Current spend: {{ $value }} of budget"
Low Optimization Rate
alert: LowOptimizationRate
expr: rate(optimizer_optimized_requests_total[5m]) / rate(optimizer_requests_total[5m]) < 0.5
annotations:
  summary: "Low optimization rate"
  description: "Only {{ $value | humanizePercentage }} of requests are being optimized"
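These snippets are rule fragments; in Prometheus they belong in a rule file referenced from `prometheus.yml` via `rule_files`. A minimal sketch showing one of the rules in place (file and group names are illustrative; the other rules slot in alongside):

```yaml
# alerts.yml -- illustrative rule file; reference it from prometheus.yml
# with:  rule_files: ["alerts.yml"]
groups:
  - name: korad-billing
    rules:
      - alert: BudgetAlert
        expr: billing_current_month_usd > billing_budget_limit * 0.9
        annotations:
          summary: "90% of budget used"
```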
Notification Channels
- Slack - Send alerts to Slack channel
- Email - Email alerts on-call engineer
- PagerDuty - Create incidents for critical alerts
- Webhook - Custom webhook integration
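For the Slack channel, a minimal Alertmanager sketch, assuming Alertmanager is deployed alongside Prometheus; the webhook URL and channel name are placeholders:

```yaml
# alertmanager.yml -- minimal sketch; URL and channel are placeholders
route:
  receiver: slack-oncall
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T000/B000/XXXX"
        channel: "#korad-alerts"
```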
Tracing
Distributed Tracing
Enable tracing for request flow:
# In optimizer
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("optimize_request") as span:
    span.set_attribute("model", model)
    span.set_attribute("original_tokens", original_tokens)

    # Apply optimization
    with tracer.start_as_current_span("apply_tier_2"):
        # Vanishing Context logic
        pass
View Traces
# Run Jaeger (UI on 16686, OTLP gRPC ingest on 4317)
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one:latest
# Open browser
open http://localhost:16686
Cost Monitoring
Real-Time Cost Tracking
def track_costs(response):
    """Track costs for monitoring."""
    return {
        "request_id": response.id,
        "model": response.model,
        "original_tokens": response.response_headers.get('X-Korad-Original-Tokens'),
        "optimized_tokens": response.response_headers.get('X-Korad-Optimized-Tokens'),
        "theoretical_cost": response.response_headers.get('X-Korad-Theoretical-Cost'),
        "actual_cost": response.response_headers.get('X-Korad-Actual-Cost'),
        "billed_amount": response.response_headers.get('X-Korad-Billed-Amount'),
        "savings": response.response_headers.get('X-Korad-Savings-USD'),
        "strategy": response.response_headers.get('X-Korad-Strategy')
    }
# Send to monitoring
response = client.chat.completions.create(...)
metrics = track_costs(response)
send_to_prometheus(metrics)
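Note that HTTP header values arrive as strings, so cast them before aggregating. A sketch that rolls up a batch of `track_costs()` records (the helper name is illustrative):

```python
# Aggregate per-request cost records into totals; values are cast
# from the string form returned by the X-Korad-* response headers.
def aggregate_savings(cost_records):
    total_billed = sum(float(r["billed_amount"]) for r in cost_records)
    total_saved = sum(float(r["savings"]) for r in cost_records)
    return {"billed_usd": round(total_billed, 2),
            "saved_usd": round(total_saved, 2)}
```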
Budget Alerts
def check_budget_spend():
    """Check if budget is exceeded."""
    current_spend = get_monthly_spend()
    budget_limit = get_budget_limit()

    if current_spend > budget_limit * 0.9:
        send_alert("90% of budget used")
    if current_spend >= budget_limit:
        send_alert("Budget exceeded!")
        # Optionally, block requests
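The threshold logic above can be factored into a pure function, which makes it easy to unit-test without stubbing the billing and alerting helpers (the function name is illustrative):

```python
# Pure-function variant of check_budget_spend(): mirrors the 90%
# warning and hard limit, returning a status instead of alerting.
def budget_status(current_spend: float, budget_limit: float) -> str:
    if current_spend >= budget_limit:
        return "exceeded"
    if current_spend > budget_limit * 0.9:
        return "warning"
    return "ok"
```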
Performance Monitoring
Latency Tracking
from prometheus_client import Histogram

# Create histogram with model/strategy labels
request_latency = Histogram(
    'optimizer_request_latency_seconds',
    'Request latency',
    ['model', 'strategy']
)

# Measure latency: a labelled metric must be bound with .labels()
# before .time() can be used as a decorator
@request_latency.labels(model='claude-sonnet', strategy='tier_2').time()
def process_request(request):
    # Process request
    return response
Optimization Rate
from prometheus_client import Counter

optimized_requests = Counter(
    'optimizer_optimized_requests_total',
    'Total optimized requests',
    ['tier']
)

# Increment counter
optimized_requests.labels(tier='tier_2').inc()
Integration Examples
Prometheus
# prometheus.yml
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['localhost:8081']
    metrics_path: '/metrics'
DataDog
from datadog import statsd
# Send metric
statsd.increment('optimizer.requests', tags=['model:claude-sonnet'])
statsd.gauge('optimizer.savings', 0.78)
New Relic
from newrelic import agent
# Custom metric (New Relic custom metric names use a Custom/ prefix)
agent.record_custom_metric('Custom/Optimizer/Savings', 0.78)
Health Checks
Liveness Probe
curl http://localhost:8084/health
Response:
{
  "status": "healthy",
  "timestamp": "2025-02-08T10:30:00Z",
  "services": {
    "optimizer": "healthy",
    "bifrost": "healthy",
    "redis": "healthy",
    "mcp_tools": "healthy"
  }
}
Readiness Probe
curl http://localhost:8084/ready
Response:
{
  "ready": true,
  "dependencies": {
    "bifrost": true,
    "redis": true,
    "providers": {
      "anthropic": true,
      "openai": true
    }
  }
}
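A load balancer or orchestrator can gate traffic on this payload. A minimal sketch of that decision, assuming readiness requires both core dependencies plus at least one healthy provider (the gating policy here is illustrative, not mandated by Korad.AI):

```python
# Decide whether to route traffic based on a /ready payload in the
# format shown above.
def is_ready(payload: dict) -> bool:
    deps = payload.get("dependencies", {})
    core_ok = deps.get("bifrost") and deps.get("redis")
    provider_ok = any(deps.get("providers", {}).values())
    return bool(payload.get("ready") and core_ok and provider_ok)
```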
Monitor every aspect of your Korad.AI deployment.