v0.6.0 — Production Hardening ✅

Make it reliable enough for real workloads.

Status: Core features completed — rate limiting, audit log, task queue, metrics, alerts, and auth middleware all implemented and tested (201 tests passing).

New Modules

Module	Description
`core/rate_limiter.py`	Per-provider sliding-window rate limiter (requests + tokens)
`core/audit.py`	Structured audit log with filtering and JSON export
`core/task_queue.py`	Priority task queue with retry logic (Postgres-ready interface)
`core/metrics.py`	Counter, Gauge, Histogram + MetricsRegistry with Prometheus export
`core/alerts.py`	Spend alert rules with dedup and webhook support
`dashboard/auth.py`	API key authentication middleware for FastAPI

Features

Local (Ollama)

ID	Feature	Status	Detail
PROD-01	Local model registry	Planned	Track pulled models, sizes, last used date
PROD-02	Ollama auto-pull	Planned	Auto-pull missing models before execution
PROD-03	GPU memory management	Planned	Monitor VRAM, prevent OOM by queuing
PROD-04	Local inference metrics	✅	`MetricsRegistry` with `default_metrics()` — tok/s, latency histograms

Cloud (OpenRouter)

ID	Feature	Status	Detail
PROD-05	API key rotation	Planned	Multiple keys with round-robin
PROD-06	Rate limiting	✅	`RateLimiter` with 60s sliding window per provider
PROD-07	Retry with backoff	Partial	Basic 429 fallback exists; exponential backoff planned
PROD-08	Spend alerts	✅	`AlertManager` with configurable rules, dedup, webhook support

Both (Core)

ID	Feature	Status	Detail
PROD-09	Persistent task queue	✅	`TaskQueue` in-memory with Postgres-ready interface
PROD-10	Authentication	✅	`APIKeyMiddleware` — header/query param, static bypass, dev mode
PROD-11	Audit log	✅	`AuditLog` with 11 event types, filtering, task trace, JSON export
PROD-12	Health checks	Planned	`/health` endpoint with per-provider status
PROD-13	Metrics export	✅	`MetricsRegistry.export_prometheus()` — valid Prometheus text format

Key APIs

RateLimiter

from agent_orchestrator.core.rate_limiter import RateLimiter, RateLimitConfig

limiter = RateLimiter([
    RateLimitConfig(requests_per_minute=60, tokens_per_minute=100000, provider_key="openrouter"),
])
if await limiter.acquire("openrouter", estimated_tokens=2000):
    # make the request
    limiter.record_usage("openrouter", tokens=1500)

AuditLog

from agent_orchestrator.core.audit import AuditLog, EVENT_AGENT_START

log = AuditLog()
log.log_action(EVENT_AGENT_START, "backend", "Starting API build", task_id="t1")
trace = log.get_task_trace("t1")  # all events for task t1
exported = log.export_json()       # JSON-serializable list

TaskQueue

from agent_orchestrator.core.task_queue import TaskQueue, QueuedTask

queue = TaskQueue()
queue.enqueue(QueuedTask(task_id="t1", description="Build API", priority=10))
task = queue.dequeue()  # highest priority first
queue.complete("t1", "API built successfully")
stats = queue.get_stats()  # pending, running, completed, failed

Metrics

from agent_orchestrator.core.metrics import MetricsRegistry, default_metrics

registry = default_metrics()
registry.counter("agent_tasks_total", labels={"agent": "backend", "status": "completed"}).inc()
registry.histogram("agent_latency_seconds", labels={"agent": "backend"}).observe(2.5)
print(registry.export_prometheus())

AlertManager

from agent_orchestrator.core.alerts import AlertManager, AlertRule

manager = AlertManager([
    AlertRule(name="daily_limit", threshold_usd=5.0, period="day", action="log"),
    AlertRule(name="task_limit", threshold_usd=0.50, period="task", action="webhook",
             webhook_url="https://hooks.slack.com/..."),
])
alerts = manager.check(current_spend=6.0, period="day")

APIKeyMiddleware

from agent_orchestrator.dashboard.auth import APIKeyMiddleware

# In FastAPI app setup:
app.add_middleware(APIKeyMiddleware, api_keys=["secret-key-1", "secret-key-2"])
# No keys = dev mode (all requests allowed)

New Modules​

Features​

Local (Ollama)​

Cloud (OpenRouter)​

Both (Core)​

Key APIs​

RateLimiter​

AuditLog​

TaskQueue​

Metrics​

AlertManager​

APIKeyMiddleware​