Skip to main content

Infrastructure Tiers

Tier 1: Pure Cloud (Phase 1-2)

[Client / UI]
|
v
[AWS EC2 t3.medium -- Orchestrator]
- StateGraph engine
- API Gateway (FastAPI)
- Dashboard (WebSocket)
|
v
[OpenRouter API]
- Qwen3 30B A3B (complex tasks)
- Qwen3.5-Flash (simple tasks)
|
v
[AWS S3] -- storage, checkpoints

Cost: ~42 EUR/month | Setup: hours | Maintenance: minimal

Tier 2: Hybrid (Phase 4)

[AWS EC2 -- Orchestrator]
|
|--- Complex / fine-tuned -------> [Vast.ai H200 -- vLLM]
|
|--- Standard / burst -----------> [OpenRouter Qwen3]
|
|--- Simple / economical --------> [OpenRouter Flash]

Cost: ~625 EUR/month | Setup: days | Maintenance: moderate

Tier 3: Full On-Prem (Future)

[Kubernetes Orchestrator]
|
|--- vLLM Cluster (GPU nodes)
|--- Cloud API (frontier fallback)
|--- Model Registry

Cost: high upfront, low marginal | Best for: enterprise, >20 devs

Monitoring

Phase 1-2

  • AWS CloudWatch: EC2 metrics
  • OpenRouter dashboard: token usage
  • Custom alerts: Telegram/email on budget threshold

Phase 3-4

  • Grafana + Prometheus: full infra metrics
  • LangFuse: LLM tracing, prompt versioning
  • Vast.ai dashboard: GPU utilization