
Phase 0 — AWS Infrastructure + Auth (ASAP)

Goal: EC2 up, HTTPS working, OAuth2 active, first agent reachable remotely.
Budget: ~42 EUR/month
Duration: 2 sprints (2 weeks)

IaC: Terraform · CI/CD: GitHub Actions · Cloud: AWS EC2 + Docker Compose
Auth: OAuth2 (GitHub) + JWT session cookies · State: S3 + DynamoDB lock

Architecture Target

Sprint 1 — Terraform: Bootstrap AWS Infrastructure

Step 1.1 — Terraform Backend (S3 + DynamoDB)

One-time manual bootstrap for state management.

Step 1.2 — VPC + EC2 + Security Group

Terraform modules: terraform/modules/ec2/, terraform/modules/networking/, terraform/modules/iam/

Step 1.3 — GitHub Actions: Terraform Pipeline

.github/workflows/terraform.yml — plan on PR, apply on merge to main.

Deliverables:

  • S3 bucket + DynamoDB backend (terraform/backend/main.tf)
  • terraform apply creates VPC, EC2 (t3.medium), SG, EIP (terraform/modules/)
  • EC2 user data: Docker + Compose + Node Exporter (modules/ec2/user_data.sh)
  • IAM role with CloudWatch, ECR pull, SSM (modules/iam/)
  • IMDSv2 required, EBS encrypted, SSH restricted
  • GitHub Actions: plan on PR, apply on merge (terraform.yml)
  • 26 infrastructure tests (tests/test_terraform.py)

Sprint 2 — Auth OAuth2 + App Deploy + Monitoring

Step 2.1 — OAuth2 Authentication

OAuth2 flow with JWT session cookies (authlib + PyJWT).

2.1.1 — Create GitHub OAuth App

GitHub OAuth Apps cannot be created via CLI/API — web UI only.

  1. Go to github.com/settings/developers > OAuth Apps > New OAuth App
  2. Fill in:
    • Application name: Agent Orchestrator
    • Homepage URL: https://agents.yourdomain.com (or http://localhost:5005 for local)
    • Authorization callback URL: https://agents.yourdomain.com/auth/github/callback
  3. Click Register application
  4. Copy the Client ID
  5. Click Generate a new client secret, copy it immediately (shown only once)
  6. Store both values in GitHub Secrets:
    OAUTH_CLIENT_ID=Ov23li...
    OAUTH_CLIENT_SECRET=abc123...
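
Once the app is registered, the login route redirects the browser to GitHub's authorization endpoint. The following is a minimal sketch of building that redirect with the standard library only; the project itself uses authlib, which handles this internally. The function name `build_authorize_url` and the requested scopes are assumptions for illustration, not taken from the project code.

```python
import secrets
from urllib.parse import urlencode

# GitHub's OAuth authorization endpoint.
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id: str, base_url: str) -> tuple[str, str]:
    """Return (redirect_url, state) for the start of the OAuth2 flow.

    The state value should be stored server-side (or in a signed cookie)
    and compared with the `state` query parameter on the callback.
    """
    state = secrets.token_urlsafe(32)
    params = {
        "client_id": client_id,
        # Must match the callback URL registered in the OAuth App.
        "redirect_uri": f"{base_url.rstrip('/')}/auth/github/callback",
        "scope": "read:user user:email",  # assumed scopes
        "state": state,
    }
    return f"{GITHUB_AUTHORIZE_URL}?{urlencode(params)}", state
```

The `redirect_uri` has to match the Authorization callback URL entered in step 2, otherwise GitHub rejects the request.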

2.1.2 — Generate JWT Secret

openssl rand -hex 32

Store as GitHub Secret: JWT_SECRET_KEY
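
On machines without openssl, an equivalent secret can be produced from Python's standard library. This is a small sketch; the helper name `generate_jwt_secret` is hypothetical.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness as hex.

    With the default of 32 bytes this yields 64 hex characters,
    the same shape as `openssl rand -hex 32`.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_jwt_secret())
```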

2.1.3 — All Required Secrets

| Secret | Source | Required |
|---|---|---|
| OAUTH_CLIENT_ID | GitHub Developer Settings | Yes |
| OAUTH_CLIENT_SECRET | GitHub Developer Settings | Yes |
| JWT_SECRET_KEY | openssl rand -hex 32 | Yes |
| BASE_URL | Your domain | Yes |
| OPENROUTER_API_KEY | OpenRouter dashboard | Yes (already set) |
| AWS_ACCESS_KEY_ID | AWS IAM | Yes |
| AWS_SECRET_ACCESS_KEY | AWS IAM | Yes |
| EC2_SSH_PRIVATE_KEY | ssh-keygen | Yes |
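
A missing secret is easier to diagnose when it fails at startup instead of midway through a login or deploy. The sketch below checks that every name from the list above is set and non-empty. The function names are hypothetical, and which process actually needs which secret (for example, the AWS keys are likely only used in CI) is an assumption to adapt.

```python
import os
from collections.abc import Mapping

# Names taken from the secrets list in this section.
REQUIRED_SECRETS = (
    "OAUTH_CLIENT_ID",
    "OAUTH_CLIENT_SECRET",
    "JWT_SECRET_KEY",
    "BASE_URL",
    "OPENROUTER_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "EC2_SSH_PRIVATE_KEY",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required names that are unset or blank, in table order."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


def require_secrets(env: Mapping[str, str] = os.environ) -> None:
    """Fail closed: raise if any required secret is missing."""
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
```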

2.1.4 — Auth Flow

2.1.5 — Local Testing

To test OAuth locally before deploying to AWS:

# .env.local already has JWT_SECRET_KEY and BASE_URL=http://localhost:5005
# After creating the GitHub OAuth App with callback http://localhost:5005/auth/github/callback:

# Edit .env.local, fill in OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET
# Then:
set -a && source .env.local && set +a
docker compose up dashboard
# Visit http://localhost:5005 → redirects to /login → click "Login with GitHub"
Note: Cookie secure flag

In oauth_routes.py, cookies are set with secure=True. Most browsers accept Secure cookies on http://localhost, but not on LAN IPs served over plain HTTP. For local testing over LAN, temporarily set secure=False and restore secure=True before deploying.
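
To make the effect of the flag concrete, here is a standard-library sketch that renders a session cookie with the attributes listed in the security checklist (httponly, secure, samesite=strict, 4h expiry). The cookie name `session` and the helper name are assumptions; the real oauth_routes.py may set the cookie through its web framework instead.

```python
from http.cookies import SimpleCookie

SESSION_MAX_AGE = 4 * 60 * 60  # 4 hours, in seconds


def session_cookie_header(token: str, secure: bool = True) -> str:
    """Return the value of a Set-Cookie header for the JWT session cookie.

    Pass secure=False only for local testing over plain HTTP on a LAN IP.
    """
    cookie = SimpleCookie()
    cookie["session"] = token  # cookie name is an assumption
    morsel = cookie["session"]
    morsel["path"] = "/"
    morsel["httponly"] = True
    morsel["secure"] = secure
    morsel["samesite"] = "Strict"
    morsel["max-age"] = SESSION_MAX_AGE
    return morsel.OutputString()
```

With secure=False the `Secure` attribute is simply omitted from the header, which is why the browser then sends the cookie over plain HTTP.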

2.1.6 — User Store in PostgreSQL

The user store (dashboard_users + dashboard_pending tables) persists approved users, roles, and pending access requests in PostgreSQL. This is required for production — JSON file fallback only works for local dev.

Tables:

| Table | Columns | Purpose |
|---|---|---|
| dashboard_users | github_login (PK), email, name, role, active, created_at | Approved users with roles |
| dashboard_pending | github_login (PK), email, name, requested_at | Pending access requests waiting for admin approval |

Behavior:

  • On startup, setup_db() creates tables if they don't exist
  • If JSON files from local dev exist (dashboard-users.json, dashboard-pending.json), data is auto-migrated to Postgres and files renamed to .json.migrated
  • If Postgres is unavailable, all operations fall back to JSON files transparently
  • Admin panel shows pending requests with approve/reject actions
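
The auto-migration step can be sketched as follows. This assumes each JSON file holds a list of row objects and that the caller supplies an `insert_row` callback that writes one row to Postgres; both are assumptions, since the actual file layout and database helper are not shown here.

```python
import json
from pathlib import Path
from typing import Callable


def migrate_json_file(path: Path, insert_row: Callable[[dict], None]) -> int:
    """Copy rows from a local-dev JSON file into the database, then rename it.

    Returns the number of migrated rows, or 0 if the file does not exist.
    """
    if not path.exists():
        return 0
    rows = json.loads(path.read_text(encoding="utf-8"))
    for row in rows:
        insert_row(row)
    # Rename only after every row was inserted, so a failed run can be retried.
    path.rename(path.with_name(path.name + ".migrated"))
    return len(rows)
```

Renaming instead of deleting keeps the original data around as a backup and makes the migration a no-op on the next startup.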

Admin flow:

  1. Unknown user tries to log in → denied, saved to dashboard_pending
  2. Admin opens Admin panel → sees pending requests with badge count
  3. Admin approves (with role) or rejects each request
  4. Approved users can log in on next attempt
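
The four steps above can be modelled with a small in-memory stand-in for the two tables. This is an illustrative sketch only: the class name, method names, and the `viewer` default role are hypothetical, and the real store persists to PostgreSQL.

```python
from dataclasses import dataclass, field


@dataclass
class UserStore:
    """In-memory stand-in for dashboard_users and dashboard_pending."""

    users: dict[str, str] = field(default_factory=dict)  # github_login -> role
    pending: set[str] = field(default_factory=set)

    def login(self, github_login: str) -> bool:
        """Allow approved users; record unknown users as pending and deny."""
        if github_login in self.users:
            return True
        self.pending.add(github_login)
        return False

    def approve(self, github_login: str, role: str = "viewer") -> None:
        """Move a pending request into the approved users with a role."""
        if github_login not in self.pending:
            raise KeyError(f"no pending request for {github_login}")
        self.pending.remove(github_login)
        self.users[github_login] = role

    def reject(self, github_login: str) -> None:
        """Drop a pending request without granting access."""
        self.pending.discard(github_login)

    def pending_count(self) -> int:
        """Value shown as the badge count in the Admin panel."""
        return len(self.pending)
```

Note that the default path is denial: a user gains access only through an explicit approve call, which matches the fail-closed behavior listed in the deliverables.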

Step 2.2 — Docker Compose Production

docker-compose.prod.yml with nginx, backend, redis, postgres, prometheus, grafana.

Step 2.3 — GitHub Actions: Deploy Pipeline

.github/workflows/deploy.yml — SSH deploy + health check.

Step 2.4 — Monitoring Board

| Task | Priority | Detail |
|---|---|---|
| Prometheus setup | CRITICAL | Scrape orchestrator metrics (/metrics endpoint) |
| Grafana dashboards | CRITICAL | Agent activity, latency, token usage, cost per model |
| Node Exporter | HIGH | EC2 system metrics (CPU, RAM, disk, network) |
| Alert rules | HIGH | Cost threshold, error rate spike, agent stall detection |

Deliverables:

  • OAuth2 GitHub working (fail-closed, security-hardened)
  • Dashboard accessible only after login (WebSocket pre-accept auth)
  • User store (users + pending requests) in PostgreSQL
  • Admin panel for managing users and approving access requests
  • bcrypt password hashing, CORS allowlist, SSRF protection
  • Audit logging (login/logout/denied events)
  • docker-compose.prod.yml (nginx, redis, prometheus, grafana)
  • Nginx: TLS 1.2+, HSTS, rate limiting, WebSocket proxy, /metrics blocked
  • Prometheus: dashboard + node exporter scraping, 6 alert rules
  • Grafana: pre-provisioned dashboard (tasks, cost, latency, CPU/RAM/disk)
  • GitHub Actions deploy pipeline (test → rsync → build → health check)
  • 30 deployment tests (tests/test_deploy.py)
  • HTTPS active on custom domain (needs Route53 + ACM setup)
  • Grafana accessible via SSH tunnel (needs EC2 running)

KPIs

| KPI | Target |
|---|---|
| Deploy time (push → live) | < 5 min |
| Auth success rate | 100% |
| First token latency | < 5s |
| Uptime | 99% |
| Monthly infra cost | < 60 EUR |

Security Checklist

  • SSH open only from your fixed IP (Terraform SG — ssh_allowed_cidrs)
  • .env.prod never in repository (GitHub Secrets only, .gitignore)
  • JWT cookie httponly=True, secure=True, samesite=strict, 4h expiry
  • Fail-closed auth (no default bypass, dev mode blocked in production)
  • WebSocket pre-accept authentication
  • CORS allowlist (no wildcard origins)
  • SSRF protection (Ollama URL restricted to localhost)
  • API keys header-only (no query param leaks)
  • IMDSv2 required on EC2 (mitigates SSRF → metadata attacks)
  • Grafana not publicly exposed (no ports in docker-compose.prod.yml)
  • Rate limiting on /api/* (nginx: 10 req/s + burst 20)
  • OpenRouter API key rotated every 90 days
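
As an illustration of the SSRF item, a user-supplied Ollama URL can be checked against loopback addresses before the backend connects to it. This is a sketch under the assumption that Ollama runs on the same host as the backend; the function name is hypothetical, and a Compose setup that reaches Ollama by service name would need an explicit allowlist instead.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_ollama_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is localhost or loopback."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return False
    if parsed.scheme not in ("http", "https") or host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # a hostname other than "localhost"
        return False
```

The check compares the parsed hostname, not the raw string, so a URL such as `http://localhost@evil.example/` is rejected, as is the EC2 metadata address 169.254.169.254.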