Post-MVP — Infrastructure Scaling

Trigger: Monthly revenue > 600 EUR for 2 consecutive months. Budget: ~625 EUR/month

Decision Gate

This phase only starts if v1.0 is shipped AND revenue justifies the investment. Until then, the system runs on AWS t3.medium + OpenRouter at ~42-100 EUR/month.

GPU Infrastructure

| Task | Detail |
| --- | --- |
| Vast.ai H200 setup | vLLM inference server for complex/fine-tuned tasks |
| Hybrid routing | Route to Vast.ai (complex) or OpenRouter (burst) |
| Model hosting | Self-host Qwen3 30B or a fine-tuned variant |
| Auto-scaling | Scale between OpenRouter and self-hosted based on load |
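
The hybrid-routing and auto-scaling items can be sketched as a small dispatcher. Everything here is an illustrative assumption, not the actual implementation: the endpoint URLs, the prompt-length complexity heuristic, and the queue-depth threshold are all placeholders.

```python
# Illustrative hybrid router: fine-tuned/complex work goes to the
# self-hosted vLLM box unless it is saturated, in which case traffic
# bursts to OpenRouter. Names and thresholds are assumptions.

from dataclasses import dataclass

SELF_HOSTED = "http://vastai-h200:8000/v1"    # hypothetical vLLM server URL
OPENROUTER = "https://openrouter.ai/api/v1"

@dataclass
class Request:
    prompt: str
    needs_finetuned_model: bool = False

def estimate_complexity(req: Request) -> float:
    """Crude proxy: longer prompts count as more complex (assumption)."""
    return len(req.prompt) / 4000  # ~1.0 around a 1k-token prompt

def route(req: Request, gpu_queue_depth: int, max_queue: int = 8) -> str:
    """Pick a backend for one request, given current GPU queue depth."""
    gpu_has_capacity = gpu_queue_depth < max_queue
    if req.needs_finetuned_model and gpu_has_capacity:
        return SELF_HOSTED
    if estimate_complexity(req) > 0.5 and gpu_has_capacity:
        return SELF_HOSTED
    return OPENROUTER
```

The point of the sketch is the fallback order: the GPU box is preferred for work only it can do (the fine-tuned model), and OpenRouter absorbs both simple requests and overflow when the GPU queue is full.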

Fine-Tuning

| Task | Detail |
| --- | --- |
| Data pipeline | Curate training data from production usage |
| Fine-tune Qwen3 30B | Domain-specific fine-tune on the H200 |
| A/B testing | Fine-tuned vs. base model on real traffic |
| Model registry | Track versions and metrics, support rollback |
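
The model-registry item (track versions and metrics, roll back on regression) can be sketched minimally. The field names and in-memory storage below are assumptions for illustration; a real registry would persist this state.

```python
# Toy model registry: register versions with their eval metrics,
# promote one to serving, and roll back if an A/B test regresses.
# Field names and the in-memory store are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ModelVersion:
    name: str      # e.g. "qwen3-30b-ft-v1" (hypothetical tag)
    metrics: dict  # eval metrics from the A/B test

class Registry:
    def __init__(self):
        self.versions: list[ModelVersion] = []
        self.active: ModelVersion | None = None
        self.previous: ModelVersion | None = None

    def register(self, version: ModelVersion) -> None:
        self.versions.append(version)

    def promote(self, name: str) -> None:
        """Make a registered version the serving model, remembering
        the old one so rollback stays cheap."""
        candidate = next(v for v in self.versions if v.name == name)
        self.previous, self.active = self.active, candidate

    def rollback(self) -> None:
        """Revert to the previously active version."""
        if self.previous is not None:
            self.active, self.previous = self.previous, self.active
```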

Enterprise Features

| Task | Detail |
| --- | --- |
| SSO / SAML | Enterprise authentication |
| Audit logging | Full audit trail of agent actions |
| Data residency | EU, US, or self-hosted |
| SLA guarantees | Uptime and latency commitments |
| On-prem option | Package for customer self-hosting |
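
For the audit-logging item, a minimal sketch of an append-only trail of agent actions, written as JSON lines. The record fields and the hash-chaining choice are assumptions; a real deployment would also ship logs off-host.

```python
# Minimal append-only audit trail for agent actions (JSON lines).
# Each record stores the SHA-256 of the previous line, so any edit to
# an earlier entry breaks the chain (basic tamper evidence).
# Record fields are illustrative assumptions.

import hashlib
import json
import time
from pathlib import Path

def append_audit(path: Path, actor: str, action: str, detail: dict) -> dict:
    record = {
        "ts": time.time(),
        "actor": actor,
        "action": action,
        "detail": detail,
    }
    # Hash of the last existing line (empty string for the first record).
    if path.exists() and path.stat().st_size:
        prev = path.read_bytes().splitlines()[-1]
    else:
        prev = b""
    record["prev_hash"] = hashlib.sha256(prev).hexdigest()
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```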

Cost Breakdown

| Item | EUR/month |
| --- | --- |
| AWS EC2 + S3 + networking | 80 |
| Vast.ai H200 interruptible (252h inference) | 305 |
| Vast.ai H200 on-demand (108h fine-tuning) | 241 |
| OpenRouter (overflow/fallback) | 30 (est.) |
| Total | ~656 |

Why Self-Hosted?

The switch to GPU is driven by capabilities, not cost savings:

  1. Fine-tuning on proprietary data (impossible with OpenRouter)
  2. Total privacy (sensitive data stays in-house)
  3. Guaranteed latency without third-party dependency
  4. Custom domain-specific model

At current OpenRouter pricing ($0.08 input / $0.28 output per 1M tokens), you'd need roughly 260M tokens/day to match the GPU cost — that's enterprise-level usage.
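
That breakeven figure can be sanity-checked with quick arithmetic, under two simplifying assumptions: an input-heavy token mix billed at the $0.08/1M rate, and rough EUR/USD parity.

```python
# Rough breakeven: how many tokens/day at OpenRouter's input rate cost
# as much as the ~656 EUR/month self-hosted GPU setup? Assumes an
# input-heavy workload and EUR/USD parity (both simplifications).

GPU_COST_PER_MONTH = 656      # total from the cost breakdown
PRICE_PER_1M_INPUT = 0.08     # OpenRouter input price per 1M tokens
DAYS_PER_MONTH = 30

daily_budget = GPU_COST_PER_MONTH / DAYS_PER_MONTH            # ≈ 21.9/day
breakeven_tokens_per_day = daily_budget / PRICE_PER_1M_INPUT * 1_000_000

print(f"{breakeven_tokens_per_day / 1e6:.0f}M tokens/day")    # prints "273M tokens/day"
```

A heavier share of output tokens (at the $0.28 rate) pulls the breakeven point well below this, so the ~260M/day figure is the right order of magnitude either way.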