Post-MVP — Infrastructure Scaling
Trigger: Monthly revenue > 600 EUR for 2 consecutive months. Budget: ~625 EUR/month for AWS and GPU capacity (~656 EUR/month including the estimated OpenRouter overflow)
Decision Gate
This phase starts only if v1.0 has shipped AND revenue meets the trigger above. Until then, the system runs on AWS t3.medium + OpenRouter at ~42-100 EUR/month.
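
The gate can be expressed as a small check. This is a minimal sketch, assuming "2 consecutive months" means the two most recent months in the revenue history; the function name and the list-based input are illustrative, not part of the existing system.

```python
from typing import Sequence

REVENUE_THRESHOLD_EUR = 600  # trigger threshold stated above
REQUIRED_MONTHS = 2          # consecutive months that must exceed it


def scaling_phase_unlocked(monthly_revenue_eur: Sequence[float], v1_shipped: bool) -> bool:
    """Return True when v1.0 has shipped and the latest months all exceed the threshold.

    Assumption: the history is ordered oldest to newest, one entry per month.
    """
    if not v1_shipped or len(monthly_revenue_eur) < REQUIRED_MONTHS:
        return False
    recent = monthly_revenue_eur[-REQUIRED_MONTHS:]
    # Strictly greater than, matching "> 600 EUR" in the trigger.
    return all(revenue > REVENUE_THRESHOLD_EUR for revenue in recent)
```

For example, `scaling_phase_unlocked([500, 610, 650], v1_shipped=True)` passes the gate, while a month at exactly 600 EUR does not.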
GPU Infrastructure
| Task | Detail |
|---|---|
| Vast.ai H200 setup | vLLM inference server for complex/fine-tuned tasks |
| Hybrid routing | Route to Vast.ai (complex) or OpenRouter (burst) |
| Model hosting | Self-host Qwen3 30B or fine-tuned variant |
| Auto-scaling | Shift traffic between OpenRouter and the self-hosted model based on load |
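
The hybrid routing and load-based overflow rows could be sketched as a single decision function. Everything here is hypothetical: the `Request` fields, the backend labels, and the queue-depth limit are placeholders chosen for illustration, and a real router would also need health checks and retries.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    complex_task: bool = False            # hypothetical flag set by an upstream classifier
    needs_fine_tuned_model: bool = False  # hypothetical flag for domain-specific requests


def choose_backend(req: Request, gpu_healthy: bool, gpu_queue_depth: int,
                   max_queue_depth: int = 8) -> str:
    """Pick a backend label: 'vast_ai' for the self-hosted GPU, 'openrouter' otherwise."""
    wants_gpu = req.complex_task or req.needs_fine_tuned_model
    # Use the self-hosted server only when it is up and not saturated.
    if wants_gpu and gpu_healthy and gpu_queue_depth < max_queue_depth:
        return "vast_ai"
    # Simple or burst traffic, and overflow from a busy or offline GPU.
    # Note: this fallback would serve a base model, not the fine-tuned one.
    return "openrouter"
```

Keeping the decision in one pure function makes the routing policy easy to test and to tune (for instance, by adjusting `max_queue_depth`) without touching the serving code.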
Fine-Tuning
| Task | Detail |
|---|---|
| Data pipeline | Curate training data from production usage |
| Fine-tune Qwen3 30B | Domain-specific fine-tune on the H200 |
| A/B testing | Fine-tuned vs base model on real traffic |
| Model registry | Track versions, metrics, rollback |
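
For the A/B test, one common approach is a deterministic hash-based split, so that a given user always sees the same model. The sketch below assumes a string user ID is available and uses hypothetical variant names; it is not taken from the existing codebase.

```python
import hashlib


def assign_variant(user_id: str, fine_tuned_share: float = 0.5) -> str:
    """Deterministically map a user to 'fine_tuned' or 'base'.

    fine_tuned_share is the fraction of users (0.0 to 1.0) sent to the fine-tuned model.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "fine_tuned" if bucket < fine_tuned_share else "base"
```

Because the assignment depends only on the user ID, no per-user state has to be stored, and the share can start small (for example `0.1`) and grow as metrics in the model registry look healthy.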
Enterprise Features
| Task | Detail |
|---|---|
| SSO / SAML | Enterprise authentication |
| Audit logging | Full audit trail of agent actions |
| Data residency | EU, US, or self-hosted |
| SLA guarantees | Uptime and latency commitments |
| On-prem option | Package for customer self-hosting |
Cost Breakdown
| Item | EUR/month |
|---|---|
| AWS EC2 + S3 + networking | 80 |
| Vast.ai H200 interruptible (252h inference) | 305 |
| Vast.ai H200 on-demand (108h fine-tuning) | 241 |
| OpenRouter (overflow/fallback) | 30 est. |
| Total | ~656 |
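
The table can be cross-checked with a few lines of arithmetic. The figures are copied from the rows above; the hourly rates are merely implied by dividing cost by hours and are not quoted Vast.ai prices.

```python
# Monthly costs in EUR, taken from the table above.
costs_eur = {
    "AWS EC2 + S3 + networking": 80,
    "Vast.ai H200 interruptible": 305,
    "Vast.ai H200 on-demand": 241,
    "OpenRouter (est.)": 30,
}
# GPU hours per month, taken from the same rows.
gpu_hours = {
    "Vast.ai H200 interruptible": 252,
    "Vast.ai H200 on-demand": 108,
}

total_eur = sum(costs_eur.values())
implied_rate = {name: costs_eur[name] / hours for name, hours in gpu_hours.items()}

print(f"Total: {total_eur} EUR/month")
print(f"GPU hours: {sum(gpu_hours.values())} h/month")
for name, rate in implied_rate.items():
    print(f"{name}: {rate:.2f} EUR/h")
```

The rows sum to 656 EUR, and the implied rates come out at about 1.21 EUR/h for interruptible and 2.23 EUR/h for on-demand capacity, over 360 GPU hours per month.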
Why Self-Hosted?
The switch to GPU is driven by capabilities, not cost savings:
- Fine-tuning on proprietary data (impossible with OpenRouter)
- Total privacy (sensitive data stays in-house)
- Guaranteed latency without third-party dependency
- Custom domain-specific model
At current OpenRouter pricing ($0.08/$0.28 per 1M tokens), it would take roughly 260M tokens/day to match the GPU cost, which is enterprise-level usage.
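
The break-even estimate can be reproduced with a short calculation. This sketch rests on assumptions about how the figure was derived: a 30-day month, EUR and USD treated as roughly equal, the ~625 EUR/month budget as the cost to match, and a single flat price per million tokens. Under those assumptions the ~260M figure corresponds to the lower $0.08 rate.

```python
def breakeven_tokens_per_day(monthly_budget: float, price_per_million: float,
                             days_per_month: int = 30) -> float:
    """Tokens per day at which API spend equals the monthly budget.

    monthly_budget and price_per_million must be in the same currency.
    """
    daily_budget = monthly_budget / days_per_month
    return daily_budget / price_per_million * 1_000_000


for price in (0.08, 0.28):
    tokens = breakeven_tokens_per_day(625, price)
    print(f"${price}/1M tokens: {tokens / 1e6:.0f}M tokens/day")
```

At the higher $0.28 rate the break-even drops to roughly 74M tokens/day, so the real threshold depends on the mix of traffic across the two rates.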