Simple, Transparent Pricing
From free to enterprise. No hidden fees, no GPU markup games.
Developer
Get started instantly. Shared GPU pool with generous free tier and pay-as-you-go after.
- 1,000 free requests/month
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Shared GPU pool
- Community support
- Rate limited (60 RPM)
- Not included: dedicated GPU allocation, custom model deployment, SLA guarantee
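Because every tier exposes an OpenAI-compatible API, request bodies follow the familiar OpenAI schema for each modality. A minimal sketch of building Chat Completions and Embeddings payloads (the model names shown are illustrative placeholders, not guaranteed identifiers):

```python
import json

def chat_payload(model: str, messages: list) -> dict:
    """Body for an OpenAI-style /v1/chat/completions request."""
    return {"model": model, "messages": messages}

def embeddings_payload(model: str, texts: list) -> dict:
    """Body for an OpenAI-style /v1/embeddings request."""
    return {"model": model, "input": texts}

# Illustrative model names; use the identifiers listed in your dashboard.
chat = chat_payload("llama-3.1-8b", [{"role": "user", "content": "Hello!"}])
emb = embeddings_payload("bge-large", ["pricing page"])
print(json.dumps(chat))
print(json.dumps(emb))
```

Point any existing OpenAI SDK at your DirectAI base URL and these payloads work unchanged.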
Pro
Reserved GPU capacity with burst scaling. For production workloads that need reliability.
- 10,000 requests/month included
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Reserved GPU hours
- Priority support (email)
- Higher rate limits (600 RPM)
- Burst to shared pool
- Custom model deployment
- 99.9% SLA
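The per-tier rate limits (60 RPM on Developer, 600 RPM on Pro) are easiest to respect with a client-side pacer rather than retrying on 429s. A sketch of a simple interval-based limiter (this helper is illustrative, not part of the DirectAI SDK):

```python
import time

class RpmLimiter:
    """Paces calls to at most `rpm` requests per minute."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.interval = 60.0 / rpm  # minimum seconds between requests
        self.clock = clock
        self.next_slot = clock()

    def wait_time(self) -> float:
        """Seconds to sleep before the next request is allowed."""
        now = self.clock()
        delay = max(0.0, self.next_slot - now)
        self.next_slot = max(now, self.next_slot) + self.interval
        return delay

limiter = RpmLimiter(rpm=600)  # Pro tier
# Before each API call: time.sleep(limiter.wait_time())
```

Pass `rpm=60` for the Developer tier; the same client code then works across both plans.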
Enterprise
Dedicated infrastructure in your own Azure subscription. Full isolation, custom SLAs.
- Unlimited requests
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Dedicated GPU node pools
- Dedicated support (Slack + phone)
- No rate limits
- Custom model deployment
- Compound AI pipelines
- 99.99% SLA
Self-Hosted
Run DirectAI on your own infrastructure. Full Helm chart, container images, and support.
- Your infrastructure, your rules
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Helm chart + Docker images
- Setup support + documentation
- Air-gapped / sovereign deployment
- Custom model deployment
- Bring your own GPUs
- Support tier add-on available
Pay-Per-Use Rates
After your tier's included allowance, usage is billed at these rates.
| Modality | Unit | Developer | Pro |
|---|---|---|---|
| LLM (Chat Completions) | per 1M tokens | $0.80 | $0.60 |
| Embeddings | per 1M tokens | $0.05 | $0.03 |
| Transcription | per hour | $0.30 | $0.20 |
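As a worked example, overage cost for a month of mixed usage follows directly from the rates in the table above:

```python
# Pay-per-use rates from the table above (USD).
RATES = {
    "developer": {"llm_per_1m_tok": 0.80, "embed_per_1m_tok": 0.05, "stt_per_hour": 0.30},
    "pro":       {"llm_per_1m_tok": 0.60, "embed_per_1m_tok": 0.03, "stt_per_hour": 0.20},
}

def overage_cost(tier: str, llm_tokens: float, embed_tokens: float, stt_hours: float) -> float:
    """Monthly overage in USD for usage beyond the tier's included allowance."""
    r = RATES[tier]
    return (llm_tokens / 1e6 * r["llm_per_1m_tok"]
            + embed_tokens / 1e6 * r["embed_per_1m_tok"]
            + stt_hours * r["stt_per_hour"])

# 50M LLM tokens + 200M embedding tokens + 40 audio hours on Pro:
print(round(overage_cost("pro", 50e6, 200e6, 40), 2))  # → 44.0
```

The same usage on the Developer tier costs $62.00, so heavy overage users typically save by moving to Pro.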
Frequently Asked Questions
How does pay-per-use pricing work?
After your tier's included allowance, you pay per 1M tokens for LLM and embedding usage and per audio hour for transcription, at the rates shown above. Usage is metered in real time and billed monthly via Stripe. No surprise bills: set spend limits in your dashboard.
Can I switch between tiers?
Yes. Upgrade or downgrade anytime from your dashboard. Changes take effect at the start of your next billing cycle. Your API keys and endpoints stay the same.
What models are available?
We support Llama 3.1 (8B, 70B, 405B), Mistral, Qwen, and DeepSeek for chat; BGE and E5 for embeddings; and Whisper large-v3 for transcription. Enterprise and Self-Hosted tiers can deploy any custom model.
What's the difference between managed and self-hosted?
Managed tiers run on DirectAI infrastructure — we handle scaling, updates, and monitoring. Self-Hosted gives you our Helm chart and container images to run on your own Kubernetes cluster. Same engine, your servers.
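For Self-Hosted, deployment is driven by the Helm chart's values file. The keys below are purely illustrative assumptions about what such a chart might expose; consult the chart's own documented values for the real schema:

```yaml
# values.yaml — illustrative sketch only; key names are assumptions,
# not DirectAI's actual chart schema.
replicaCount: 2
image:
  repository: registry.example.com/directai/engine
  tag: "latest"
resources:
  limits:
    nvidia.com/gpu: 1        # one GPU per replica (bring your own GPUs)
nodeSelector:
  gpu-pool: "a100"           # pin pods to your GPU node pool
```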
Do you support fine-tuned models?
Enterprise and Self-Hosted tiers support custom model deployment. Upload your weights, and we compile optimized TensorRT-LLM engines for your target GPU. Standard architectures (Llama, Mistral, Qwen) deploy in minutes.
Is there a free trial for paid tiers?
The Developer tier is free with 1,000 requests/month — use it as your trial. If you need to evaluate Pro features, contact us for a 14-day Pro trial.
Not sure which plan is right? Talk to us