Simple, Transparent Pricing
From free to enterprise. No hidden fees, no GPU markup games.
Developer
Get started instantly. Shared GPU pool with generous free tier and pay-as-you-go after.
- 1,000 free requests/month
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Shared GPU pool
- Community support
- Rate limited (60 RPM)
- Not included: dedicated GPU allocation, custom model deployment, SLA guarantee
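Because every tier exposes an OpenAI-compatible API, request bodies follow the familiar OpenAI schema for each modality. A minimal sketch of building Chat Completions and Embeddings payloads (the model names shown are illustrative placeholders, not guaranteed identifiers):

```python
import json

def chat_payload(model: str, messages: list) -> dict:
    """Body for an OpenAI-style /v1/chat/completions request."""
    return {"model": model, "messages": messages}

def embeddings_payload(model: str, texts: list) -> dict:
    """Body for an OpenAI-style /v1/embeddings request."""
    return {"model": model, "input": texts}

# Illustrative model names; use the identifiers listed in your dashboard.
chat = chat_payload("llama-3.1-8b", [{"role": "user", "content": "Hello!"}])
emb = embeddings_payload("bge-large", ["pricing page"])
print(json.dumps(chat))
print(json.dumps(emb))
```

Point any existing OpenAI SDK at your DirectAI base URL and these payloads work unchanged.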
Pro
Reserved GPU capacity with burst scaling. For production workloads that need reliability.
- 10,000 requests/month included
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Reserved GPU hours
- Priority support (email)
- Higher rate limits (600 RPM)
- Burst to shared pool
- Custom model deployment
- 99.9% SLA
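The per-tier rate limits (60 RPM on Developer, 600 RPM on Pro) are easiest to respect with a client-side pacer rather than retrying on 429s. A sketch of a simple interval-based limiter (this helper is illustrative, not part of the DirectAI SDK):

```python
import time

class RpmLimiter:
    """Paces calls to at most `rpm` requests per minute."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.interval = 60.0 / rpm  # minimum seconds between requests
        self.clock = clock
        self.next_slot = clock()

    def wait_time(self) -> float:
        """Seconds to sleep before the next request is allowed."""
        now = self.clock()
        delay = max(0.0, self.next_slot - now)
        self.next_slot = max(now, self.next_slot) + self.interval
        return delay

limiter = RpmLimiter(rpm=600)  # Pro tier
# Before each API call: time.sleep(limiter.wait_time())
```

Pass `rpm=60` for the Developer tier; the same client code then works across both plans.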
Enterprise
Dedicated infrastructure in your own Azure subscription. Full isolation, custom SLAs.
- Unlimited requests
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Dedicated GPU node pools
- Dedicated support (Slack + phone)
- No rate limits
- Custom model deployment
- Compound AI pipelines
- 99.99% SLA
Self-Hosted
Run DirectAI on your own infrastructure. Full Helm chart, container images, and support.
- Your infrastructure, your rules
- All modalities (LLM, embeddings, STT)
- OpenAI-compatible API
- Helm chart + Docker images
- Setup support + documentation
- Air-gapped / sovereign deployment
- Custom model deployment
- Bring your own GPUs
- Support tier add-on available
Pay-Per-Use Rates
After your tier's included allowance, usage is billed at these rates.
| Modality | Unit | Developer | Pro |
|---|---|---|---|
| LLM (Chat Completions) | per 1M tokens | $0.80 | $0.60 |
| Embeddings | per 1M tokens | $0.05 | $0.03 |
| Transcription | per hour | $0.30 | $0.20 |
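As a worked example, overage cost for a month of mixed usage follows directly from the rates in the table above:

```python
# Pay-per-use rates from the table above (USD).
RATES = {
    "developer": {"llm_per_1m_tok": 0.80, "embed_per_1m_tok": 0.05, "stt_per_hour": 0.30},
    "pro":       {"llm_per_1m_tok": 0.60, "embed_per_1m_tok": 0.03, "stt_per_hour": 0.20},
}

def overage_cost(tier: str, llm_tokens: float, embed_tokens: float, stt_hours: float) -> float:
    """Monthly overage in USD for usage beyond the tier's included allowance."""
    r = RATES[tier]
    return (llm_tokens / 1e6 * r["llm_per_1m_tok"]
            + embed_tokens / 1e6 * r["embed_per_1m_tok"]
            + stt_hours * r["stt_per_hour"])

# 50M LLM tokens + 200M embedding tokens + 40 audio hours on Pro:
print(round(overage_cost("pro", 50e6, 200e6, 40), 2))  # → 44.0
```

The same usage on the Developer tier costs $62.00, so heavy overage users typically save by moving to Pro.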
Frequently Asked Questions
How does pay-per-use pricing work?
After your tier's included allowance, you pay per 1M tokens for LLM and embedding usage and per audio hour for transcription, at the rates shown above. Usage is metered in real time and billed monthly via Stripe. No surprise bills: set spend limits in your dashboard.
Can I switch between tiers?
Yes. Upgrade or downgrade anytime from your dashboard. Changes take effect at the start of your next billing cycle. Your API keys and endpoints stay the same.
What models are available?
We support Llama 3.1 (8B, 70B, 405B), Mistral, Qwen, and DeepSeek for chat; BGE and E5 for embeddings; and Whisper large-v3 for transcription. Enterprise and Self-Hosted tiers can deploy any custom model.
What's the difference between managed and self-hosted?
Managed tiers run on DirectAI infrastructure — we handle scaling, updates, and monitoring. Self-Hosted gives you our Helm chart and container images to run on your own Kubernetes cluster. Same engine, your servers.
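For Self-Hosted, deployment is driven by the Helm chart's values file. The keys below are purely illustrative assumptions about what such a chart might expose; consult the chart's own documented values for the real schema:

```yaml
# values.yaml — illustrative sketch only; key names are assumptions,
# not DirectAI's actual chart schema.
replicaCount: 2
image:
  repository: registry.example.com/directai/engine
  tag: "latest"
resources:
  limits:
    nvidia.com/gpu: 1        # one GPU per replica (bring your own GPUs)
nodeSelector:
  gpu-pool: "a100"           # pin pods to your GPU node pool
```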
Do you support fine-tuned models?
Enterprise and Self-Hosted tiers support custom model deployment. Upload your weights, and we compile optimized TensorRT-LLM engines for your target GPU. Standard architectures (Llama, Mistral, Qwen) deploy in minutes.
Is there a free trial for paid tiers?
The Developer tier is free with 1,000 requests/month — use it as your trial. If you need to evaluate Pro features, contact us for a 14-day Pro trial.
Not sure which plan is right? Talk to us