Simple, Transparent Pricing

From free to enterprise. No hidden fees, no GPU markup games.

Developer

Free, then pay-per-use

Get started instantly. Shared GPU pool with generous free tier and pay-as-you-go after.

  • 1,000 free requests/month
  • All modalities (LLM, embeddings, STT)
  • OpenAI-compatible API
  • Shared GPU pool
  • Community support
  • Rate limited (60 RPM)
  • No dedicated GPU allocation
  • No custom model deployment
  • No SLA guarantee
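The Developer tier's 60 RPM rate limit means clients should pace themselves rather than retry on 429s. A minimal client-side sketch (illustrative only; the class name and approach are not part of the DirectAI API):

```python
import time

class RateLimiter:
    """Minimal client-side pacer to stay under a requests-per-minute cap.

    Illustrative sketch; 60 RPM is the Developer tier's limit.
    """
    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between requests
        self.last_call = 0.0

    def wait(self) -> float:
        """Sleep just long enough to respect the cap; return the delay used."""
        now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = time.monotonic()
        return delay

limiter = RateLimiter(rpm=60)  # Developer tier cap
# Call limiter.wait() before each API request.
```

Pro's 600 RPM limit works the same way with `rpm=600`.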
Most Popular

Pro

$99/month + usage

Reserved GPU capacity with burst scaling. For production workloads that need reliability.

  • 10,000 requests/month included
  • All modalities (LLM, embeddings, STT)
  • OpenAI-compatible API
  • Reserved GPU hours
  • Priority support (email)
  • Higher rate limits (600 RPM)
  • Burst to shared pool
  • Custom model deployment
  • 99.9% SLA

Enterprise

Custom; contact us

Dedicated infrastructure in your own Azure subscription. Full isolation, custom SLAs.

  • Unlimited requests
  • All modalities (LLM, embeddings, STT)
  • OpenAI-compatible API
  • Dedicated GPU node pools
  • Dedicated support (Slack + phone)
  • No rate limits
  • Custom model deployment
  • Compound AI pipelines
  • 99.99% SLA

Self-Hosted

$499/month license

Run DirectAI on your own infrastructure. Full Helm chart, container images, and support.

  • Your infrastructure, your rules
  • All modalities (LLM, embeddings, STT)
  • OpenAI-compatible API
  • Helm chart + Docker images
  • Setup support + documentation
  • Air-gapped / sovereign deployment
  • Custom model deployment
  • Bring your own GPUs
  • Support tier add-on available

Pay-Per-Use Rates

After your tier's included allowance, usage is billed at these rates.

Modality                  Unit            Developer   Pro
LLM (Chat Completions)    per 1M tokens   $0.80       $0.60
Embeddings                per 1M tokens   $0.05       $0.03
Transcription             per hour        $0.30       $0.20
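The rates above translate directly into an overage estimate. A quick sketch (the function and example usage figures are illustrative, not a billing calculator we ship):

```python
# Pay-per-use rates from the table above, in USD.
RATES = {
    "developer": {"llm_per_1m": 0.80, "embed_per_1m": 0.05, "stt_per_hour": 0.30},
    "pro":       {"llm_per_1m": 0.60, "embed_per_1m": 0.03, "stt_per_hour": 0.20},
}

def monthly_overage(tier, llm_tokens=0, embed_tokens=0, stt_hours=0.0):
    """Estimate cost of usage beyond the tier's included allowance."""
    r = RATES[tier]
    return (llm_tokens / 1_000_000 * r["llm_per_1m"]
            + embed_tokens / 1_000_000 * r["embed_per_1m"]
            + stt_hours * r["stt_per_hour"])

# Example: 50M LLM tokens plus 10 hours of transcription on Pro:
# 50 * $0.60 + 10 * $0.20 = $32.00
print(f"${monthly_overage('pro', llm_tokens=50_000_000, stt_hours=10):.2f}")
```

The same usage on Developer would cost 50 * $0.80 + 10 * $0.30 = $43.00, which is how the $99/month Pro base fee pays for itself at moderate volume.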

Frequently Asked Questions

How does pay-per-use pricing work?

After your tier's included allowance, you pay per 1M tokens for LLMs and embeddings, and per hour of audio for transcription (see the rates above). Usage is metered in real time and billed monthly via Stripe. No surprise bills: set spend limits in your dashboard.

Can I switch between tiers?

Yes. Upgrade or downgrade anytime from your dashboard. Changes take effect at the start of your next billing cycle. Your API keys and endpoints stay the same.

What models are available?

For chat, we support Llama 3.1 (8B, 70B, 405B), Mistral, Qwen, and DeepSeek; BGE and E5 for embeddings; and Whisper large-v3 for transcription. Enterprise and Self-Hosted tiers can deploy any custom model.
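Because the API is OpenAI-compatible, requests follow the OpenAI Chat Completions schema, so existing clients only need their base URL swapped. A minimal request-body sketch (the endpoint URL and model identifier below are placeholders, not documented values):

```python
import json

# Placeholder endpoint: substitute the base URL from your dashboard.
BASE_URL = "https://api.example.invalid/v1"

# Standard OpenAI Chat Completions request body.
payload = {
    "model": "llama-3.1-70b",  # illustrative model ID; see your model list
    "messages": [
        {"role": "user", "content": "Summarize our Q3 results."},
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)  # POST this to {BASE_URL}/chat/completions
```

Official OpenAI SDKs work the same way: construct the client with `base_url` pointing at your DirectAI endpoint and keep the rest of your code unchanged.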

What's the difference between managed and self-hosted?

Managed tiers run on DirectAI infrastructure — we handle scaling, updates, and monitoring. Self-Hosted gives you our Helm chart and container images to run on your own Kubernetes cluster. Same engine, your servers.

Do you support fine-tuned models?

Enterprise and Self-Hosted tiers support custom model deployment. Upload your weights, and we compile optimized TensorRT-LLM engines for your target GPU. Standard architectures (Llama, Mistral, Qwen) deploy in minutes.

Is there a free trial for paid tiers?

The Developer tier is free with 1,000 requests/month — use it as your trial. If you need to evaluate Pro features, contact us for a 14-day Pro trial.

Not sure which plan is right? Talk to us