Name: AISIX Cloud
Brand: AISIX
Availability: InStock

Question 1

Do you charge for tokens or LLM usage?

Accepted Answer

No. You bring your own provider keys and pay your LLM bill directly to OpenAI, Anthropic, and others. AISIX Cloud only charges the plan subscription — there is no markup on your tokens.

Question 2

How is AISIX different from an LLM API relay or token reseller?

Accepted Answer

A relay resells access to models through its own accounts and servers — your prompts, responses, and API keys pass through a third party you don't control, and you inherit its markup, shared rate limits, and the risk of the upstream account being throttled or banned. AISIX is your own gateway, not a reseller. You connect your own provider keys and run it in your cloud or VPC (or a managed plane with envelope-encrypted keys), so traffic and data stay under your control. You get routing, failover, rate limits, guardrails, budgets, and full observability — and you pay providers directly, with no token markup. It's open-source (Apache-2.0, built in Rust), production-grade, and SOC 2 / ISO 27001 / GDPR / HIPAA-ready when you scale.

Question 3

What counts toward my monthly request quota?

Accepted Answer

Recorded requests — any call routed and logged through the gateway, including error responses. One streaming response counts as a single request.

Question 4

What happens when I hit the limit?

Accepted Answer

On Developer, your traffic keeps flowing past 100K — those extra requests simply aren't recorded, so they don't show up in your logs or analytics, and nothing is ever blocked. On Team, we automatically add $100 per additional 1M requests — traffic is never interrupted. Sustained usage over 5M/mo is a good point to move to Enterprise.

Question 5

Can I self-host AISIX instead of using the cloud?

Accepted Answer

Yes. AISIX is open source under Apache-2.0 — run the full gateway as a single Rust binary, free, with community support. The managed cloud adds the hosted control plane, dashboard, budgets, RBAC, and SLAs on top. Enterprise can also run the managed stack inside your own cloud / VPC.

Question 6

Which LLM providers are supported?

Accepted Answer

More than 100 providers through one OpenAI-compatible API, including 20+ popular integrations (OpenAI, Anthropic, Gemini, DeepSeek, Groq, Mistral, Cohere, Qwen, Together, Fireworks, and more). Cloud-hosted providers — AWS Bedrock, Azure OpenAI, and GCP Vertex AI — are available on Enterprise.

Question 7

How do SSO, audit logs, and compliance work?

Accepted Answer

Organization management, SSO (SAML / OIDC), audit logs, and SOC 2 Type II / ISO 27001 / GDPR / HIPAA are part of Enterprise. Talk to sales to scope your requirements and deployment model.

Question 8

Where is my data stored, and how are provider keys protected?

Accepted Answer

On AISIX Cloud, provider keys are envelope-encrypted at rest and decrypted only at request time; each data plane is scoped to its own environment keyspace. On self-host, all data and keys stay entirely within your own infrastructure.

Question 9

Does the gateway add latency?

Accepted Answer

The data plane is a native Rust proxy with a published performance baseline: ~28,300 req/s at saturation on 4 vCPUs, sub-millisecond p50 gateway overhead at low-to-moderate load, and ~0.65 ms added time-to-first-token for streaming — negligible next to LLM inference time.

Feature	Developer$0	Team$149/mo	EnterpriseCustom
Usage & limits
Recorded requests / month	100K	1M	10M+
Members	1	25	Unlimited
Environments	1	3	Unlimited
Data planes	1	3	Unlimited
Log retention	3 days	30 days	Custom
Metrics retention	30 days	90 days	Custom
Monthly overage	Not recorded	$100 / 1M	Custom
AI gateway core
Universal OpenAI-compatible API
Chat completions + streaming (SSE)
Anthropic-compatible /v1/messages
Embeddings · rerank
Audio (speech-to-text, TTS)
Image generation
Provider passthrough
Playground
Single-model direct
Virtual / routing models	—
Model ensembles (panel + judge)	—
MCP gateway
Model providers
OpenAI · Anthropic · Gemini · DeepSeek
20+ popular providers
100+ via OpenAI-compatibility
AWS Bedrock · Azure OpenAI · GCP Vertex AI	—	—
Routing & reliability
Weighted load balancing	—
Sticky canary / A-B routing	—
Tag-based conditional routing · wildcard aliases	—
Automatic retry	—
Fallback on errors / 429	—
Upstream health checks	—
Semantic routing	—	—
Cost / latency / load-aware routing	—	—
Rate limiting
Request rate limits (RPM / RPD)	—
Token rate limits (TPM / TPD)	—
Concurrency limits	—
Per-team / per-member limits	—
Cost & budgets
Per-key budgets (hard-stop / warn)	—
Org / env / provider budgets	—
Budget alerts (75 / 90 / 100%)	—
Per-model cost tracking	—
Caching
Response cache (exact-match)	—
Memory + Redis backends	—
Cost-saved telemetry	—
Semantic cache	—	—
Security & guardrails
Keyword / regex guardrails	—
Cloud safety guardrails	—	—
PII redaction · content moderation	—	—
Custom guardrail hooks	—	—
Observability
Request access logs
Usage & cost analytics dashboard	Basic
Prometheus metrics	—
OpenTelemetry trace export	—
Alerts	—
Data Lake / bucket export	—	—
Access control & org
API key management + rotation
Virtual keys & model allowlist
Personal access tokens (CLI / CI)
Social login (GitHub / Google)
RBAC	—
Teams	—
Organization management	—	—
SSO (SAML / OIDC)	—	—
Audit logs	—	—
Deployment & compliance
Managed SaaS hosting
Provider-key encryption at rest
Self-host / on-prem option	—	—
Private / VPC deployment	—	—
SOC 2 Type II · ISO 27001 · GDPR · HIPAA	—	—
BAA · data isolation	—	—
Support & SLA
Support	Community	Standard	Dedicated
SLA	—	Standard	Custom

Start free. Scale when AI is moving your numbers.

Pricing that grows with your AI

What's a recorded request?

Free tier never blocks

Overage is predictable

Everything you need to build, ship & scale AI

Enterprise-grade security, ready when you scale

Questions, answered

Ship AI that grows your business