Questions, answered
Do you charge for tokens or LLM usage?
No. You bring your own provider keys and pay your LLM bill directly to OpenAI, Anthropic, and others. AISIX Cloud charges only for the plan subscription; there is no markup on your tokens.
What counts toward my monthly request quota?
Recorded requests: any call routed through and logged by the gateway, including error responses. A streaming response counts as a single request.
What happens when I hit the limit?
On Developer, your traffic keeps flowing past 100K requests. The extra requests simply aren't recorded, so they don't appear in your logs or analytics, and nothing is ever blocked. On Team, we automatically add $100 for each additional 1M requests, and traffic is never interrupted. If your usage consistently exceeds 5M requests a month, that is a good point to move to Enterprise.
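The limits above reduce to simple arithmetic. The sketch below is a minimal illustration, not AISIX's actual billing logic. It assumes Team overage is added in whole 1M-request blocks (rounded up) once usage passes the plan's included quota. Because this FAQ does not state Team's included quota, `included` is a hypothetical input, and the function names are illustrative only.

```python
# Figures stated in the FAQ above.
DEVELOPER_RECORDED_LIMIT = 100_000   # requests recorded per month on Developer
TEAM_BLOCK_SIZE = 1_000_000          # Team overage is added per 1M requests
TEAM_BLOCK_PRICE_USD = 100           # price of each added 1M-request block


def developer_recorded(requests: int) -> int:
    """Requests that appear in logs/analytics on Developer.

    Traffic beyond the limit still flows; it just isn't recorded.
    """
    return min(requests, DEVELOPER_RECORDED_LIMIT)


def team_overage_usd(requests: int, included: int) -> int:
    """Overage charge on Team for one month (hypothetical helper).

    Assumes whole 1M blocks are added, rounding up, once usage passes
    `included`, which is a hypothetical plan quota not stated in the FAQ.
    """
    extra = max(0, requests - included)
    # Integer ceiling division: any partial block counts as a full block.
    blocks = (extra + TEAM_BLOCK_SIZE - 1) // TEAM_BLOCK_SIZE
    return blocks * TEAM_BLOCK_PRICE_USD


if __name__ == "__main__":
    print(developer_recorded(250_000))                      # 100000
    print(team_overage_usd(2_500_000, included=1_000_000))  # 200
```

Rounding up to whole blocks matches the "add $100 per additional 1M" wording, but if overage were prorated instead, the ceiling division would become a plain fraction of the block price.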
Can I self-host AISIX instead of using the cloud?
Yes. AISIX is open source under Apache-2.0: you can run the full gateway as a single Rust binary, free of charge, with community support. The managed cloud adds the hosted control plane, dashboard, budgets, RBAC, and SLAs on top. On Enterprise, the managed stack can also run inside your own cloud or VPC.
Which LLM providers are supported?
More than 100 providers, all through one OpenAI-compatible API, including 20+ popular integrations (OpenAI, Anthropic, Gemini, DeepSeek, Groq, Mistral, Cohere, Qwen, Together, Fireworks, and more). The cloud-hosted providers AWS Bedrock, Azure OpenAI, and GCP Vertex AI are available on Enterprise.
How do SSO, audit logs, and compliance work?
Organization management, SSO (SAML / OIDC), audit logs, and SOC 2 Type II, ISO 27001, GDPR, and HIPAA compliance are part of Enterprise. Talk to sales to scope your requirements and deployment model.
Where is my data stored, and how are provider keys protected?
On AISIX Cloud, provider keys are envelope-encrypted at rest and decrypted only at request time; each data plane is scoped to its own environment keyspace. When you self-host, all data and keys stay entirely within your own infrastructure.
Does the gateway add latency?
The data plane is a native Rust proxy with sub-millisecond overhead, so the latency it adds is negligible next to LLM inference time.