Compare AISIX, LiteLLM, Portkey, Envoy, TrueFoundry, Vercel, and Cloudflare AI gateways across provider coverage, multi-model routing, guardrails, token budgets, caching, and governance — and find the gateway that fits how your team runs LLM and AI-agent traffic.
Introduction
At a Glance
Gateway Profiles
Our Take
In-Depth Comparisons
FAQ
An AI gateway sits in front of your LLM providers and gives teams one place to route, govern, secure, cache, and observe LLM and AI-agent traffic. It adds AI-native controls — multi-model routing, token-based budgets, prompt and response guardrails, and per-model cost tracking — on top of the routing and reliability of a traditional API gateway.
This page consolidates how the leading AI gateways compare, so you can evaluate them side by side before reading any single head-to-head. The fastest way to narrow the field is to decide how you want to run the gateway.
1. Open-source, self-hosted AI gateways
AISIX, LiteLLM, the Portkey gateway, and Envoy AI Gateway run on infrastructure you control. You get full control over deployment, customization, and data residency — LLM traffic can stay entirely inside your network — in exchange for operating the gateway yourself.
2. Managed / SaaS AI gateways
Vercel AI Gateway and Cloudflare AI Gateway are fully managed: one endpoint reaches many models with edge caching and built-in analytics. Onboarding is fast and there is nothing to operate, but traffic routes through the vendor's network rather than your own.
3. Hybrid platforms you can self-host
AISIX, TrueFoundry, and Portkey pair a governance control plane with a data plane you can run yourself — start with a managed experience, then self-host in your VPC for compliance or data residency. AISIX is the open-source option here, keeping all model traffic in your network while still offering a managed control plane.
Last updated: July 2026. Competitor facts are sourced from official documentation and repositories. AISIX capabilities that are not yet shipped are marked Roadmap rather than listed as available.
✓ supported · ✗ not available · Ent. enterprise / paid tier · — not disclosed
| Capability | AISIX | LiteLLM | Portkey | Envoy | TrueFoundry | Vercel | Cloudflare |
|---|---|---|---|---|---|---|---|
| Foundation | |||||||
| License | Apache-2.0 | MIT | MIT | Apache-2.0 | Proprietary | Proprietary | Proprietary |
| Built with | Rust | Python | TypeScript | Go | TypeScript | — | — |
| Hosting model | Self-host or SaaS | Self-host | Self-host + hosted | Self-host | Self-host or SaaS | SaaS | SaaS |
| Can run in your network | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Routing & traffic | |||||||
| Provider / model coverage | Major providers | 100+ providers | 1,600+ models | 16+ providers | 1,600+ models | Hundreds of models | 20+ providers |
| Multi-model routing & fallback | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Semantic caching | Roadmap | ✓ | Ent. | ✗ | ✓ | ✗ | ✗ |
| MCP gateway | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| Governance & cost | |||||||
| Token budgets & cost tracking | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Prompt & response guardrails | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ |
| SSO / SCIM | Roadmap | Ent. | Ent. | ✗ | ✓ | Ent. | Ent. |
AISIX
Foundation
License
Apache-2.0
Built with
Rust
Hosting model
Self-host or SaaS
Can run in your network
✓
Routing & traffic
Provider / model coverage
Major providers
Multi-model routing & fallback
✓
Semantic caching
Roadmap
MCP gateway
✓
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✓
SSO / SCIM
Roadmap
LiteLLM
Foundation
License
MIT
Built with
Python
Hosting model
Self-host
Can run in your network
✓
Routing & traffic
Provider / model coverage
100+ providers
Multi-model routing & fallback
✓
Semantic caching
✓
MCP gateway
✓
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✓
SSO / SCIM
Ent.
Portkey
Foundation
License
MIT
Built with
TypeScript
Hosting model
Self-host + hosted
Can run in your network
✓
Routing & traffic
Provider / model coverage
1,600+ models
Multi-model routing & fallback
✓
Semantic caching
Ent.
MCP gateway
✓
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✓
SSO / SCIM
Ent.
Envoy
Foundation
License
Apache-2.0
Built with
Go
Hosting model
Self-host
Can run in your network
✓
Routing & traffic
Provider / model coverage
16+ providers
Multi-model routing & fallback
✓
Semantic caching
✗
MCP gateway
✓
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✗
SSO / SCIM
✗
TrueFoundry
Foundation
License
Proprietary
Built with
TypeScript
Hosting model
Self-host or SaaS
Can run in your network
✓
Routing & traffic
Provider / model coverage
1,600+ models
Multi-model routing & fallback
✓
Semantic caching
✓
MCP gateway
✓
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✓
SSO / SCIM
✓
Vercel
Foundation
License
Proprietary
Built with
—
Hosting model
SaaS
Can run in your network
✗
Routing & traffic
Provider / model coverage
Hundreds of models
Multi-model routing & fallback
✓
Semantic caching
✗
MCP gateway
✗
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✗
SSO / SCIM
Ent.
Cloudflare
Foundation
License
Proprietary
Built with
—
Hosting model
SaaS
Can run in your network
✗
Routing & traffic
Provider / model coverage
20+ providers
Multi-model routing & fallback
✓
Semantic caching
✗
MCP gateway
✗
Governance & cost
Token budgets & cost tracking
✓
Prompt & response guardrails
✓
SSO / SCIM
Ent.
Open-source AI gateway (Apache-2.0) in Rust from the team behind Apache APISIX — a hybrid platform you can self-host or run with a managed control plane.
AISIX is a hybrid: its open-source data plane runs inside your own VPC over an outbound-only connection — so prompts and responses never leave your network — while the control plane can be self-managed or used as a hosted service, letting you start quickly without giving up data residency. It governs LLM and AI-agent traffic on the same high-performance data plane as its API gateway: multi-model routing and load balancing (including cost-, latency- and load-aware strategies and sticky canary releases), token-based budgets and cost tracking, prompt and response guardrails, and an MCP gateway that runs MCP tools through the same keys, rate limits, and guardrails as LLM traffic. It also offers ensemble models that fan a single request out to a panel of models and synthesize the result — something most AI gateways do not offer — and publishes a performance baseline of ~28,300 req/s at saturation on 4 vCPUs, with sub-millisecond p50 gateway overhead at low-to-moderate load. SSO/SCIM and semantic caching are on the active roadmap.
Best for: Teams that want LLM traffic and data to stay inside their own infrastructure, a unified API + AI gateway on one data plane, and the flexibility to self-host or use a managed control plane.
MIT-licensed, Python-based proxy that calls 100+ LLM providers behind an OpenAI-compatible API.
LiteLLM focuses on breadth and portability. The open-source proxy supports semantic caching, per-key/team spend budgets, and an MCP gateway, and self-hosts via Docker, Helm, or Terraform. SSO is free for up to 5 users; larger SSO deployments and SCIM require an enterprise license.
Best for: Teams that want the widest provider compatibility and a lightweight, fully open-source self-hosted proxy.
Open-source gateway (MIT, TypeScript) paired with a hosted control plane for governance and analytics.
Portkey routes to 1,600+ models and ships 20+ native guardrails (plus partner integrations) and an MCP gateway. Governance, analytics, and configuration run through a Portkey-hosted control plane, with a self-hosted data-plane option for enterprises — so LLM traffic only stays in your network when you run that data plane yourself. Semantic caching, SSO, and SCIM sit on enterprise / control-plane tiers.
Best for: Teams that want a managed governance console and don't mind a hosted control plane (with a self-hosted data plane available for enterprises).
Apache-2.0, Go-based AI gateway built on Envoy Proxy and Envoy Gateway, Kubernetes-native and self-hosted.
Envoy AI Gateway brings LLM traffic onto the Envoy data plane: model-name virtualization, automatic provider fallback, token-based usage rate limiting and quota budgets, a GA MCP gateway (v1.0), and GenAI metrics via OpenTelemetry. As pure infrastructure it has no native guardrail engine or semantic cache, and no built-in identity/SSO plane — those are expected to live in surrounding tooling.
Best for: Platform teams already standardized on Envoy and Kubernetes who want a CNCF-aligned, infrastructure-grade AI data plane.
Proprietary enterprise gateway (TypeScript/Hono) deployable as SaaS or fully self-hosted in your own VPC.
TrueFoundry can run on its managed cloud or be self-hosted in a VPC, on-prem, or air-gapped — keeping LLM traffic in your network. It offers latency-based routing, weighted load balancing and fallback, semantic caching, an MCP gateway with OAuth/RBAC, token and cost quotas, PII/toxicity guardrails, and SSO (OIDC/SAML) with SCIM. The vendor claims sub-3ms internal latency.
Best for: Enterprises that want a full governance plane they can also self-host for compliance or data-residency reasons.
Proprietary managed service that exposes hundreds of models through a single Vercel-hosted endpoint.
Vercel AI Gateway is a hosted SaaS: one key reaches hundreds of models across ~45 providers, with provider ordering, fallback, BYOK (no token markup), and built-in spend/budget observability. Requests route through Vercel rather than your network; it offers prompt caching only (no semantic cache), no native guardrails, and SSO/SCIM via Vercel Enterprise. MCP is a Vercel platform feature, not a gateway capability.
Best for: Teams building on Vercel that want zero-ops model access and spend visibility without running infrastructure.
Proprietary managed service that proxies LLM traffic through Cloudflare's edge network.
Cloudflare AI Gateway fronts 20+ providers through a universal, OpenAI-compatible endpoint with retries and fallback, exact-match response caching, spend limits and analytics, and Cloudflare Guardrails for prompt/response moderation. Traffic runs through Cloudflare's edge rather than your own network; semantic caching is not yet available, and MCP is handled by separate Cloudflare products. SSO is free; SCIM is enterprise-only.
Best for: Teams already on Cloudflare that want a fast, edge-cached gateway with built-in content moderation.
If your priority is keeping LLM traffic and data inside your own network, AISIX is the most natural fit. It is open source (Apache-2.0) from the creators of Apache APISIX, runs its data plane in your own VPC, and governs LLM and AI-agent traffic on the same data plane as your API gateway — plus ensemble models that few gateways offer. As a hybrid platform you can self-host it outright or pair it with a managed control plane, so you keep data residency without taking on all the operations yourself.
The other gateways fit different priorities: choose LiteLLM when raw provider breadth matters most, Vercel or Cloudflare when you want a fully managed, zero-ops service and do not need data residency, Envoy AI Gateway when you are standardizing on Envoy and Kubernetes, and TrueFoundry or Portkey when you want a hosted governance console you can later self-host. AISIX’s SSO/SCIM and semantic caching are on the active roadmap; everything else credited to it above — including its MCP gateway — is available today.
Read the full, side-by-side breakdowns: