What is an AI gateway?

An AI gateway is a specialized gateway that sits in front of LLM providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure OpenAI, and others) and gives teams one place to route, govern, secure, cache, and observe LLM and AI-agent traffic. On top of the routing and reliability features of a traditional API gateway, it adds AI-specific controls such as multi-model and semantic routing, token-based rate limits and budgets, prompt and response guardrails, and per-model cost tracking.

What is the difference between an AI gateway and an API gateway?

An API gateway manages traffic to general backend services — routing, authentication, rate limiting, and protocol translation by request count. An AI gateway applies the same single-entry-point pattern to LLM and AI-agent traffic, but reasons in AI-native terms: it rate-limits and budgets by tokens rather than requests, routes across multiple model providers, caches semantically similar prompts, and inspects prompts and completions for safety. Some platforms — including AISIX, built by the creators of Apache APISIX — deliver AI-gateway capabilities on the same data plane as their API gateway.

Which AI gateways are open source, and which are managed services?

AISIX (Apache-2.0), LiteLLM (MIT), the Portkey gateway (MIT), and Envoy AI Gateway (Apache-2.0) are open source and self-hosted. Vercel AI Gateway and Cloudflare AI Gateway are proprietary, fully managed services that run on the vendor’s network. TrueFoundry is proprietary but can be self-hosted or run as SaaS, and Portkey pairs its open-source gateway with a hosted control plane. Choose open-source/self-hosted when you need control and data residency; choose managed when you want zero-ops onboarding.

Should I use an open-source or a hosted (SaaS) AI gateway?

Open-source AI gateways such as AISIX (Rust, Apache-2.0), LiteLLM (Python, MIT), and Envoy AI Gateway (Go, Apache-2.0) run inside your own environment, giving you full control over deployment, customization, and data residency — the model traffic never has to leave your network. Managed services like Vercel and Cloudflare trade some of that control for zero-ops onboarding and edge caching, but route traffic through the vendor. Hybrid platforms such as TrueFoundry and Portkey let you start hosted and self-host the data plane later.

Which AI gateway is best for self-hosting and data residency?

If keeping LLM traffic inside your own VPC is a hard requirement, prioritize gateways that run entirely on infrastructure you control: AISIX, LiteLLM, Envoy AI Gateway, and self-hosted TrueFoundry or Portkey. AISIX deploys its data plane inside the customer's environment with outbound-only connectivity, so prompts and responses stay in-network. Fully managed services (Vercel, Cloudflare) route through the vendor's network and are not suitable when data must not leave your boundary. Always confirm where the control plane, logs, and cache live, not just the proxy.

What should I look for when comparing AI gateways?

Key evaluation criteria include: (1) Provider and model coverage — how many LLM providers and models are supported out of the box; (2) Routing — multi-model load balancing, fallback, and semantic or intent-based routing; (3) Cost controls — token-based rate limits, budgets, and per-model cost tracking; (4) Guardrails — prompt and response filtering, PII handling, and policy enforcement; (5) Caching — exact and semantic response caching; (6) Governance — API keys, teams, RBAC, SSO/SCIM, and audit logging; (7) Deployment model — self-hosted vs hosted, and whether traffic stays in your network; (8) Observability — token usage, latency, and tracing across providers.

New

Announcing AISIX: The AI-Native AI Gateway for LLMs and AI AgentsLearn More

Learn More

AI Gateway Comparison 2026:
Choose the Right AI Gateway for LLM Traffic

Compare AISIX, LiteLLM, Portkey, Envoy, TrueFoundry, Vercel, and Cloudflare AI gateways across provider coverage, multi-model routing, guardrails, token budgets, caching, and governance — and find the gateway that fits how your team runs LLM and AI-agent traffic.

Introduction

At a Glance

Gateway Profiles

Our Take

In-Depth Comparisons

FAQ

How to Choose an AI Gateway in 2026

An AI gateway sits in front of your LLM providers and gives teams one place to route, govern, secure, cache, and observe LLM and AI-agent traffic. It adds AI-native controls — multi-model routing, token-based budgets, prompt and response guardrails, and per-model cost tracking — on top of the routing and reliability of a traditional API gateway.

This page consolidates how the leading AI gateways compare, so you can evaluate them side by side before reading any single head-to-head. The fastest way to narrow the field is to decide how you want to run the gateway.

Three Ways Teams Run AI Gateways

1. Open-source, self-hosted AI gateways
AISIX, LiteLLM, the Portkey gateway, and Envoy AI Gateway run on infrastructure you control. You get full control over deployment, customization, and data residency — LLM traffic can stay entirely inside your network — in exchange for operating the gateway yourself.
2. Managed / SaaS AI gateways
Vercel AI Gateway and Cloudflare AI Gateway are fully managed: one endpoint reaches many models with edge caching and built-in analytics. Onboarding is fast and there is nothing to operate, but traffic routes through the vendor's network rather than your own.
3. Hybrid platforms you can self-host
AISIX, TrueFoundry, and Portkey pair a governance control plane with a data plane you can run yourself — start with a managed experience, then self-host in your VPC for compliance or data residency. AISIX is the open-source option here, keeping all model traffic in your network while still offering a managed control plane.

AI Gateways at a Glance

Last updated: July 2026. Competitor facts are sourced from official documentation and repositories. AISIX capabilities that are not yet shipped are marked Roadmap rather than listed as available.

✓ supported · ✗ not available · Ent. enterprise / paid tier · — not disclosed

Capability	AISIX	LiteLLM	Portkey	Envoy	TrueFoundry	Vercel	Cloudflare
Foundation
License	Apache-2.0	MIT	MIT	Apache-2.0	Proprietary	Proprietary	Proprietary
Built with	Rust	Python	TypeScript	Go	TypeScript	—	—
Hosting model	Self-host or SaaS	Self-host	Self-host + hosted	Self-host	Self-host or SaaS	SaaS	SaaS
Can run in your network	✓	✓	✓	✓	✓	✗	✗
Routing & traffic
Provider / model coverage	Major providers	100+ providers	1,600+ models	16+ providers	1,600+ models	Hundreds of models	20+ providers
Multi-model routing & fallback	✓	✓	✓	✓	✓	✓	✓
Semantic caching	Roadmap	✓	Ent.	✗	✓	✗	✗
MCP gateway	✓	✓	✓	✓	✓	✗	✗
Governance & cost
Token budgets & cost tracking	✓	✓	✓	✓	✓	✓	✓
Prompt & response guardrails	✓	✓	✓	✗	✓	✗	✓
SSO / SCIM	Roadmap	Ent.	Ent.	✗	✓	Ent.	Ent.

AISIX

Foundation

License

Apache-2.0

Built with

Rust

Hosting model

Self-host or SaaS

Can run in your network

✓

Routing & traffic

Provider / model coverage

Major providers

Multi-model routing & fallback

✓

Semantic caching

Roadmap

MCP gateway

✓

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✓

SSO / SCIM

Roadmap

LiteLLM

Foundation

License

MIT

Built with

Python

Hosting model

Self-host

Can run in your network

✓

Routing & traffic

Provider / model coverage

100+ providers

Multi-model routing & fallback

✓

Semantic caching

✓

MCP gateway

✓

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✓

SSO / SCIM

Ent.

Portkey

Foundation

License

MIT

Built with

TypeScript

Hosting model

Self-host + hosted

Can run in your network

✓

Routing & traffic

Provider / model coverage

1,600+ models

Multi-model routing & fallback

✓

Semantic caching

Ent.

MCP gateway

✓

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✓

SSO / SCIM

Ent.

Envoy

Foundation

License

Apache-2.0

Built with

Hosting model

Self-host

Can run in your network

✓

Routing & traffic

Provider / model coverage

16+ providers

Multi-model routing & fallback

✓

Semantic caching

✗

MCP gateway

✓

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✗

SSO / SCIM

✗

TrueFoundry

Foundation

License

Proprietary

Built with

TypeScript

Hosting model

Self-host or SaaS

Can run in your network

✓

Routing & traffic

Provider / model coverage

1,600+ models

Multi-model routing & fallback

✓

Semantic caching

✓

MCP gateway

✓

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✓

SSO / SCIM

✓

Vercel

Foundation

License

Proprietary

Built with

—

Hosting model

SaaS

Can run in your network

✗

Routing & traffic

Provider / model coverage

Hundreds of models

Multi-model routing & fallback

✓

Semantic caching

✗

MCP gateway

✗

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✗

SSO / SCIM

Ent.

Cloudflare

Foundation

License

Proprietary

Built with

—

Hosting model

SaaS

Can run in your network

✗

Routing & traffic

Provider / model coverage

20+ providers

Multi-model routing & fallback

✓

Semantic caching

✗

MCP gateway

✗

Governance & cost

Token budgets & cost tracking

✓

Prompt & response guardrails

✓

SSO / SCIM

Ent.

Gateway Profiles

AISIX

Open-source AI gateway (Apache-2.0) in Rust from the team behind Apache APISIX — a hybrid platform you can self-host or run with a managed control plane.

AISIX is a hybrid: its open-source data plane runs inside your own VPC over an outbound-only connection — so prompts and responses never leave your network — while the control plane can be self-managed or used as a hosted service, letting you start quickly without giving up data residency. It governs LLM and AI-agent traffic on the same high-performance data plane as its API gateway: multi-model routing and load balancing (including cost-, latency- and load-aware strategies and sticky canary releases), token-based budgets and cost tracking, prompt and response guardrails, and an MCP gateway that runs MCP tools through the same keys, rate limits, and guardrails as LLM traffic. It also offers ensemble models that fan a single request out to a panel of models and synthesize the result — something most AI gateways do not offer — and publishes a performance baseline of ~28,300 req/s at saturation on 4 vCPUs, with sub-millisecond p50 gateway overhead at low-to-moderate load. SSO/SCIM and semantic caching are on the active roadmap.

Best for: Teams that want LLM traffic and data to stay inside their own infrastructure, a unified API + AI gateway on one data plane, and the flexibility to self-host or use a managed control plane.

LiteLLM

MIT-licensed, Python-based proxy that calls 100+ LLM providers behind an OpenAI-compatible API.

LiteLLM focuses on breadth and portability. The open-source proxy supports semantic caching, per-key/team spend budgets, and an MCP gateway, and self-hosts via Docker, Helm, or Terraform. SSO is free for up to 5 users; larger SSO deployments and SCIM require an enterprise license.

Best for: Teams that want the widest provider compatibility and a lightweight, fully open-source self-hosted proxy.

Portkey

Open-source gateway (MIT, TypeScript) paired with a hosted control plane for governance and analytics.

Portkey routes to 1,600+ models and ships 20+ native guardrails (plus partner integrations) and an MCP gateway. Governance, analytics, and configuration run through a Portkey-hosted control plane, with a self-hosted data-plane option for enterprises — so LLM traffic only stays in your network when you run that data plane yourself. Semantic caching, SSO, and SCIM sit on enterprise / control-plane tiers.

Best for: Teams that want a managed governance console and don't mind a hosted control plane (with a self-hosted data plane available for enterprises).

Envoy AI Gateway

Apache-2.0, Go-based AI gateway built on Envoy Proxy and Envoy Gateway, Kubernetes-native and self-hosted.

Envoy AI Gateway brings LLM traffic onto the Envoy data plane: model-name virtualization, automatic provider fallback, token-based usage rate limiting and quota budgets, a GA MCP gateway (v1.0), and GenAI metrics via OpenTelemetry. As pure infrastructure it has no native guardrail engine or semantic cache, and no built-in identity/SSO plane — those are expected to live in surrounding tooling.

Best for: Platform teams already standardized on Envoy and Kubernetes who want a CNCF-aligned, infrastructure-grade AI data plane.

TrueFoundry AI Gateway

Proprietary enterprise gateway (TypeScript/Hono) deployable as SaaS or fully self-hosted in your own VPC.

TrueFoundry can run on its managed cloud or be self-hosted in a VPC, on-prem, or air-gapped — keeping LLM traffic in your network. It offers latency-based routing, weighted load balancing and fallback, semantic caching, an MCP gateway with OAuth/RBAC, token and cost quotas, PII/toxicity guardrails, and SSO (OIDC/SAML) with SCIM. The vendor claims sub-3ms internal latency.

Best for: Enterprises that want a full governance plane they can also self-host for compliance or data-residency reasons.

Vercel AI Gateway

Proprietary managed service that exposes hundreds of models through a single Vercel-hosted endpoint.

Vercel AI Gateway is a hosted SaaS: one key reaches hundreds of models across ~45 providers, with provider ordering, fallback, BYOK (no token markup), and built-in spend/budget observability. Requests route through Vercel rather than your network; it offers prompt caching only (no semantic cache), no native guardrails, and SSO/SCIM via Vercel Enterprise. MCP is a Vercel platform feature, not a gateway capability.

Best for: Teams building on Vercel that want zero-ops model access and spend visibility without running infrastructure.

Cloudflare AI Gateway

Proprietary managed service that proxies LLM traffic through Cloudflare's edge network.

Cloudflare AI Gateway fronts 20+ providers through a universal, OpenAI-compatible endpoint with retries and fallback, exact-match response caching, spend limits and analytics, and Cloudflare Guardrails for prompt/response moderation. Traffic runs through Cloudflare's edge rather than your own network; semantic caching is not yet available, and MCP is handled by separate Cloudflare products. SSO is free; SCIM is enterprise-only.

Best for: Teams already on Cloudflare that want a fast, edge-cached gateway with built-in content moderation.

Our Take

If your priority is keeping LLM traffic and data inside your own network, AISIX is the most natural fit. It is open source (Apache-2.0) from the creators of Apache APISIX, runs its data plane in your own VPC, and governs LLM and AI-agent traffic on the same data plane as your API gateway — plus ensemble models that few gateways offer. As a hybrid platform you can self-host it outright or pair it with a managed control plane, so you keep data residency without taking on all the operations yourself.

The other gateways fit different priorities: choose LiteLLM when raw provider breadth matters most, Vercel or Cloudflare when you want a fully managed, zero-ops service and do not need data residency, Envoy AI Gateway when you are standardizing on Envoy and Kubernetes, and TrueFoundry or Portkey when you want a hosted governance console you can later self-host. AISIX’s SSO/SCIM and semantic caching are on the active roadmap; everything else credited to it above — including its MCP gateway — is available today.

In-Depth Comparisons

Read the full, side-by-side breakdowns:

AI Gateway Comparison 2026:Choose the Right AI Gateway for LLM Traffic

How to Choose an AI Gateway in 2026

Three Ways Teams Run AI Gateways

AI Gateways at a Glance

Gateway Profiles

AISIX

LiteLLM

Portkey

Envoy AI Gateway

TrueFoundry AI Gateway

Vercel AI Gateway

Cloudflare AI Gateway

Our Take

In-Depth Comparisons

Frequently Asked Questions

What is the difference between an AI gateway and an API gateway?

Which AI gateways are open source, and which are managed services?

Should I use an open-source or a hosted (SaaS) AI gateway?

Which AI gateway is best for self-hosting and data residency?

What should I look for when comparing AI gateways?

AI Gateway Comparison 2026:
Choose the Right AI Gateway for LLM Traffic