Why Desktop AI Agents Need AI Gateways

Anthropic's launch of Claude Cowork this week sent shockwaves through the tech community. The desktop AI agent—which can organize files, synthesize documents, and generate spreadsheets autonomously—was built in approximately ten days using Claude Code itself. The recursive development story captured headlines, but a more fundamental question went largely unaddressed: who's managing the infrastructure behind all those AI agent API calls?

Cowork operates on the same Claude Agent SDK that powers Claude Code, which reached 115,000 developers and processed 195 million lines of code weekly by mid-2025. Each of those interactions represents API calls to Anthropic's Claude models. Multiply that by the multi-agent coordination Cowork uses for complex tasks, and you have a recipe for infrastructure chaos—unless you have the right control plane.

That control plane is an AI Gateway. And as desktop AI agents proliferate, it's becoming the most critical piece of infrastructure that most organizations don't have.

The API Call Explosion Nobody's Talking About

When a user asks Cowork to "organize my project files and create a summary document," here's what actually happens behind the scenes:

flowchart TD
    U["User Request<br>Organize files & create summary"] --> O["Cowork Orchestrator"]

    O --> P1["Planning<br>API Call #1"]
    O --> P2["File Listing & Analysis<br>API Call #2"]

    O --> SA1["Sub-Agent: File Categorization<br>API Calls #3-5"]
    O --> SA2["Sub-Agent: Content Analysis<br>API Calls #6-10"]
    O --> SA3["Sub-Agent: Summary Generation<br>API Calls #11-15"]
    O --> SA4["Sub-Agent: Formatting<br>API Calls #16-20"]

    SA1 --> AGG["Result Aggregation"]
    SA2 --> AGG
    SA3 --> AGG
    SA4 --> AGG

    AGG --> F1["Merge Outputs<br>API Call #21"]
    F1 --> F2["Generate Final Document<br>API Call #22"]
    F2 --> F3["Verification & Completion<br>API Call #23"]

    classDef agent fill:#eef,stroke:#555;
    class SA1,SA2,SA3,SA4 agent;

For complex assignments, Cowork coordinates multiple sub-agents working in parallel, each starting with fresh context. This architecture enables larger tasks but multiplies API consumption dramatically.

The math is sobering: If an enterprise deploys Cowork to 100 knowledge workers, each running 20 tasks per day, that's 2,000 tasks × 23 API calls = 46,000 API calls daily. At average token consumption, that's potentially $2,000-$6,000 per day in API costs—with zero visibility into what's driving those costs.

Three Infrastructure Challenges Desktop AI Agents Create

Challenge 1: Cost Visibility and Control

Anthropic's subscription model ($1,200-$2,400/year per user) covers the Claude API costs for Cowork. But enterprises using Claude API directly, or building their own agents with the Claude Agent SDK, face a different reality: unbounded API costs with no built-in controls.

Without an AI Gateway, you have no way to:

Set per-user or per-department token budgets
Track which tasks consume the most tokens
Implement cost allocation for chargeback
Prevent runaway agents from draining budgets

Challenge 2: Security Boundaries

Anthropic warns that prompt injection attacks remain a threat, with a 1% attack success rate even for Claude Opus 4.5. The company advises users to "watch for suspicious actions indicating potential attacks"—guidance that Forbes notes "seems unrealistic for non-technical audiences."

The security model trades convenience for risk. An agent with folder access can:

Delete files after misreading instructions
Exfiltrate data if manipulated by prompt injection
Perform destructive actions if prompts aren't specific enough

An AI Gateway provides a security boundary that individual agents cannot:

Inspect prompts for injection patterns before they reach the LLM
Block sensitive data from leaving the organization
Audit all agent actions for compliance review
Enforce content moderation policies

Challenge 3: Reliability and Failover

What happens when Anthropic's API has an outage? With Cowork, your knowledge workers stop working. With a properly configured AI Gateway, traffic automatically fails over to alternative providers.

Cowork's architecture—multiple sub-agents with fresh context—is particularly vulnerable to partial failures. If one sub-agent's API call fails mid-task, the entire task may need to restart. An AI Gateway with retry logic and health checks can mask transient failures and maintain task continuity.

sequenceDiagram
    participant Agent as Desktop Agent
    participant GW as AI Gateway
    participant A as Anthropic API
    participant O as OpenAI API

    Agent->>GW: LLM Request
    GW->>A: Forward Request

    A-->>GW: 5xx Error / Timeout
    GW-->>GW: Health Check Fails
    GW->>O: Retry via Fallback Provider
    O-->>GW: Successful Response
    GW-->>Agent: Final Result

Building an AI Gateway for Desktop Agent Traffic: Step-by-Step Implementation

Prerequisites

Install Docker to be used in the quickstart script to create containerized etcd and APISIX.
Install cURL to be used in the quickstart script and to send requests to APISIX for verification.

Step 1: Set Up Your Environment

For this tutorial, you'll need Docker, cURL, and an OpenAI API key.

First, start APISIX in Docker with the quickstart script:

curl -sL "https://run.api7.ai/apisix/quickstart" | sh

You should see the following message once APISIX is ready:

✔ APISIX is ready!

Step 2: Configure Consumer-Based Rate Limiting

Create consumers for each department with token budgets:

# Create Engineering department consumer
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "username": "engineering-dept",
    "plugins": {
      "key-auth": {
        "key": "eng-api-key-2026"
      },
      "ai-rate-limiting": {
        "limit": 500000,
        "time_window": 86400,
        "limit_strategy": "total_tokens",
        "show_limit_quota_header": true,
        "rejected_code": 429,
        "rejected_msg": "Engineering department daily token budget exceeded"
      }
    }
  }'

# Create Marketing department consumer with lower budget
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "username": "marketing-dept",
    "plugins": {
      "key-auth": {
        "key": "mkt-api-key-2026"
      },
      "ai-rate-limiting": {
        "limit": 500000,
        "time_window": 86400,
        "limit_strategy": "total_tokens",
        "show_limit_quota_header": true,
        "rejected_code": 429,
        "rejected_msg": "Marketing department daily token budget exceeded"
      }
    }
  }'

Step 3: Configure Prompt Injection Protection

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "desktop-agent-route",
    "uri": "/v1/messages",
    "methods": ["POST"],
    "plugins": {
      "key-auth": {},
      "ai-prompt-guard": {
        "match_patterns": [
          "ignore previous instructions",
          "disregard all prior",
          "forget everything above",
          "system prompt:",
          "you are now",
          "new instructions:"
        ],
        "action": "block",
        "rejected_code": 400,
        "rejected_msg": "Request blocked: potential prompt injection detected"
      },
      "ai-proxy-multi": {
        "instances": [
          {
            "name": "anthropic-primary",
            "provider": "anthropic",
            "priority": 10,
            "weight": 1,
            "auth": {
              "header": {
                "x-api-key": "'"$ANTHROPIC_API_KEY"'"
              }
            },
            "options": {
              "model": "claude-sonnet-4-20250514"
            }
          },
          {
            "name": "openai-fallback",
            "provider": "openai",
            "priority": 5,
            "weight": 1,
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4o"
            }
          }
        ],
        "fallback_strategy": ["http_429", "http_5xx"],
        "logging": {
          "summaries": true
        }
      }
    }
  }'

Step 4: Configure Desktop Agents to Use Gateway

For Claude Code, set the API endpoint in your environment:

export ANTHROPIC_BASE_URL="http://your-gateway:9080/v1"
export ANTHROPIC_API_KEY="eng-api-key-2026"  # Consumer key, not Anthropic key

For custom agents using the Claude Agent SDK:

import anthropic

# Point to your AI Gateway instead of Anthropic directly
client = anthropic.Anthropic(
    api_key="eng-api-key-2026",  # Your gateway consumer key
    base_url="http://your-gateway:9080/v1"
)

# All API calls now flow through the gateway
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze this document..."}
    ]
)

Step 5: Monitor Token Usage

Query Prometheus for token consumption by department:

# Total tokens consumed by department today
sum(increase(apisix_ai_tokens_total{consumer=~".*-dept"}[24h])) by (consumer)

# Token consumption rate (tokens per minute)
rate(apisix_ai_tokens_total[5m]) by (consumer)

# Cost estimation (assuming $3/1M input tokens, $15/1M output tokens)
(sum(increase(apisix_ai_input_tokens_total[24h])) * 3 / 1000000) +
(sum(increase(apisix_ai_output_tokens_total[24h])) * 15 / 1000000)

Results: Before and After AI Gateway

Metric	Before (Direct API)	After (AI Gateway)	Improvement
Cost visibility	None	Per-user, per-department	100%
Budget enforcement	Manual monitoring	Automatic cutoff	Automated
Prompt injection protection	None	Pattern-based blocking	New capability
Provider failover	Manual intervention	Automatic (<5s)	Instant
Audit trail	API logs only	Full request/response	Complete
Cost per 1M tokens	$15 (Anthropic only)	$10 (blended)	33% savings

Key Takeaways

Claude Cowork represents a significant step toward autonomous AI agents in the enterprise. But the infrastructure challenges it creates—cost control, security, reliability—are not unique to Cowork. Every desktop AI agent, every coding assistant, every autonomous workflow faces the same issues.

Three principles for managing desktop AI agent infrastructure:

Centralize API traffic through a gateway. Don't let each agent make direct API calls. Route everything through a control plane where you can enforce policies.
Implement token budgets, not just rate limits. Traditional rate limiting (requests per second) doesn't work for LLM traffic. You need token-based quotas that reflect actual cost.
Assume prompt injection will happen. Anthropic's 1% attack success rate sounds low until you realize that's 1 in 100 requests potentially compromised. Defense in depth is essential.

Conclusion: The Agent Era Needs New Infrastructure

Anthropic's achievement in building Cowork in 10 days with AI is remarkable. But it also highlights a growing gap: AI capabilities are advancing faster than the infrastructure to manage them.

Desktop AI agents are just the beginning. As these tools proliferate—Claude Cowork, OpenAI Operator, Google's Project Mariner—enterprises will face an explosion of AI-driven API traffic. Those without proper infrastructure will face runaway costs, security incidents, and reliability failures.

The organizations that build AI Gateway infrastructure now will be positioned to adopt these tools safely. Those that don't will be left choosing between innovation and control.