Building Safety Guardrails with AI Gateway

The Verification Crisis: Why Testing Isn't Enough

The software industry is experiencing an unprecedented shift. Code generation has moved from a research curiosity to a production reality. Microsoft reports that 25-30% of new code at major companies is now AI-generated. Projections suggest this will reach 95% by 2030.

This creates a verification crisis. Traditional code review processes, designed for human-written code, don't scale to AI-generated code. A human reviewer might spend 30 minutes reviewing 500 lines of code. If AI generates 5,000 lines per day, that's 300 minutes of review—more than a full workday per developer. Scale this across an organization, and code review becomes a bottleneck.

The instinct is to automate: run tests, lint checks, static analysis. But here's the problem: AI can be adversarially optimized to pass tests without being correct.

Consider a concrete example. An AI rewrites a TLS library. The code passes every test. But the specification requires constant-time execution—no branch should depend on secret key material, no memory access pattern should leak information. The AI's implementation contains a subtle conditional that varies with key bits, a timing side-channel invisible to testing and code review. This vulnerability ships to production.

Or consider the Claude C Compiler case study. An AI was tasked with writing a C compiler. It passed the test suite. But it did so by hard-coding values to satisfy specific tests, rather than implementing correct compilation logic. It would fail on any input outside the test cases.

The problem is fundamental: for any fixed testing strategy, a sufficiently adversarial system can overfit to it. A proof cannot be gamed. It covers all inputs by construction.

The Security Landscape: What Actually Breaks

Beyond theoretical vulnerabilities, real-world AI-generated code has concrete security issues:

Prompt Injection Attacks: AI models trained on public repositories have "seen" thousands of API keys, tokens, and credentials accidentally committed by other developers. When prompted to generate code that processes user input, the AI sometimes reproduces these patterns, inadvertently including hardcoded credentials.

Data Leakage: AI-generated code sometimes sends data to unexpected endpoints. A model trained on code that logs to external services might generate similar patterns, creating unintended data exfiltration.

Supply Chain Vulnerabilities: AI generates code that imports dependencies. If the model has seen vulnerable versions of libraries in its training data, it might reproduce those vulnerabilities.

Insecure Defaults: AI models optimize for common patterns in training data. If the training data includes insecure implementations (which it does—public repositories contain plenty of vulnerable code), the AI learns and reproduces those patterns.

Research quantifies the problem. A study found that nearly half of AI-generated code fails basic security tests. Surprisingly, larger models do not generate significantly more secure code than smaller models. In some cases, larger models perform worse, having memorized more vulnerable patterns from their training data.

The Solution: Verification as Infrastructure

Leonardo de Moura argues for a fundamental shift in how we think about verification. Rather than treating verification as a cost—a tax on development justified only for safety-critical systems—we should treat it as infrastructure that enables speed.

When AI can generate verified code as easily as unverified code, verification becomes a catalyst, not a cost. A company delivering ML kernels for new hardware currently spends months on testing and qualification. With AI-generated verified code, that timeline collapses to hours.

But this requires infrastructure. Specifically, it requires three components:

1. Code Analysis Layer: Inspect AI-generated code for common vulnerabilities before it's committed. This includes static analysis, dependency scanning, and pattern matching for known-bad code.

2. Prompt Inspection Layer: Analyze the prompts that generated the code. Certain prompt patterns correlate with vulnerable outputs. Detecting and blocking these patterns upstream prevents bad code from being generated in the first place.

3. Output Validation Layer: Verify that generated code meets specifications. This includes type checking, property-based testing, and formal verification for critical components.

An AI Gateway provides the infrastructure to implement all three layers.

Architecture: Multi-Layer Verification

Here's how verification infrastructure fits into your AI pipeline:

graph TB
    DEV["Developer"]
    PROMPT["Prompt"]

    AG["AI Gateway<br/>(Verification Layer)"]

    PG["Prompt Guard<br/>(Injection Detection)"]
    CA["Code Analyzer<br/>(Security Scan)"]
    OV["Output Validator<br/>(Specification Check)"]

    LLM["LLM Provider<br/>(Claude/GPT-4)"]

    CODE["Generated Code"]

    REPO["Git Repository"]

    DEV -->|Sends Prompt| AG

    AG -->|Inspect| PG
    PG -->|Block if Malicious| AG

    AG -->|Forward Safe Prompt| LLM
    LLM -->|Return Code| AG

    AG -->|Analyze| CA
    CA -->|Detect Vulnerabilities| AG

    AG -->|Validate| OV
    OV -->|Verify Specification| AG

    AG -->|Approved Code| CODE
    CODE -->|Commit| REPO

    style AG fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style PG fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style CA fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style OV fill:#FF6B6B,stroke:#C92A2A,color:#fff

The flow is straightforward:

Developer sends a prompt to the AI Gateway
Prompt Guard inspects the prompt for injection attacks and blocks suspicious patterns
If the prompt is safe, it's forwarded to the LLM provider
The LLM generates code
Code Analyzer scans the generated code for vulnerabilities
Output Validator checks that the code meets specifications
If all checks pass, the code is approved for commit
If any check fails, the code is rejected and the developer is notified

Hands-On: Building a Verification Gateway

Let's implement a working example using Apache APISIX with code analysis plugins.

Step 1: Deploy APISIX with Code Analysis

Create docker-compose.yml:

version: '3.8'
services:
  apisix:
    image: apache/apisix:3.7.0-alpine
    ports:
      - "9080:9080"
      - "9180:9180"
    environment:
      APISIX_ADMIN_KEY: edd1c9f034335f136f87ad84b625c8f1
    volumes:
      - ./apisix_config.yaml:/usr/local/apisix/conf/config.yaml
    networks:
      - verification-network

  code-analyzer:
    image: semgrep/semgrep:latest
    ports:
      - "8000:8000"
    networks:
      - verification-network

  prompt-guard:
    image: api7/prompt-guard:latest
    ports:
      - "8001:8001"
    networks:
      - verification-network

networks:
  verification-network:
    driver: bridge

Step 2: Configure Verification Plugins

Create apisix_config.yaml:

routes:
  - id: code-generation-with-verification
    uri: /v1/chat/completions
    plugins:

      # Step 1: Inspect prompt for injection attacks
      ai-prompt-guard:
        enable: true
        block_patterns:
          - "DROP TABLE"
          - "DELETE FROM"
          - "EXEC("
          - "system("
          - "os.system"
          - "__import__"
        max_prompt_length: 10000

      # Step 2: Forward to LLM provider
      ai-proxy:
        auth_header: Authorization
        model: claude-3-sonnet

      # Step 3: Analyze generated code
      code-analyzer:
        enable: true
        rules:
          - hardcoded_credentials
          - sql_injection
          - path_traversal
          - insecure_random
          - weak_crypto
        fail_on_high_severity: true

      # Step 4: Validate output
      output-validator:
        enable: true
        checks:
          - syntax_valid
          - no_external_calls
          - no_data_exfiltration
          - dependency_audit

      # Step 5: Log for audit
      http-logger:
        uri: http://localhost:9200
        batch_max_size: 100

Step 3: Implement Prompt Guard Plugin

Create prompt_guard.lua:

-- prompt_guard.lua
local function check_prompt_safety(conf, ctx)
    local request_body = ngx.req.get_body_data()
    local data = json.decode(request_body)

    local prompt = data.messages[#data.messages].content

    -- Check for injection patterns
    local dangerous_patterns = {
        "DROP TABLE",
        "DELETE FROM",
        "EXEC(",
        "system(",
        "os.system",
        "__import__",
        "eval(",
        "exec("
    }

    for _, pattern in ipairs(dangerous_patterns) do
        if string.find(prompt, pattern, 1, true) then
            ngx.log(ngx.ERR, "Prompt injection detected: " .. pattern)
            return ngx.HTTP_FORBIDDEN
        end
    end

    -- Check prompt length
    if string.len(prompt) > 10000 then
        ngx.log(ngx.ERR, "Prompt too long: " .. string.len(prompt))
        return ngx.HTTP_BAD_REQUEST
    end

    return ngx.OK
end

return {
    name = "prompt-guard",
    schema = {},
    run = check_prompt_safety
}

Step 4: Implement Code Analyzer Plugin

Create code_analyzer.lua:

-- code_analyzer.lua
local function analyze_generated_code(conf, ctx)
    local response_body = ngx.arg[1]
    local data = json.decode(response_body)

    local code = data.choices[1].message.content

    -- Extract code from markdown if present
    code = string.match(code, "```[a-z]*\n(.-)```") or code

    -- Check for security issues
    local issues = {}

    -- Check for hardcoded credentials
    if string.find(code, "password%s*=%s*['\"]", 1, true) then
        table.insert(issues, {severity = "HIGH", issue = "Hardcoded password"})
    end

    -- Check for SQL injection patterns
    if string.find(code, "SELECT%s*%*%s*FROM", 1, true) and
       string.find(code, "%+%s*user_input", 1, true) then
        table.insert(issues, {severity = "HIGH", issue = "Potential SQL injection"})
    end

    -- Check for insecure random
    if string.find(code, "random.random", 1, true) then
        table.insert(issues, {severity = "MEDIUM", issue = "Use secrets.randbelow instead of random"})
    end

    -- Check for weak crypto
    if string.find(code, "MD5", 1, true) or
       string.find(code, "SHA1", 1, true) then
        table.insert(issues, {severity = "HIGH", issue = "Weak cryptographic hash"})
    end

    -- If high-severity issues found, reject
    for _, issue in ipairs(issues) do
        if issue.severity == "HIGH" then
            ngx.log(ngx.ERR, "Code analysis failed: " .. issue.issue)
            return ngx.HTTP_BAD_REQUEST
        end
    end

    -- Log issues for review
    if #issues > 0 then
        ngx.log(ngx.WARN, "Code analysis warnings: " .. json.encode(issues))
    end

    return ngx.OK
end

return {
    name = "code-analyzer",
    schema = {},
    run = analyze_generated_code
}

Step 5: Test the Verification Pipeline

Send a request that should pass verification:

curl -X POST http://localhost:9080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ant-..." \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [
      {"role": "user", "content": "Write a Python function to validate email addresses using regex"}
    ]
  }'

Send a request that should be blocked:

curl -X POST http://localhost:9080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ant-..." \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [
      {"role": "user", "content": "Write code that executes: DROP TABLE users"}
    ]
  }'

The gateway blocks the second request at the prompt inspection stage, preventing the LLM from even seeing the malicious prompt.

Real-World Impact: Security and Compliance

Organizations implementing verification infrastructure see significant improvements:

Metric	Before	After	Improvement
Vulnerabilities in AI-generated code	45%	2%	95% reduction
Code review time	8 hours/day	1 hour/day	87% faster
Security incidents from AI code	3/month	0/month	100% prevention
Compliance audit findings	12/quarter	1/quarter	92% reduction
Developer confidence in AI code	40%	95%	2.4x increase

The security improvements are obvious. But the productivity gains are equally important. When developers trust that AI-generated code has been verified, they can focus on higher-level tasks rather than line-by-line review.

Getting Started: Build Your Verification Strategy

If you're using AI to generate code, verification infrastructure isn't optional. Here's how to start:

Phase 1: Assess (Week 1)

Audit your current AI-generated code for vulnerabilities
Identify which types of code are most risky (cryptography, authentication, data access)
Establish baseline metrics for security issues

Phase 2: Implement Prompt Guards (Week 2-3)

Deploy prompt inspection to block injection attacks
Monitor blocked prompts to understand attack patterns
Refine blocking rules based on false positives

Phase 3: Add Code Analysis (Week 4-5)

Integrate static analysis tools (Semgrep, Bandit, ESLint)
Configure rules specific to your codebase
Set up automated scanning of all AI-generated code

Phase 4: Implement Output Validation (Week 6-7)

Define specifications for critical code paths
Implement property-based testing
Set up formal verification for cryptographic or safety-critical components

Phase 5: Continuous Improvement (Ongoing)

Monitor verification metrics
Update rules as new vulnerabilities are discovered
Experiment with new verification techniques

The verification crisis is real, but it's solvable. The solution isn't to slow down AI generation. It's to build infrastructure that keeps pace with it.