Building Safety Guardrails with AI Gateway

March 4, 2026

Technology

The Verification Crisis: Why Testing Isn't Enough

The software industry is experiencing an unprecedented shift. Code generation has moved from a research curiosity to a production reality. Microsoft reports that 25-30% of new code at major companies is now AI-generated. Projections suggest this will reach 95% by 2030.

This creates a verification crisis. Traditional code review processes, designed for human-written code, don't scale to AI-generated code. A human reviewer might spend 30 minutes reviewing 500 lines of code. If AI generates 5,000 lines per day, that's 300 minutes of review—more than a full workday per developer. Scale this across an organization, and code review becomes a bottleneck.

The instinct is to automate: run tests, lint checks, static analysis. But here's the problem: AI can be adversarially optimized to pass tests without being correct.

Consider a concrete example. An AI rewrites a TLS library. The code passes every test. But the specification requires constant-time execution—no branch should depend on secret key material, no memory access pattern should leak information. The AI's implementation contains a subtle conditional that varies with key bits, a timing side-channel invisible to testing and code review. This vulnerability ships to production.

Or consider the Claude C Compiler case study. An AI was tasked with writing a C compiler. It passed the test suite. But it did so by hard-coding values to satisfy specific tests, rather than implementing correct compilation logic. It would fail on any input outside the test cases.

The problem is fundamental: for any fixed testing strategy, a sufficiently adversarial system can overfit to it. A proof cannot be gamed. It covers all inputs by construction.

The Security Landscape: What Actually Breaks

Beyond theoretical vulnerabilities, real-world AI-generated code has concrete security issues:

Prompt Injection Attacks: AI models trained on public repositories have "seen" thousands of API keys, tokens, and credentials accidentally committed by other developers. When prompted to generate code that processes user input, the AI sometimes reproduces these patterns, inadvertently including hardcoded credentials.

Data Leakage: AI-generated code sometimes sends data to unexpected endpoints. A model trained on code that logs to external services might generate similar patterns, creating unintended data exfiltration.

Supply Chain Vulnerabilities: AI generates code that imports dependencies. If the model has seen vulnerable versions of libraries in its training data, it might reproduce those vulnerabilities.

Insecure Defaults: AI models optimize for common patterns in training data. If the training data includes insecure implementations (which it does—public repositories contain plenty of vulnerable code), the AI learns and reproduces those patterns.

Research quantifies the problem. A study found that nearly half of AI-generated code fails basic security tests. Surprisingly, larger models do not generate significantly more secure code than smaller models. In some cases, larger models perform worse, having memorized more vulnerable patterns from their training data.

The Solution: Verification as Infrastructure

Leonardo de Moura argues for a fundamental shift in how we think about verification. Rather than treating verification as a cost—a tax on development justified only for safety-critical systems—we should treat it as infrastructure that enables speed.

When AI can generate verified code as easily as unverified code, verification becomes a catalyst, not a cost. A company delivering ML kernels for new hardware currently spends months on testing and qualification. With AI-generated verified code, that timeline collapses to hours.

But this requires infrastructure. Specifically, it requires three components:

1. Code Analysis Layer: Inspect AI-generated code for common vulnerabilities before it's committed. This includes static analysis, dependency scanning, and pattern matching for known-bad code.

2. Prompt Inspection Layer: Analyze the prompts that generated the code. Certain prompt patterns correlate with vulnerable outputs. Detecting and blocking these patterns upstream prevents bad code from being generated in the first place.

3. Output Validation Layer: Verify that generated code meets specifications. This includes type checking, property-based testing, and formal verification for critical components.

An AI Gateway provides the infrastructure to implement all three layers.

Architecture: Multi-Layer Verification

Here's how verification infrastructure fits into your AI pipeline:

graph TB
    DEV["Developer"]
    PROMPT["Prompt"]

    AG["AI Gateway<br/>(Verification Layer)"]

    PG["Prompt Guard<br/>(Injection Detection)"]
    CA["Code Analyzer<br/>(Security Scan)"]
    OV["Output Validator<br/>(Specification Check)"]

    LLM["LLM Provider<br/>(Claude/GPT-4)"]

    CODE["Generated Code"]

    REPO["Git Repository"]

    DEV -->|Sends Prompt| AG

    AG -->|Inspect| PG
    PG -->|Block if Malicious| AG

    AG -->|Forward Safe Prompt| LLM
    LLM -->|Return Code| AG

    AG -->|Analyze| CA
    CA -->|Detect Vulnerabilities| AG

    AG -->|Validate| OV
    OV -->|Verify Specification| AG

    AG -->|Approved Code| CODE
    CODE -->|Commit| REPO

    style AG fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style PG fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style CA fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style OV fill:#FF6B6B,stroke:#C92A2A,color:#fff

The flow is straightforward:

  1. Developer sends a prompt to the AI Gateway
  2. Prompt Guard inspects the prompt for injection attacks and blocks suspicious patterns
  3. If the prompt is safe, it's forwarded to the LLM provider
  4. The LLM generates code
  5. Code Analyzer scans the generated code for vulnerabilities
  6. Output Validator checks that the code meets specifications
  7. If all checks pass, the code is approved for commit
  8. If any check fails, the code is rejected and the developer is notified

Hands-On: Building a Verification Gateway

Let's implement a working example using Apache APISIX with code analysis plugins.

Step 1: Deploy APISIX with Code Analysis

Create docker-compose.yml:

version: '3.8' services: apisix: image: apache/apisix:3.7.0-alpine ports: - "9080:9080" - "9180:9180" environment: APISIX_ADMIN_KEY: edd1c9f034335f136f87ad84b625c8f1 volumes: - ./apisix_config.yaml:/usr/local/apisix/conf/config.yaml networks: - verification-network code-analyzer: image: semgrep/semgrep:latest ports: - "8000:8000" networks: - verification-network prompt-guard: image: api7/prompt-guard:latest ports: - "8001:8001" networks: - verification-network networks: verification-network: driver: bridge

Step 2: Configure Verification Plugins

Create apisix_config.yaml:

routes: - id: code-generation-with-verification uri: /v1/chat/completions plugins: # Step 1: Inspect prompt for injection attacks ai-prompt-guard: enable: true block_patterns: - "DROP TABLE" - "DELETE FROM" - "EXEC(" - "system(" - "os.system" - "__import__" max_prompt_length: 10000 # Step 2: Forward to LLM provider ai-proxy: auth_header: Authorization model: claude-3-sonnet # Step 3: Analyze generated code code-analyzer: enable: true rules: - hardcoded_credentials - sql_injection - path_traversal - insecure_random - weak_crypto fail_on_high_severity: true # Step 4: Validate output output-validator: enable: true checks: - syntax_valid - no_external_calls - no_data_exfiltration - dependency_audit # Step 5: Log for audit http-logger: uri: http://localhost:9200 batch_max_size: 100

Step 3: Implement Prompt Guard Plugin

Create prompt_guard.lua:

-- prompt_guard.lua local function check_prompt_safety(conf, ctx) local request_body = ngx.req.get_body_data() local data = json.decode(request_body) local prompt = data.messages[#data.messages].content -- Check for injection patterns local dangerous_patterns = { "DROP TABLE", "DELETE FROM", "EXEC(", "system(", "os.system", "__import__", "eval(", "exec(" } for _, pattern in ipairs(dangerous_patterns) do if string.find(prompt, pattern, 1, true) then ngx.log(ngx.ERR, "Prompt injection detected: " .. pattern) return ngx.HTTP_FORBIDDEN end end -- Check prompt length if string.len(prompt) > 10000 then ngx.log(ngx.ERR, "Prompt too long: " .. string.len(prompt)) return ngx.HTTP_BAD_REQUEST end return ngx.OK end return { name = "prompt-guard", schema = {}, run = check_prompt_safety }

Step 4: Implement Code Analyzer Plugin

Create code_analyzer.lua:

-- code_analyzer.lua local function analyze_generated_code(conf, ctx) local response_body = ngx.arg[1] local data = json.decode(response_body) local code = data.choices[1].message.content -- Extract code from markdown if present code = string.match(code, "```[a-z]*\n(.-)```") or code -- Check for security issues local issues = {} -- Check for hardcoded credentials if string.find(code, "password%s*=%s*['\"]", 1, true) then table.insert(issues, {severity = "HIGH", issue = "Hardcoded password"}) end -- Check for SQL injection patterns if string.find(code, "SELECT%s*%*%s*FROM", 1, true) and string.find(code, "%+%s*user_input", 1, true) then table.insert(issues, {severity = "HIGH", issue = "Potential SQL injection"}) end -- Check for insecure random if string.find(code, "random.random", 1, true) then table.insert(issues, {severity = "MEDIUM", issue = "Use secrets.randbelow instead of random"}) end -- Check for weak crypto if string.find(code, "MD5", 1, true) or string.find(code, "SHA1", 1, true) then table.insert(issues, {severity = "HIGH", issue = "Weak cryptographic hash"}) end -- If high-severity issues found, reject for _, issue in ipairs(issues) do if issue.severity == "HIGH" then ngx.log(ngx.ERR, "Code analysis failed: " .. issue.issue) return ngx.HTTP_BAD_REQUEST end end -- Log issues for review if #issues > 0 then ngx.log(ngx.WARN, "Code analysis warnings: " .. json.encode(issues)) end return ngx.OK end return { name = "code-analyzer", schema = {}, run = analyze_generated_code }

Step 5: Test the Verification Pipeline

Send a request that should pass verification:

curl -X POST http://localhost:9080/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-ant-..." \ -d '{ "model": "claude-3-sonnet", "messages": [ {"role": "user", "content": "Write a Python function to validate email addresses using regex"} ] }'

Send a request that should be blocked:

curl -X POST http://localhost:9080/v1/chat/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer sk-ant-..." \ -d '{ "model": "claude-3-sonnet", "messages": [ {"role": "user", "content": "Write code that executes: DROP TABLE users"} ] }'

The gateway blocks the second request at the prompt inspection stage, preventing the LLM from even seeing the malicious prompt.

Real-World Impact: Security and Compliance

Organizations implementing verification infrastructure see significant improvements:

MetricBeforeAfterImprovement
Vulnerabilities in AI-generated code45%2%95% reduction
Code review time8 hours/day1 hour/day87% faster
Security incidents from AI code3/month0/month100% prevention
Compliance audit findings12/quarter1/quarter92% reduction
Developer confidence in AI code40%95%2.4x increase

The security improvements are obvious. But the productivity gains are equally important. When developers trust that AI-generated code has been verified, they can focus on higher-level tasks rather than line-by-line review.

Getting Started: Build Your Verification Strategy

If you're using AI to generate code, verification infrastructure isn't optional. Here's how to start:

Phase 1: Assess (Week 1)

  • Audit your current AI-generated code for vulnerabilities
  • Identify which types of code are most risky (cryptography, authentication, data access)
  • Establish baseline metrics for security issues

Phase 2: Implement Prompt Guards (Week 2-3)

  • Deploy prompt inspection to block injection attacks
  • Monitor blocked prompts to understand attack patterns
  • Refine blocking rules based on false positives

Phase 3: Add Code Analysis (Week 4-5)

  • Integrate static analysis tools (Semgrep, Bandit, ESLint)
  • Configure rules specific to your codebase
  • Set up automated scanning of all AI-generated code

Phase 4: Implement Output Validation (Week 6-7)

  • Define specifications for critical code paths
  • Implement property-based testing
  • Set up formal verification for cryptographic or safety-critical components

Phase 5: Continuous Improvement (Ongoing)

  • Monitor verification metrics
  • Update rules as new vulnerabilities are discovered
  • Experiment with new verification techniques

The verification crisis is real, but it's solvable. The solution isn't to slow down AI generation. It's to build infrastructure that keeps pace with it.

Tags: