EzAI
Back to Blog
Tutorial Apr 1, 2026 8 min read

Build an AI API Request Validator with Python

E

EzAI Team

Build an AI API Request Validator with Python

Every malformed API call costs you money. A missing model field returns a 400 error — but not before your infrastructure processes the request. An oversized prompt burns through tokens that produce nothing useful. A garbled messages array triggers a retry loop that multiplies the damage. In production, roughly 10-15% of AI API calls carry preventable errors that a simple validation layer would catch in under 2 milliseconds.

This tutorial builds a request validator in Python that sits between your application code and the EzAI API. It checks schema structure, counts tokens before sending, validates content quality, and provides clear error messages so your team can fix issues at the source instead of debugging API error responses.

Why Validate Before Sending?

The EzAI API (and any OpenAI/Anthropic-compatible endpoint) returns detailed error messages when something is wrong. So why bother validating locally? Three reasons:

  • Latency — A round trip to the API just to get a 400 error takes 200-500ms. Local validation takes <2ms.
  • Token waste — If your prompt is 50,000 tokens but the model's limit is 8,192, you've already serialized and sent that payload for nothing.
  • Debuggability — Your validator can say "messages[2].content is an empty string" instead of the API's generic "invalid_request_error".
AI API request validation pipeline showing schema, token, content, and send stages

The four-stage validation pipeline — each stage catches a different class of error before tokens are spent

Project Setup

You need Python 3.10+ and two packages. tiktoken handles token counting, and httpx makes the actual API calls.

bash
pip install tiktoken httpx

Set your EzAI API key as an environment variable:

bash
export EZAI_API_KEY="sk-your-key-here"

Building the Validator Class

The validator runs four checks in sequence. If any check fails, it returns immediately with a structured error — no API call is made. Here's the core class:

python
import os, tiktoken, httpx
from dataclasses import dataclass

# Token limits per model family (input tokens)
MODEL_LIMITS = {
    "claude-3-opus": 200_000,
    "claude-3.5-sonnet": 200_000,
    "claude-sonnet-4": 200_000,
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gemini-2.5-pro": 1_000_000,
}
VALID_ROLES = {"system", "user", "assistant"}

@dataclass
class ValidationResult:
    valid: bool
    errors: list[str]
    token_count: int = 0

class RequestValidator:
    def __init__(self):
        self.encoder = tiktoken.get_encoding("cl100k_base")

    def validate(self, request: dict) -> ValidationResult:
        errors = []

        # Stage 1: Schema validation
        errors.extend(self._check_schema(request))
        if errors:
            return ValidationResult(False, errors)

        # Stage 2: Token counting
        token_count = self._count_tokens(request)
        model = request["model"]
        limit = self._get_limit(model)
        if token_count > limit:
            errors.append(
                f"Token count {token_count:,} exceeds {model} limit of {limit:,}"
            )

        # Stage 3: Content validation
        errors.extend(self._check_content(request))

        return ValidationResult(len(errors) == 0, errors, token_count)

Each stage is a separate method. This makes it easy to add new checks later — for example, PII detection or prompt injection scanning — without touching existing validation logic.

Schema Validation

The schema check catches structural errors: missing fields, wrong types, invalid enum values. These are the errors that would give you a 400 from the API.

python
def _check_schema(self, req: dict) -> list[str]:
    errors = []

    if "model" not in req:
        errors.append("Missing required field: 'model'")

    if "messages" not in req:
        errors.append("Missing required field: 'messages'")
        return errors

    if not isinstance(req["messages"], list):
        errors.append("'messages' must be a list")
        return errors

    if len(req["messages"]) == 0:
        errors.append("'messages' cannot be empty")

    for i, msg in enumerate(req["messages"]):
        if not isinstance(msg, dict):
            errors.append(f"messages[{i}] must be a dict, got {type(msg).__name__}")
            continue
        role = msg.get("role")
        if role not in VALID_ROLES:
            errors.append(f"messages[{i}].role '{role}' invalid. Use: {VALID_ROLES}")
        if "content" not in msg:
            errors.append(f"messages[{i}] missing 'content' field")

    # Validate optional params
    if "temperature" in req:
        t = req["temperature"]
        if not isinstance(t, (int, float)) or not 0 <= t <= 2:
            errors.append(f"temperature must be 0-2, got {t}")

    if "max_tokens" in req:
        mt = req["max_tokens"]
        if not isinstance(mt, int) or mt < 1:
            errors.append(f"max_tokens must be a positive integer, got {mt}")

    return errors

Notice we bail early if messages is missing or isn't a list. No point checking individual messages if the container is broken. This fail-fast pattern keeps error messages relevant and avoids cascading noise.

Token Counting

Token counting prevents the most expensive class of errors: sending a 100K-token prompt to a model with an 8K context window. The API will process part of your request before rejecting it — you pay for that processing time.

python
def _count_tokens(self, req: dict) -> int:
    total = 0
    for msg in req.get("messages", []):
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(self.encoder.encode(content))
        elif isinstance(content, list):
            # Multi-modal: text blocks + image blocks
            for block in content:
                if block.get("type") == "text":
                    total += len(self.encoder.encode(block["text"]))
                elif block.get("type") == "image_url":
                    total += 1000  # rough estimate per image
        total += 4  # per-message overhead
    return total

def _get_limit(self, model: str) -> int:
    for prefix, limit in MODEL_LIMITS.items():
        if model.startswith(prefix):
            return limit
    return 128_000  # safe default

The token count is approximate — tiktoken uses the cl100k_base encoding which matches GPT-4 and is close enough for Claude models. For production, the margin of error is under 5%, which is fine for a pre-flight check.

Content Quality Checks

Beyond structure and size, you want to catch content-level issues that produce garbage output: empty prompts, whitespace-only messages, or system prompts placed after user messages.

python
def _check_content(self, req: dict) -> list[str]:
    errors = []
    messages = req.get("messages", [])
    seen_non_system = False

    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        role = msg.get("role", "")

        # Check for empty or whitespace-only content
        if isinstance(content, str) and not content.strip():
            errors.append(f"messages[{i}].content is empty or whitespace-only")

        # System messages should come first
        if role == "system" and seen_non_system:
            errors.append(
                f"messages[{i}]: system message after user/assistant (move to front)"
            )
        if role in ("user", "assistant"):
            seen_non_system = True

    # Last message should be from user (not assistant)
    if messages and messages[-1].get("role") == "assistant":
        errors.append("Last message is from 'assistant' — did you forget the user prompt?")

    return errors

Wiring It Into Your API Client

The validator wraps your existing EzAI API call. Here's a complete client that validates, sends, and handles errors:

python
class ValidatedClient:
    BASE = "https://api.ezaiapi.com/v1"

    def __init__(self, api_key: str | None = None):
        self.key = api_key or os.getenv("EZAI_API_KEY")
        self.validator = RequestValidator()
        self.client = httpx.Client(
            base_url=self.BASE,
            headers={"Authorization": f"Bearer {self.key}"},
            timeout=60.0,
        )

    def chat(self, request: dict) -> dict:
        # Validate first
        result = self.validator.validate(request)
        if not result.valid:
            raise ValueError(
                f"Request validation failed:\n"
                + "\n".join(f"  - {e}" for e in result.errors)
            )

        print(f"✓ Validated: {result.token_count:,} tokens")

        # Send to EzAI
        resp = self.client.post("/chat/completions", json=request)
        resp.raise_for_status()
        return resp.json()


# Usage
client = ValidatedClient()

response = client.chat({
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
        {"role": "system", "content": "You are a code reviewer."},
        {"role": "user", "content": "Review this function:\ndef add(a, b): return a + b"},
    ],
})

print(response["choices"][0]["message"]["content"])

When validation fails, you get a clear error before any network request is made:

output
ValueError: Request validation failed:
  - messages[0].role 'systme' invalid. Use: {'system', 'user', 'assistant'}
  - messages[1].content is empty or whitespace-only

Adding Middleware for FastAPI

If you're building an AI gateway or proxy with FastAPI, the validator slots in as middleware. Every request hitting your /v1/chat/completions endpoint gets checked automatically:

python
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse

app = FastAPI()
validator = RequestValidator()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()

    # Validate before forwarding
    result = validator.validate(body)
    if not result.valid:
        return JSONResponse(
            status_code=400,
            content={
                "error": {
                    "type": "validation_error",
                    "message": "Request failed pre-flight validation",
                    "details": result.errors,
                    "token_count": result.token_count,
                }
            },
        )

    # Forward to EzAI
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.ezaiapi.com/v1/chat/completions",
            json=body,
            headers={"Authorization": request.headers.get("Authorization")},
            timeout=60.0,
        )
        return JSONResponse(
            status_code=resp.status_code,
            content=resp.json(),
        )

This gives callers a structured 400 response with specific errors instead of a generic message. Your team's debugging time drops from "what went wrong?" to "fix messages[2].content".

Going Further

The four-stage pipeline handles the most common issues, but you can extend it. A few ideas:

  • PII detection — Scan for email addresses, phone numbers, or SSNs before they reach the model. Use regex patterns or a lightweight NER model.
  • Cost estimation — Use the token count + model pricing from EzAI's pricing page to estimate the cost before sending. Reject requests above a threshold.
  • Prompt injection detection — Look for common injection patterns like "ignore previous instructions" in user messages. Flag them for review.
  • Rate limit awareness — Track request counts locally and queue requests that would exceed your rate limit instead of letting them fail.

Validation isn't glamorous, but it's the difference between an AI integration that burns money on bad requests and one that runs clean. Two milliseconds of checking saves minutes of debugging and dollars of wasted tokens.


Get your API key at ezaiapi.com/dashboard and start building. The validator above works with any OpenAI-compatible endpoint — just change the base URL.

E

EzAI Team

Building tools for developers who use AI APIs in production. Ship faster, spend less.