How to Get Structured JSON Output from AI APIs

You send a prompt asking for a list of products with names, prices, and categories. The model returns beautifully formatted markdown. Your JSON.parse() throws. You regex the backticks out. It works for ten requests, then one comes back with a trailing comma and a helpful "Here's your data:" prefix. Sound familiar?

Getting structured JSON output from AI models is one of the most common production challenges. Free-form text is great for chatbots. It's terrible for pipelines. This guide covers three battle-tested approaches to extract typed, parseable JSON from Claude, GPT, and Gemini — all through the EzAI API.

Why "Just Ask for JSON" Breaks in Production

Telling a model "respond only in JSON" works about 95% of the time. That sounds fine until you're handling 10,000 requests a day and 500 of them blow up your parser. Common failure modes:

The model wraps output in ```json code fences
It adds preamble text like "Sure! Here's the JSON:"
Trailing commas, unquoted keys, or single quotes
Missing fields when the model "decides" a field isn't relevant
Extra fields the model thought would be helpful

Each of these is a runtime exception waiting to happen. The fix isn't better prompts — it's using the right API features.

Method 1: Claude's Tool Use for Guaranteed Schema

Claude's tool_use feature isn't just for function calling. You can define a "tool" that matches your desired output schema, and Claude will return data in exactly that shape. The model treats the tool's input_schema as a strict contract.

python

import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-key",
    base_url="https://ezaiapi.com"
)

extract_tool = {
    "name": "extract_products",
    "description": "Extract structured product data from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price_usd": {"type": "number"},
                        "category": {
                            "type": "string",
                            "enum": ["electronics", "clothing", "food"]
                        },
                        "in_stock": {"type": "boolean"}
                    },
                    "required": ["name", "price_usd", "category", "in_stock"]
                }
            }
        },
        "required": ["products"]
    }
}

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "extract_products"},
    messages=[{
        "role": "user",
        "content": """Extract products from this invoice:
- MacBook Pro 14" M4: $1,999 (available)
- AirPods Max: $549 (backordered)
- iPhone 16 Pro case: $49 (available)"""
    }]
)

# The tool_use block contains valid, schema-conforming JSON
for block in message.content:
    if block.type == "tool_use":
        products = block.input["products"]
        for p in products:
            print(f"{p['name']}: ${p['price_usd']} ({p['category']})")

The key trick is tool_choice: {"type": "tool", "name": "extract_products"}. This forces the model to call the tool — no free-form text, no preamble, just structured data. The enum constraint on category means you'll only ever get one of your three valid values.

Comparison of structured output methods across AI providers

Three approaches to structured output — tool_use, response_format, and prompt engineering

Method 2: OpenAI-Compatible JSON Mode

If you're using OpenAI-compatible models through EzAI, you get access to response_format with JSON schema enforcement. This approach works with GPT-4o, GPT-4-turbo, and compatible models.

python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="https://ezaiapi.com/openai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "ticket_analysis",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "priority": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
                    "summary": {"type": "string"},
                    "assigned_team": {"type": "string"},
                    "estimated_hours": {"type": "number"}
                },
                "required": ["priority", "summary", "assigned_team", "estimated_hours"],
                "additionalProperties": False
            }
        }
    },
    messages=[{
        "role": "user",
        "content": "Analyze this support ticket: 'Our payment gateway has been returning 502 errors for the last 20 minutes. Customers can't check out. Revenue impact estimated at $2k/min.'"
    }]
)

import json
ticket = json.loads(response.choices[0].message.content)
print(ticket)
# {"priority": "critical", "summary": "Payment gateway returning 502 errors...",
#  "assigned_team": "payments-infra", "estimated_hours": 2}

With "strict": true, the model is constrained at the token level. It physically cannot produce output that violates your schema. No trailing commas, no extra fields, no missing required properties.

Method 3: Pydantic Validation Layer

For maximum safety, combine the API-level schema enforcement with client-side validation using Pydantic. This catches edge cases like a model returning 0 for a price when you expected positive numbers, or a suspiciously long string in a field that should be a short label.

python

from pydantic import BaseModel, Field, field_validator
from typing import Literal
import anthropic, json

class SentimentResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: list[str] = Field(max_length=10)
    language: str = Field(max_length=5)  # ISO 639-1

    @field_validator("key_phrases")
    @classmethod
    def no_empty_phrases(cls, v):
        return [p for p in v if p.strip()]

def analyze_sentiment(text: str) -> SentimentResult:
    client = anthropic.Anthropic(
        api_key="sk-your-key",
        base_url="https://ezaiapi.com"
    )

    tool = {
        "name": "record_sentiment",
        "description": "Record the sentiment analysis result",
        "input_schema": SentimentResult.model_json_schema()
    }

    msg = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=512,
        tools=[tool],
        tool_choice={"type": "tool", "name": "record_sentiment"},
        messages=[{"role": "user", "content": f"Analyze sentiment: {text}"}]
    )

    raw = next(b.input for b in msg.content if b.type == "tool_use")
    return SentimentResult.model_validate(raw)

result = analyze_sentiment("The new API is blazing fast but the docs need work")
print(result.model_dump_json(indent=2))

The Pydantic model auto-generates a JSON schema via model_json_schema(), which you feed directly to the tool definition. When the response comes back, model_validate() checks types, ranges, enums, and string lengths. If the model hallucinates a confidence of 1.5, Pydantic catches it before it hits your database.

The validation pipeline: AI output → schema check → Pydantic → your database

Handling Failures Gracefully

Even with schema enforcement, you need a retry strategy. Network issues, model timeouts, and occasional schema violations still happen. Here's a production-ready wrapper with exponential backoff:

python

import time
from pydantic import ValidationError

def extract_with_retry(prompt, schema_model, max_retries=3):
    client = anthropic.Anthropic(
        api_key="sk-your-key",
        base_url="https://ezaiapi.com"
    )
    tool = {
        "name": "extract",
        "description": "Extract structured data",
        "input_schema": schema_model.model_json_schema()
    }

    for attempt in range(max_retries):
        try:
            msg = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                tools=[tool],
                tool_choice={"type": "tool", "name": "extract"},
                messages=[{"role": "user", "content": prompt}]
            )
            raw = next(b.input for b in msg.content if b.type == "tool_use")
            return schema_model.model_validate(raw)

        except ValidationError as e:
            print(f"Validation failed (attempt {attempt + 1}): {e}")
        except anthropic.APIError as e:
            print(f"API error (attempt {attempt + 1}): {e}")

        time.sleep(2 ** attempt)  # 1s, 2s, 4s

    raise RuntimeError("Failed to extract structured data after retries")

This gives you three attempts with increasing backoff. In production, you'd likely log the failures to a monitoring service and fall back to a different model via multi-model fallback.

Performance Tips

Structured output has real cost implications. Here's how to keep it efficient:

Keep schemas tight. Every optional field is a field the model might hallucinate. Use required liberally and additionalProperties: false when possible.
Use enum for constrained values. Instead of letting the model invent category names, give it a fixed list. This also cuts output tokens since the model can commit to a value faster.
Sonnet over Opus for extraction. Claude Sonnet 4.5 is 5x cheaper than Opus and follows tool schemas just as reliably for data extraction tasks. Save Opus for complex reasoning.
Cache the schema. If you're processing batches, use prompt caching to avoid re-encoding the tool definition on every request. The schema tokens become free after the first call in a cache window.
Set tight max_tokens. If your expected output is 200 tokens, set max_tokens to 512, not 4096. This avoids runaway costs if the model hallucinates a novel-length response.

When to Use Which Method

Match the approach to your use case:

Claude tool_use — Best for Anthropic models. Highest reliability, supports nested schemas, works with streaming. Use this as your default.
OpenAI json_schema — Required for GPT models. The strict: true flag gives token-level enforcement. Slightly more rigid than tool_use but equally reliable.
Pydantic wrapper — Layer this on top of either method for business logic validation (ranges, patterns, cross-field dependencies) that JSON Schema alone can't express.

Through EzAI, you can use all three approaches with a single API key. Switch between Claude and GPT extraction based on which model handles your specific domain better — or use model routing to pick automatically.

What's Next

Structured output is the foundation for reliable AI pipelines. Once you have typed, validated data coming out of your models, you can build real systems on top:

Feed extracted data into data pipelines without manual parsing
Power tool-calling agents that operate on structured commands
Build classification, extraction, and analysis workflows that run at scale with batch processing

Get your API key at ezaiapi.com and start extracting structured data in minutes.

How to Get Structured JSON Output from AI APIs

Why "Just Ask for JSON" Breaks in Production

Method 1: Claude's Tool Use for Guaranteed Schema

Method 2: OpenAI-Compatible JSON Mode

Method 3: Pydantic Validation Layer

Handling Failures Gracefully

Performance Tips

When to Use Which Method

What's Next

Related Posts

AI Tool Use & Function Calling via API

AI API Error Handling: Retries, Timeouts & Fallbacks