Documentation
What is EzAI API?
EzAI API is a unified AI gateway that gives you access to 20+ models from Anthropic, OpenAI, Google, and xAI through a single endpoint. Fully compatible with Claude Code, Cursor, Cline, and any Anthropic or OpenAI-compatible tool.
20+ AI Models
Claude, GPT, Gemini, Grok — one API
Real-time Dashboard
Live usage, costs & request tracking
How We Keep Costs Low
- ✓ Smart caching — maximizing prompt cache hits to cut token costs
- ✓ Infrastructure optimization — efficient routing reduces overhead
- ✓ Transparent pricing — pay per token, no hidden fees, no subscriptions required
Quickstart
Up and running in under a minute. Sign up to get your API key.
Get your API key
Sign in and copy your API key from the dashboard.
Run the install command
Sets ANTHROPIC_BASE_URL, ANTHROPIC_API_KEY, and configures ~/.claude/settings.json.
```shell
# macOS / Linux
curl -fsSL "https://ezaiapi.com/install.sh?key=YOUR_KEY" | sh
```

```shell
# Windows (PowerShell)
irm "https://ezaiapi.com/install.ps1?key=YOUR_KEY" | iex
```
View source: install.sh · install.ps1
Restart your terminal
```shell
source ~/.bashrc   # or ~/.zshrc
```
Start using Claude Code
```shell
claude
```
That's it! Claude Code routes through EzAI automatically.
Manual Installation
Prefer manual setup? Two steps:
1. Set environment variables
Add to ~/.bashrc, ~/.zshrc, or your shell config:
```shell
export ANTHROPIC_BASE_URL="https://ezaiapi.com"
export ANTHROPIC_API_KEY="YOUR_KEY"
```
2. Update settings.json
Create or edit ~/.claude/settings.json:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://ezaiapi.com",
    "ANTHROPIC_API_KEY": "YOUR_KEY"
  },
  "disableLoginPrompt": true
}
```
Uninstall
Remove the proxy configuration:
```shell
# Remove env vars from your shell config
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_API_KEY

# Remove Claude Code settings
rm ~/.claude/settings.json
```
Authentication
Include your API key in every request:
```
# Required headers
x-api-key: sk-your-api-key
content-type: application/json
anthropic-version: 2023-06-01
```
Base URL: https://ezaiapi.com — all requests go here. Get your key from the dashboard.
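As a minimal sketch (the helper name is ours, not part of any SDK), the three required headers can be assembled once and reused for every call:

```python
def auth_headers(api_key: str) -> dict:
    """Return the headers required on every /v1/messages request."""
    return {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }

headers = auth_headers("sk-your-api-key")
print(headers["anthropic-version"])  # → 2023-06-01
```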
Create Message
/v1/messages
Send a message to any model. The proxy auto-converts between formats upstream — you always use the same Anthropic Messages format.
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Claude!" }
    ]
  }'
```
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, GPT!" }
    ]
  }'
```
Note: Request uses Anthropic format. The proxy auto-converts to OpenAI format upstream and normalizes the response back.
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Gemini!" }
    ]
  }'
```
Response follows the standard Anthropic Messages API format for all models.
Streaming
Add "stream": true to receive Server-Sent Events:
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'
```
SSE events: message_start → content_block_delta → message_stop. See Anthropic streaming docs.
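The event sequence above can be consumed with a small hand-rolled parser. A sketch (no SDK assumed; event shapes follow Anthropic's streaming docs) that collects text from `content_block_delta` events:

```python
import json

def extract_text(sse_lines):
    """Collect text deltas from an Anthropic-style SSE stream.

    Each event arrives as an `event: <name>` line followed by a
    `data: <json>` line; only content_block_delta carries text.
    """
    out = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = json.loads(line[len("data: "):])
        if payload.get("type") == "content_block_delta":
            out.append(payload["delta"].get("text", ""))
    return "".join(out)

# Simulated stream fragment:
sample = [
    "event: message_start",
    'data: {"type": "message_start"}',
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Old pond"}}',
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " / frog leaps in"}}',
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print(extract_text(sample))  # → Old pond / frog leaps in
```

A production client would read these lines incrementally from the HTTP response body instead of a list.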
Extended Thinking
Enable Claude's chain-of-thought reasoning for complex tasks:
```json
{
  "model": "claude-opus-4-6",
  "max_tokens": 16000,
  "stream": true,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [
    { "role": "user", "content": "Solve this step by step..." }
  ]
}
```
Supported Models
claude-opus-4-6 · claude-sonnet-4-5 · claude-sonnet-4
Best Practices
Set budget_tokens between 5,000 and 10,000 for most tasks (maximum: 32,000), and always combine extended thinking with streaming.
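The guidance above can be encoded in a small request builder. A sketch (the helper name is ours; the 32,000 ceiling comes from the note above, and the 1,024 floor is Anthropic's documented minimum for `budget_tokens`):

```python
MAX_THINKING_BUDGET = 32_000  # ceiling stated in Best Practices above
MIN_THINKING_BUDGET = 1_024   # Anthropic's documented minimum

def thinking_request(prompt: str, budget_tokens: int = 10_000,
                     max_tokens: int = 16_000) -> dict:
    """Build a /v1/messages body with extended thinking enabled."""
    if not MIN_THINKING_BUDGET <= budget_tokens <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be {MIN_THINKING_BUDGET}-{MAX_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be below max_tokens")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        "stream": True,  # always stream with extended thinking
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

body = thinking_request("Solve this step by step...")
print(body["thinking"])  # → {'type': 'enabled', 'budget_tokens': 10000}
```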
Count Tokens
/v1/messages/count_tokens
Free — no credits charged
```shell
curl https://ezaiapi.com/v1/messages/count_tokens \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

# → { "input_tokens": 12 }
```
OpenAI Compatible Endpoint
/v1/chat/completions
Native OpenAI Chat Completions format. Use with OpenAI SDK, LiteLLM, or any OpenAI-compatible tool.
```shell
curl https://ezaiapi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user", "content": "Hello!" }
    ],
    "max_tokens": 1024
  }'
```
Response (OpenAI format):
```json
{
  "id": "req_abc123",
  "object": "chat.completion",
  "model": "claude-sonnet-4-5",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello! How can I help?" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24 }
}
```
/v1/models
Returns all available models in OpenAI format. Useful for model discovery in tools like LiteLLM.
/v1/responses
NEW
OpenAI Responses API format. Compatible with n8n, LangChain, and tools using the newer OpenAI SDK.
```shell
curl https://ezaiapi.com/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "input": "Hello!",
    "max_output_tokens": 1024
  }'
```
Accepts input (string or message array) + instructions (system prompt). Auto-converted to Anthropic format upstream.
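To illustrate the mapping (an approximation of the conversion described above, not the proxy's actual code), normalizing `input` plus `instructions` into the Anthropic Messages shape looks roughly like:

```python
def to_anthropic(body: dict) -> dict:
    """Approximate the Responses → Messages mapping described above."""
    inp = body["input"]
    # A string input becomes a single user message; an array passes through.
    messages = ([{"role": "user", "content": inp}]
                if isinstance(inp, str) else list(inp))
    out = {
        "model": body["model"],
        "max_tokens": body.get("max_output_tokens", 1024),
        "messages": messages,
    }
    if "instructions" in body:
        out["system"] = body["instructions"]  # system prompt
    return out

req = to_anthropic({"model": "claude-sonnet-4-5", "input": "Hello!",
                    "max_output_tokens": 1024})
print(req["messages"])  # → [{'role': 'user', 'content': 'Hello!'}]
```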
Supported Models
- claude-opus-4-6
- claude-opus-4-5
- claude-sonnet-4-6 (new)
- claude-sonnet-4-5
- claude-sonnet-4
- claude-haiku-4-5
- gpt-5.4 (new) ★
- gpt-5.3-codex
- gpt-5.2 / gpt-5.2-codex
- gpt-5.1 / gpt-5.1-codex
- gpt-5.1-codex-mini / max
- gpt-5 / gpt-5-mini
- gpt-4.1 / gpt-4.1-mini / nano
- gpt-4o / gpt-4o-mini
- o3 / o3-mini / o4-mini
- gemini-3-pro (preview)
- gemini-3-flash (preview)
- gemini-2.5-pro
- gemini-2.5-flash
- grok-code-fast-1
All models accessible via /v1/messages (Anthropic), /v1/chat/completions (OpenAI) and /v1/responses (Responses API) formats.
SDK Examples
```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-key",
    base_url="https://ezaiapi.com"
)

# Claude
msg = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(msg.content[0].text)

# GPT — same client, just change the model
msg = client.messages.create(
    model="gpt-4.1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello GPT!"}]
)

# Gemini
msg = client.messages.create(
    model="gemini-2.5-pro",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello Gemini!"}]
)
```
```javascript
// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-your-key',
  baseURL: 'https://ezaiapi.com'
});

const msg = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(msg.content[0].text);
```
OpenAI SDK Integration
Use the official OpenAI SDK to access all EzAI models via the /v1/chat/completions endpoint.
⚠️ Important: Cloudflare blocks the default User-Agent: OpenAI/Python header with a 403 error. You must set a custom User-Agent as shown below.
```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://ezaiapi.com/v1",
    api_key="sk-your-key",
    default_headers={"User-Agent": "EzAI/1.0"}  # Required!
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)
```
```javascript
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://ezaiapi.com/v1',
  apiKey: 'sk-your-key',
  defaultHeaders: { 'User-Agent': 'EzAI/1.0' }  // Required!
});

const resp = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' }
  ],
  max_tokens: 1024
});
console.log(resp.choices[0].message.content);
```
Compatibility
EzAI works as a drop-in replacement for any Anthropic or OpenAI-compatible tool.
| Tool | Status | Setup |
|---|---|---|
| Claude Code | ✅ Native | Set env vars or run install script |
| Cursor | ✅ Works | Settings → Models → Custom API provider |
| Cline | ✅ Works | Extension settings → Anthropic → Base URL |
| Continue | ✅ Works | config.json → custom provider |
| Aider | ✅ Works | Set ANTHROPIC_BASE_URL env var |
| OpenAI SDK | ✅ Works | Set custom User-Agent (see above) |
| LiteLLM | ✅ Works | Set api_base + api_key |
| Any Anthropic tool | ✅ Works | Change base URL + API key |
Rate Limits
Rate limits are applied per account based on your tier. Two limits apply simultaneously:
| Tier | RPM | Concurrent | Daily Requests | Condition |
|---|---|---|---|---|
| Free | 5 | 1 | 50 | $0 balance, no plan |
| Top-up | 30 | 2 | 500 | Has credit balance |
| Starter | 30 | 2 | 1,000 | $10/month plan |
| Pro | 60 | 3 | 2,000 | $20/month plan |
| Max | 90 | 6 | 5,000 | $40/month plan |
| Ultra | 120 | 8 | 10,000 | $80/month plan |
💎 Monthly Plans — Get free credits that reset every 5 hours + higher rate limits + daily request quota. Multiple plans stack daily limits. View plans →
Response Headers
Every API response includes rate limit headers so your client can track usage:
```
x-ratelimit-tier: topup
x-ratelimit-limit-requests: 30        # RPM limit for your tier
x-ratelimit-remaining-requests: 28    # Remaining requests this minute
x-ratelimit-reset-requests: 2026-03-03T09:00:00.000Z
x-concurrent-limit: 5                 # Max parallel requests
x-concurrent-remaining: 3             # Available parallel slots
```
- Exceeding any limit returns `429` with a `Retry-After` header
- 50+ consecutive failed auth attempts from the same IP triggers an auto-block (`403`)
- IP blocks expire automatically after 5 minutes
- Balance exhaustion returns `402` — top up
- Your current tier is shown in your dashboard — upgrade your plan for higher limits
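A client can act on these headers directly. A sketch (header names from the example above; the parsing logic and function name are ours) that computes how long to pause before the next request:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict, now: datetime) -> float:
    """Return 0 if requests remain this minute, else seconds until reset."""
    if int(headers.get("x-ratelimit-remaining-requests", "1")) > 0:
        return 0.0
    # Header uses ISO-8601 with a trailing Z; fromisoformat wants an offset.
    reset = datetime.fromisoformat(
        headers["x-ratelimit-reset-requests"].replace("Z", "+00:00"))
    return max(0.0, (reset - now).total_seconds())

hdrs = {"x-ratelimit-remaining-requests": "0",
        "x-ratelimit-reset-requests": "2026-03-03T09:00:00.000Z"}
now = datetime(2026, 3, 3, 8, 59, 30, tzinfo=timezone.utc)
print(seconds_until_reset(hdrs, now))  # → 30.0
```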
Error Codes
| Code | Meaning | What to Do |
|---|---|---|
| 400 | Bad request / invalid body | Check your JSON payload and required fields |
| 401 | Invalid or missing API key | Check x-api-key header value |
| 402 | Insufficient balance | Top up credits in dashboard |
| 403 | IP blocked or Cloudflare WAF | Wait 5 min (IP block) or set custom User-Agent (SDK) |
| 405 | Method not allowed | Use POST for /v1/messages |
| 408 | Request timeout | Retry — upstream model was slow to respond |
| 413 | Payload too large | Reduce input tokens or message count |
| 429 | Rate limit exceeded (RPM or concurrent) | Check Retry-After header; reduce request frequency |
| 502 | Upstream error | Retry after a few seconds |
| 503 | All upstream providers busy | Retry with exponential backoff |
| 529 | Upstream overloaded | Wait and resend — automatic retry recommended |
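The retryable codes in the table (`429`, `502`, `503`, `529`) can all be handled by one generic wrapper. A sketch with a stubbed transport so it runs offline — swap in a real HTTP call for `send`:

```python
import time

RETRYABLE = {429, 502, 503, 529}

def with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until a non-retryable status comes back.

    `send` returns (status, body). Waits base_delay * 2**attempt between
    tries — the exponential backoff recommended for 503.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        sleep(base_delay * (2 ** attempt))
    return status, body  # give up, return the last response

# Offline demo: fail twice with retryable codes, then succeed.
responses = iter([(529, None), (503, None), (200, {"ok": True})])
status, body = with_retries(lambda: next(responses), sleep=lambda s: None)
print(status)  # → 200
```

For `429` specifically, honoring the `Retry-After` header is better than a fixed backoff when it is present.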
Troubleshooting
403 Forbidden with OpenAI SDK
Cloudflare WAF blocks the default User-Agent: OpenAI/Python header. Set a custom User-Agent:
```python
client = OpenAI(
    base_url="https://ezaiapi.com/v1",
    api_key="sk-your-key",
    default_headers={"User-Agent": "EzAI/1.0"}
)
```
403 Forbidden — IP Blocked
Your IP was auto-blocked after 50+ failed auth attempts. Wait 5 minutes for the block to expire, then fix your API key. Check the key in your dashboard.
429 Too Many Requests
You've exceeded your tier's rate limit (RPM or concurrent). Check the Retry-After header for when to retry. If your balance is exhausted (402), top up in the dashboard. See Rate Limits for your tier's limits.
Connection Timeout / Reset
Intermittent upstream connection issues. The proxy automatically retries. If persistent, try again after a few seconds — the upstream model may be under heavy load.
Claude Code not using proxy
Verify your environment variables are set:
```shell
echo $ANTHROPIC_BASE_URL       # Should show https://ezaiapi.com
echo $ANTHROPIC_API_KEY        # Should show your API key
cat ~/.claude/settings.json    # Should contain env block
```
Changelog
- GPT-5.3 Codex, GPT-4.1 Mini/Nano, o3/o3-mini/o4-mini
- Streaming usage fix: `input_tokens` now reported correctly in `message_delta`
- Cache token tracking in Usage tab
- Responses API endpoint `/v1/responses` (n8n, LangChain)
- Legacy model auto-mapping (Claude 3.x → 4.x)
- GPT-5.1 Codex Mini/Max, GPT-5.2 Codex
- LZT Market payment (Invoice + Transfer)
- Crypto payment via NOWPayments (350+ coins)
- Per-user rate limiting (RPM + Concurrent, 6 tiers)
- Reseller system (reseller panel, end-user management, API)
- Database optimization (daily aggregation, auto-cleanup)
- Standalone reseller panel at `/reseller`
- Free models: Step 3.5 Flash, GLM 4.5 Air, Nemotron 3 Nano
- OpenAI-compatible endpoint `/v1/chat/completions`
- Claude Sonnet 4.6 support
- Model discovery endpoint `GET /v1/models`
- GPT-5.2, GPT-5.2-codex
- Gemini 3.1 Pro
- Extended thinking support
- Real-time usage dashboard
- 🚀 EzAI API launch