Documentation
What is EzAI API?
EzAI API is a unified AI gateway that gives you access to 20+ models from Anthropic, OpenAI, Google, and xAI through a single endpoint. Fully compatible with Claude Code, Cursor, Cline, and any Anthropic or OpenAI-compatible tool.
20+ AI Models
Claude, GPT, Gemini, Grok — one API
Real-time Dashboard
Live usage, costs & request tracking
How We Keep Costs Low
- ✓ Smart caching — maximizing prompt cache hits to cut token costs
- ✓ Infrastructure optimization — efficient routing reduces overhead
- ✓ Transparent pricing — pay per token, no hidden fees, no subscriptions required
Quickstart
Up and running in under a minute. Sign up to get your API key.
Get your API key
Sign in and copy your API key from the dashboard.
Run the install command
Sets ANTHROPIC_BASE_URL, ANTHROPIC_API_KEY, and configures ~/.claude/settings.json.
```shell
# macOS / Linux
curl -fsSL "https://ezaiapi.com/install.sh?key=YOUR_KEY" | sh
```

```shell
# Windows (PowerShell)
irm "https://ezaiapi.com/install.ps1?key=YOUR_KEY" | iex
```
View source: install.sh · install.ps1
Restart your terminal
```shell
source ~/.bashrc   # or ~/.zshrc
```
Start using Claude Code
```shell
claude
```
That's it! Claude Code routes through EzAI automatically.
Manual Installation
Prefer manual setup? Two steps:
1. Set environment variables
Add to ~/.bashrc, ~/.zshrc, or your shell config:
```shell
export ANTHROPIC_BASE_URL="https://ezaiapi.com"
export ANTHROPIC_API_KEY="YOUR_KEY"
```
2. Update settings.json
Create or edit ~/.claude/settings.json:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://ezaiapi.com",
    "ANTHROPIC_API_KEY": "YOUR_KEY"
  },
  "disableLoginPrompt": true
}
```
Uninstall
Remove the proxy configuration:
```shell
# Remove env vars from your shell config
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_API_KEY

# Remove Claude Code settings
rm ~/.claude/settings.json
```
Authentication
Include your API key in every request:
```
# Required headers
x-api-key: sk-your-api-key
content-type: application/json
anthropic-version: 2023-06-01
```
Base URL: https://ezaiapi.com — all requests go here. Get your key from the dashboard.
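As a minimal sketch (the helper name is ours, not part of any SDK), the three required headers can be assembled once and reused for every call:

```python
def auth_headers(api_key: str) -> dict:
    """Return the headers required on every /v1/messages request."""
    return {
        "x-api-key": api_key,
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }

headers = auth_headers("sk-your-api-key")
print(headers["anthropic-version"])  # → 2023-06-01
```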
Create Message
/v1/messages
Send a message to any model. The proxy auto-converts between formats upstream — you always use the same Anthropic Messages format.
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Claude!" }
    ]
  }'
```
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, GPT!" }
    ]
  }'
```
Note: Request uses Anthropic format. The proxy auto-converts to OpenAI format upstream and normalizes the response back.
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "gemini-2.5-pro",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Hello, Gemini!" }
    ]
  }'
```
Response follows the standard Anthropic Messages API format for all models.
Streaming
Add "stream": true to receive Server-Sent Events:
```shell
curl https://ezaiapi.com/v1/messages \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a haiku" }
    ]
  }'
```
SSE events: message_start → content_block_delta → message_stop. See Anthropic streaming docs.
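The event sequence above can be consumed with a small hand-rolled parser. A sketch (no SDK assumed; event shapes follow Anthropic's streaming docs) that collects text from `content_block_delta` events:

```python
import json

def extract_text(sse_lines):
    """Collect text deltas from an Anthropic-style SSE stream.

    Each event arrives as an `event: <name>` line followed by a
    `data: <json>` line; only content_block_delta carries text.
    """
    out = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = json.loads(line[len("data: "):])
        if payload.get("type") == "content_block_delta":
            out.append(payload["delta"].get("text", ""))
    return "".join(out)

# Simulated stream fragment:
sample = [
    "event: message_start",
    'data: {"type": "message_start"}',
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Old pond"}}',
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " / frog leaps in"}}',
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print(extract_text(sample))  # → Old pond / frog leaps in
```

A production client would read these lines incrementally from the HTTP response body instead of a list.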
Extended Thinking
Enable Claude's chain-of-thought reasoning for complex tasks:
```json
{
  "model": "claude-opus-4-6",
  "max_tokens": 16000,
  "stream": true,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [
    { "role": "user", "content": "Solve this step by step..." }
  ]
}
```
Supported Models
claude-opus-4-6 · claude-sonnet-4-5 · claude-sonnet-4
Best Practices
Set budget_tokens between 5,000 and 10,000 for most tasks (maximum: 32,000), and always combine extended thinking with streaming.
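The guidance above can be encoded in a small request builder. A sketch (the helper name is ours; the 32,000 ceiling comes from the note above, and the 1,024 floor is Anthropic's documented minimum for `budget_tokens`):

```python
MAX_THINKING_BUDGET = 32_000  # ceiling stated in Best Practices above
MIN_THINKING_BUDGET = 1_024   # Anthropic's documented minimum

def thinking_request(prompt: str, budget_tokens: int = 10_000,
                     max_tokens: int = 16_000) -> dict:
    """Build a /v1/messages body with extended thinking enabled."""
    if not MIN_THINKING_BUDGET <= budget_tokens <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be {MIN_THINKING_BUDGET}-{MAX_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be below max_tokens")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        "stream": True,  # always stream with extended thinking
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

body = thinking_request("Solve this step by step...")
print(body["thinking"])  # → {'type': 'enabled', 'budget_tokens': 10000}
```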
Count Tokens
/v1/messages/count_tokens
Free — no credits charged
```shell
curl https://ezaiapi.com/v1/messages/count_tokens \
  -H "x-api-key: sk-your-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

# → { "input_tokens": 12 }
```
OpenAI Compatible Endpoint
/v1/chat/completions
Native OpenAI Chat Completions format. Use with OpenAI SDK, LiteLLM, or any OpenAI-compatible tool.
```shell
curl https://ezaiapi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user", "content": "Hello!" }
    ],
    "max_tokens": 1024
  }'
```
Response (OpenAI format):
```json
{
  "id": "req_abc123",
  "object": "chat.completion",
  "model": "claude-sonnet-4-5",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello! How can I help?" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 15, "completion_tokens": 9, "total_tokens": 24 }
}
```
/v1/models
Returns all available models in OpenAI format. Useful for model discovery in tools like LiteLLM.
/v1/responses
NEW
OpenAI Responses API format. Compatible with n8n, LangChain, and tools using the newer OpenAI SDK.
```shell
curl https://ezaiapi.com/v1/responses \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "input": "Hello!",
    "max_output_tokens": 1024
  }'
```
Accepts input (string or message array) + instructions (system prompt). Auto-converted to Anthropic format upstream.
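To illustrate the mapping (an approximation of the conversion described above, not the proxy's actual code), normalizing `input` plus `instructions` into the Anthropic Messages shape looks roughly like:

```python
def to_anthropic(body: dict) -> dict:
    """Approximate the Responses → Messages mapping described above."""
    inp = body["input"]
    # A string input becomes a single user message; an array passes through.
    messages = ([{"role": "user", "content": inp}]
                if isinstance(inp, str) else list(inp))
    out = {
        "model": body["model"],
        "max_tokens": body.get("max_output_tokens", 1024),
        "messages": messages,
    }
    if "instructions" in body:
        out["system"] = body["instructions"]  # system prompt
    return out

req = to_anthropic({"model": "claude-sonnet-4-5", "input": "Hello!",
                    "max_output_tokens": 1024})
print(req["messages"])  # → [{'role': 'user', 'content': 'Hello!'}]
```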
Supported Models
- claude-opus-4-6
- claude-opus-4-5
- claude-sonnet-4-6 (new)
- claude-sonnet-4-5
- claude-sonnet-4
- claude-haiku-4-5
- gpt-5.4 (new) ★
- gpt-5.3-codex
- gpt-5.2 / gpt-5.2-codex
- gpt-5.1 / gpt-5.1-codex
- gpt-5.1-codex-mini / max
- gpt-5 / gpt-5-mini
- gpt-4.1 / gpt-4.1-mini / nano
- gpt-4o / gpt-4o-mini
- o3 / o3-mini / o4-mini
- gemini-3-pro (preview)
- gemini-3-flash (preview)
- gemini-2.5-pro
- gemini-2.5-flash
- grok-code-fast-1
All models accessible via /v1/messages (Anthropic), /v1/chat/completions (OpenAI) and /v1/responses (Responses API) formats.
SDK Examples
```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-key",
    base_url="https://ezaiapi.com"
)

# Claude
msg = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
print(msg.content[0].text)

# GPT — same client, just change the model
msg = client.messages.create(
    model="gpt-4.1",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello GPT!"}]
)

# Gemini
msg = client.messages.create(
    model="gemini-2.5-pro",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello Gemini!"}]
)
```
```javascript
// npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-your-key',
  baseURL: 'https://ezaiapi.com'
});

const msg = await client.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(msg.content[0].text);
```
OpenAI SDK Integration
Use the official OpenAI SDK to access all EzAI models via the /v1/chat/completions endpoint.
⚠️ Important: Cloudflare blocks the default User-Agent: OpenAI/Python header with a 403 error. You must set a custom User-Agent as shown below.
```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://ezaiapi.com/v1",
    api_key="sk-your-key",
    default_headers={"User-Agent": "EzAI/1.0"}  # Required!
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024
)
print(response.choices[0].message.content)
```
```javascript
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://ezaiapi.com/v1',
  apiKey: 'sk-your-key',
  defaultHeaders: { 'User-Agent': 'EzAI/1.0' }  // Required!
});

const resp = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' }
  ],
  max_tokens: 1024
});
console.log(resp.choices[0].message.content);
```
Compatibility
EzAI works as a drop-in replacement for any Anthropic or OpenAI-compatible tool.
| Tool | Status | Setup |
|---|---|---|
| Claude Code | ✅ Native | Set env vars or run install script |
| Cursor | ✅ Works | Settings → Models → Custom API provider |
| Cline | ✅ Works | Extension settings → Anthropic → Base URL |
| Continue | ✅ Works | config.json → custom provider |
| Aider | ✅ Works | Set ANTHROPIC_BASE_URL env var |
| OpenAI SDK | ✅ Works | Set custom User-Agent (see above) |
| LiteLLM | ✅ Works | Set api_base + api_key |
| Any Anthropic tool | ✅ Works | Change base URL + API key |
Rate Limits
Rate limits are applied per account based on your tier. Two limits apply simultaneously:
| Tier | RPM | Concurrent | Daily Requests | Condition |
|---|---|---|---|---|
| Free | 5 | 1 | 50 | $0 balance, no plan |
| Top-up | 30 | 2 | 500 | Has credit balance |
| Starter | 30 | 2 | 1,000 | $10/month plan |
| Pro | 60 | 3 | 2,000 | $20/month plan |
| Max | 90 | 6 | 5,000 | $40/month plan |
| Ultra | 120 | 8 | 10,000 | $80/month plan |
💎 Monthly Plans — Get free credits that reset every 5 hours + higher rate limits + daily request quota. Multiple plans stack daily limits. View plans →
Response Headers
Every API response includes rate limit headers so your client can track usage:
```
x-ratelimit-tier: topup
x-ratelimit-limit-requests: 30        # RPM limit for your tier
x-ratelimit-remaining-requests: 28    # Remaining requests this minute
x-ratelimit-reset-requests: 2026-03-03T09:00:00.000Z
x-concurrent-limit: 5                 # Max parallel requests
x-concurrent-remaining: 3             # Available parallel slots
```
- Exceeding any limit returns `429` with a `Retry-After` header
- 50+ consecutive failed auth attempts from the same IP triggers an auto-block (`403`)
- IP blocks expire automatically after 5 minutes
- Balance exhaustion returns `402` — top up
- Your current tier is shown in your dashboard — upgrade your plan for higher limits
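A client can act on these headers directly. A sketch (header names from the example above; the parsing logic and function name are ours) that computes how long to pause before the next request:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict, now: datetime) -> float:
    """Return 0 if requests remain this minute, else seconds until reset."""
    if int(headers.get("x-ratelimit-remaining-requests", "1")) > 0:
        return 0.0
    # Header uses ISO-8601 with a trailing Z; fromisoformat wants an offset.
    reset = datetime.fromisoformat(
        headers["x-ratelimit-reset-requests"].replace("Z", "+00:00"))
    return max(0.0, (reset - now).total_seconds())

hdrs = {"x-ratelimit-remaining-requests": "0",
        "x-ratelimit-reset-requests": "2026-03-03T09:00:00.000Z"}
now = datetime(2026, 3, 3, 8, 59, 30, tzinfo=timezone.utc)
print(seconds_until_reset(hdrs, now))  # → 30.0
```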
Error Codes
| Code | Meaning | What to Do |
|---|---|---|
| 400 | Bad request / invalid body | Check your JSON payload and required fields |
| 401 | Invalid or missing API key | Check x-api-key header value |
| 402 | Insufficient balance | Top up credits in dashboard |
| 403 | IP blocked or Cloudflare WAF | Wait 5 min (IP block) or set custom User-Agent (SDK) |
| 405 | Method not allowed | Use POST for /v1/messages |
| 408 | Request timeout | Retry — upstream model was slow to respond |
| 413 | Payload too large | Reduce input tokens or message count |
| 429 | Rate limit exceeded (RPM or concurrent) | Check Retry-After header; reduce request frequency |
| 502 | Upstream error | Retry after a few seconds |
| 503 | All upstream providers busy | Retry with exponential backoff |
| 529 | Upstream overloaded | Wait and resend — automatic retry recommended |
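The retryable codes in the table (`429`, `502`, `503`, `529`) can all be handled by one generic wrapper. A sketch with a stubbed transport so it runs offline — swap in a real HTTP call for `send`:

```python
import time

RETRYABLE = {429, 502, 503, 529}

def with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until a non-retryable status comes back.

    `send` returns (status, body). Waits base_delay * 2**attempt between
    tries — the exponential backoff recommended for 503.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        sleep(base_delay * (2 ** attempt))
    return status, body  # give up, return the last response

# Offline demo: fail twice with retryable codes, then succeed.
responses = iter([(529, None), (503, None), (200, {"ok": True})])
status, body = with_retries(lambda: next(responses), sleep=lambda s: None)
print(status)  # → 200
```

For `429` specifically, honoring the `Retry-After` header is better than a fixed backoff when it is present.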
Troubleshooting
403 Forbidden with OpenAI SDK
Cloudflare WAF blocks the default User-Agent: OpenAI/Python header. Set a custom User-Agent:
```python
client = OpenAI(
    base_url="https://ezaiapi.com/v1",
    api_key="sk-your-key",
    default_headers={"User-Agent": "EzAI/1.0"}
)
```
403 Forbidden — IP Blocked
Your IP was auto-blocked after 50+ failed auth attempts. Wait 5 minutes for the block to expire, then fix your API key. Check the key in your dashboard.
429 Too Many Requests
You've exceeded your tier's rate limit (RPM or concurrent). Check the Retry-After header for when to retry. If your balance is exhausted (402), top up in the dashboard. See Rate Limits for your tier's limits.
Connection Timeout / Reset
Intermittent upstream connection issues. The proxy automatically retries. If persistent, try again after a few seconds — the upstream model may be under heavy load.
Claude Code not using proxy
Verify your environment variables are set:
```shell
echo $ANTHROPIC_BASE_URL       # Should show https://ezaiapi.com
echo $ANTHROPIC_API_KEY        # Should show your API key
cat ~/.claude/settings.json    # Should contain env block
```
Changelog
- GPT-5.3 Codex, GPT-4.1 Mini/Nano, o3/o3-mini/o4-mini
- Streaming usage fix: `input_tokens` now reported correctly in `message_delta`
- Cache token tracking in Usage tab
- Responses API endpoint `/v1/responses` (n8n, LangChain)
- Legacy model auto-mapping (Claude 3.x → 4.x)
- GPT-5.1 Codex Mini/Max, GPT-5.2 Codex
- LZT Market payment (Invoice + Transfer)
- Crypto payment via NOWPayments (350+ coins)
- Per-user rate limiting (RPM + Concurrent, 6 tiers)
- Reseller system (reseller panel, end-user management, API)
- Database optimization (daily aggregation, auto-cleanup)
- Standalone reseller panel at `/reseller`
- Free models: Step 3.5 Flash, GLM 4.5 Air, Nemotron 3 Nano
- OpenAI-compatible endpoint `/v1/chat/completions`
- Claude Sonnet 4.6 support
- Model discovery endpoint `GET /v1/models`
- GPT-5.2, GPT-5.2-codex
- Gemini 3.1 Pro
- Extended thinking support
- Real-time usage dashboard
- 🚀 EzAI API launch