Tips Apr 8, 2026 8 min read

Add Circuit Breakers to AI API Calls in Production

EzAI Team


Your AI-powered app handles 500 requests per minute. Then Claude goes down for 90 seconds. Without a circuit breaker, those 500 requests stack up — threads block, memory spikes, timeouts cascade through your entire stack. Users see a blank screen. Your monitoring lights up red across every service, not just the one calling the AI model.

Circuit breakers prevent this exact scenario. Borrowed from electrical engineering, the pattern is simple: track failures, and when they exceed a threshold, stop making calls entirely for a cooldown period. Return a fallback response or a fast error instead of waiting 30 seconds for a timeout you already know is coming.

How Circuit Breakers Work

A circuit breaker has three states:

  • Closed — Normal operation. Requests pass through. The breaker tracks consecutive failures.
  • Open — Too many failures. All requests are rejected immediately without hitting the API. A timer starts.
  • Half-Open — Timer expired. One probe request is allowed through. If it succeeds, the breaker closes. If it fails, it reopens.

The key insight: failing fast is better than failing slow. A 5ms rejection beats a 30-second timeout every time.

Building a Circuit Breaker in Python

Here's a production-grade circuit breaker you can drop into any Python codebase. No external dependencies — just time and threading:

python
import time, threading
from enum import Enum

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5,
                 reset_timeout=60, half_open_max=1):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_max = half_open_max
        self.failures = 0
        self.half_open_calls = 0
        self.state = State.CLOSED
        self.opened_at = 0.0
        self.lock = threading.Lock()

    def call(self, func, *args, **kwargs):
        with self.lock:
            if self.state == State.OPEN:
                if time.time() - self.opened_at > self.reset_timeout:
                    self.state = State.HALF_OPEN
                    self.half_open_calls = 0
                else:
                    raise CircuitOpenError(
                        f"Circuit open, retry in "
                        f"{self.reset_timeout - int(time.time() - self.opened_at)}s"
                    )
            if self.state == State.HALF_OPEN:
                # Let only a limited number of probe requests through
                if self.half_open_calls >= self.half_open_max:
                    raise CircuitOpenError("Circuit half-open, probe in flight")
                self.half_open_calls += 1
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        with self.lock:
            self.failures = 0
            self.state = State.CLOSED

    def _on_failure(self):
        with self.lock:
            self.failures += 1
            # A failed half-open probe reopens the breaker immediately
            if (self.state == State.HALF_OPEN
                    or self.failures >= self.failure_threshold):
                self.state = State.OPEN
                self.opened_at = time.time()

class CircuitOpenError(Exception):
    pass

Wire it into your AI client like this:

python
import anthropic

client = anthropic.Anthropic(
    api_key="sk-your-key",
    base_url="https://ezaiapi.com"
)

breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30)

def ask_claude(prompt: str) -> str:
    def _call():
        return client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}]
        )
    try:
        response = breaker.call(_call)
        return response.content[0].text
    except CircuitOpenError:
        return "Service temporarily unavailable. Please try again shortly."
    except anthropic.APIStatusError as e:
        if e.status_code == 429:
            return "Rate limited — backing off."
        raise

Three consecutive failures and the breaker opens. For the next 30 seconds, every call returns the fallback string instantly — zero network round-trips, zero wasted threads.

Circuit Breaker States Explained

Circuit breaker state machine — failures trip the breaker open, timer allows a probe, success resets

The half-open state is where the magic happens. Instead of slamming the recovering API with your full traffic volume, you send a single probe request. If that probe succeeds, traffic resumes normally. If it fails, the breaker reopens and waits another cooldown cycle. This prevents the "thundering herd" problem where a recovering service gets immediately overwhelmed again.
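To watch the full lifecycle end to end, here's a minimal, self-contained demo. It uses a condensed breaker with a deliberately tiny cooldown so the whole cycle runs in a fraction of a second; `MiniBreaker` and `flaky` are illustrative names, not part of any SDK:

```python
import time

class CircuitOpenError(Exception):
    pass

class MiniBreaker:
    """Condensed breaker for demonstration: trip, cool down, probe, reset."""
    def __init__(self, failure_threshold=3, reset_timeout=0.2):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means closed (or half-open)

    def call(self, func):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("open")  # fail fast, no network call
            self.opened_at = None  # cooldown elapsed: let one probe through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip
            raise
        self.failures = 0  # success closes the breaker
        return result

breaker = MiniBreaker()

def flaky(fail=True):
    if fail:
        raise ConnectionError("upstream down")
    return "ok"

# Three consecutive failures trip the breaker
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

# While open, calls are rejected instantly without touching the API
try:
    breaker.call(lambda: flaky(fail=False))
    tripped = False
except CircuitOpenError:
    tripped = True

time.sleep(0.25)  # wait out the cooldown
probe = breaker.call(lambda: flaky(fail=False))  # probe succeeds, breaker closes
print(tripped, probe)  # True ok
```

Swap `flaky` for a real API call and stretch `reset_timeout` to tens of seconds, and the behavior is exactly what the production class above implements.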

Node.js Implementation

Same pattern in TypeScript for Node.js backends, using async/await:

typescript
import Anthropic from "@anthropic-ai/sdk";

class CircuitBreaker {
  private failures = 0;
  private state: "closed" | "open" | "half_open" = "closed";
  private openedAt = 0;

  constructor(
    private threshold = 5,
    private resetMs = 60_000
  ) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (Date.now() - this.openedAt > this.resetMs) {
        this.state = "half_open";
      } else {
        const waitMs = this.resetMs - (Date.now() - this.openedAt);
        throw new Error(`Circuit open, retry in ${Math.ceil(waitMs / 1000)}s`);
      }
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) {
        this.state = "open";
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

const client = new Anthropic({
  apiKey: "sk-your-key",
  baseURL: "https://ezaiapi.com",
});

const breaker = new CircuitBreaker(3, 30_000);

async function askClaude(prompt: string) {
  return breaker.exec(() =>
    client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    })
  );
}

Choosing the Right Thresholds

The two numbers that matter most: failure threshold and reset timeout. Get them wrong and your circuit breaker either never trips (useless) or trips on a single hiccup (overreactive).

Recommended thresholds by use case — tune based on your traffic volume and latency tolerance

For AI API calls specifically, these starting points work well:

  • Low-traffic apps (under 10 req/min) — threshold: 3, timeout: 60s. You can afford to wait longer.
  • Medium-traffic apps (10–100 req/min) — threshold: 5, timeout: 30s. Fast enough to detect real outages, not jumpy.
  • High-traffic apps (100+ req/min) — threshold: 10, timeout: 15s. At this volume, you'll hit the threshold fast anyway. Shorter timeouts mean faster recovery.
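Those tiers translate into a small lookup you can keep next to wherever you construct breakers. A sketch: the tier names and the `breaker_settings` helper are ours, not from any library, and the numbers are the starting points above, not universal constants:

```python
# Starting-point presets from the tiers above (tune for your own traffic)
PRESETS = {
    "low":    {"failure_threshold": 3,  "reset_timeout": 60},   # under 10 req/min
    "medium": {"failure_threshold": 5,  "reset_timeout": 30},   # 10-100 req/min
    "high":   {"failure_threshold": 10, "reset_timeout": 15},   # 100+ req/min
}

def breaker_settings(req_per_min: float) -> dict:
    """Pick a preset based on observed traffic volume."""
    if req_per_min < 10:
        return PRESETS["low"]
    if req_per_min <= 100:
        return PRESETS["medium"]
    return PRESETS["high"]

print(breaker_settings(50))  # {'failure_threshold': 5, 'reset_timeout': 30}
```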

One non-obvious tip: don't count 429 (rate limit) responses as failures. Rate limits are expected behavior, not outages. Count only 500s, timeouts, and connection errors. Otherwise your breaker will trip every time you spike above your rate limit, which defeats the purpose.
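One way to encode that rule is a breaker that accepts a tuple of excluded exception types and re-raises them without touching the failure count. A self-contained sketch: `SelectiveBreaker` is our name, and `RateLimitError` is a stand-in for your SDK's 429 error (e.g. `anthropic.RateLimitError`):

```python
import time
import threading

class CircuitOpenError(Exception):
    pass

class RateLimitError(Exception):
    """Stand-in for your SDK's 429 error class."""

class SelectiveBreaker:
    """Breaker that re-raises excluded exceptions without counting them."""
    def __init__(self, failure_threshold=3, reset_timeout=30,
                 excluded=(RateLimitError,)):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.excluded = excluded
        self.failures = 0
        self.open_until = 0.0
        self.lock = threading.Lock()

    def call(self, func):
        with self.lock:
            if time.time() < self.open_until:
                raise CircuitOpenError("circuit open")
        try:
            result = func()
        except self.excluded:
            raise  # expected behavior (e.g. 429): pass through, don't count it
        except Exception:
            with self.lock:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.open_until = time.time() + self.reset_timeout
            raise
        with self.lock:
            self.failures = 0
        return result

breaker = SelectiveBreaker()

def rate_limited():
    raise RateLimitError("429 Too Many Requests")

def server_error():
    raise ValueError("500 from upstream")

for _ in range(10):  # ten 429s in a row: breaker must stay closed
    try:
        breaker.call(rate_limited)
    except RateLimitError:
        pass
print("failures after 429s:", breaker.failures)  # 0

for _ in range(3):   # three real failures: breaker trips
    try:
        breaker.call(server_error)
    except ValueError:
        pass
print("open:", time.time() < breaker.open_until)  # True
```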

Combining Circuit Breakers with Model Fallback

Circuit breakers get really powerful when paired with multi-model fallback. If Claude's circuit trips, route to GPT. If GPT trips too, fall back to a cached response or a smaller model:

python
models = [
    ("claude-sonnet-4-5", CircuitBreaker(failure_threshold=3, reset_timeout=30)),
    ("gpt-4o", CircuitBreaker(failure_threshold=3, reset_timeout=30)),
    ("gemini-2.5-flash", CircuitBreaker(failure_threshold=5, reset_timeout=60)),
]

def ask_with_fallback(prompt: str) -> str:
    for model, breaker in models:
        try:
            response = breaker.call(
                lambda m=model: client.messages.create(
                    model=m, max_tokens=1024,
                    messages=[{"role": "user", "content": prompt}]
                )
            )
            return response.content[0].text
        except CircuitOpenError:
            continue  # Breaker open for this model, skip to the next
        except Exception:
            continue  # Call failed; the breaker recorded it, try the next model
    return "All models unavailable. Please try again later."

With EzAI, this works out of the box because all models share the same endpoint format. The same client instance handles Claude, GPT, and Gemini — just change the model string. Check the pricing page for available models and rates.

Monitoring Your Circuit Breakers

A circuit breaker you can't observe is a circuit breaker you can't trust. At minimum, log every state transition:

python
import logging

logger = logging.getLogger("circuit_breaker")

# Drop-in replacement for CircuitBreaker._on_failure, now with logging:
def _on_failure(self):
    with self.lock:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = time.time()
            logger.warning(
                "Circuit OPENED — %d consecutive failures, "
                "cooling down for %ds",
                self.failures, self.reset_timeout
            )

For production dashboards, expose metrics via OpenTelemetry: breaker state (as a gauge), trip count (as a counter), and time-in-open (as a histogram). These three metrics tell you everything: how often your breakers trip, which models are flaky, and how long outages last.
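Before wiring up real OpenTelemetry exporters, it can help to prototype the three instruments as a plain in-process recorder. A sketch with names of our own invention (in production you'd replace this with actual OpenTelemetry gauge, counter, and histogram instruments):

```python
import time
from collections import Counter

class BreakerMetrics:
    """In-process stand-in for the three instruments described above."""
    def __init__(self):
        self.state = "closed"      # gauge: current breaker state
        self.trips = Counter()     # counter: trips per breaker name
        self.open_durations = []   # histogram: seconds spent open per outage
        self._opened_at = None

    def record_transition(self, name, new_state):
        if new_state == "open":
            self.trips[name] += 1
            self._opened_at = time.time()
        elif self.state == "open" and self._opened_at is not None:
            # Leaving the open state: record how long the outage lasted
            self.open_durations.append(time.time() - self._opened_at)
            self._opened_at = None
        self.state = new_state

m = BreakerMetrics()
m.record_transition("claude", "open")
time.sleep(0.05)
m.record_transition("claude", "half_open")
m.record_transition("claude", "closed")
print(m.trips["claude"], len(m.open_durations))  # 1 1
```

Call `record_transition` from the same places you log state changes; the recorder then answers all three questions: trip frequency, which breaker, and outage duration.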

Common Pitfalls

After running circuit breakers on AI APIs across dozens of production deployments, these are the mistakes that bite hardest:

  1. Sharing one breaker across all models. Claude going down shouldn't disable your GPT calls. Use one breaker per model or per endpoint.
  2. Setting the threshold to 1. A single timeout isn't an outage — it's Tuesday. Start at 3 minimum.
  3. Forgetting about streaming. If you use SSE streaming, a stream that drops mid-response is a failure. Don't just check the initial connection.
  4. No fallback behavior. Opening the circuit without a fallback means your users see a raw error. Always have a graceful degradation path — even if it's just a cached response.
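Pitfall 3 deserves a sketch: wrap stream consumption so a mid-response drop is reported to the breaker, not just a failed initial connection. The `consume_stream` helper and its callbacks are hypothetical names; in practice `on_failure`/`on_success` would be the breaker's failure and success hooks:

```python
def consume_stream(open_stream, on_failure, on_success):
    """Consume a streaming response; report mid-stream drops as failures."""
    chunks = []
    try:
        for chunk in open_stream():   # both connect AND iteration can fail
            chunks.append(chunk)
    except Exception:
        on_failure()                  # dropped mid-response counts as a failure
        raise
    on_success()                      # only a fully consumed stream is a success
    return chunks

# Demo: a stream that dies after two chunks
def dying_stream():
    yield "Hel"
    yield "lo"
    raise ConnectionError("stream dropped mid-response")

failed = {"count": 0}

def record_failure():
    failed["count"] += 1

try:
    consume_stream(dying_stream, record_failure, lambda: None)
except ConnectionError:
    pass
print(failed["count"])  # 1 — the mid-stream drop was recorded
```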

Wrapping Up

Circuit breakers are one of those patterns that feel like overkill until the first time they save you. Add them early. The Python implementation above is ~40 lines, the Node.js version is shorter, and they'll prevent the kind of cascading failure that turns a 90-second model hiccup into a 20-minute full-stack outage.

Start with the code above, tune the thresholds based on your traffic, and check the EzAI docs for model-specific timeout recommendations. If you're running multiple models, pair circuit breakers with automatic fallback for near-zero downtime.
