Your AI-powered feature shipped Monday. Tests passed. Users loved it. By Wednesday, the same prompts returned garbled outputs, confidence scores dropped 15%, and your structured JSON parser started throwing exceptions. Nothing in your code changed. The model did.
Model version pinning prevents this exact scenario. Instead of pointing at a floating alias like claude-sonnet-4-0 and hoping for the best, you lock your production traffic to a specific dated snapshot and control when upgrades happen. This guide covers practical pinning strategies across Claude, GPT, and Gemini — with code you can drop into your EzAI integration today.
Why Model Updates Break Production
AI providers update models constantly. Anthropic ships new Claude snapshots every few weeks. OpenAI rotates GPT versions behind aliases. Google pushes Gemini updates silently. Each update can shift behavior in ways that look minor in benchmarks but cause real damage in production:
- Output format drift — A model that reliably returned JSON starts wrapping it in markdown code fences
- Tone and style shifts — Customer-facing responses change personality overnight
- Token count changes — Same prompt costs 20% more tokens on the new version
- Regression on edge cases — Niche prompts that worked fine get misinterpreted
- Structured output breakage — Field names change, enums gain new values, nested objects flatten
The fix isn't avoiding updates — it's controlling when they happen and validating before they reach users.
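The first bullet above — JSON suddenly arriving wrapped in markdown fences — is also the easiest drift symptom to absorb defensively. A minimal sketch (the fence-stripping regex is my own, not from any provider SDK):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse JSON from a model reply, tolerating markdown code fences.

    A model that used to return bare JSON may start wrapping it in
    ```json ... ``` after an update; strip the fences before parsing.
    """
    text = raw.strip()
    # Remove a leading ```json (or bare ```) fence and the trailing ``` fence
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

This doesn't replace pinning — it just keeps one common failure mode from paging you while you investigate.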
Pinning Strategies by Provider
Claude (Anthropic)
Anthropic uses date-based snapshot IDs. The alias claude-sonnet-4-0 points to the latest Sonnet 4 snapshot and can be repointed by Anthropic; the dated ID claude-sonnet-4-20250514 names one specific snapshot. To pin, use the full dated ID:
import os
import httpx

EZAI_KEY = os.environ["EZAI_KEY"]

# ❌ Floating alias — Anthropic can update this anytime
MODEL_RISKY = "claude-sonnet-4-0"

# ✅ Pinned snapshot — won't change until YOU change it
MODEL_PINNED = "claude-sonnet-4-20250514"

async def call_claude(prompt: str, model: str = MODEL_PINNED):
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.ezaiapi.com/v1/messages",
            headers={
                "x-api-key": EZAI_KEY,
                "anthropic-version": "2023-06-01",
            },
            json={
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        )
        return resp.json()
Store the pinned model ID in an environment variable or config file — not hardcoded across 30 files. When you're ready to upgrade, change it in one place.
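That single place can be as small as an env var with a pinned default. A sketch (the `CLAUDE_MODEL` variable name is illustrative):

```python
import os

# One place to change the pin; every call site reads from here.
PINNED_DEFAULT = "claude-sonnet-4-20250514"

def pinned_model() -> str:
    """Read the pinned model ID from the environment, falling back to the default."""
    return os.getenv("CLAUDE_MODEL", PINNED_DEFAULT)
```

Upgrading (or rolling back) then means changing one env var, not hunting through call sites.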
GPT (OpenAI)
OpenAI provides dated snapshots like gpt-4o-2024-11-20. The bare alias gpt-4o floats to whatever OpenAI considers current. Always pin the dated version:
const MODEL_CONFIG = {
  // Pin to dated snapshot — update manually after testing
  production: "gpt-4o-2024-11-20",
  // Staging floats to test upcoming changes
  staging: "gpt-4o",
};

async function complete(prompt, env = "production") {
  const res = await fetch("https://api.ezaiapi.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${EZAI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL_CONFIG[env],
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
Gemini (Google)
Google uses a similar pattern with gemini-2.5-pro as the floating alias. Pin to the full version string when available, and track release notes from Google's changelog.
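The shape mirrors the Claude example. A sketch — note the pinned version string below is a placeholder, not a verified Gemini ID; use the exact string from Google's current model list:

```python
# Floating alias — Google may repoint this at any time
GEMINI_FLOATING = "gemini-2.5-pro"

# Placeholder for a full versioned ID; copy the real one from
# Google's model list rather than trusting this string
GEMINI_PINNED = "gemini-2.5-pro-preview-05-06"

def gemini_model(env: str = "production") -> str:
    """Pinned version for production, floating alias for staging."""
    return GEMINI_PINNED if env == "production" else GEMINI_FLOATING
```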
The Staging-First Upgrade Pattern
Pinning alone isn't enough. You need a process for when to unpin. Here's the pattern that works for teams running AI in production through EzAI:
- Production runs the pinned version. Never changes without a PR.
- Staging runs the floating alias. New model versions hit staging first.
- Eval suite runs nightly against staging. Catches regressions before anyone notices.
- Upgrade PR bumps the pinned version after evals pass for 3+ consecutive days.
# config/models.yml — single source of truth
models:
  claude:
    production: "claude-sonnet-4-20250514"
    staging: "claude-sonnet-4-0"   # floating alias
    last_validated: "2026-04-10"
  gpt:
    production: "gpt-4o-2024-11-20"
    staging: "gpt-4o"              # floating alias
    last_validated: "2026-04-08"
The last_validated field is your alarm clock. If it's older than 30 days, someone should be running evals.
Building an Eval Suite That Catches Drift
You don't need a fancy ML framework for model evals. A Python script with assertions covers 90% of production use cases. The key is testing your actual prompts against your actual expectations:
import asyncio
import os

import httpx

EZAI_URL = "https://api.ezaiapi.com/v1/messages"
EZAI_KEY = os.environ["EZAI_KEY"]

EVAL_CASES = [
    {
        "name": "json_extraction",
        "prompt": "Extract {name, email} from: 'Contact Jane at jane@example.com'",
        "check": lambda r: "jane@example.com" in r and "Jane" in r,
    },
    {
        "name": "sentiment_positive",
        "prompt": "Classify sentiment (positive/negative/neutral): 'This product saved me 4 hours a week'",
        "check": lambda r: "positive" in r.lower(),
    },
    {
        "name": "code_generation",
        "prompt": "Write a Python function that reverses a string. Return ONLY the function.",
        "check": lambda r: "def " in r and "return" in r,
    },
]

async def run_eval(model: str):
    results = []
    async with httpx.AsyncClient(timeout=30) as client:
        for case in EVAL_CASES:
            resp = await client.post(EZAI_URL, headers={
                "x-api-key": EZAI_KEY,
                "anthropic-version": "2023-06-01",
            }, json={
                "model": model,
                "max_tokens": 512,
                "messages": [{"role": "user", "content": case["prompt"]}],
            })
            text = resp.json()["content"][0]["text"]
            passed = case["check"](text)
            results.append({"name": case["name"], "passed": passed})
            print(f"{'✅' if passed else '❌'} {case['name']}")
    return results

# Run against both pinned and floating
asyncio.run(run_eval("claude-sonnet-4-20250514"))  # pinned snapshot
asyncio.run(run_eval("claude-sonnet-4-0"))         # floating alias
Run this in CI on a schedule. When evals start failing on the floating alias, you know a model update landed — and you know before it hits production.
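To make that scheduled job actually turn red, exit nonzero when any case fails. A sketch that consumes the results list returned by run_eval above:

```python
import sys

def gate(results: list[dict]) -> None:
    """Exit nonzero if any eval case failed, so CI marks the run as failed."""
    failed = [r["name"] for r in results if not r["passed"]]
    if failed:
        print(f"Eval failures: {', '.join(failed)}")
        sys.exit(1)
    print("All eval cases passed.")
```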
Rollback Strategies
Sometimes you upgrade and things break in ways evals didn't catch. Here's how to roll back fast:
Environment variable swap
The simplest approach. Store your model version in an env var and swap it without redeploying code:
# Upgrade to a newer snapshot (the newer ID here is illustrative)
export CLAUDE_MODEL="claude-sonnet-4-5-20250929"
pm2 restart my-app

# Rollback to the previous snapshot (30 seconds, no deploy needed)
export CLAUDE_MODEL="claude-sonnet-4-20250514"
pm2 restart my-app
EzAI model aliasing
With EzAI's model routing, you can map custom aliases to specific model versions in your dashboard. Your code references my-production-claude and the actual model version is managed through EzAI — no code changes needed for upgrades or rollbacks.
Canary deployments
Route 5% of traffic to the new model version, monitor error rates and latency for 24 hours, then gradually increase. If anything spikes, flip back to 0%. EzAI's usage dashboard makes it straightforward to compare cost-per-request between versions side by side.
Multi-Model Pinning with Fallback
Production systems often use multiple models. Pin each independently and define fallback chains so a deprecated version doesn't take down your entire stack:
import os

# Each model pinned independently with fallback chain
MODEL_CHAINS = {
    "summarize": [
        os.getenv("SUMMARIZE_MODEL", "claude-sonnet-4-20250514"),
        "claude-haiku-4-20250414",  # fallback
    ],
    "code_review": [
        os.getenv("CODE_MODEL", "claude-sonnet-4-20250514"),
        "gpt-4o-2024-11-20",  # cross-provider fallback
    ],
}

async def call_with_fallback(task: str, prompt: str):
    chain = MODEL_CHAINS.get(task, ["claude-sonnet-4-20250514"])
    for model in chain:
        try:
            # call_ezai is your request wrapper (e.g. call_claude above)
            return await call_ezai(model, prompt)
        except Exception as e:
            print(f"Model {model} failed: {e}, trying next...")
    raise RuntimeError(f"All models failed for task: {task}")
Cross-provider fallback is where EzAI earns its keep. One API key, one endpoint — if Claude's snapshot gets deprecated, fall back to GPT without changing any infrastructure. Check the multi-model fallback guide for the full pattern.
When NOT to Pin
Pinning isn't always the right call:
- Internal tools with low stakes — If a Slack bot summarizing threads changes tone slightly, nobody notices. Let it float.
- Development and prototyping — You want the latest capabilities while building. Pin only when shipping.
- Cost-sensitive batch jobs — Newer model versions sometimes cost less per token. A floating alias on a nightly batch job keeps costs trending down.
The rule of thumb: if a change in model behavior would trigger a support ticket, pin it. If it wouldn't, let it float.
Model version pinning is the seatbelt of AI engineering. You don't notice it until the crash. Set up pinning once, build the eval pipeline, and model updates become a controlled upgrade instead of a 3am incident. Start with EzAI's getting started guide if you haven't yet — you'll have pinned models running in under 10 minutes.