Build an AI Resume Screener with Python and Claude

Hiring teams drown in resumes. A single job posting for a mid-level Python developer can pull 400+ applications in a week. At 3-5 minutes per resume, that's roughly 20 to 33 hours of screening before you've even scheduled a phone call. Claude can do it in seconds per resume, extracting structured data and scoring candidates against your exact requirements.

In this tutorial, you'll build a Python tool that reads PDF resumes, sends them to Claude via EzAI, and outputs a ranked spreadsheet with scores, extracted skills, and red flags. Total API cost: roughly $0.004 per resume using Sonnet.

Architecture Overview

The screener works in three stages. First, it extracts text from PDF resumes using PyPDF2. Then it sends each resume to Claude with a structured prompt that defines the job requirements and scoring rubric. Finally, it collects JSON responses and writes a ranked CSV file.

You'll need two packages beyond the standard library. The batch step also uses asyncio.to_thread, which requires Python 3.9 or newer.

bash

pip install anthropic PyPDF2

Extracting Text from PDFs

Most resumes arrive as PDFs. Some are clean text, others are scanned images. We'll handle text-based PDFs first — they cover about 90% of resumes submitted through online portals. For scanned documents, you could add OCR with pytesseract, but that's outside our scope here.

python

import PyPDF2
from pathlib import Path

def extract_resume_text(pdf_path: str) -> str:
    """Extract text from a PDF resume. Returns empty string on failure."""
    try:
        reader = PyPDF2.PdfReader(pdf_path)
        pages = [page.extract_text() for page in reader.pages]
        text = "\n".join(p for p in pages if p)
        return text.strip()
    except Exception as e:
        print(f"Failed to read {pdf_path}: {e}")
        return ""

The function handles encrypted PDFs and corrupted files gracefully — it returns an empty string instead of crashing, so your batch processing continues even when individual files are unreadable.
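
A scanned resume usually yields little or no extractable text, so a simple length check can route it to manual review or OCR. The sketch below is a rough heuristic, not part of PyPDF2. The function name needs_ocr and the 200-character threshold are illustrative assumptions, so tune the threshold on your own files.

```python
def needs_ocr(text: str, min_chars: int = 200) -> bool:
    """Heuristic: treat very short extracted text as a likely scanned PDF.

    min_chars is an illustrative threshold, not a tuned value.
    """
    # Count only non-whitespace characters so blank pages don't pass
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars
```

Calling needs_ocr(extract_resume_text(path)) before screening lets you set those files aside instead of silently dropping them.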

[Figure: AI resume screening pipeline. The three stages: extract → analyze → rank]

The Screening Prompt

The prompt is where the real work happens. You define the job requirements, the scoring rubric, and the output format. Claude returns structured JSON that you can parse directly — no regex hacking required.

python

SCREENING_PROMPT = """You are a technical recruiter screening resumes.

Job: {job_title}
Required skills: {required_skills}
Nice-to-have: {nice_to_have}
Min years experience: {min_years}

Analyze this resume and return JSON only:
{{
  "name": "candidate full name",
  "email": "email if found",
  "years_experience": number,
  "matched_required": ["skill1", "skill2"],
  "matched_nice_to_have": ["skill1"],
  "missing_required": ["skill1"],
  "score": 0-100,
  "summary": "2-3 sentence assessment",
  "red_flags": ["flag1"] or [],
  "recommendation": "strong_yes | yes | maybe | no"
}}

Scoring rubric (start at 0, clamp the final score to 0-100):
- Each required skill match: +15 points
- Each nice-to-have match: +5 points
- Years experience >= min: +10 points
- Gaps > 2 years unexplained: -10 points
- Job hopping (< 1 year avg tenure): -5 points

Resume text:
{resume_text}"""

The rubric matters. Without explicit scoring rules, Claude will assign scores based on vibes — and those vibes won't be consistent across 500 resumes. By defining point values for each criterion, you get repeatable, auditable results.
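
Because the rubric is plain arithmetic, one way to audit Claude's score is to recompute it in Python from the extracted fields. This is a minimal sketch. The function name rubric_score and the two boolean penalty arguments are assumptions, since the JSON schema above has no dedicated fields for employment gaps or job hopping. It also assumes years_experience comes back as a number. With five required and five nice-to-have skills, as in the job config later in this tutorial, the raw total can reach 110, so the sketch clamps the result to 0-100.

```python
def rubric_score(result: dict, min_years: float,
                 gap_penalty: bool = False,
                 job_hopping: bool = False) -> int:
    """Recompute the rubric score from the extracted fields."""
    # +15 per matched required skill, +5 per matched nice-to-have
    score = 15 * len(result.get("matched_required", []))
    score += 5 * len(result.get("matched_nice_to_have", []))
    # +10 when the candidate meets the minimum years of experience
    if (result.get("years_experience") or 0) >= min_years:
        score += 10
    # Penalties are passed in by the caller (hypothetical flags)
    if gap_penalty:
        score -= 10
    if job_hopping:
        score -= 5
    # Keep the result inside the 0-100 range the prompt promises
    return max(0, min(100, score))
```

A large difference between this value and the score Claude returned is a useful signal that a resume deserves a second look.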

Calling Claude via EzAI

Now wire it up. The screen_resume function sends one resume to Claude and parses the JSON response. We use claude-sonnet-4-5 because it's fast, cheap, and accurate enough for structured extraction tasks.

python

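import anthropic
import json
import os

client = anthropic.Anthropic(
    # Read the key from the environment instead of hardcoding it.
    # EZAI_API_KEY is an assumed variable name; use whatever you export.
    api_key=os.environ["EZAI_API_KEY"],
    base_url="https://ezaiapi.com"
)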

def screen_resume(resume_text: str, job_config: dict) -> dict:
    """Send a resume to Claude for analysis. Returns parsed JSON."""
    prompt = SCREENING_PROMPT.format(
        job_title=job_config["title"],
        required_skills=", ".join(job_config["required"]),
        nice_to_have=", ".join(job_config["nice_to_have"]),
        min_years=job_config["min_years"],
        resume_text=resume_text[:8000]  # truncate long resumes
    )

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

    raw = response.content[0].text
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, which also covers
        # replies wrapped in a markdown code fence or extra prose.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("No JSON object found in model response")
        return json.loads(raw[start:end + 1])

Notice the [:8000] truncation on the resume text. The slice counts characters, not tokens; 8,000 characters is very roughly 2,000 tokens of English text. Most resumes are 1-3 pages and usually fit within that budget. But occasionally you'll see a 15-page CV from an academic. Truncating caps the token cost of those outliers, and the most relevant information almost always appears in the first few pages.
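
A response that parses as JSON is not guaranteed to have the shape the prompt asked for. A light validation pass catches that before bad rows reach the spreadsheet. This is a minimal sketch. The names validate_result, REQUIRED_KEYS, and VALID_RECOMMENDATIONS are assumptions, and the set of required keys is an illustrative choice drawn from the prompt's schema.

```python
REQUIRED_KEYS = {"name", "score", "recommendation", "summary"}
VALID_RECOMMENDATIONS = {"strong_yes", "yes", "maybe", "no"}

def validate_result(result: dict) -> list[str]:
    """Return a list of problems found in a parsed screening result."""
    if not isinstance(result, dict):
        return ["response is not a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    score = result.get("score")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("score is not a number")
    elif not 0 <= score <= 100:
        problems.append("score outside 0-100")
    if result.get("recommendation") not in VALID_RECOMMENDATIONS:
        problems.append("unexpected recommendation value")
    return problems
```

An empty list means the result looks usable. Anything else can be logged next to the file name for manual review.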

Batch Processing with Concurrency

Processing resumes one at a time is slow. With async concurrency, you can screen 10 resumes simultaneously. Here's the batch processor that handles a folder of PDFs:

python

import asyncio
import csv
from pathlib import Path

async def process_batch(resume_dir: str, job_config: dict, output: str):
    """Screen all PDFs in a directory, write ranked CSV."""
    pdfs = sorted(Path(resume_dir).glob("*.pdf"))
    print(f"Found {len(pdfs)} resumes")

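    semaphore = asyncio.Semaphore(10)  # max 10 concurrent

    async def screen_one(pdf_path):
        async with semaphore:
            try:
                # Both calls block, so run them in worker threads
                text = await asyncio.to_thread(
                    extract_resume_text, str(pdf_path))
                if not text:
                    print(f"Skipped {pdf_path.name}: no text extracted")
                    return None
                result = await asyncio.to_thread(
                    screen_resume, text, job_config)
                result["file"] = pdf_path.name
                return result
            except Exception as e:
                # One bad resume or API error should not abort the batch
                print(f"Failed to screen {pdf_path.name}: {e}")
                return None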

    tasks = [screen_one(p) for p in pdfs]
    results = await asyncio.gather(*tasks)
    results = [r for r in results if r]

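    # Sort by score descending; a missing score counts as 0
    results.sort(key=lambda r: r.get("score") or 0, reverse=True)

    # Flatten list fields so each one fits in a single CSV cell
    for r in results:
        for key in ("matched_required", "missing_required", "red_flags"):
            value = r.get(key) or []
            if isinstance(value, list):
                value = "; ".join(map(str, value))
            r[key] = value

    # Write CSV (UTF-8 so non-ASCII names survive on every platform)
    fields = ["score", "recommendation", "name", "email",
              "years_experience", "matched_required",
              "missing_required", "red_flags", "summary", "file"]
    with open(output, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields,
                                  extrasaction="ignore")
        writer.writeheader()
        writer.writerows(results)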

    print(f"Ranked {len(results)} candidates → {output}")

Defining Job Requirements

The job config is a plain dictionary. Keep it in a JSON file so non-technical hiring managers can edit it without touching Python code:

python

job = {
    "title": "Senior Python Developer",
    "required": [
        "Python", "FastAPI or Django", "PostgreSQL",
        "REST APIs", "Docker"
    ],
    "nice_to_have": [
        "Kubernetes", "Redis", "GraphQL",
        "CI/CD pipelines", "AWS"
    ],
    "min_years": 5
}

asyncio.run(process_batch(
    "./resumes", job, "ranked_candidates.csv"
))

Run it and you'll get a CSV sorted by score. The top candidates float to the surface immediately. A batch of 200 resumes finishes in under 2 minutes with 10 concurrent workers.
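
The example above defines the config inline. To keep it in a JSON file as suggested, a small loader can read the file and check its shape. This is a sketch under stated assumptions: the function name load_job_config and the file name job.json are hypothetical, and the checks only cover the four keys that screen_resume reads.

```python
import json
from pathlib import Path

def load_job_config(path: str) -> dict:
    """Load and sanity-check a job config stored as JSON."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # These are the keys screen_resume looks up
    for key in ("title", "required", "nice_to_have", "min_years"):
        if key not in config:
            raise ValueError(f"job config is missing '{key}'")
    # Skill lists are joined with ", ".join(...), so they must be lists
    for key in ("required", "nice_to_have"):
        if not isinstance(config[key], list):
            raise ValueError(f"'{key}' must be a list of skills")
    return config
```

With that in place, the entry point becomes asyncio.run(process_batch("./resumes", load_job_config("job.json"), "ranked_candidates.csv")). A hiring manager's typo then produces a readable error instead of a KeyError halfway through a batch.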

[Figure: Resume scoring breakdown. How the scoring rubric maps skills to points for consistent ranking]

Handling Edge Cases

Real-world resumes are messy. Here are the common edge cases and how to handle them:

Empty PDFs: The extract_resume_text function returns an empty string, and the batch processor skips the file. Log these so someone can review them manually.
Non-English resumes: Claude handles multilingual input natively. Add a note to the prompt: "Resume may be in any language. Extract and translate key information to English."
Overstuffed keyword resumes: Some candidates pack invisible text with every skill imaginable. The rubric has no explicit penalty for this, so detection depends on Claude noticing the mismatch (say, 50 claimed skills against 2 years of experience) and reporting it in red_flags. Adding a rubric line for keyword stuffing makes the penalty consistent.
Rate limits: The semaphore caps concurrency at 10, which stays well within EzAI's rate limits. If you're processing thousands of resumes, lower it to 5.
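
For the rate-limit case, lowering concurrency helps, and so does retrying failed calls with exponential backoff. The helper below is a generic sketch, not an EzAI or anthropic API. The name with_retries and its defaults are assumptions. It retries on any exception for simplicity; in practice you would likely narrow that to rate-limit and transient network errors. The anthropic client also performs some retries of its own, configurable through its max_retries option.

```python
import random
import time

def with_retries(func, *args, max_attempts: int = 4,
                 base_delay: float = 1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 50% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.random() / 2))
```

Inside the batch processor, the screening call could then be wrapped as asyncio.to_thread(with_retries, screen_resume, text, job_config). The jitter spreads retries out so ten workers do not all hit the API again at the same instant.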

Cost Breakdown

Each resume averages about 1,500 input tokens (the resume text + prompt) and 400 output tokens (the JSON response). Using claude-sonnet-4-5 through EzAI:

Per resume: ~$0.004
100 resumes: ~$0.40
500 resumes: ~$2.00
1,000 resumes: ~$4.00

Compare that to a recruiter spending 30+ hours on manual screening. Even at a modest hourly rate, that is several hundred dollars of labor against a couple of dollars in API fees, a cost reduction of several hundred times. And unlike humans, Claude doesn't get fatigued and start skimming at resume #200.

To track your actual spend, check the EzAI dashboard — it shows per-request cost breakdowns so you can see exactly what each screening run costs.
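
You can also estimate spend locally from token counts. Each response reports them in response.usage.input_tokens and response.usage.output_tokens. The arithmetic is sketched below. The function name estimate_cost is an assumption, and the two per-million-token prices are placeholder values for illustration only, not EzAI's actual rates. Substitute the prices shown on your dashboard.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Estimate cost in dollars from token counts and per-million-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# Placeholder prices (hypothetical); replace with your real rates.
per_resume = estimate_cost(1_500, 400,
                           input_price_per_mtok=1.00,
                           output_price_per_mtok=5.00)
print(f"Per resume: ${per_resume:.4f}")
print(f"1,000 resumes: ${per_resume * 1_000:.2f}")
```

Output tokens are typically priced higher than input tokens, so trimming the JSON schema often saves more than trimming the resume text.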

What's Next

This covers the core pipeline: extract, analyze, rank. From here, you could extend it in several directions:

Add a web UI with FastAPI — we've got a tutorial for that
Store results in a database instead of CSV for long-term tracking
Build a second-pass screener that compares top candidates head-to-head using extended thinking
Add support for DOCX files using python-docx
Feed the structured output into your ATS via webhook

The prompts shown here are starting points. Tune them for your specific hiring needs — add culture-fit criteria, weight certain skills higher, or adjust the scoring rubric based on what your team actually values.
