Tutorial Mar 20, 2026 8 min read

Build an AI File Organizer with Python and Claude

EzAI Team

Your Downloads folder has 2,000 files. Invoices mixed with memes, screenshots tangled with source code, and PDFs from three years ago buried under vacation photos. You could spend a Saturday sorting them manually, or you could let Claude do it in under a minute. In this tutorial, we'll build a Python CLI that reads filenames and content, sends them to Claude for intelligent classification, and moves each file into the right folder automatically.

What We're Building

The finished tool does three things: scans a target directory for unorganized files, sends batches of filenames to Claude for categorization, and moves each file into a labeled subfolder. It handles duplicates, respects dry-run mode so you can preview changes before committing, and processes hundreds of files in a single API call by batching filenames together.

The entire script is around 120 lines of Python. No ML training, no custom classifiers — just a well-crafted prompt and structured JSON output from Claude.

Prerequisites

You'll need Python 3.10+, an EzAI API key, and the Anthropic SDK:

bash
pip install anthropic

Set your EzAI credentials as environment variables:

bash
export ANTHROPIC_API_KEY="sk-your-ezai-key"
export ANTHROPIC_BASE_URL="https://ezaiapi.com"
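
Before running anything, it can help to confirm that both variables are visible to Python. The helper below is a small sketch and not part of the tutorial's script; `missing_settings` and `REQUIRED_VARS` are hypothetical names introduced for illustration.

```python
import os

# The two variables set in the shell commands above
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL")

def missing_settings(env=None) -> list[str]:
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without changing your shell.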

The Classification Prompt

The prompt is the brain of the whole operation. We send Claude a list of filenames and ask it to return a JSON array mapping each file to a category. The key trick: we let Claude infer categories from the actual files rather than forcing a fixed list; the common categories named in the prompt are suggestions, not a closed set. This means the tool adapts to any directory: a developer's project folder gets categories like source-code, configs, and docs, while a photographer's folder gets raw-photos, edits, and exports.

python
SYSTEM_PROMPT = """You are a file organization assistant. Given a list of filenames,
classify each into a logical folder category.

Rules:
- Use lowercase kebab-case for category names (e.g. "source-code", "invoices")
- Create 5-12 categories max — group similar files together
- Common categories: documents, images, screenshots, source-code,
  configs, archives, videos, music, spreadsheets, presentations, misc
- If a file doesn't fit anywhere obvious, use "misc"
- Return ONLY valid JSON, no markdown fences

Output format:
[{"file": "example.py", "category": "source-code"}, ...]"""

The pipeline: scan files → batch filenames → Claude classifies → move to folders
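
To make the output format concrete, here is a minimal sketch of how a reply in that shape could be parsed and checked before any file is touched. `parse_mappings` is a hypothetical helper, and the sample reply is hand-written for illustration, not real model output.

```python
import json

def parse_mappings(raw: str, expected: set[str]) -> dict[str, str]:
    """Parse the JSON array and keep only well-formed entries for known files."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    result = {}
    for item in data:
        if not isinstance(item, dict):
            continue
        name, category = item.get("file"), item.get("category")
        # Keep the entry only if the file was in the request and the category is usable
        if isinstance(name, str) and name in expected and isinstance(category, str) and category:
            result[name] = category
    return result

# Hand-written sample: "ghost.txt" was never sent, so it is dropped
sample_reply = '[{"file": "main.py", "category": "source-code"}, {"file": "ghost.txt", "category": "misc"}]'
mappings = parse_mappings(sample_reply, {"main.py", "invoice.pdf"})
print(mappings)  # {'main.py': 'source-code'}
```

Filtering against the set of names that were actually sent is a cheap guard against entries the model invents or misspells.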

Building the Organizer

Here's the complete script. It scans the target directory, batches filenames into groups of 200 (to stay within token limits), sends each batch to Claude, and moves files based on the response:

python — organize.py
import os, json, shutil, argparse
from pathlib import Path
import anthropic

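# The SDK reads ANTHROPIC_API_KEY from the environment automatically
client = anthropic.Anthropic(
    base_url=os.getenv("ANTHROPIC_BASE_URL", "https://ezaiapi.com")
)

BATCH_SIZE = 200  # filenames per API call

# SYSTEM_PROMPT is the classification prompt from the previous section;
# paste its definition here so the script is self-contained.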

def scan_files(directory: Path) -> list[str]:
    """Get all files in directory (not subdirs)."""
    return [
        f.name for f in directory.iterdir()
        if f.is_file() and not f.name.startswith(".")
    ]

def classify_batch(filenames: list[str]) -> list[dict]:
    """Send a batch of filenames to Claude for classification."""
    file_list = "\n".join(filenames)

    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Classify these files:\n{file_list}"
        }]
    )

    raw = message.content[0].text.strip()
    return json.loads(raw)

def move_file(src: Path, dest_dir: Path, dry_run: bool) -> str:
    """Move file to destination, handling duplicates."""
    dest = dest_dir / src.name

    # Handle duplicate filenames: photo.jpg -> photo_1.jpg, photo_2.jpg, ...
    if dest.exists():
        stem, suffix = src.stem, src.suffix
        counter = 1
        while dest.exists():
            dest = dest_dir / f"{stem}_{counter}{suffix}"
            counter += 1

    if dry_run:
        # Return before touching the filesystem so a preview creates no folders
        return f"[DRY RUN] {src.name} → {dest_dir.name}/"

    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return f"✓ {src.name} → {dest_dir.name}/"

def organize(directory: str, dry_run: bool = False):
    target = Path(directory).resolve()
    files = scan_files(target)

    if not files:
        print("No files found to organize.")
        return

    print(f"Found {len(files)} files. Classifying...")

    # Process in batches
    all_mappings = []
    for i in range(0, len(files), BATCH_SIZE):
        batch = files[i:i + BATCH_SIZE]
        print(f"  Batch {i // BATCH_SIZE + 1}: {len(batch)} files")
        mappings = classify_batch(batch)
        all_mappings.extend(mappings)

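    # Move files
    known = set(files)  # names we actually scanned
    categories = {}
    for item in all_mappings:
        name = item.get("file")
        cat = str(item.get("category") or "misc")
        if name not in known:
            continue  # skip names Claude invented or altered
        # Accept only plain folder names (letters, digits, hyphens) so a
        # category like "../x" can never move a file outside `target`
        if not cat.replace("-", "").isalnum():
            cat = "misc"
        src = target / name
        if not src.exists():
            continue
        dest_dir = target / cat
        result = move_file(src, dest_dir, dry_run)
        print(f"  {result}")
        categories[cat] = categories.get(cat, 0) + 1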

    print(f"\nOrganized into {len(categories)} categories:")
    for cat, count in sorted(categories.items()):
        print(f"  📁 {cat}: {count} files")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="AI-powered file organizer")
    parser.add_argument("directory", help="Path to organize")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview changes without moving files")
    args = parser.parse_args()
    organize(args.directory, args.dry_run)

Run it with --dry-run first to preview the results:

bash
python organize.py ~/Downloads --dry-run

# Output:
# Found 847 files. Classifying...
#   Batch 1: 200 files
#   Batch 2: 200 files
#   Batch 3: 200 files
#   Batch 4: 200 files
#   Batch 5: 47 files
#   [DRY RUN] invoice_2025_03.pdf → invoices/
#   [DRY RUN] IMG_4521.jpg → images/
#   [DRY RUN] main.py → source-code/
#   ...
# Organized into 9 categories:
#   📁 archives: 23 files
#   📁 documents: 156 files
#   📁 images: 312 files
#   📁 source-code: 89 files
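
One consequence of batching is that each batch is classified independently, so two batches may pick different names for the same kind of file (say, images and photos). One possible mitigation, sketched here as an assumption and not something the script above does, is to list the categories already chosen in the next batch's user message. `build_user_message` is a hypothetical helper.

```python
def build_user_message(filenames: list[str], known_categories: set[str]) -> str:
    """Build the user message, reminding the model of categories already in use."""
    parts = []
    if known_categories:
        parts.append(
            "Reuse these existing categories where they fit: "
            + ", ".join(sorted(known_categories))
        )
    parts.append("Classify these files:\n" + "\n".join(filenames))
    return "\n\n".join(parts)

# First batch: nothing to reuse yet, so the message matches the original format
first = build_user_message(["a.jpg", "b.py"], set())

# Later batch: categories collected from earlier replies are listed first
second = build_user_message(["c.png"], {"source-code", "images"})
print(second)
```

In the organizer, the set would be built up from the `category` values returned by earlier batches and passed along on each loop iteration.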

Adding Content-Aware Classification

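Filename-based classification works for most files. But what about report.txt: is that a financial report, a bug report, or a school assignment? For ambiguous files, we can peek at the content. Here's an enhanced version that reads the first 500 characters of text-based files and passes a single-line preview of up to 200 characters to Claude. Binary formats such as PDF are not in the extension list, so they are still classified by name alone:

python
TEXT_EXTENSIONS = {".txt", ".md", ".py", ".js", ".ts", ".json", ".csv", ".log"}

def get_file_context(filepath: Path) -> str:
    """Return a short single-line preview of a text file, or an empty string."""
    if filepath.suffix.lower() not in TEXT_EXTENSIONS:
        return ""
    try:
        with open(filepath, "r", encoding="utf-8", errors="ignore") as f:
            preview = f.read(500)
    except OSError:
        # Unreadable file (permissions, vanished mid-scan): fall back to name only
        return ""
    # Collapse newlines and runs of whitespace so each entry stays on one line
    preview = " ".join(preview.split())[:200]
    return f" [preview: {preview}]" if preview else ""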

def classify_batch_with_context(directory: Path, filenames: list[str]) -> list[dict]:
    """Classify files with optional content previews."""
    entries = []
    for name in filenames:
        ctx = get_file_context(directory / name)
        entries.append(f"{name}{ctx}")

    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": "Classify these files:\n" + "\n".join(entries)
        }]
    )
    return json.loads(message.content[0].text.strip())

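Content-aware mode roughly doubles token usage per batch, but it resolves ambiguous names that filenames alone cannot. A file named notes.txt containing meeting minutes can be sorted into meetings instead of the generic documents folder. To enable it, call classify_batch_with_context(target, batch) in place of classify_batch(batch) inside organize().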

Cost per 1,000 files: filename-only ~$0.015 vs content-aware under $0.04 via EzAI

Handling Edge Cases

Real-world directories are messy. Here are the edge cases to plan for; the script as written covers the first four, and the last is left for you to add:

  • Duplicate filenames — if images/photo.jpg already exists, the file becomes images/photo_1.jpg
  • Hidden files — anything starting with . is skipped (your .gitignore stays put)
  • Empty directories — the script exits gracefully with a message
  • Huge directories — batching at 200 files keeps each API call under 8K tokens
  • JSON parse failures — wrap the parse in a try/except and retry once with a stricter prompt

One thing the script intentionally does not do: it won't recurse into subdirectories. If you've already organized some files into folders, those folders stay untouched. Only loose files in the top-level directory get classified.
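
The retry on a JSON parse failure mentioned in the list above is not shown in the script. A minimal sketch follows, written against a generic `ask` callable so it runs without an API key; in the organizer, `ask` would wrap `client.messages.create` and return the reply text. The names `classify_with_retry` and `STRICT_SUFFIX` are assumptions for illustration.

```python
import json
from typing import Callable

STRICT_SUFFIX = "\n\nReturn ONLY a valid JSON array. No prose, no markdown fences."

def classify_with_retry(ask: Callable[[str], str], prompt: str) -> list[dict]:
    """Parse the reply as JSON; on failure, ask once more with a stricter prompt."""
    try:
        return json.loads(ask(prompt))
    except json.JSONDecodeError:
        # Second and final attempt; a second failure propagates to the caller
        return json.loads(ask(prompt + STRICT_SUFFIX))

# Stand-in for the API call: unusable text on the first attempt, valid JSON on the retry
replies = iter(["Sure! Here is the JSON:", '[{"file": "a.txt", "category": "documents"}]'])
result = classify_with_retry(lambda prompt: next(replies), "Classify these files:\na.txt")
print(result)  # [{'file': 'a.txt', 'category': 'documents'}]
```

Limiting the retry to one attempt keeps a persistently malformed reply from turning into an unbounded loop of paid API calls.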

Cost Breakdown

At EzAI's pricing for Claude Sonnet 4.5, the cost is remarkably low:

  • 200 filenames ≈ 800 input tokens + 2,000 output tokens = ~$0.003 per batch
  • 1,000 files = 5 batches ≈ $0.015 total
  • Content-aware mode adds ~2x token usage, still under $0.04 for 1,000 files

For comparison, manually organizing 1,000 files takes 2-4 hours. The AI version finishes in under a minute and, by the figures above, costs about 1.5 cents through EzAI. If you need even lower costs, swap claude-sonnet-4-5 for a free model; filename-only classification works well on smaller models too.
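
The arithmetic behind these figures is simple enough to script. This sketch takes the ~$0.003 per 200-file batch quoted above as given; it does not compute from per-token prices, which are not stated here. `estimate_cost` is a hypothetical helper.

```python
import math

BATCH_SIZE = 200
COST_PER_BATCH = 0.003  # USD, the estimate quoted above for a filename-only batch

def estimate_cost(n_files: int, cost_per_batch: float = COST_PER_BATCH) -> tuple[int, float]:
    """Return (number of batches, estimated cost in USD) for n_files."""
    batches = math.ceil(n_files / BATCH_SIZE)
    return batches, round(batches * cost_per_batch, 4)

print(estimate_cost(1000))  # (5, 0.015)
print(estimate_cost(847))   # (5, 0.015)
```

Because the final batch is billed even when it is partly empty, 847 files and 1,000 files land on the same five-batch estimate.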

Extending the Tool

The base script is deliberately minimal. Here are practical extensions you can add:

  • Undo support — log every move to a JSON file, then add an --undo flag that reverses all operations
  • Custom rules — pass a --rules flag with domain-specific categories ("put all .blend files in 3d-projects")
  • Watch mode — use watchdog to monitor a folder and auto-organize new files as they arrive
  • Date-based sorting — combine AI categories with file modification dates for time-bucketed folders like invoices/2026-Q1/
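
As one example, undo support could look roughly like the sketch below. It is an assumption about how such a feature might be built, not code from the script: `record_move`, `undo_moves`, and the log filename `moves.json` are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path

def record_move(log_path: Path, src: Path, dest: Path) -> None:
    """Append one move to a JSON log file (a list of src/dest pairs)."""
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append({"src": str(src), "dest": str(dest)})
    log_path.write_text(json.dumps(entries, indent=2))

def undo_moves(log_path: Path) -> int:
    """Reverse logged moves, newest first. Returns the number of files restored."""
    if not log_path.exists():
        return 0
    restored = 0
    for entry in reversed(json.loads(log_path.read_text())):
        src, dest = Path(entry["src"]), Path(entry["dest"])
        # Only restore when the moved file is still there and the old spot is free
        if dest.exists() and not src.exists():
            shutil.move(str(dest), str(src))
            restored += 1
    log_path.unlink()  # the log is consumed once the undo has run
    return restored

# Demo in a throwaway directory: move one file, log it, then undo
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "notes.txt"
    src.write_text("hello")
    dest = root / "documents" / "notes.txt"
    dest.parent.mkdir()
    shutil.move(str(src), str(dest))
    record_move(root / "moves.json", src, dest)
    print(undo_moves(root / "moves.json"))  # 1
    print(src.exists())  # True
```

In the organizer, `record_move` would be called right after `shutil.move` inside `move_file`, and an `--undo` flag would call `undo_moves` instead of `organize`.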

The pattern here scales to any classification task. Swap filenames for email subjects, support tickets, or customer feedback — the approach is identical. Send text to Claude, get structured categories back, act on the result.

Wrapping Up

You now have a working AI file organizer that classifies hundreds of files in seconds. The cost is negligible through EzAI, the accuracy is surprisingly good (even with filename-only mode), and the dry-run flag lets you preview every move before any file is touched.

Grab an EzAI API key, copy the script, and point it at your messiest folder. Then check our guides on structured JSON output and batch API requests for more patterns like this.

