Build an AI Chatbot API with Node.js and TypeScript

Most AI chatbot tutorials are Python-first. That's fine if your stack is Python — but if you're already running Node.js in production, spinning up a separate Python service just for AI feels wrong. You end up managing two runtimes, two dependency trees, and two deployment pipelines for what should be a single endpoint.

This guide builds a complete AI chatbot API with Express, TypeScript, and the Anthropic SDK, routed through EzAI for multi-model access at lower cost. You'll get streaming Server-Sent Events, per-session conversation memory, and proper error handling — all in under 100 lines of core logic.

What You'll Build

A REST API with two endpoints:

POST /api/chat — Send a message, get a streamed AI response
DELETE /api/chat/:sessionId — Clear conversation history for a session

Each session maintains its own conversation thread. The chatbot remembers previous messages within a session, so users can ask follow-up questions without repeating context. Responses stream back as Server-Sent Events — your frontend sees tokens appear in real time instead of waiting for the full response.

Node.js AI chatbot architecture diagram showing request flow from client through Express to EzAI API

Request flow — from client POST to streamed AI response via EzAI

Project Setup

Initialize a TypeScript project and install the dependencies. You need express for routing, @anthropic-ai/sdk for the AI calls, and uuid for generating session IDs.

bash

mkdir ai-chatbot && cd ai-chatbot
npm init -y
npm i express @anthropic-ai/sdk uuid
npm i -D typescript @types/express @types/uuid tsx
npx tsc --init --target es2022 --module nodenext --outDir dist

Set your EzAI API key as an environment variable. If you've already run the install script, this is already configured:

bash

export ANTHROPIC_API_KEY="sk-your-ezai-key"
export ANTHROPIC_BASE_URL="https://ezaiapi.com"

The Chatbot Server

Here's the full implementation. Create src/server.ts:

typescript

import express from "express";
import Anthropic from "@anthropic-ai/sdk";
import { v4 as uuid } from "uuid";

const app = express();
app.use(express.json());

// EzAI client — reads ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL from env
const client = new Anthropic();

// In-memory conversation store
type Message = { role: "user" | "assistant"; content: string };
const sessions = new Map<string, Message[]>();

const SYSTEM_PROMPT = `You are a helpful coding assistant. Be concise.
Give code examples when relevant. Use markdown formatting.`;

const MAX_MESSAGES = 50; // trim older messages to control token usage

app.post("/api/chat", async (req, res) => {
  const { message, sessionId = uuid() } = req.body;

  if (!message || typeof message !== "string") {
    return res.status(400).json({ error: "message is required" });
  }

  // Get or create conversation history
  if (!sessions.has(sessionId)) sessions.set(sessionId, []);
  const history = sessions.get(sessionId)!;
  history.push({ role: "user", content: message });

  // Trim to last N messages to avoid token explosion
  if (history.length > MAX_MESSAGES) {
    history.splice(0, history.length - MAX_MESSAGES);
  }

  // Set up SSE headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.setHeader("X-Session-Id", sessionId);

  try {
    const stream = client.messages.stream({
      model: "claude-sonnet-4-5",
      max_tokens: 2048,
      system: SYSTEM_PROMPT,
      messages: history,
    });

    let fullResponse = "";

    stream.on("text", (text) => {
      fullResponse += text;
      res.write(`data: ${JSON.stringify({ type: "text", text })}\n\n`);
    });

    await stream.finalMessage();

    // Save assistant response to history
    history.push({ role: "assistant", content: fullResponse });

    // Send final event with metadata
    const usage = (await stream.finalMessage()).usage;
    res.write(`data: ${JSON.stringify({
      type: "done",
      sessionId,
      usage: { input: usage.input_tokens, output: usage.output_tokens }
    })}\n\n`);
    res.end();
  } catch (err: any) {
    res.write(`data: ${JSON.stringify({ type: "error", error: err.message })}\n\n`);
    res.end();
  }
});

// Clear a session's conversation history
app.delete("/api/chat/:sessionId", (req, res) => {
  sessions.delete(req.params.sessionId);
  res.json({ cleared: true });
});

app.listen(3000, () => console.log("Chatbot API running on :3000"));

That's the entire backend. Run it with npx tsx src/server.ts and you've got a working AI chatbot API.

Streaming on the Frontend

The server pushes tokens as SSE events. On the client side, you consume them with the EventSource API or a simple fetch reader. Here's the fetch approach — it gives you more control over headers and error handling:

typescript

async function chat(message: string, sessionId: string) {
  const res = await fetch("http://localhost:3000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, sessionId }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const event = JSON.parse(line.slice(6));

      if (event.type === "text") {
        process.stdout.write(event.text); // or append to DOM
      } else if (event.type === "done") {
        console.log(`\n[${event.usage.input}in / ${event.usage.output}out tokens]`);
      }
    }
  }
}

Adding Error Recovery

Production chatbots need to handle API failures gracefully. EzAI supports multi-model fallback, but you should also handle transient errors in your server code. Wrap the streaming call with a retry:

typescript

async function streamWithRetry(
  messages: Message[],
  retries = 2,
  models = ["claude-sonnet-4-5", "gpt-4.1-mini"]
) {
  for (const model of models) {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return client.messages.stream({
          model,
          max_tokens: 2048,
          system: SYSTEM_PROMPT,
          messages,
        });
      } catch (err: any) {
        if (err.status === 429) {
          // Rate limited — wait and retry
          await new Promise(r => setTimeout(r, 1000 * (attempt + 1)));
          continue;
        }
        if (err.status >= 500) break; // try next model
        throw err; // 4xx (not 429) = client error, don't retry
      }
    }
  }
  throw new Error("All models failed");
}

This tries Claude Sonnet 4.5 first, and if it's down or rate-limited, falls back to GPT-4.1 Mini — both available through the same EzAI endpoint with no code changes. Your users never see a failure.

Managing Token Costs

Conversation memory grows fast. A 50-message thread can burn 30k+ input tokens per request because you're re-sending the full history every time. Three strategies to keep costs under control:

Sliding window — Keep only the last N messages (the MAX_MESSAGES constant above). Simple and effective for most chatbots.
Summarize and compress — Every 20 messages, ask the model to summarize the conversation so far, then replace the history with the summary. Cuts tokens by 80%+.
Use cheaper models for simple turns — Route short factual questions to claude-haiku-3.5 at 1/10th the cost. Save Sonnet for complex reasoning. EzAI lets you route dynamically per request.

Check your actual spending on the EzAI dashboard — it shows per-request token counts and costs in real time.

Deploy and Test

Add a start script to package.json and test it end-to-end:

bash

# Start the server
npx tsx src/server.ts

# In another terminal — test with curl
curl -N http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I read a file in Node.js?", "sessionId": "test-1"}'

You'll see tokens stream back in real time. Send a follow-up message with the same sessionId to verify conversation memory works — the bot will remember your previous question.

For production deployments, swap the in-memory Map for Redis or a database, add authentication middleware, and you're shipping a real product. The core AI logic stays exactly the same.

Wrapping Up

You've built a streaming AI chatbot API that handles conversation memory, error recovery, and multi-model fallback — entirely in TypeScript, running on the same Node.js stack as the rest of your app. No Python runtime, no extra infrastructure.

The full code is under 100 lines. EzAI handles the model routing and cost optimization behind the scenes, so you can focus on building the product instead of managing API keys across providers.

Sign up for EzAI and grab your API key
Read the API docs for advanced features like extended thinking
Check out the streaming guide for deeper SSE patterns

Build an AI Chatbot API with Node.js and TypeScript

What You'll Build

Project Setup

The Chatbot Server

Streaming on the Frontend

Adding Error Recovery

Managing Token Costs

Deploy and Test

Wrapping Up

Related Posts

How to Stream AI Responses in Real-Time

Build an AI Discord Bot with Python and Claude