<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Supra Builds</title>
    <link>https://suprahuang.cc</link>
    <description>Full-stack product engineer sharing practical insights on modern web development, AI integration, and open source. Focused on building real-world solutions.</description>
    <language>en</language>
    <ttl>60</ttl>
    <atom:link href="https://suprahuang.cc/rss.xml" rel="self" type="application/rss+xml" />
    <image>
      <url>https://cdn.hashnode.com/res/hashnode/image/upload/v1770478457183/d54d5e81-7845-4264-a213-af27eb137d6d.png</url>
      <title>Supra Builds</title>
      <link>https://suprahuang.cc</link>
    </image>
    <item>
      <title>Beyond Chatbots: Building Real-World Stateful AI Agents on Cloudflare</title>
      <link>https://suprahuang.cc/beyond-chatbots-building-stateful-ai-agents-cloudflare-agents-sdk</link>
      <guid isPermaLink="true">https://suprahuang.cc/beyond-chatbots-building-stateful-ai-agents-cloudflare-agents-sdk</guid>
      <description>Most &quot;AI agents&quot; you see today are just LLM wrappers with a fancy prompt. They process a request, return a response, and forget everything. No memory. No scheduling. No persistence.
Real agents are different. They remember what happened yesterday. Th...</description>
      <content:encoded><![CDATA[<p>Most "AI agents" you see today are just LLM wrappers with a fancy prompt. They process a request, return a response, and forget everything. No memory. No scheduling. No persistence.</p>
<p>Real agents are different. They remember what happened yesterday. They wake up at 3 AM to check on things. They pause and ask for human approval when stakes are high. They maintain state across sessions, making decisions based on accumulated context — not just the current prompt.</p>
<p>In this tutorial, we'll build exactly that: a <strong>Smart Site Reliability Agent</strong> that monitors your websites, uses AI to detect anomalies, and escalates critical issues to you — all running on Cloudflare's edge network with zero cost when idle.</p>
<p>No chatbot UI. No conversational fluff. Just a stateful, autonomous agent doing real work.</p>
<hr />
<h2 id="heading-what-makes-an-ai-agent-stateful">What Makes an AI Agent "Stateful"?</h2>
<p>A <strong>stateful AI agent</strong> is a long-running program that persists its memory, decisions, and context across interactions and restarts. Unlike stateless LLM calls where each request starts from scratch, a stateful agent accumulates knowledge over time.</p>
<p>Here's the key difference:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Stateless LLM Wrapper</td><td>Stateful AI Agent</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Memory</strong></td><td>None between requests</td><td>Persistent across sessions</td></tr>
<tr>
<td><strong>Scheduling</strong></td><td>Only responds when called</td><td>Can wake itself up on a schedule</td></tr>
<tr>
<td><strong>Context</strong></td><td>Single conversation turn</td><td>Accumulated history and patterns</td></tr>
<tr>
<td><strong>Decision Making</strong></td><td>Reactive only</td><td>Proactive — acts on its own</td></tr>
<tr>
<td><strong>Cost When Idle</strong></td><td>$0</td><td>$0 (with hibernation)</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771268561137/b384ebc7-5817-4335-a89e-9f1534cd91fd.webp" alt="Stateful vs Stateless AI Agents: key differences in memory, scheduling, and decision making" class="image--center mx-auto" /></p>
<p>Think of it this way: a stateless LLM call is like asking a stranger for directions every time. A stateful agent is like having an assistant who knows your route, remembers the traffic patterns, and proactively suggests alternatives before you even ask.</p>
<p>The challenge has always been: <strong>where do you run a stateful agent in production?</strong> Traditional serverless functions are stateless by design. Containers require always-on infrastructure. That's where Cloudflare's approach gets interesting.</p>
<hr />
<h2 id="heading-why-cloudflare-for-ai-agents">Why Cloudflare for AI Agents?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771267527413/5e7aa7a7-098e-4d4b-882d-9c9b91d356fa.webp" alt="Cloudflare Agents SDK architecture: Worker routing to Durable Object agents with built-in SQLite, WebSocket, and scheduling" class="image--center mx-auto" /></p>
<p>Cloudflare's <a target="_blank" href="https://github.com/cloudflare/agents">Agents SDK</a> is built on top of <strong>Durable Objects</strong> — essentially stateful micro-servers that live on Cloudflare's global edge network. Each agent instance is its own isolated server with:</p>
<ul>
<li><p><strong>Built-in SQLite database</strong> — No external database needed. Your agent's memory lives right next to its compute.</p>
</li>
<li><p><strong>WebSocket support with hibernation</strong> — Real-time connections that cost nothing when idle. The agent wakes up only when a message arrives.</p>
</li>
<li><p><strong>Scheduled tasks (alarms)</strong> — Cron-like scheduling built into the runtime. Your agent can wake itself up to do work.</p>
</li>
<li><p><strong>Automatic global distribution</strong> — Each agent instance runs closest to where it's needed.</p>
</li>
</ul>
<p>The killer feature? <strong>Hibernation</strong>. When your agent has no active connections and no pending alarms, it literally costs $0. It's like having a dedicated server that only charges you when it's thinking.</p>
<h3 id="heading-when-to-use-what">When to Use What</h3>
<p>Before reaching for the Agents SDK, consider the alternatives:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Use Case</td><td>Best Choice</td></tr>
</thead>
<tbody>
<tr>
<td>Simple request/response AI</td><td>Regular Worker + Workers AI</td></tr>
<tr>
<td>Multi-step background jobs</td><td>Cloudflare Workflows</td></tr>
<tr>
<td>Stateful, long-lived agent with real-time sync</td><td><strong>Agents SDK</strong> ✅</td></tr>
<tr>
<td>Key-value state without real-time</td><td>Durable Objects directly</td></tr>
</tbody>
</table>
</div><p>The Agents SDK shines when you need <strong>persistent state + real-time communication + scheduled tasks</strong> in one package.</p>
<hr />
<h2 id="heading-what-well-build-a-smart-site-reliability-agent">What We'll Build: A Smart Site Reliability Agent</h2>
<p>Our agent isn't a simple uptime checker. It's an AI-powered reliability monitor that:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>SDK Capability</td></tr>
</thead>
<tbody>
<tr>
<td>⏰ Runs health checks every 5 minutes</td><td><code>scheduleEvery()</code></td></tr>
<tr>
<td>💾 Stores check history in SQLite</td><td><code>this.sql</code></td></tr>
<tr>
<td>🧠 Uses AI to detect anomaly patterns</td><td>AI SDK integration</td></tr>
<tr>
<td>📡 Pushes live updates to a dashboard</td><td>WebSocket + <code>useAgent</code></td></tr>
<tr>
<td>🔧 Supports manual controls via RPC</td><td><code>@callable()</code></td></tr>
<tr>
<td>🚨 Escalates critical issues for human approval</td><td>Human-in-the-loop</td></tr>
</tbody>
</table>
</div><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771268610732/3ad61907-634f-4659-adf6-4829414602f1.webp" alt="Smart Site Reliability Agent: feature overview showing scheduled checks, AI analysis, real-time dashboard, and human-in-the-loop escalation" class="image--center mx-auto" /></p>
<p>By the end, you'll have a fully deployed agent that watches over your sites and thinks about what it sees — not just whether a URL returns 200.</p>
<hr />
<h2 id="heading-project-setup">Project Setup</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>Node.js 20+ (Node 24+ recommended)</p>
</li>
<li><p>A <a target="_blank" href="https://dash.cloudflare.com/sign-up">Cloudflare account</a> (Workers Paid plan for Durable Objects)</p>
</li>
<li><p>An API key from any LLM provider (OpenAI, Anthropic, or Cloudflare Workers AI)</p>
</li>
</ul>
<h3 id="heading-scaffold-the-project">Scaffold the Project</h3>
<pre><code class="lang-bash">npm create cloudflare@latest site-reliability-agent -- --template cloudflare/agents-starter
<span class="hljs-built_in">cd</span> site-reliability-agent
npm install
</code></pre>
<h3 id="heading-project-structure">Project Structure</h3>
<pre><code class="lang-plaintext">site-reliability-agent/
├── src/
│   ├── server.ts          # Agent class + Worker entry
│   └── client.tsx         # React dashboard with useAgent
├── wrangler.jsonc         # Cloudflare configuration
├── .dev.vars              # Local secrets (API keys)
└── package.json
</code></pre>
<h3 id="heading-wrangler-configuration">Wrangler Configuration</h3>
<pre><code class="lang-plaintext">// wrangler.jsonc
{
  "name": "site-reliability-agent",
  "main": "src/server.ts",
  "compatibility_flags": ["nodejs_compat"],
  "durable_objects": {
    "bindings": [
      {
        "name": "SiteAgent",
        "class_name": "SiteAgent"
      }
    ]
  },
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": ["SiteAgent"]
    }
  ]
}
</code></pre>
<p>Add your LLM API key to <code>.dev.vars</code>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># .dev.vars (never commit this file)</span>
OPENAI_API_KEY=sk-your-key-here
</code></pre>
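<p>One step that trips people up at deploy time: <code>.dev.vars</code> only feeds local development. For the deployed Worker, the same key has to be stored as an encrypted secret via Wrangler (shown here as a reminder; the secret name must match what your code reads):</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Upload the key as an encrypted Worker secret for production</span>
npx wrangler secret put OPENAI_API_KEY
<span class="hljs-comment"># Wrangler prompts for the value; it never lands in wrangler.jsonc or git</span>
</code></pre>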
<hr />
<h2 id="heading-building-the-agent-core">Building the Agent Core</h2>
<h3 id="heading-defining-state-and-the-agent-class">Defining State and the Agent Class</h3>
<p>Let's start with the agent's state shape and core class:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// src/server.ts</span>
<span class="hljs-keyword">import</span> { Agent, routeAgentRequest } <span class="hljs-keyword">from</span> <span class="hljs-string">"agents"</span>;

<span class="hljs-keyword">type</span> Env = {
  SiteAgent: DurableObjectNamespace;
  OPENAI_API_KEY: <span class="hljs-built_in">string</span>;
};

<span class="hljs-keyword">type</span> SiteStatus = <span class="hljs-string">"healthy"</span> | <span class="hljs-string">"degraded"</span> | <span class="hljs-string">"down"</span> | <span class="hljs-string">"unknown"</span>;

<span class="hljs-keyword">type</span> AgentState = {
  monitoredUrls: <span class="hljs-built_in">string</span>[];
  checkIntervalMinutes: <span class="hljs-built_in">number</span>;
  lastCheckAt: <span class="hljs-built_in">string</span> | <span class="hljs-literal">null</span>;
  currentStatus: Record&lt;<span class="hljs-built_in">string</span>, SiteStatus&gt;;
  alertsEnabled: <span class="hljs-built_in">boolean</span>;
  pendingEscalation: {
    url: <span class="hljs-built_in">string</span>;
    reason: <span class="hljs-built_in">string</span>;
    timestamp: <span class="hljs-built_in">string</span>;
  } | <span class="hljs-literal">null</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> SiteAgent <span class="hljs-keyword">extends</span> Agent&lt;Env, AgentState&gt; {
  <span class="hljs-comment">// Default state when the agent is first created</span>
  initialState: AgentState = {
    monitoredUrls: [],
    checkIntervalMinutes: <span class="hljs-number">5</span>,
    lastCheckAt: <span class="hljs-literal">null</span>,
    currentStatus: {},
    alertsEnabled: <span class="hljs-literal">true</span>,
    pendingEscalation: <span class="hljs-literal">null</span>,
  };

  <span class="hljs-keyword">async</span> onStart() {
    <span class="hljs-comment">// Initialize the SQLite table for check history</span>
    <span class="hljs-built_in">this</span>.sql<span class="hljs-string">`
      CREATE TABLE IF NOT EXISTS check_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL,
        status_code INTEGER,
        response_time_ms INTEGER,
        status TEXT NOT NULL,
        ai_analysis TEXT,
        checked_at TEXT DEFAULT (datetime('now'))
      )
    `</span>;
  }
}
</code></pre>
<p>A few things to notice:</p>
<ul>
<li><p><code>initialState</code> sets the default state for new agent instances</p>
</li>
<li><p><code>this.sql</code> is a tagged template literal — it gives you direct SQLite access, no ORM needed</p>
</li>
<li><p>State updates via <code>setState()</code> are automatically synced to all connected WebSocket clients</p>
</li>
</ul>
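<p>One caveat worth internalizing early: <code>setState()</code> replaces the stored state object rather than shallow-merging it (the SDK's own examples spread the previous state for exactly this reason). A minimal sketch of the replace semantics, using a stand-in function rather than a real agent:</p>
<pre><code class="lang-typescript">type AgentState = {
  monitoredUrls: string[];
  alertsEnabled: boolean;
};

// Stand-in for the SDK's replace semantics: the new object wins wholesale
function replaceState(_current: AgentState, next: AgentState): AgentState {
  return next;
}

const state: AgentState = {
  monitoredUrls: ["https://example.com"],
  alertsEnabled: true,
};

// Spread the existing state, then override only the field being changed;
// passing { alertsEnabled: false } alone would drop monitoredUrls
const updated = replaceState(state, { ...state, alertsEnabled: false });
</code></pre>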
<h3 id="heading-health-check-logic-with-scheduled-tasks">Health Check Logic with Scheduled Tasks</h3>
<p>Now let's add the scheduled health checks:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Inside the SiteAgent class</span>

<span class="hljs-keyword">async</span> onStart() {
  <span class="hljs-comment">// ... SQLite init from above ...</span>

  <span class="hljs-comment">// Start the health check schedule</span>
  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.state.monitoredUrls.length &gt; <span class="hljs-number">0</span>) {
    <span class="hljs-built_in">this</span>.scheduleEvery(<span class="hljs-string">"runHealthChecks"</span>, <span class="hljs-string">`*/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.state.checkIntervalMinutes}</span> * * * *`</span>);
  }
}

<span class="hljs-keyword">async</span> runHealthChecks() {
  <span class="hljs-keyword">const</span> results: Record&lt;<span class="hljs-built_in">string</span>, SiteStatus&gt; = {};

  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> url <span class="hljs-keyword">of</span> <span class="hljs-built_in">this</span>.state.monitoredUrls) {
    <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.checkUrl(url);
    results[url] = result.status;

    <span class="hljs-comment">// Store in SQLite</span>
    <span class="hljs-built_in">this</span>.sql<span class="hljs-string">`
      INSERT INTO check_history (url, status_code, response_time_ms, status)
      VALUES (<span class="hljs-subst">${url}</span>, <span class="hljs-subst">${result.statusCode}</span>, <span class="hljs-subst">${result.responseTime}</span>, <span class="hljs-subst">${result.status}</span>)
    `</span>;
  }

  <span class="hljs-built_in">this</span>.setState({
    ...<span class="hljs-built_in">this</span>.state, <span class="hljs-comment">// setState replaces the whole state object, so carry existing fields</span>
    currentStatus: results,
    lastCheckAt: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
  });

  <span class="hljs-comment">// Broadcast to all connected dashboard clients</span>
  <span class="hljs-built_in">this</span>.broadcast(<span class="hljs-built_in">JSON</span>.stringify({
    <span class="hljs-keyword">type</span>: <span class="hljs-string">"health_check_complete"</span>,
    results,
    timestamp: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
  }));
}

<span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> checkUrl(url: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">Promise</span>&lt;{
  statusCode: <span class="hljs-built_in">number</span>;
  responseTime: <span class="hljs-built_in">number</span>;
  status: SiteStatus;
}&gt; {
  <span class="hljs-keyword">const</span> start = <span class="hljs-built_in">Date</span>.now();

  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(url, {
      method: <span class="hljs-string">"GET"</span>,
      signal: AbortSignal.timeout(<span class="hljs-number">10</span>_000), <span class="hljs-comment">// 10s timeout</span>
    });

    <span class="hljs-keyword">const</span> responseTime = <span class="hljs-built_in">Date</span>.now() - start;
    <span class="hljs-keyword">let</span> status: SiteStatus = <span class="hljs-string">"healthy"</span>;

    <span class="hljs-keyword">if</span> (!response.ok) {
      status = response.status &gt;= <span class="hljs-number">500</span> ? <span class="hljs-string">"down"</span> : <span class="hljs-string">"degraded"</span>;
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (responseTime &gt; <span class="hljs-number">3000</span>) {
      status = <span class="hljs-string">"degraded"</span>;
    }

    <span class="hljs-keyword">return</span> { statusCode: response.status, responseTime, status };
  } <span class="hljs-keyword">catch</span> {
    <span class="hljs-keyword">return</span> { statusCode: <span class="hljs-number">0</span>, responseTime: <span class="hljs-built_in">Date</span>.now() - start, status: <span class="hljs-string">"down"</span> };
  }
}
</code></pre>
<p>The <code>scheduleEvery</code> method accepts a cron expression. Every 5 minutes, the agent wakes up from hibernation, runs all health checks, stores results, updates its state, and broadcasts to any connected dashboards — then goes back to sleep.</p>
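<p>One sanity check on the cron string: an interval like <code>*/7 * * * *</code> fires at minutes 0, 7, 14, …, 56 and then again at 0, leaving a single 4-minute gap every hour. A small validation helper (hypothetical, not part of the SDK) keeps intervals even:</p>
<pre><code class="lang-typescript">// Minute intervals that divide 60 evenly, so "*/n" fires at a constant gap
const VALID_INTERVALS = [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30];

// Build an "every N minutes" cron expression, rejecting uneven intervals
function everyNMinutesCron(minutes: number): string {
  if (!VALID_INTERVALS.includes(minutes)) {
    throw new Error("interval must divide 60 evenly: " + VALID_INTERVALS.join(", "));
  }
  return "*/" + minutes + " * * * *";
}
</code></pre>
<p>With the default <code>checkIntervalMinutes</code> of 5, <code>everyNMinutesCron(5)</code> yields <code>*/5 * * * *</code>, the same expression used above.</p>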
<h3 id="heading-querying-history-with-sqlite">Querying History with SQLite</h3>
<p>The built-in SQLite database makes historical queries trivial:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Inside the SiteAgent class</span>

<span class="hljs-keyword">private</span> getRecentHistory(url: <span class="hljs-built_in">string</span>, limit = <span class="hljs-number">20</span>) {
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.sql&lt;{
    status_code: <span class="hljs-built_in">number</span>;
    response_time_ms: <span class="hljs-built_in">number</span>;
    status: <span class="hljs-built_in">string</span>;
    ai_analysis: <span class="hljs-built_in">string</span> | <span class="hljs-literal">null</span>;
    checked_at: <span class="hljs-built_in">string</span>;
  }&gt;<span class="hljs-string">`
    SELECT status_code, response_time_ms, status, ai_analysis, checked_at
    FROM check_history
    WHERE url = <span class="hljs-subst">${url}</span>
    ORDER BY checked_at DESC
    LIMIT <span class="hljs-subst">${limit}</span>
  `</span>;
}

<span class="hljs-keyword">private</span> getStatusTrend(url: <span class="hljs-built_in">string</span>) {
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.sql&lt;{ status: <span class="hljs-built_in">string</span>; count: <span class="hljs-built_in">number</span> }&gt;<span class="hljs-string">`
    SELECT status, COUNT(*) as count
    FROM check_history
    WHERE url = <span class="hljs-subst">${url}</span>
      AND checked_at &gt; datetime('now', '-1 hour')
    GROUP BY status
  `</span>;
}
</code></pre>
<p>No external database. No connection strings. No cold starts on DB connections. The data lives right next to the agent's compute.</p>
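<p>Those trend rows also fold neatly into a single metric for the dashboard or the AI prompt. A sketch (the row shape mirrors the query above; the helper itself is hypothetical and counts "degraded" as up):</p>
<pre><code class="lang-typescript">type TrendRow = { status: string; count: number };

// Collapse one hour of grouped check counts into an uptime percentage,
// treating "degraded" as up and only "down" as an outage
function uptimePercent(rows: TrendRow[]): number {
  let total = 0;
  let down = 0;
  for (const row of rows) {
    total += row.count;
    if (row.status === "down") down += row.count;
  }
  if (total === 0) return 100; // no data yet, assume healthy
  return Math.round(((total - down) / total) * 1000) / 10;
}
</code></pre>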
<hr />
<h2 id="heading-adding-ai-powered-analysis">Adding AI-Powered Analysis</h2>
<p>This is where our agent goes from "uptime checker" to "site reliability engineer." Instead of just checking status codes, we feed the check history to an LLM for pattern analysis.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { generateText } <span class="hljs-keyword">from</span> <span class="hljs-string">"ai"</span>;
<span class="hljs-keyword">import</span> { openai } <span class="hljs-keyword">from</span> <span class="hljs-string">"@ai-sdk/openai"</span>;

<span class="hljs-comment">// Inside the SiteAgent class</span>

<span class="hljs-keyword">async</span> runHealthChecks() {
  <span class="hljs-comment">// ... health check logic from above (it builds the `results` map used below) ...</span>

  <span class="hljs-comment">// After checks complete, ask AI to analyze patterns</span>
  <span class="hljs-keyword">const</span> hasIssues = <span class="hljs-built_in">Object</span>.values(results).some(
    <span class="hljs-function">(<span class="hljs-params">s</span>) =&gt;</span> s === <span class="hljs-string">"degraded"</span> || s === <span class="hljs-string">"down"</span>
  );

  <span class="hljs-keyword">if</span> (hasIssues) {
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.analyzeWithAI(results);
  }
}

<span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> analyzeWithAI(currentResults: Record&lt;<span class="hljs-built_in">string</span>, SiteStatus&gt;) {
  <span class="hljs-comment">// Gather recent history for context</span>
  <span class="hljs-keyword">const</span> historyByUrl: Record&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">any</span>[]&gt; = {};
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> url <span class="hljs-keyword">of</span> <span class="hljs-built_in">this</span>.state.monitoredUrls) {
    historyByUrl[url] = <span class="hljs-built_in">this</span>.getRecentHistory(url, <span class="hljs-number">10</span>);
  }

  <span class="hljs-keyword">const</span> { text: analysis } = <span class="hljs-keyword">await</span> generateText({
    model: openai(<span class="hljs-string">"gpt-4o-mini"</span>),
    system: <span class="hljs-string">`You are a site reliability engineer analyzing website health data.
Be concise and actionable. Focus on patterns, not individual data points.
Flag anything that suggests an emerging problem, not just current outages.`</span>,
    prompt: <span class="hljs-string">`Current check results: <span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(currentResults)}</span>

Recent history (last 10 checks per URL):
<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(historyByUrl, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>)}</span>

Analyze:
1. Are there any concerning patterns (increasing latency, intermittent failures)?
2. Is this likely a transient issue or systematic problem?
3. Recommended action: MONITOR, INVESTIGATE, or ESCALATE?`</span>,
  });

  <span class="hljs-comment">// Store the analysis</span>
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> [url, status] <span class="hljs-keyword">of</span> <span class="hljs-built_in">Object</span>.entries(currentResults)) {
    <span class="hljs-keyword">if</span> (status !== <span class="hljs-string">"healthy"</span>) {
      <span class="hljs-built_in">this</span>.sql<span class="hljs-string">`
        UPDATE check_history
        SET ai_analysis = <span class="hljs-subst">${analysis}</span>
        WHERE url = <span class="hljs-subst">${url}</span>
        AND id = (SELECT MAX(id) FROM check_history WHERE url = <span class="hljs-subst">${url}</span>)
      `</span>;
    }
  }

  <span class="hljs-comment">// If AI recommends escalation, trigger human-in-the-loop</span>
  <span class="hljs-keyword">if</span> (analysis.includes(<span class="hljs-string">"ESCALATE"</span>)) {
    <span class="hljs-built_in">this</span>.setState({
      ...<span class="hljs-built_in">this</span>.state, <span class="hljs-comment">// preserve fields not being updated</span>
      pendingEscalation: {
        url: <span class="hljs-built_in">Object</span>.entries(currentResults)
          .filter(<span class="hljs-function">(<span class="hljs-params">[, s]</span>) =&gt;</span> s !== <span class="hljs-string">"healthy"</span>)
          .map(<span class="hljs-function">(<span class="hljs-params">[u]</span>) =&gt;</span> u)
          .join(<span class="hljs-string">", "</span>),
        reason: analysis,
        timestamp: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
      },
    });

    <span class="hljs-built_in">this</span>.broadcast(<span class="hljs-built_in">JSON</span>.stringify({
      <span class="hljs-keyword">type</span>: <span class="hljs-string">"escalation_required"</span>,
      analysis,
      timestamp: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
    }));
  }
}
</code></pre>
<p>The AI doesn't just check if a site is up — it looks at <strong>patterns</strong>. Is response time gradually increasing? Are failures clustered at specific times? Is this a CDN issue or an origin server problem? These are the kinds of insights that turn raw data into actionable intelligence.</p>
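<p>A note on the <code>analysis.includes("ESCALATE")</code> check above: since the prompt explicitly asks for a "Recommended action", the detection can anchor on that phrase instead of a bare substring match. A hypothetical parser (the keyword fallback is still best-effort):</p>
<pre><code class="lang-typescript">type Recommendation = "MONITOR" | "INVESTIGATE" | "ESCALATE";

// Pull the recommendation out of the AI's free-text analysis. Prefers an
// explicit "Recommended action: X" phrase, then falls back to the most
// severe keyword mentioned, defaulting to MONITOR.
function parseRecommendation(analysis: string): Recommendation {
  const explicit = analysis.match(/recommended action:?\s*(MONITOR|INVESTIGATE|ESCALATE)/i);
  if (explicit) return explicit[1]!.toUpperCase() as Recommendation;
  if (analysis.includes("ESCALATE")) return "ESCALATE";
  if (analysis.includes("INVESTIGATE")) return "INVESTIGATE";
  return "MONITOR";
}
</code></pre>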
<hr />
<h2 id="heading-real-time-dashboard-with-useagent">Real-Time Dashboard with useAgent</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771267615139/2520cf8a-bbd7-4fec-8e73-ff7e5cd5e61f.webp" alt="Real-time monitoring dashboard with WebSocket state sync" class="image--center mx-auto" /></p>
<p>The agent handles the backend. Now let's build a React frontend that stays in sync via WebSocket.</p>
<h3 id="heading-connecting-with-useagent">Connecting with useAgent</h3>
<pre><code class="lang-tsx">// src/client.tsx
import { useAgent } from "agents/react";

function Dashboard() {
  const agent = useAgent&lt;SiteAgent, AgentState&gt;({
    agent: "site-agent",
    name: "my-sites", // Each unique name = unique agent instance
  });

  if (!agent.state) return &lt;div&gt;Connecting to agent...&lt;/div&gt;;

  return (
    &lt;div className="dashboard"&gt;
      &lt;header&gt;
        &lt;h1&gt;Site Reliability Agent&lt;/h1&gt;
        &lt;span className="last-check"&gt;
          Last check: {agent.state.lastCheckAt ?? "Never"}
        &lt;/span&gt;
      &lt;/header&gt;

      &lt;div className="status-grid"&gt;
        {agent.state.monitoredUrls.map((url) =&gt; (
          &lt;StatusCard
            key={url}
            url={url}
            status={agent.state.currentStatus[url] ?? "unknown"}
          /&gt;
        ))}
      &lt;/div&gt;

      {agent.state.pendingEscalation &amp;&amp; (
        &lt;EscalationBanner
          escalation={agent.state.pendingEscalation}
          onApprove={() =&gt; agent.stub.acknowledgeEscalation()}
          onDismiss={() =&gt; agent.stub.dismissEscalation()}
        /&gt;
      )}

      &lt;ManualControls agent={agent} /&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<p>When the agent calls <code>setState()</code>, every connected dashboard updates instantly — no polling, no refetching. The <code>useAgent</code> hook handles WebSocket connection, reconnection, and state synchronization automatically.</p>
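<p>State sync covers <code>agent.state</code>, but the <code>broadcast()</code> payloads sent earlier arrive as plain WebSocket messages that need parsing and narrowing before use. A hedged sketch of a parser for the message shapes this agent emits (wire it into whatever message callback your client uses):</p>
<pre><code class="lang-typescript">type AgentMessage =
  | { type: "health_check_complete"; results: { [url: string]: string }; timestamp: string }
  | { type: "escalation_required"; analysis: string; timestamp: string }
  | { type: "escalation_resolved"; action: string; timestamp: string };

// Parse a raw WebSocket payload into one of the agent's broadcast shapes,
// returning null for anything unrecognized (e.g. internal sync frames)
function parseAgentMessage(raw: string): AgentMessage | null {
  const known = ["health_check_complete", "escalation_required", "escalation_resolved"];
  try {
    const data = JSON.parse(raw);
    if (data === null) return null;
    if (typeof data.type !== "string") return null;
    if (!known.includes(data.type)) return null;
    return data as AgentMessage;
  } catch {
    return null;
  }
}
</code></pre>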
<h3 id="heading-callable-methods-for-manual-controls">Callable Methods for Manual Controls</h3>
<p>The <code>@callable()</code> decorator exposes server-side methods that the frontend can call with full type safety:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// In src/server.ts — inside SiteAgent class</span>

<span class="hljs-meta">@callable</span>()
<span class="hljs-keyword">async</span> addUrl(url: <span class="hljs-built_in">string</span>) {
  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.state.monitoredUrls.includes(url)) {
    <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">false</span>, error: <span class="hljs-string">"URL already monitored"</span> };
  }

  <span class="hljs-built_in">this</span>.setState({
    ...<span class="hljs-built_in">this</span>.state, <span class="hljs-comment">// preserve fields not being updated</span>
    monitoredUrls: [...this.state.monitoredUrls, url],
    currentStatus: { ...this.state.currentStatus, [url]: <span class="hljs-string">"unknown"</span> },
  });

  <span class="hljs-comment">// Start the schedule when the first URL is added</span>
  <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.state.monitoredUrls.length === <span class="hljs-number">1</span>) {
    <span class="hljs-built_in">this</span>.scheduleEvery(
      <span class="hljs-string">"runHealthChecks"</span>,
      <span class="hljs-string">`*/<span class="hljs-subst">${<span class="hljs-built_in">this</span>.state.checkIntervalMinutes}</span> * * * *`</span>
    );
  }

  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span> };
}

<span class="hljs-meta">@callable</span>()
<span class="hljs-keyword">async</span> removeUrl(url: <span class="hljs-built_in">string</span>) {
  <span class="hljs-built_in">this</span>.setState({
    ...<span class="hljs-built_in">this</span>.state,
    monitoredUrls: <span class="hljs-built_in">this</span>.state.monitoredUrls.filter(<span class="hljs-function">(<span class="hljs-params">u</span>) =&gt;</span> u !== url),
    currentStatus: <span class="hljs-built_in">Object</span>.fromEntries(
      <span class="hljs-built_in">Object</span>.entries(<span class="hljs-built_in">this</span>.state.currentStatus).filter(<span class="hljs-function">(<span class="hljs-params">[u]</span>) =&gt;</span> u !== url)
    ),
  });

  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span> };
}

<span class="hljs-meta">@callable</span>()
<span class="hljs-keyword">async</span> triggerManualCheck() {
  <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.runHealthChecks();
  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span>, checkedAt: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString() };
}
</code></pre>
<p>On the client, calling these is as simple as:</p>
<pre><code class="lang-tsx">// Type-safe RPC — no manual fetch calls needed
await agent.stub.addUrl("https://example.com");
await agent.stub.triggerManualCheck();
</code></pre>
<hr />
<h2 id="heading-human-in-the-loop-escalation-that-works">Human-in-the-Loop: Escalation That Works</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771268656337/900ae3a8-662f-42ac-8f4c-9ef9069bab7c.webp" alt="Human-in-the-loop escalation flow: AI detects pattern, agent pauses, human decides, agent resumes" class="image--center mx-auto" /></p>
<p>When the AI detects something serious, the agent doesn't just log it — it pauses and waits for human judgment:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// In SiteAgent class</span>

<span class="hljs-meta">@callable</span>()
<span class="hljs-keyword">async</span> acknowledgeEscalation() {
  <span class="hljs-keyword">const</span> escalation = <span class="hljs-built_in">this</span>.state.pendingEscalation;
  <span class="hljs-keyword">if</span> (!escalation) <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">false</span>, error: <span class="hljs-string">"No pending escalation"</span> };

  <span class="hljs-comment">// Log the acknowledgment</span>
  <span class="hljs-built_in">this</span>.sql<span class="hljs-string">`
    INSERT INTO check_history (url, status_code, response_time_ms, status, ai_analysis)
    VALUES (
      <span class="hljs-subst">${escalation.url}</span>,
      0,
      0,
      'acknowledged',
      <span class="hljs-subst">${<span class="hljs-string">'Human acknowledged: '</span> + escalation.reason}</span>
    )
  `</span>;

  <span class="hljs-comment">// Clear the escalation</span>
  <span class="hljs-built_in">this</span>.setState({ ...<span class="hljs-built_in">this</span>.state, pendingEscalation: <span class="hljs-literal">null</span> });

  <span class="hljs-built_in">this</span>.broadcast(<span class="hljs-built_in">JSON</span>.stringify({
    <span class="hljs-keyword">type</span>: <span class="hljs-string">"escalation_resolved"</span>,
    action: <span class="hljs-string">"acknowledged"</span>,
    timestamp: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
  }));

  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span> };
}

<span class="hljs-meta">@callable</span>()
<span class="hljs-keyword">async</span> dismissEscalation() {
  <span class="hljs-built_in">this</span>.setState({ ...<span class="hljs-built_in">this</span>.state, pendingEscalation: <span class="hljs-literal">null</span> });

  <span class="hljs-built_in">this</span>.broadcast(<span class="hljs-built_in">JSON</span>.stringify({
    <span class="hljs-keyword">type</span>: <span class="hljs-string">"escalation_resolved"</span>,
    action: <span class="hljs-string">"dismissed"</span>,
    timestamp: <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString(),
  }));

  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span> };
}
</code></pre>
<p>The escalation flow works like this:</p>
<ol>
<li><p><strong>AI detects a pattern</strong> → Recommends <code>ESCALATE</code></p>
</li>
<li><p><strong>Agent updates state</strong> → <code>pendingEscalation</code> is set</p>
</li>
<li><p><strong>Dashboard shows banner</strong> → Human sees the AI's analysis and reasoning</p>
</li>
<li><p><strong>Human decides</strong> → Acknowledge (take action) or Dismiss (false alarm)</p>
</li>
<li><p><strong>Agent records the decision</strong> → Builds a history of escalations for future AI context</p>
</li>
</ol>
<p>This is the real power of stateful agents: they can <strong>pause, wait, and resume</strong> based on human input without losing their context.</p>
<hr />
<h2 id="heading-worker-entry-point">Worker Entry Point</h2>
<p>Don't forget the Worker entry that routes requests to agent instances:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// At the bottom of src/server.ts</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> {
  <span class="hljs-keyword">async</span> fetch(request: Request, env: Env) {
    <span class="hljs-comment">// Route to the correct agent instance; 404 for non-agent paths</span>
    <span class="hljs-keyword">return</span> (<span class="hljs-keyword">await</span> routeAgentRequest(request, env)) || <span class="hljs-keyword">new</span> Response(<span class="hljs-string">"Not found"</span>, { status: <span class="hljs-number">404</span> });
  },
} satisfies ExportedHandler&lt;Env&gt;;
</code></pre>
<p>The <code>routeAgentRequest</code> function dispatches requests to the right Durable Object instance based on the URL pattern: <code>/agents/site-agent/:instance-name</code>.</p>
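<p>For example, a client can derive its connection URL from that pattern. A minimal sketch, where the instance name <code>prod-monitor</code> and the local port are illustrative choices, not SDK requirements:</p>

```typescript
// The router maps /agents/:agent/:name to a Durable Object instance.
// "site-agent" is the kebab-cased agent class name; the last path segment
// selects (or lazily creates) the named instance.
const base = "http://localhost:8787";
const instance = "prod-monitor"; // illustrative instance name
const agentUrl = `${base}/agents/site-agent/${instance}`;

console.log(agentUrl); // http://localhost:8787/agents/site-agent/prod-monitor

// An HTTP request or WebSocket upgrade to agentUrl lands in that instance:
// await fetch(agentUrl);
```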
<hr />
<h2 id="heading-testing-and-deploying-to-production">Testing and Deploying to Production</h2>
<h3 id="heading-local-development">Local Development</h3>
<pre><code class="lang-bash">npx wrangler dev
</code></pre>
<p>This starts a local development server with full Durable Object support. Your agent runs with real SQLite, real WebSocket connections, and real scheduling — closely mirroring how it behaves in production.</p>
<p>Open <code>http://localhost:8787</code> to see your dashboard. Add a URL and watch the agent start monitoring.</p>
<h3 id="heading-deploy-to-cloudflare">Deploy to Cloudflare</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Set your API key as a secret</span>
npx wrangler secret put OPENAI_API_KEY

<span class="hljs-comment"># Deploy</span>
npx wrangler deploy
</code></pre>
<p>Your agent is now live on Cloudflare's global network. Each unique instance name creates an isolated agent with its own state, database, and schedule.</p>
<h3 id="heading-environment-separation">Environment Separation</h3>
<p>For staging vs production, use <a target="_blank" href="https://developers.cloudflare.com/workers/wrangler/environments/">wrangler environments</a>:</p>
<pre><code class="lang-plaintext">// wrangler.jsonc
{
  "name": "site-reliability-agent",
  "env": {
    "staging": {
      "name": "site-reliability-agent-staging",
      "vars": { "ENVIRONMENT": "staging" }
    },
    "production": {
      "name": "site-reliability-agent",
      "vars": { "ENVIRONMENT": "production" }
    }
  }
}
</code></pre>
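<p>With those environments defined, you pick the target at deploy time using wrangler's <code>--env</code> flag:</p>

```bash
# Deploys site-reliability-agent-staging with the staging vars
npx wrangler deploy --env staging

# Deploys the production worker with the production overrides
npx wrangler deploy --env production
```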
<hr />
<h2 id="heading-performance-limits-and-cost-breakdown">Performance, Limits, and Cost Breakdown</h2>
<h3 id="heading-cloudflare-agents-limits">Cloudflare Agents Limits</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Resource</td><td>Limit</td></tr>
</thead>
<tbody>
<tr>
<td>CPU time per request</td><td>30 seconds (refreshes per event)</td></tr>
<tr>
<td>Memory per instance</td><td>128 MB</td></tr>
<tr>
<td>SQLite storage</td><td>1 GB per Durable Object</td></tr>
<tr>
<td>WebSocket connections</td><td>32,768 per instance</td></tr>
<tr>
<td>Alarm precision</td><td>~1 second</td></tr>
</tbody>
</table>
</div><h3 id="heading-cost-estimate">Cost Estimate</h3>
<p>For a typical monitoring setup (100 URLs, checked every 5 minutes):</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Component</td><td>Monthly Cost</td></tr>
</thead>
<tbody>
<tr>
<td>Worker requests (routing)</td><td>~$0.50</td></tr>
<tr>
<td>Durable Object requests</td><td>~$2.00</td></tr>
<tr>
<td>Durable Object duration</td><td>~$1.50</td></tr>
<tr>
<td>SQLite storage (1 GB)</td><td>$0.20</td></tr>
<tr>
<td>AI API calls (OpenAI)</td><td>~$5.00</td></tr>
<tr>
<td><strong>Total</strong></td><td><strong>~$9.20/month</strong></td></tr>
</tbody>
</table>
</div><p>Compare this to running the same setup on AWS (Lambda + DynamoDB + EventBridge + API Gateway), where you'd easily spend $20-30/month for equivalent functionality — plus the engineering overhead of wiring all those services together.</p>
<p>The real savings come from <strong>hibernation</strong>. Your agent only consumes resources when it's actively checking sites or serving dashboard requests. Between checks, the cost is effectively zero.</p>
<hr />
<h2 id="heading-common-pitfalls-i-learned-the-hard-way">Common Pitfalls I Learned the Hard Way</h2>
<h3 id="heading-1-the-destroy-lifecycle-trap">1. The <code>destroy()</code> Lifecycle Trap</h3>
<p>When a Durable Object is evicted from memory, it doesn't call any cleanup hooks. If you're relying on in-memory state that isn't persisted via <code>setState()</code> or SQLite, it will be lost. <strong>Always persist important data immediately</strong> — don't batch writes.</p>
<h3 id="heading-2-state-serialization-limits">2. State Serialization Limits</h3>
<p><code>setState()</code> serializes your state as JSON. This means:</p>
<ul>
<li><p>No <code>Date</code> objects (use ISO strings)</p>
</li>
<li><p>No <code>Map</code> or <code>Set</code> (use plain objects and arrays)</p>
</li>
<li><p>No circular references</p>
</li>
<li><p>Keep state reasonably small — it's synced to every connected client</p>
</li>
</ul>
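<p>A quick sketch of making state JSON-safe before handing it to <code>setState()</code> — the field names here are illustrative, not part of the SDK:</p>

```typescript
// Convert non-JSON-safe values up front, not at read time.
const lastCheck = new Date("2026-02-17T10:00:00Z");
const statusByUrl = new Map([["https://example.com", "healthy"]]);

const state = {
  lastCheckAt: lastCheck.toISOString(),          // Date -> ISO string
  statusByUrl: Object.fromEntries(statusByUrl),  // Map -> plain object
};

// A JSON round-trip is lossless for this shape, so it survives
// serialization and the sync to every connected client.
const restored = JSON.parse(JSON.stringify(state));
console.log(restored.lastCheckAt);                        // 2026-02-17T10:00:00.000Z
console.log(restored.statusByUrl["https://example.com"]); // healthy
```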
<h3 id="heading-3-alarm-retry-behavior">3. Alarm Retry Behavior</h3>
<p>If your scheduled handler throws an error, Cloudflare will retry it. This is usually good, but if your handler isn't idempotent (e.g., it sends notifications), you'll get duplicate actions. Always design handlers to be safe to retry.</p>
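<p>One pattern that helps: derive a deduplication key from the event and check it before acting. A minimal in-memory sketch — <code>notifyOnce</code> is a made-up helper, and in a real agent you would persist the key in SQLite so it survives eviction:</p>

```typescript
// Guard side effects with a dedup key so a retried handler becomes a no-op.
const sentNotifications = new Set<string>();

function notifyOnce(key: string, send: () => void): boolean {
  if (sentNotifications.has(key)) return false; // already handled: skip
  sentNotifications.add(key);
  send();
  return true;
}

let sends = 0;
const key = "outage:example.com:2026-02-17T10:00";
notifyOnce(key, () => sends++); // first run: sends the alert
notifyOnce(key, () => sends++); // retried alarm: deduplicated
console.log(sends); // 1
```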
<h3 id="heading-4-websocket-reconnection">4. WebSocket Reconnection</h3>
<p>Clients will disconnect — networks are unreliable. The <code>useAgent</code> hook handles reconnection automatically, but your UI should gracefully handle the "reconnecting" state. Always show the last known state while reconnecting, rather than a blank screen.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>We built a stateful AI agent that goes well beyond chat:</p>
<ul>
<li><p><strong>Scheduled health checks</strong> that run autonomously on cron</p>
</li>
<li><p><strong>Persistent memory</strong> via built-in SQLite — no external database needed</p>
</li>
<li><p><strong>AI-powered analysis</strong> that spots patterns, not just failures</p>
</li>
<li><p><strong>Real-time dashboard</strong> with automatic WebSocket state sync</p>
</li>
<li><p><strong>Human-in-the-loop</strong> escalation for critical decisions</p>
</li>
</ul>
<p>The Cloudflare Agents SDK makes this surprisingly straightforward. The combination of Durable Objects (state + compute), built-in SQLite (persistent memory), WebSocket hibernation (zero idle cost), and scheduled alarms (autonomous execution) creates a platform where stateful agents are a first-class concept — not something you have to hack together from five different services.</p>
<h3 id="heading-whats-next">What's Next</h3>
<p>This is just the beginning. From here, you could:</p>
<ul>
<li><p><strong>Add MCP server support</strong> — Expose your agent as a Model Context Protocol server so AI assistants like Claude can interact with it</p>
</li>
<li><p><strong>Build multi-agent systems</strong> — Have specialized agents that coordinate with each other</p>
</li>
<li><p><strong>Add voice interaction</strong> — Cloudflare's roadmap includes real-time voice agent support</p>
</li>
<li><p><strong>Integrate browser automation</strong> — Use Cloudflare's Browser Rendering API for visual monitoring</p>
</li>
</ul>
<p>The full source code for this project is available on <a target="_blank" href="https://github.com/supra126">GitHub</a>. If you build something cool with the Agents SDK, I'd love to hear about it — drop a comment below or find me on <a target="_blank" href="https://github.com/supra126">GitHub</a>.</p>
<hr />
<p><em>Want to learn more about building AI-ready APIs? Check out my previous article:</em> <a target="_blank" href="https://suprahuang.cc/your-api-wasnt-built-for-ai-agents-heres-how-to-fix-it"><em>Your API Wasn't Built for AI Agents — Here's How to Fix It</em></a><em>.</em></p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Tue, 17 Feb 2026 10:00:18 GMT</pubDate>
      <category>Cloudflare Workers</category>
      <category>ai agents</category>
      <category>durable-objects</category>
      <category>JavaScript</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Your API Wasn&apos;t Built for AI Agents — Here&apos;s How to Fix It</title>
      <link>https://suprahuang.cc/your-api-wasnt-built-for-ai-agents-heres-how-to-fix-it</link>
      <guid isPermaLink="true">https://suprahuang.cc/your-api-wasnt-built-for-ai-agents-heres-how-to-fix-it</guid>
      <description>By 2026, over 30% of API traffic will come from AI agents rather than human-driven applications. That number will keep climbing.
Here&apos;s the uncomfortable truth: most APIs were designed for human developers who read documentation, interpret ambiguous ...</description>
      <content:encoded><![CDATA[<p>By 2026, over 30% of API traffic will come from AI agents rather than human-driven applications. That number will keep climbing.</p>
<p>Here's the uncomfortable truth: most APIs were designed for human developers who read documentation, interpret ambiguous responses, and manually handle edge cases. AI agents do none of that. They parse schemas, chain requests programmatically, and fail silently when your API does something unexpected.</p>
<p>I learned this firsthand while building a <a target="_blank" href="https://suprahuang.cc/cloudflare-workers-secure-email-api">zero-cost email API on Cloudflare Workers</a>. The API worked perfectly for human integrators — clear docs, sensible endpoints, proper auth. But when I started thinking about how an AI agent would consume the same API, I realized how many assumptions I'd baked in that only made sense to humans.</p>
<p>This article is the guide I wish I'd had. We'll cover the <strong>five principles of agent-ready API design</strong>, walk through a <strong>real before-and-after retrofit</strong>, tackle <strong>authentication and error handling for non-human consumers</strong>, and finish with a <strong>migration checklist</strong> you can start on tomorrow.</p>
<p>Whether you're building new APIs or maintaining existing ones, the agent era is already here. Let's make sure your APIs are ready.</p>
<hr />
<h2 id="heading-why-ai-agents-break-your-existing-apis">Why AI Agents Break Your Existing APIs</h2>
<p>The fundamental disconnect is simple: your API was designed for developers who <em>think</em>. AI agents don't think — they <em>parse</em>.</p>
<p>Here's what that means in practice:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Human Developer</td><td>AI Agent</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Documentation</strong></td><td>Reads prose, follows tutorials</td><td>Parses OpenAPI schemas and descriptions</td></tr>
<tr>
<td><strong>Ambiguity</strong></td><td>Infers meaning from context</td><td>Needs explicit, precise definitions</td></tr>
<tr>
<td><strong>Workflow</strong></td><td>Makes isolated, manual requests</td><td>Chains multiple calls automatically</td></tr>
<tr>
<td><strong>Errors</strong></td><td>Reads error messages, checks Stack Overflow</td><td>Needs structured codes and remediation steps</td></tr>
<tr>
<td><strong>Discovery</strong></td><td>Browses docs, bookmarks endpoints</td><td>Needs programmatic schema endpoints</td></tr>
</tbody>
</table>
</div><p>When an AI agent encounters your API, it's essentially doing this:</p>
<pre><code class="lang-json"><span class="hljs-comment">// What your API returns</span>
{
  <span class="hljs-attr">"status"</span>: <span class="hljs-string">"error"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Invalid request. Please check your parameters and try again."</span>
}

<span class="hljs-comment">// What the agent needs</span>
{
  <span class="hljs-attr">"status"</span>: <span class="hljs-string">"error"</span>,
  <span class="hljs-attr">"code"</span>: <span class="hljs-string">"INVALID_PARAMETER"</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"The 'email' field must be a valid email address."</span>,
  <span class="hljs-attr">"parameter"</span>: <span class="hljs-string">"email"</span>,
  <span class="hljs-attr">"received"</span>: <span class="hljs-string">"not-an-email"</span>,
  <span class="hljs-attr">"expected"</span>: <span class="hljs-string">"string (email format, RFC 5322)"</span>,
  <span class="hljs-attr">"docs"</span>: <span class="hljs-string">"https://api.example.com/docs/errors#INVALID_PARAMETER"</span>,
  <span class="hljs-attr">"remediation"</span>: <span class="hljs-string">"Validate the email format before sending. Example: user@domain.com"</span>
}
</code></pre>
<p>The first response is perfectly fine for a human who can read the message and figure out what went wrong. The second gives an agent everything it needs to <strong>self-correct and retry</strong> without human intervention.</p>
<p>This isn't just about error handling. Every layer of your API — from endpoint naming to authentication flows — carries assumptions about human consumers that break down when an agent is on the other end.</p>
<hr />
<h2 id="heading-the-5-principles-of-agent-ready-api-design">The 5 Principles of Agent-Ready API Design</h2>
<p>Through building APIs and studying how agents consume them, I've distilled the essentials into five principles. These aren't theoretical — they're the minimum bar for making your API useful to autonomous agents.</p>
<h3 id="heading-1-self-describing-let-your-api-explain-itself">1. Self-Describing: Let Your API Explain Itself</h3>
<p>The most impactful thing you can do is make your API self-describing. This means every endpoint, parameter, and response includes enough context for an agent to understand <em>what it does</em> and <em>how to use it</em> without external documentation.</p>
<p><a target="_blank" href="https://swagger.io/specification/">OpenAPI 3.0+</a> is the foundation:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Good: Rich descriptions that agents can parse</span>
<span class="hljs-attr">paths:</span>
  <span class="hljs-string">/users/{userId}/orders:</span>
    <span class="hljs-attr">get:</span>
      <span class="hljs-attr">operationId:</span> <span class="hljs-string">getUserOrders</span>
      <span class="hljs-attr">summary:</span> <span class="hljs-string">Retrieve</span> <span class="hljs-string">all</span> <span class="hljs-string">orders</span> <span class="hljs-string">for</span> <span class="hljs-string">a</span> <span class="hljs-string">specific</span> <span class="hljs-string">user</span>
      <span class="hljs-attr">description:</span> <span class="hljs-string">&gt;
        Returns a paginated list of orders placed by the specified user.
        Orders are sorted by creation date (newest first).
        Includes order items, totals, and current fulfillment status.
        Requires authentication with at least 'read:orders' scope.
</span>      <span class="hljs-attr">parameters:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">userId</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">path</span>
          <span class="hljs-attr">required:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">The</span> <span class="hljs-string">unique</span> <span class="hljs-string">identifier</span> <span class="hljs-string">of</span> <span class="hljs-string">the</span> <span class="hljs-string">user</span> <span class="hljs-string">(UUID</span> <span class="hljs-string">v4</span> <span class="hljs-string">format)</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
            <span class="hljs-attr">format:</span> <span class="hljs-string">uuid</span>
            <span class="hljs-attr">example:</span> <span class="hljs-string">"550e8400-e29b-41d4-a716-446655440000"</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">status</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">&gt;
            Filter orders by fulfillment status.
            Use 'pending' for unprocessed orders,
            'shipped' for orders in transit,
            'delivered' for completed orders.
</span>          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">string</span>
            <span class="hljs-attr">enum:</span> [<span class="hljs-string">pending</span>, <span class="hljs-string">shipped</span>, <span class="hljs-string">delivered</span>, <span class="hljs-string">cancelled</span>, <span class="hljs-string">refunded</span>]
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">limit</span>
          <span class="hljs-attr">in:</span> <span class="hljs-string">query</span>
          <span class="hljs-attr">description:</span> <span class="hljs-string">Maximum</span> <span class="hljs-string">number</span> <span class="hljs-string">of</span> <span class="hljs-string">orders</span> <span class="hljs-string">to</span> <span class="hljs-string">return</span> <span class="hljs-string">(1-100,</span> <span class="hljs-string">default</span> <span class="hljs-number">20</span><span class="hljs-string">)</span>
          <span class="hljs-attr">schema:</span>
            <span class="hljs-attr">type:</span> <span class="hljs-string">integer</span>
            <span class="hljs-attr">minimum:</span> <span class="hljs-number">1</span>
            <span class="hljs-attr">maximum:</span> <span class="hljs-number">100</span>
            <span class="hljs-attr">default:</span> <span class="hljs-number">20</span>
</code></pre>
<p>Notice the difference: every field has a description that explains not just <em>what</em> it is but <em>when and why</em> you'd use it. An agent reading this schema knows exactly what each parameter does, what values are valid, and what to expect back.</p>
<h3 id="heading-2-predictable-zero-surprises">2. Predictable: Zero Surprises</h3>
<p>Agents rely on patterns. If your API returns <code>created_at</code> in one endpoint and <code>createdAt</code> in another, an agent will either fail or require special handling for each endpoint.</p>
<p><strong>Consistency checklist:</strong></p>
<ul>
<li><p><strong>Naming</strong>: Pick one convention (snake_case or camelCase) and stick with it everywhere</p>
</li>
<li><p><strong>Response format</strong>: Every endpoint should return the same envelope structure</p>
</li>
<li><p><strong>Pagination</strong>: Use the same pagination pattern across all list endpoints</p>
</li>
<li><p><strong>Timestamps</strong>: One format everywhere (<a target="_blank" href="https://www.iso.org/iso-8601-date-and-time-format.html">ISO 8601</a>: <code>2026-02-11T05:30:00Z</code>)</p>
</li>
<li><p><strong>Null handling</strong>: Decide whether missing fields are <code>null</code>, omitted, or empty strings</p>
</li>
</ul>
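<p>One way to enforce this consistency is a single response envelope type shared by every endpoint. A sketch — the exact field names are a design choice, not a standard:</p>

```typescript
// Every endpoint returns this shape, so agents always know where the payload is.
interface ApiResponse<T> {
  data: T | null;                                  // payload on success
  error: { code: string; message: string } | null; // structured error on failure
  meta: { requestId: string; timestamp: string };  // timestamp is ISO 8601
}

const ok: ApiResponse<{ id: string }> = {
  data: { id: "user_123" },
  error: null,
  meta: { requestId: "req_1", timestamp: "2026-02-11T05:30:00Z" },
};

console.log(ok.data?.id); // user_123
```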
<h3 id="heading-3-semantic-meaning-over-syntax">3. Semantic: Meaning Over Syntax</h3>
<p>Name things for what they <em>do</em>, not how they're implemented:</p>
<pre><code class="lang-plaintext"># Bad: Implementation-leaked naming
POST /api/v2/db/insert-record
GET  /api/v2/cache/fetch?key=user_123

# Good: Intent-driven naming
POST /api/v2/users
GET  /api/v2/users/123
</code></pre>
<p>When an agent sees <code>POST /users</code>, it immediately understands the intent: create a user. When it sees <code>POST /db/insert-record</code>, it has to guess what kind of record and where it goes.</p>
<h3 id="heading-4-composable-building-blocks-not-monoliths">4. Composable: Building Blocks, Not Monoliths</h3>
<p>Design endpoints as atomic operations that chain well. An agent orchestrating a checkout flow should be able to:</p>
<ol>
<li><p><code>GET /cart</code> → Get current cart</p>
</li>
<li><p><code>POST /orders</code> → Create order from cart</p>
</li>
<li><p><code>POST /orders/{id}/payments</code> → Process payment</p>
</li>
<li><p><code>GET /orders/{id}</code> → Verify order status</p>
</li>
</ol>
<p>Each step is independent, has clear inputs/outputs, and can be retried individually if something fails.</p>
<p><strong>Avoid "god endpoints"</strong> that do multiple things:</p>
<pre><code class="lang-plaintext"># Bad: One endpoint does everything
POST /checkout
{
  "action": "process",
  "validate_inventory": true,
  "apply_discount": "SAVE10",
  "payment_method": "card",
  "send_confirmation": true
}

# Good: Composable steps
POST /orders              → Creates order
POST /orders/{id}/discounts → Applies discount
POST /orders/{id}/payments  → Processes payment
POST /orders/{id}/confirm   → Sends confirmation
</code></pre>
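<p>Composability is what makes per-step retries possible. A simplified sketch of the retry wrapper an agent might put around each step — the step function here is a stand-in for a real API call:</p>

```typescript
// Retry one atomic step without redoing the steps before it.
function withRetry<T>(step: () => T, attempts = 3): T {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return step();
    } catch (e) {
      lastError = e; // transient failure: try the same step again
    }
  }
  throw lastError;
}

// Simulate a payment step that fails once, then succeeds.
let calls = 0;
const payment = withRetry(() => {
  calls++;
  if (calls === 1) throw new Error("gateway timeout");
  return { status: "paid" };
});
console.log(payment.status, calls); // paid 2
```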
<h3 id="heading-5-discoverable-help-agents-find-you">5. Discoverable: Help Agents Find You</h3>
<p>Even the best-designed API is useless if agents can't find it. Expose your API schema at known endpoints:</p>
<ul>
<li><p><code>GET /.well-known/openapi.json</code> — Your full OpenAPI spec</p>
</li>
<li><p><code>GET /api</code> — API root with available resources and links</p>
</li>
<li><p>Response headers with <code>Link</code> pointing to related resources</p>
</li>
</ul>
<p>We'll dig deeper into discoverability with MCP and HATEOAS in a later section.</p>
<hr />
<h2 id="heading-before-amp-after-retrofitting-a-real-api">Before &amp; After: Retrofitting a Real API</h2>
<p>Let's take a concrete example — a user management API — and walk through the transformation step by step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770781259758/2f640e1e-974a-4a64-afb3-78f2574436d2.webp" alt="Before and After API Design Comparison" class="image--center mx-auto" /></p>
<h3 id="heading-before-a-typical-rest-api">Before: A Typical REST API</h3>
<pre><code class="lang-javascript"><span class="hljs-comment">// Express.js — Traditional API endpoint</span>
app.get(<span class="hljs-string">'/api/users/:id'</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> db.users.findById(req.params.id);
    <span class="hljs-keyword">if</span> (!user) {
      <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).json({
        <span class="hljs-attr">error</span>: <span class="hljs-string">'User not found'</span>
      });
    }
    res.json(user);
  } <span class="hljs-keyword">catch</span> (err) {
    res.status(<span class="hljs-number">500</span>).json({
      <span class="hljs-attr">error</span>: <span class="hljs-string">'Something went wrong'</span>
    });
  }
});
</code></pre>
<p>This works for humans. A developer gets a 404, reads "User not found," and knows to check the ID. But an agent?</p>
<ul>
<li><p>No structured error code to branch on</p>
</li>
<li><p>No indication of <em>why</em> the user wasn't found (invalid ID format? deleted? never existed?)</p>
</li>
<li><p>No hint about what to do next</p>
</li>
<li><p>No links to related resources</p>
</li>
<li><p>The success response has no schema guarantee</p>
</li>
</ul>
<h3 id="heading-after-agent-ready-api">After: Agent-Ready API</h3>
<pre><code class="lang-javascript"><span class="hljs-comment">// Express.js — Agent-ready API endpoint</span>
app.get(<span class="hljs-string">'/api/users/:id'</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-comment">// Validate input format first</span>
  <span class="hljs-keyword">if</span> (!isValidUUID(req.params.id)) {
    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">400</span>).json({
      <span class="hljs-attr">error</span>: {
        <span class="hljs-attr">code</span>: <span class="hljs-string">'INVALID_PARAMETER_FORMAT'</span>,
        <span class="hljs-attr">message</span>: <span class="hljs-string">'User ID must be a valid UUID v4.'</span>,
        <span class="hljs-attr">parameter</span>: <span class="hljs-string">'id'</span>,
        <span class="hljs-attr">received</span>: req.params.id,
        <span class="hljs-attr">expected</span>: <span class="hljs-string">'UUID v4 (e.g., 550e8400-e29b-41d4-a716-446655440000)'</span>,
        <span class="hljs-attr">docs</span>: <span class="hljs-string">'https://api.example.com/docs/users#get-user'</span>
      }
    });
  }

  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> db.users.findById(req.params.id);

    <span class="hljs-keyword">if</span> (!user) {
      <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">404</span>).json({
        <span class="hljs-attr">error</span>: {
          <span class="hljs-attr">code</span>: <span class="hljs-string">'RESOURCE_NOT_FOUND'</span>,
          <span class="hljs-attr">message</span>: <span class="hljs-string">`No user found with ID '<span class="hljs-subst">${req.params.id}</span>'.`</span>,
          <span class="hljs-attr">resource</span>: <span class="hljs-string">'user'</span>,
          <span class="hljs-attr">parameter</span>: <span class="hljs-string">'id'</span>,
          <span class="hljs-attr">suggestions</span>: [
            <span class="hljs-string">'Verify the user ID is correct'</span>,
            <span class="hljs-string">'Use GET /api/users?search={query} to find users'</span>
          ],
          <span class="hljs-attr">docs</span>: <span class="hljs-string">'https://api.example.com/docs/users#get-user'</span>
        }
      });
    }

    res.json({
      <span class="hljs-attr">data</span>: {
        <span class="hljs-attr">id</span>: user.id,
        <span class="hljs-attr">email</span>: user.email,
        <span class="hljs-attr">name</span>: user.name,
        <span class="hljs-attr">role</span>: user.role,
        <span class="hljs-attr">createdAt</span>: user.createdAt.toISOString(),
        <span class="hljs-attr">updatedAt</span>: user.updatedAt.toISOString()
      },
      <span class="hljs-attr">_links</span>: {
        <span class="hljs-attr">self</span>: { <span class="hljs-attr">href</span>: <span class="hljs-string">`/api/users/<span class="hljs-subst">${user.id}</span>`</span> },
        <span class="hljs-attr">orders</span>: { <span class="hljs-attr">href</span>: <span class="hljs-string">`/api/users/<span class="hljs-subst">${user.id}</span>/orders`</span> },
        <span class="hljs-attr">profile</span>: { <span class="hljs-attr">href</span>: <span class="hljs-string">`/api/users/<span class="hljs-subst">${user.id}</span>/profile`</span> }
      }
    });
  } <span class="hljs-keyword">catch</span> (err) {
    res.status(<span class="hljs-number">500</span>).json({
      <span class="hljs-attr">error</span>: {
        <span class="hljs-attr">code</span>: <span class="hljs-string">'INTERNAL_ERROR'</span>,
        <span class="hljs-attr">message</span>: <span class="hljs-string">'An unexpected error occurred while fetching the user.'</span>,
        <span class="hljs-attr">requestId</span>: req.id,
        <span class="hljs-attr">remediation</span>: <span class="hljs-string">'Retry the request. If the issue persists, contact support with the requestId.'</span>,
        <span class="hljs-attr">retryable</span>: <span class="hljs-literal">true</span>,
        <span class="hljs-attr">retryAfter</span>: <span class="hljs-number">5</span>
      }
    });
  }
});
</code></pre>
<p><strong>What changed and why:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Change</td><td>Why It Matters for Agents</td></tr>
</thead>
<tbody>
<tr>
<td>Input validation with specific error</td><td>Agent can self-correct the ID format</td></tr>
<tr>
<td>Structured error codes (<code>INVALID_PARAMETER_FORMAT</code>)</td><td>Agent can branch logic on error type</td></tr>
<tr>
<td><code>suggestions</code> array</td><td>Agent knows alternative approaches</td></tr>
<tr>
<td><code>_links</code> in success response (HATEOAS)</td><td>Agent discovers related resources programmatically</td></tr>
<tr>
<td><code>retryable</code> + <code>retryAfter</code></td><td>Agent knows whether and when to retry</td></tr>
<tr>
<td><code>requestId</code></td><td>Agent can reference specific failures in escalation</td></tr>
<tr>
<td>Consistent <code>data</code> wrapper</td><td>Agent always knows where to find the payload</td></tr>
</tbody>
</table>
</div><p>The before version has about 15 lines. The after version has more code, but every additional line serves the agent. And here's the thing — <strong>humans benefit from these improvements too</strong>. Better error messages and discoverable links make any API easier to work with.</p>
<hr />
<h2 id="heading-authentication-for-non-human-consumers">Authentication for Non-Human Consumers</h2>
<p>Authentication is where most "agent-ready" articles get hand-wavy. Let's get specific.</p>
<h3 id="heading-the-jwt-problem">The JWT Problem</h3>
<p>Traditional JWT flows assume a human is present to log in, handle MFA, and refresh tokens. AI agents operate autonomously — there's no human in the loop to re-authenticate when a token expires at 3 AM.</p>
<p>Worse, if you pass JWTs to an LLM as part of a tool's context, you're exposing credentials in the model's context window. That's a security risk with no upside.</p>
<h3 id="heading-recommended-oauth-20-client-credentials">Recommended: OAuth 2.0 Client Credentials</h3>
<p>For agent-to-API communication, the <a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc6749#section-4.4"><strong>OAuth 2.0 Client Credentials</strong></a> grant is the right choice:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Agent authenticates with client credentials</span>
<span class="hljs-keyword">const</span> tokenResponse = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">'https://auth.example.com/oauth/token'</span>, {
  <span class="hljs-attr">method</span>: <span class="hljs-string">'POST'</span>,
  <span class="hljs-comment">// RFC 6749 token requests are form-encoded, not JSON</span>
  <span class="hljs-attr">headers</span>: { <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/x-www-form-urlencoded'</span> },
  <span class="hljs-attr">body</span>: <span class="hljs-keyword">new</span> URLSearchParams({
    <span class="hljs-attr">grant_type</span>: <span class="hljs-string">'client_credentials'</span>,
    <span class="hljs-attr">client_id</span>: process.env.API_CLIENT_ID,
    <span class="hljs-attr">client_secret</span>: process.env.API_CLIENT_SECRET,
    <span class="hljs-attr">scope</span>: <span class="hljs-string">'read:users read:orders'</span>  <span class="hljs-comment">// Request only needed scopes</span>
  })
});

<span class="hljs-keyword">const</span> { access_token, expires_in } = <span class="hljs-keyword">await</span> tokenResponse.json();

<span class="hljs-comment">// Agent uses the token for API calls</span>
<span class="hljs-keyword">const</span> userResponse = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">'https://api.example.com/users/123'</span>, {
  <span class="hljs-attr">headers</span>: {
    <span class="hljs-string">'Authorization'</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${access_token}</span>`</span>,
    <span class="hljs-string">'X-Agent-Id'</span>: <span class="hljs-string">'order-processing-agent-v2'</span>,  <span class="hljs-comment">// Identify the agent</span>
    <span class="hljs-string">'X-Request-Id'</span>: crypto.randomUUID()          <span class="hljs-comment">// Trace requests</span>
  }
});
</code></pre>
<p><strong>Why this works for agents:</strong></p>
<ul>
<li><p>No human in the loop required</p>
</li>
<li><p>Scoped permissions (principle of least privilege)</p>
</li>
<li><p>Token rotation is automated</p>
</li>
<li><p>The agent never sees user credentials — only its own service credentials</p>
</li>
<li><p><code>X-Agent-Id</code> header lets your API track and rate-limit by agent</p>
</li>
</ul>
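<p>Because token expiry is routine for an autonomous agent rather than an edge case, it helps to cache the token and refresh it shortly before it expires. Here is a minimal sketch: <code>fetchToken</code> stands in for the client-credentials request above, and the 60-second skew is an arbitrary safety margin, not a standard value.</p>

```javascript
// Minimal token cache: returns the cached token while it is comfortably
// inside its lifetime, and refreshes it otherwise. `fetchToken` is assumed
// to perform the client-credentials request shown above and resolve to
// { access_token, expires_in }.
function createTokenCache(fetchToken, skewSeconds = 60) {
  let token = null;
  let expiresAt = 0; // epoch ms when the cached token expires

  return async function getToken() {
    const now = Date.now();
    // Reuse the cached token while it has more than `skewSeconds` left.
    if (token !== null) {
      if (expiresAt - skewSeconds * 1000 > now) return token;
    }
    const { access_token, expires_in } = await fetchToken();
    token = access_token;
    expiresAt = now + expires_in * 1000;
    return token;
  };
}
```

<p>With this in place, the agent calls <code>getToken()</code> before every API request and never has to special-case a 3 AM expiry.</p>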
<h3 id="heading-api-key-patterns">API Key Patterns</h3>
<p>For simpler setups, API keys work — but treat agent keys differently from keys you issue to human developers:</p>
<ul>
<li><p><strong>Separate keys per agent</strong>: Don't reuse the same key across agents with different purposes</p>
</li>
<li><p><strong>Scoped permissions</strong>: Each key should only allow the operations that specific agent needs</p>
</li>
<li><p><strong>Auto-rotation</strong>: Set expiration policies and provide a key rotation endpoint</p>
</li>
<li><p><strong>Rate limits per key</strong>: AI agents can generate bursts of requests — set appropriate limits</p>
</li>
</ul>
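<p>On the API side, those rules reduce to a lookup at request time. Here is a sketch of per-key scope enforcement; the key records, agent names, and error codes are illustrative, not a real key store.</p>

```javascript
// Illustrative key store: one key per agent, each with only the scopes
// that agent needs (principle of least privilege).
const apiKeys = {
  key_reporting_agent: { agent: 'reporting-agent', scopes: ['read:orders'] },
  key_support_agent: { agent: 'support-agent', scopes: ['read:users', 'read:orders'] },
};

// Check a key against the scope required by the endpoint being called.
function authorize(apiKey, requiredScope) {
  const record = apiKeys[apiKey];
  if (!record) {
    return { ok: false, code: 'AUTH_INVALID_KEY' };
  }
  if (!record.scopes.includes(requiredScope)) {
    return { ok: false, code: 'PERM_SCOPE_MISSING', agent: record.agent };
  }
  return { ok: true, agent: record.agent };
}
```

<p>Keeping one key per agent also makes the rate-limit and auto-rotation policies above enforceable per agent rather than per customer.</p>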
<h3 id="heading-rate-limiting-for-non-human-traffic">Rate Limiting for Non-Human Traffic</h3>
<p>AI agents behave differently than humans. A human might make 5-10 API calls during a session. An agent orchestrating a complex task might make 50-100 calls in seconds.</p>
<p>Design your rate limiting accordingly:</p>
<pre><code class="lang-plaintext"># Headers your API should return
X-RateLimit-Limit: 1000          # Requests per window
X-RateLimit-Remaining: 847       # Remaining in current window
X-RateLimit-Reset: 1707635400    # Unix timestamp when window resets
Retry-After: 30                  # Seconds to wait (on 429 response)
</code></pre>
<p>Consider <strong>tiered rate limits</strong>: a basic tier for general API keys and a higher tier for verified agent integrations that have been reviewed and approved.</p>
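<p>On the agent side, those headers turn backoff into a pure calculation. A small sketch, assuming the header names shown above (they are a common convention, not a standard every API implements):</p>

```javascript
// Decide how many seconds an agent should wait before retrying,
// based on the rate limit headers described above. `headers` is a
// plain object with lowercase header names; `nowSeconds` is the
// current Unix timestamp.
function backoffSeconds(status, headers, nowSeconds) {
  if (status !== 429) return 0; // not rate limited, no wait needed

  // Prefer the explicit Retry-After header when the API provides it.
  if (headers['retry-after'] !== undefined) {
    return Number(headers['retry-after']);
  }
  // Otherwise wait until the current window resets.
  const reset = Number(headers['x-ratelimit-reset'] || 0);
  return Math.max(0, reset - nowSeconds);
}
```
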
<hr />
<h2 id="heading-error-handling-that-agents-can-act-on">Error Handling That Agents Can Act On</h2>
<p>Here's a principle that will transform your API's agent-friendliness: <strong>every error response should tell the agent what to do next.</strong></p>
<h3 id="heading-the-error-response-contract">The Error Response Contract</h3>
<p>Define a consistent error schema that agents can rely on:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"error"</span>: {
    <span class="hljs-attr">"code"</span>: <span class="hljs-string">"RATE_LIMIT_EXCEEDED"</span>,
    <span class="hljs-attr">"message"</span>: <span class="hljs-string">"You have exceeded the rate limit for this endpoint."</span>,
    <span class="hljs-attr">"details"</span>: {
      <span class="hljs-attr">"limit"</span>: <span class="hljs-number">100</span>,
      <span class="hljs-attr">"window"</span>: <span class="hljs-string">"60s"</span>,
      <span class="hljs-attr">"current"</span>: <span class="hljs-number">103</span>
    },
    <span class="hljs-attr">"retryable"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"retryAfter"</span>: <span class="hljs-number">45</span>,
    <span class="hljs-attr">"remediation"</span>: <span class="hljs-string">"Wait 45 seconds before retrying. Consider reducing request frequency or upgrading to a higher rate limit tier."</span>,
    <span class="hljs-attr">"docs"</span>: <span class="hljs-string">"https://api.example.com/docs/rate-limits"</span>,
    <span class="hljs-attr">"requestId"</span>: <span class="hljs-string">"req_abc123def456"</span>
  }
}
</code></pre>
<p><strong>Key fields explained:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><code>code</code></td><td>Machine-readable error type for branching logic</td></tr>
<tr>
<td><code>message</code></td><td>Human-readable explanation</td></tr>
<tr>
<td><code>details</code></td><td>Contextual data specific to this error type</td></tr>
<tr>
<td><code>retryable</code></td><td>Can the agent retry this exact request?</td></tr>
<tr>
<td><code>retryAfter</code></td><td>How long to wait (in seconds)</td></tr>
<tr>
<td><code>remediation</code></td><td>Step-by-step fix instructions</td></tr>
<tr>
<td><code>docs</code></td><td>Link to detailed documentation</td></tr>
<tr>
<td><code>requestId</code></td><td>Unique ID for debugging and support escalation</td></tr>
</tbody>
</table>
</div><h3 id="heading-error-categories">Error Categories</h3>
<p>Organize your error codes into categories that agents can use for high-level branching:</p>
<pre><code class="lang-plaintext">AUTH_*       → Authentication issues    → Re-authenticate
PERM_*       → Permission issues        → Request different scope
PARAM_*      → Parameter issues         → Fix input and retry
RATE_*       → Rate limiting            → Wait and retry
RESOURCE_*   → Resource state issues    → Check resource status
INTERNAL_*   → Server issues            → Retry with backoff
</code></pre>
<p>An agent receiving <code>AUTH_TOKEN_EXPIRED</code> knows to refresh the token and retry. An agent receiving <code>PARAM_INVALID_FORMAT</code> knows to fix the input. An agent receiving <code>INTERNAL_ERROR</code> knows to back off and retry later.</p>
<p>This categorization turns error handling from guesswork into a deterministic state machine — exactly what autonomous agents need.</p>
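<p>That state machine can be as simple as a dispatch table keyed on the code's prefix. Here is a sketch of the agent-side branching; the action names are illustrative, and the fallback for unknown codes is a design choice, not a rule:</p>

```javascript
// Map the error category prefixes above to the agent's next action.
const categoryActions = {
  AUTH: 'reauthenticate',
  PERM: 'request_scope',
  PARAM: 'fix_input',
  RATE: 'wait_and_retry',
  RESOURCE: 'check_resource',
  INTERNAL: 'retry_with_backoff',
};

// Given a structured error code like 'AUTH_TOKEN_EXPIRED', return the
// high-level action the agent should take. Unknown codes escalate.
function nextAction(errorCode) {
  const prefix = errorCode.split('_')[0];
  return categoryActions[prefix] || 'escalate_to_human';
}
```
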
<hr />
<h2 id="heading-making-your-api-discoverable-mcp-and-beyond">Making Your API Discoverable: MCP and Beyond</h2>
<p>Your API might be perfectly designed, but if agents can't <em>find</em> it, it might as well not exist.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770781275831/45d019a3-1447-427a-a50b-7e799918c22c.webp" alt="API Discoverability Architecture" class="image--center mx-auto" /></p>
<h3 id="heading-model-context-protocol-mcp">Model Context Protocol (MCP)</h3>
<p><a target="_blank" href="https://modelcontextprotocol.io/">MCP</a> is becoming the standard way for AI agents to discover and interact with APIs. Think of it as a universal adapter between AI agents and your services.</p>
<p>Instead of teaching each AI model how to use your specific API, you expose your API through an MCP server that speaks a protocol agents already understand:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// MCP server exposing your API to AI agents</span>
<span class="hljs-keyword">import</span> { McpServer } <span class="hljs-keyword">from</span> <span class="hljs-string">'@modelcontextprotocol/sdk/server/mcp.js'</span>;

<span class="hljs-keyword">const</span> server = <span class="hljs-keyword">new</span> McpServer({
  <span class="hljs-attr">name</span>: <span class="hljs-string">'user-management-api'</span>,
  <span class="hljs-attr">version</span>: <span class="hljs-string">'1.0.0'</span>,
});

<span class="hljs-comment">// Define a tool that agents can discover and use</span>
server.tool(
  <span class="hljs-string">'get_user'</span>,
  <span class="hljs-string">'Retrieve a user by their unique ID. Returns user profile including name, email, role, and account creation date.'</span>,
  {
    <span class="hljs-attr">userId</span>: {
      <span class="hljs-attr">type</span>: <span class="hljs-string">'string'</span>,
      <span class="hljs-attr">description</span>: <span class="hljs-string">'The unique UUID v4 identifier of the user to retrieve'</span>,
    }
  },
  <span class="hljs-keyword">async</span> ({ userId }) =&gt; {
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`https://api.example.com/users/<span class="hljs-subst">${userId}</span>`</span>, {
      <span class="hljs-attr">headers</span>: { <span class="hljs-string">'Authorization'</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${API_TOKEN}</span>`</span> }
    });
    <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> response.json();
    <span class="hljs-keyword">return</span> {
      <span class="hljs-attr">content</span>: [{ <span class="hljs-attr">type</span>: <span class="hljs-string">'text'</span>, <span class="hljs-attr">text</span>: <span class="hljs-built_in">JSON</span>.stringify(data, <span class="hljs-literal">null</span>, <span class="hljs-number">2</span>) }]
    };
  }
);
</code></pre>
<p>The key insight: MCP bridges the gap between your existing REST API and agent consumption. You don't have to rewrite your API — you wrap it in a layer that agents can discover.</p>
<h3 id="heading-hateoas-the-comeback">HATEOAS: The Comeback</h3>
<p><a target="_blank" href="https://restfulapi.net/hateoas/">HATEOAS</a> (Hypermedia as the Engine of Application State) was ahead of its time. Human developers mostly ignored it — who needs machine-navigable links when you can bookmark the docs?</p>
<p>AI agents, that's who.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: {
    <span class="hljs-attr">"id"</span>: <span class="hljs-string">"user_123"</span>,
    <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Supra Huang"</span>,
    <span class="hljs-attr">"email"</span>: <span class="hljs-string">"supra@example.com"</span>
  },
  <span class="hljs-attr">"_links"</span>: {
    <span class="hljs-attr">"self"</span>: {
      <span class="hljs-attr">"href"</span>: <span class="hljs-string">"/api/users/user_123"</span>,
      <span class="hljs-attr">"method"</span>: <span class="hljs-string">"GET"</span>
    },
    <span class="hljs-attr">"update"</span>: {
      <span class="hljs-attr">"href"</span>: <span class="hljs-string">"/api/users/user_123"</span>,
      <span class="hljs-attr">"method"</span>: <span class="hljs-string">"PATCH"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Update user profile fields"</span>
    },
    <span class="hljs-attr">"orders"</span>: {
      <span class="hljs-attr">"href"</span>: <span class="hljs-string">"/api/users/user_123/orders"</span>,
      <span class="hljs-attr">"method"</span>: <span class="hljs-string">"GET"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"List all orders for this user"</span>
    },
    <span class="hljs-attr">"deactivate"</span>: {
      <span class="hljs-attr">"href"</span>: <span class="hljs-string">"/api/users/user_123/deactivate"</span>,
      <span class="hljs-attr">"method"</span>: <span class="hljs-string">"POST"</span>,
      <span class="hljs-attr">"description"</span>: <span class="hljs-string">"Deactivate the user account (reversible)"</span>
    }
  },
  <span class="hljs-attr">"_actions"</span>: {
    <span class="hljs-attr">"available"</span>: [<span class="hljs-string">"update"</span>, <span class="hljs-string">"deactivate"</span>, <span class="hljs-string">"orders"</span>],
    <span class="hljs-attr">"unavailable"</span>: [
      {
        <span class="hljs-attr">"action"</span>: <span class="hljs-string">"delete"</span>,
        <span class="hljs-attr">"reason"</span>: <span class="hljs-string">"User has active orders. Resolve orders before deletion."</span>,
        <span class="hljs-attr">"blockedBy"</span>: <span class="hljs-string">"/api/users/user_123/orders?status=active"</span>
      }
    ]
  }
}
</code></pre>
<p>Notice the <code>_actions</code> block. It tells the agent not just <em>what</em> it can do, but also <em>what it can't do and why</em>. An agent attempting to delete this user would know to resolve active orders first — without making a failed request and parsing an error.</p>
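<p>Consuming that block is straightforward: the agent consults <code>_actions</code> before attempting an operation instead of probing with a request that is doomed to fail. A sketch, assuming the response shape shown above:</p>

```javascript
// Check whether an action is currently possible on a HATEOAS resource.
// Returns the follow-up link when allowed, or the server's stated
// reason (and blocker) when not.
function canPerform(resource, action) {
  const actions = resource._actions || { available: [], unavailable: [] };
  if (actions.available.includes(action)) {
    return { allowed: true, link: resource._links[action] };
  }
  const blocked = actions.unavailable.find((entry) => entry.action === action);
  if (blocked) {
    return { allowed: false, reason: blocked.reason, blockedBy: blocked.blockedBy };
  }
  return { allowed: false, reason: 'Action not advertised by the API.' };
}
```
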
<h3 id="heading-schema-first-design">Schema-First Design</h3>
<p>Expose your full API schema at well-known endpoints:</p>
<ul>
<li><p><code>GET /.well-known/openapi.json</code> — Full OpenAPI specification</p>
</li>
<li><p><code>GET /.well-known/mcp.json</code> — MCP server configuration (if applicable)</p>
</li>
<li><p><code>GET /api</code> — Root endpoint listing all available resources</p>
</li>
</ul>
<p>This is the minimum for discoverability. An agent landing on your API domain can immediately understand what's available and how to use it.</p>
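<p>An agent's discovery routine can then be a simple ordered probe of those endpoints. In this sketch, <code>fetchJson</code> is a placeholder for an HTTP client that resolves to parsed JSON on success and <code>null</code> otherwise:</p>

```javascript
// Probe the well-known endpoints listed above, in order of preference,
// and return the first machine-readable description found.
const discoveryPaths = [
  '/.well-known/openapi.json',
  '/.well-known/mcp.json',
  '/api',
];

async function discover(fetchJson) {
  for (const path of discoveryPaths) {
    const doc = await fetchJson(path);
    if (doc !== null) {
      return { path, doc };
    }
  }
  return null; // nothing discoverable: the API is invisible to agents
}
```
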
<hr />
<h2 id="heading-testing-your-api-with-ai-agents">Testing Your API with AI Agents</h2>
<p>You wouldn't ship a website without testing it in a browser. Don't ship an agent-ready API without testing it with actual agents.</p>
<h3 id="heading-prompt-based-testing">Prompt-Based Testing</h3>
<p>The simplest test: give an AI agent your API docs and ask it to accomplish a task. If it struggles, your API has discoverability or usability issues.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Simple agent-based API test</span>
<span class="hljs-comment"># (load_openapi_spec, openapi_spec_to_tools, run_agent_with_tools, and</span>
<span class="hljs-comment"># assert_calls_match are project-specific helpers, sketched here)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_api_with_agent</span>():</span>
    <span class="hljs-string">"""
    Give an LLM your OpenAPI spec and see if it can
    successfully complete a multi-step workflow.
    """</span>
    openapi_spec = load_openapi_spec(<span class="hljs-string">'./openapi.json'</span>)

    test_scenarios = [
        {
            <span class="hljs-string">"task"</span>: <span class="hljs-string">"Find the user with email test@example.com and list their recent orders"</span>,
            <span class="hljs-string">"expected_calls"</span>: [<span class="hljs-string">"GET /users?email=test@example.com"</span>, <span class="hljs-string">"GET /users/{id}/orders"</span>],
            <span class="hljs-string">"expected_result"</span>: <span class="hljs-string">"Returns a list of orders"</span>
        },
        {
            <span class="hljs-string">"task"</span>: <span class="hljs-string">"Create a new user and assign them the 'editor' role"</span>,
            <span class="hljs-string">"expected_calls"</span>: [<span class="hljs-string">"POST /users"</span>, <span class="hljs-string">"PATCH /users/{id}"</span>],
            <span class="hljs-string">"expected_result"</span>: <span class="hljs-string">"User created with editor role"</span>
        }
    ]

    <span class="hljs-keyword">for</span> scenario <span class="hljs-keyword">in</span> test_scenarios:
        result = run_agent_with_tools(
            prompt=scenario[<span class="hljs-string">"task"</span>],
            tools=openapi_spec_to_tools(openapi_spec)
        )
        assert_calls_match(result.api_calls, scenario[<span class="hljs-string">"expected_calls"</span>])
        print(<span class="hljs-string">f"✅ Passed: <span class="hljs-subst">{scenario[<span class="hljs-string">'task'</span>]}</span>"</span>)
</code></pre>
<h3 id="heading-schema-validation">Schema Validation</h3>
<p>Validate that your actual API responses match your OpenAPI spec. Drift between spec and reality is the number one reason agents fail:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Use openapi-diff to catch breaking changes</span>
npx openapi-diff previous-spec.json current-spec.json

<span class="hljs-comment"># Use Spectral to lint your OpenAPI spec (https://github.com/stoplightio/spectral)</span>
npx @stoplight/spectral-cli lint openapi.json
</code></pre>
<h3 id="heading-key-metrics-to-monitor">Key Metrics to Monitor</h3>
<p>Once agents are consuming your API, track these metrics:</p>
<ul>
<li><p><strong>Agent success rate</strong>: What percentage of agent workflows complete without errors?</p>
</li>
<li><p><strong>Self-correction rate</strong>: How often do agents recover from errors without human help?</p>
</li>
<li><p><strong>Average calls per task</strong>: Are agents making efficient use of your endpoints?</p>
</li>
<li><p><strong>Error category distribution</strong>: Which error types are most common? That's where to improve.</p>
</li>
</ul>
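<p>These metrics fall out of a simple aggregation over workflow records. A sketch, with hypothetical field names (<code>completed</code>, <code>errors</code>, <code>calls</code>) standing in for whatever your logging pipeline actually emits:</p>

```javascript
// Aggregate the agent metrics above from a list of workflow records.
// Each record: { completed: bool, errors: number, calls: number }.
function agentMetrics(workflows) {
  const total = workflows.length;
  const succeeded = workflows.filter((w) => w.completed).length;
  const withErrors = workflows.filter((w) => w.errors > 0).length;
  // Self-correction: hit at least one error but still completed.
  const selfCorrected = workflows
    .filter((w) => w.completed)
    .filter((w) => w.errors > 0).length;
  const calls = workflows.reduce((sum, w) => sum + w.calls, 0);
  return {
    successRate: total === 0 ? 0 : succeeded / total,
    selfCorrectionRate: withErrors === 0 ? 0 : selfCorrected / withErrors,
    avgCallsPerTask: total === 0 ? 0 : calls / total,
  };
}
```
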
<hr />
<h2 id="heading-migration-checklist-start-tomorrow">Migration Checklist: Start Tomorrow</h2>
<p>You don't have to rewrite your API from scratch. Here's a phased approach:</p>
<h3 id="heading-quick-wins-this-week">Quick Wins (This Week)</h3>
<ul>
<li><p>[ ] Add <code>operationId</code> and rich <code>description</code> to every OpenAPI endpoint</p>
</li>
<li><p>[ ] Standardize error response format with <code>code</code>, <code>message</code>, <code>retryable</code></p>
</li>
<li><p>[ ] Add <code>X-Request-Id</code> to every response for tracing</p>
</li>
<li><p>[ ] Expose your OpenAPI spec at <code>/.well-known/openapi.json</code></p>
</li>
<li><p>[ ] Add rate limit headers to all responses</p>
</li>
</ul>
<h3 id="heading-medium-effort-next-2-weeks">Medium Effort (Next 2 Weeks)</h3>
<ul>
<li><p>[ ] Implement structured error codes with categories (<code>AUTH_*</code>, <code>PARAM_*</code>, etc.)</p>
</li>
<li><p>[ ] Add <code>_links</code> (HATEOAS) to resource responses</p>
</li>
<li><p>[ ] Set up OAuth 2.0 client credentials flow for agent auth</p>
</li>
<li><p>[ ] Create agent-specific API keys with scoped permissions</p>
</li>
<li><p>[ ] Add <code>remediation</code> field to error responses</p>
</li>
</ul>
<h3 id="heading-long-term-1-3-months">Long-Term (1-3 Months)</h3>
<ul>
<li><p>[ ] Build an MCP server wrapping your API</p>
</li>
<li><p>[ ] Implement comprehensive agent-based integration tests</p>
</li>
<li><p>[ ] Set up monitoring dashboards for agent traffic patterns</p>
</li>
<li><p>[ ] Design composable endpoints for complex workflows</p>
</li>
<li><p>[ ] Add <code>_actions</code> blocks showing available/unavailable operations</p>
</li>
</ul>
<p><strong>Start with the quick wins.</strong> Just adding rich OpenAPI descriptions and structured error codes will make a measurable difference in how well agents work with your API.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>The shift from human-first to agent-first API design isn't coming — it's already here. AI agents are consuming APIs at scale, and the APIs that work well with them will get more integrations, more traffic, and more adoption.</p>
<p>The good news: agent-ready API design isn't a radical departure from good API design. Self-describing endpoints, consistent response formats, structured errors, and proper authentication are improvements that benefit <em>all</em> consumers — human and AI alike.</p>
<p>Start with what matters most: <strong>make your API self-describing</strong> (rich OpenAPI specs), <strong>make errors actionable</strong> (structured codes with remediation), and <strong>make endpoints discoverable</strong> (schema at well-known URLs).</p>
<p>Your API wasn't built for AI agents. But with the changes in this guide, it can be — starting this week.</p>
<hr />
<p><em>What's your experience building APIs that AI agents consume? Have you tried wrapping your API with MCP? I'd love to hear your approach — drop a comment below or find me on</em> <a target="_blank" href="https://github.com/supra126"><em>GitHub</em></a><em>.</em></p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Mon, 16 Feb 2026 14:07:57 GMT</pubDate>
      <category>API Design</category>
      <category>ai agents</category>
      <category>Web Development</category>
      <category>JavaScript</category>
      <category>software architecture</category>
    </item>
    <item>
      <title>When Microservices Are Wrong: A Solutions Architect&apos;s Decision Framework</title>
      <link>https://suprahuang.cc/when-microservices-are-wrong-decision-framework</link>
      <guid isPermaLink="true">https://suprahuang.cc/when-microservices-are-wrong-decision-framework</guid>
      <description>I&apos;ve been that architect. The one who spun up AWS Lambda functions and ECS clusters for every new service, convinced that microservices were the only &quot;proper&quot; way to build modern software. After years of managing distributed complexity — and eventual...</description>
      <content:encoded><![CDATA[<p>I've been that architect. The one who spun up AWS Lambda functions and ECS clusters for every new service, convinced that microservices were the only "proper" way to build modern software. After years of managing distributed complexity — and eventually migrating most of my projects to Next.js, NestJS, Vercel, Railway, and Supabase — I learned something the hard way: <strong>the best architecture is the one that matches your actual needs, not your aspirations.</strong></p>
<p>Across the industry, a growing number of organizations are consolidating their microservices back into simpler architectures. This isn't a step backward — it's the industry maturing. The microservices hype cycle has peaked, and we're finally having honest conversations about when distributed systems create more problems than they solve.</p>
<p>This article gives you a practical decision framework — backed by real cost data, case studies, and a ready-to-use checklist — so you can make architecture decisions based on evidence, not hype.</p>
<hr />
<h2 id="heading-the-microservices-hype-cycle-where-we-stand-in-2026">The Microservices Hype Cycle: Where We Stand in 2026</h2>
<p>Microservices exploded in popularity after Netflix and Amazon shared their architecture stories around 2014-2015. The message was compelling: break your monolith into small, independent services, and you'll get better scalability, faster deployments, and team autonomy.</p>
<p>What got lost in translation was context. Netflix had <strong>thousands of engineers</strong>. Amazon had <strong>hundreds of teams</strong> that needed to deploy independently. The architecture solved problems at a scale that most organizations will never reach.</p>
<p>Fast forward to 2026, and the pendulum is swinging back:</p>
<ul>
<li><p>The <strong>modular monolith</strong> pattern has emerged as the pragmatic middle ground</p>
</li>
<li><p>Major cloud providers now offer guides on <em>when not to</em> use microservices (even AWS)</p>
</li>
<li><p>High-profile teams like Amazon Prime Video have publicly moved services back to monoliths</p>
</li>
<li><p>The operational cost gap between microservices and monoliths is often <strong>3-5x</strong> when accounting for infrastructure, tooling, and platform team overhead</p>
</li>
</ul>
<p>The industry consensus is shifting from "microservices by default" to <strong>"microservices by necessity."</strong></p>
<hr />
<h2 id="heading-7-scenarios-where-microservices-are-the-wrong-choice">7 Scenarios Where Microservices Are the Wrong Choice</h2>
<p>Not every project needs a distributed architecture. Here are seven concrete scenarios where microservices will likely hurt more than help.</p>
<h3 id="heading-1-your-team-cant-staff-autonomous-teams-per-service">1. Your Team Can't Staff Autonomous Teams Per Service</h3>
<p>Microservices solve an <strong>organizational problem</strong> as much as a technical one. They allow large teams to work independently without stepping on each other's code. Each microservice ideally needs a dedicated team of 5-8 people (Amazon's "two-pizza team" concept) who can own it end-to-end.</p>
<p>If your organization isn't large enough to staff autonomous teams per service — and for most companies, that means having dozens of developers — you're adding distributed systems complexity without the organizational benefit.</p>
<p><strong>Rule of thumb:</strong> If your entire engineering team fits in one meeting room, you probably don't need microservices.</p>
<h3 id="heading-2-youre-building-an-mvp-or-early-stage-product">2. You're Building an MVP or Early-Stage Product</h3>
<p>In the early stages, your domain model is still evolving. You don't know which boundaries will be stable enough to become service boundaries. Premature decomposition means you'll spend more time refactoring service boundaries than building features.</p>
<p>As Martin Fowler <a target="_blank" href="https://martinfowler.com/bliki/MonolithFirst.html">observed</a>: "Almost all the successful microservice stories have started with a monolith that got too big and was broken up."</p>
<p><strong>What to do instead:</strong> Build a well-structured monolith with clear module boundaries. You can extract services later when you have real data about which components need independent scaling.</p>
<h3 id="heading-3-your-domain-boundaries-are-unclear">3. Your Domain Boundaries Are Unclear</h3>
<p>Microservices work best when you have well-defined bounded contexts (in Domain-Driven Design terms). If your team frequently debates where a feature "belongs," or if services constantly need to call each other for basic operations, your boundaries are wrong.</p>
<p>A <strong>distributed monolith</strong> — microservices that can't function independently — is the worst of both worlds: all the network overhead with none of the autonomy benefits.</p>
<h3 id="heading-4-your-team-lacks-devops-maturity">4. Your Team Lacks DevOps Maturity</h3>
<p>Microservices require a significant operational foundation:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Capability</td><td>Required For Microservices</td><td>Monolith Alternative</td></tr>
</thead>
<tbody>
<tr>
<td>Container orchestration (K8s)</td><td>Running dozens of services</td><td>Single deployment</td></tr>
<tr>
<td>Service mesh (Istio/Linkerd)</td><td>Service-to-service communication</td><td>Function calls</td></tr>
<tr>
<td>Distributed tracing (Jaeger)</td><td>Debugging across services</td><td>Stack traces</td></tr>
<tr>
<td>CI/CD per service</td><td>Independent deployments</td><td>One pipeline</td></tr>
<tr>
<td>Centralized logging</td><td>Correlating logs across services</td><td>grep</td></tr>
</tbody>
</table>
</div><p>If your team doesn't already have these capabilities, the <strong>infrastructure tax</strong> will consume more engineering time than feature development.</p>
<h3 id="heading-5-your-application-has-low-traffic-and-no-independent-scaling-needs">5. Your Application Has Low Traffic and No Independent Scaling Needs</h3>
<p>If all parts of your system scale together and your peak traffic can be handled by a single well-provisioned server (or a simple auto-scaling group), microservices add network latency and operational complexity for zero benefit.</p>
<p><strong>Network calls are orders of magnitude slower than in-process function calls</strong> — a typical HTTP call between services takes 1-5 milliseconds, while an in-process function call completes in microseconds or nanoseconds. Every service boundary you introduce adds latency, potential failure points, and debugging complexity.</p>
<h3 id="heading-6-you-need-strong-data-consistency">6. You Need Strong Data Consistency</h3>
<p>Microservices favor <strong>eventual consistency</strong> — each service owns its data, and changes propagate asynchronously. If your domain requires strong transactional consistency (financial systems, inventory management, booking systems), you'll need to implement distributed transactions (sagas, two-phase commit) that are notoriously difficult to get right.</p>
<p>A monolith with a single database gives you ACID transactions for free.</p>
<h3 id="heading-7-youre-a-startup-with-limited-budget">7. You're a Startup With Limited Budget</h3>
<p>The total cost of ownership for microservices is significantly higher:</p>
<ul>
<li><p><strong>Infrastructure</strong>: More containers, load balancers, service meshes</p>
</li>
<li><p><strong>Tooling</strong>: Observability platforms, API gateways, secrets management</p>
</li>
<li><p><strong>People</strong>: Platform engineers command <a target="_blank" href="https://www.glassdoor.com/Salaries/platform-engineer-salary-SRCH_KO0,17.htm">$140,000-$180,000/year salaries</a> (US average)</p>
</li>
<li><p><strong>Cognitive overhead</strong>: Every developer needs to understand distributed systems patterns</p>
</li>
</ul>
<p>For a startup, that money and engineering time are better spent on product development.</p>
<hr />
<h2 id="heading-the-real-cost-why-microservices-are-3-5x-more-expensive">The Real Cost: Why Microservices Are 3-5x More Expensive</h2>
<p>The cost gap between microservices and monoliths is wider than most teams expect. Here's where the money goes:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Cost Category</td><td>Monolith</td><td>Microservices</td><td>Why It's Higher</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Compute</strong></td><td>Single process, efficient</td><td>Dozens of containers, each with overhead</td><td>Each service needs its own resources, plus orchestration</td></tr>
<tr>
<td><strong>Networking</strong></td><td>In-process calls (free)</td><td>Cross-service HTTP/gRPC calls</td><td>Load balancers, service mesh, API gateways</td></tr>
<tr>
<td><strong>Observability</strong></td><td>Stack traces, single log stream</td><td>Distributed tracing, log correlation</td><td>Tools like Datadog/New Relic charge per host</td></tr>
<tr>
<td><strong>CI/CD</strong></td><td>One pipeline</td><td>Pipeline per service</td><td>Build times multiply, artifact storage grows</td></tr>
<tr>
<td><strong>Database</strong></td><td>One database, ACID for free</td><td>Database per service</td><td>More instances, plus eventual consistency tooling</td></tr>
<tr>
<td><strong>Platform team</strong></td><td>Not needed</td><td>2-3 dedicated engineers</td><td>Someone must maintain K8s, service mesh, pipelines</td></tr>
</tbody>
</table>
</div><p><strong>The multiplier effect is real.</strong> When the Amazon Prime Video monitoring team <a target="_blank" href="https://www.thestack.technology/amazon-prime-video-microservices-monolith/">moved back to a monolith</a>, they saw a 90% infrastructure cost reduction for that service. When <a target="_blank" href="https://grapeup.com/blog/the-hidden-cost-of-overengineering-microservices/">Grape Up reported</a> consolidating a client from 25 to 5 services, the result was an 82% cost reduction.</p>
<p>These aren't outliers — they're what happens when the architecture's complexity exceeds the problem's complexity.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770657665574/d6f9b33f-ba80-4915-a372-363b8c8f52b7.webp" alt class="image--center mx-auto" /></p>
<p>The hidden cost that most comparisons miss is the <strong>platform team</strong>. Microservices don't run themselves — someone needs to maintain the Kubernetes clusters, service mesh, deployment pipelines, and monitoring infrastructure. That's typically 2-3 dedicated platform engineers (<a target="_blank" href="https://www.glassdoor.com/Salaries/platform-engineer-salary-SRCH_KO0,17.htm">earning $140-180K/year in the US</a>) who could otherwise be building product features.</p>
<hr />
<h2 id="heading-case-studies-when-teams-reversed-course">Case Studies: When Teams Reversed Course</h2>
<h3 id="heading-amazon-prime-video-monitoring-90-cost-reduction">Amazon Prime Video Monitoring: 90% Cost Reduction</h3>
<p>In 2023, the Amazon Prime Video team published a case study (originally on primevideotech.com, now <a target="_blank" href="https://www.thestack.technology/amazon-prime-video-microservices-monolith/">covered by The Stack</a> and <a target="_blank" href="https://devclass.com/2023/05/05/reduce-costs-by-90-by-moving-from-microservices-to-monolith-amazon-internal-case-study-raises-eyebrows/">DevClass</a>) about moving their <strong>audio/video quality monitoring service</strong> from a microservices architecture (using AWS Lambda and Step Functions) back to a single-process application. The result? A <strong>90% reduction in infrastructure costs</strong> for that specific service.</p>
<p>Important context: this was one monitoring tool within Prime Video, not the entire platform. But the lesson is universal — their microservices were passing large volumes of video data between services through S3, creating enormous data transfer costs. Consolidating into a single process eliminated the inter-service communication entirely.</p>
<p>As <a target="_blank" href="https://thenewstack.io/amazon-prime-videos-microservices-move-doesnt-lead-to-a-monolith-after-all/">The New Stack noted</a>, what they built was arguably a modular monolith — a single deployable unit with well-separated internal components. Amazon's core services remain microservices-based at a scale that justifies the complexity.</p>
<h3 id="heading-grape-up-client-25-services-down-to-5">Grape Up Client: 25 Services Down to 5</h3>
<p>Consulting firm Grape Up <a target="_blank" href="https://grapeup.com/blog/the-hidden-cost-of-overengineering-microservices/">documented a client engagement</a> where they consolidated 25 microservices into 5 well-defined services. The reported results:</p>
<ul>
<li><p><strong>82% reduction</strong> in cloud infrastructure costs</p>
</li>
<li><p><strong>70% reduction</strong> in monitoring tool costs</p>
</li>
<li><p>10 databases migrated into 5</p>
</li>
<li><p>3 cache instances reduced to 1</p>
</li>
</ul>
<p>The original decomposition had been driven by the "one service per entity" anti-pattern — each database table essentially had its own service, leading to constant inter-service calls for basic operations. <em>(Note: the client is anonymous, as is typical for consulting case studies.)</em></p>
<h3 id="heading-my-own-journey-from-aws-everything-to-pragmatic-simplicity">My Own Journey: From AWS Everything to Pragmatic Simplicity</h3>
<p>I spent years building on AWS Lambda and ECS, decomposing everything into microservices because that's what "real architects" were supposed to do. Each function was independently deployable. Each service had its own database. The architecture diagrams looked impressive.</p>
<p>But the reality was different:</p>
<ul>
<li><p><strong>Cold starts</strong> on Lambda added latency that users noticed</p>
</li>
<li><p><strong>Debugging</strong> a request that touched 6 services required correlating logs across multiple CloudWatch log groups</p>
</li>
<li><p><strong>Local development</strong> was painful — you can't easily run 15 services on your laptop</p>
</li>
<li><p><strong>Deployment coordination</strong> still existed because services had implicit dependencies</p>
</li>
</ul>
<p>I gradually migrated to <strong>Next.js + NestJS</strong> deployed on <strong>Vercel, Railway, and <code>Fly.io</code></strong>. The result was a system that was simpler to develop, cheaper to run, and faster to iterate on. Not because these tools are inherently better than AWS services, but because the architecture matched my actual scale and team size.</p>
<p>The lesson: <strong>the right architecture is the one that lets you ship features, not the one that looks best on a whiteboard.</strong></p>
<hr />
<h2 id="heading-the-solutions-architects-decision-framework">The Solutions Architect's Decision Framework</h2>
<p>Instead of debating microservices vs. monolith in the abstract, use this scoring matrix to evaluate your specific situation. Rate each dimension from 1 (favors monolith) to 5 (favors microservices):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770657694824/5368adf0-15c4-492f-b6dd-8b35dceb9f99.webp" alt class="image--center mx-auto" /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Dimension</td><td>1 (Monolith)</td><td>3 (Either)</td><td>5 (Microservices)</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Team size</strong></td><td>&lt; 15 developers</td><td>15-50</td><td>50+</td></tr>
<tr>
<td><strong>Domain maturity</strong></td><td>Exploring / pivoting</td><td>Stable core, evolving edges</td><td>Well-defined bounded contexts</td></tr>
<tr>
<td><strong>Scaling needs</strong></td><td>Uniform traffic</td><td>Some hotspots</td><td>Components scale independently</td></tr>
<tr>
<td><strong>DevOps maturity</strong></td><td>Manual deployments</td><td>CI/CD in place</td><td>K8s, service mesh, observability</td></tr>
<tr>
<td><strong>Deployment frequency</strong></td><td>Weekly / monthly</td><td>Daily</td><td>Multiple times per day per team</td></tr>
<tr>
<td><strong>Data consistency</strong></td><td>Strong ACID required</td><td>Mix of consistent and eventual</td><td>Eventual consistency acceptable</td></tr>
<tr>
<td><strong>Budget</strong></td><td>Constrained</td><td>Moderate</td><td>Significant infrastructure budget</td></tr>
<tr>
<td><strong>Organizational structure</strong></td><td>Single team</td><td>Few teams</td><td>Multiple autonomous teams</td></tr>
</tbody>
</table>
</div><h3 id="heading-how-to-interpret-your-score">How to Interpret Your Score</h3>
<ul>
<li><p><strong>8-16 points: Monolith</strong> — A well-structured monolith is your best bet. Focus on clean module boundaries and solid testing.</p>
</li>
<li><p><strong>17-28 points: Modular Monolith</strong> — You need better separation than a traditional monolith but don't need the overhead of full microservices. This is the sweet spot for most organizations.</p>
</li>
<li><p><strong>29-40 points: Microservices</strong> — You have the scale, team structure, and operational maturity to benefit from microservices. Proceed with clear domain boundaries.</p>
</li>
</ul>
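<p>To make the matrix concrete, here is the same scoring logic as a small helper. This is an illustrative sketch; the dimension names are mine, and the score bands come from the table and interpretation above:</p>
<pre><code class="lang-javascript">// Turn the eight 1-5 ratings into an architecture recommendation.
// Score bands match the interpretation above: 8-16, 17-28, 29-40.
const DIMENSIONS = [
  "teamSize", "domainMaturity", "scalingNeeds", "devopsMaturity",
  "deployFrequency", "dataConsistency", "budget", "orgStructure",
];

function recommendArchitecture(ratings) {
  const score = DIMENSIONS.reduce((sum, dim) =&gt; {
    const r = ratings[dim];
    if (!Number.isInteger(r) || r &lt; 1 || r &gt; 5) {
      throw new Error(`Rate "${dim}" from 1 (monolith) to 5 (microservices)`);
    }
    return sum + r;
  }, 0);

  if (score &lt;= 16) return { score, recommendation: "Monolith" };
  if (score &lt;= 28) return { score, recommendation: "Modular Monolith" };
  return { score, recommendation: "Microservices" };
}

// Example: a 12-person team with a stable domain and basic CI/CD
console.log(recommendArchitecture({
  teamSize: 1, domainMaturity: 3, scalingNeeds: 2, devopsMaturity: 2,
  deployFrequency: 2, dataConsistency: 1, budget: 1, orgStructure: 1,
}));
// → { score: 13, recommendation: 'Monolith' }
</code></pre>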
<h3 id="heading-decision-flowchart">Decision Flowchart</h3>
<pre><code class="lang-plaintext">Start
  │
  ├─ Team &lt; 15 people? ──── YES ──→ Monolith
  │         │
  │        NO
  │         │
  ├─ Domain boundaries clear? ── NO ──→ Monolith (define boundaries first)
  │         │
  │        YES
  │         │
  ├─ DevOps maturity high? ──── NO ──→ Modular Monolith
  │         │
  │        YES
  │         │
  ├─ Independent scaling needed? ── NO ──→ Modular Monolith
  │         │
  │        YES
  │         │
  └─ Team &gt; 50 &amp; multiple teams? ── YES ──→ Microservices
            │
           NO ──→ Modular Monolith
</code></pre>
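<p>Read top to bottom, the flowchart is just a chain of early returns. Here is a direct transcription as a function (the parameter names are mine, for illustration):</p>
<pre><code class="lang-javascript">// The decision flowchart above, one question per line.
function decideArchitecture({ teamSize, boundariesClear, devopsMature,
                              independentScaling, multipleTeams }) {
  if (teamSize &lt; 15) return "Monolith";
  if (!boundariesClear) return "Monolith (define boundaries first)";
  if (!devopsMature) return "Modular Monolith";
  if (!independentScaling) return "Modular Monolith";
  if (teamSize &gt; 50 &amp;&amp; multipleTeams) return "Microservices";
  return "Modular Monolith";
}

console.log(decideArchitecture({
  teamSize: 30, boundariesClear: true, devopsMature: true,
  independentScaling: true, multipleTeams: false,
}));
// → Modular Monolith
</code></pre>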
<hr />
<h2 id="heading-the-modular-monolith-the-third-option-most-teams-ignore">The Modular Monolith: The Third Option Most Teams Ignore</h2>
<p>If your score lands in the 17-28 range (and in my experience, most teams do), the <strong>modular monolith</strong> deserves serious consideration.</p>
<p>A modular monolith is a single deployable unit with strictly enforced module boundaries:</p>
<ul>
<li><p><strong>Each module</strong> owns its domain logic and data access</p>
</li>
<li><p><strong>Modules communicate</strong> through well-defined internal APIs (not direct database queries)</p>
</li>
<li><p><strong>Shared kernel</strong> is kept minimal — only truly cross-cutting concerns</p>
</li>
<li><p><strong>Each module</strong> can be independently tested</p>
</li>
</ul>
<p>The beauty of this approach is that it gives you a clear <strong>migration path</strong>. When (and if) a module genuinely needs to become an independent service — because it needs to scale independently, or a separate team needs to own it — you can extract it with minimal refactoring because the boundaries are already defined.</p>
<p>Frameworks like NestJS modules, Spring Boot's module system, and .NET's project structure make this pattern straightforward to implement.</p>
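<p>In code terms, the boundary rule can be sketched in plain JavaScript (a toy example; the module names and in-memory storage are illustrative, not a real persistence layer):</p>
<pre><code class="lang-javascript">// billing module: owns its data access and exposes a narrow internal API.
// Other modules call chargeOrder(); they never touch billing's storage.
const billingDb = new Map(); // stand-in for the module's private tables

const billing = {
  chargeOrder(orderId, amountCents) {
    if (!Number.isInteger(amountCents) || amountCents &lt;= 0) {
      throw new Error("amountCents must be a positive integer");
    }
    billingDb.set(orderId, { amountCents, status: "charged" });
    return { orderId, status: "charged" };
  },
};

// orders module: depends only on billing's public surface, so billing
// could later be extracted into a service without touching orders.
const orders = {
  placeOrder(orderId, amountCents) {
    const receipt = billing.chargeOrder(orderId, amountCents);
    return { orderId, paid: receipt.status === "charged" };
  },
};

console.log(orders.placeOrder("ord-1", 4200));
// → { orderId: 'ord-1', paid: true }
</code></pre>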
<p><strong>The modular monolith is not a compromise — it's the optimal architecture for teams of 10-100 engineers who need clean separation without distributed systems complexity.</strong></p>
<hr />
<h2 id="heading-pre-migration-readiness-checklist">Pre-Migration Readiness Checklist</h2>
<p>Before committing to microservices, ensure your organization can answer "yes" to these questions:</p>
<ul>
<li><p>[ ] We have <strong>automated CI/CD pipelines</strong> for every service</p>
</li>
<li><p>[ ] We have <strong>container orchestration</strong> (Kubernetes or equivalent) in production</p>
</li>
<li><p>[ ] We have <strong>centralized logging and distributed tracing</strong> across services</p>
</li>
<li><p>[ ] We have defined <strong>clear bounded contexts</strong> with minimal cross-service dependencies</p>
</li>
<li><p>[ ] We have a dedicated <strong>platform / DevOps team</strong> (or budget for one)</p>
</li>
<li><p>[ ] Each service can be <strong>deployed independently</strong> without coordinating with other teams</p>
</li>
<li><p>[ ] We have a strategy for <strong>data consistency</strong> across service boundaries</p>
</li>
<li><p>[ ] We have <strong>service-level SLAs</strong> and monitoring for each service</p>
</li>
<li><p>[ ] Our developers are comfortable with <strong>distributed systems patterns</strong> (circuit breakers, retries, sagas)</p>
</li>
<li><p>[ ] We have a plan for <strong>local development</strong> that doesn't require running all services</p>
</li>
<li><p>[ ] Our <strong>monthly infrastructure budget</strong> can absorb a 2-3x increase</p>
</li>
<li><p>[ ] We have <strong>enough developers to staff autonomous teams</strong> (5-8 people) per service</p>
</li>
</ul>
<p><strong>If you answered "no" to more than 3 of these, you're not ready for microservices.</strong> Consider a modular monolith as your next step, and revisit this checklist in 6-12 months.</p>
<hr />
<h2 id="heading-conclusion-making-the-right-call">Conclusion: Making the Right Call</h2>
<p>The microservices vs. monolith debate has always been a false binary. The real question is: <strong>what architecture gives your specific team the best chance of shipping quality software quickly?</strong></p>
<p>For most teams in 2026, the answer is somewhere between a monolith and full microservices — and that's perfectly fine. The modular monolith pattern gives you clean boundaries, testability, and a future migration path without the operational tax of distributed systems.</p>
<p>Here's what I wish someone had told me when I was spinning up my fifth Lambda function for a project with three contributors:</p>
<blockquote>
<p>Start with the simplest architecture that could work. Add complexity only when you have evidence — not speculation — that you need it.</p>
</blockquote>
<p>Architecture decisions should be driven by your team's size, your domain's maturity, your operational capabilities, and your budget. Not by conference talks, not by what FAANG companies do, and definitely not by your architecture diagram's aesthetic appeal.</p>
<p><strong>Use the decision framework in this article.</strong> Score your project honestly. And if the score says monolith or modular monolith — embrace it. You'll ship faster, spend less, and sleep better.</p>
<p>Next time someone says "we need microservices," ask them to score it first.</p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Wed, 11 Feb 2026 10:00:22 GMT</pubDate>
      <category>Microservices</category>
      <category>software architecture</category>
      <category>Web Development</category>
      <category>System Design</category>
      <category>Devops</category>
    </item>
    <item>
      <title>Building a Zero-Cost Enterprise Email API: Complete Guide to Timing Attack and Header Injection Protection</title>
      <link>https://suprahuang.cc/cloudflare-workers-secure-email-api</link>
      <guid isPermaLink="true">https://suprahuang.cc/cloudflare-workers-secure-email-api</guid>
      <description>Have you ever found yourself in this situation: your project needs to send system notifications, but SendGrid charges monthly fees, AWS SES setup is complicated, and self-hosting an email server is a maintenance nightmare?
In this article, I&apos;ll share...</description>
      <content:encoded><![CDATA[<p>Have you ever found yourself in this situation: your project needs to send system notifications, but SendGrid charges monthly fees, AWS SES setup is complicated, and self-hosting an email server is a maintenance nightmare?</p>
<p>In this article, I'll share how I built a <strong>completely free</strong> email notification API using <strong>Cloudflare Workers + Email Routing</strong>. More importantly, I'll dive deep into two often-overlooked security attacks: <strong>Timing Attacks</strong> and <strong>Email Header Injection</strong>—and how to defend against them.</p>
<p>This isn't just theory. I've open-sourced the entire project: <a target="_blank" href="https://github.com/supra126/worker-email-notifier">worker-email-notifier</a>. Feel free to use it!</p>
<hr />
<h2 id="heading-why-build-your-own-email-notification-system">🤔 Why Build Your Own Email Notification System?</h2>
<h3 id="heading-the-cost-problem-with-paid-services">The Cost Problem with Paid Services</h3>
<p>Let's look at the pricing of mainstream email services:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Service</td><td>Free Tier</td><td>Beyond Free Tier</td></tr>
</thead>
<tbody>
<tr>
<td>SendGrid</td><td>100 emails/day (60-day trial only)</td><td>Starting at $19.95/month</td></tr>
<tr>
<td>AWS SES</td><td>3,000 emails/month (12-month trial)</td><td>$0.10/1000 emails</td></tr>
<tr>
<td>Mailgun</td><td>100 emails/day</td><td>Starting at $15/month</td></tr>
<tr>
<td>Postmark</td><td>100 emails/month</td><td>Starting at $15/month</td></tr>
</tbody>
</table>
</div><p>For personal projects or small teams, these costs add up. More importantly—<strong>I just want to send a system notification. Why does it have to be this complicated?</strong></p>
<h3 id="heading-cloudflares-free-tier">Cloudflare's Free Tier</h3>
<p>Cloudflare Workers + Email Routing offers:</p>
<ul>
<li><p>✅ <strong>100,000 API requests/day</strong></p>
</li>
<li><p>✅ <strong>Generous email sending limits</strong></p>
</li>
<li><p>✅ <strong>No credit card required</strong></p>
</li>
<li><p>✅ <strong>Global edge network with ultra-low latency</strong></p>
</li>
</ul>
<p>For system notifications, monitoring alerts, and CI/CD notifications, this quota is more than enough.</p>
<h3 id="heading-use-cases-what-its-for-and-what-its-not">Use Cases: What It's For and What It's Not</h3>
<p>Before diving in, let's clarify what this system is designed for:</p>
<p><strong>✅ Good fit:</strong></p>
<ul>
<li><p>Server monitoring alerts (high CPU, service down)</p>
</li>
<li><p>Application event notifications (new orders, payment success)</p>
</li>
<li><p>CI/CD pipeline notifications (build success/failure)</p>
</li>
<li><p>IoT device alerts</p>
</li>
<li><p>Internal team notifications</p>
</li>
</ul>
<p><strong>❌ Not suitable for:</strong></p>
<ul>
<li><p>Marketing emails / newsletters (Email Routing has whitelist restrictions)</p>
</li>
<li><p>User-to-user messaging</p>
</li>
<li><p>Transactional emails to arbitrary external users</p>
</li>
</ul>
<p>Clear boundaries are important—this is a design decision, not a limitation.</p>
<hr />
<h2 id="heading-technology-stack-and-architecture-design">🏗️ Technology Stack and Architecture Design</h2>
<h3 id="heading-why-cloudflare-workers">Why Cloudflare Workers?</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Cloudflare Workers</td><td>AWS Lambda</td></tr>
</thead>
<tbody>
<tr>
<td>Cold start</td><td>Almost none</td><td>Can be seconds</td></tr>
<tr>
<td>Global deployment</td><td>Automatic (edge network)</td><td>Manual configuration</td></tr>
<tr>
<td>Free tier</td><td>100,000 req/day</td><td>1M req/month</td></tr>
<tr>
<td>Email integration</td><td>Native Email Routing</td><td>Requires SES</td></tr>
<tr>
<td>Setup complexity</td><td>Low</td><td>Medium-High</td></tr>
</tbody>
</table>
</div><p>The biggest advantage of Workers is <strong>native integration with Email Routing</strong>—no additional email service needed. Just configure DNS and you're ready to send.</p>
<h3 id="heading-system-architecture-overview">System Architecture Overview</h3>
<pre><code class="lang-mermaid">flowchart TD
    A[👤 Client] --&gt;|REST API Request| B[⚡ Cloudflare Worker]
    B --&gt; C[1. CORS Validation]
    C --&gt; D[2. API Key Check 🔒]
    D --&gt;|Timing Attack Protection| E[3. Input Validation 🔒]
    E --&gt;|Header Injection Protection| F[4. Email Sending]
    F --&gt; G[📧 Email Routing]
    G --&gt;|Recipient Whitelist| H[✅ Recipient Inbox]
</code></pre>
<p>Key design decisions:</p>
<ol>
<li><p><strong>Multi-platform isolation</strong>: Each platform has its own sender, API key, and recipient whitelist</p>
</li>
<li><p><strong>Security-first</strong>: Multiple validation layers before sending any email</p>
</li>
<li><p><strong>Flexible configuration</strong>: All settings managed via <code>wrangler.toml</code></p>
</li>
</ol>
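<p>The validation layers in the diagram can be sketched as an ordered chain of early returns in the Worker's <code>fetch</code> handler. This is a simplified sketch, not the project's actual code; the <code>env</code> names are illustrative, and a real implementation would use a constant-time key comparison (covered below):</p>
<pre><code class="lang-javascript">const jsonResponse = (obj, status) =&gt;
  new Response(JSON.stringify(obj), {
    status, headers: { "Content-Type": "application/json" },
  });

const worker = {
  async fetch(request, env) {
    // 1. CORS: reject requests from unknown origins up front
    const origin = request.headers.get("Origin");
    if (origin &amp;&amp; origin !== env.ALLOWED_ORIGIN) {
      return jsonResponse({ success: false, error: "Origin not allowed" }, 403);
    }

    // 2. API key check (real code should use a constant-time comparison)
    if (request.headers.get("X-API-Key") !== env.API_KEY) {
      return jsonResponse({ success: false, error: "Unauthorized" }, 401);
    }

    // 3. Input validation: block header injection before building the email
    const { subject = "", to = [] } = await request.json();
    if (/[\r\n]/.test(subject) || to.length === 0) {
      return jsonResponse({ success: false, error: "Invalid input" }, 400);
    }

    // 4. Hand off to Email Routing, which enforces the recipient whitelist
    return jsonResponse({ success: true, message: "queued" }, 200);
  },
};

export default worker;
</code></pre>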
<hr />
<h2 id="heading-core-implementation">💻 Core Implementation</h2>
<h3 id="heading-project-structure">Project Structure</h3>
<pre><code class="lang-plaintext">worker-email-notifier/
├── src/
│   └── index.js          # Main code (~450 lines)
├── wrangler.toml         # Workers configuration
├── wrangler.toml.example # Configuration template
└── package.json
</code></pre>
<h3 id="heading-platform-configuration-wranglertoml">Platform Configuration (wrangler.toml)</h3>
<pre><code class="lang-toml"><span class="hljs-section">[[send_email]]</span>
<span class="hljs-attr">name</span> = <span class="hljs-string">"MAILER_A"</span>
<span class="hljs-attr">destination_address</span> = <span class="hljs-string">"boss@gmail.com"</span>
<span class="hljs-attr">allowed_destination_addresses</span> = [<span class="hljs-string">"boss@gmail.com"</span>, <span class="hljs-string">"admin@company.com"</span>]

<span class="hljs-section">[vars.PLATFORMS.platform-a]</span>
<span class="hljs-attr">senderEmail</span> = <span class="hljs-string">"noreply@your-domain.com"</span>
<span class="hljs-attr">senderName</span> = <span class="hljs-string">"Platform A Notifications"</span>
<span class="hljs-attr">mailer</span> = <span class="hljs-string">"MAILER_A"</span>
</code></pre>
<p>Each platform binds to a <code>MAILER</code>, and each <code>MAILER</code> has its own whitelist—that's the key to isolation.</p>
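<p>At request time, the Worker can resolve this configuration in one step. A sketch (the <code>getPlatform</code> helper and the stubbed <code>env</code> are illustrative, not the project's actual code):</p>
<pre><code class="lang-javascript">// Resolve a platform's settings and its bound mailer from env.
// Property names mirror the wrangler.toml excerpt above.
function getPlatform(env, platformId) {
  const platform = (env.PLATFORMS || {})[platformId];
  if (!platform) {
    throw new Error(`Unknown platform: ${platformId}`);
  }
  // env[platform.mailer] is the send_email binding (e.g. MAILER_A)
  return { ...platform, binding: env[platform.mailer] };
}

// Stubbed env for illustration; in a Worker these come from wrangler.toml
const env = {
  PLATFORMS: {
    "platform-a": {
      senderEmail: "noreply@your-domain.com",
      senderName: "Platform A Notifications",
      mailer: "MAILER_A",
    },
  },
  MAILER_A: { send: async () =&gt; {} }, // stand-in for the real binding
};

console.log(getPlatform(env, "platform-a").senderEmail);
// → noreply@your-domain.com
</code></pre>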
<h3 id="heading-email-sending-logic">Email Sending Logic</h3>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { createMimeMessage } <span class="hljs-keyword">from</span> <span class="hljs-string">"mimetext"</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sendEmail</span>(<span class="hljs-params">mailer, from, fromName, to, subject, content, html</span>) </span>{
  <span class="hljs-keyword">const</span> msg = createMimeMessage();
  msg.setSender({ <span class="hljs-attr">name</span>: fromName, <span class="hljs-attr">addr</span>: <span class="hljs-keyword">from</span> });
  msg.setRecipient(to);
  msg.setSubject(subject);

  <span class="hljs-comment">// Provide both plain text and HTML versions</span>
  msg.addMessage({
    <span class="hljs-attr">contentType</span>: <span class="hljs-string">"text/plain"</span>,
    <span class="hljs-attr">data</span>: content,
  });

  <span class="hljs-keyword">if</span> (html) {
    msg.addMessage({
      <span class="hljs-attr">contentType</span>: <span class="hljs-string">"text/html"</span>,
      <span class="hljs-attr">data</span>: html,
    });
  }

  <span class="hljs-keyword">const</span> message = <span class="hljs-keyword">new</span> EmailMessage(<span class="hljs-keyword">from</span>, to, msg.asRaw());
  <span class="hljs-keyword">await</span> mailer.send(message);
}
</code></pre>
<p>We use <code>mimetext</code> to build a MIME-compliant message with both plain-text and HTML parts, so mail clients can render whichever version they support.</p>
<hr />
<h2 id="heading-security-protection-part-1-timing-attack-defense">🔐 Security Protection (Part 1): Timing Attack Defense</h2>
<p>This is one of the most important sections of this article. You may never have heard of "timing attacks," but they're a silent threat to API security.</p>
<h3 id="heading-what-is-a-timing-attack">What is a Timing Attack?</h3>
<p>Imagine a combination lock: every time you get a digit right, the lock makes a subtle "click" sound. A thief can listen to the sounds and guess the combination one digit at a time.</p>
<p><strong>Timing attacks work exactly the same way</strong>—attackers measure server response times to deduce your API key.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770699678888/ae83052a-bbf0-4672-a96d-5091d45f1d04.webp" alt class="image--center mx-auto" /></p>
<h3 id="heading-why-is-not-safe">Why is <code>===</code> Not Safe?</h3>
<p>JavaScript string comparison typically short-circuits at the first differing character:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Assume the correct API key is "secret123"</span>
apiKey === <span class="hljs-string">"secret123"</span>

<span class="hljs-comment">// Comparison process:</span>
<span class="hljs-comment">// "a" vs "s" → 1st char differs, immediately returns false (very fast)</span>
<span class="hljs-comment">// "s" vs "s" → same, continue comparing</span>
<span class="hljs-comment">// "sa" vs "se" → 2nd char differs, returns false (slightly slower)</span>
<span class="hljs-comment">// "se" vs "se" → same, continue...</span>
<span class="hljs-comment">// ...and so on</span>
</code></pre>
<p><strong>What's the problem?</strong> The timings below are illustrative; real per-character differences are on the order of nanoseconds, but an attacker can average over many requests to make them measurable:</p>
<ul>
<li><p>First character wrong: comparison time ~0.1ms</p>
</li>
<li><p>First five characters correct: comparison time ~0.5ms</p>
</li>
<li><p>All correct: comparison time ~1ms</p>
</li>
</ul>
<p>Attackers can:</p>
<ol>
<li><p>Try "a000000..." → measure time</p>
</li>
<li><p>Try "b000000..." → measure time</p>
</li>
<li><p>Try "s000000..." → this one's slower! First char is "s"</p>
</li>
<li><p>Try "sa00000..." → measure time</p>
</li>
<li><p>...repeat until the entire API key is guessed</p>
</li>
</ol>
<p>This is why you should <strong>never use</strong> <code>===</code> to compare secrets.</p>
<h3 id="heading-constant-time-algorithm-implementation">Constant-Time Algorithm Implementation</h3>
<p>The solution is "constant-time comparison"—the comparison takes the same amount of time regardless of whether the strings match:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">timingSafeEqual</span>(<span class="hljs-params">a, b</span>) </span>{
  <span class="hljs-keyword">const</span> encoder = <span class="hljs-keyword">new</span> TextEncoder();
  <span class="hljs-keyword">const</span> aBytes = encoder.encode(a);
  <span class="hljs-keyword">const</span> bBytes = encoder.encode(b);

  <span class="hljs-comment">// Even when lengths differ, perform full comparison</span>
  <span class="hljs-comment">// to avoid leaking length information</span>
  <span class="hljs-keyword">if</span> (aBytes.length !== bBytes.length) {
    <span class="hljs-comment">// Compare bBytes against itself to consume constant time</span>
    <span class="hljs-comment">// proportional to input length, then return false</span>
    <span class="hljs-keyword">let</span> result = <span class="hljs-number">1</span>;
    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; bBytes.length; i++) {
      result |= aBytes[i % aBytes.length] ^ bBytes[i];
    }
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }

  <span class="hljs-comment">// Use XOR operation, accumulate all differences</span>
  <span class="hljs-keyword">let</span> result = <span class="hljs-number">0</span>;
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; aBytes.length; i++) {
    result |= aBytes[i] ^ bBytes[i];
  }

  <span class="hljs-comment">// result is 0 only if strings are identical</span>
  <span class="hljs-keyword">return</span> result === <span class="hljs-number">0</span>;
}
</code></pre>
<p><strong>Why does this work?</strong></p>
<ol>
<li><p><strong>XOR operation</strong>: Same = 0, different = non-zero</p>
</li>
<li><p><strong>OR accumulation</strong>: If any bit differs, result won't be 0</p>
</li>
<li><p><strong>Full iteration</strong>: Loop runs completely regardless of match</p>
</li>
<li><p><strong>Constant time</strong>: Execution time depends only on string length, not content</p>
</li>
</ol>
<h3 id="heading-practical-application">Practical Application</h3>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">validateApiKey</span>(<span class="hljs-params">providedKey, env, platformId</span>) </span>{
  <span class="hljs-comment">// Try to get platform-specific key</span>
  <span class="hljs-keyword">const</span> apiKeys = parseApiKeys(env.API_KEYS);
  <span class="hljs-keyword">const</span> platformKey = apiKeys[platformId];

  <span class="hljs-keyword">if</span> (platformKey) {
    <span class="hljs-comment">// Use constant-time comparison!</span>
    <span class="hljs-keyword">return</span> timingSafeEqual(providedKey, platformKey);
  }

  <span class="hljs-comment">// Fall back to shared key</span>
  <span class="hljs-keyword">if</span> (env.API_KEY) {
    <span class="hljs-keyword">return</span> timingSafeEqual(providedKey, env.API_KEY);
  }

  <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
}
</code></pre>
<blockquote>
<p>💡 <strong>Note</strong>: Cloudflare Workers now supports <a target="_blank" href="https://developers.cloudflare.com/workers/examples/protect-against-timing-attacks/"><code>crypto.subtle.timingSafeEqual()</code></a> natively, and also supports <code>crypto.timingSafeEqual()</code> from <code>node:crypto</code> with the <code>nodejs_compat</code> flag enabled. The custom implementation above is kept for educational purposes—in production, prefer the built-in API.</p>
</blockquote>
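<p>For example, with <code>nodejs_compat</code> enabled, the custom helper could be replaced by a few lines like these (a sketch; note that <code>timingSafeEqual</code> from <code>node:crypto</code> throws when the buffer lengths differ, so hashing both inputs to fixed-length digests first is a common way to normalize them):</p>
<pre><code class="lang-javascript">import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time key check built on the runtime primitive.
// Hashing both sides yields equal-length buffers, avoiding the
// length-mismatch throw and hiding the key's length.
function safeKeyCompare(providedKey, expectedKey) {
  const a = createHash("sha256").update(providedKey).digest();
  const b = createHash("sha256").update(expectedKey).digest();
  return timingSafeEqual(a, b);
}

console.log(safeKeyCompare("secret123", "secret123")); // → true
console.log(safeKeyCompare("wrong-key", "secret123")); // → false
</code></pre>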
<hr />
<h2 id="heading-security-protection-part-2-email-header-injection-defense">🛡️ Security Protection (Part 2): Email Header Injection Defense</h2>
<p>The second attack to defend against is "email header injection"—more common but equally overlooked.</p>
<h3 id="heading-what-is-header-injection">What is Header Injection?</h3>
<p>SMTP email structure uses <code>\r\n</code> to separate different headers:</p>
<pre><code class="lang-plaintext">From: sender@example.com\r\n
To: recipient@example.com\r\n
Subject: Hello\r\n
\r\n
Email body...
</code></pre>
<p>If an attacker can inject <code>\r\n</code> into the <code>subject</code>, they can insert arbitrary headers:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Malicious input</span>
<span class="hljs-keyword">const</span> subject = <span class="hljs-string">"Hello\r\nBcc: victim1@example.com, victim2@example.com\r\n\r\nSpam content"</span>;

<span class="hljs-comment">// Actual generated email</span>
<span class="hljs-comment">/*
Subject: Hello
Bcc: victim1@example.com, victim2@example.com

Spam content
*/</span>
</code></pre>
<p>The attacker successfully added their own recipients!</p>
<h3 id="heading-impact-of-the-attack">Impact of the Attack</h3>
<ul>
<li><p>📧 <strong>Spam distribution</strong>: Send massive spam using your domain</p>
</li>
<li><p>🎭 <strong>Phishing</strong>: Forge the From field for phishing attacks</p>
</li>
<li><p>📛 <strong>Domain reputation damage</strong>: Your domain may be blacklisted</p>
</li>
<li><p>🔓 <strong>Data leakage</strong>: Secretly BCC sensitive information to attackers</p>
</li>
</ul>
<h3 id="heading-defense-strategies">Defense Strategies</h3>
<h4 id="heading-strategy-1-strict-newline-detection">Strategy 1: Strict Newline Detection</h4>
<pre><code class="lang-javascript"><span class="hljs-comment">// Check if subject contains newline characters</span>
<span class="hljs-keyword">if</span> (<span class="hljs-regexp">/[\r\n]/</span>.test(subject)) {
  <span class="hljs-keyword">return</span> jsonResponse(
    { <span class="hljs-attr">success</span>: <span class="hljs-literal">false</span>, <span class="hljs-attr">error</span>: <span class="hljs-string">"Invalid subject: contains forbidden characters"</span> },
    <span class="hljs-number">400</span>
  );
}
</code></pre>
<p>Simple but effective—reject any subject containing <code>\r</code> or <code>\n</code>.</p>
<h4 id="heading-strategy-2-strict-email-format-validation">Strategy 2: Strict Email Format Validation</h4>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">isValidEmail</span>(<span class="hljs-params">email</span>) </span>{
  <span class="hljs-comment">// Basic length check</span>
  <span class="hljs-keyword">if</span> (!email || email.length &gt; <span class="hljs-number">254</span>) {
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }

  <span class="hljs-comment">// Prevent consecutive dots (possible path traversal)</span>
  <span class="hljs-keyword">if</span> (<span class="hljs-regexp">/\.\./</span>.test(email)) {
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }

  <span class="hljs-comment">// Prevent leading or trailing dots</span>
  <span class="hljs-keyword">if</span> (email.startsWith(<span class="hljs-string">"."</span>) || email.endsWith(<span class="hljs-string">"."</span>)) {
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }

  <span class="hljs-comment">// Practical email validation (RFC 5322-inspired)</span>
  <span class="hljs-keyword">const</span> emailRegex = <span class="hljs-regexp">/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$/</span>;

  <span class="hljs-keyword">return</span> emailRegex.test(email);
}
</code></pre>
<p>This validation function:</p>
<ul>
<li><p>Limits maximum length (254 characters, per RFC 5321)</p>
</li>
<li><p>Blocks consecutive dots (invalid per RFC 5322 dot-atom syntax)</p>
</li>
<li><p>Uses strict regex validation</p>
</li>
</ul>
<h4 id="heading-strategy-3-html-content-escaping-xss-prevention">Strategy 3: HTML Content Escaping (XSS Prevention)</h4>
<p>If you allow HTML email bodies, you also need to guard against XSS:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">escapeHtml</span>(<span class="hljs-params">text</span>) </span>{
  <span class="hljs-keyword">const</span> map = {
    <span class="hljs-string">"&amp;"</span>: <span class="hljs-string">"&amp;amp;"</span>,
    <span class="hljs-string">"&lt;"</span>: <span class="hljs-string">"&amp;lt;"</span>,
    <span class="hljs-string">"&gt;"</span>: <span class="hljs-string">"&amp;gt;"</span>,
    <span class="hljs-string">'"'</span>: <span class="hljs-string">"&amp;quot;"</span>,
    <span class="hljs-string">"'"</span>: <span class="hljs-string">"&amp;#039;"</span>,
  };
  <span class="hljs-keyword">return</span> text.replace(<span class="hljs-regexp">/[&amp;&lt;&gt;"']/g</span>, <span class="hljs-function">(<span class="hljs-params">char</span>) =&gt;</span> map[char]);
}
</code></pre>
<h3 id="heading-error-message-sanitization">Error Message Sanitization</h3>
<p>Another easily overlooked point—<strong>error messages can also leak sensitive information</strong>:</p>
<pre><code class="lang-javascript"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sanitizeErrorMessage</span>(<span class="hljs-params">message</span>) </span>{
  <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> message !== <span class="hljs-string">"string"</span>) {
    <span class="hljs-keyword">return</span> <span class="hljs-string">"An error occurred"</span>;
  }

  <span class="hljs-keyword">return</span> message
    <span class="hljs-comment">// Remove stack traces</span>
    .replace(<span class="hljs-regexp">/at\s+.*:\d+:\d+/g</span>, <span class="hljs-string">""</span>)
    <span class="hljs-comment">// Remove file paths</span>
    .replace(<span class="hljs-regexp">/\/[\w/.-]+/g</span>, <span class="hljs-string">"[path]"</span>)
    <span class="hljs-comment">// Remove sensitive keywords</span>
    .replace(<span class="hljs-regexp">/password|secret|key|token/gi</span>, <span class="hljs-string">"[redacted]"</span>)
    .trim()
    .substring(<span class="hljs-number">0</span>, <span class="hljs-number">200</span>);
}
</code></pre>
<p>Never expose internal implementation details in error messages.</p>
<hr />
<h2 id="heading-deployment-and-testing">🚀 Deployment and Testing</h2>
<h3 id="heading-deployment-steps">Deployment Steps</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># 1. Install dependencies</span>
npm install

<span class="hljs-comment"># 2. Copy and modify configuration</span>
cp wrangler.toml.example wrangler.toml
<span class="hljs-comment"># Edit wrangler.toml to set your domain and platforms</span>

<span class="hljs-comment"># 3. Login to Cloudflare</span>
wrangler login

<span class="hljs-comment"># 4. Generate and set API Key</span>
npm run generate-key
wrangler secret put API_KEY

<span class="hljs-comment"># 5. Deploy</span>
npm run deploy
</code></pre>
<h3 id="heading-testing-the-api">Testing the API</h3>
<pre><code class="lang-bash">curl -X POST https://email-notifier.your-subdomain.workers.dev \
  -H <span class="hljs-string">"Content-Type: application/json"</span> \
  -H <span class="hljs-string">"X-API-Key: your-api-key"</span> \
  -d <span class="hljs-string">'{
    "platformId": "platform-a",
    "to": ["boss@gmail.com"],
    "subject": "🔔 System Notification",
    "content": "This is a test email",
    "html": "&lt;h1&gt;System Notification&lt;/h1&gt;&lt;p&gt;This is a test email&lt;/p&gt;"
  }'</span>
</code></pre>
<p>Success response:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"success"</span>: <span class="hljs-literal">true</span>,
  <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Email sent: 1 success, 0 failed"</span>,
  <span class="hljs-attr">"platform"</span>: <span class="hljs-string">"platform-a"</span>,
  <span class="hljs-attr">"details"</span>: [
    { <span class="hljs-attr">"to"</span>: <span class="hljs-string">"boss@gmail.com"</span>, <span class="hljs-attr">"status"</span>: <span class="hljs-string">"fulfilled"</span> }
  ]
}
</code></pre>
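<p>If you're calling the API from application code rather than curl, the same request can be wrapped in a small helper. This is a sketch, not code from the project: the endpoint URL, <code>X-API-Key</code> header, and body fields mirror the curl example above, and <code>buildNotifyRequest</code> is a hypothetical name.</p>

```javascript
// Sketch: build the same request as the curl example, as plain data.
// The URL, header name, and body shape come from the curl command above;
// buildNotifyRequest itself is a hypothetical helper, not project code.
function buildNotifyRequest(baseUrl, apiKey, payload) {
  return {
    url: baseUrl,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": apiKey,
      },
      body: JSON.stringify(payload),
    },
  };
}

// Usage in a browser or Worker (sketch):
// const { url, init } = buildNotifyRequest(
//   "https://email-notifier.your-subdomain.workers.dev",
//   "your-api-key",
//   { platformId: "platform-a", to: ["boss@gmail.com"],
//     subject: "Test", content: "Hello" }
// );
// const res = await fetch(url, init);
```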
<h3 id="heading-common-issues-troubleshooting">Troubleshooting Common Issues</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Issue</td><td>Possible Cause</td><td>Solution</td></tr>
</thead>
<tbody>
<tr>
<td>401 Unauthorized</td><td>Wrong API Key</td><td>Verify header name is <code>X-API-Key</code></td></tr>
<tr>
<td>400 Invalid platform</td><td>platformId doesn't exist</td><td>Check platform config in wrangler.toml</td></tr>
<tr>
<td>500 Email failed</td><td>Recipient not in whitelist</td><td>Add to <code>allowed_destination_addresses</code></td></tr>
<tr>
<td>CORS error</td><td>Origin not allowed</td><td>Set <code>ALLOWED_ORIGINS</code> environment variable</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-open-source-project-and-recommendations">📦 Open Source Project and Recommendations</h2>
<h3 id="heading-project-information">Project Information</h3>
<ul>
<li><p><strong>GitHub</strong>: <a target="_blank" href="https://github.com/supra126/worker-email-notifier">supra126/worker-email-notifier</a></p>
</li>
<li><p><strong>License</strong>: MIT License</p>
</li>
<li><p><strong>Documentation</strong>: Available in English and Traditional Chinese</p>
</li>
</ul>
<h3 id="heading-free-tier-limits">Free Tier Limits</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Item</td><td>Free Quota</td><td>Suitable For</td></tr>
</thead>
<tbody>
<tr>
<td>API requests</td><td>100,000/day</td><td>Most small-medium applications</td></tr>
<tr>
<td>Email sending</td><td>Generous daily limit</td><td>More than enough for system notifications</td></tr>
</tbody>
</table>
</div><h3 id="heading-extension-suggestions">Extension Suggestions</h3>
<p>Want to extend the functionality? Here are some directions:</p>
<ol>
<li><p><strong>Add platforms</strong>: Add new <code>[[send_email]]</code> and <code>PLATFORMS</code> config in <code>wrangler.toml</code></p>
</li>
<li><p><strong>Email templates</strong>: Store HTML templates in KV Storage</p>
</li>
<li><p><strong>Rate limiting</strong>: Integrate with Cloudflare WAF Rate Limiting rules</p>
</li>
<li><p><strong>Logging</strong>: Use Workers Analytics or Logpush</p>
</li>
</ol>
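<p>As a sketch of direction 2 (not project code): the HTML template would live in Workers KV under a name like <code>welcome</code>, and a tiny interpolator fills in placeholders. <code>TEMPLATES</code> is an assumed KV binding name and <code>renderTemplate</code> a hypothetical helper.</p>

```javascript
// Sketch: email templates stored in Workers KV.
// The interpolation is a plain function so it's easy to test;
// the KV read (assumed binding name: TEMPLATES) is shown in comments.
function renderTemplate(template, vars) {
  // Replace {{name}} placeholders; unknown keys become empty strings.
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? "");
}

// Inside a Worker handler (sketch):
// const template = await env.TEMPLATES.get("welcome"); // KV read
// const html = renderTemplate(template, { user: "Supra" });
```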
<hr />
<h2 id="heading-conclusion">📝 Conclusion</h2>
<p>This article shared how to build a zero-cost email notification system using Cloudflare Workers + Email Routing, and more importantly, explored two commonly overlooked security attacks in depth.</p>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ol>
<li><p><strong>Zero-cost doesn't mean low-quality.</strong> Cloudflare's free tier is sufficient for most notification scenarios.</p>
</li>
<li><p><strong>Timing attacks are a hidden risk.</strong> Never use <code>===</code> to compare secrets—use a constant-time comparison instead.</p>
</li>
<li><p><strong>Input validation is fundamental.</strong> Header injection attacks are simple but dangerous—validate all inputs strictly.</p>
</li>
<li><p><strong>Security is not optional—it's essential.</strong> Even for small features, the cost of security measures is far less than the cost of post-incident remediation.</p>
</li>
</ol>
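<p>To make the second takeaway concrete, here's a hand-rolled constant-time comparison in plain JavaScript. This is an illustrative sketch: on Workers you could instead use Cloudflare's non-standard <code>crypto.subtle.timingSafeEqual</code>, and in Node.js <code>crypto.timingSafeEqual</code>.</p>

```javascript
// Constant-time comparison sketch: never short-circuits on the first
// mismatching byte, so response time doesn't reveal how many leading
// characters of the attacker's guess were correct.
function constantTimeEqual(a, b) {
  const enc = new TextEncoder();
  const ab = enc.encode(a);
  const bb = enc.encode(b);
  // Fold the length difference into the result instead of returning early.
  let diff = ab.length ^ bb.length;
  const len = Math.max(ab.length, bb.length);
  for (let i = 0; i < len; i++) {
    diff |= (ab[i] ?? 0) ^ (bb[i] ?? 0);
  }
  return diff === 0;
}
```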
<h3 id="heading-final-thoughts">Final Thoughts</h3>
<p>The full project is open-sourced on GitHub — feel free to use it, fork it, or contribute.</p>
<hr />
<p><strong>References:</strong></p>
<ul>
<li><p><a target="_blank" href="https://developers.cloudflare.com/email-routing/">Cloudflare Email Routing Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://dev.to/silentwatcher_95/timing-attacks-in-nodejs-4pmb">Timing Attacks in Node.js</a></p>
</li>
<li><p><a target="_blank" href="https://www.codecademy.com/learn/2021-owasp-top-10-injection-attacks/modules/dont-mean-to-inject-but-here-comes-email-header-injection-attacks/cheatsheet">Email Header Injection - Codecademy</a></p>
</li>
<li><p><a target="_blank" href="https://www.acunetix.com/blog/articles/email-header-injection/">What Are Email Injection Attacks - Acunetix</a></p>
</li>
</ul>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Tue, 10 Feb 2026 04:30:49 GMT</pubDate>
      <category>Cloudflare Workers</category>
      <category>email</category>
      <category>Security</category>
      <category>Timing Attack</category>
      <category>JavaScript</category>
      <category>serverless</category>
      <category>api security</category>
    </item>
    <item>
      <title>Astro 6 Beta Upgrade: Zero Code Changes in a Real-World Blog — And Why the Future Looks Different</title>
      <link>https://suprahuang.cc/astro-6-beta-upgrade-zero-code-changes-real-world-blog</link>
      <guid isPermaLink="true">https://suprahuang.cc/astro-6-beta-upgrade-zero-code-changes-real-world-blog</guid>
      <description>Two weeks ago, Cloudflare acquired Astro. Days later, Astro 6 Beta dropped with first-class Cloudflare Workers support. The timing wasn&apos;t a coincidence.
As someone who recently rewrote Hashnode&apos;s Next.js Starter Kit in Astro and has been building on ...</description>
      <content:encoded><![CDATA[<p>Two weeks ago, <a target="_blank" href="https://blog.cloudflare.com/astro-joins-cloudflare/">Cloudflare acquired Astro</a>. Days later, <a target="_blank" href="https://astro.build/blog/astro-6-beta/">Astro 6 Beta dropped</a> with first-class Cloudflare Workers support. The timing wasn't a coincidence.</p>
<p>As someone who <a target="_blank" href="https://suprahuang.cc/i-rewrote-hashnodes-nextjs-starter-kit-in-astro-from-150-kb-to-15-kb-of-client-js">recently rewrote Hashnode's Next.js Starter Kit in Astro</a> and has been building on the Cloudflare ecosystem, this felt like a natural next step: upgrade my real-world Astro 5 project to v6 Beta and see what happens.</p>
<p>The upgrade itself took minutes. Zero code changes. But the architecture shift behind Astro 6 — that's what makes this interesting.</p>
<hr />
<h2 id="heading-quick-context-the-project">Quick Context — The Project</h2>
<p>The project is <a target="_blank" href="https://github.com/supra126/astro-starter-hashnode">astro-starter-hashnode</a>: an open-source Astro-based frontend for Hashnode blogs. It replaces Hashnode's official Next.js starter kit, cutting client-side JavaScript from ~150 kB to roughly 15 kB.</p>
<p>The stack before the upgrade:</p>
<ul>
<li><p><strong>Astro</strong> v5.17.1</p>
</li>
<li><p><strong>Tailwind CSS</strong> v4 (via <code>@tailwindcss/vite</code>)</p>
</li>
<li><p><strong>GraphQL</strong> for fetching content from Hashnode's API</p>
</li>
<li><p><strong>Deployed on</strong> Vercel (fully static output)</p>
</li>
<li><p><strong>10 pages</strong>, 540K total build output</p>
</li>
</ul>
<p>This makes it a useful upgrade test case: it's a real project with real dependencies, not a starter template with hello-world complexity.</p>
<hr />
<h2 id="heading-the-upgrade-what-actually-happened">The Upgrade — What Actually Happened</h2>
<h3 id="heading-step-1-bump-the-version">Step 1: Bump the Version</h3>
<pre><code class="lang-bash">npm install astro@next
</code></pre>
<p>That's it. Astro's CLI pulled in v6.0.0-beta.9 along with Vite 7.0. The <code>@tailwindcss/vite</code> v4 adapter required no changes.</p>
<h3 id="heading-step-2-first-build">Step 2: First Build</h3>
<pre><code class="lang-bash">npm run build
</code></pre>
<p><strong>Result: BUILD SUCCESS.</strong> No errors. The only output worth noting was a single internal warning:</p>
<pre><code class="lang-plaintext">[WARN] [vite] "isRemoteAllowed", "matchHostname", "matchPathname",
"matchPort" and "matchProtocol" are imported from external module
"@astrojs/internal-helpers/remote" but never used in
"node_modules/astro/dist/assets/utils/index.js".
</code></pre>
<p>This lives inside <code>node_modules</code> — it's Astro's own housekeeping, not something you need to act on.</p>
<h3 id="heading-the-numbers">The Numbers</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Astro v5.17.1</td><td>Astro v6 Beta</td><td>Delta</td></tr>
</thead>
<tbody>
<tr>
<td>Build Time</td><td>8.54s</td><td>~7.32s (avg of 3)</td><td>-14%</td></tr>
<tr>
<td>Output Size</td><td>540K</td><td>540K</td><td>0%</td></tr>
<tr>
<td>Pages</td><td>10</td><td>10</td><td>—</td></tr>
<tr>
<td>Build Errors</td><td>0</td><td>0</td><td>—</td></tr>
<tr>
<td>Code Changes Required</td><td>—</td><td><strong>0 files</strong></td><td>—</td></tr>
<tr>
<td>Vite</td><td>6.x</td><td>7.0</td><td>Major upgrade</td></tr>
</tbody>
</table>
</div><p>The -14% build time improvement sounds nice but is misleading. Build time in this project is dominated by Hashnode's GraphQL API latency (network I/O), not Astro's compilation. Individual runs varied from 5.87s to 9.00s depending on network conditions. The honest answer: <strong>build performance is effectively identical</strong>.</p>
<p>What didn't change is equally important: <strong>same output, same HTML, same 540K.</strong> The upgrade is transparent to end users.</p>
<hr />
<h2 id="heading-why-zero-breaking-changes-its-not-luck-its-architecture">Why Zero Breaking Changes? It's Not Luck — It's Architecture</h2>
<p>Astro 6 ships with <a target="_blank" href="https://docs.astro.build/en/guides/upgrade-to/v6/">a list of documented breaking changes</a>. None of them affected this project. Here's the checklist I ran through:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Breaking Change</td><td>Affected?</td><td>Why Not</td></tr>
</thead>
<tbody>
<tr>
<td>Node 22+ required</td><td>No</td><td>Already on v24</td></tr>
<tr>
<td>Vite 7.0</td><td>No issues</td><td>Tailwind v4 compatible</td></tr>
<tr>
<td>Zod 4 API changes</td><td>N/A</td><td>Not used</td></tr>
<tr>
<td><code>&lt;ViewTransitions /&gt;</code> removed</td><td>No</td><td>Already using <code>&lt;ClientRouter /&gt;</code></td></tr>
<tr>
<td><code>Astro.glob()</code> removed</td><td>N/A</td><td>Not used</td></tr>
<tr>
<td>Legacy Content Collections removed</td><td>N/A</td><td>Data from Hashnode API</td></tr>
<tr>
<td>Markdown heading ID algorithm</td><td>N/A</td><td>Content rendered by Hashnode</td></tr>
<tr>
<td>Script/style tag order change</td><td>No impact</td><td>—</td></tr>
<tr>
<td>Image service defaults</td><td>N/A</td><td>Using raw <code>&lt;img&gt;</code> with CDN URLs</td></tr>
<tr>
<td><code>import.meta.env</code> always inlined</td><td>No</td><td>Only <code>PUBLIC_</code> vars used</td></tr>
<tr>
<td>Experimental flags removed</td><td>N/A</td><td>None configured</td></tr>
<tr>
<td><code>redirectToDefaultLocale</code> changed</td><td>N/A</td><td>No i18n</td></tr>
<tr>
<td><code>getStaticPaths()</code> Astro access removed</td><td>N/A</td><td>Not using Astro object in paths</td></tr>
</tbody>
</table>
</div><p>This wasn't luck. The project had already adopted Astro's recommended patterns:</p>
<ul>
<li><p><code>&lt;ClientRouter /&gt;</code> instead of the deprecated <code>&lt;ViewTransitions /&gt;</code></p>
</li>
<li><p><strong>External CMS</strong> (Hashnode API) instead of Content Collections</p>
</li>
<li><p><strong>Standard</strong> <code>PUBLIC_</code> env vars instead of server-side secrets</p>
</li>
<li><p><strong>Raw image tags</strong> with CDN URLs instead of Astro's image pipeline</p>
</li>
</ul>
<p><strong>The takeaway</strong>: if you've been following Astro's best practices in v5, your upgrade path to v6 is likely smoother than you think.</p>
<hr />
<h2 id="heading-okay-but-heres-why-astro-6-actually-matters">Okay, But Here's Why Astro 6 Actually Matters</h2>
<p>The smooth upgrade is nice. But if that were the whole story, this would be a short post. What makes Astro 6 significant isn't what changed in the code — it's what changed in the architecture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770574273901/52814b0f-3ee3-4a80-a6d0-86751d912385.webp" alt="Astro 6 architecture shift — dev server now runs your production runtime" /></p>
<h3 id="heading-the-dev-server-runs-your-production-runtime">The Dev Server Runs Your Production Runtime</h3>
<p>Before Astro 6, <code>astro dev</code> ran your project in Node.js regardless of where you'd deploy it. If your production target was Cloudflare Workers, you were developing against a simulation.</p>
<p>Astro 6 changes this fundamentally. The new dev server leverages <a target="_blank" href="https://vite.dev/guide/api-environment">Vite's Environment API</a> to run your application inside the <strong>same runtime as production</strong>. For Cloudflare Workers, that means <code>astro dev</code> now uses <a target="_blank" href="https://github.com/cloudflare/workerd">workerd</a> — the actual open-source runtime that powers Workers globally.</p>
<p>This isn't a mock. It's the real engine.</p>
<h3 id="heading-real-platform-apis-during-development">Real Platform APIs During Development</h3>
<p>With the workerd-powered dev server, you get access to real Cloudflare primitives during local development:</p>
<ul>
<li><p><strong>Durable Objects</strong> — Test stateful serverless objects locally</p>
</li>
<li><p><strong>KV Namespaces</strong> — Read/write to key-value storage in dev</p>
</li>
<li><p><strong>R2 Storage</strong> — Object storage available during development</p>
</li>
<li><p><strong>Workers Analytics Engine</strong> — All with hot module replacement</p>
</li>
</ul>
<p>No more "it works in dev but breaks in production" surprises for platform-specific APIs.</p>
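<p>Opting into the Workers runtime is an adapter change. A minimal sketch of the config, assuming the official <code>@astrojs/cloudflare</code> adapter (this project, being fully static, doesn't need it yet):</p>

```javascript
// astro.config.mjs — sketch: with the Cloudflare adapter configured,
// `astro dev` runs the app in workerd rather than plain Node.js.
import { defineConfig } from 'astro/config';
import cloudflare from '@astrojs/cloudflare';

export default defineConfig({
  adapter: cloudflare(),
});
```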
<h3 id="heading-sessions-api-with-automatic-kv">Sessions API with Automatic KV</h3>
<p>Astro's <a target="_blank" href="https://docs.astro.build/en/guides/sessions/">Sessions API</a> (stable since v5.7) stores user data between requests. When using the Cloudflare adapter, it automatically configures Workers KV for session storage. Wrangler provisions the KV namespace on deploy — zero manual setup.</p>
<h3 id="heading-live-content-collections">Live Content Collections</h3>
<p><a target="_blank" href="https://docs.astro.build/en/guides/content-collections/">Live content collections</a> — experimental since Astro 5.10 — are now stable. They allow fetching content from CMSs, APIs, and databases with a unified API, updating in real-time without requiring a rebuild. For a Hashnode-powered blog like this one, that's a compelling path forward.</p>
<h3 id="heading-built-in-content-security-policy">Built-in Content Security Policy</h3>
<p>CSP support, previously experimental, is now stable. It controls which resources can load on your pages, protecting against XSS and code injection attacks — an increasingly important baseline for any production site.</p>
<hr />
<h2 id="heading-the-bigger-picture-cloudflare-astro">The Bigger Picture — Cloudflare + Astro</h2>
<p>On January 16, 2026, <a target="_blank" href="https://www.cloudflare.com/press/press-releases/2026/cloudflare-acquires-astro-to-accelerate-the-future-of-high-performance-web-development/">Cloudflare announced</a> that the Astro team would be joining Cloudflare. This wasn't just an acqui-hire — it's a strategic bet on content-driven web development.</p>
<p><strong>Why this matters for developers:</strong></p>
<p>Astro already excels at content sites, docs, marketing pages, and hybrid sites with selective interactivity. Cloudflare Workers already excels at edge computing with global distribution. Combining them creates a "golden path" where the framework and the platform are designed to work together — similar to how Next.js and Vercel evolved.</p>
<p><strong>What it doesn't mean:</strong></p>
<p>Astro will remain open source. It will continue to deploy to Vercel, Netlify, and other platforms. This project still runs on Vercel, and that's fine. But the deepest integration, the most optimized path, will increasingly be Cloudflare Workers.</p>
<p>The <a target="_blank" href="https://astro.build/blog/joining-cloudflare/">Astro team's own announcement</a> confirms this: Astro becomes the best way to build content sites, whether you host on Cloudflare or elsewhere.</p>
<p>As someone who was already building with both Astro and Cloudflare — <a target="_blank" href="https://suprahuang.cc/building-a-zero-cost-enterprise-email-api-complete-guide-to-timing-attack-and-header-injection-protection">an email API on Workers</a> and a blog frontend in Astro — watching these two ecosystems merge feels exciting rather than surprising. The tools are converging around the same vision: fast, lightweight, edge-first.</p>
<hr />
<h2 id="heading-should-you-upgrade-now">Should You Upgrade Now?</h2>
<p><strong>If you're on Astro 5 and following best practices:</strong> the upgrade is probably easier than you expect. Check the <a target="_blank" href="https://docs.astro.build/en/guides/upgrade-to/v6/">breaking changes list</a> against your project. If you're not using Content Collections, <code>Astro.glob()</code>, or <code>&lt;ViewTransitions /&gt;</code>, you might be in the same "zero changes" boat.</p>
<p><strong>If you're evaluating frameworks for a content site:</strong> Astro 6 + Cloudflare Workers is becoming the most integrated option for edge-first content delivery. Worth serious consideration.</p>
<p><strong>If you're on Vercel or Netlify:</strong> no urgency. Astro 6 works great on these platforms too. You gain Vite 7, stable Live Content Collections, and CSP support regardless of where you deploy.</p>
<p><strong>One caveat:</strong> Astro 6 is still in beta. For production sites, it's reasonable to wait for the stable release. But for side projects or new builds, the beta is stable enough — this project built and ran without a single issue.</p>
<hr />
<h2 id="heading-whats-next">What's Next</h2>
<p>For this project, the natural next experiment is exploring a Workers deployment path — moving from Vercel static output to Cloudflare Workers with SSR. That would unlock KV caching for GraphQL responses, Sessions for user preferences, and the full edge runtime experience.</p>
<p>That's a story for another post.</p>
<p>In the meantime, the <a target="_blank" href="https://github.com/supra126/astro-starter-hashnode">astro-starter-hashnode</a> repo is open source. If you've done your own Astro 6 upgrade — smooth or rocky — drop a comment. The more data points we have from real projects, the better the community can prepare for the stable release.</p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Mon, 09 Feb 2026 04:30:38 GMT</pubDate>
      <category>Astro</category>
      <category>Cloudflare Workers</category>
      <category>Web Development</category>
      <category>JavaScript</category>
      <category>Web Perf</category>
    </item>
    <item>
      <title>I Rewrote Hashnode&apos;s Next.js Starter Kit in Astro — From 150 kB to ~15 kB of Client JS</title>
      <link>https://suprahuang.cc/astro-starter-hashnode-rewrite-nextjs-to-astro</link>
      <guid isPermaLink="true">https://suprahuang.cc/astro-starter-hashnode-rewrite-nextjs-to-astro</guid>
      <description>Your blog doesn&apos;t need 150 kB of JavaScript.
I discovered this when I started using Hashnode. Their official Next.js starter kit worked fine out of the box — but something felt off. A blog that publishes a few articles a week was loading an entire Re...</description>
      <content:encoded><![CDATA[<p>Your blog doesn't need 150 kB of JavaScript.</p>
<p>I discovered this when I started using <a target="_blank" href="https://hashnode.com">Hashnode</a>. Their <a target="_blank" href="https://github.com/Hashnode/starter-kit">official Next.js starter kit</a> worked fine out of the box — but something felt off. A blog that publishes a few articles a week was loading an entire React runtime, multiple JavaScript bundles, and a full client-side router. For what? Rendering text and images.</p>
<p>So I rewrote the entire thing in <a target="_blank" href="https://astro.build">Astro</a>. The result? A fully-featured blog frontend that ships <strong>~15 kB of client-side JavaScript</strong> — a 90% reduction. Same features. Same CMS. Dramatically less code sent to your readers' browsers.</p>
<p>Here's the story, the technical decisions, and how you can deploy your own in under 2 minutes.</p>
<h2 id="heading-the-problem-a-react-runtime-for-a-blog">The Problem: A React Runtime for a Blog</h2>
<p>Don't get me wrong — Hashnode's starter kit is well-built, and the team has done solid work with it. But there's a fundamental mismatch: <strong>a blog is mostly static content, yet the starter kit ships an entire React runtime to the browser.</strong></p>
<p>When I first deployed it, I opened DevTools and looked at what was being loaded. For a page that's essentially an article with some images, the browser was downloading:</p>
<ul>
<li>React + ReactDOM</li>
<li>The Next.js client-side router</li>
<li>Hydration logic</li>
<li>Various runtime utilities</li>
</ul>
<p>All together, <strong>150 kB+ of JavaScript</strong> — before any of my actual content loads.</p>
<p>Then I thought about what a blog post page <em>actually</em> needs to do on the client side:</p>
<ul>
<li>Render text (HTML does this natively)</li>
<li>Display images (HTML does this natively)</li>
<li>Apply syntax highlighting (CSS can handle most of this)</li>
<li>Toggle dark mode (a few lines of vanilla JS)</li>
</ul>
<p>There's also operational complexity. The starter kit uses SSR (Server-Side Rendering) or ISR (Incremental Static Regeneration), which means you need a Node.js server or a platform that supports edge functions. For a blog that publishes a few posts a week, this felt like overkill.</p>
<p><strong>There had to be a lighter way to do this.</strong></p>
<h2 id="heading-why-astro">Why Astro?</h2>
<p><a target="_blank" href="https://astro.build">Astro</a> is built around a philosophy that aligns perfectly with content-heavy sites: <strong>ship zero JavaScript by default.</strong> Every page is pre-rendered to static HTML at build time. No framework runtime. No hydration. Just HTML, CSS, and your content.</p>
<p>The key concept is <strong>Islands Architecture</strong>. Instead of hydrating the entire page with a JavaScript framework, Astro lets you create small "islands" of interactivity — only the components that genuinely need JavaScript get it. Everything else stays as static HTML.</p>
<p>For a blog, this means:</p>
<ul>
<li><strong>Article content?</strong> Static HTML. Zero JS.</li>
<li><strong>Navigation and layout?</strong> Static HTML. Zero JS.</li>
<li><strong>Dark mode toggle?</strong> A tiny island with a few lines of vanilla JS.</li>
<li><strong>Search modal?</strong> An island that loads only when triggered.</li>
</ul>
<p>This isn't a trade-off. It's the right architecture for the job.</p>
<p>Astro is also framework-agnostic. If I ever need a React or Svelte component for something complex, I can drop it in as an island. But for this project, vanilla JS in Astro components was more than enough.</p>
<h2 id="heading-key-architecture-decisions">Key Architecture Decisions</h2>
<h3 id="heading-graphql-client-lightweight-by-design">GraphQL Client: Lightweight by Design</h3>
<p>Hashnode's API is GraphQL-based. The Next.js starter kit typically pairs this with heavier clients. I chose <a target="_blank" href="https://github.com/jasonkuhrt/graphql-request"><code>graphql-request</code></a> — a minimal GraphQL client with zero unnecessary dependencies. Since it only runs at build time in a static Astro site, it adds zero bytes to the client bundle.</p>
<p>The entire client setup is 16 lines:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// src/lib/client.ts</span>
<span class="hljs-keyword">import</span> { GraphQLClient } <span class="hljs-keyword">from</span> <span class="hljs-string">'graphql-request'</span>;

<span class="hljs-keyword">const</span> GQL_ENDPOINT =
  <span class="hljs-keyword">import</span>.meta.env.PUBLIC_HASHNODE_GQL_ENDPOINT || <span class="hljs-string">'https://gql.hashnode.com'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> gqlClient = <span class="hljs-keyword">new</span> GraphQLClient(GQL_ENDPOINT, {
  headers: {
    <span class="hljs-string">'hn-trace-app'</span>: <span class="hljs-string">'astro-starter-hashnode'</span>,
  },
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> PUBLICATION_HOST =
  <span class="hljs-keyword">import</span>.meta.env.PUBLIC_HASHNODE_PUBLICATION_HOST || <span class="hljs-string">'engineering.hashnode.com'</span>;
</code></pre>
<p>All GraphQL queries are organized in 11 dedicated files (845 lines total), covering everything from homepage posts to RSS feeds to search.</p>
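<p>As an illustration of what one of those query modules looks like (field names here follow Hashnode's public GraphQL schema, but the variable name and query shape are hypothetical, not copied from the project):</p>

```javascript
// Sketch of a query module. Since queries run only at build time,
// none of this ships to the browser.
const recentPostsQuery = /* GraphQL */ `
  query RecentPosts($host: String!, $first: Int!) {
    publication(host: $host) {
      posts(first: $first) {
        edges {
          node { title slug brief publishedAt }
        }
      }
    }
  }
`;

// Usage at build time (sketch):
// const data = await gqlClient.request(recentPostsQuery, {
//   host: PUBLICATION_HOST,
//   first: 10,
// });
```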
<h3 id="heading-static-output-smart-prefetching">Static Output + Smart Prefetching</h3>
<p>The Astro config is intentionally minimal:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// astro.config.mjs</span>
<span class="hljs-keyword">import</span> { defineConfig } <span class="hljs-keyword">from</span> <span class="hljs-string">'astro/config'</span>;
<span class="hljs-keyword">import</span> tailwindcss <span class="hljs-keyword">from</span> <span class="hljs-string">'@tailwindcss/vite'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> defineConfig({
  <span class="hljs-attr">site</span>: siteUrl,
  <span class="hljs-attr">output</span>: <span class="hljs-string">'static'</span>,
  <span class="hljs-attr">prefetch</span>: {
    <span class="hljs-attr">prefetchAll</span>: <span class="hljs-literal">false</span>,
    <span class="hljs-attr">defaultStrategy</span>: <span class="hljs-string">'hover'</span>,
  },
  <span class="hljs-attr">vite</span>: {
    <span class="hljs-attr">plugins</span>: [tailwindcss()],
  },
});
</code></pre>
<p>Two things to note:</p>
<ol>
<li><strong><code>output: 'static'</code></strong> — Every page is pre-built as an HTML file. No server needed.</li>
<li><strong><code>defaultStrategy: 'hover'</code></strong> — When a user hovers over a link, Astro prefetches that page in the background. By the time they click, the page is already cached. This gives the feel of a SPA without any client-side router.</li>
</ol>
<h3 id="heading-tailwind-css-v4">Tailwind CSS v4</h3>
<p>Styling uses Tailwind CSS v4 with the <code>@tailwindcss/typography</code> plugin for beautiful article rendering. The entire CSS output compiles to a single 55.6 kB file — and that's CSS, not JavaScript. It doesn't block interactivity.</p>
<h2 id="heading-building-features-without-a-framework">Building Features Without a Framework</h2>
<p>Here's where it gets interesting. The Hashnode Next.js starter kit uses React for features like dark mode, search, and comments. I rebuilt all of them without any framework.</p>
<h3 id="heading-dark-mode-css-localstorage">Dark Mode: CSS + localStorage</h3>
<p>Dark mode doesn't need React state management. It needs a class toggle and a localStorage call:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Inside Header.astro &lt;script&gt; tag</span>
<span class="hljs-keyword">const</span> themeToggle = <span class="hljs-built_in">document</span>.getElementById(<span class="hljs-string">'theme-toggle'</span>);
themeToggle?.addEventListener(<span class="hljs-string">'click'</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> isDark = <span class="hljs-built_in">document</span>.documentElement.classList.toggle(<span class="hljs-string">'dark'</span>);
  <span class="hljs-built_in">localStorage</span>.setItem(<span class="hljs-string">'theme'</span>, isDark ? <span class="hljs-string">'dark'</span> : <span class="hljs-string">'light'</span>);
});
</code></pre>
<p>That's it. The initial theme is set by an inline script in the <code>&lt;head&gt;</code> (to prevent flash of wrong theme), and the toggle is this 4-line event listener. Tailwind's <code>dark:</code> variant handles all the styling.</p>
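<p>The decision logic of that inline <code>&lt;head&gt;</code> script can be sketched as a pure function (names here are illustrative, not the project's actual code):</p>

```javascript
// Sketch of the theme-initialization logic that runs inline in <head>.
// Kept as a pure function so the decision is easy to test; the DOM
// wiring is shown in comments.
function resolveTheme(stored, systemPrefersDark) {
  // A saved choice wins; otherwise fall back to the OS preference.
  if (stored === 'dark' || stored === 'light') return stored;
  return systemPrefersDark ? 'dark' : 'light';
}

// Inline in <head>, before first paint (sketch):
// const theme = resolveTheme(
//   localStorage.getItem('theme'),
//   window.matchMedia('(prefers-color-scheme: dark)').matches
// );
// document.documentElement.classList.toggle('dark', theme === 'dark');
```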
<h3 id="heading-search-vanilla-js-hashnodes-graphql-api">Search: Vanilla JS + Hashnode's GraphQL API</h3>
<p>The search modal was the most complex piece. In the Next.js version, this would be a React component with state management, effects, and possibly a state library. In Astro, it's a single <code>.astro</code> file with a <code>&lt;script&gt;</code> tag.</p>
<p>The implementation uses:</p>
<ul>
<li><strong>Debounced input</strong> (300ms) to avoid hammering the API</li>
<li><strong>AbortController</strong> to cancel in-flight requests when the user types again</li>
<li><strong>Keyboard shortcuts</strong> (<code>Cmd/Ctrl + K</code> to open, <code>Escape</code> to close)</li>
<li><strong>Hashnode's <code>searchPostsOfPublication</code> GraphQL query</strong> for server-side search</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-comment">// Search with debounce and abort control</span>
input?.addEventListener(<span class="hljs-string">'input'</span>, <span class="hljs-function">(<span class="hljs-params">e</span>) =&gt;</span> {
  <span class="hljs-built_in">clearTimeout</span>(debounceTimer);
  <span class="hljs-keyword">const</span> query = e.target.value.trim();
  debounceTimer = <span class="hljs-built_in">setTimeout</span>(<span class="hljs-function">() =&gt;</span> performSearch(query), <span class="hljs-number">300</span>);
});

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">performSearch</span>(<span class="hljs-params">query</span>) </span>{
  <span class="hljs-keyword">if</span> (searchAbort) searchAbort.abort();
  searchAbort = <span class="hljs-keyword">new</span> AbortController();

  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> fetch(gqlEndpoint, {
    <span class="hljs-attr">method</span>: <span class="hljs-string">'POST'</span>,
    <span class="hljs-attr">signal</span>: searchAbort.signal,
    <span class="hljs-attr">headers</span>: { <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span> },
    <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify({
      <span class="hljs-attr">query</span>: searchQuery,
      <span class="hljs-attr">variables</span>: { <span class="hljs-attr">first</span>: <span class="hljs-number">10</span>, <span class="hljs-attr">filter</span>: { query, publicationId } },
    }),
  });
  <span class="hljs-comment">// ... render results</span>
}
</code></pre>
<p>No React. No state library. Just the DOM APIs that browsers have shipped for years.</p>
<h3 id="heading-full-feature-list">Full Feature List</h3>
<p>Every feature from the Next.js starter kit has been rebuilt — plus a few extras:</p>
<ul>
<li><strong>Dark Mode</strong> — System preference detection + manual toggle</li>
<li><strong>Search</strong> — <code>Cmd/Ctrl + K</code> shortcut, live GraphQL search</li>
<li><strong>View Transitions</strong> — SPA-like page transitions with morph animations, no full-page reloads</li>
<li><strong>Comments</strong> — Hashnode's native comment threads + optional <a target="_blank" href="https://giscus.app">Giscus</a> integration</li>
<li><strong>Newsletter</strong> — Built-in subscription form via Hashnode API</li>
<li><strong>SEO</strong> — Open Graph tags, Twitter cards, canonical URLs, structured data</li>
<li><strong>RSS Feed</strong> — Full-content RSS with <code>content:encoded</code></li>
<li><strong>Sitemap</strong> — Auto-generated XML sitemap</li>
<li><strong>Analytics</strong> — Supports GA4, GTM, Fathom, Plausible, Umami, and more</li>
<li><strong>Table of Contents</strong> — Auto-generated from post headings</li>
<li><strong>Pagination</strong> — Cursor-based with numbered pages</li>
<li><strong>Series &amp; Tags</strong> — Dedicated archive pages</li>
<li><strong>Responsive</strong> — Mobile-first with Tailwind CSS</li>
<li><strong>Accessibility</strong> — Semantic HTML, ARIA labels, keyboard navigation</li>
</ul>
<h2 id="heading-the-results">The Results</h2>
<p>Here's the comparison with real data from the built output:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Next.js Starter Kit</td><td>Astro Starter Hashnode</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Client JS</strong></td><td>~150 kB+</td><td><strong>~15 kB</strong></td></tr>
<tr>
<td><strong>JS Files</strong></td><td>Multiple React bundles</td><td><strong>A few small scripts</strong></td></tr>
<tr>
<td><strong>Build Output</strong></td><td>SSR / ISR</td><td><strong>Fully static HTML</strong></td></tr>
<tr>
<td><strong>Framework Runtime</strong></td><td>React + ReactDOM</td><td><strong>None</strong></td></tr>
<tr>
<td><strong>Server Required</strong></td><td>Yes (Node.js)</td><td><strong>No</strong> (static hosting)</td></tr>
<tr>
<td><strong>CSS</strong></td><td>CSS-in-JS + Tailwind</td><td><strong>1 file, 55.6 kB</strong> (Tailwind)</td></tr>
</tbody>
</table>
</div><p>That ~15 kB includes Astro's View Transitions (for smooth, SPA-like page navigation with morph animations) and the prefetch script. It's not React. It's not a full client-side router. It's the minimal JS needed to make the experience feel polished — and it's still <strong>10x less</strong> than what the Next.js version ships.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770538067576/ea8351e1-c9bc-43c0-b7ed-bc28a472f967.webp" alt="Performance comparison chart" /></p>
<h2 id="heading-get-started-in-2-minutes">Get Started in 2 Minutes</h2>
<p>Want to try it? You can deploy your own Hashnode blog frontend right now.</p>
<h3 id="heading-option-1-one-click-deploy">Option 1: One-Click Deploy</h3>
<p>Click one of these buttons to deploy instantly:</p>
<ul>
<li><a target="_blank" href="https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fsupra126%2Fastro-starter-hashnode&amp;env=PUBLIC_HASHNODE_PUBLICATION_HOST">Deploy with Vercel</a></li>
<li><a target="_blank" href="https://app.netlify.com/start/deploy?repository=https://github.com/supra126/astro-starter-hashnode">Deploy to Netlify</a></li>
</ul>
<p>You'll be asked for one environment variable: your Hashnode publication host (e.g., <code>yourblog.hashnode.dev</code>). That's it.</p>
<h3 id="heading-option-2-run-locally">Option 2: Run Locally</h3>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/supra126/astro-starter-hashnode.git
<span class="hljs-built_in">cd</span> astro-starter-hashnode
npm install
npm run dev
</code></pre>
<p>Open <code>http://localhost:4321</code>. If you don't set a <code>PUBLIC_HASHNODE_PUBLICATION_HOST</code>, it defaults to Hashnode's engineering blog as demo content.</p>
<h3 id="heading-multi-site-support">Multi-Site Support</h3>
<p>Since this is a statically generated frontend, you can deploy multiple instances with different <code>PUBLIC_HASHNODE_PUBLICATION_HOST</code> values. Same codebase, different blogs.</p>
<p>If you find this useful, I'd appreciate a <a target="_blank" href="https://github.com/supra126/astro-starter-hashnode">star on GitHub</a>. Found a bug or have a feature request? <a target="_blank" href="https://github.com/supra126/astro-starter-hashnode/issues">Open an issue</a>. Pull requests are welcome.</p>
<hr />
<p><strong>The web doesn't have a performance problem. It has a complexity problem.</strong> Most blogs don't need a JavaScript framework runtime. They need HTML, CSS, and a handful of interactive islands. Astro makes this the default, and the results speak for themselves: ~15 kB of JavaScript for a fully-featured blog with smooth page transitions — 10x less than the React equivalent.</p>
<p>Your readers — and their browsers — will thank you.</p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Sun, 08 Feb 2026 08:12:59 GMT</pubDate>

    </item>
    <item>
      <title>From Side Project to Open Source: Why I Built My Own URL Shortener</title>
      <link>https://suprahuang.cc/from-side-project-to-open-source-why-i-built-my-own-url-shortener</link>
      <guid isPermaLink="true">https://suprahuang.cc/from-side-project-to-open-source-why-i-built-my-own-url-shortener</guid>
      <description>Ever stared at a third-party service&apos;s pricing page and thought, &quot;I could build this myself&quot;? That&apos;s exactly how Open Short URL was born.

The Moment Everything Changed
It started with a simple realization: I was paying for something I didn&apos;t fully c...</description>
      <content:encoded><![CDATA[<blockquote>
<p>Ever stared at a third-party service's pricing page and thought, "I could build this myself"? That's exactly how Open Short URL was born.</p>
</blockquote>
<h2 id="heading-the-moment-everything-changed">The Moment Everything Changed</h2>
<p>It started with a simple realization: I was paying for something I didn't fully control.</p>
<p>Like many developers, I relied on third-party URL shorteners for everything—marketing campaigns, sharing links, tracking clicks. It worked fine until it didn't. Google URL Shortener shut down. Other services started limiting features behind expensive paywalls. And worst of all, I had no idea who was looking at my click data.</p>
<p>That's when I asked myself: <strong>What if I just built my own?</strong></p>
<h2 id="heading-the-problem-with-third-party-url-shorteners">The Problem with Third-Party URL Shorteners</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770667594032/2d7285aa-82c6-44f5-ae27-cd17510f500e.webp" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-hidden-costs-of-free">The Hidden Costs of "Free"</h3>
<p>When Google finally shut down goo.gl redirects in 2025 — seven years after first announcing its deprecation — millions of links became ticking time bombs. It was a wake-up call for anyone relying on free services.</p>
<p>But even paid services come with hidden costs:</p>
<ul>
<li><p><strong>Data Privacy</strong>: Your click data—geographic locations, devices, referrers—sits on someone else's servers. Who's analyzing it? Who are they selling it to?</p>
</li>
<li><p><strong>Brand Perception</strong>: Let's be honest, a <code>bit.ly</code> link in a professional email looks... questionable. Some spam filters even flag them.</p>
</li>
<li><p><strong>Feature Limitations</strong>: Want A/B testing? Pay more. Need API access? Pay more. Custom domains? You guessed it.</p>
</li>
</ul>
<h3 id="heading-why-existing-open-source-solutions-werent-enough">Why Existing Open-Source Solutions Weren't Enough</h3>
<p>I'm not the first person to have this idea. There are solid open-source alternatives:</p>
<ul>
<li><p><strong>YOURLS</strong>: The OG of self-hosted URL shorteners. But it's PHP-based, and I'm more comfortable with TypeScript. The UI also feels dated.</p>
</li>
<li><p><strong>Shlink</strong>: Excellent API design and Docker support. But I wanted a more complete frontend experience out of the box.</p>
</li>
<li><p><strong>Kutt</strong>: Clean and simple, but missing some features I needed.</p>
</li>
</ul>
<p>None of them were <em>wrong</em>—they just weren't <em>right for me</em>. I wanted something built with a modern stack, something I could extend and customize, and honestly, something that would be fun to build.</p>
<h2 id="heading-tech-stack-decisions">Tech Stack Decisions</h2>
<p>Here's where things get interesting. Choosing a tech stack for a side project is like choosing what to cook for dinner—you could go with something safe, or you could experiment.</p>
<p>I chose to experiment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770667605671/ce451dc8-bd64-4629-b2dd-046dbc4190e6.webp" alt class="image--center mx-auto" /></p>
<h3 id="heading-backend-nestjs-fastify">Backend: NestJS + Fastify</h3>
<p>Most Node.js projects default to Express. It's battle-tested, well-documented, and... slow.</p>
<p>I went with <strong>Fastify</strong> instead. The benchmarks don't lie—Fastify routinely handles 2-3x more requests per second than Express. For a URL shortener, where redirects need to be lightning-fast, this matters.</p>
<p><strong>NestJS</strong> on top of Fastify gives me the best of both worlds: Fastify's performance with NestJS's modular architecture. Dependency injection, decorators, and a clear project structure make the codebase maintainable as it grows.</p>
<h3 id="heading-frontend-nextjs-16-react-19">Frontend: Next.js 16 + React 19</h3>
<p>I'll admit it—part of choosing Next.js 16 and React 19 was pure curiosity. I wanted to play with the latest features.</p>
<p>But beyond the hype, the App Router and Server Components genuinely improve the developer experience. The dashboard loads fast, navigation is smooth, and the codebase is cleaner than anything I could've built with the old Pages Router.</p>
<h3 id="heading-database-postgresql-prisma">Database: PostgreSQL + Prisma</h3>
<p>NoSQL might be trendy, but for a URL shortener with complex relationships (users, URLs, clicks, bundles, webhooks), a relational database makes more sense.</p>
<p><strong>Prisma</strong> as the ORM was an easy choice. Type-safe queries, automatic migrations, and a beautiful query API. It catches errors before they happen.</p>
<h3 id="heading-the-redis-optional-design">The "Redis Optional" Design</h3>
<p>Here's a design decision I'm particularly proud of: <strong>Redis is completely optional</strong>.</p>
<p>Many self-hosted solutions require Redis, which adds complexity and cost. Open Short URL works perfectly without it—the system automatically falls back to in-memory storage for caching and rate limiting.</p>
<pre><code class="lang-plaintext">Without Redis: Perfect for &lt; 10K daily clicks
With Redis: Handles 100K+ clicks with 10-20ms redirects
</code></pre>
<p>This means you can start simple and scale up when needed. No configuration changes required—the system detects Redis automatically and adapts.</p>
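<p>The pattern can be sketched as a small cache interface with two interchangeable backends. This is a hypothetical shape, not the project's actual code; the names <code>Cache</code>, <code>MemoryCache</code>, and <code>createCache</code> are assumptions for illustration:</p>

```typescript
// Minimal sketch of the Redis-optional pattern (hypothetical, not the project's code).
// Both backends satisfy one interface; the app picks a backend once at startup.
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory fallback: a Map with a per-key expiry timestamp.
class MemoryCache implements Cache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt < Date.now()) {
      this.store.delete(key); // expired entries are evicted lazily on read
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Startup detection: use Redis when a URL is configured, otherwise fall back.
function createCache(redisUrl?: string): Cache {
  if (redisUrl) {
    // A real Redis-backed implementation (e.g. via ioredis) would be returned here.
  }
  return new MemoryCache();
}
```

<p>Because every caller depends only on the interface, swapping in Redis later is a one-line change at startup rather than a refactor.</p>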
<h2 id="heading-features-worth-highlighting">Features Worth Highlighting</h2>
<p>Building a basic URL shortener takes a weekend. Building one with features you'd actually <em>want</em> to use? That takes longer. Here are some features I found interesting to implement.</p>
<h3 id="heading-dynamic-slug-length">Dynamic Slug Length</h3>
<p>Most URL shorteners use fixed-length slugs. Open Short URL dynamically adjusts based on how many URLs exist:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Simplified logic</span>
<span class="hljs-keyword">if</span> (urlCount &lt; <span class="hljs-number">1000</span>) <span class="hljs-keyword">return</span> <span class="hljs-number">4</span>;      <span class="hljs-comment">// abc1</span>
<span class="hljs-keyword">if</span> (urlCount &lt; <span class="hljs-number">50000</span>) <span class="hljs-keyword">return</span> <span class="hljs-number">5</span>;     <span class="hljs-comment">// abc12</span>
<span class="hljs-keyword">if</span> (urlCount &lt; <span class="hljs-number">500000</span>) <span class="hljs-keyword">return</span> <span class="hljs-number">6</span>;    <span class="hljs-comment">// abc123</span>
<span class="hljs-keyword">return</span> <span class="hljs-number">7</span>;                           <span class="hljs-comment">// abc1234</span>
</code></pre>
<p>This keeps URLs as short as possible while ensuring uniqueness. Small detail, but it matters.</p>
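<p>A generator pairing that length logic with collision retries might look like this. It's an illustrative sketch: <code>generateSlug</code> and the <code>exists</code> lookup are hypothetical names, and a real implementation would query the database:</p>

```typescript
// Hypothetical companion to the length logic above: pick a random slug of the
// chosen length from a URL-safe alphabet, retrying a few times on collision.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function slugLength(urlCount: number): number {
  if (urlCount < 1000) return 4;      // abc1
  if (urlCount < 50000) return 5;     // abc12
  if (urlCount < 500000) return 6;    // abc123
  return 7;                           // abc1234
}

function randomSlug(length: number): string {
  let slug = '';
  for (let i = 0; i < length; i++) {
    slug += ALPHABET[Math.floor(Math.random() * ALPHABET.length)];
  }
  return slug;
}

// exists() stands in for a database uniqueness check (assumption, illustrative only).
async function generateSlug(
  urlCount: number,
  exists: (slug: string) => Promise<boolean>,
): Promise<string> {
  const length = slugLength(urlCount);
  for (let attempt = 0; attempt < 5; attempt++) {
    const slug = randomSlug(length);
    if (!(await exists(slug))) return slug;
  }
  // Repeated collisions mean the space is getting crowded: add one character.
  return randomSlug(length + 1);
}
```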
<h3 id="heading-built-in-ab-testing">Built-in A/B Testing</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770667624954/82424eeb-5aa3-45b4-b55c-6377d3d40823.webp" alt class="image--center mx-auto" /></p>
<p>Want to test which landing page converts better? Create URL variants with traffic allocation:</p>
<ul>
<li><p>Variant A: <code>example.com/page-v1</code> → 50% traffic</p>
</li>
<li><p>Variant B: <code>example.com/page-v2</code> → 50% traffic</p>
</li>
</ul>
<p>The system automatically distributes clicks and tracks conversion data for each variant. No external tools needed.</p>
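<p>The core of traffic allocation is a weighted random pick. Here's a minimal sketch of how that selection can work; the <code>Variant</code> shape and <code>pickVariant</code> name are assumptions, not the project's actual API:</p>

```typescript
// Sketch of weighted A/B variant selection (illustrative; names are assumptions).
interface Variant {
  url: string;
  weight: number; // relative traffic share, e.g. 50 for 50%
}

// roll is injectable for testing; defaults to a uniform random draw.
function pickVariant(variants: Variant[], roll: number = Math.random()): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  let threshold = roll * total;
  for (const v of variants) {
    threshold -= v.weight;
    if (threshold < 0) return v; // roll landed inside this variant's band
  }
  return variants[variants.length - 1]; // guard against floating-point edge cases
}
```

<p>In practice the roll can also be derived from a hash of a visitor identifier, so returning visitors consistently land on the same variant.</p>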
<h3 id="heading-smart-routing">Smart Routing</h3>
<p>Conditional redirects based on:</p>
<ul>
<li><p>Device type (mobile users → app store, desktop → website)</p>
</li>
<li><p>Geographic location (US visitors → .com, Taiwan visitors → .tw)</p>
</li>
<li><p>Time of day (business hours → sales page, after hours → support)</p>
</li>
</ul>
<p>You can set up rules with priorities, and the system evaluates them in order to determine where each visitor should go.</p>
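<p>The evaluation itself is simple: sort by priority, take the first rule that matches, fall back to the default destination. A sketch under assumed names (<code>Rule</code>, <code>Visit</code>, <code>resolveTarget</code> are illustrative, not the project's API):</p>

```typescript
// Sketch of priority-ordered conditional redirects (illustrative; names are assumptions).
interface Visit {
  device: 'mobile' | 'desktop';
  country: string;
  hour: number; // local hour of day, 0-23
}

interface Rule {
  priority: number; // lower number = evaluated first
  matches(visit: Visit): boolean;
  target: string;
}

function resolveTarget(rules: Rule[], visit: Visit, fallback: string): string {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (rule.matches(visit)) return rule.target; // first match wins
  }
  return fallback; // no rule matched: use the URL's default destination
}
```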
<h3 id="heading-mcp-server-ready-for-the-ai-era">MCP Server: Ready for the AI Era</h3>
<p>This one's a bit unconventional. Open Short URL includes an <strong>MCP (Model Context Protocol) server</strong> that lets AI assistants manage your URLs.</p>
<p>Install it globally:</p>
<pre><code class="lang-bash">npm install -g @open-short-url/mcp
</code></pre>
<p>Configure it in Claude Desktop or Cursor, and you can say things like:</p>
<ul>
<li><p>"Create a short URL for example.com with the slug 'promo'"</p>
</li>
<li><p>"Show me click statistics for my campaign URLs"</p>
</li>
<li><p>"Set up A/B testing with 60/40 traffic split"</p>
</li>
</ul>
<p>Is this necessary? No. Is it cool? Absolutely.</p>
<h2 id="heading-lessons-learned">Lessons Learned</h2>
<h3 id="heading-url-shortening-is-harder-than-it-looks">URL Shortening Is Harder Than It Looks</h3>
<p>The basic concept is simple: store a mapping, redirect when visited. But production-ready URL shortening involves:</p>
<ul>
<li><p><strong>Click tracking performance</strong>: Recording every click without slowing down redirects</p>
</li>
<li><p><strong>Bot detection</strong>: Googlebot, Bingbot, and countless crawlers inflate your stats</p>
</li>
<li><p><strong>Geographic parsing</strong>: IP-to-location lookups add latency if not cached</p>
</li>
<li><p><strong>Concurrent access</strong>: Multiple clicks to the same URL at the exact same millisecond</p>
</li>
</ul>
<p>Each of these took more time than I expected.</p>
<h3 id="heading-what-actually-took-the-most-time">What Actually Took the Most Time</h3>
<p>Surprisingly, it wasn't the core URL shortening logic. It was:</p>
<ol>
<li><p><strong>The webhook system</strong>: Ensuring reliable delivery with retries, HMAC signatures, and detailed logging took weeks.</p>
</li>
<li><p><strong>The analytics dashboard</strong>: Getting Recharts to display time-series data exactly how I wanted was... an adventure.</p>
</li>
<li><p><strong>Edge cases</strong>: What happens when a URL expires mid-click? When Redis goes down? When someone submits a slug that's already taken?</p>
</li>
</ol>
<p>Side projects have a way of revealing just how many edge cases exist in "simple" features.</p>
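<p>To make the HMAC-signature idea from the webhook work concrete, here is a minimal sketch using Node's built-in <code>crypto</code> module. The header name and secret format are assumptions; the key points are signing the raw body and comparing in constant time:</p>

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch of HMAC-signed webhook payloads (illustrative, not the project's exact code).
// The sender signs the raw request body; the receiver recomputes the signature
// and compares it with timingSafeEqual to avoid timing side channels.
function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body).digest('hex');
}

function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), 'hex');
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

<p>The receiver must verify against the raw bytes it received, before any JSON parsing or re-serialization, since even whitespace changes produce a different signature.</p>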
<h2 id="heading-why-open-source">Why Open Source?</h2>
<p>I've benefited enormously from open source throughout my career. YOURLS taught me how URL shorteners work. NestJS documentation helped me structure the backend. Countless Stack Overflow answers saved me hours of debugging.</p>
<p>Open sourcing Open Short URL under the MIT license is my way of giving back. No commercial plans, no "open core" upsells—just a tool that I hope others find useful.</p>
<p>Open source projects also get better through community feedback. Bug reports I'd never find on my own. Feature requests I'd never think of. PRs that improve the codebase in ways I couldn't have imagined.</p>
<h2 id="heading-whats-next">What's Next</h2>
<p>The roadmap includes:</p>
<ul>
<li><p><strong>Open Graph Customization</strong>: Custom social previews for shared links</p>
</li>
<li><p><strong>Deep Linking</strong>: Native app integration (App Links / Universal Links)</p>
</li>
<li><p><strong>Retargeting Pixels</strong>: Integration with ad platforms</p>
</li>
<li><p><strong>More one-click deploy options</strong>: DigitalOcean, Vercel, Netlify templates</p>
</li>
</ul>
<p>But more than features, I'm excited to see how others use it. What workflows will people build? What integrations will they create?</p>
<hr />
<h2 id="heading-try-it-yourself">Try It Yourself</h2>
<p>Open Short URL is free, open source, and ready to deploy:</p>
<ul>
<li><p><strong>GitHub</strong>: <a target="_blank" href="https://github.com/supra126/open-short-url">github.com/supra126/open-short-url</a></p>
</li>
<li><p><strong>Documentation</strong>: <a target="_blank" href="https://supra126.github.io/open-short-url">supra126.github.io/open-short-url</a></p>
</li>
<li><p><strong>One-click deploy</strong>: Railway (more platforms coming soon)</p>
</li>
<li><p><strong>MCP Package</strong>: <a target="_blank" href="https://www.npmjs.com/package/@open-short-url/mcp">@open-short-url/mcp</a></p>
</li>
</ul>
<p>If you find it useful, a star on GitHub would mean a lot. If you find bugs, issues are welcome. And if you want to contribute, PRs are always open.</p>
<p>Sometimes the best way to solve a problem is to build the solution yourself. And if you're lucky, that solution might help others too.</p>
]]></content:encoded>
      <author>黃小黃</author>
      <pubDate>Wed, 04 Feb 2026 21:58:21 GMT</pubDate>
      <category>side project</category>
      <category>Url Shortener</category>
      <category>TypeScript</category>
      <category>nestjs</category>
      <category>Open Source</category>
    </item>
  </channel>
</rss>