Adding a Free AI Assistant to Your Blog with Cloudflare Workers AI

Hossain Khan

After recently setting up my blog on Astro and Cloudflare Workers, I started wondering what else I could tinker with now that I have full control over my own site 😆. I knew Cloudflare offers a free daily quota for AI models, so I thought it would be fun to build a little AI assistant directly into the blog posts - something that could summarize the post and answer questions about it, all powered by Workers AI.

This post walks through how I added an AI summarizer and Q&A panel directly into each blog post using Cloudflare Workers AI, and how I used AI Gateway to make sure I never get an unexpected bill.

What it does

Scroll to the bottom of any post on this blog and you’ll see a small collapsible panel - ❓ “Ask AI about this post”. Click it and you can:

  1. Summarize the post - one click to get a 3-5 sentence summary
  2. Ask follow-up questions - dig into anything from the post for more context or details

The assistant only answers based on the post content. I intentionally scoped it this way so it stays on-topic and doesn’t start making things up.

How it’s wired together

There are two pieces - a client-side Astro component for the UI, and a server-side API route that calls Workers AI.

Browser
  └── AiPostAssistant.astro  (Astro component - UI + fetch logic)
        │  POST /api/ai-chat

Cloudflare Worker
  └── src/pages/api/ai-chat.ts  (API route)
        │  env.AI.run() with gateway option

AI Gateway  (rate limiting, caching, observability)


Workers AI  (@cf/meta/llama-3.1-8b-instruct-fp8)

Responses are streamed back as Server-Sent Events (SSE) so the text appears token-by-token, which gives that nice “typing” feel.
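
To make the "token-by-token" part concrete, here is a minimal sketch of parsing those SSE chunks, assuming each event line has the shape `data: {"response":"<token>"}` and the stream terminates with `data: [DONE]` (the format Workers AI uses for streamed text responses):

```typescript
// Sketch: turn a raw SSE chunk from Workers AI into plain text tokens.
// Assumes `data: {"response":"<token>"}` lines ending with `data: [DONE]`.
function extractTokens(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    try {
      const parsed = JSON.parse(payload) as { response?: string };
      if (typeof parsed.response === "string") tokens.push(parsed.response);
    } catch {
      // Ignore JSON fragments split across chunk boundaries.
    }
  }
  return tokens;
}
```

In the real component each extracted token gets appended to the UI as it arrives, which is what produces the "typing" feel.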

The API endpoint

Since the blog runs on Cloudflare Workers, I get access to the env.AI binding for free - no API keys needed, no separate service to authenticate against.

// src/pages/api/ai-chat.ts
import type { APIRoute } from "astro";
import { env } from "cloudflare:workers";

export const POST: APIRoute = async ({ request }) => {
  const { content, messages } = await request.json();

  const systemMessage = {
    role: "system",
    content: `You are a helpful assistant for a tech blog.
Answer questions based only on the blog post content below.
If asked for a summary, provide a structured 3–5 sentence summary.

Blog post content:
---
${content.slice(0, 8_000)}
---`,
  };

  const stream = await env.AI.run(
    "@cf/meta/llama-3.1-8b-instruct-fp8",
    { messages: [systemMessage, ...messages], stream: true, max_tokens: 1024 }
  );

  return new Response(stream, {
    headers: {
      "content-type": "text/event-stream",
      "cache-control": "no-cache",
    },
  });
};

A few things worth noting here:

  1. The post content is capped at 8,000 characters before it goes into the system prompt, which keeps the request comfortably inside the model's context window.
  2. The system message restricts the model to the post content only - that's what keeps answers on-topic.
  3. stream: true is what enables the token-by-token SSE response.

The UI component

The Astro component sits at the bottom of every post, collapsed by default so it doesn’t get in the way of reading.

// Excerpt from AiPostAssistant.astro <script>

// Grab the full post text from the DOM
const articleEl = document.querySelector("article");
const postContent = articleEl?.innerText ?? "";

// Running conversation, sent with every request
const conversationHistory = [];

async function sendMessage(userMessage) {
  // Record the user's message so the API sees the full conversation
  conversationHistory.push({ role: "user", content: userMessage });

  const res = await fetch("/api/ai-chat", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      content: postContent,
      messages: conversationHistory,
    }),
  });

  // Stream the SSE response token-by-token
  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value);
    // Parse SSE data lines and append tokens to the UI
    appendTokenToUI(chunk);
  }
}

One nice thing here: the post content is read directly from the rendered DOM (article element), so I don’t need to pass anything extra as props. The component is completely self-contained.
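
Since the server truncates to 8,000 characters anyway, the extracted text can be trimmed client-side too, just to keep the request payload small. A minimal sketch (this helper is illustrative, not part of the original component):

```typescript
// Sketch: normalize and cap the article text before POSTing it.
// MAX_CHARS mirrors the server-side content.slice(0, 8_000) cap.
const MAX_CHARS = 8_000;

function preparePostContent(raw: string): string {
  // Collapse runs of whitespace so more of the post fits in the budget.
  const normalized = raw.replace(/\s+/g, " ").trim();
  return normalized.length > MAX_CHARS
    ? normalized.slice(0, MAX_CHARS)
    : normalized;
}
```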

Keeping it free with AI Gateway

Workers AI gives you 10,000 free neurons per day. A typical request to the llama model costs around 60 neurons, which works out to roughly 166 free requests before charges kick in. For a personal blog that’s fine - until a post gets picked up somewhere and suddenly 500 people hit the summarize button in an hour. At that point you’re paying.

I wanted to avoid that scenario entirely without writing any custom quota-tracking code. Turns out Cloudflare has a native solution for this called AI Gateway.

What is AI Gateway?

AI Gateway is a proxy layer that sits between your Worker and the AI provider. For Workers AI it gives you:

  1. Rate limiting - cap how many requests get through per time window
  2. Caching - repeated identical requests can be served without hitting the model
  3. Observability - request counts, token usage, and cost analytics in the dashboard

The really nice part is that connecting it to your Worker is literally one extra argument to env.AI.run().

Setting it up

Step 1 - Create the gateway

In the Cloudflare Dashboard, go to AI → AI Gateway → Create Gateway. Give it a name - I used my-blog.

Step 2 - Add the gateway ID to wrangler.jsonc

// wrangler.jsonc
{
  "vars": {
    "AI_MODEL": "@cf/meta/llama-3.1-8b-instruct-fp8",
    "AI_GATEWAY_ID": "my-blog"
  }
}
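
For type safety it can help to model those vars explicitly. The shape below is an illustrative assumption (in a real project you'd use the `Ai` type from @cloudflare/workers-types for the binding):

```typescript
// Sketch: a typed view of the Worker environment, matching the vars above.
interface Env {
  AI: { run(model: string, inputs: unknown, options?: unknown): Promise<unknown> };
  AI_MODEL: string;
  AI_GATEWAY_ID: string;
}

// The vars from wrangler.jsonc surface on env at runtime:
const exampleVars: Pick<Env, "AI_MODEL" | "AI_GATEWAY_ID"> = {
  AI_MODEL: "@cf/meta/llama-3.1-8b-instruct-fp8",
  AI_GATEWAY_ID: "my-blog",
};
```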

Step 3 - Pass the gateway option to env.AI.run()

// src/pages/api/ai-chat.ts  - the only code change needed
const gatewayId = env.AI_GATEWAY_ID ?? "";
const gatewayOptions = gatewayId ? { gateway: { id: gatewayId } } : {};

const stream = await env.AI.run(
  model,
  { messages: allMessages, stream: true, max_tokens: 1024 },
  gatewayOptions,
);

That’s really it. No API token, no changing the request URL — the Workers AI binding handles authentication automatically when you use it this way. See the Cloudflare AI Gateway - Workers Binding docs for more detail.

Step 4 - Set a rate limit in the dashboard

In the gateway settings, enable Rate Limiting. The dashboard only supports up to a 1-hour window (no daily option), so set it per hour and keep the daily math in mind:

Setting     Value
Requests    6
Window      1 Hour
Type        Fixed

6 requests/hour × 24 hours = 144 requests/day, and 144 requests × ~60 neurons ≈ 8,640 neurons/day, which stays comfortably under the 10,000-neuron free tier. When the limit is hit, the gateway returns a 429 response and stops forwarding requests.
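
That arithmetic, spelled out (figures are the rough averages quoted in this post, not exact billing numbers):

```typescript
// Sanity-check the rate-limit math against the free tier.
const REQUESTS_PER_HOUR = 6;
const NEURONS_PER_REQUEST = 60;     // rough average for the llama model
const FREE_NEURONS_PER_DAY = 10_000;

const requestsPerDay = REQUESTS_PER_HOUR * 24;                  // 144
const neuronsPerDay = requestsPerDay * NEURONS_PER_REQUEST;     // 8,640
const underFreeTier = neuronsPerDay <= FREE_NEURONS_PER_DAY;    // true
```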

P.S. Even if you want to stay strictly under the free tier, I recommend bumping the limit up a bit (e.g. 15 req/hour) to avoid accidentally blocking real users. The cost of a few extra requests is negligible, and it gives you some breathing room.

Handling the 429

When the gateway blocks a request, the Worker catches the error and returns something readable:

// In the catch block of the API route
const errMsg = e instanceof Error ? e.message : String(e);
if (errMsg.includes("429") || errMsg.toLowerCase().includes("rate limit")) {
  return new Response(
    JSON.stringify({ error: "Daily AI usage limit reached. Please try again tomorrow." }),
    { status: 429 }
  );
}

The frontend checks for status 429 and shows the message directly - no ugly stack traces showing up in the UI.
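
A minimal sketch of that client-side check (the helper name is illustrative; it assumes the 429 body is the JSON `{ "error": "..." }` shape returned above):

```typescript
// Sketch: pick a user-facing message based on the API error response.
function messageForError(status: number, bodyText: string): string {
  if (status === 429) {
    try {
      const parsed = JSON.parse(bodyText) as { error?: string };
      if (parsed.error) return parsed.error;
    } catch {
      // Fall through to the generic message if the body isn't JSON.
    }
  }
  return "Something went wrong. Please try again.";
}
```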

Caching is a nice bonus

AI Gateway can also cache responses (enable caching in the gateway settings). If 10 different visitors all ask "summarize this post" on the same day, only the first request actually hits Workers AI. The rest get the cached response instantly, for free. This helps stretch that ~144 requests/day budget even further.

What does it actually cost?

Scenario                                       Neurons used    Cost
144 requests/day × 60 neurons (6/hr × 24hr)    ~8,640          $0.00 (under free tier)
Cache hit (repeated question)                  0               $0.00
Over limit (rate-limited by gateway)           0               $0.00

For a personal blog, ~144 requests a day is plenty. And once you’ve set the rate limit, you can forget about it.

Seeing it in the dashboard

After deploying, the AI Gateway dashboard started filling in with real data - request counts, token breakdown (input vs output), calculated cost, cache hit rate. It was the first time I could see exactly what an AI feature was costing me in real time, right down to the sub-cent level.

Honestly that visibility alone made the AI Gateway integration worth it, even if the rate limiting wasn’t needed.

Source code

The full implementation is open source:

hossains-dev-bytes

Astro blogging site hosted using Cloudflare workers.


If you want to dig into the details, the main files are:

  1. src/pages/api/ai-chat.ts - the API route that calls Workers AI through the gateway
  2. AiPostAssistant.astro - the collapsible UI component embedded in each post
  3. wrangler.jsonc - the model and gateway configuration

If you’re already running a blog on Cloudflare Workers, adding this is pretty low effort - especially with an AI coding agent to help. Give it a try 👍🏽
