Making My Website AI-Agent Friendly — Here's What Changed
The Test That Failed
Last weekend, I pointed an AI agent at my own blog and asked it a simple question about an article I’d just published — my hands-on experiment with self-reflection on Amazon Bedrock [12]: “What scored 3/15 and why?”
The agent received 29,099 bytes of HTML. After stripping navigation, CSS, scripts, headers, and footers, only about 4,600 characters of actual content remained — 69% of the response was noise. The agent consumed 6,083 input tokens, then gave a confused answer about “personal growth.” It couldn’t find the article content buried in the markup.
```shell
# Before: Agent receives HTML
$ curl -sI https://schristoph.online/blog/when-thinking-twice-helps/ | grep content-type
content-type: text/html
# 29,099 bytes. 69% noise. Agent answer: wrong.
```
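The noise ratio can be approximated with a quick script. This sketch uses naive regex-based tag stripping and an illustrative sample page, not the real 29 KB response, so the number it prints is only demonstrative:

```javascript
// Rough sketch of the noise measurement: what fraction of an HTML
// response is markup rather than readable content? The sample page
// and the regex-based stripping are illustrative only.
function noiseRatio(html) {
  var content = html
    // drop whole noise blocks: scripts, styles, nav, header, footer
    .replace(/<(script|style|nav|header|footer)[\s\S]*?<\/\1>/gi, '')
    // drop all remaining tags, keeping their text content
    .replace(/<[^>]+>/g, '')
    .trim();
  return 1 - content.length / html.length;
}

var sample =
  '<html><head><style>body{margin:0}</style></head>' +
  '<body><nav>Home | Blog</nav><article>What scored 3/15 and why?</article>' +
  '<footer>© 2026</footer></body></html>';
console.log((noiseRatio(sample) * 100).toFixed(0) + '% noise'); // → 84% noise
```

A real measurement would use a proper HTML parser, but even this crude version shows how quickly markup dominates a page's byte count.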
This bothered me. Not because the agent was dumb — but because I was serving it the wrong format. My website was built for browsers. AI agents aren’t browsers.
Same question, same model, same URL — different format, different result.
There’s an irony here: I recently wrote about running my entire workday through an AI agent [13] — meetings, research, CRM, content creation. The agent that writes my articles couldn’t properly read my website. The tool I use to produce content was being failed by the infrastructure I use to publish it.
The Numbers Behind the Shift

Two audiences, one website — the traffic is shifting.
This isn’t just my problem. The web is experiencing a fundamental traffic shift, and most website owners can’t even see it happening.
AI agents are already consuming the web at scale. DataDome recorded 7.9 billion AI agent requests in January and February 2026 alone — a 5% increase over Q4 2025. For some enterprises, agentic traffic now accounts for nearly 10% of total traffic [1].
The industry breakdown from Adobe Analytics tells the story [2]:
| Industry | AI Traffic Growth (YoY) |
|---|---|
| Retail | 4,700% |
| Travel & Leisure | 3,500% |
| Financial Services | 266% |
| Tech & Software | 120% |
| Media & Entertainment | 92% |
But here’s the catch: 89% of e-commerce stores can’t even see their AI traffic — it shows up as “Direct” in Google Analytics [3]. The traffic is there. The attribution is broken.
Meanwhile, the other side of the equation is equally dramatic. Traditional search referrals have declined 60% for small publishers over the past two years, 47% for medium publishers, and 22% for large ones [4]. Google’s market share dropped from 89% in 2023 to 71% by Q4 2025 [5]. Nearly 59% of users say AI has reduced or replaced their use of traditional search engines [6].
Media & Entertainment: The Sharpest Edge
As a Solutions Architect working in the Media & Entertainment sector, this hits close to home. A Growtika study analyzed 10 major US tech media sites and found a 58% collective decline in organic visits — from 112 million monthly at peak to 47 million by January 2026 [7]. Digital Trends lost 97% of its traffic. ZDNet lost 90%. The Verge lost 85%.
The pattern is clear: content that AI can synthesize and serve directly — how-to guides, product reviews, factual summaries — is being consumed inside AI interfaces rather than on publisher websites. The Retail Economics report (co-published with AWS, Botify, and DataDome) found that OpenAI crawls retail sites 198 times for every 1 visit it sends back. Google’s ratio is 6:1 [8]. AI systems are reading your content at massive scale — they’re just not sending visitors.
This creates a strategic choice: you can fight the shift (block AI crawlers, like Amazon did in August 2025), monetize it (Cloudflare’s pay-per-crawl), standardize it (IAB Tech Lab’s CoMP framework), or optimize for it — make your content easy for agents to consume so they represent you accurately.
I chose to optimize.
Three Layers of Agent-Friendliness
There are fundamentally three architectural approaches to serving AI agents, each putting the conversion logic in a different place:
| Approach | Where | Example | Best for |
|---|---|---|---|
| Origin-side generation | Build pipeline | Hugo `.md` output | Full control, content curation |
| CDN gateway conversion | Edge | Cloudflare Markdown for Agents | Zero-effort, flip a switch |
| Browser-native declaration | Client | WebMCP (`navigator.modelContext`) | Interactive apps, authenticated sessions |
The agent-friendliness stack: discovery, consumption, interaction — each layer builds on the previous.
Cloudflare’s approach is the easiest: flip a switch, and their CDN converts HTML to Markdown on-the-fly when an agent sends Accept: text/markdown. Zero origin changes. But it’s a mechanical conversion — you don’t control what the agent sees, and it doesn’t work for JavaScript-rendered content. It’s the right choice for sites that can’t change their build pipeline.
WebMCP is the most ambitious: a proposed W3C standard (Chrome 146 preview, February 2026) where websites declare callable tools — search, checkout, subscribe — that agents can invoke programmatically. It’s the future for interactive web apps, but overkill for serving blog posts. I include it here to complete the architectural picture, not as an actionable recommendation today.
I went with origin-side generation — generating clean Markdown at build time alongside HTML. For a Hugo static site, this is nearly free: the content is already in Markdown. I just needed Hugo to output it in a second format. Full control over what agents see, no vendor dependency, works with any CDN. The same approach works on Netlify, Vercel, or any static hosting — the Hugo templates are CDN-agnostic. Only the CloudFront Function is AWS-specific, and it’s 20 lines of JavaScript that translates to any edge compute platform.
What I Built

Two doors, same content — the agent picks the one it can read.
Three layers, each building on the previous:
Layer 1: Hugo Markdown Output Format
Hugo already has my content in Markdown. I added a custom output format [15] that emits .md files alongside HTML for every blog post. Now every post at /blog/my-post/ also exists at /blog/my-post/index.md — clean, structured, no HTML noise.
First, define the output format in config.toml:
```toml
[outputFormats.markdown]
mediaType = "text/markdown"
baseName = "index"
isPlainText = true

[outputs]
page = ["HTML", "markdown"]
```
Then create a template at layouts/blog/single.markdown.md that strips Hugo shortcodes and outputs clean Markdown:
```go-html-template
# {{ .Title }}
{{ .Date.Format "2006-01-02" }}

{{ $content := .RawContent -}}
{{- /* Convert figure shortcodes to standard Markdown images */ -}}
{{- $content = $content | replaceRE `\{\{<\s*figure\s+src="([^"]+)"\s+alt="([^"]*)"[^>]*>\}\}` "![$2]($1)" -}}
{{- /* Make relative paths absolute */ -}}
{{- $base := .Site.BaseURL | strings.TrimSuffix "/" -}}
{{- $content = $content | replaceRE `\!\[([^\]]*)\]\((/[^)]+)\)` (printf "![$1](%s$2)" $base) -}}
{{ $content }}
```
The key challenge: Hugo shortcodes like `{{< figure >}}` aren’t valid Markdown. The template uses `replaceRE` to convert them to standard `![alt](src)` image syntax. Relative paths are made absolute so the Markdown works when consumed outside the site context.
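The same two transformations can be sanity-checked outside Hugo. This JavaScript sketch applies equivalent regexes to an illustrative shortcode sample (the base URL is assumed to match the site config):

```javascript
// Sanity-check the two template regexes outside Hugo: convert a
// {{< figure >}} shortcode to a Markdown image, then absolutize
// root-relative image paths. The sample content is illustrative.
var BASE = 'https://schristoph.online'; // assumed site base URL, no trailing slash

function toCleanMarkdown(raw) {
  // figure shortcode -> standard Markdown image ($2 = alt text, $1 = src)
  raw = raw.replace(
    /\{\{<\s*figure\s+src="([^"]+)"\s+alt="([^"]*)"[^>]*>\}\}/g,
    '![$2]($1)'
  );
  // make root-relative image paths absolute
  raw = raw.replace(/!\[([^\]]*)\]\((\/[^)]+)\)/g, '![$1](' + BASE + '$2)');
  return raw;
}

var sample = '{{< figure src="/images/stack.png" alt="The stack" >}}';
console.log(toCleanMarkdown(sample));
// → ![The stack](https://schristoph.online/images/stack.png)
```

Hugo’s `replaceRE` uses Go’s RE2 syntax, which is close enough to JavaScript regexes for patterns like these to behave identically.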
Layer 2: Auto-Generated llms.txt
A Hugo template generates /llms.txt at build time — a structured site index following the llms.txt specification [16]. It lists every blog post with a title, description, and link to the .md version. When I publish a new post, the index updates automatically.
Add the output format to config.toml:
```toml
[outputFormats.llmstxt]
mediaType = "text/plain"
baseName = "llms"
suffix = "txt"
isPlainText = true

[outputs]
home = ["HTML", "llmstxt"]
```
The template at layouts/_default/index.llmstxt.txt:
```go-html-template
# {{ .Site.Title }}

> {{ .Site.Params.description }}

## Blog Posts
{{ range (where .Site.RegularPages "Section" "blog").ByDate.Reverse -}}
- [{{ .Title }}]({{ .Permalink }}index.md): {{ .Summary | plainify | truncate 200 }}
{{ end }}

## Optional
- [About]({{ "about/" | absURL }})
```
```shell
$ curl https://schristoph.online/llms.txt | head -8
# schristoph.online
> Personal homepage and blog of Stefan Christoph
## Blog Posts
- [When Thinking Twice Helps — And When It Doesn't](https://schristoph.online/blog/when-thinking-twice-helps/index.md): ...
- ["It's Faster If I Just Do It Myself" — The Most Expensive Sentence in AI](https://schristoph.online/blog/ai-agents-require-patience/index.md): ...
- [The AI Investment Paradox — A 1962 Book Explains Why Billions Don't (Yet) Deliver](https://schristoph.online/blog/the-ai-investment-paradox/index.md): ...
```
Layer 3: CloudFront Content Negotiation
I updated the existing CloudFront Function [17] to inspect the Accept header. When an AI agent sends Accept: text/markdown, the function rewrites the URI to serve the .md file instead of index.html. This is a viewer-request function running in CloudFront’s JavaScript 2.0 runtime [18]:
```javascript
function handler(event) {
    var request = event.request;
    var uri = request.uri;
    var headers = request.headers || {};

    // Check if Accept header contains text/markdown
    var wantsMarkdown = false;
    if (headers['accept'] && headers['accept'].value) {
        wantsMarkdown = headers['accept'].value.toLowerCase().indexOf('text/markdown') !== -1;
    }

    // For blog paths, serve index.md if markdown requested
    if (uri.endsWith('/')) {
        request.uri += wantsMarkdown && uri.startsWith('/blog/') && uri !== '/blog/'
            ? 'index.md'
            : 'index.html';
    } else if (!uri.includes('.')) {
        request.uri += wantsMarkdown && (uri + '/').startsWith('/blog/')
            ? '/index.md'
            : '/index.html';
    }
    return request;
}
```
The function only serves Markdown for blog post paths — not for the homepage, about page, or other non-content pages where Markdown wouldn’t make sense.
For this to work, the Accept header must be part of the CloudFront cache key [19] — otherwise CloudFront would cache the first response (HTML or Markdown) and serve it to everyone. In CDK:
```typescript
const contentNegotiationCachePolicy = new cloudfront.CachePolicy(
  this, 'ContentNegotiationCachePolicy', {
    headerBehavior: cloudfront.CacheHeaderBehavior.allowList('Accept'),
    queryStringBehavior: cloudfront.CacheQueryStringBehavior.none(),
    cookieBehavior: cloudfront.CacheCookieBehavior.none(),
    defaultTtl: cdk.Duration.days(1),
    maxTtl: cdk.Duration.days(365),
  }
);
```
```shell
# Same URL, different content based on Accept header:
$ curl -sI https://schristoph.online/blog/technology-spirals/ | grep content-type
content-type: text/html
$ curl -sI -H "Accept: text/markdown" https://schristoph.online/blog/technology-spirals/ | grep content-type
content-type: text/markdown; charset=utf-8
```
The entire stack runs on S3 + CloudFront + a CloudFront Function. No Lambda, no server-side processing, no additional cost beyond the function invocations ($0.10 per million requests).
The full stack: Hugo generates both formats, CloudFront serves the right one.
The Before/After

Same content, 69% less noise — the agent gets signal, not markup.
Same model (Amazon Nova Micro), same question, same URL. The only difference: what format the agent received.
| Metric | Before | After | Change |
|---|---|---|---|
| Content served | HTML | Markdown | |
| Response size | 29,099 bytes | 15,580 bytes | 46% smaller |
| Noise ratio | 69% | ~0% | Clean signal |
| Input tokens | 6,083 | 2,957 | 51% fewer tokens |
| Agent answer | ❌ Wrong | ✅ Correct | |
The “before” agent gave a confused answer about “personal growth.” The “after” agent correctly identified that the MCP question scored 3/15 because the model hallucinated “Model Confidence Prediction” instead of Model Context Protocol — and explained why self-reflection couldn’t fix a knowledge gap.
Half the tokens. Correct answer. Same URL.
One caveat on methodology: the before-state agent actually received the homepage HTML rather than the article HTML — my CloudFront config serves a fallback page for non-existent paths. That’s a realistic failure mode (many static sites behave this way), but a fairer comparison would be agent-on-actual-article-HTML vs agent-on-Markdown. Even in that case, the 69% noise ratio of the article’s HTML page means significant token waste and degraded comprehension.
The Honest Reality Check
Here’s where I need to be transparent: almost no AI agent actually uses any of this today.
Dries Buytaert (Drupal/Acquia founder) made every page on his site available as Markdown in January 2026, then analyzed a month of logs [9]. His findings:
- llms.txt: 52 requests per month. Every single one from SEO audit tools. Zero from AI crawlers.
- Content negotiation: Zero requests with `Accept: text/markdown`. Not one.
- Markdown URLs: Some bots fetch them (GPTBot: 34.8% of requests as `.md`), but serving Markdown increased total bot traffic by 7% — bots crawl both versions.
- Across Acquia’s entire hosting fleet: ~5,000 llms.txt requests out of 400 million total (0.001%).
Flavio Longato (SEO Strategist at Adobe) found the same pattern: zero .md requests from LLM bots on high-authority sites, even when listed in llms.txt [10]. SonicLinker analyzed 2 million AI-agent requests and found zero requests for /llms.txt [11].
The consensus from independent researchers: AI agents don’t use llms.txt or content negotiation yet. They fetch normal HTML pages and parse them directly.
So Why Did I Do It Anyway?
Three reasons:
1. The cost is near-zero. For a Hugo site, generating .md output is a config change. The llms.txt template took minutes. The CloudFront Function is about 20 lines of JavaScript. The ongoing cost is negligible — CloudFront Functions cost $0.10 per million invocations, and the cache key split has minimal impact on a small blog.
2. The quality difference is real. When an agent does get Markdown instead of HTML, the improvement is dramatic — 51% fewer tokens, correct answers instead of confused ones. And it’s not just about size: Markdown preserves semantic structure (headings, code blocks, lists, links) that plain text extraction loses. The agent gets structured content with zero processing overhead. The infrastructure is ready for when adoption catches up.
3. The ecosystem is moving fast. Google shipped WebMCP in Chrome 146 (February 2026). Cloudflare launched Markdown for Agents. The IAB Tech Lab released the CoMP framework for publisher-AI commercial agreements. The standards are being built right now. Being early means being ready. And there’s a chicken-and-egg dynamic: AI agents don’t request Markdown today because almost no website offers it. If agents could reliably find clean Markdown across the web, they’d be built to prefer it — it’s cheaper, cleaner, and more accurate for them too. Someone has to go first.
And honestly — it made for a good experiment and a better article.
One more thing worth noting: Dries Buytaert found that serving Markdown increased total bot traffic by about 7% — bots crawl both versions [9]. You’re accepting a small bandwidth increase. For a static site on S3 + CloudFront, the cost is negligible. The question is whether it’s worth serving more data to bots that don’t send visitors. If you view AI agents purely as traffic sources, probably not. If you view them as a discovery and representation layer — ensuring your content is accurately represented when agents synthesize answers — then yes. The 198:1 crawl-to-visit ratio means AI systems are already consuming your content. Better they consume a clean version than a noisy one.
The Bigger Picture
The web is splitting into two audiences: humans who read HTML in browsers, and agents who consume structured data through APIs and protocols. Content negotiation isn’t new — Accept-Language has served different languages from the same URL for decades. As I explored in an earlier post about technology spirals [14], the patterns keep repeating — we’re applying a decades-old HTTP mechanism to a new consumer type.
For content sites, the stack is forming:
| Layer | Standard | Purpose | Status |
|---|---|---|---|
| Discovery | llms.txt | “What’s on this site?” | Proposed, low adoption |
| Consumption | Markdown serving | “Give me the content” | Works today, growing |
| Interaction | WebMCP | “Let me do things on this site” | Chrome preview only |
For my blog, layers 1 and 2 are live. Layer 3 (WebMCP) would matter if I add interactive features — search, newsletter signup, content filtering. That’s a future experiment.
For enterprise content platforms — media companies, publishers, e-commerce — the stakes are higher. If AI agents mediate discovery and your content isn’t agent-readable, you’re invisible to a growing share of your audience. The 198:1 crawl-to-visit ratio means AI systems are already consuming your content. The question is whether they’re doing it efficiently and accurately.
Try It Yourself
If you run a Hugo site on AWS:
- Add a Markdown output format in `config.toml` — emit `.md` alongside HTML
- Create an llms.txt template — an auto-generated site index linking to `.md` versions
- Update your CloudFront Function — inspect `Accept: text/markdown` and rewrite the URI
- Add `Accept` to your cache policy — so HTML and Markdown are cached separately
The full implementation plan and test script are described in this article — all code snippets above are copy-paste ready.
Then test it: point an AI agent at your site before and after. The difference speaks for itself.
💬 Have you made your website agent-friendly? What approach did you take — and did you see any impact?
Sources:
[1] DataDome — “AI Agent Traffic Surging” (March 2026): securityinfowatch.com
[2] Adobe Analytics — “AI Traffic Surges Across Industries” (January 2026): business.adobe.com
[3] Peerlist — “I Analyzed 47 E-commerce Stores’ Attribution Data” (February 2026): peerlist.io
[4] Axios/Chartbeat — “Small Publishers Hit Hardest by Search Traffic Declines” (March 2026): axios.com
[5] Graphite.io via The Starr Conspiracy — “AI Now Accounts for 56% of Global Search Volume” (March 2026): thestarrconspiracy.com
[6] Searcherries — “AI Search Statistics Report” (March 2026): financialcontent.com
[7] Growtika via Quasa.io — “The Media Industry’s Double Bind” (March 2026): quasa.io
[8] Retail Economics / AWS / Botify / DataDome — “The Future of Search and Discovery” (March 2026): ppc.land
[9] Dries Buytaert — “Markdown, llms.txt and AI crawlers” (March 2026): dri.es
[10] Flavio Longato — “Do LLMs Use .md Files?” (August 2025): longato.ch
[11] SonicLinker — “We analyzed 2M AI-agent requests. None asked for llms.txt.” (February 2026): soniclinker.com
[12] My hands-on experiment with self-reflection on Bedrock — “When Thinking Twice Helps — And When It Doesn’t”: schristoph.online
[13] My earlier post on running an SA workday through an AI agent — “The Coding Agent That Doesn’t Code”: schristoph.online
[14] My earlier post on recurring technology patterns — “Technology Evolution Doesn’t Move in a Straight Line — It Spirals”: schristoph.online
[15] Hugo Custom Output Formats documentation: gohugo.io
[16] llms.txt Specification (Jeremy Howard, Answer.AI): llmstxt.org
[17] CloudFront Functions event structure — Amazon CloudFront Developer Guide: docs.aws.amazon.com
[18] CloudFront Functions JavaScript runtime 2.0: docs.aws.amazon.com
[19] Understanding cache policies — Amazon CloudFront Developer Guide: docs.aws.amazon.com