The Book That Made Me Build My Own Website
A Gift to Humanity

Tim Berners-Lee gave the web away for free. That decision shaped everything.
In 1993, Tim Berners-Lee made a decision that shaped the modern world: he gave the World Wide Web away for free. No patents, no licensing fees, no royalties. CERN released the technology into the public domain, and the web became everyone’s.
Reading his memoir “This Is for Everyone: The Unfinished Story of the World Wide Web” [1], I was struck by how personal that decision was. This wasn’t a corporate strategy. It was a conviction. Berners-Lee believed the web’s value would come from universality, not ownership. The more people who could use it, the more powerful it would become.
Software Fundamentals Matter More Than Ever
The Talk That Confirmed What I’ve Been Seeing

The books haven’t changed. The principles haven’t changed. The context has.
Matt Pocock stood on stage at the AI Engineer Summit and said something that most of the audience needed to hear: the developers who succeed with AI coding agents aren’t the ones who delegate everything. They’re the ones who fall back on engineering fundamentals [1].
MCP Sampling & Elicitation: When Servers Talk Back
From Request-Response to Collaboration

MCP evolves: servers don’t just respond anymore. They ask questions back.
When I wrote about the CLI vs MCP debate [1], I focused on the infrastructure patterns underneath. But MCP itself has been evolving, and the latest additions change what’s architecturally possible.
The Model Context Protocol started as a clean way for AI agents to call tools: agent sends request, server returns response. Simple, stateless, effective. But real-world agent workflows need more than request-response. They need the server to ask questions back.
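To make that shift concrete, here is a minimal sketch in plain Python of a tool handler that pauses mid-execution to ask the client a question before finishing. This is deliberately not the real MCP SDK; the `Elicitation`/`Result` types and the `ask_user` callback are illustrative stand-ins for the protocol's elicitation flow.

```python
from dataclasses import dataclass

@dataclass
class Elicitation:
    """Server -> client: a question the tool needs answered to proceed."""
    prompt: str

@dataclass
class Result:
    """Server -> client: the tool's final answer."""
    value: str

def delete_dataset_tool(name: str, ask_user):
    """A destructive tool that pauses to confirm before acting.

    `ask_user` is the client-supplied callback that surfaces the
    elicitation to a human (or to the agent) and returns the reply.
    """
    answer = ask_user(Elicitation(prompt=f"Really delete '{name}'? (yes/no)"))
    if answer.strip().lower() != "yes":
        return Result(value="aborted by user")
    return Result(value=f"deleted {name}")

# The client decides how elicitations get answered:
print(delete_dataset_tool("q3-report", ask_user=lambda e: "yes").value)
print(delete_dataset_tool("q3-report", ask_user=lambda e: "no").value)
```

The architectural point is the callback: the server no longer owns the whole interaction, it yields control back to the client mid-call and resumes with the answer.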
Nvidia's Real Moat: What Jensen Huang Told Dwarkesh Patel
Electrons In, Tokens Out

AI is a five-layer cake. Nvidia sits in the middle.
Long weekend drive, sunny weather, and nearly two hours of Jensen Huang arguing with Dwarkesh Patel about whether Nvidia’s moat will hold. As far as podcast entertainment goes, it doesn’t get much better than watching two sharp minds disagree about the future of the AI industry while you’re cruising through the countryside.
Self-Improving Models: What MiniMax M2.7 Actually Does
The Headline vs The Reality

Self-evolution: the model improves the process that improves the model.
“Model trains itself over 100+ autonomous cycles.” That was the headline when MiniMax released M2.7 on March 18, 2026 [1]. It sounds like science fiction: a model bootstrapping its own intelligence in a recursive loop.
The reality is more nuanced, more interesting, and more relevant to how we’ll build AI systems in the near future.
The Citation Crisis: What AI Hallucinations Mean for Your Enterprise
The Reference I Almost Didn’t Check
A few days ago, I was reviewing an article my AI agent had drafted. The sources section looked clean: numbered references, proper formatting, plausible titles. One citation pointed to an AWS blog post about a feature I’d never heard of. The title sounded right. The URL structure looked legitimate.
I clicked it. 404.
The blog post didn’t exist. The agent had fabricated a reference that looked exactly like a real AWS publication: correct URL pattern, plausible title, appropriate date. If I hadn’t clicked, it would have gone into a published article with my name on it.
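One cheap safeguard is to resolve every reference URL automatically before publishing. A minimal sketch, assuming nothing beyond the Python standard library; the `audit_references` helper and the example URLs are illustrative, not a real tool:

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def default_fetch(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url` via a HEAD request."""
    req = Request(url, method="HEAD", headers={"User-Agent": "ref-checker"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code          # e.g. 404 for a fabricated blog post
    except URLError:
        return 0               # DNS failure, refused connection, etc.

def audit_references(urls, fetch=default_fetch):
    """Return the subset of `urls` that don't resolve with HTTP 2xx/3xx."""
    return [u for u in urls if not (200 <= fetch(u) < 400)]
```

This only catches dead links; a citation can resolve and still fail to support the claim, so it narrows the review work rather than replacing it.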
Is RAG Still Needed with 1M+ Token Context Windows?
The Kofferklausur, Revisited
In September 2024, a colleague asked an audience: “What is RAG?” I answered: Kofferklausur [1].
For non-German speakers: a Kofferklausur is an open-book exam. You bring your textbooks, notes, everything. The exam doesn’t test what you memorized — it tests whether you can find the right information and reason about it under pressure.
That analogy stuck with me. A foundation model is the student. RAG is the suitcase full of books. The model doesn’t need to memorize every fact — it needs to know how to find the right one and reason about it. Special-purpose tools beat the Swiss Army knife.
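A toy sketch of that suitcase, using naive keyword overlap in place of the embedding similarity real RAG pipelines use; the example "books" are hypothetical:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top k.

    Production RAG ranks by embedding similarity over a vector store;
    overlap scoring keeps the retrieve-then-read shape visible.
    """
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

books = [
    "Compound interest grows principal exponentially over time.",
    "Kofferklausur: an open-book exam common at German universities.",
    "HTTP status 404 means the requested resource was not found.",
]
# The retrieved passages get prepended to the model's prompt;
# the model then reasons over them instead of recalling from weights.
context = retrieve("what is an open-book exam called in German", books)
```

The open question the article title raises is whether a 1M-token window lets you skip the retrieval step and pack the whole suitcase into the prompt.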
LLMs Don't Do Math — They Predict What Math Looks Like
The Invisible Error
To test this claim that LLMs predict rather than compute, I designed five calculations that anyone in business might ask an AI assistant — the kind of questions you’d type into ChatGPT or Claude expecting a quick, reliable answer:
- Simple arithmetic — 7 × 8 (baseline sanity check)
- A discount calculation — “What’s the final price of a €249.99 item with 15% off?” (retail, e-commerce)
- Compound interest — “How much is €10,000 worth after 7 years at 3.5%?” (investment planning)
- A mortgage payment — “What’s the monthly payment on a €250,000 loan at 3.8% over 25 years?” (the kind of number people make life decisions on)
- Standard deviation — of a 10-number dataset (basic statistics, common in reporting)
I ran each calculation through two models on Amazon Bedrock: Amazon Nova Micro ($0.046/1M input tokens) and Claude Sonnet 4 ($3.00/1M input — roughly 65x more expensive). Prices are on-demand rates at the time of writing [4]. The choice of models isn’t a judgment on either — both are excellent at what they’re designed for. The point is to show that this is a structural limitation of how language models work, not a quality issue with any specific model. A small model gets it wrong more often. A large, expensive model gets it wrong less often. But neither is computing — both are predicting. The error shrinks with scale but doesn’t disappear, because the architecture is fundamentally probabilistic.
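For contrast, here is what deterministic computation of the five questions looks like. Any calculator or code-execution tool wired into an agent produces these exact digits on every run; the standard-deviation dataset below is hypothetical, since the ten numbers aren't listed here.

```python
import statistics

# 1. Baseline sanity check
baseline = 7 * 8  # 56

# 2. 15% off a €249.99 item
discount = round(249.99 * (1 - 0.15), 2)

# 3. Compound interest, annual compounding: A = P * (1 + r)^t
compound = round(10_000 * (1 + 0.035) ** 7, 2)

# 4. Mortgage annuity: M = P * i / (1 - (1 + i)^-n),
#    with monthly rate i and term n in months
i, n = 0.038 / 12, 25 * 12
mortgage = round(250_000 * i / (1 - (1 + i) ** -n), 2)

# 5. Sample standard deviation of a 10-number dataset (hypothetical values)
data = [12, 15, 9, 22, 18, 7, 14, 20, 11, 16]
stdev = round(statistics.stdev(data), 2)

print(baseline, discount, compound, mortgage, stdev)
```

A language model without such a tool is asked to emit these digits token by token, which is exactly where the probabilistic errors creep in.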
Security Is Job Zero — Even (Especially) in the Age of Coding Agents
$20 and Two Hours
On February 28, 2026, security startup CodeWall gave an autonomous AI agent a single input: a domain name. Two hours and approximately $20 in API tokens later, the agent had full read/write access to the production database of McKinsey’s internal AI platform, Lilli [1] [2].
The attack vector? SQL injection — a vulnerability class from the 1990s. But in a novel context: the injection was in JSON keys, not values, which standard security scanners missed [3].
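To see why keys slip past scanners that focus on values, here is a toy sketch of the vulnerability class (not the actual Lilli exploit, whose details aren't public): JSON keys flow into SQL as column names, a code path that value-oriented parameterization never touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, "
             "role TEXT DEFAULT 'viewer', theme TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('mallory')")

def update_prefs_unsafe(user: str, prefs: dict):
    """Vulnerable: dict keys (from a JSON body) become column names.
    The VALUE is parameterized, so a value-only scanner sees a 'safe'
    query -- but the KEY is interpolated straight into the SQL."""
    for key, value in prefs.items():
        conn.execute(f"UPDATE users SET {key} = ? WHERE name = ?",
                     (value, user))

ALLOWED_COLUMNS = {"theme"}

def update_prefs_safe(user: str, prefs: dict):
    """Fixed: identifiers are checked against an allowlist; only
    values ever carry attacker-shaped input."""
    for key, value in prefs.items():
        if key not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown preference: {key!r}")
        conn.execute(f"UPDATE users SET {key} = ? WHERE name = ?",
                     (value, user))

# The 'theme update' smuggles a second assignment in through the key:
update_prefs_unsafe("mallory", {"role = 'admin', theme": "dark"})
role = conn.execute(
    "SELECT role FROM users WHERE name = 'mallory'").fetchone()[0]
print(role)  # the 'viewer' account is now 'admin'
```

The fix is thirty-year-old advice: parameterize values, allowlist identifiers, and treat every field of a request body, keys included, as attacker input.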
AI Coding Productivity: 10%, Not 10x
The Number Nobody Wants to Hear
A few weeks ago, I wrote about running my entire workday through an AI agent [1] — meetings, research, CRM, content creation. Eight hours of productive work, not a single line of code. The response was overwhelmingly positive. But one comment stuck with me: “If AI agents are this good, why isn’t my team shipping 10x more?”
The answer is now backed by data from multiple independent studies — and it’s not what the vendor pitches suggest.