Software Fundamentals Matter More Than Ever
The Talk That Confirmed What I’ve Been Seeing

The books haven’t changed. The principles haven’t changed. The context has.
Matt Pocock stood on stage at the AI Engineer Summit and said something that most of the audience needed to hear: the developers who succeed with AI coding agents aren’t the ones who delegate everything. They’re the ones who fall back on engineering fundamentals [1].
MCP Sampling & Elicitation: When Servers Talk Back
From Request-Response to Collaboration

MCP evolves: servers don’t just respond anymore. They ask questions back.
When I wrote about the CLI vs MCP debate [1], I focused on the infrastructure patterns underneath. But MCP itself has been evolving, and the latest additions change what’s architecturally possible.
The Model Context Protocol started as a clean way for AI agents to call tools: agent sends request, server returns response. Simple, stateless, effective. But real-world agent workflows need more than request-response. They need the server to ask questions back.
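To make the shift concrete, here is a sketch of the two new server-to-client directions as raw JSON-RPC payloads, written as Python dicts. The method names follow the current MCP specification; the payloads are trimmed, and the questions themselves are made-up examples.

```python
# Illustrative MCP server-to-client messages (trimmed; example content is
# hypothetical). Both reverse the original direction: the server asks.

# Elicitation: the server pauses a tool call to ask the human a structured question.
elicitation_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "elicitation/create",
    "params": {
        "message": "Two matching records found. Which one should I update?",
        "requestedSchema": {
            "type": "object",
            "properties": {"recordId": {"type": "string", "enum": ["rec_a", "rec_b"]}},
            "required": ["recordId"],
        },
    },
}

# Sampling: the server asks the client's own model for a completion,
# so the server never needs its own LLM credentials.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user", "content": {"type": "text", "text": "Summarize these release notes."}}
        ],
        "maxTokens": 300,
    },
}
```

Notice what sampling buys architecturally: the server borrows the client's model and credentials instead of shipping its own, which is exactly the kind of possibility that plain request-response ruled out.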
Nvidia's Real Moat: What Jensen Huang Told Dwarkesh Patel
Electrons In, Tokens Out

AI is a five-layer cake. Nvidia sits in the middle.
Long weekend drive, sunny weather, and nearly two hours of Jensen Huang arguing with Dwarkesh Patel about whether Nvidia’s moat will hold. As far as podcast entertainment goes, it doesn’t get much better than listening to two sharp minds disagree about the future of the AI industry while you’re cruising through the countryside.
Self-Improving Models: What MiniMax M2.7 Actually Does
The Headline vs The Reality

Self-evolution: the model improves the process that improves the model.
“Model trains itself over 100+ autonomous cycles.” That was the headline when MiniMax released M2.7 on March 18, 2026 [1]. It sounds like science fiction: a model bootstrapping its own intelligence in a recursive loop.
The reality is more nuanced, more interesting, and more relevant to how we’ll build AI systems in the near future.
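Stripped of the headline, the loop is a simple shape: the model proposes changes to its own training process, and a change survives only if a held-out score improves. A toy sketch of that shape (hypothetical, not MiniMax’s published pipeline; `train_and_score` is a stand-in for an expensive retrain-and-evaluate step):

```python
import random

# Toy illustration (hypothetical; not MiniMax's published pipeline). The model
# never edits its own weights here: it mutates the *process* (a single
# data-filter threshold) and keeps a change only if a held-out score improves.

def train_and_score(threshold: float) -> float:
    """Stand-in for 'retrain under this pipeline setting, return eval score'."""
    return 1.0 - abs(threshold - 0.7) + random.gauss(0, 0.01)  # true optimum near 0.7

best_threshold = 0.2
best_score = train_and_score(best_threshold)

for cycle in range(100):                                 # "100+ autonomous cycles"
    candidate = best_threshold + random.gauss(0, 0.05)   # proposed pipeline edit
    score = train_and_score(candidate)                   # retrain under the edit
    if score > best_score:                               # adopt only verified gains
        best_threshold, best_score = candidate, score

print(f"after 100 cycles: threshold={best_threshold:.3f}, score={best_score:.3f}")
```

The verification gate is the whole story: without the held-out check, the loop optimizes its own evaluator instead of the model.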
The Citation Crisis: What AI Hallucinations Mean for Your Enterprise
The Reference I Almost Didn’t Check
A few days ago, I was reviewing an article my AI agent had drafted. The sources section looked clean: numbered references, proper formatting, plausible titles. One citation pointed to an AWS blog post about a feature I’d never heard of. The title sounded right. The URL structure looked legitimate.
I clicked it. 404.
The blog post didn’t exist. The agent had fabricated a reference that looked exactly like a real AWS publication: correct URL pattern, plausible title, appropriate date. If I hadn’t clicked, it would have gone into a published article with my name on it.
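The cheap defense is mechanical: before anything ships, every reference URL must at least resolve. A minimal sketch of that check (the URL list is a placeholder; some servers reject HEAD requests, in which case fall back to GET):

```python
import urllib.request
import urllib.error

# Minimal pre-publication link check. A 404 does not prove fabrication and a
# 200 does not prove the page says what the citation claims, but it catches
# the invented-URL case described above.
references = [
    "https://aws.amazon.com/blogs/",   # placeholder: substitute the draft's citations
]

for url in references:
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "citation-check/0.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"OK   {resp.status}  {url}")
    except urllib.error.HTTPError as e:
        print(f"FAIL {e.code}  {url}")     # 404 here: likely fabricated citation
    except urllib.error.URLError as e:
        print(f"FAIL {e.reason}  {url}")
```

It takes seconds to run and would have caught this reference before it carried my name.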
From Cloud-Native to AI-Native: What Actually Changes
The Fifteen-Year Echo

Fifteen years apart. Same stage. Different world.
In 2010, Adrian Cockcroft stood on the QCon stage and told the audience that Netflix was running its entire business on a public cloud. Most people in the room thought he was crazy.
Fifteen years later, Cockcroft was back at QCon, this time explaining how he manages swarms of autonomous AI agents that produce several days’ worth of code in fifteen minutes [1]. The audience reaction was different. Nobody called him crazy. They were taking notes.
The Protocol We Should Have Built for Humans
Namaste from 6,165 Meters
I just summited Imja Tse (Island Peak, 6,165 meters) in Nepal. No Slack, no email, no MCP servers crashing in the background. Just ice, thin air, and the kind of clarity that only comes when every step costs you something.
At that altitude, you don’t tolerate inefficiency. Every piece of gear earns its place or stays behind. Every movement is deliberate. You can’t afford to fumble with equipment that doesn’t work the first time.
Is RAG Still Needed with 1M+ Token Context Windows?
The Kofferklausur, Revisited
In September 2024, a colleague asked an audience: “What is RAG?” I answered: Kofferklausur [1].
For non-German speakers: a Kofferklausur (literally a “suitcase exam”) is an open-book exam. You bring your textbooks, notes, everything. The exam doesn’t test what you memorized: it tests whether you can find the right information and reason about it under pressure.
That analogy stuck with me. A foundation model is the student. RAG is the suitcase full of books. The model doesn’t need to memorize every fact; it needs to know how to find the right one and reason about it. For that job, the special-purpose tool (targeted retrieval) beats the Swiss Army knife (one giant context window).
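As a minimal illustration of the suitcase at work (toy data; naive word-overlap ranking stands in for real embedding search):

```python
# Tiny retrieval sketch with made-up documents: the model only ever sees the
# few passages that match the question, not the whole library.

docs = {
    "refunds":  "Refunds are processed within 14 days of the return request.",
    "shipping": "Standard shipping takes 3-5 business days within the EU.",
    "warranty": "All devices carry a 24-month manufacturer warranty.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank passages by naive word overlap; real systems use embeddings."""
    q = set(question.lower().split())
    ranked = sorted(
        docs.values(),
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

context = retrieve("How long do refunds take?")
prompt = f"Answer using only this context: {context}\n\nQuestion: How long do refunds take?"
print(prompt)   # this short prompt, not the full corpus, goes to the model
```

The 1M-token question is whether you still need the sorting step when the whole suitcase fits on the desk, which is exactly what the rest of this piece examines.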
LLMs Don't Do Math — They Predict What Math Looks Like
The Invisible Error
To test this, I designed five calculations that anyone in business might ask an AI assistant — the kind of questions you’d type into ChatGPT or Claude expecting a quick, reliable answer:
- Simple arithmetic — 7 × 8 (baseline sanity check)
- A discount calculation — “What’s the final price of a €249.99 item with 15% off?” (retail, e-commerce)
- Compound interest — “How much is €10,000 worth after 7 years at 3.5%?” (investment planning)
- A mortgage payment — “What’s the monthly payment on a €250,000 loan at 3.8% over 25 years?” (the kind of number people make life decisions on)
- Standard deviation — of a 10-number dataset (basic statistics, common in reporting)
I ran each calculation through two models on Amazon Bedrock: Amazon Nova Micro ($0.046/1M input tokens) and Claude Sonnet 4 ($3.00/1M input, roughly 65x more expensive). Prices are on-demand rates at the time of writing [4].
The choice of models isn’t a judgment on either: both are excellent at what they’re designed for. The point is to show that this is a structural limitation of how language models work, not a quality issue with any specific model. A small model gets it wrong more often. A large, expensive model gets it wrong less often. But neither is computing; both are predicting. The error shrinks with scale but doesn’t disappear, because the architecture is fundamentally probabilistic.
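For contrast, every one of these has a deterministic answer that ordinary code produces exactly. A quick sketch of the five checks (the ten-number dataset is made up, since the original one isn’t reproduced here):

```python
import statistics

# Ground truth for the five test calculations: computed, not predicted.
print(7 * 8)                                  # 56

price = 249.99 * (1 - 0.15)                   # 15% discount
print(round(price, 2))                        # 212.49

fv = 10_000 * (1 + 0.035) ** 7                # compound interest, 7 years at 3.5%
print(round(fv, 2))                           # ~12722.79

r, n = 0.038 / 12, 25 * 12                    # monthly rate, number of payments
payment = 250_000 * r / (1 - (1 + r) ** -n)   # standard annuity formula
print(round(payment, 2))                      # ~1292.14 per month

data = [4, 8, 15, 16, 23, 42, 7, 19, 11, 30]  # made-up 10-number dataset
print(round(statistics.stdev(data), 2))       # sample standard deviation
```

The interesting comparison is not whether the models match these numbers once, but how often, and how confidently they present the misses.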
Your AI Models Have an Expiry Date — A Practical Guide to Model Lifecycle Management
Introduction — The Promise I Made
In my previous article [1], I explored the maintenance trap in IT: how software systems are more like plants than stones, requiring constant care. I ended with a cliffhanger, promising that the open question of how to specifically test and evaluate models would be picked up in the next article.
This is that article.
Since publishing the first piece, something happened that made this topic very real for many of my customers. Anthropic announced the deprecation of Claude 3.5 Sonnet — a model that had become the backbone of countless production applications. Teams that had built their systems around a specific model version suddenly faced a hard deadline to migrate. Some were prepared. Most were not.
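One pattern that separated the prepared teams from the rest is mundane: no model ID is ever hard-coded at a call site. A hedged sketch of that idea (the IDs below are examples in Bedrock’s naming format; verify current identifiers with your provider):

```python
# Route every call through one alias table so a model deprecation becomes a
# one-line config change instead of a codebase-wide migration.
# Model IDs are examples in Amazon Bedrock's format; verify current ones.

MODEL_ALIASES: dict[str, str] = {
    "drafting":   "anthropic.claude-sonnet-4-20250514-v1:0",
    "extraction": "amazon.nova-micro-v1:0",
}

def resolve_model(task: str) -> str:
    """Single lookup point: swap a deprecated model here, nowhere else."""
    return MODEL_ALIASES[task]

if __name__ == "__main__":
    print(resolve_model("drafting"))
```

Indirection alone doesn’t solve migration, of course; you still need the evaluation suite to prove the replacement behaves, which is what the rest of this guide covers.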