<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>schristoph.online</title><link>https://schristoph.online/tags/ailiteracy/</link><description>Personal homepage and blog of Stefan Christoph</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>Stefan Christoph. All rights reserved.</copyright><lastBuildDate>Mon, 11 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://schristoph.online/tags/ailiteracy/index.xml" rel="self" type="application/rss+xml"/><item><title>What Reasoning Actually Means (and Why It Matters for Your Architecture)</title><link>https://schristoph.online/blog/what-reasoning-actually-means/?utm=rss-feed</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://schristoph.online/blog/what-reasoning-actually-means/</guid><description>&lt;h2 id="it-started-with-a-saturday-morning-experiment">It Started with a Saturday Morning Experiment&lt;/h2>
&lt;p>I recently ran a simple test. I asked a small language model the same questions three times, with zero, one, and three rounds of self-reflection, and &lt;a href="https://schristoph.online/blog/when-thinking-twice-helps/">published the results&lt;/a>. The pattern was clear: self-reflection helped when the model already knew the topic. It did nothing when it didn&amp;rsquo;t. And on bleeding-edge questions, more thinking just produced more confidently wrong answers.&lt;/p>
&lt;p>That experiment raised a question I couldn&amp;rsquo;t shake: if &amp;ldquo;thinking harder&amp;rdquo; only works sometimes, what exactly is happening when a model reasons, and when is it just pretending?&lt;/p></description></item><item><title>LLMs Don't Do Math — They Predict What Math Looks Like</title><link>https://schristoph.online/blog/llms-dont-do-math/?utm=rss-feed</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://schristoph.online/blog/llms-dont-do-math/</guid><description>&lt;h2 id="the-invisible-error">The Invisible Error&lt;/h2>
&lt;p>To test this, I designed five calculations that anyone in business might ask an AI assistant to do. They&amp;rsquo;re the kind of questions you&amp;rsquo;d type into ChatGPT or Claude expecting a quick, reliable answer:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Simple arithmetic&lt;/strong> — 7 × 8 (baseline sanity check)&lt;/li>
&lt;li>&lt;strong>A discount calculation&lt;/strong> — &amp;ldquo;What&amp;rsquo;s the final price of a €249.99 item with 15% off?&amp;rdquo; (retail, e-commerce)&lt;/li>
&lt;li>&lt;strong>Compound interest&lt;/strong> — &amp;ldquo;How much is €10,000 worth after 7 years at 3.5%?&amp;rdquo; (investment planning)&lt;/li>
&lt;li>&lt;strong>A mortgage payment&lt;/strong> — &amp;ldquo;What&amp;rsquo;s the monthly payment on a €250,000 loan at 3.8% over 25 years?&amp;rdquo; (the kind of number people make life decisions on)&lt;/li>
&lt;li>&lt;strong>Standard deviation&lt;/strong> — of a 10-number dataset (basic statistics, common in reporting)&lt;/li>
&lt;/ol>
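&lt;p>For reference, the exact answers to these questions can be computed with a few lines of deterministic code. The Python below is a minimal sketch, not the setup used in the test. It makes three assumptions: the interest question uses annual compounding, the mortgage is a standard annuity with monthly payments (rate / 12, 300 payments), and amounts are rounded to the cent. The helper names (&lt;code>discounted_price&lt;/code>, &lt;code>compound_value&lt;/code>, &lt;code>monthly_payment&lt;/code>, &lt;code>std_dev&lt;/code>) are hypothetical. The 10-number dataset isn&amp;rsquo;t listed here, so &lt;code>std_dev&lt;/code> is shown without data.&lt;/p>

```python
from decimal import Decimal, ROUND_HALF_UP
import statistics

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round a monetary amount to two decimal places (round half up)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def discounted_price(price: str, discount_pct: str) -> Decimal:
    # Decimal avoids binary floating-point artifacts in 249.99 * 0.85
    factor = 1 - Decimal(discount_pct) / 100
    return to_cents(Decimal(price) * factor)


def compound_value(principal: str, rate_pct: str, years: int) -> Decimal:
    # Assumption: interest is compounded once per year.
    growth = (1 + Decimal(rate_pct) / 100) ** years
    return to_cents(Decimal(principal) * growth)


def monthly_payment(principal: float, annual_rate_pct: float, years: int) -> float:
    # Annuity formula M = P * r / (1 - (1 + r) ** -n), assuming a monthly
    # rate r = annual rate / 12 and n = 12 * years monthly payments.
    r = annual_rate_pct / 100 / 12
    n = 12 * years
    return principal * r / (1 - (1 + r) ** -n)


def std_dev(values: list[float], sample: bool = False) -> float:
    # Population standard deviation divides by N; sample divides by N - 1.
    return statistics.stdev(values) if sample else statistics.pstdev(values)


if __name__ == "__main__":
    print(7 * 8)                                  # 56
    print(discounted_price("249.99", "15"))       # 212.49
    print(compound_value("10000", "3.5", 7))      # 12722.79
    print(round(monthly_payment(250_000, 3.8, 25), 2))
```

&lt;p>The money questions use &lt;code>Decimal&lt;/code> because binary floating point can&amp;rsquo;t represent 249.99 exactly. The &lt;code>sample&lt;/code> flag exists because &amp;ldquo;standard deviation&amp;rdquo; can mean either the population version (divide by N) or the sample version (divide by N - 1). The two give different answers for the same data.&lt;/p>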
&lt;p>I ran each calculation through two models on Amazon Bedrock: Amazon Nova Micro ($0.046/1M input tokens) and Claude Sonnet 4 ($3.00/1M input tokens, roughly 65x more expensive). Prices are on-demand rates at the time of writing [4]. The choice of models isn&amp;rsquo;t a judgment on either; both are excellent at what they&amp;rsquo;re designed for. The point is to show that this is a &lt;em>structural&lt;/em> limitation of how language models work, not a quality issue with any specific model. A small model gets it wrong more often. A large, expensive model gets it wrong less often. But neither is &lt;em>computing&lt;/em>; both are predicting. The error shrinks with scale but doesn&amp;rsquo;t disappear, because the architecture is fundamentally probabilistic.&lt;/p></description></item></channel></rss>