🎙️ In a fantastic interview “How AI will change software engineering – with Martin Fowler” at the Pragmatic Engineer Podcast [1], Martin Fowler highlights the non-determinism introduced by Agentic AI as the big challenge for adoption. So big a challenge that he compares it to the evolution from assembler code to higher-level programming languages.
🤔 Indeed, I see a lot of customers struggling with this non-determinism. What is correct? How do I evaluate a system? What about cascading effects in multi-agent systems?
✅ So I agree with Martin - non-determinism is hard to deal with. However, I slightly disagree with his side remark that non-determinism in non-AI software only happens in esoteric corner cases like race conditions. In a world of distributed systems, non-determinism is closer to the norm - we just have mechanisms in place to deal with it by now.
🧐 But why is there non-determinism in LLMs in the first place? At first glance, it’s obvious: LLMs essentially operate on probabilities, so we get non-deterministic answers. But wait - shouldn’t we get the exact same output with the exact same input? After all, those probabilities are fixed in the LLM and don’t change. 🌡️ Yes, but as part of the input for the LLM (usually not visible to the standard user), we also provide a parameter called temperature. Setting this parameter to 0 should give us the same answer every time. But we don’t always get that.
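To make the temperature idea concrete, here is a minimal Python sketch of how temperature reshapes the next-token distribution. The logits are made-up example values, not from any real model, and real inference stacks implement this in optimized kernels rather than plain Python:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature rescales the logits before the softmax: a low temperature
    # sharpens the distribution, a high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

print(softmax(logits, temperature=1.0))  # probability spread over all tokens
print(softmax(logits, temperature=0.1))  # almost all mass on the top token

# temperature=0 is typically treated as greedy decoding: always pick the
# argmax - which *should* make the output deterministic.
greedy_token = logits.index(max(logits))
print(greedy_token)  # 0
```

Sampling from the temperature-1 distribution is where the “obvious” non-determinism comes from; the surprising part, discussed next, is why even the greedy case can vary.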
🔬 Horace He, in collaboration with others at Thinking Machines, dives deeper into this in their blog post “Defeating Nondeterminism in LLM Inference” [2]. A fascinating piece of work:
“Large language model (LLM) inference is often surprisingly nondeterministic - even at zero temperature, supposedly a deterministic mode, repeated requests can yield different outputs. This nondeterminism isn’t just due to floating-point non-associativity and parallel execution, even though those can cause minor numerical discrepancies. Instead, the primary cause is lack of batch invariance: the batch size used by inference servers changes in response to system load, and many low-level GPU and framework kernels produce outputs that depend on batch size, even for a single prompt - making outputs unpredictably variable for users.”
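The floating-point side of this is easy to reproduce even on a CPU. A minimal Python sketch - the varying chunk size is a stand-in for varying batch size, a big simplification of what actually happens inside GPU kernels:

```python
import random

def chunked_sum(values, chunk):
    # Sum within fixed-size chunks first, then across the partial sums.
    # Changing the chunk size changes the order of additions - loosely
    # mimicking how a reduction's order can depend on batch/block size.
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

# Floating-point addition is not associative, so the order matters:
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The same values summed with different chunk sizes can differ in the
# last bits - the same kind of discrepancy a changing batch size causes.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
print(chunked_sum(xs, 32) == chunked_sum(xs, 64))  # may be False
```

Tiny discrepancies like these would be harmless on their own; the blog post’s point is that once they flip which token wins the argmax, the whole continuation diverges.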
A bit technical, but really a great read. And a good thing to know as this helps us better deal with non-determinism and possibly embrace it.
❓ 𝗜𝘀 𝗻𝗼𝗻-𝗱𝗲𝘁𝗲𝗿𝗺𝗶𝗻𝗶𝘀𝗺 𝗮𝗹𝘄𝗮𝘆𝘀 𝗮 𝗯𝗮𝗱 𝘁𝗵𝗶𝗻𝗴?
Actually, it’s not. It is a problem if we want to achieve 100% repeatable results. But if we want to explore new options or variations, it’s not necessarily bad.
🎯 𝗦𝗼 𝗶𝘁 𝗮𝗹𝘄𝗮𝘆𝘀 𝗱𝗲𝗽𝗲𝗻𝗱𝘀 𝗼𝗻 𝘆𝗼𝘂𝗿 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲.
What is your experience with the non-determinism of LLMs? Did you watch or listen to the interview? Got a chance to work through the blog post?
Cross-posted to LinkedIn