🎙️ In a fantastic interview “How AI will change software engineering – with Martin Fowler” at the Pragmatic Engineer Podcast [1], Martin Fowler highlights the non-determinism introduced by Agentic AI as the big challenge for adoption. So big a challenge that he compares it to the evolution from assembler code to higher-level programming languages.
🤔 Indeed, I see a lot of customers struggling with this non-determinism. What is correct? How do I evaluate a system? What about cascading effects in multi-agent systems?
✅ So I agree with Martin - non-determinism is hard to deal with. However, I slightly disagree with his side remark that non-determinism in non-AI software only happens in esoteric corner cases like race conditions. In a world of distributed systems, non-determinism is closer to the norm - we just have mechanisms in place to deal with it by now.
🧐 But why is there non-determinism in LLMs in the first place? At first glance, it’s obvious: LLMs essentially operate on probabilities, so we get non-deterministic answers. But wait - shouldn’t we get the exact same output with the exact same input? After all, those probabilities are fixed in the LLM and don’t change. 🌡️ Yes, but as part of the input for the LLM (usually not visible to the standard user), we also provide a parameter called temperature. Setting this parameter to 0 should give us the same answer every time. But we don’t always get that.
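To make the temperature idea concrete, here is a minimal Python sketch of how temperature reshapes the next-token distribution. The logits are made-up example values, not from any real model, and real inference stacks implement this in optimized kernels rather than plain Python:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature rescales the logits before the softmax: a low temperature
    # sharpens the distribution, a high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

print(softmax(logits, temperature=1.0))  # probability spread over all tokens
print(softmax(logits, temperature=0.1))  # almost all mass on the top token

# temperature=0 is typically treated as greedy decoding: always pick the
# argmax - which *should* make the output deterministic.
greedy_token = logits.index(max(logits))
print(greedy_token)  # 0
```

Sampling from the temperature-1 distribution is where the “obvious” non-determinism comes from; the surprising part, discussed next, is why even the greedy case can vary.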
🔬 Horace He, in collaboration with others at Thinking Machines, dives deeper into this in their blog post “Defeating Nondeterminism in LLM Inference” [2]. A fascinating piece of work:
“Large language model (LLM) inference is often surprisingly nondeterministic - even at zero temperature, supposedly a deterministic mode, repeated requests can yield different outputs. This nondeterminism isn’t just due to floating-point non-associativity and parallel execution, even though those can cause minor numerical discrepancies. Instead, the primary cause is lack of batch invariance: the batch size used by inference servers changes in response to system load, and many low-level GPU and framework kernels produce outputs that depend on batch size, even for a single prompt - making outputs unpredictably variable for users.”
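The floating-point side of this is easy to reproduce even on a CPU. A minimal Python sketch - the varying chunk size is a stand-in for varying batch size, a big simplification of what actually happens inside GPU kernels:

```python
import random

def chunked_sum(values, chunk):
    # Sum within fixed-size chunks first, then across the partial sums.
    # Changing the chunk size changes the order of additions - loosely
    # mimicking how a reduction's order can depend on batch/block size.
    partials = [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return sum(partials)

# Floating-point addition is not associative, so the order matters:
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The same values summed with different chunk sizes can differ in the
# last bits - the same kind of discrepancy a changing batch size causes.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
print(chunked_sum(xs, 32) == chunked_sum(xs, 64))  # may be False
```

Tiny discrepancies like these would be harmless on their own; the blog post’s point is that once they flip which token wins the argmax, the whole continuation diverges.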
A bit technical, but really a great read. And a good thing to know as this helps us better deal with non-determinism and possibly embrace it.
❓ 𝗜𝘀 𝗻𝗼𝗻-𝗱𝗲𝘁𝗲𝗿𝗺𝗶𝗻𝗶𝘀𝗺 𝗮𝗹𝘄𝗮𝘆𝘀 𝗮 𝗯𝗮𝗱 𝘁𝗵𝗶𝗻𝗴?
Actually, it’s not. It is a problem if we want to achieve 100% repeatable results. But if we want to explore new options or variations, it’s not necessarily bad.
🎯 𝗦𝗼 𝗶𝘁 𝗮𝗹𝘄𝗮𝘆𝘀 𝗱𝗲𝗽𝗲𝗻𝗱𝘀 𝗼𝗻 𝘆𝗼𝘂𝗿 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲.
What is your experience with the non-determinism of LLMs? Did you watch or listen to the interview? Got a chance to work through the blog post?
Cross-posted to LinkedIn