LLMs to the rescue?! Or are we actually building Compound AI Systems?
LLMs rule the world, right?! The only thing that matters is using the most powerful LLM available, and everything falls into place. Looking for numbers? Just consult the latest LLM benchmark. Hmm - or do we need to build systems?!
I think it’s not just a matter of choosing an LLM, or any foundation model for that matter, and if you are following me, you already know that. In my Medium post “How do you choose the foundation model for your Generative AI App — like your car?” [2], I already argued that 1/ the LLM is just one part of your Generative AI application, which requires many more components and engineering excellence, and 2/ the capabilities of frontier models become commodity at an ever-increasing pace.
For both reasons you should rather stick to a just-good-enough model for your use case and build your application around it, instead of constantly hunting for the next best model, as outlined in “Can I escape the never-ending cycle of “just” toying with new models to production?” [4].
But there is another thing that bothered me a lot. The M in LLM stands for Model, and in our super-focus on models, things got very fluffy and confusing. In my perspective, we started to confuse terms when ChatGPT launched. ChatGPT is a fascinating and impactful Generative AI application, initially based on a GPT-3.5 model. Still, we colloquially use the term “model” for the entire ChatGPT application. And this is happening across multiple “model” providers who are in fact providing applications around their models — accessible via APIs and offering supporting functionality like guardrails, CoT prompting, and much more. Still, we list them as models and compare them in model benchmarks, totally ignoring that we are not actually comparing just models.
Just back from a break, today I happily stumbled across a fantastic lecture by Christopher Potts, the Stanford Webinar “Large Language Models Get the Hype, but Compound Systems Are the Future of AI”, available on YouTube [1]. To summarize the webinar shortly:
“While large language models (LLMs) dominate AI discussions, the future of artificial intelligence lies in compound systems that integrate multiple components. These systems combine LLMs with tools like calculators, databases, and APIs, proving that smaller models within well-designed systems often outperform larger standalone models. The evolution of AI is moving through distinct phases, from unsupervised training to system-level scaling, with industry trends showing a preference for smaller, more efficient models due to practical considerations like cost and latency. The key to advancement lies not in increasing model size but in creating sophisticated systems that leverage multiple tools and components working in harmony.”
The illustration I’m using here originates from the BAIR post “The Shift from Models to Compound AI Systems” [3], which predates my own post from June [2].
Fun fact: in his lecture, Christopher uses a car analogy similar to the one I used in [2], but he clearly has the better images :)
I highly recommend tuning into the lecture and forming your own impression.
Resources
[1] Stanford Webinar “Large Language Models Get the Hype, but Compound Systems Are the Future of AI” - https://www.youtube.com/watch?v=vRTcE19M-KE
[2] Medium Post “How do you choose the foundation model for your Generative AI App — like your car?” - https://medium.com/@stefanchristoph/how-do-you-chose-the-foundation-model-for-your-generative-ai-app-like-your-car-18f8a299678c
[3] BAIR Post “The Shift from Models to Compound AI Systems” (February 2024) - https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
[4] LinkedIn Post “Can I escape the never-ending cycle of “just” toying with new models to production?” - https://www.linkedin.com/pulse/can-i-escape-never-ending-cycle-just-toying-new-models-christoph-xwnwe/?trackingId=hgU3GtflS0q7eW9gaPE0vQ%3D%3D
Cross-posted to LinkedIn