This morning I was late for work, but I got “diffused” by something worthwhile. I stumbled across a beautifully crafted video that provides excellent intuition on how AI-based image generation actually works.
In the video, Stephen Welch takes us on a tour: he starts with the CLIP model and explains how it combines vision and language, then dives deeper into diffusion models, and finishes by building intuition on how prompts guide image and video generation models toward desired outcomes.
Yes, the video shows source code, discusses math and physics, and digs into the underlying academic papers, but above all it offers beautiful visualizations that explain the core concepts. I don’t pretend to understand everything yet (if anything, watching it opened many doors for diving deeper into certain aspects), but it left me with a much better intuition of how these things work.
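To give a flavor of the prompt-guidance part: most modern image generators steer denoising with classifier-free guidance, where the model predicts the noise twice per step (once with the prompt, once without) and extrapolates along the difference. Here’s a minimal toy sketch of that idea in Python; predict_noise is my hypothetical stand-in for a real trained diffusion model, and the embeddings are just random vectors.

import numpy as np

# Toy sketch (not the video's code) of classifier-free guidance:
# predict the noise with and without the prompt, then extrapolate
# along the difference so the sample drifts toward the prompt.
rng = np.random.default_rng(0)

def predict_noise(x, prompt_embedding):
    # Hypothetical stand-in: pretend the "noise" is whatever points
    # away from the prompt; a real model is a trained neural network.
    return x - prompt_embedding

guidance_scale = 7.5            # typical guidance scale in image generators
prompt = rng.normal(size=8)     # stand-in for a CLIP text embedding
null_prompt = np.zeros(8)       # the empty ("unconditional") prompt
x = rng.normal(size=8)          # start from pure noise

for _ in range(50):
    eps_cond = predict_noise(x, prompt)         # prediction with the prompt
    eps_uncond = predict_noise(x, null_prompt)  # prediction without it
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    x = x - 0.05 * eps          # one small denoising step

cos = x @ prompt / (np.linalg.norm(x) * np.linalg.norm(prompt))
print(f"cosine similarity to the prompt embedding: {cos:.3f}")

The real thing replaces my toy predict_noise with a trained network and a proper noise schedule, but the extrapolation step is the standard guidance formula.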
Give it a try and let me know what you got out of it!
I found the video on the 3Blue1Brown YouTube Channel [1], but it’s actually a guest video from the Welch Labs YouTube Channel [2]. It’s super nicely crafted, and both channels have many gems to learn from. The video itself can be found at [3].
Mariano, this reminds me of your “Whiteboarding the Transformer Architecture” series. Looking forward to the next episode!
#AI #DiffusionModels #Visualization #AWSomeVoices
[1] 3Blue1Brown YouTube Channel: https://lnkd.in/ePfpP7wZ
[2] Welch Labs YouTube Channel: https://lnkd.in/e8T3n822
[3] Video “But how do AI images/videos actually work?”: https://lnkd.in/ett9-hhF
Cross-posted to LinkedIn