You already know how LLMs work — text into tokens, tokens into math, predict the next one. Image generation uses the same broad ideas but flips the training game: instead of predicting the next token, the model learns to predict and remove noise. Starting from pure static, it chips away — step by step — until a coherent image emerges. What does Michelangelo have to do with any of this? More than you’d think. This is how image diffusion models work, in 20 minutes.
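The "start from static and chip away at the noise" loop can be sketched in a few lines of NumPy. This is an illustrative toy, not the episode's material: a real diffusion model uses a trained neural network to predict the noise, so here a hand-written "oracle" that already knows the clean image stands in for it, just to show the shape of the DDPM-style reverse loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean image": a 4x4 gradient standing in for real pixels.
clean = np.linspace(0.0, 1.0, 16).reshape(4, 4)

T = 50                                       # number of diffusion steps
alphas = 1.0 - np.linspace(1e-4, 0.02, T)    # per-step noise schedule
alpha_bar = np.cumprod(alphas)               # cumulative signal fraction

def predict_noise(x_t, t):
    # In a real model this is a trained network; this oracle computes the
    # exact noise implied by the clean image (for illustration only).
    return (x_t - np.sqrt(alpha_bar[t]) * clean) / np.sqrt(1.0 - alpha_bar[t])

# Start from pure static and denoise step by step.
x = rng.standard_normal(clean.shape)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Remove the predicted noise (DDPM-style posterior mean update).
    x = (x - (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        # Re-inject a little noise at every step except the last.
        x += np.sqrt(1.0 - alphas[t]) * rng.standard_normal(clean.shape)

print(np.abs(x - clean).max())  # tiny: the loop recovers the clean image
```

Because the noise predictor here is exact, the loop recovers the clean image almost perfectly; with a learned predictor the same loop produces novel images instead.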
Full shownotes at fragmentedpodcast.com.
Show Notes #
- Episode 303 - How LLMs work in 20 minutes - text generation
- VAE - Variational Autoencoder
- RGB color model - Wikipedia
- Word2Vec technique - Wikipedia
- Efficient Estimation of Word Representations in Vector Space - original Word2Vec paper by Mikolov et al.
- High-Resolution Image Synthesis with Latent Diffusion Models - Rombach et al. (2022) — the paper behind Stable Diffusion
- Image Training data
- Michelangelo
Get in touch #
We’d love to hear from you. Email is the best way to reach us or you can check our contact page for other ways.
We want to hear all your feedback: what’s working, what’s not, and topics you’d like to hear more about.
Co-hosts #
See Ep. #300 — listen to that episode for the full story behind our new direction.