You already know how LLMs work: text into tokens, tokens into math, predict the next one. Image generation builds on the same broad ideas but flips the training game: instead of predicting the next token, the model learns to predict the noise in an image so it can be removed. Starting from pure static, it chips away, step by step, until a coherent image emerges. What does Michelangelo have to do with any of this? More than you’d think. This is how image diffusion models work, in 20 minutes.