How LLMs Think (Spoiler: They Don't)
A no-BS explainer of what's actually happening inside ChatGPT's brain. No PhD required, but we'll sneak in some math at the end for the nerds.
The Million Dollar Question
What happens when you type “Write me a poem about pizza” into ChatGPT?
If you said “it understands your deep yearning for pepperoni and crafts a creative response,” I have bad news: you’ve been lied to.
LLMs don’t understand anything. They don’t think. They don’t know what pizza is. They’ve never tasted cheese. They’re just really, really good at one thing: predicting the next word.

The World’s Most Expensive Autocomplete
Remember your phone’s keyboard suggestions? The ones that turn “I’m on my” into “I’m on my way”?
LLMs are that, but on steroids. And Red Bull. And trained on the entire internet.
Here’s the mental model:
Input: "The capital of France is"
LLM thinking: "Based on 45,000 Wikipedia articles, the next word is 99.9% likely to be..."
Output: "Paris"
It’s not looking up facts. It’s not reasoning. It’s pattern matching at an absurd scale.
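If you want to see the world’s most expensive autocomplete for yourself, here’s a minimal sketch that peeks at a model’s top next-token guesses. It assumes you have the Hugging Face `transformers` and `torch` packages installed, and it uses the small `gpt2` checkpoint as a stand-in for the big commercial models:

```python
# Minimal sketch: peek at GPT-2's guesses for the next token.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Turn the scores for the *next* position into probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r:>12}  {prob.item():.3f}")
```

Run it and “ Paris” (leading space included, because tokens are weird like that) should sit at or near the top of the list. No facts were looked up in the making of that output.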
Tokens: The Building Blocks 🧱
LLMs don’t read words—they read tokens. A token is roughly 3-4 characters, or “a chunk of a word.”
| Text | Tokens |
|---|---|
| “Hello” | 1 token |
| “ChatGPT” | 2 tokens: “Chat” + “GPT” |
| “Supercalifragilisticexpialidocious” | 7 tokens (and a headache) |
The “Goldfish Memory” Problem
Every LLM has a context window—a maximum amount of text it can hold in its “brain” at once.
When your conversation exceeds this limit, the model literally forgets the beginning. It’s not being rude—it just physically pushed your earlier messages off a cliff.
*(Image: the LLM forgetting your name after 4000 tokens)*
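Here’s a rough sketch of that cliff in code. The window size, the 4-characters-per-token estimate, and the message format are all invented for illustration, not any real API’s limits:

```python
# Rough sketch: keep only the most recent messages that fit in the context window.
# MAX_TOKENS and the character-based token estimate are illustrative, not real limits.
MAX_TOKENS = 4000

def estimate_tokens(text: str) -> int:
    # Crude stand-in: assume roughly 4 characters per token.
    return max(1, len(text) // 4)

def trim_history(messages: list[str]) -> list[str]:
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages survive and the oldest fall off the cliff.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > MAX_TOKENS:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Your opening message (“Hi, my name is…”) is always the first casualty.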
Attention: The Real Magic ✨
So how does “next word prediction” produce coherent essays? The secret sauce is Attention.
Imagine you’re at a loud cocktail party. You can hear everyone, but you pay attention only to the person saying your name.
LLMs do this with words. When generating a response, the model looks back at all previous tokens and decides which ones are “relevant” to the current word it’s trying to spit out.
If I say: “The doctor took her stethoscope…” The model connects “her” to “doctor”. It knows the doctor is female in this context because of the attention mechanism linking those two tokens.
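To make that concrete, here’s a toy sketch with completely made-up attention scores for the token “her” in that sentence. The only point is that “doctor” ends up hogging most of the weight:

```python
# Toy sketch: invented attention scores for "her" in "The doctor took her stethoscope".
# Softmax turns raw scores into weights that sum to 1; "doctor" should dominate.
import math

raw_scores = {"The": 0.2, "doctor": 4.0, "took": 0.5, "her": 1.0}

exps = {tok: math.exp(score) for tok, score in raw_scores.items()}
total = sum(exps.values())
weights = {tok: val / total for tok, val in exps.items()}

for tok, w in weights.items():
    print(f"{tok:>8}: {w:.2f}")   # "doctor" grabs most of the attention
```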
Why They Hallucinate (Lying with Confidence)
Here’s the uncomfortable truth: LLMs don’t know what they don’t know.
When you ask an LLM about something it wasn’t trained on, it doesn’t say “I don’t know.” Instead, it predicts the most statistically likely series of words.
You: "Who is the CEO of The Made Up Company Inc?"
LLM: "The CEO of The Made Up Company Inc is John Smith, appointed in 2021."
Why?! Because “John Smith” and “appointed in” are words that frequently appear near “CEO” in its training data. It’s not lying; it’s improv.
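In toy-code terms (all numbers invented), the problem looks like this: “I don’t know” is almost never the highest-probability continuation, so it almost never gets said.

```python
# Toy sketch with invented probabilities: the model just picks the most
# statistically plausible continuation; honesty isn't one of the options it weighs.
next_word_probs = {
    "John": 0.31,           # common name near "CEO" in the training data
    "Jane": 0.24,
    "the": 0.18,
    "I don't know": 0.01,   # rarely how text on the internet continues
}

best_guess = max(next_word_probs, key=next_word_probs.get)
print(best_guess)           # -> "John", delivered with total confidence
```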
🤓 The “Danger Zone” (Math Ahead)
Warning: The following section contains linear algebra. Proceed at your own risk.
The core of transformer-based LLMs is the self-attention mechanism.
The Formula of Doom
$$
\text{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
$$

Translation for humans:
- Q (Query): What am I looking for? (“I need a noun”)
- K (Key): What do I have? (“I am the word ‘Apple’”)
- V (Value): What information do I hand over? (“I am a red fruit”)
We smash Q and K together (dot product), turn the scores into weights (softmax), and use those weights to take a weighted sum of V. It’s basically a giant, mathematical matchmaking service for words.
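If you’d rather read the matchmaking service as code, here’s a bare-bones NumPy sketch of scaled dot-product attention with tiny, random inputs:

```python
# Bare-bones sketch of scaled dot-product attention in NumPy.
# Tiny shapes and random inputs, just enough to show the mechanics.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how well each query matches each key
    weights = softmax(scores, axis=-1)  # normalize matches into attention weights
    return weights @ V                  # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                     # 4 tokens, 8-dimensional vectors
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))

print(attention(Q, K, V).shape)         # (4, 8): one blended vector per token
```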
Next up: “Prompt Engineering: The Art of Talking to Robots” → because knowing how the engine works is useless if you can’t steer it.