How LLMs work

malwrar

Back when ChatGPT came out, I was so shocked by how _good_ it was for an “AI” product that I simply had to know how it worked. Over the next month I ended up drawing a block diagram on the whiteboard in my office, with the math involved written next to each step. I’d puzzle over each step along the way, and the triumph of completing the drawing came with a sense of deep understanding. I kept that drawing up for many months after, and would often gaze at it in wonder during meetings and idle moments.

This is to say: the autoregressive decoder-only transformer LLM architecture as pioneered by OpenAI is wildly simple for how revolutionary its results are. At the time I was reading about non-learned classical SLAM systems (which use video plus handcrafted math to produce 3D maps of physical spaces while also locating the camera in those spaces), and comparatively speaking I’d say the math is about as complicated as ONE of the components in those complex formulations. The only reason frontier LLMs need 6-figure computers to run is that the model designers made the middle bit in those models REALLY BIG, dimensionally speaking. They just took the steam engine, built a few gargantuan versions of it, and are selling them as the ultimate source of power.
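The "middle bit" is the stack of identical transformer blocks. A back-of-the-envelope sketch of how its size scales with width, assuming the common approximation of about 12·d_model² weights per block (4·d² for the Q, K, V and output projections, 8·d² for a feed-forward layer with 4x expansion) and ignoring embeddings, biases and norms. The function name is hypothetical; the GPT-3 shape (96 layers, width 12288) is its published configuration.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough weight count for the stacked transformer blocks only.

    Assumes standard multi-head attention (four d_model x d_model projections)
    and a feed-forward block that expands to 4 * d_model and back.
    Embeddings, biases and layer norms are ignored.
    """
    attention = 4 * d_model * d_model
    feed_forward = 2 * d_model * (4 * d_model)
    return n_layers * (attention + feed_forward)


# GPT-3's published shape: 96 layers, d_model = 12288.
print(f"{approx_transformer_params(96, 12288) / 1e9:.0f}B")  # prints "174B"

# A GPT-2-small-sized stack for comparison: 12 layers, d_model = 768.
print(f"{approx_transformer_params(12, 768) / 1e6:.0f}M")  # prints "85M"
```

Because the per-block count grows with the square of the width, doubling d_model roughly quadruples the weights in each layer, which is why making the middle "dimensionally big" gets expensive so quickly.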

This was OpenAI’s entire breakthrough. Making this particular model architecture larger leads to emergent capabilities like being able to pick the best ending to a story/set of instructions or to answer questions about broad factual knowledge. Meanwhile, I’ve been watching these AI companies attempt, successfully, to sell this capability as some sort of robot consciousness hand-crafted by supergeniuses. The fact that they are getting away with it is almost as shocking to me as the discovery itself.

miki123211

There's one thing I wish people understood about LLMs, and it doesn't really have anything to do with what's inside the neural network part. It's the fact that LLMs can only write in one direction — forward.

When you are writing an essay and realize midway through a sentence that what you've written doesn't make sense, you go back and edit. An LLM can't do that; the only thing it can do is keep generating. Because training data typically contains full essays and not half-finished sentences which were then edited, LLMs have a strong preference for "saving face" and producing grammatically correct, internally coherent outputs. They will often do so even if the only way to write themselves out of the corner they wrote themselves into is to lie. To maintain internal coherence, they'll then repeat that lie for the rest of the response.
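A minimal sketch of the forward-only loop described above. The hypothetical `toy_next_token` lookup table stands in for the neural network; the point is structural: each step conditions on everything emitted so far, and the loop can only append, so a token committed early is never revised.

```python
from typing import Callable, List


def generate(prompt: List[str],
             next_token: Callable[[List[str]], str],
             max_new_tokens: int,
             stop: str = "<eos>") -> List[str]:
    """Append-only decoding: the model never goes back to edit its output."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # condition on the full prefix so far
        if tok == stop:
            break
        tokens.append(tok)        # earlier tokens are fixed from here on
    return tokens


# Hypothetical stand-in for a trained network: a tiny bigram lookup table.
BIGRAMS = {"the": "answer", "answer": "is", "is": "42", "42": "<eos>"}


def toy_next_token(tokens: List[str]) -> str:
    return BIGRAMS.get(tokens[-1], "<eos>")


print(generate(["the"], toy_next_token, max_new_tokens=10))
# ['the', 'answer', 'is', '42']
```

A real model replaces the lookup with a forward pass that scores every vocabulary entry, but the outer loop has the same shape: once "42" is emitted, the only way to "fix" it is to keep writing.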

This is also why changing response structure used to affect LLM performance so dramatically. If you asked an LLM to solve a math problem and all-but-forced it to start with the answer, it would have had to calculate that answer before emitting any tokens, something which it very often wasn't able to do. If it was told to follow up the answer with an explanation, it would produce a plausible-sounding explanation to maintain coherence.

If, on the other hand, it was told to start by "thinking step by step", it would often be able to solve the first step, and then the next one given the results of the first, and so on, until it was able to reach the answer. Because the answer came last, it wasn't committing to anything, so had no reason to "save face" and lie.

This part of the problem is basically solved now with reasoning; reasoning is where all the step-by-step stuff happens, even if users aren't always able to see it. In the process of RLVR, models even train themselves into outputting phrases like "let me check my answer once again" in the chain-of-thought; those serve as their "life rafts" which they can use to both save face and change their answer.

helloplanets

The part about positional encoding is not correct.

> The intuition: instead of adding position info to each token’s vector, RoPE rotates the vector by an angle that depends on its position

You can't rotate the token's entire vector (or all three vectors; it's unclear which is implied). You rotate only each token's Query and Key vectors, so that the dot product between token 1's Query vector and token 2's Key vector reflects how far apart the tokens are.

Positional embedding should be explained after the Query, Key and Value vectors. Because the article only explains those afterwards, the reader builds on the wrong intuition and it gets confusing.
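A minimal sketch of the point above, assuming the adjacent-pair rotation of the original RoPE formulation (some libraries pair dimension i with i + d/2 instead) and hypothetical 4-dimensional `q` and `k` vectors. Only Query and Key are rotated; Value vectors are left alone. Because both rotations are orthogonal, the score depends on the offset between positions, not on the absolute positions.

```python
import math
from typing import List


def rope(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by pos * theta_k."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2k, so the frequency is base^(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]   # hypothetical query vector
k = [1.0, 0.4, -0.6, 0.9]   # hypothetical key vector

# Same relative offset (3 tokens apart), very different absolute positions.
near_start = dot(rope(q, 5), rope(k, 2))
far_along = dot(rope(q, 105), rope(k, 102))
print(abs(near_start - far_along) < 1e-9)  # True
```

Rotating the whole token vector (or the Value vector) would not give this property, since the rotation only cancels out inside the Query-Key dot product.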

10GBps

I learned TCP/IP by watching and reading raw packets over packet radio at 1200 baud.

I've noticed the same thing is possible if you watch the output of a slow LLM. Eventually you start to see the machinery: input tokens in, output tokens out, and it's math. I can't exactly predict the tokens generated, but I can see how they are formed. It's a lot like chess: you can't see every possible move, but the mechanism is understandable.

oceansky

Out of curiosity, I wondered if you could break a tokenizer by introducing weird characters not mapped to an id.

But apparently, they either just emit an [UNK] token or translate the unrecognized character into raw UTF-8 bytes.
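The second behaviour ("byte fallback") can be sketched with a toy tokenizer. Everything here is a hypothetical illustration: the five-entry `VOCAB` and the convention of reserving 256 ids after it for raw bytes. Any character missing from the vocabulary is spelled out as its UTF-8 bytes, so no input can break the tokenizer and decoding still round-trips.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB: Dict[str, int] = {"h": 0, "e": 1, "l": 2, "o": 3, " ": 4}
BYTE_OFFSET = len(VOCAB)  # ids BYTE_OFFSET .. BYTE_OFFSET + 255 stand for raw bytes
INVERSE = {v: k for k, v in VOCAB.items()}


def encode(text: str) -> List[int]:
    ids: List[int] = []
    for ch in text:
        if ch in VOCAB:
            ids.append(VOCAB[ch])
        else:
            # Byte fallback: an unknown character becomes one id per UTF-8 byte.
            ids.extend(BYTE_OFFSET + b for b in ch.encode("utf-8"))
    return ids


def decode(ids: List[int]) -> str:
    out = bytearray()
    for i in ids:
        if i >= BYTE_OFFSET:
            out.append(i - BYTE_OFFSET)
        else:
            out.extend(INVERSE[i].encode("utf-8"))
    return out.decode("utf-8")


ids = encode("hello \u2603")  # the snowman is not in VOCAB
print(ids)  # [0, 1, 2, 2, 3, 4, 231, 157, 136]
print(decode(ids) == "hello \u2603")  # True
```

The alternative design maps every unknown character to a single [UNK] id, which is simpler but loses the original text; byte-level tokenizers avoid the problem entirely by working on bytes from the start.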

alecco

A better blog on Transformers: https://www.aleksagordic.com/blog/transformer

vocram

Saying an article is of inferior quality just because editing was AI-assisted is like saying a book is lower quality just because it was printed rather than written by hand

andai

I couldn't load the article directly due to an SSL issue, so here's the archive link:

https://archive.ph/aWtFG

zenfoxai

Nice article, but chain of thought is what makes frontier LLMs smart, not really the token loop.

agumonkey

Nice intro, gonna help me dig further a lot now. Thanks a ton.

whyage

Style nit: the transitions between dark-mode text and large diagrams with a snow white background are jarring.

AltruisticGapHN

I don't like how most LLM explainer articles and videos say that an LLM essentially "predicts the next word".

I'm a developer but not very good at maths and I still don't understand any of it.

An LLM clearly has some "visual" capacity. You ask Gemini to build something with Canvas and it's able to reason about the shape of things. For example, I recently wanted a checkbox with a gradient flowing around its edge. It figured out it could use a radial gradient from the center of the checkbox and overlay that with a slightly smaller inner div, so you only see the edge, which makes it look like the gradient is circling around the checkbox.

How is that "predicting the next word"?

Not saying AI is intelligent or conscious or anything like that, but the algorithm clearly is far more complex than "predicting words".

What I mean is, the LLM is able to represent things in space. That's the part I don't understand.

I also still don't understand the relationship between the chat-based LLM and the multimodal stuff. I think I read somewhere that when an image is generated, it's also tokens?

yukIttEft

> so the model figures out during training what each token should look for and what it should offer

But how does it learn this token-relationship?

All it has is many text samples, but nowhere do they say how the tokens relate to each other, so where does this information come from?
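For reference, "what each token should look for and what it should offer" corresponds to query and key vectors, produced by multiplying each token's embedding by weight matrices. A minimal single-head sketch with a causal mask follows; the weights are hypothetical hand-picked numbers. In a real model W_q, W_k and W_v start out random and are adjusted by gradient descent to reduce next-token prediction error over the training text, and the token relationships end up encoded in those matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        # Token i compares its query with the keys of tokens 0..i only.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is a weighted mix of the value vectors it attended to.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings, d_model = 2
w_q = [[0.5, -0.2], [0.1, 0.9]]           # hypothetical "learned" weights
w_k = [[0.3, 0.7], [-0.4, 0.2]]
w_v = [[1.0, 0.0], [0.0, 1.0]]
print(causal_attention(x, w_q, w_k, w_v))
```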

melvinroest

I thought Karpathy’s microgpt explained how LLMs work.

stalfie

This article describes how Transformers work, but not really how LLMs work. Explaining the underlying architecture gives you about as much insight into how a modern LLM behaves as a breakdown of neuronal biochemistry and a few pathways gives you into the brain. Meaning, almost no insight at all.

rishbz

Great insights. RL training is the key

spacebacon

But how do they “think”? This is the only repo that can tell you that.

https://github.com/space-bacon/SRT

aabdi

this is hard to read...

it goes all over the place.

i'm not actually sure who your target audience is.

there's too many side tangents.

just like, structure it plz.

1. customer feels bad cuz they don't understand how llms work

2. provide high level abstracted explanation (don't dive into concepts yet)

3. provide breakdown guide of overall set of components.

4. walk through each component. don't sidetrack. no need to explain RoPE, GQA, etc. it just distracts.

i.e. customers don't know how llms work, leading them to feel bad about their own intelligence.

at a high level llms take in words, do some math on them, and then produce words, one by one.

inside llms have these different components. we walk through them step by step.

1. tokenizer

2. embedding

3. attention

4. heads

5. ffn

6. sampling
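The last step in that list, sampling, is small enough to show whole. A minimal sketch, assuming the model has already produced a raw score (logit) per candidate token; the `sample_next` name and the three-word vocabulary are hypothetical. Temperature rescales the scores before the softmax, and temperature 0 is treated as greedy decoding. Real systems often add top-k or top-p filtering, which this omits.

```python
import math
import random
from typing import Dict, Optional


def sample_next(logits: Dict[str, float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> str:
    """Turn raw scores into probabilities with a softmax, then draw one token."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    scaled = {tok: s / temperature for tok, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = [w / total for w in weights.values()]
    return rng.choices(list(weights), weights=probs, k=1)[0]


logits = {"cat": 2.0, "dog": 1.0, "car": -1.0}  # hypothetical model scores
print(sample_next(logits, temperature=0))  # cat
rng = random.Random(0)
print([sample_next(logits, temperature=0.7, rng=rng) for _ in range(5)])
```

Lower temperatures sharpen the distribution toward the top token; higher ones flatten it, which is where the "creativity" knob in chat products comes from.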

## tokenizer

mathisdev7

very interesting and useful!

lhd1

I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?

cubefox

We are living in a crazy science fiction world where on the top of the HN frontpage there is an article on how LLMs work which is likely itself LLM generated, and the only way to tell is its writing style rather than its factual accuracy.

lateral_cloud

I don't understand how these AI written articles get so many votes.

singpolyma3

Next do "why LLMs work"

codeakki

What's the point of this? I'm not here to engage with AI bots.

whateveracct

accidentally quadratic

lionkor

It sucks that this article is clearly LLM edited, with common phrases like "same shape as", "the intuition: ", and the "tiny explainer" which clearly generalized from a prompt accidentally.

Good article, but when sharing it I will have to preface "yes it's slop, but it's a good explanation".

Absolutely embarrassing that the author didn't catch that these LLM-isms are a (and here I'll use one) bad signal.

In fact, I would go so far as to say that publishing in this style stems from a lack of reading experience and writing experience, which does not bode well for someone pretending to be an expert. I gave this article to someone highly intelligent who doesn't know the first thing about how LLMs work internally, and she immediately called out that it reads like AI text.
