kfsone

Minor nit: In "familiarity", you gloss over the fact that it's character-based rather than token-based, which might be worth a shout-out:

"Microgpt's larger cousins use building blocks called tokens, each representing one or more letters. That's hard to reason about, but essential for building sentences and conversations.

"So we'll just deal with spelling names using the English alphabet. That gives us 26 tokens, one for each letter."
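A minimal sketch of what such a 26-letter character vocabulary could look like (an illustrative assumption, not microgpt's actual code):

```python
# Hypothetical character-level "tokenizer": 26 tokens, one per letter.
import string

chars = list(string.ascii_lowercase)          # the 26-letter vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(name: str) -> list[int]:
    """Turn a name into a list of token ids, one per letter."""
    return [stoi[c] for c in name.lower()]

def decode(ids: list[int]) -> str:
    """Turn token ids back into a string."""
    return "".join(itos[i] for i in ids)

print(encode("emma"))            # [4, 12, 12, 0]
print(decode(encode("emma")))    # emma
```

Each letter maps directly to its alphabet index, so the model reasons over single characters rather than multi-letter tokens.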

msla

About how many training steps are required to get good output?
