Making a vintage LLM from scratch

mg794613

"The code is semi-vibe-coded with whatever LLM I had with VS-Code and PI (OpenRouter models)."

I appreciate the honesty, but now there's no journey, and that's what I'm interested in. I can ask an LLM myself.

tancop

> These samples have very good scores overall, but they are useless. I am guessing it's not English text... I counted a few hundred examples mostly from LOC-PD and other few hundred in the OTA datasets. Imagine if I feed that crap to my LLM, what will it learn?

I'm pretty sure it's real text in Welsh. There might be typos from OCR, but yeah, that's what the language really looks like. I don't speak it, but it's easy to recognize.
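
A rough way to check this before throwing samples away, as a minimal sketch only: count common function words per language and see which list wins. The stopword lists and the `guess_language` helper below are my own illustrative assumptions (tiny and incomplete), not anything from the article's pipeline; a real filter would use a proper language-ID model.

```python
import re

# Tiny, hand-picked stopword lists. Assumed for illustration, far from complete.
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "in", "is", "that", "it", "was", "for"}
WELSH_STOPWORDS = {"yn", "yr", "y", "ac", "ar", "mae", "yw", "ei", "eu", "gan", "wedi", "ond"}


def guess_language(text):
    """Return 'english', 'welsh', or 'unknown' based on stopword counts."""
    # Lowercase and keep letter runs, including Welsh circumflex vowels.
    words = re.findall(r"[a-zâêîôûŵŷ']+", text.lower())
    if not words:
        return "unknown"
    english_hits = sum(w in ENGLISH_STOPWORDS for w in words)
    welsh_hits = sum(w in WELSH_STOPWORDS for w in words)
    if english_hits == welsh_hits:
        return "unknown"  # no evidence either way, or a tie
    return "english" if english_hits > welsh_hits else "welsh"


if __name__ == "__main__":
    print(guess_language("Mae hen wlad fy nhadau yn annwyl i mi"))
    print(guess_language("The cat sat in the garden and it was happy"))
```

Samples flagged as non-English could then go to a separate bucket for review instead of being treated as OCR garbage.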

dennysora-main

Recently, I started a personal project to build an LLM from zero.

I've spent a ton of time reading up on math, ML, and DL through books, open courses, and papers, while also studying all the major open-source LLM architectures.

Since I only have one DGX Spark machine to run experiments, I can't train a massive LLM from the get-go. Instead, I'm experimenting with an auto-scaling parameter mechanism, which has led me to create a pretty unconventional and fun architecture!

Why go through all this effort when modern LLMs can basically write simple LLMs themselves, and I clearly can't out-compute the big tech giants?

Honestly, it's because I'm obsessed with the core mechanics of LLMs. I want to build something exclusively for myself and hopefully discover some completely undiscovered mechanisms along the way.

Just keeping a record and sharing my progress—having fun with it is truly the biggest reward!

I'll share it when I get a chance!

croqaz

I am creating my tiny Llama 340M base model from scratch. If you're curious about the steps, challenges and cost, read on. I am still working on the instruct model.

cyberge99

There are certain things you can only truly learn by doing. I remember doing Linux From Scratch over a weekend, and the depth of Linux understanding it gave me has stuck with me to this day.

Thanks for the writeup. A more granular followup would be cool too.

macwhisperer

super inspiring! thanks for sharing!

rxm

Nice project. I’m curious to see how it writes after instruct.

HexPhantom

Instead of always trying to make models more current and general, there may be value in making them deliberately narrow, historically constrained, and weird in a well-defined way.