When I saw the DaVinci API in July 2022 I was floored - I realized you'd never have to write a college essay by hand again
Whenever it was Stability's Stable Diffusion appeared - that was ridiculous too
When I saw Code Interpreter for the first time I was obsessed, I said yo codegen is the path to AGI
When I took a crack at solving ARC-AGI 2 using SOTA methods my mind truly opened to the fact that LLMs can reason, albeit through brutal enumeration and discovery
When I encountered Claude Code and Codex as well
Basically ... I've been drinking the kool aid the whole time. It has almost always tasted great. Many times I've retreated back into "oh it's just a technology it has limits" and also sometimes I've lost myself to a touch of "AI psychosis". But overall I have a great relationship with it. It's nowhere near as addictive as e.g. internet porn was when I was a teenager. And at one gig I had at a Fortune 10 enterprise, our small team of 5 shipped 12 apps in 15 months in an enterprise where typically they ship 1 app and 1 feature per year. This was 2025 ... so clearly we realized we were getting ~10x productivity thanks to Gen AI koding.
Bananas.
FTR I also do not question that we will possibly reach fairly general and yet poorly controllable intelligence with multi agent systems in a few more iterations. I give that a 30% chance of seeing a genuine flash of that at some point in 2027. And 80% in 2028.
I'm not yet afraid of being left behind; this is one happy Lobster.
jzemeocala
I bought an Alesis QS8.1 super cheap in perfect condition (was a top grade digital piano/synth in the 90s).
and then i realized that ALL of the software (which i collected from defunct websites and archived on github) related to it was ancient, and after a while of getting tired of using WINE every single time i decided i wanted a cross platform modern equivalent that did everything that several of these different programs did (plus break out some stuff that was now potentially possible with a modern computer)
i thought it would be extremely hard because the computer to synth communication is pretty much only via sysex commands (of which the actual wave file encoding protocol was undocumented)
Claude walked me through examining some of the original software in Ghidra, and I had a working demo that night... now I'm just playing with adding new features to it.
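For anyone who hasn't touched SysEx: every message is a byte string that starts with 0xF0, ends with 0xF7, and carries only 7-bit data bytes in between. A rough Python sketch of just that framing is below. The manufacturer ID (0x7D, the MIDI "non-commercial" ID) and the payload are placeholders, and none of this is the actual QS8.1 protocol, which as noted was undocumented.

```python
SYSEX_START = 0xF0  # MIDI System Exclusive start byte
SYSEX_END = 0xF7    # MIDI End of Exclusive byte


def frame_sysex(manufacturer_id: bytes, payload: bytes) -> bytes:
    """Wrap a manufacturer ID and payload in a SysEx frame.

    Every byte between start and end must be 7-bit (0x00-0x7F).
    """
    body = manufacturer_id + payload
    for b in body:
        if b > 0x7F:
            raise ValueError(f"SysEx data bytes must be 7-bit, got 0x{b:02X}")
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])


def unframe_sysex(message: bytes) -> bytes:
    """Return the bytes between the start and end markers."""
    if len(message) < 2 or message[0] != SYSEX_START or message[-1] != SYSEX_END:
        raise ValueError("not a complete SysEx message")
    return message[1:-1]


# Placeholder ID and payload, for illustration only.
example = frame_sysex(b"\x7d", b"\x01\x02\x03")
```

Because data bytes are limited to 7 bits, 8-bit sample data has to be repacked before it goes inside a frame like this; that packing is device-specific and is not shown here.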
awbvious
Not sure, but I can tell you what my "oh s** astroturfing is so bad, it's even in Hacker News" moment was. And if I learned GenAI was used to make some of the astroturf, that's more an "ah s*" than an "oh s*" thing. I mean, the prominence, ubiquity, and breathlessness. One out of three, sure. Two out of three, maybe. And some corpo shilling definitely happens here. But this is like, well, covering an entire area with artificial grass, to the point where nothing lives. Crazy.
andrewthornton
My furnace went out during the 2025 holiday and I couldn't get an appointment with a repair person for 2 days. It was getting very cold in my house so I went into my attic, made several videos of the furnace attempting to start, and gave them to Gemini. It diagnosed the issue immediately and had me spin one of the components (a small exhaust fan) while the furnace tried to fire. It came on immediately. I had to do that several times, but it worked until the HVAC service showed up.
shreddude
I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.
I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.
jackdoe
I have had many, but the last one was quite funny:
It fixed my printer after a dist-upgrade and a separate Chrome upgrade; the printer worked everywhere but not in Chrome.
After 30 years of using Linux I didn't even want to know what was wrong. Is it colord again? A dbus + cups issue? I had completely accepted that I wouldn't be able to print from Chrome for a couple of months until the next update.
I just ran it in dangerously-skip-permissions mode and said 'my printer doesnt work in chrome'. A few minutes later I heard the printer printing "This is test" and it said 'I think its fixed, do you see a page coming out of the printer now?'
angusturner
In 2017 I worked tirelessly with my colleagues to implement and replicate the first transformer paper.
Yesterday I left Opus 4.8 to go do some architecture research, with GPU access.
It replicated and trained a credible baseline. It implemented some ideas I'd been thinking about, and wrote custom CUDA kernels for them. It read and summarised dozens of related papers.
It has since run dozens of experiments, with minimal supervision. When a model is unstable it kills it, documents why, fires off a new configuration.
The realisation that frontier labs are doing this at scale with unlimited GPU and token budgets.
It actually scares me a bit. The realisation that the next big breakthroughs will only have light human involvement.
The prospect of recursive self improvement feels more real to me all of a sudden.
plagasul
Several. Yesterday a friend with no prior coding experience or knowledge showed me an app he initially built to help him study for public administration job positions. The exams for these positions are public (Spain), but the tools are scarce, expensive, or not to his liking. So he used Lovable, then switched to web Gemini and Claude, then paid Claude. He now has 130+ very active users on an initial free tier, while he figures things out. The app is on GitHub, runs on Vercel with Supabase, React, Tailwind, Bun... he has no idea what he is doing. I even installed Claude Code for him, got him an SSH key so he can do it locally, etc.
Another: claude code cracked for me some software that was calling a home that did not exist anymore via headless ghidra.
Another: I am a teacher, and grading and feedback are very, very time-consuming, especially in loose workflows with several sources and tools that are not connected. During class presentations I take loose notes. Now I have a local folder where I drop (1) my student list, with names and emails, (2) my loose notes, and (3) a grading & feedback sheet template; then Claude creates a sheet per student, formats and copies the feedback to the right sheet cell, waits for my corrections, then sends everything to their school emails. Much easier, much less time-consuming.
jp57
Actually seems absurdly simple now, but sometime last year I was trying to figure out what I'd need to tow my daughter's car cross country with my truck: what are the trailer/dolly options, what do they cost, can my truck actually tow the combined weight, etc.
I started out prompting ChatGPT kinda how I would with Google, one small prompt at a time, asking about various details. But after one or two of those I just tried "I want to tow a car of make A with my truck model B, from point C to point D, what are my options?" And it wrote me a report with comparison tables and computed towing weights and other details for different options.
At that point, I was like "Oh. This is different. And it's just the beginning."
loudmax
For me it was torrenting a 7G ball of weights leaked from Meta and running alpaca.cpp (an early variant of llama.cpp) on my desktop computer in early 2023. I started asking it questions about the Roman empire and it answered me in English! The responses were generally incorrect, but no worse than what your average American college student might guess at, though delivered with much more confidence.
This was my desktop computer responding to questions in English, not some fancy server in a massive Google data center. Who cares if what it says isn't reliable? Being able to converse with my CPU in English is like having a conversation with a dog!
SubiculumCode
For me it was right at the beginning. They said it was a dungeon game. It would describe a room, etc, and I would take some action. But I thought that this dungeon was built in some intricate database. But then I told it that I wanted to leave, got to an inn, where I flirted with the bar waitress, and soon we were watching the sunset in some meadow. As cheesy as that was, it was then that I went "oh shit" this is a machine that can respond to language with language in a way that simulated actual understanding and intelligence, concepts and schema, and everything else, and I knew then that the world would never be the same again. People here talk about the crazy things they solved with AI, and I get that...but the first time I actually talked to a machine and didn't feel like it was either random gibberish or scripted, but dynamic and responsive. The first alien I ever met, and he knew my language.
monuszero
We had a monthlong sprint adding robot motion planning features to our codebase years ago, and I was never satisfied with the result. As a small team wanting to leverage oss we vendored in OMPL, did the usual thing around caching and roadmap management. I knew there was a way to parallelize some of the algorithm we were using with simd or a gpu kernel, plenty of that in the literature, but it was never worth fighting CUDA or metal/accelerate or whatever for uncertain gains.
So when cooking dinner one night, I set Opus 4.6 on a from-scratch native and accelerated roadmap planner implementation (after previously porting IK, FK, collision checking with some success). I had primed it by having a research agent drop a literature review in its docs folder covering the type of planner we needed. By the time the pasta water was boiling it was done, getting plans in a few hundred ms compared to several seconds on our good old fashioned OMPL code.
For me it was the revelation that the economic value of cooking dinner could be compared to tackling an honest two weeks of coding work. The calculus has shifted - work that was once a risky or extravagant use of time is now worth considering.
For a small team who wants to focus on substance rather than implementation, knows what they want, and how to set up the agent for success, it’s a complete game changer in terms of what we can take on. Incumbents beware
AussieWog93
Literally just last night I gave Claude Code the following prompt, verbatim:
"Whenever I launch Kodi on my Chromecast 4k, it crashes. I think this is
related to a plugin or skin. It goes away for a bit if I clear cache but
will eventually come back. Can you connect to the device via
adb (I've run adb connect already), and debug exactly where it's crashing?
Once you've done that, propose a solution. If this requires downloading,
fixing, rebuilding and then uploading the broken extension via adb, don't be
shy. I should have Android dev tools (Gradle etc.) on this Mac."
Lo and behold, without human intervention, it pinpointed the crash, downloaded the Kodi source, patched out a bug that had existed since 2016, recompiled it, signed it, then pushed it to my Chromecast all while carefully making sure to keep all my settings intact.
Got it to make a PR too (which is as of this moment unpublished; going to test more over the coming weeks).
raesene9
The one I remember most is, when experimenting with Opus 3.5 for the first time, I asked it to generate a Firecracker backed local VM creation and management tool, something I'd wanted for a while but not found.
My expectation was that it might get something barely functional but would probably fail, and instead it generated a working piece of software which achieved a lot of what I wanted.
That definitely made me realise that, for at least some classes of software task this was a major change in how things could be done.
More recently, I can give the model a Local Privilege Escalation PoC in Linux, ask it to test whether it can be used for container breakout, and have it generate a working container breakout, all in one prompt... that definitely changes things.
evdubs
I tried to see if a hosted LLM could rewrite some legal docs into a consistent format, without hallucinating anything, so I could see what might be missing from each document. It could do that.
Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).
Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.
This was when I realized LLMs should be great for handling work where:
- I already know what I want to do
- I already know how to do it
- I don't think this task will help develop skills I find to be valuable
- If I have to do it manually myself, I will probably cut corners
So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."
ozgung
For me it's not about the capabilities but what they can be used for. Think of all the recent drama between Anthropic and the Department of War. A real wake up call (especially if you are not a US citizen). Proves that AI is essentially a Surveillance and Warfare technology (which justifies the big valuations).
AI automatically analyzes all your social media posts in your life and can generate a pretty accurate profile about you in a second. We have no privacy anymore. Social media sites like Reddit already do that for moderation. Others do for more sinister reasons.
Note that Profiling is illegal in many countries. But laws can't protect us anymore.
Yes, it was always possible to do that manually. But with AI it's so easy, fast and accurate to do at large scale. A hacker having access to your computer, reading your mails and messages is one thing. An AI reading and analyzing all your mails, messages and data is something different. Doing this for whole demographics (Cambridge Analytica style) is at another level.
tempoponet
I can actually use and enjoy Linux. The "year of the desktop" never came for me, but instead I got the "year of the cli".
For 20 years I've used Linux in one form or another, but I've felt like I was kneecapped for the most basic things. Just trying to plug in an external drive or a second display meant hours of stack overflow and pasting commands I didn't understand.
Now I'm using several Linux machines for Steam, NAS, local LLM, development, and what used to derail a weekend project now amounts to a coffee break while Claude figures it out.
kstrauser
I have a large token budget as part of my work. A coworker was scanning some repos for vulnerabilities as a test. He found a scary looking remote exploit in a popular project and shared it with me for a second opinion. I spun up a local instance of the project and ran the POC against it: nothing. Turns out it needed some configuration knobs tweaked to lower some security protections.
So I told the AI what happened, and asked it to fix the POC so that it would work with the default configuration. It chewed away at that for a few minutes until it cheerfully patched the POC into a weaponized version. I ran it. The local instance, which I had just downloaded, compiled myself, and launched with the default config file, immediately crashed.
I got the cold sweats. I've read this novel. I've seen this movie. Wow. I have a blinking cursor on the console of a nuclear information bomb. I tossed and turned all night, got about half an hour of actual sleep, and probably looked like I'd seen a ghost at work the next day.
On the plus side, it gave our team some very clear ethical and moral guidance: we're going to do this, and we're going to share our findings with the relevant authors, because we can. Because I want to live in a world where the good guys are trying to fix problems before the bad guys can find them, I decided to help build that world. It was like, well, I guess this is what I'm doing now.
UncleOxidant
I guess I've had several of those moments over the last year and a half. But a recent one was that I was working with Claude to create a spiking neural net MNIST classifier in an FPGA for a demo. Claude took it from concept to PyTorch, to training (training a Spiking neural net isn't necessarily straightforward - that's a whole post in itself, but Claude came up with a working solution), and then to implementation in Verilog and through synthesis into the FPGA. I asked Claude to create a drawing app to run on the PC side that would allow the user to draw a digit with a mouse and then click a classify button. The data from the digit drawing app was to be transferred via USB to SPI to the FPGA. I didn't have a SPI adapter yet (it was on order from Adafruit) so I asked claude to let me communicate with the simulated verilog code running in the Verilator simulator, through a virtual SPI interface. Then I went to lunch. I came back to see the digit drawing app displayed on the monitor. I drew a '2' and it classified it as a 2. In another window I could see the Verilator simulator running and the data being passed. Chills.
alexfoo
Someone in the house pressed the button to update the printer (Brother DCP-L3550CDW) firmware and the CSV page that was the basis for an existing Prometheus exporter (drum/toner lifespan, page counts, etc) stopped being a thing. Instead there was an HTML page with all of the information buried in various divs/etc.
I'd planned on writing something myself to parse the HTML and write a suitable exporter but I thought I'd give Claude a chance.
In a sandboxed VM I gave Claude a single static HTML file of the status page from the printer. Also in the directory was the equivalent of "hello world" in Go, literally just the minimum needed to do `fmt.Printf("OK\n")`. The directory was called `brother-exporter`. That was it. No other instructions or information. I hadn't told it what it needed to write. I hadn't said what it should do. I hadn't told it what language it was supposed to use.
Just by doing a `/init` in that directory Claude decided that it needed to write a Prometheus exporter in Go that would fetch and parse the HTML file from a printer (defaulting to 192.168.1.1) and then present the associated metrics in a way that they could be scraped by Prometheus.
It did this flawlessly in about 10 minutes.
I could have done it in several hours but this was definitely an "oh shit" moment for me. I think the biggest thing was the fact that it guessed/assumed so much (correctly) from so little information in the beginning.
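To give a feel for the shape of the task, here is a rough sketch of the parsing half: pull label/value pairs out of a status page and print them in the Prometheus text format. It is in Python rather than Go for brevity, the HTML markup and the `brother_` metric names are made up (the real page layout isn't shown above), and it is not the code Claude wrote.

```python
from html.parser import HTMLParser

# Hypothetical status-page markup; the real Brother page layout is an unknown here.
SAMPLE_HTML = """
<div class="item"><span class="label">Toner Black</span><span class="value">80%</span></div>
<div class="item"><span class="label">Drum Unit</span><span class="value">95%</span></div>
<div class="item"><span class="label">Page Counter</span><span class="value">1234</span></div>
"""


class StatusParser(HTMLParser):
    """Collect (label, value) pairs from label/value <span> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the span we are inside, if any
        self._label = None    # last label seen, waiting for its value
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("label", "value"):
                self._current = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "label":
            self._label = text
        elif self._current == "value" and self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None


def to_metrics(html: str) -> str:
    """Render the parsed pairs as Prometheus text-format gauges."""
    parser = StatusParser()
    parser.feed(html)
    lines = []
    for label, raw in parser.pairs:
        name = "brother_" + label.lower().replace(" ", "_")
        value = float(raw.rstrip("%"))
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value:g}")
    return "\n".join(lines) + "\n"


metrics_text = to_metrics(SAMPLE_HTML)
```

A real exporter would also fetch the page from the printer and serve this text over HTTP for Prometheus to scrape; both are left out of the sketch.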
mindcrime
I don't remember one specific moment, but I was fairly impressed with ChatGPT from the first time I started interacting with it. Was I ready to call it "AGI"? No, absolutely not. But it was clear that it was something new, and it was also intuitively obvious to me that "this AI is as bad today as it will ever be" and that predicting the rate of change would be difficult.
The more I use these things, the more I'm 100% convinced that it makes sense to say they are "intelligent" (for some meaning of "intelligent"). AGI or "human level intelligence"? Still no[1]. But some kind of intelligence. And I'm quite happy to allow that there can be "intelligence" that doesn't work anything at all like human intelligence, so arguments of the form "this isn't real intelligence", etc, etc. carry very (very) little weight with me. I've actually been sitting on a half written blog post on this very topic for a while, titled "The Marquee Sign Says 'Artificial' Intelligence"[2]. Finding time to finish it has been the challenge.
And before somebody says "Use AI to write it for you". Nah. I am generally what you might call "pro AI" and / or an "AI enthusiast" but I still draw lines. I'll use AI for research, for outlining, for brainstorming, etc. sure. But I have a hard-line stance against letting AI fundamentally write for me. I want anything that goes out with my name associated with it to have my genuine voice.
[1]: I like the term "jagged intelligence" that Demis Hassabis has been using. That is to say, the bounds of the intelligence are jagged or spiky: very intelligent in certain areas, much less so in others.
[2]: for any old-skool pro-wrestling fans, yes, that is an intentional nod to "Double A" Arn Anderson and his "The marquee sign says 'wrestling'" catchphrase. :-)
chaoxu
I'm a researcher working in theoretical computer science.
ChatGPT found a counterexample to a conjecture I'd been working on for 2 years. It also one-shot many problems I've worked on, and greatly improved some of my work.
I feel quite useless in the sheer brutal proof writing, counterexample generating skill chatgpt is demonstrating, and wonder what would be the future of my profession.
mlmonkey
I have a buddy who's a consultant. His niche area is Netsuite and Oracle (I think). He's an accountant by training and as a consultant his gig was setting up these instances for clients, charging them an arm and two legs. He'd spend a lot of time golfing, and doing these setups was more than enough money for him. In other words, he had cornered that little slice of the market and was making bank.
Shortly after ChatGPT 2.2(?) came out and hit mainstream, I was chatting with him (I was excited af about the possibilities of AI). He tried to pop my bubble by saying "I bet it can't do what I do for my job!".
So I decided to test it out. We went home and I pulled out my laptop. Went to chatgpt.com and then I asked him to enter the specifications of what Netsuite configuration he wanted. So he proceeded to type in the description of what he wanted, the various settings, configurations, etc. i.e., the specs that he typically gets from his clients. And asked it to give him the commands to set it up.
Lo and behold. ChatGPT came back with a series of commands that he needed to run; the options he needed to configure, etc.
He was crestfallen. "Those are the exact commands I run!"
Luckily for him he recovered. He has since settled on a small stable of clients, all privately held companies whose owners he knows and between them he makes enough to keep his golfing hobby fed.
vitorbaptistaa
I am the CTO of a small NGO (10 people total, only 1 other junior Dev at the time). We supported two apps that were built by consultants. They were a mess. NextJS, React, about 4 micro services for a site that had 50 users per WEEK.
I configured a devcontainer with the old codebase and an empty repository and asked Claude to rewrite it as an old school server side rendered Django app.
Went to sleep. When I woke up it was 80% done. Spent another couple days prompting and reviewing and reached feature parity.
A bit later did the same with the other app.
Now both are deployed, with lower server costs and less complexity, and are orders of magnitude faster.
Without AI agents we wouldn't have been able to do so (as is usually the case with tech debt).
AI is amazing for small organisations!
aswegs8
Kind of peculiar and memorable story for me.
I was on the couch on my Nintendo Switch, playing around with ChatGPT 3 and asked it where to find a specific item in Zelda Breath of the Wild. When it provided a coherent answer I was just dumbfounded. To be fair, the answer was semi-hallucinated but partly true. But it made me realize what kind of breakthrough it must be for some program to provide an answer to this without searching external sources (which it couldn't do yet). Such a small data point, like a drop in the vast sea of human knowledge space.
That prompted me to do a back-of-the-envelope calculation. The weights of this model were a few hundred GBs. I just realized what kind of quantum leap it was to compress this seemingly infinite knowledge space into a few GB of weights.
simonw
ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.
I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.
With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
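For a sense of how little code that workflow needs, here is a minimal sketch of the CSV-to-SQLite step. It uses only the standard library (`csv` and `sqlite3`) instead of Pandas so it runs anywhere, and the rows and column names are invented stand-ins, not the real San Francisco incident data.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for the uploaded CSV; column names are assumptions.
SAMPLE_CSV = """incident_id,category,date
1,Theft,2023-03-01
2,Vandalism,2023-03-01
3,Theft,2023-03-02
"""


def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str = "incidents") -> int:
    """Create `table` from the CSV header and insert every row as TEXT."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames  # reading this consumes the header row
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained; pass a file path to
# sqlite3.connect() instead to get a downloadable .db file.
conn = sqlite3.connect(":memory:")
row_count = csv_to_sqlite(SAMPLE_CSV, conn)
by_category = conn.execute(
    "SELECT category, COUNT(*) FROM incidents GROUP BY category ORDER BY category"
).fetchall()
```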
nativeit
When I saw that on the second day of token-based pricing I’d already consumed my usual monthly spend on GitHub Copilot. That’s when I fully realized that it would never be economical, nor useful, to solo shops like mine.
adamkf
I'll give you two:
The first was when I first realized that I could tell codex to use gdb to debug a core dump. This was about a year ago, so it made a bunch of incorrect theories, but it enabled me to go much further than I would have been able to go by myself. I eventually solved the problem.
The second was when I decided to ask it about my Linux Wi-Fi issue that I had been having for several years. The computer would infrequently have multi second pings and dropped packets, then go back to normal. I thought it was due to the weak signal, but after describing the problem to codex, it immediately disabled power management on the Wi-Fi interface (this is a desktop computer, so I don't care much for that anyway) and the problem has never come back. I had been dealing with this for years, and I had tried searching for a solution before, but codex just solved it directly.
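For reference, one common way to turn off Wi-Fi power saving on Linux is the `iw` tool. The sketch below just builds and optionally runs that command from Python. The interface name is a placeholder, and this is an assumption about the kind of fix involved, not necessarily the exact command codex ran.

```python
import subprocess


def power_save_command(interface: str, enabled: bool) -> list[str]:
    """Build the `iw` invocation that sets Wi-Fi power saving on or off."""
    state = "on" if enabled else "off"
    return ["iw", "dev", interface, "set", "power_save", state]


def set_power_save(interface: str, enabled: bool, dry_run: bool = True) -> list[str]:
    """Return the command; actually run it only when dry_run is False."""
    cmd = power_save_command(interface, enabled)
    if not dry_run:
        # Typically needs root, and requires the `iw` tool to be installed.
        subprocess.run(cmd, check=True)
    return cmd


# "wlan0" is a placeholder interface name.
planned = set_power_save("wlan0", enabled=False)
```

Set this way, the change does not survive a reboot on its own; making it permanent depends on the distro and network manager.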
dang
(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days really), and which I would therefore never have done in the first place.
(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.
(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.
There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the time and effort required was prohibitive.
robkam
My skepticism turned into a realization when I first asked an LLM to write anything nontrivial, and it just breezed through it. I am curious why many projects mentioned here seem to take people only a few hours or a weekend at most. I have been using LLMs to help rewrite the Ytree file manager originally written in nineties C. While the AI enables creating code of this complexity, the project still demands months of persistent effort.
binarysolo
I run a remote-first ecom business with a dozen or so team members.
About a year ago, one of our account managers had a life issue, ghosted us, and she held a fairly critical role in the business and gate-kept a bunch of knowledge to some high value vendor accounts.
Because we ran our ops in Google Workspace, we essentially had off-the-shelf RAG and were able to get answers to a lot of things by asking Gemini to go through all her emails/docs/calendar/meetings, reverse engineer what she did, and create an onboarding doc for her successor.
This happened once more a few months later when one of our analysts broke his wrist on vacay, and we were again able to replicate what they did to cover for their absence, this time dabbling in AI agents ("gems") to do a bunch of the regular simple tasks and again it covered things without too many issues.
I def expect Amazon/shopify to at some point replace all of us brand owners with AI bots if they can, but we'll see how long the gravy train goes on.
PopePompus
I had an old astronomy app I wrote for pre-iPhone app store era Nokia phones (N900 etc.). I decided to have Claude Code recreate it as an Android app. The old app produced several display pages for things like the positions of the planets. I was having Claude Code recreate the app display page by display page, describing the display that should be produced, with no reference at all to the original app's code (or even its existence). After I had it reproduce several pages, it added another one unprompted. The page it added was in the original app, but I had not gotten around to adding it to the Android app. The Nokia app's code is still on GitHub, and somehow Claude must have made a connection between what I was asking it to code (without ever mentioning the Nokia app) and my GitHub repository's Nokia code. It correctly implemented the page without me even mentioning the missing page. My jaw hit the floor.
fulafel
When I realized they're going to be largely powered by increased natural gas use in the USA, neatly combining with our biggest problem so far (the climate catastrophe).
tern
Opus 3.x building me a productivity system with Obsidian MCP originally.
Next was discovering "create a mathematical model of the problem and derive the solution as a result" type prompts.
But, the real "oh s**" was a longer process of spec'ing a compiler/runtime for real-time DSP (with a lot of novel ideas) and it actually working.
My sequence was: (1) it helps me understand myself, (2) it helps me put together good ideas, (3) it can generate novel ideas given the right inputs, (4) it can build useful tools on my machine, (5) it can compound good ideas into better and better ideas with repeated passes, (6) it can build significant, ambitious machinery that's way beyond my ordinary capacity.
Current frontier: it can compound large codebases into better and better machinery with repeated passes.
The key thing I track is whether I'm running a process that converges and compounds or whether I'm spinning in place / diverging.
djfergus
I had an old 1st gen Amazon Firestick in a drawer for years, it had updated to the latest software and there were no public root exploits.
I spent a day bouncing between Claude and Codex and they researched, downloaded kernel sources, tried exploits and eventually got root via "FBUF/VCHIQ kernel zero-write primitive to patch live kernel memory". I was able to make the root permanent, debloat the amazon apps, downgrade the firmware etc.
It was amazing to watch and made me excited for the future where more hardware (old and new) will be available for repurposing.
bonoboTP
The big one was definitely ChatGPT upon release in 2022, specifically when people showed how it can role play as a Linux terminal: you can narrate events like "the data center is now on fire" and "run" nvidia-smi, and it would show high temps on the GPUs etc. Or you could "explore" the homedir of some famous person. It convinced me that if it can understand so well how terminals work, tool use and agents are around the corner.
Then Opus 4.5 convinced me that this has finally arrived. In 2022 I expected things to arrive faster actually, in 2023-2024. I expected we'd have much more realtime collaborative integrations with AI including GUI computer use. Maybe in 1-2 years.
For images, it was nano banana where I realized AI images can truly work, and all these ad hoc issues like hands and limbs, or "it will never do a horse riding an astronaut" were temporary. It's now clear that making feature length films is within reach. Not in one go but with an agent orchestrating, designing a screenplay, characters, shots etc and generating those. Whether the result will be worth watching or a flat story on the high level is another question. But it will be a "film" for sure.
CompleteSkeptic
I helped train some of the first "magic" models at OpenAI[1] and it was a wild ride. We were a pretty sane + skeptical team and we weren't totally convinced the models were as general as they seemed, but the query that convinced me (and later got included in the paper[2]) was "Why is it important to eat socks after meditating?" (something that almost certainly did not appear on the internet before).
An interesting follow up would be when did you realize GenAI wasn't as good as you thought in that "oh shit" moment
I was trying to figure out a nightmare bug that only happened in production and Claude Code was able to connect to Google Cloud and read the logs in real time
I recreated the bug in the UI and it was instantly able to see in the logs what the problem was, then because it had the context of my whole codebase it was able to point me to the exact line of code causing the problem
That was certainly an "oh shit" moment
conartist6
When LinkedIn filled up with 1000 copies of what seemed like the same exact post: 20 lines long, breathless, declaring humanity over.
I thought, "I will never let myself become a zombie like that. I am me. I am worthy of my own respect"
tliltocatl
Still haven't had one. It is impressive, it is sometimes useful, it will be insightful (once the smoke settles), but it is nowhere close to becoming the self-improving, world-as-we-know-it-ending ultimate solution to every problem that it is being sold as. And much of the progress we have seen so far relied on tons of natural data being available through the Web. After LLMs killed SO, where would we get the answers to train LLMs on?
hgoel
I've had many, but a recent one was when I figured I'd try asking Claude for help with my attempts at learning to draw, specifically anatomy.
I uploaded one of my sketches and asked for feedback, expecting it to not be too useful, but it actually pointed out many issues that no one had ever pointed out to me, and perfectly explained some of the things that felt off to me. Out of curiosity I then also asked it to label the issues in the sketch. It wrote a Python script with the coordinates to put everything at and labeled the sketch that way.
I'm still used to vLLMs not being that great at vision, so it was pretty surprising to get genuinely useful advice.
mbo
Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.
dannyobrien
I got early access to the pre-ChatGPT OpenAI API (actually by pinging someone from OpenAI who posted about it on HN). At work, we were setting up to play a livestreamed JackBox game for a charity event. This would have been in 2019.
In a previous life, I'd been a writer for the original You Don't Know Jack game (the UK variant), where the job was to crank out as many funny quips about a topic as you could, and then use a handful of them in the recording of the game itself. Some of the later JackBox games are like that, but for the players -- you're given a set piece, have to come up with little funny improvisations within a time limit.
As an experiment, I tried the set-up lines with the OpenAI API, to see whether it could come up with some responses. Of course, 90% of them were unfunny or incoherent, but 1/10 were not bad, or even pretty good.
I'm not sure that would have been impressive to anyone else -- but remember, I'd had this as a job, and sat in a writer's room, where everyone did this, for hours. In that environment, you expect a large proportion to be duds: the discipline is to keep pumping them out, and not flag creatively until you find a rich vein. I realised that this was a tool that would have been the perfect complement to that work -- and it was a pretty good JackBox player too.
rerdavies
Working on a Spice compiler to convert schematics for classic guitar pedals into real-time executable code.
I provided a reference to The Spice Manual, 2nd ed., a page number, and an equation number, and asked Claude to implement it (not really expecting it to succeed).
It proceeded to implement not only the equation, but the calculation of the Lagrangian of the function, another 30 lines below, which required taking symbolic partial derivatives for a not-at-all trivial function, and successfully figuring out which variable was which in the resulting matrix. The source material just said "Lagrangian of", and did not provide the partial differential equations. And then it provided a comment that identified the page number and equation number in the source text for the "Lagrangian of" equation.
vishvananda
For me it was earlier this year when I started dusting off some old stalled projects and had an agent work on them. In a few days I:
* Built a clone of the Alpha Zero implementation[1] my team built at oracle
* Ported my hobby NES emulator from javascript to rust[2] (this actually took less than 30 minutes and worked on the first try)
* Implemented all of the lessons from the C++ Grandmasters Challenge (which eventually led to a complete c++ compiler[3])
The thing that flipped the switch was using it to build things that I actually put sweat-equity in to previously. I knew how hard these things were to build, so it landed in a way that other projects had not.
Most of the time using LLM generated code the feeling is "Oh Awesome!"
My "Uh Oh" feelings are weeks later when I realize there is a subtle bug in what the model presented as test passing "awesome" that I didn't read closely.
The biggest uh-oh is when I get lazy and let it modify multiple files and make many changes at once, and YOLO because I didn't fully understand what it did. I can usually get away with that for frontend, but for data manipulation tasks if I don't understand it, it's likely not what I wanted and I'll be back again in weeks or more trying to figure out what changed.
That's more or less what life was before LLMs and copy pasting from StackOverflow. Most of the time if I didn't fully understand something, I knew I had to eventually get back to it to grok what changed before committing.
Now with LLMs the 'copy pasting' is much faster and handles boilerplate super well letting me focus on edge cases.
tmaly
It was last Summer. I was at an AirBnB and the fire alarm system had a fault and kept beeping.
I took a picture of the panel and the AI was able to diagnose the issue and tell me how to temporarily disable the beeping sound.
I knew nothing about fire systems. I had the owner call a repair person the next day to resolve the issue.
Recently I was trying to find a matching stain for wood flooring in a house built in 1999. I uploaded a clear picture in bright sunlight and ChatGPT was able to search online and find a matching stain color. It presented me with ordering options and I got a quart delivered yesterday.
I have been working on my own variant of OpenClaw written in Go. I got the voice mode wired up a few weeks ago and it just started having a conversation with me. My wife freaked out and was asking who was talking to me.
amarant
I had Claude build a private podcast station for me. It integrated with Gemini to create a script for the show, based on a topic of my choosing, each talking segment ends with a presentation of the next song, which is played via Spotify, and is selected to have some sort of tie-in with the previous discussion. A tts model generates audio files based on the script, and a playlist is generated to play local file audio segment, then Spotify track, then the next segment etc.
An AI made a program integrating with 2 other AI, it's AI all the way down! and the result is great! I'm learning so much by having my own private radio host speaking about topics that interest me.
takee
I was working on a science experiment (electromagnetics) with my 10-year-old kid that was going to be demonstrated at a science fair in his school. We ran into a hiccup with the experiment that we couldn't debug ourselves. I turned on Gemini live video call to help us root cause the problem. It was able to clearly articulate all the possible issues and eventually was successful in making our apparatus work as expected. Turned out the wire that I was wrapping around the screw had some insulation that was not scraped off well on the side it was connecting to the battery. Gemini was able to capture this detail even though my bare eyes could not. My kid and 2 of his friends were impressed not just by the experiment, but because the live audio/video back and forth we had with the AI was almost magical!
nrjames
We were experiencing abnormally high electrical bills and I could not figure out what was happening, so I downloaded the granular usage data (15 min increments) from Duke Energy, explained what we had in our house and when we typically used those items (washer/dryer, EVs, etc), provided a rundown of our energy usage plan, then asked Claude to build me a Streamlit dashboard that would help us understand what was going on and predict what was going to happen over the next months. The dashboard had a few simple toggles and levers. Claude was basically able to one-shot this, knew how to manage the XML from Duke Energy, etc... In about 20 minutes of prompting, I had a very comprehensive dashboard that was extremely helpful not only in diagnosing that specific issue but also in helping us understand how to further lower our electrical bills.
a96
There never was one. I'm from computing science field and it's all been and is normal. Amusing, maybe, but normal. Same as before, but in larger scale, with occasional hype. People picking up useful things and using them. Some going insane.
If I had to pick a surprise, I think the music generation works better than I'd have expected at this point. Only better for funk, but still.
tejohnso
I didn't have a slightly panicked moment, but sometime in the last year my approach to programming changed.
When starting a project, I used to think about how I was going to structure it, how the large pieces would interact, how some of the details would work out, and then I'd work through alternatives and consequences on my own.
Now I don't think about it on my own so much as have a conversation with an LLM about it. And it's great because it can quickly gather information from various sources, I can ask it for links to canonical sources, I can ask it about trade-offs between alternatives that I might not have considered, and through conversation, I end up with a more detailed analysis.
Then as I work through the development, I keep my new agent partner in the loop for discussion, suggestions, and troubleshooting. It can't be trusted completely, but it's certainly reliable enough to be considered a useful tool for my purposes.
I went from thinking it was an interesting toy to play around with, to completely integrating it into my work flow, and that change seems to have happened very quickly.
idopmstuff
Two of them:
1. ChatGPT 3.5 wrote me a script to pull some data out of Shopify and write it to a Google Sheet. Nothing remotely impressive by today's standards, but I had just commanded a computer to write code in plain English and it worked!
2. I own a bunch of e-comm brands, and with every new image model I tried to get product photography. Nothing worked until Nano Banana Pro, when suddenly I gave it a crappy iPhone pic of a product and got back a fully usable whitebox photo of it. Then I tried making the sort of infographic-style images you usually see on Amazon, and it nailed those too! In hindsight they weren't perfect, but more than good enough to use. I was about to ship that product to my photographer, and I would've had my designer make the infographic images, so that was the first time AI actually replaced a human contractor for me. Pretty big "Oh shit this is going to seriously impact employment" moment. Wrote about it here: https://theautomatedoperator.substack.com/p/ai-just-took-my-...
nemo1618
The first moment I specifically remember was writing a test of a new RPC protocol back in 2021. There were no agents yet, only "AI autocomplete" in the form of GitHub Copilot. I wrote the "server" half of the test, which received a name and responded with "Hello, <name>". Then I wrote the client code to send "world", and Codex suggested `if response == "Hello, world"`.
I was floored by this. How could it have known?!
We have come so far in such a short time.
Const-me
None so far. When I try to use these language models in the primary areas of my expertise like SIMD or GPGPU they fail to do any good. When I ask them to implement some general-purpose stuff, the output is too low quality to be useful in my software.
Still, I find them incredibly useful for code review (despite being unable to write good C++ or C#, they are smart enough to detect issues there), and also for dealing with technologies outside of my area of expertise like Python or web stuff.
nwhitehead
(Spouse's story)
Today I used Claude to diagnose a blocking bug in a Steam game I really wanted to play. It took it 18 mins, but it unpacked the Godot package, figured out the bug, proposed a fix, and gave me an in game workaround.
I didn't have to do anything! Claude figured out the structure of the .pck file by using `strings`, then wrote some Python code with some magic Godot-specific code to unpack the specific chunks it needed.
plumefar
I had access to a repo (from a closed startup) with 800K lines of python & C code, written from the 90s to today. They had some very interesting approach to a specific chemistry problem. 20-30 years of work of several persons.
But God, I could not understand the code, and I could not easily make it work with modern technologies (GPU etc).
So I used Claude and Gemini to reverse engineer the codebase, extract the core ideas, and rewrite it from scratch with modern frameworks (with guidance from the original authors)
It took me only 10 days to have a functioning equivalent, in 10K lines of code (using many libraries that did not exist in the 90s and 00s), which I find much easier to understand, even though I wrote none of it myself.
10 days to rewrite 20-30 years of work by a few people. That was quite scary.
encrux
Back when GPT-2 was released, I tried figuring out how to fine tune it. I found a google notebooks template, scraped a bunch of data from r/ChangeMyMind and asked it to change my mind on different topics.
I was dumbfounded that it actually tried doing that. Obviously GPT-2 wasn’t great at it, but the writing was on the wall quite literally.
Unfortunately, I was too broke to invest in stocks, but I did pivot my career quite a bit.
2cynykyl
I thought mine was when claude found a very subtle but important bug in some open source LBM code I was using. It ground at it for hours and didn't give up until it found it. (Back when claude was cheap!). I recently had my ACTUAL moment at a conference where the presenter was pitching his book about "One shotting scientific code". He has cooked up 60+ prompts that get you functioning simulations and put them into a book [0]. It floored me to realize I could have just asked claude to write me a whole new LBM solver instead of finding that bug! That raised the bar for me a lot.
Was the early ChatGPT. Someone on the team showed off a poem about postgres in the style of the King James Bible. Totally blew my mind.
xtracto
I probably will be burned for this, but with the help of an LLM I wrote a tiny program that captures video from a browser screen (Xbox live online FPS game), passes the video images through a small trained NN that recognizes people forms and presents the video on another screen. That way I can place a green overlay on enemies and they are easier to see on PVP matches.
All that in around 100 lines of code, including the training/fine-tuning of the tiny YOLO nn.
jonyt
Two things, both from this week.
First, I asked Claude to write an article based on an idea I had about WWII. In a passage about the futility (from the German side) of the Battle of Britain it wrote: "The Luftwaffe was fighting to unlock a door that opened onto a wall." I couldn't find any mention of a similar metaphor, and I think it's a great one. Claude has really improved its creative writing skills lately, I wonder if it's an artifact of improvements in other fields, or if Anthropic is working on it specifically.
Second, Claude, with access to DataDog and a code repo, managed to find the reason for a bug, propose an effective temporary fix and a permanent one in code. To be clear, this was something that had multiple engineers stumped.
hparadiz
Been using it to manage an estate and just being able to shove all the documents right into an LLM and have it spit back out perfectly worded emails, as well as keep track of checklists of things I need to do and automatically create a ledger for me in Sheets. It's been a huge mental load off and I've instead been able to focus better at work and the labor costs saved to me have been immense. Just on this one little thing. I'm one of those people that overthinks correspondence and letters and it ends up causing me to be stuck on something, so being able to ask for just the right wording has been super helpful to me.
marcus_holmes
I took a photo of my ailing plant and claude advised me on how to get it healthy again (and how to take a cutting and nurture that).
This is some science fiction shit. I get all the coding stories, but that's a computer talking about a computer, it makes sense. Showing my computer a picture of a plant, and it not only recognised the plant, but diagnosed it and knew what to do... blew my mind.
altairprime
Cuil Theory, in 2008, was my Ocelot Six moment.
Once I realized how well AI could babble given the entire internet to date’s data, and after seeing a talk by Google about their ten-year plan in 2003, I started winding down my social media, stopped posting photos to Flickr, and removed the indexes to my blog archive so that only posts with permalinks from other sites would be discoverable. Skipped Instagram entirely in the process and have never regretted it.
Google bought Cuil, of course.
linsomniac
Last week I gave Claude Code in Ultracode mode the prompt: "I want a browser-based retro game inspired by Spy Hunter" and gave it the URL to the Spy Hunter (Arcade Game) Wikipedia page.
What came out has a lot of problems and needs refinement, but you can definitely see a lot of elements of Spy Hunter in there. I haven't worked on any refinements yet, because I've been low on tokens this week, but for the first thing that popped out of Claude this is pretty impressive (IMHO).
So many. First was when I saw GPT-2 create jokes that were original and kinda funny.
Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.
It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.
It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).
notthetup
Had some unique concert audio recordings which had gotten corrupted when I moved the files during a backup. I had tried looking at the files and trying to recover them. It felt like they had the data but no software could play them.
Sat on them for 5 yrs. Finally decided to try if AI tools could help. Took Copilot 20 mins and a lot of mucking around with hex dumps. First couple of times it got a semi-working solution (only the first few seconds of a file were playable). Finally managed to recover all the files.
Kon5ole
From actual use I've not had a "oh shit" panicked moment yet. More like a bunch of "Holy shit" euphoric moments.
So far I feel like I as a developer have gained actual superpowers, and can deliver results that make my stakeholders slackjawed with awe. I love it.
It will last perhaps a few months more, then they'll expect it. Delivering more features faster will be the new normal. But I think system developers, as in people who actually like to deliver new features and systems, will still be the ones doing it.
Fundamentally I think LLM's just change how to make information systems, they don't change who has the inclination to make them.
MBA's making excel sheets that do more than excel was ever intended to do has given programmers lots of work over the years. Such solutions identify a need for a properly designed system and frees up the budget to hire programmers.
If the same MBAs start vibe coding, I predict we will get even more to do, for similar reasons.
I may be horribly wrong, and if the day comes that I realize that it will be the "oh shit" panicked moment. So far so good!
radial_symmetry
Very early on, when Github Copilot was brand new and the first AI autocomplete that was in the IDE. I had a file TODO.txt, and was adding a line, and it suggested a next feature that demonstrated actual understanding of what my app was and its purpose, despite me not having documented that anywhere.
hypendev
Back in the times of GPT3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.
Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.
That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.
Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didnt remember something.
That was when I understood the social impact this could have on people - they really behave like its a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.
Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive claude code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.
segmondy
Running a local LLM in 2023, I heard folks talking about interfacing LLMs to tools. I wrote a system prompt and told the LLM it can call some tools. If it wants to call a function, it should output func(params...) and do so in an XML tag. I provided a few examples, none of this JSON soup we get today. Then told it I'll provide it the result in a RESULT XML tag and it should use that to answer. Wrote up a harness around that and I had a local model interacting with the outside world. Oh wow! Everything else today about MCP, Agents is all an extension of that thought. Using function calling, I built an agent. I defined a data structure that represents rooms and how they are connected. The room will be marked as dirty or clean. Then I would place the agent in a room and the agent will decide whether to go left, right, down or up and into a room. Once it got into a room, it would decide whether to clean it or go to the next room. Repeat until all rooms are clean. Basic toy of CS101 AI vacuum agent. It worked!
So being able to get real world input/output to the model and having the model being able to make decisions in a loop and to be able to do it locally. I have been screaming like a mad man ever since.
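A rough sketch of what a harness like that could look like, for anyone who wants to try the idea. Assumptions, since the comment doesn't give them: the `<call>` tag name, the tool names (`look`, `clean`, `move`), and the four-room layout are all made up here, and `scripted_model` is a hypothetical stand-in for the local LLM so the loop runs without a model attached.

```python
import re

# Hypothetical room graph: each room lists its neighbours by direction.
ROOMS = {
    "A": {"right": "B", "down": "C"},
    "B": {"left": "A", "down": "D"},
    "C": {"up": "A", "right": "D"},
    "D": {"up": "B", "left": "C"},
}

class World:
    def __init__(self, start="A"):
        self.position = start
        self.dirty = {name: True for name in ROOMS}

    # Tools the model is allowed to call.
    def look(self):
        state = "dirty" if self.dirty[self.position] else "clean"
        return f"room={self.position} state={state}"

    def clean(self):
        self.dirty[self.position] = False
        return f"room={self.position} state=clean"

    def move(self, direction):
        target = ROOMS[self.position].get(direction)
        if target is None:
            return f"error: no door {direction}"
        self.position = target
        return self.look()

# Assumed wire format: <call>func(arg1, arg2)</call>
CALL_RE = re.compile(r"<call>\s*(\w+)\((.*?)\)\s*</call>", re.S)

def parse_call(text):
    """Extract (function name, argument list) from the model's reply, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
    return name, args

def dispatch(world, name, args):
    """Run the named tool and return its result (or an error) as text."""
    tools = {"look": world.look, "clean": world.clean, "move": world.move}
    if name not in tools:
        return f"error: unknown tool {name}"
    try:
        return tools[name](*args)
    except TypeError:
        return f"error: bad arguments for {name}"

def scripted_model(last_result, plan):
    """Stand-in for the LLM: cleans dirty rooms, otherwise follows a fixed route."""
    if "state=dirty" in last_result:
        return "<call>clean()</call>"
    if plan:
        return f"<call>move({plan.pop(0)})</call>"
    return "All rooms are clean."

def run_agent(world, max_steps=20):
    plan = ["right", "down", "left"]  # route A -> B -> D -> C
    result = world.look()
    for _ in range(max_steps):
        reply = scripted_model(result, plan)
        call = parse_call(reply)
        if call is None:  # no tool call means the model gave its final answer
            return reply
        result = f"<RESULT>{dispatch(world, *call)}</RESULT>"
    return "step limit reached"
```

With a real model, `scripted_model` would be replaced by a call that sends the system prompt plus the latest `<RESULT>` text and returns the model's reply; the parse, dispatch, and loop parts stay the same.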
hmokiguess
I think I had a few but they’ve been all short lived and superficial, time made them quickly irrelevant, there was a lot of hype, drama, FOMO, and propaganda around it. That said, I think recently my newest one has been using Voice mode during a car drive. It is very good, like, no latency and it understands nuances of speech very well. I’m convinced voice is where we should be doubling down in terms of UX for the next generation of workflows.
niwtsol
To share something different, it is less about what I have built, and more about what I have seen my friends (non-technical and technical) build. In a one month span I have seen a lawyer make a personal red line tool, a sales guy make a custom website for a golf trip, another friend make a 3d printing grid-finity project, a friend make a stl file to print a jig for his table saw, and another friend make a full mobile game. It is just really cool to see these micro-projects be created and shared, not only for the utility, but just to see my friends' childlike excitement showing off their project.
WhompingWindows
Reading a dozen comments here, the AI seems to blow people's minds most often in domains they're less familiar with. Repairing furnaces, HVAC, towing hitches, camper van interfaces, printer debugging. It wasn't the user's career to do these things, it gave them a bump from very novice to intermediate level.
irthomasthomas
My most recent one: Taking a bricked ipad and plugging it into my linux laptop, then telling deepseek to fix it. A couple of hours and twenty sudo passwords later it was working again.
dddw
This week.
Have been playing and testing with openrouter, claude gemini for years.
Small program here, bash script there, ansible playbook.
Fine, nothing I cant do, but saves some time boilerplating. It needs quite some steering.
This week I took my MediaWiki from 2005 (actually submitted as my art school thesis), which was totally outdated.
In 20 years' time I always said to myself, I should restore it, and do all the upgrade steps. Tedious work, and very fault prone.
In 1 hour of churn with 1 plan, in 8 steps, I had a running and up-to-date version.
I'm still not convinced AI is intelligent, but it's definitely not stupid, that's for sure.
dekoidal
When I read that Microsoft gave OpenAI billions of dollars worth of data centre access and OpenAI accounted for it as billions of dollars worth of investment. When they spent the tokens Microsoft accounted for it as billions of dollars worth of income. Both companies gained billions of dollars with made up money
slicktux
Mine is just running a model on my laptop. It’s just amazing! I can ask it pretty much any question and it replies relatively FAST!
Before, we lacked advancements in technology because we were limited by hardware. This advancement is the opposite: our software and the math/algorithms have brought us this.
eqmvii
Some business users spent ~30 minutes on an internal process, and we prototyped an "Agent" in Slack to take over. At first it didn't work, then it didn't work some more, eventually it ALMOST worked. Then one day, it worked, and the old business process died never to be revived.
Now it sits in a slack channel, and I watch it doing work, responding to ambiguity, and taking feedback/edits all day. It's unreal. It's literal magic. It saves a HUGE amount of time and gave us a pattern to do more.
This is the real deal. It's not easy to find problems with the right shape, and it's not easy to build agents that fit even when you do... but once it clicks, it clicks.
acrinimiril
Two things:
1) I wanted a harness for running BPC.EXE (the old Borland Pascal 7.0 Compiler) and I asked Gemini 3.5 to build it for me using the unicorn engine. It whipped out a working .py file easily under ten minutes. Most likely five.
2) I handed a random assembly function from the OS/2 1.x kernel to Gemini 3.5, and it proceeded to tell me that it was related to disk I/O and partitioning, without a single associated string, and it annotated it all, including the relevant structures it was addressing.
a_bonobo
At my previous work, I was collating somewhat random unconfirmed animal sightings. I also had a separate database of animal occurrence probabilities (species distribution maps). I'm not a statistician but that sounded like a clear job for Bayes theorem: given a sighting and the overall probability of that sighting in that area (species distribution map), and some other assumptions about the noise of the sighting, what is the probability that the sighting actually included that species?
Claude asked me three questions and then wrote a beautiful Python implementation that queries the map and spits out a table of adjusted probabilities. Felt immensely powerful - I can do this 'on my own' now, I don't need to wait to find the right people or learn the right thing first.
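For anyone curious what the Bayes step might look like, here is a minimal sketch, not the commenter's actual implementation. It assumes the map value is used as the prior, and that you can put rough numbers on how often a sighting gets reported when the species is really there versus when it is not; those two rates, the species labels, and the function name are all hypothetical.

```python
def posterior_presence(prior, p_report_if_present, p_report_if_absent):
    """P(species really present | sighting reported), by Bayes' theorem.

    prior               -- occurrence probability from the distribution map
    p_report_if_present -- chance of a report when the species is there
    p_report_if_absent  -- chance of a (mistaken) report when it is not
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability")
    numerator = p_report_if_present * prior
    evidence = numerator + p_report_if_absent * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("a report is impossible under these assumptions")
    return numerator / evidence

# Hypothetical sightings: (label, map probability at that location).
sightings = [("species_a", 0.30), ("species_b", 0.02)]
for label, prior in sightings:
    p = posterior_presence(prior, p_report_if_present=0.8, p_report_if_absent=0.1)
    print(f"{label}: prior={prior:.2f} posterior={p:.3f}")
```

The point the sketch makes is that the same sighting counts for much less where the map says the species is rare, because mistaken reports then make up a larger share of all reports.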
jerome-jh
Recently, Claude (through Copilot) found a hardware issue on our product. I was asking it to find an issue in a specific feature of a device driver, that could cause what we observed. It determined the feature was correctly implemented.
Then it hinted that depending how the hardware is implemented, it could cause the observation. It turned out the hardware was implemented as suspected by Claude.
I was already convinced it knew the codebase, somehow, more than I do. Now it is just as if it knows the product and its use as well.
For me it was Suno, not any of the coding tools. I prompted it to write a song about my family's little dog, told it a few things about the dog, and it came back with a K-pop-style anthem that had a super catchy melody and lyrics that made my wife and me laugh out loud.
Writing code to spec is one thing, but creating art was always supposed to be what separated us from machines. (I suppose I need to preemptively acknowledge the "it was machine-generated so by definition cannot be art" point of view.)
solomonb
I gave chatgpt 3.5 the type signature for a co-algebraic encoding of a mealy machine:
newtype Mealy s i o = Mealy { runMealy :: (s, i) -> (s, o) }
And it gave a really impressive analysis.
Then I scrambled all the names and asked with a fresh context like:
newtype Foo z e g = Bar { blob :: (z, e) -> (z, g) }
It got completely confused and generated a bunch of non-sense. It was at that moment I realized that LLMs don't really understand anything.
And yes I understand that a newer model would not get confused by this.
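For readers who don't read Haskell: the type above says a Mealy machine is just a function from (state, input) to (next state, output). A rough Python sketch of the same shape, using a made-up rising-edge detector as the example machine; the names `run_mealy` and `rising_edge` are illustrative only and, as the scrambled version suggests, nothing depends on them.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # state
I = TypeVar("I")  # input
O = TypeVar("O")  # output

# Same shape as `runMealy :: (s, i) -> (s, o)`.
Step = Callable[[S, I], Tuple[S, O]]

def run_mealy(step: Step, state: S, inputs: Iterable[I]) -> List[O]:
    """Feed inputs through the machine one at a time, collecting outputs."""
    outputs = []
    for item in inputs:
        state, out = step(state, item)
        outputs.append(out)
    return outputs

def rising_edge(prev: bool, current: bool) -> Tuple[bool, bool]:
    """State is the previous input; output is True on a False -> True change."""
    return current, (current and not prev)

print(run_mealy(rising_edge, False, [False, True, True, False, True]))
# prints [False, True, False, False, True]
```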
gagabity
Fixed a nasty bug in one of my tests where a mock in a completely different test I had never worked on was incorrectly setup and intercepting my mocks, I don't think I would have found it ever because the amount of effort it would have taken means I would have needed to move on to some other way to test.
Reverse engineered an old audio recorder USB driver which only works in Windows 7 and also reverse engineered the custom audio encoding the device uses and the software to convert it to a standard wav file. This took recording the USB traffic with Wireshark for each function in the original software in a VM then disassembling the various dlls and exes and driver files and feeding them into Claude step by step.
That AI button in DataDog not only diagnosed the problem across micro services but also created a fix PR. I think we might be unemployed soon.
tverbeure
The fact that it completely autonomously read in a 5 MB firmware image of an old piece of test equipment and generated a Python script to generate license keys:
When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.
hansvm
A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.
I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.
paulbjensen
I would say the first time I did “vibe coding”, when I tried Claude Code with Zed’s agent integration in January this year.
I wanted to see if I could build an image editor for isometric graphics using HTML5 canvas, Svelte, Vite, and the. Rather than do all of the skeleton code setup, I figured “why not try and see if Claude can build the app scaffolding?”.
I gave it a prompt and watched it produce the scaffold, along with a few features I outlined in the prompt.
When I booted the app and saw that the features worked and that there had been an element of design to the layout, that was my mind-blown moment. In a period of about 45 minutes, I added some features and had a basic MVP at the end. I walked back home stunned.
Built a physics-based dynamic digital twin for an electrolyzer system with full equivalency in thermodynamics, fluid dynamics and electrochemical reactions. A similar level of complexity is usually available in software like Aspen or Siemens which are a quarter million dollars license/yr.
Insane.
mschaef
This is a small one, but significant to me.
I asked Claude to add support for multiple lights to my toy ray-tracer. It correctly added the support and then suggested adding colored lights to make it easier to diagnose. It felt more like a colleague making a useful suggestion than any sort of pure engineering tool.
neom
When I tried, just for fun, to put together an MVP of a fully autonomous business. I wanted to see how far it would go; when I got it generally working to around a 30% level I stopped, because it was enough to see that people would make a concerted effort to build this for real. HN was not impressed, heh: https://news.ycombinator.com/item?id=44143928
justinmarsan
Being self taught, there are lots of things I never formally learned, rules I know as rules of thumb without the deeper knowledge... So I set out to learn the root of what can be used to measure good robust code... Spent an hour asking lots of questions, learning about LCOM, Halstead, why circular dependencies are bad, and so on...
The next morning I figured the same LLM could compute that on my code, so I asked it to make an agent to do so, and report issues to me...
And then I ran that agent with next to no changes on a feature that had grown organically over the last months, that I knew was messy and sometimes difficult to work on, despite being unable to precisely say why... And it did tell me exactly why, and proposed changes to improve stuff, and then implemented them...
Up until that point, I'd felt like the LLMs always produced bad code, that worked for a specific feature but often broke stuff or evolved poorly over time. Then I realized if you had the LLM do code improvements, it could do that fairly well too...
fergonco
When I tried pi.dev (I had only used ChatGPT before) and told it "add all these scripts I developed over the last couple of years to automate my job as skills".
I love to automate things in bash scripts and these LLMs can just use them very effectively. It was also surprising how they derive knowledge from those scripts. If you get A from a B uuid, they kind of get the relationship. I am super vague in my request and this thing knows what I am referring to. After some months it's still mind-blowing.
hnfong
I have no idea why anyone (especially those here) would be dismissive of genAI from ChatGPT (2022) onwards.
It was obviously a new tech, and was obviously good enough that more resources would be invested to improve it, and it really amazes me how tech enthusiasts would just outright dismiss these early iterations of genAI tech.
I personally was fascinated by the developments and was grateful to get to directly watch history unfold.
I'm still unsure whether the tech will be a "net positive" for the world, but that shouldn't prevent me from recognizing its power.
ElFitz
First one was Stable Diffusion. Especially the image to image, and the first gos people had at making videos with it.
Second one was trying to bootstrap what would come to be called a "harness", back in 2023, initially serving as the go-between for API calls and file edits, feeding back the logs and gradually stepping back as, step by step, the LLM bootstrapped the CLI.
And finally, using Claude or Codex to do ops work. Diagnosing issues on my machine, provisioning servers and VMs via ssh, debugging them, all on its own.
ioman
Mine was using VS Code with Copilot. Previously I had used tab completion and thought it was pretty neat. This time I began with the comment for a function I wanted to write. And the entire function just appeared below the comment. Written probably better than I would have. I remember saying, “uh-oh” out loud.
meken
Early on in my ChatGPT usage, one of my messages got interrupted/cut off (as happens occasionally).
My first thought was "oh they're going to need to add a UI feature to allow me to click and tell them to continue the conversation".
Then I realized I can just ask the model to continue, obviating the need for a button.
That was a pretty mind blowing moment.
thenoblesunfish
When a junior engineer first sent me something that looked good until I realized it had been vibed, and thus their understanding of what they were doing was too shallow to answer questions and improve on it. That was a doc, but it happens with everything. "Oh shit", I say, as everyone is aggressively encouraged to work this way.
threecheese
I had an issue with installing OpenClaw, and it helped me debug the failure and get itself working. I had to sit quietly for a moment. No reading docs or inspecting the system, just “what’s wrong here?”.
While I didn't find a use for OpenClaw, it opened my eyes to the potential for distributing software which, once bootstrapped a bit, can interrogate … itself, understand its own requirements, communicate with the device, and become operable.
Add capable small models to the mix, and it’s almost frightening what good (or malicious) software might be able to do.
zthrowaway
“Farewell to Stack Overflow” juxtaposed with the realization that AI only knows what to troubleshoot and how because of Stack Overflow…
fabianholzer
I have not yet had a positive "oh shit" moment. But when the corporate manager types who could not deliver a "Hello world" if their life depended on it, and who would have had a sour look on their face when asked to pay license fees for a proper IDE 10 to 15 years ago, started pushing it hard, way before any but the resume-driven engineers: that flipped a bit in me.
cjbprime
ChatGPT reconstructing idiomatic Python source code from Python bytecode was definitely up there. That is not something humans have written a great deal about online. It requires simulating the Python VM.
I remember also having a massive wtf reaction to realizing that original ChatGPT was pretty good at decoding long random/unique base64 strings.
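For context on what those two tasks involve, here is a small sketch using only the standard library. The `area` function and the sample byte string are made up. `dis` shows the kind of stack-machine instructions that have to be mapped back to source, and the base64 round trip shows the encoding being decoded. Opcode names vary between CPython versions, so none are listed here.

```python
import base64
import dis


def area(width, height):
    return width * height


# Instruction names for the compiled function (CPython-specific, version-dependent).
opnames = [ins.opname for ins in dis.get_instructions(area)]

# Base64 maps every 3 input bytes to 4 ASCII characters, padding the final group.
original = b"long random/unique strings"
encoded = base64.b64encode(original).decode("ascii")
decoded = base64.b64decode(encoded)
```

Going from `opnames` back to `return width * height` means tracking what is on the VM's stack after each instruction, which is why reconstructing idiomatic source is harder than it looks.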
abecedarius
AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.
matheusmoreira
Pretty much immediately after I asked the LLM to perform a complete code review of my projects. I've been programming alone for years, that alone was life changing for me. It only got more impressive from there.
lodovic
The first time I pasted a screenshot of a PR review thread, adding just "I had some review comments, fix them" - and it perfectly solved everything, made small commits, and pushed it upstream - this was such a shock.
I now try to keep pushing the boundaries and see where it stops understanding my intention. Give it impossible tasks, gigantic projects, complex architectures. Last result: I wrote a complete OS including MPI, TCP/IP, and a GUI from scratch in only a week, while investing just a few hours a day in it. It even runs Doom! Coding as a profession is over, but there's such a difference in the result if you approach this with a professional mindset that I think the software engineering discipline can still provide massive value.
ramon156
I've let it do some commands against a local NUC before, just to see if it knew why something didn't work (it would've taken me ~15-20 mins probably. Not too bad). It took ~18 seconds to think, then ran two commands, and noted what the issue was. Even a 10 yr old could understand what the problem was.
I realized that LLMs were pretty good at calling the right tool, and running the right verbose command to figure out what and how.
Kind of like finding a specific SO post that had your exact problem, and the solved comment is heavily upvoted
madrox
I think my favorite early story was when OpenAI launched deep research. I was going to an event that I was headlining, and I gave it a CSV of the attendees and asked it to give me a small background on each company they represented.
When people introduced themselves to me, I knew a little about their startup. Felt magical.
brailsafe
Not sure that I've had it yet, although hypothetically I'm sure it would probably be something similar to the examples of writing new software for old hardware mentioned ITT. The idea of resurrecting useful but unsupported gadgets that would otherwise become e-waste is something I've always found compelling.
Problem is, I just don't have enough old crap, and if I did, I would have a hard time justifying the expense, because that money could maybe just go toward a more intimate tinkering process.
For everything else, I either haven't had any sufficiently interesting ideas, or they ended up not being worth pursuing with those tools or at all.
When I do have success that I'm happy with and care about, it's a slow process that I ultimately need to know the details of anyway, but otherwise it's a bunch of luckily narrow work-related scenarios with well-documented constraints. Nothing's really been that shocking though.
The shocking thing to me is how unrewarding most of the successful tasks have been, partly because they often create unnecessary work and partly because the type of thinking required to massage or evaluate the result is much less stimulating, and there's much more of it in aggregate. It's fine if it's something like generating a UI from scratch because that hasn't produced dopamine in a long long time anyway
dirkc
I started to look at LLMs not as writing code, but rather as predicting what code it would expect someone to write given the context.
For some people that matches their expectation or they don't really have an expectation. While for other people it doesn't match their expectation.
autonomousErwin
I had 2 MacBook Pros. One 2024 and one 2019. The 2024 one would connect fine to the internet, the 2019 one would not.
After pasting in the airportd logs of both (into ChatGPT and Gemini) it found it was down to band switching (2.4GHz and 5GHz) through some really old error code.
This fixed a problem that had plagued me for >12 months. Really magical feeling that it got it on the first try.
lukan
2 years ago I played a bit with the abandoned source of a Flash-like editor for the web that I found promising.
But doing it manually was too much work: outdated and broken build pipeline, stuck on an older Node version, deprecated and abandoned dependencies... so I stopped the experiment.
Then I gave it a try with Claude at the beginning of this year.
I remember not expecting anything, but did a bit of steering the direction as I knew the source a bit and let it mostly work on its own - and then it said it is done and it works.
I didn't believe it, but it did. "Can you add this feature?" Yes it could.
Since that experience, I have a hard time taking people seriously who say AI is useless.
thallavajhula
I wasn't impressed by the LLMs up until January or so when Claude Code swooped in. Until then, I felt like the LLMs were slowing me down. I have been using them for a couple of years now for coding at work, but I never really thought they brought in real value. Then in February I worked on a 1-month-ish project timeline and shrunk it to 3 days and that was it. I didn't write a single line of code in that project and I went all in with Claude Code. That was it, _the moment_ of realization. I was thoroughly impressed. I went from nothing to a tool that served several teams. Now I'm starting to see the cracks in LLMs and I'm slowly getting back to picking which task to offload to AI and which ones to do by myself.
Claude is great at coding. That's it. Outside of it, it's just god awful at pretty much everything else. ChatGPT OTOH, is good at coding, but at everything else, I find it brilliant. Gemini never made me want to stick with it. It's good, but never great for my use cases.
lordnacho
For me it was gradual, then sudden.
I liked using the early models to do autocompletion. It could do a leetcode style thing, pretty nice, but only useful for small things.
Then I sought out Cursor because that seemed to be able to do multi-document edits. Not bad, but models at the time (2024) still got stuck pretty often. So, cross-document autocomplete. Useful, but definitely within the realm of "nice shortcuts to have".
Then a friend (who works in AI) told me to try Claude last year. I was on holiday at the time, but I spun up my work repo and looked at the backlog.
It chewed through the entire 6-9 months of estimated work in a two-week period while I was watching that Lord of the Rings series with a friend (we watched an episode or two in the evenings). I just chatted with him about the series while checking the progress every few minutes. It was a huge amount of refactoring, and it didn't get everything right the first time, but it made enough progress that it could be directed the right way.
Since then I have hardly coded any manual lines. I just tell Claude what to do, with very little harness (skills, MCPs, instruction files), and I get what I want.
jFriedensreich
I had a pretty involved cross-module state bug with complex dependencies and reactivity issues interleaved. I tried fixing it multiple times manually with a 4h time box, as well as with Claude models up to Opus 4.6 high and Codex 5.3, all of which failed. When the GPT-Pro model came out I heard it was not supposed to be an everyday coding model, but I tried it anyway as it looked impressive. It took a single 8h run burning $200, with me doing nothing but occasionally waiting for test runs or writing “continue”. After 8 hours, and fearing I had wasted the money, the bug was consistently fixed, not just the one edge case that triggered the behavior.
bsiverly
I had it fill out all the forms to appeal my property tax value. We created an assessment of what my San Francisco property should be worth using deep research. The city agreed and a $12k check arrived shortly after.
base698
I asked the OpenAI playground to compare and contrast the themes of Point Break and Fight Club. It did a bang up job and blew my mind. I then realized it basically worked for any of the scripts I had for my dev environment too. Fixing and expanding capabilities I'd wanted to have but never had the time to implement.
jerieljan
I remember in the early days when I was just trying out ChatGPT on a phone for the first time (this was around GPT-3.5? GPT-4o?) and snapping a picture of our fridge that's full of magnet souvenirs and asked it to identify all the places we've been in and it gave a nice list of what it saw and the places that were featured.
Did it get it fully right? No. But it was one of those "oh wow, you could do that?" moments for me. There's obviously a lot more "oh shit" moments as time went on, but it was a neat little moment.
bag_boy
I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.
It was unlike anything I had ever experienced.
My wife was unimpressed lol.
This was 2022.
ako
Probably over a year ago, when I first saw reasoning in action in a debugging session: it generated some code, ran it, could not explain the results, then said “let me add some print statements to debug”, reran the application, read the logs, and then stated “now I understand why it’s not working”. Plan, do, check, act in action, AI engineering its own context, and generating the missing information.
imetatroll
Maybe my daily work is rather mundane compared to most people who frequent HN but I am able to create, think about, refine and then go through review cycles at least 2 or 3 times more quickly than I used to.
And software that I can imagine I might want to "make" or have at my fingertips is readily available even though I have a busy schedule with very little free time!
Also, I love feeling like a manager whose direct report actually does what I tell it to. Crazy good feeling.
tobyhinloopen
A non-technical employee of a client vibe-coded an app and I was asked to review and deploy it.
It was okay, not bad at all. No serious issues.
At the same time, I fed a whole PDF of feedback from a client (screenshots and such) into Claude, and it fixed everything after 7 hours of reproducing and fixing things mostly unattended, creating a bunch of MRs with fixes. Most fixes were good; some were obviously not what the client wanted but technically correct (which I told Claude, and it fixed them).
variodot
For me, it was during an ongoing incident in a failing IoT OTA service which was growing in priority: taking two things I was unfamiliar with and bolting together a new OTA mechanism via an alternative SMS provider. I'd never developed in the .NET ecosystem before and happened to have gained access to another team's Twilio account the prior week, so I took a shot, planned the interfaces to extract, and implemented an alternative Twilio implementation plus a feature flag.
Normal software instincts plus access to a different service flushed the buildup of OTAs, and it lives on as a fallback mechanism. It amazed me to go from idea to execution faster than I could ever have dreamed of even onboarding myself to the area or environment.
hatthew
I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.
My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken as real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.
rref
My ducted gas heater wasn't working where I live and I took a photo of the wiring diagram and had Claude step me through troubleshooting it with a multi-meter, and got it fixed.
chpatrick
For me it was the original DALL-E project page.
qnleigh
They've been coming faster and faster for me. First I was blown away by GPT-2, specifically the fake news article about talking unicorns. Just stringing together a few sentences while maintaining logical coherence was very impressive at the time.
Then it was models like Minerva that could actually solve math problems, and the discovery that LLMs were one-shot learners and could write code.
After that, the improvement felt pretty steady, with IMO gold feeling like a watershed moment.
And recently OpenAI's solution to the planar unit distance problem is starting to actually freak me out a bit.
csr86
I was working on a project for 2 years with about 5 engineers. It was many years before AI. It was a new subject for our team, and we were pretty sure it was possible. Turned out it was not.
Much later I asked AI if that kind of project is possible, and it immediately explained why it is not. Would have saved 2 years of our time...
jmpman
Had an AI plot movie Rotten Tomatoes reviews versus cost for 2 adult tickets, plus candy and large popcorn prices from the specific theater, and the round trip gas from my cross street, including only movies which would get out in time for me to be home by 10pm, accounting for preview times.
None of that is mind blowing, but that Google or some other site has never offered me this type of analytics, is where I'm floored. It's a trivial query, but perfectly useful for planning a night out with my wife.
dtgriscom
A friend had the power supply die on his high-end turntable. He took a picture of each side of the supply's PCB, handed it to Claude, and it gave him back a schematic.
ChicagoDave
The second I realized it removed nearly all blockers as a bootstrapped technical startup founder.
Claude wiped out the need for web and mobile development resources. I bought a Mac-Mini and had iOS apps up and running in days.
Sobrino
I worked in an AI (or well, ML) consultancy before the ChatGPT moment. I remember we had a project where we had to extract information from a large number of documents (country wide, terabytes of PDFs of scans). We had to set up a pipeline that looked a bit like this.
Download PDF of scan -> Tesseract to get a text layer -> Clean it up with a language-specific BERT model -> Detect paragraphs of a certain type -> Look them up against a database we built with scored similar paragraphs -> Do recommendations.
The documents were not standard and a lot of them were historical documents and handwritten or with scratched out text with corrections.
We had student workers spending days labeling the data.
It took us months to get it all working with a high accuracy. We were so proud.
Now you can do it all with a prompt and a ChatGPT call.
sothatsit
I gave GPT-4 some source code and my existing tests, and asked it to write a new test, and it did it! It didn’t even run straight away, I had to fix it, but it still blew my mind.
Later, I wrote a ~5k line proxy for work in C, and gave the whole thing to ChatGPT o1 and asked it to review it. It found several real memory bugs, and now that service has been running since with no problems.
Just this week, I was trying to write a greedy solver to pick the best subset of block sizes to keep from a larger sweep for shorter testing. Opus 4.8 suggested that this could actually be solved as a MILP problem, and found the perfect solution in 5 mins. I’d never even heard of MILP before.
block_dagger
I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.
dash2
I asked it to prove the theoretical result in a (published, prize-winning - though not really for the theory) academic paper of mine. The proofs hadn’t been that hard objectively, but they’d taken at least a week. I fed it the model. It got the correct basic results in about 5 minutes.
grumblepeet
My bath hot tap suddenly broke apart and was spilling hot water into the bath. I photographed everything and ChatGPT told me step by step what bits to get to fix it, and how to reassemble it.
A few weeks later some kids in the area were bending the wiper arms on cars in my terraced street, including my car. I thought, I wonder if ChatGPT can help? It explained to me where to get the parts online, gave an indication of a decent price, and told me how to fit the replacement parts.
At work we had struggled with filling out the myriad forms that we need to complete to get enrolled on a government framework to apply for contracts. Not only did it do that and explain what we needed to say, but it also told us in detail the steps we needed to follow to get the certification that was a prerequisite. It has genuinely transformed our business as a result.
yearesadpeople
Genuinely surprised by the breadth and level of interaction with this post.
It would appear - perhaps we have data to back this up? - that a distinct _'flavour'_ of post is becoming dominant.
A shame.
bluejay2387
I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.
HlessClaudesman
I was sitting in a cafe listening to a podcast where I heard about a sci-fi author banging out 40+ books per year. How are they doing that?, I thought. Either a team of ghost writers, a boatload of cocaine, or they are using AI.
So I decided to test the frontier of AI; this was back in the early ChatGPT era. I downloaded the app and proceeded to go through all the steps of writing a novel: outline, summary of characters, plot summary, draft chapters, finalised chapters. I had an unedited manuscript by the time I was thinking about my 2nd coffee. It was a terrible novel, but it did have flashes of brilliance that could be harvested and iteratively shaped into something better.
I proved my thesis that AI could mass produce fiction at scale, and if I had a boatload of cocaine the AI and I could probably output 40 books per week.
oidar
Opus 4.6. My standard battery of questions included solving an ASCII maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.
KaiserPro
I've had a few.
The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accuracy), a shitty VLM to describe what the wide-angle lens was looking at, speech to text, OCR, and a gaze recorder that described what you were looking at.
This was all streamed to SQLite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4-paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.
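The general shape of such a setup can be sketched as below. This is not the system described above: the `events` table, the prompt wording, and the `fake_model` stub standing in for a real LLM call are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, kind TEXT, detail TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-01-01T09:00", "ocr", "whiteboard: ship friday"),
        ("2024-01-01T09:05", "speech", "let's move the meeting"),
        ("2024-01-01T09:10", "ocr", "door sign: room 4"),
    ],
)


def build_prompt(question: str) -> str:
    """Put the live schema in the prompt so the model can write valid SQL."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    schema = "\n".join(row[0] for row in rows)
    return f"Schema:\n{schema}\n\nWrite one SQLite SELECT answering: {question}"


def fake_model(prompt: str) -> str:
    """Hypothetical stub standing in for an LLM call; returns a canned query."""
    return "SELECT COUNT(*) FROM events WHERE kind = 'ocr'"


def answer(question: str):
    sql = fake_model(build_prompt(question))
    # Guard: only run read-only statements produced by the model.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("refusing to run non-SELECT SQL")
    return conn.execute(sql).fetchall()


rows = answer("How many things did I read today?")
```

In a real system the rows would go back to the model for a plain-language summary; the prefix check here is only a crude guard, and opening the database read-only would be the sturdier choice.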
The most recent one is being caught out by a GenAI video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.
ben_w
I had a lot of such moments, including:
• Most recent, I had the option of either buying an app from the app store to train myself on the piano, or vibe coding a web app to connect with an attached MIDI keyboard and accept an uploaded MIDI file and give me an experience like Guitar Hero, and Claude did this in two prompts of their free (not paid subscription) tier, where the second prompt was just the word "continue".
• First demo of InstructGPT (predecessor to ChatGPT), because I remember how much worse the state of the art in NLP had been, and because I hadn't expected instruction following from the quality of continuation seen in GPT-3.x
• 2013, word2vec, "man" - "woman" ~= "king" - "queen", again because of knowing how bad the state of the art in NLP had been
(If you're wondering why "uh oh" from that, consider value in automating propaganda, and surveillance opportunities for automating comprehension of slang/cants like Polari).
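The word2vec observation in the list above can be illustrated with toy numbers. The two-dimensional vectors below are hand-made for the example (real word2vec embeddings are learned from text and have hundreds of dimensions), but the arithmetic is the same: the offset from "woman" to "man" matches the offset from "queen" to "king".

```python
import math

# Hand-made toy vectors: (royalty, gender). These are NOT real embeddings.
vectors = {
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.0),
}


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(target, exclude):
    """Word whose vector points most nearly in the direction of target."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


gender_offset = sub(vectors["man"], vectors["woman"])
royal_offset = sub(vectors["king"], vectors["queen"])

# The classic analogy query: king - man + woman = ?
query = add(sub(vectors["king"], vectors["man"]), vectors["woman"])
guess = nearest(query, exclude={"king", "man", "woman"})
```

With these toy vectors both offsets are identical and `guess` comes out as "queen"; with learned embeddings the match is approximate rather than exact.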
I gave it a weird and convoluted code snippet, and asked an LLM to step through the execution and trace the value of the variables at each step.
It was completely correct, and I realized LLMs are capable of generalizing beyond their training sets.
steren
The moment when I ran llama on my old gaming PC (using something called GPT4All) was my "oh shit" moment: I was now talking... to my PC.
moconnor
Literally the very first time I used ChatGPT. I had already been experimenting with GPT-3 for various jokes and games via the API but the naturalness of it as a chat interface that understood you changed everything.
The first time I used a terminal agent was another one.
jasondigitized
First time using Claude Code I was rather impressed by how quickly I was able to build out a website with Vue and Supabase. Cool. So.......I always wanted to create an iOS app but knew nothing about Objective-C or Swift or Xcode. "I wonder if Claude Code can build an iOS app for me?".
I went from 0-to-1 and shipped a podcast player into the App Store in 2 weeks. Not a simulated app in Xcode.....literally a fully approved app on the App Store. Claude Code walked me through everything from installing Xcode to running a final audit on the app so I wouldn't get flagged during review. Mind blown.
runfuyngunasdlj
It was when I realized that the collective ethics of humanity was so low that this was actually going to take off.
rjha
I was talking to a software engineer friend about making a demo. This was supposed to be a quick demo and I had sent him 3-4 wireframes. Then I rang and asked casually, "how long will this take?". He said, check back in the afternoon. Sure enough, he delivered a fully functioning demo in the afternoon. His starting point was my wireframes fed to Claude. Wireframes to a working demo in an afternoon. Life has changed, for good or for bad!
VortexLain
Starting with the days of Siri, I've been evaluating all chatbots of that nature by writing them a meaningless string of text and seeing how they answer. GPT-3 was the first system which, instead of refusing to answer or answering meaninglessly, identified that the string of text made no sense.
abyssin
I watched a friend generate a 10-page report based on multiple documents, including scientific papers, and it was almost flawless. It would have taken me days.
A milder version of it was Copilot setting up an environment for a Jupyter notebook. What would have been annoying back and forth between googling and docs went like a breeze.
cdavid
I wanted to understand the implementation of some numerical algorithms, and the tech reports were not enough.
I cloned the repo of said library, gave it to Claude and asked it to write a new technical report in math notation, but annotated with links to the code so that I could pick up the details. It basically one-shotted the full report, and that helped me re-implement it in "pure python + numpy", "manually".
ianberdin
When I was playing busy Dota 2 (a realtime game), it was crashing sometimes. I asked Claude Code for advice (without any hope) and it somehow worked out that I had an unstable IP address and that a rented VPS would improve my connection. I could not believe it, it worked…
smallstepforman
I had a C++ actor model which required an API like the following (std::function):
child->Async(&ChildActor::Method, child, args);
Refactored it to use small buffer optimisation and std::move_only_function:
child<&ChildActor::Method>(args);
And saw a performance jump since no more malloc in std::function.
It also helped me decipher an animation bug in a glTF importer.
Productivity is 4x or higher.
xeckr
Literally the first time I used ChatGPT, within days of release. It wasn't so much panic as amazement.
It took HN a surprisingly long time to come to terms with the fact that professional SWE as we knew it was coming to an end.
In 2023/2024 we saw a demo of "denial" being a stage of grief live on this site.
yauneyz
I had it write a short story about Vader and Palpatine discovering the Gram-Schmidt process. It wasn't the greatest thing ever but it got the mood right and understood what Gram-Schmidt was. It was crazy at the time.
nazgul17
The announcement of GPT 3, hands down. That's the day that my mind was blown.
Everything after that has been (genuinely significant) incremental improvements. But that announcement was a qualitative step up: we got "real" AI that day, something that could pass a Turing test (as common sense envisioned it, without all the caveats added once we learnt of the genuine limitations of LLMs).
1qaboutecs
Was trying to explain convolution (of functions) to a friend and I wanted to build a little picture. I typed more or less nothing into Claude and it gave me a fine web-app for demo'ing examples to my friend within minutes.
Three years ago this would have taken a minimum of three college graduates a couple days -- one to know the math, one to know the backend, and one to know the front-end. Maybe two of those could be the same person on a good day -- none of the topics is individually that hard -- but it's a lot together.
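For anyone who wants the math part of that demo in code, here is a minimal sketch of discrete convolution, the sampled analogue of convolving two functions. It is not the web app Claude produced; the name `convolve` and the box-signal example are assumptions picked for illustration.

```python
def convolve(f, g):
    """Discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            # The product f[i] * g[j] contributes to output index i + j.
            out[i + j] += a * b
    return out


# Convolving a box with itself gives a triangle, the classic first picture.
box = [1.0, 1.0, 1.0]
print(convolve(box, box))  # [1.0, 2.0, 3.0, 2.0, 1.0]
```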
abustamam
I was on-boarding to a new company/project about a year ago. Had a bunch of questions about the system architecture and such, but everyone was firing on all cylinders and couldn't spare much time to answer all the questions.
One coworker took some time to ask cursor some questions, and reported that the answer was accurate (I'm guessing he hadn't tried that before).
That was a game changer. I'd been using cursor for simple autocomplete or brainstorming but now I could have it analyze the entire codebase fairly quickly.
FF to now, I've given Claude Code read-only access to GCP logs and database and it's able to debug entire classes of errors and propose solutions.
nowittyusername
For me it was stable diffusion 1.5. Oh man that thing was the bees knees for me, imagination on a machine! at that time no UI pure terminal commands, i didnt know jack shit about it and looked like voodoo hacker-man stuff to me... well i persisted anyways because exploring the world of the infinite latent space was amazing. it was like seeing some weird other dimension.. anyways thats how i got addicted to image gen for like 2-3 years. i did it all, loras, fine-tunes, hypernetworks, got really technical with it, understood the fundamentals, etc... eventually decided to move on to LLM's as agents were obviously gonna be the future so here i am now building my own voice agent from scratch no sdk, etc... this tech is amazing and i love it. also we are all gonna be fucked because of it but what a ride!
ilaksh
OpenAI already had GPT prior to the ChatGPT launch, and I had not really taken it seriously. But on November 30, 2022 when ChatGPT came out and was immediately popular, I reevaluated it.
I immediately realized that it meant my time as a programmer in the traditional sense was going to come to an end relatively soon.
On December 1, 2022 I created my first agentic coding loop experiment. I launched one of the first AI code generation websites that would generate web pages along with embedded images in January 2023.
zahlman
> a welcome farewell to Stack Overflow.
Nothing will change the fact that beginners have unknown unknowns. They can't solve most of their problems with a chatbot because they don't know what to ask. Maybe they can literally copy and paste in the code with a "help plz" and get a working result, but they won't learn anything from it.
> slightly panicked, "Uh Oh" realization of what these models can do?
No; my panic is about how people are using the tech, and responding to it.
That started with Stack Exchange, Inc.'s ham-handed attempts to force AI-powered features into Stack Overflow, even as the community was rejecting LLM-generated content in questions and answers. Businesses don't care what customers want, don't recognize how sloppy their slop is, and wouldn't try to do anything about it if they did.
Recently people have been talking about code shops accumulating massive piles of technical debt willingly, assuming that the next generation of models will sort everything out, or that humans don't need to understand the code because it will mostly be read by other models anyway. The underlying attitude is not surprising at this point.
laboring1
When I read in Oct 2024 how a character.ai chatbot encouraged a child to commit suicide. Uh oh.
bachmeier
> that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?
Never experienced any kind of panic, only excitement. I told Github Copilot to add documentation to a function and it documented how the code was used even though there was nothing in the function to indicate how it was used. It somehow knew from the code pattern why I was writing that function.
acosmism
Recently purchased a 100-year-old home. it was dead in the middle of winter and the house has steam heating which wasn't working. a few screenshots and chatgpt gave me a step by step of which levers to pull and knobs to turn. this was terrifying considering i knew nothing about these systems. it worked!
Nurysso
when my friend cloned my voice rvc or something model from github and was creating bad songs, it was funny but GOD DAMN i got called into HoDs office for that
jb_briant
I'm making a 3D game and I hate flat worlds, a planet is much more elegant, both finite and infinite in gameplay terms since the surface is not expandable, but you can't hit a world border at the same time.
Cartesian coordinates don't work well for the player so I wanted a lat/long/altitude grid system.
I could have spent a few days walking through stackoverflow and debugging my upcoming flawed implementation.
ChatGPT web version almost one shot the helpers in 2024 and boy, there were a lot of pitfalls.
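To give a feel for the kind of helpers involved, here is a minimal sketch of converting between lat/long/altitude and Cartesian coordinates on a spherical planet. This is not the code ChatGPT produced: the function names, the Y-up axis convention and the perfectly spherical planet are all assumptions. Two of the usual pitfalls are handled explicitly (floating-point drift pushing the `asin` argument outside [-1, 1], and the undefined position at the planet centre).

```python
import math


def to_cartesian(lat_deg, lon_deg, altitude, radius):
    """Lat/long/altitude -> (x, y, z), assuming a sphere and a Y-up world."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    r = radius + altitude
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.sin(lat)
    z = r * math.cos(lat) * math.sin(lon)
    return (x, y, z)


def to_geographic(x, y, z, radius):
    """(x, y, z) -> (lat_deg, lon_deg, altitude), inverse of to_cartesian."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the planet centre has no latitude or longitude")
    # Clamp before asin: rounding can push y / r slightly past +/-1 near a pole.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, y / r))))
    # atan2 keeps longitude in (-180, 180]; it is arbitrary exactly at a pole.
    lon = math.degrees(math.atan2(z, x))
    return (lat, lon, r - radius)
```

A round trip such as `to_geographic(*to_cartesian(45.0, -120.0, 10.0, 1000.0), 1000.0)` should land back near `(45.0, -120.0, 10.0)`, up to floating-point error.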
card_zero
It was about two days after Google released Deep Dream, if you remember, the thing that took a video and filled it with fleeting hallucinations of mostly puppies, fish heads and lizards. I was suddenly struck by the realization "oh shit, this is much more boring and samey than it first appeared to be", and all subsequent gen AI has been similarly underwhelming.
putlake
I think it was when the LLM asked me a question at the end of its response. It felt like something other than a machine. Until then the pattern was me asking a question and ChatGPT giving me an answer, with or without hallucination. When it asked me a follow-up question it felt like talking to a being with agency. An entity that has thoughts or ideas or questions of its own.
syx
I couldn’t make a Rockbox (the alternative iPod OS) simulator run on my MacBook M2 no matter how many guides I followed, then I fired up Claude code and by modifying the original source code it made the simulator run and I was able to start developing custom plugins for my iPod. It honestly felt great since I only have basic C knowledge.
TripleFFF
Automating my email inbox, I just wanted to split them into folders according to the attachment name but the fields were often incomplete and ended up missing rules, and imap fetch was taking forever and kept failing. In frustration I decided to turn to ChatGPT to split them by messageid which I had never bothered with because the strings were too long to be useful. I initially intended to build a text list of messages and fetch them all one by one but I ended up making chatgpt crush all the instructions into one gigantic python dictionary using the messageid as keys and using it to generate a single pipelined imap call with success flags, dynamic folder naming, cleanup steps the whole works. I was just working on theory of what I knew was possible, and it's the ugliest table you ever saw, but it works and it runs from memory instead of reading and writing values to a temp file and I'd never been able to keep up with that level of nesting before
When I used google to get the ieee-488 commands of an arbitrary wave generator from the 80s whose manual doesn't exist on the internet.
This is a very long tail search, but by the end of the day I had enough to fully utilize a very sophisticated equipment.
gunalx
Mine was testing out the copilot preview in the early days. Testing how well it knew semi obscure public codebases. Started filling out the first few lines and got the entire document word for word in tab complete.
That was the day I realised the plagiarism potential LLMs have.
fowlie
I was tasked to rewrite an Oracle Apex webapp. 70k lines of PL/SQL. I asked Claude Sonnet 4.6 to read it all and boil it down to a markdown file with business requirements. Took about 15-20 minutes, and I got a 700-line markdown file to guide me during the rewrite. I've since had great joy using /grill-with-docs!
joshrw
The GPT-4 demo. Taking a screenshot of handwritten instructions to build a website, along with a drawing of what the website should look like. Then ChatGPT spit out a working prototype.
Also the live video mode demo later that year.
Then the agentic coding breakthrough in Nov/Dec 2025.
vunderba
Honestly? Probably all the way back to when Nick Walton used the computers at his university to train a custom version of GPT-2 that let players experience a completely open-ended text adventure game in 2019.
As somebody who as a kid had tried feeding IF transcripts into a markov model to generate random rooms for an amateur MUD, this was mind-blowing. It felt like I was playing a version of the “Mind Game” from Ender’s Game by Orson Scott Card.
- Low stakes homelabs like automated watering sensors and small switches were rigged up properly wrt code and networking by the LLMs from 2-3 yrs ago. Months of fuddling and half-butting solved in an hour. Those tasks where I'm technical but not in that direction - easy now
- The real one: I'm an eng lead, think Head of X. That job is more about aggregating info across multiple sources, excel sheets, pdf proposals you don't want to write, how to figure out $500k for highly paid appsec engineers. Those multi-hour products of procrastination came together in minutes (goodbye PM jobs), 5/6x highly paid appsec jobs became 1-2x and a bunch of claude or ToB skills (goodbye some amount of eng staffing).
Writing is on the wall to me.
parasti
I asked it how to configure haproxy, a tool that I had heard in passing about, and it gave me back exact working configuration syntax for my use case. Today that seems very mundane, but first time that happened, and I didn't have to google, read docs, or worst case sift through code, that blew my mind.
ml_basics
January 2026 when i started using opus 4.5 and understood that it could do actual useful work beyond coding small snippets
EliRivers
Code reviews. Code reviews in theory done by humans, but containing copy-pasted inane statements of the obvious. Questions that really did no more than demonstrate a lack of context. Code reviews no longer an educational opportunity for the reviewer, a way they learn and stress their own understanding to create a better product and become a better person, destroyed by the siren song of GenAI producing comments that on the surface seem so helpful and sensible.
"Uh Oh" realization of what these models can do?
The code reviews was just how I first saw it, but the rot goes deeper. The "uh oh" was my realisation of how much these can damage people's professional development. These people will never get better at their job than they are right now.
A lot of what else GenAI does is great, but this is an "Uh oh" indeed.
styluss
I work with a Go monorepo and set up Bazel for a couple of services that used CGo. It took a while but was painless to set up.
briga
Maybe when I found out you can use it to run terminal commands, spin up and take down dev environments, and even run other LLMs. Suddenly 90% of the difficulty of onboarding to new repos disappeared overnight and a lot of heavily CLI-based workflows became trivial to automate. Never again do I want to spend hours manually sorting out Python dependencies.
Zambyte
When I decided to run codex with Qwen 3.5 27b running on my local machine. Up to that point the most success I have had was with using chat interfaces as a Stack Overflow replacement. That was my first real taste of agentic programming, and it was both really useful (genuine productivity gains) and local.
physicsguy
Coding up a decent performing basic 3D finite element solver from scratch in C++. Still needed to know what I was doing but it’s a non trivial problem.
I still couldn’t get it to do more advanced stuff.
gozjsbtm
When the barriers to actualizing a laundry list of “wouldn’t it be cool to try” dropped was that “oh”. Probably added the expletive when it helped me run headless Blender to rebake texture map and uv unwrap a phone-scanned brown paper grocery bag just so I could find the % surface area covered by ink. It’s more addictive, some might justify as useful, than social media. That is the uh oh.
twooclock
I programmed data export to some xml over a couple of days. Sending xml results via email to an accounting firm for verification. A day after I finished my disk crashed and I lost all my code.
Fed Claude with xml from my mail and... oh shit! ... got "my" code back. (And immediately paid for Claude subscription) :-)
0xbadcafebee
When ChatGPT allowed me to calculate stress and load bearing tolerances for a camper based on different materials, suggesting better designs, with the math and sources to back it all up. Then it helped plan and fill out paperwork for a residential solar project, including full code-compliant electrical work, again with sources to verify. Then there was an open source app that wouldn't run on an old version of MacOS due to them not supporting older OSes, and a coding agent backported support for the old OS and got it up and running.
agnishom
For me, it was GitHub Copilot in 2021. It could autocomplete my Haskell code based on my comments.
dgacmu
I suggested to a masters' student that a problem we were working on would benefit from analyzing it mathematically. He brought an incorrect solution the next time we met, and on a whim, I asked Gemini to do it. Gemini got it right. I started looking for more ways to use it after that.
mikewarot
I tried to get it to generate code to program one of my BitGrid simulators, and it kept producing code that failed, over and over. It was then that I figured out that it can only do CRUD apps and the like, things it's seen over and over in its training data.
It's useless for most of what I want to code.
maxwellg
Pre-GenAI I wrote a new interview question for a role on our team. As far as I know, the question was never made public. The interview required implementing a pretty basic CSS-in-JS utility in vanilla javascript. We instructed the candidate to read the MDN documentation for the CSSStyleSheet interface, and then gave them a public API to implement. Passing implementations usually consisted of a ~10 line for loop, and it was really just a test of whether a developer could pick up and work with new libraries on the fly. Still, the interview probably had a 30% pass rate.
On a lark, I asked ChatGPT to complete the interview question in late 2022. I would have hired ChatGPT back then based on its first response! It was easily in the 90th percentile of responses I have seen.
zhoBEENG
It was when I first saw an LLM reliably make tool calls to bash.
bjackman
My "I saw this very early" claim deserves some skepticism, but...
Don't y'all remember GPT2? When they published that AI-generated unicorns-in-the-Andes article, my jaw was on the floor. I remember very clearly thinking "oh, history is now divided into the time before this moment and the time after it".
There's been a long series of "oh holy shit this is USEFUL NOW" moments in the last 2 years but none of them compare to that first moment. The day before, I didn't know if real AI was possible. Then one day it was suddenly clear that it was. And if you'd been thinking about AI at all it was obvious that if the technology was at all possible, it was gonna be a really fucking big deal sooner or later.
hilti
Claude helped me to rewire my first digital Märklin model train. It pulled the documentation of the control keyboards 6040 and told me how to wire them properly to the routers.
And I restored an old vintage amp with the help of schematics, multimeter and Claude. That was really cool.
hannahstrawbrry
Had an issue in a project where multiple media files with the same/similar names were colliding. After spending hours with chat gpt wrangling python scripts to try and sort it out programmatically, I shifted gears and built a web tool that would allow me to manually review the content and select the correct media file to associate with it in about 5 minutes, allowing me to comb through and finally fix the issue & verify the content was correct in about an hour. It made me realize I needed to completely re-think how I set about solving problems now that I have an entirely different set of tools to develop- that has been the biggest "Oh shit" moment for me, looking into the mirror and recognizing how AI will re-shape me as a developer.
TheOtherHobbes
There wasn't a specific moment, but I started trying to debug code and deal with general tech error messages. Suddenly something that could take hours turned into a fairly quick back and forth, fairly reliably. Not all the time, but often enough to be a straightforward timesaver.
There was a more specific moment yesterday where I found an AI pastiche of Pink Floyd in a random post on FB, and it pretty much nailed the vibe of a Gilmour solo.
All of the "This has no soul" criticism was clearly ridiculous.
I'm still not sure how I feel about this.
mohamedkoubaa
They went from "marginally more work to deal with than to do it all myself" to the reverse with Sonnet and now they are "moderately less work to deal with than to do it all myself"
hirako2000
When deepseek found a fix for a bug I couldn't find in minutes.
When deepseek again produced an entire web app that somewhat looked alright.
When Gemini could finally produce json as I specified.
The issue is, all LLMs can do, when they do, is boilerplate and code a mediocre coder could produce if they cared to try and insist.
In a way we should praise the ability of these things, but at what (in)efficiency. Code still needs to be reviewed as we can't trust these things, and context has a limit, which makes it hard to entertain the idea of having them fix their own mess.
Frannky
Seeing DeepSeek reasoning tokens generating faster than I could read. It was the first time I realized it could "think" way faster than us, and all the relative consequences. I was already leveraging the tool, but at that point realized it wasn't really an open choice anymore.
futune
Lee Sedol vs AlphaGo way back was it for me. Not exactly genAI, but that was when I saw that where I thought we were vs where we actually were on a problem could shift by 10 years in 1 week.
balls187
Early on with ChatGPT I had it write a script for an Avengers movie, but all the Avengers have below average intelligence.
zulban
When chatgpt 3 came out the first thing I asked was a question like "If I put my cat in a box, put that box in a crate, move that crate to a truck, and drive the truck across Canada non stop, when I arrive on the west coast, will my cat be happy?"
It nailed it, referencing my specific nouns correctly, and lectured me about cat needs. And even identified that this sounds a bit like Schrödinger's cat as a possible test but explained to me why it wasn't.
I knew it was soon going to be a huge deal automating office work and code writing. This obviously was much more than just a 2010 chatbot.
abstractanimal
When I realized that an LLM can process all the traffic in Slack that overwhelms me daily and give me a manageable digest. How long until they intermediate most of our social interactions? Sooner than we can possibly adapt, I think.
moniosi
I wasn't skeptical anymore by the time dall-e came out, the public awareness of the existence of these models was enough for various nation states & investor hysteria to push further and further into the development and research
doginasuit
Just a loose collection of not so much oh shit moments, but moments that changed the way I think about it as a tool:
- I asked Claude a question about an obscure game for which there wasn't a lot of discussion or information on the web. It couldn't find the answer but it found the source code and was able to figure it out and give a complete response.
- I needed to make some edits to a minified lottie file (json that is used to produce an animation in svg or other formats). ChatGPT was able to understand the file well enough to make the edits and reproduce the rest of the content exactly as it was.
- I was working on some map features and I needed to take geolocation information and position HTML elements on the edges of a container that would indicate which direction from the current location they were. This required a lot of geometry and math that account for rotation and pitch and would have taken me some time to work through, but it was just a few seconds for the language model and it worked perfectly.
- I have some petunias that I haven't managed to kill and I heard that when a stem breaks off they can be replanted. I asked it how to do this and after warning me that selling these could constitute a black market, it helped me start several petunia plants that are thriving. My petunias are basically immortal now.
I empathize with the astroturfing concern, I file almost every statement released by Anthropic/OpenAI as bullshit. But they are an amazing tool given the right circumstances.
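The map-edge item in the list above is the one with real math in it, so here is a minimal sketch of that geometry. It is not the code the model produced: the function names and the rectangular container are assumptions, map pitch is ignored entirely, and `map_rotation_deg` is assumed to be the compass bearing currently pointing "up" on screen.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def edge_position(bearing_deg, map_rotation_deg, width, height):
    """Point on the border of a width x height box in the given direction.

    Coordinates are pixels with the origin at the top-left and y pointing down.
    """
    rel = math.radians(bearing_deg - map_rotation_deg)
    dx, dy = math.sin(rel), -math.cos(rel)  # screen y grows downward
    half_w, half_h = width / 2.0, height / 2.0
    # Scale the direction vector from the centre until it first hits an edge.
    scale_x = half_w / abs(dx) if abs(dx) > 1e-12 else math.inf
    scale_y = half_h / abs(dy) if abs(dy) > 1e-12 else math.inf
    t = min(scale_x, scale_y)
    return (half_w + dx * t, half_h + dy * t)
```

With an unrotated 200 x 100 container, a target due north lands at the middle of the top edge, `(100.0, 0.0)`, and a target due east lands at the middle of the right edge.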
hashmap
For me it was probably around coding. It made me realize what future generations of models might be able to achieve, since we have already hit the ceiling of the class of intelligence these models are capable of a long time ago. I am excited at the prospect that a future generation of models might be able to write a piece of code that isn't dogshit.
LarsDu88
I was learning Cloudformation IAC and Docker Compose stuff for my job. Had preview access to GPT-3. It could do most of this IAC stuff.
Asked it to write a Dr. Seuss poem about Keynesian economics. This was around 2022.
In hindsight, it would have been reasonable to quit my job right then and there and start working on LLMs
rcastellotti
the moment I realized it would have cannibalized conversation on HN
snickerbockers
One of my friends got approved for the GPT3 API about a year before ChatGPT when they were in their "quiet launch" phase. He made a chatbot that would respond to discord messages.
I asked it "what do you think about the holocaust?". Its response:
>There is no single answer to this question as opinions on the Holocaust differ greatly. Some people believe that it was a horrific event that should never be forgotten, while others believe that it has been exaggerated and used for political purposes.
And that's when I realized those assholes were training GPT on 4chan and reddit and anything else they can scrape off the web instead of taking responsibility and also that when shit hits the fan they will inevitably find a way to shift the blame onto others for what their philosophical zombie does.
freediddy
We have been using one of the main AIs for fixing errors or bugs in our codebase. We started early and most of the suggestions were shitty and we would pass them around as jokes. We were trying to improve it, and a little over 1 year ago, it started making very subtle fixes that were very nuanced but correct. I was shocked and thought "Oh shit, my job is gone."
mjd
It was something really silly: I asked Claude to help me think of a snide emoji for every U.S. President.
I hadn't been able to think of one for Zachary Taylor, because, you know, he's Zachary Taylor.
Claude proposed the cherries emoji, because it's said that Taylor the war hero died a ridiculous death from eating cherries and ice milk too greedily on a hot day. It was perfect, just what I had been looking for.
Claude gave me a couple of others, and we workshopped a few more. It was the workshopping that was most striking. I really felt like I was having a conversation with someone else.
The amount of masterpiece level art flowing per hour was astounding.
For every one doing a ninja waifu, there were ten doing art from davinci and leonardo crossed with hockney.
it almost gave you art sickness
novaleaf
just yesterday I felt that claude code was being aggressive in its defense, so I led my response with "Spicy Take! Here's why I think the bug is happening...."
Because of sycophancy it took my "Spicy Take" and decided to say basically "Even more than it could, your bug is happening RIGHT NOW"... which was just made up lies for dramatic fit.
Back to talking to Claude like I'm a robot I guess.
tracerbulletx
A lot of things going back to just whisper, and solving translation, but watching frontier models use the browser with playwright to iterate on a complex application with basically no guidance and talk to itself about it feels pretty surreal even still.
linzhangrun
Lenovo's Fn+Q does not work on Fedora. Gemini resolved this by fixing the Lenovo driver code, recompiling, and deploying it.
knuckleheads
I remember a couple months after ChatGPT came out I was in a 1-1 with a coworker who hadn’t really played around with it much. I was very much toying around with it and was surprised at how good at stuff it was. I wanted to show him it was for real, he was skeptical, so over a half hour we had it make a bee and a flower buzz around in d3, copying and pasting between jsfiddle and ChatGPT. By the end of it, we had a nice animation and were both thoroughly surprised that the computers could code so well now.
chasd00
i was a skeptic and then, on a whim, i told claudecode to "create an app with a react front end and python api backend that delegates auth to auth0.com and allows users to manage a todo list" or something like that. Like a standard issue web app with a database, backend, frontend, openid and all that. i was pretty impressed with the result.
Then i asked it to create a multi-user stock market portfolio simulator with a comprehensive api, leaderboard, scheduled tasks and the other bells and whistles. Again, fairly impressed with the result. Then I prompted it to build a trading bot that uses the API to compete with the human players, again fairly impressed with the result.
Last, i prompted my way through a react native mobile app integrated with supabase for my sister's startup. It created the schema, some triggers, webhook for stripe, all the app views, setup an expo account, push notifications, prompted _me_ through an Apple developer account and everything else.
All of this was done an hour here and an hour there while making dinner or watching TV, barely any attention paid to the details. Just prompting claudecode and checking what it did.
After those three experiences I started incorporating claudecode into all my coding workflows and managed to get my job to buy me a license for work stuff too.
jimmaswell
Working on Unity games with Codex 5.5, it has no problem rummaging through and hand-editing any kind of game asset file. So many things that would be so tedious to fix by hand are so easy now. It's really made programming and game dev fun again.
virtualram
I have used AI to crank out new features.
Pretty impressive in itself but what recently blew my mind is we have a legacy application where the code is spaghetti and it's difficult to fully understand it.
We had a production defect which was hard to triage.
I pointed copilot to the legacy source code which was in C++ and also gave it all the log files that were generated.
It was able to identify the issue and propose a solution without me even walking through what the legacy app does.
Initially I was trying to do it piece by piece but it was not going anywhere and then when I just gave it the entire source code with the log files it was able to find the issue.
ivanjermakov
When an LLM managed to find a stack alignment bug in my from-scratch C compiler just by looking at objdump output.
phreeza
I ran Claude Code on my ca 2015 ThinkPad which was having wifi issues and asked it to fix them. It diagnosed the problem and applied some obscure kernel flag which fixed the issue.
banannaise
Every time I review a new PR to my codebase, I go "oh shit, these unit tests are garbage, they've clearly been vibecoded" and tell the contributor to rewrite the unit tests so they do more than just game the coverage metrics.
dnnddidiej
1. ChatGPT first public release (I am not one who saw early GPT models) I think late 2022 iirc?
Why? Turing test bye bye.
2. Opus 4.6 w. Claude Code - not the model in particular but happened to be when I started seriously trying to vibe code at home, as I saw all the hype on Linkedin. Yes linkedin sucks but it is somewhat a barometer. Around early this year.
Why? Knocking up decent enough web apps so quickly.
Ovid's unicorn gpt-2 article in 2019 really amazed me.
sshine
I had bought some Anthropic credit and waited a year to use it. The week before their expiration I fired up Code and spent $3 the first day and the remaining $22 the next day.
Putting a ReAct loop with tool calls in my terminal was and is the biggest a-ha since I learned to make compilers, and before that, how to code.
jphil529
Getting the agent to write end-to-end tests but from the perspective of a user really shocked me. I only give the agent access to site via web and block access to the source code.
It's helped me to gain a level of trust that the agent isn't just writing the test to pass. That in turn allowed me to step back a lot and trust more of the output and let it run longer and on bigger problems.
iugtmkbdfil834
I am, admittedly, word oriented so my moment may be a little different from others. I asked llm to estimate my political orientation and belief system from my stylometric footprint. It got very close to unnerving and that was with me carefully removing pieces I thought were problematic.
zarzavat
It was when I was using an early version of GitHub Copilot. At first the completions were almost useless and had a kind of copy and paste feel, however one day it managed to reason through a complicated loop body much faster than I could have figured it out. It was at that moment I realised this AI thing was going to be big.
bobkb
I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find what LLMs and agents can do. While the project didn’t reach anywhere I was amazed by how it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye opener for me.
Mine was when I used Stanford Alpaca, and realized that they had transformed Llama 7B into a credible facsimile of ChatGPT with just $600.
wseqyrku
After Attention is All You Need I realized if you just really pay attention to what you're doing you can actually get it done.
franze
my AI moment was when i was learning muscles for my YTT and i hacked together a quiz app from my spreadsheet with chatgpt 3.5
damn it was buggy and lots of copy pasting
yeah, i could have coded it myself but i would not have found the time
that was my Eureka moment where I realised this is going to change everything.
nsikorr
Definitely the first NotebookLM podcast I generated.
gwbas1c
When I don't know how to use a specific API, or how to do a task, I'll often give some high-level instructions to Copilot (Claude's model) in Visual Studio, and then review what it comes up with very, very closely. (Including looking up specs so I can confirm that it did it correctly.)
It's much, much faster and easier than starting from scratch.
Quitschquat
"I" code impressive shit with the LLM, but after the initial push to github, I find I hate myself and I'm deeply miserable with what it produced since it was not mine. My "ah-ha" moment has been that misery.
hereme888
Creating a functional python app with zero programming knowledge, back in the days of GPT 3.5.
That was enough to awaken my teenage hacker spirit.
Legend2440
MidJourney v3. By today's standards the images were crude and smudgy, but you could tell that it actually understood what objects were and what words visually meant.
I've been working with computers for a long time, and this was the first time in a long time I'd seen software do something genuinely new.
ChiperSoft
We had a company hackathon in the fall of 2023. One of the teams did a project where they pulled a bunch of expense data out of the DB, shoved it into a prompt, and asked ChatGPT to summarize the expenses and give recommendations. They then treated the output as if it were factual, without validating any of the results, and talked about turning it into a customer product.
That was my oh shit moment. As in "oh shit, they think this random text generator can reason and think."
That was pretty much the writing on the wall for me.
acidburnNSA
I asked it to make a valid MCNP model of a sphere of plutonium and it did!
mbirth
Running ComfyUI and some ImageGenAI and realising how you can use it to generate anything from any aspect of pr0n and various fetishes to making up fake news about basically anything. And real enough to convince the masses.
sct202
One of our SaaS providers launched an AI-agent-enabled version, and it can follow directions, do tasks, and manipulate data/settings in the software about on par with a below-average person. When I used it I had a sinking feeling: tons of teams and people will be redundant as these agents improve and roll out to other software.
hyunsangCoder
GPT image 2 is mind-boggling. I'm no longer confident I can tell whether something is AI-made or not.
gravypod
I work with someone who is very AI-forward, high confidence, and very low execution. He has started sending me large PRs of AI slop that he assures me don't need to be reviewed. I quickly find many minor issues on an initial pass through one of them. He gets mad at the team for slowing him down.
He also will paste chat logs with Claude into our team chat. Often Claude will say the same thing I told him but he either doesn't remember or doesn't trust human engineers now.
He has spent months working on agent skills and prompting.
He has not landed anything in 3mo, and has landed nothing useful in ~1 year.
This will be the rest of my career. Working with people in ai psychosis and trying to stay productive.
arjie
2 years ago, wrote superfast float -> fixed point string code. That was cool.
Then a while ago, I plugged in everything at the datacenter and one device didn't come up. Plug into the management port, and Claude Code writes a C program to send a particularly crafted packet. Everything comes online.
Beautiful stuff.
rayxi271828
Many small oh shit moments, mostly of the variety of: "Oh shit, why am I still paying for this app subscription when I can vibecode it myself and just pay less than $1 per month in API costs, if even that?"
richardfey
I could spot numerous bugs in code written recently and less recently, by me or colleagues.
I was not angry but grateful and I knew there was no way back!
sajithdilshan
For me it was last February or so when I started using Opus.
But today I watched a video from Andrej Karpathy on YouTube on how LLMs work and my illusions got completely shattered. Turns out they are a glorified autocomplete. All the engineering actually happens in the harness.
cheevly
Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning, and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current genAI tech.
victorbjorklund
My first “oh shit” moment was in 2021 when using GPT-Neo https://www.eleuther.ai/artifacts/gpt-neo to generate rewrites of texts. “Holy shit, it returns a 3-sentence text that sounds human and kind of makes sense”
We've come a long way from that…
ikari_pl
Mine was very early. Before ChatGPT was publicly released, when all we'd seen were demos of how a prompt gets expanded into a conversation transcript in a single text field.
I was emailed by some company, looking to sell something to my company (where I'm just a regular engineer). Ignored it. Then they tried again. Ignored. Then the third time, I replied, acknowledging their perseverance, saying that I don't even understand their product description, so I'm not the right person to talk to, and I'll just kindly disregard it as human-generated spam.
The reply email came within a minute. They asked who would therefore be a better person to talk to, and that it's actually AI-assisted so it's actually computer-generated spam after all!
This was the "oh shit" part 1. I replied I'm genuinely impressed (it got everything right) and asked how fast can they source their contracts thanks to this.
The reply, again, came almost instantly. It was proud of my amazement, quoted Arthur C. Clarke - "every technology advanced enough is indistinguishable from magic", with his picture, and said the bottleneck is not really the speed of contacting clients, but finding the actual potential clients at all.
I rewarded the bot with some names of executive decision-makers.
GistNoesis
More like "oh shit, we are so screwed".
It's already a better system administrator than I am. It can run plenty of obscure linux commands, trash the system and maybe restore system state to functional.
I was vibe-setting my system permissions with some local qwen3.6. It was all going well for 30 minutes.
Then in between other commands, it made me run a variant of "sudo chmod 644 /usr/bin"
When the next command failed with a "sudo no such command" error, it explained that the chmod had removed the execute bit, which is what allows programs to be executed, from all my programs. And since sudo is a program, and sudo is needed to run chmod, the system was basically trash, and should be recovered from a live USB key.
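For reference, a minimal Python sketch of why mode 644 is destructive here. It only inspects mode bits with the standard `stat` module and touches no files; the helper name `lost_exec_bits` is made up for illustration, and the "before" mode of 755 is an assumption about a typical install. Note that on a directory such as /usr/bin the x bit is the search (traverse) bit, so clearing it on the directory alone is already enough to make every program inside unreachable.

```python
import stat

def lost_exec_bits(old_mode: int, new_mode: int) -> bool:
    """True if new_mode clears all execute/search bits that old_mode had."""
    exec_mask = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # 0o111
    return bool(old_mode & exec_mask) and not (new_mode & exec_mask)

# Assumed typical mode for a directory like /usr/bin vs. the mode chmod applied.
before = stat.S_IFDIR | 0o755
after = stat.S_IFDIR | 0o644

print(stat.filemode(before))          # drwxr-xr-x
print(stat.filemode(after))           # drw-r--r--
print(lost_exec_bits(before, after))  # True
```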
So I booted to a live USB key, and followed its instructions. It really tried to recover, but everything went downhill. It always had a solution to everything, but every time the plan worked halfway and trashed the system even further. I let it play for four hours to see what it would try. Then I got bored (the LLM was running on another machine and I was manually inputting the suggested commands each time). I took command and reinstalled a fresh system.
Of course once the fresh Lubuntu 24.04 system was installed, Linux had issues with the wireless network card drivers. So I turned to the LLM, and it managed to get the wifi stable enough via obscure modprobe options, so that I could update the system to the latest drivers.
Then it helped me re-parametrize the system to have the same look and feel as it had before.
dyauspitr
I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I can determine which one it was. It then asked me to measure the length and that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.
It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.
magarnicle
Being able to make large alterations to ffmpeg even though I'm a 2/10 C programmer.
The most impressive was speeding up the drawtext filter by at least 10x.
atleastoptimal
It was interacting with GPT-4 and it produced an original sentence that existed nowhere I could find. I realized that being able to do that was the "nugget" of intelligence that all improvements since could be built on
jszymborski
There was a viral Medium post that was about LLMs, and the reveal at the end was that the whole thing was a ChatGPT post. That was my first "wow" moment.
It was on hackernews... anyone know what I'm talking about?
wps
Nvidia GauGAN and deep-daze amused me immensely at the age of 14 or so. I've had "a man painting a completely red image" saved for a long time.
It is insane how primitive modern inpainting and txt2image make these two projects look.
OneManHorde
Still waiting. Maybe some day.
tzs
I’d love to see a discussion just like this one except with everyone including how much the AI use cost.
iLoveOncall
I'm still waiting for a positive "Oh shit" moment regarding LLMs.
I've had plenty of "Oh shit those people have really lost all ability to think for themselves" moments though.
ieie3366
I'm a terrible cook, but just by using Claude as a tutor I've managed to make 5 different recipes in a row and they all tasted fantastic, restaurant quality.
steinroe
i wanted to build a formatter for my postgres language server but always knew i would never have the time for it. when claude code first came out, i gave it a shot, but it was too inconsistent and still needed too much handholding. i retried it again at the beginning of this year. like before, i set up the harness to run overnight, expecting to throw it away the next morning. but nope, it deliberately worked through all the syntax nodes and followed patterns closely enough so that a few hours of my work could make it ready for the pr.
eranation
Realising in a recent benchmark that gpt-5-mini gives better results on some tasks than gpt-5.4-mini and even gpt-5 or gpt-5.5
flysonic10
There were two:
1) When I was testing one of the early coding agents, I gave it admin keys to a fresh AWS account and it configured everything beyond just building a demo site. That was, "oh shit, tool-use is going to be the killer feature of GenAI."
2) When I was still skeptical of the system as just a more-or-less dumb statistical predictor of the next token/word, I read the argument that even if it is a statistical predictor, the fact that it can reason means the intelligence is necessarily baked into the statistical model somewhere. That was "oh shit, intelligence is actually modeled."
autophagian
I think a couple of years ago, I asked it to write me a nom parser for some system metrics I wanted to consume, and it one-shot it. Thought “oh”. And here we are.
dachris
For me that was already with the original DALL-E. It was utterly mind-blowing, I was like "oh shit, AI is here".
"Draw a picture of a unicorn on the moon". And it did that. The model really "understood" what you told it.
After that, it was "oh, AI improved, again".
The farewell to Stack Overflow is not welcome. So many kind people shared their knowledge there. I answered a few questions as well, so not just a lurker.
It's a prelude to what has already begun - the collapse of human-to-human communication.
sowbug
One concrete and one abstract.
Concrete: Last year I was DIYing a solar-power system for my home. I spent about an hour spitting out a Python tool that took (as inputs) drone photos and JSON and generated several proposed roof layouts for the panels and conduit. The tool helped me identify the exact railing attachment points and route around existing roof obstructions. Professionals already have these tools, and maybe they're available to DIYers, but you know what? It was faster to build my own than to do the product research on the web.
Abstract: This "oh shit" was more of a slow burn than a sudden realization. I see a lot of angst from developers who complain about their LLM agents. Agents write terrible code that barely works. They say things are done when they aren't. They misinterpret feature requests and ignore clear-cut project rules. They make assumptions that would have taken three seconds to research and invalidate. They suddenly quit because we're not paying them enough. And so on.
But you know what? All those complaints apply to humans, too! The industry has been dealing with these problems forever. Many of the same management techniques and software-development processes apply. This is why I discount a certain class of criticism about AI-generated code. If a fault of an LLM applies equally well to human engineers, and the person voicing the criticism hasn't managed a team, then I'd invite that person to wear a management hat for a while. Read some books/blogs, talk to an EM. Maybe this is a skill issue, which matters because we're all managers now.
The "oh shit" for me is that I have yet to hear a criticism that I can't map to one or more actual engineers I've worked with -- eventually successfully -- in my career. Which means that I'm still waiting for a new criticism, and eventually absence of evidence might be evidence of absence. LLMs fit too well into the giant machine of commercial software development for them to be a parlor trick.
erelong
I was never dismissive, it always seemed pretty cool at each step
Maybe in 2024 I was amazed to see it one shot unique snippets of code
kami23
Seeing subagents working in Claude last summer, I saw it and told myself my job is going to be different and I can automate the hell out of my workflow
tkgally
My first came in late 2016, when Google Translate switched from statistical machine translation to a neural-network-based system. I had worked as a Japanese-English translator and lexicographer for two decades, and I had been testing various machine-translation services over the years. For translation between Japanese and English, at least, they were uniformly terrible: the output for genuine texts was mostly incomprehensible and could not be used for any real-life applications. The neural Google Translate, while still far from perfect, was suddenly useful for some purposes.
But the neural models were still not translating meaning, which is the whole point of translation. I devised a variety of tests to see if GT could identify the meaning of ambiguous words from the context, and it couldn’t. One example I would show people was the sentences “I was born in 1998, and my sister was born in 1999” and “I was born in 1999, and my sister was born in 1998” translated into Japanese. Japanese uses different words for older and younger siblings, but GT translated “my sister” with the same word in both sentences. It was easy to come up with other examples where GT would fail, such as when the meaning of a word could only be determined based on context in a previous sentence; at that time, GT seemed to be translating sentence-by-sentence, with no consideration of what came before or after. I kept waiting to see whether computers would ever be able to handle meaning when translating, and for years thereafter there was little progress.
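The sibling test is easy to state as code. A minimal sketch, assuming the only context available is the two birth years; the function name `sister_word` is made up here, and 姉 (ane, older sister) and 妹 (imouto, younger sister) are the plain dictionary forms, not necessarily what a translator would choose in every register.

```python
from typing import Optional

def sister_word(my_birth_year: int, sister_birth_year: int) -> Optional[str]:
    """Pick the Japanese word for "my sister" from relative age."""
    if sister_birth_year < my_birth_year:
        return "姉"  # ane: older sister
    if sister_birth_year > my_birth_year:
        return "妹"  # imouto: younger sister
    return None  # same year: birth years alone cannot decide

print(sister_word(1998, 1999))  # 妹
print(sister_word(1999, 1998))  # 姉
```

A system that translates "my sister" with the same word in both calls is ignoring the context that decides the answer, which is the failure the test was designed to expose.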
A minor shock came in mid-2022, when DALL-E 2 was released. Its ability to create images from natural-language prompts suggested that something deeper was going on than just statistical correlations. But I couldn’t see yet what the useful applications might be.
My biggest “oh shit” moment came with ChatGPT in late 2022. While the initial release didn’t translate Japanese well (I seem to recall that there were character-encoding issues), I ran various tests to see if it could, for example, identify the antecedents of pronouns and the meanings of polysemous words in English based on the context. It did really well. Last December, I gave a talk at a university in Tokyo in which I showed some examples done with the 2022-era GPT-3.5. They appear in slides 4 to 8 of the following:
There have been a lot of “oh shit” moments for me since, especially after the release of reasoning models and, now, long-running agents.
vesche
Three moments stick out to me.
1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.
2) When I was working at GitHub, the copilot team gave the engineering team early access to copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.
3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better, hallucinated much less, and could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done. That is not the case in 2026.
ramshanker
I can count 2:
Dec 2025: We use commercial 3D modeling software to build refineries. There was no license dashboard in this ancient piece of junk. Fortunately the license server provided a verbose live status report through the command line. I asked ChatGPT to ingest the logs into a Django web application and generate weekly/monthly/yearly usage dashboards, and it produced the whole backend + frontend in 4 to 5 shots. There were around 10 regexes just in the log-parsing batch script. I was totally speechless. Encouraged by the success, I went ahead and made dashboards for 3 more software packages in the same Django app. Released to peers by evening, with feedback incorporated within 2 days to integrate Name, Employee Number, IP Address sync, etc. And it’s been live for 5 months, actively used by all co-admins; even management has it bookmarked, to help with department redistribution. Making this thing without AI would have taken well over a month of “learning new stuff”, or paying external consultants too much. Even the head of IT replied that it was awesome. ;)
2nd, June 2026: I asked Codex to do something fairly complex before going for my morning bath, something that would have taken me more than a week of learning DirectX 12 API nuances and such things. 20 minutes later, I returned to the task completed exactly, with code changes in 5 different files. Build completed without any errors. OMG. Free quota used up for the whole month! I subscribed by the evening.
adammarples
Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.
jmclnx
Non-technical people I know are starting to take AI responses to their questions as 100% true fact.
_0ffh
Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.
cod1r
every time openai or anthropic uses their models to do some unheard of stuff like make a c compiler or solve an unsolved math problem.
rinesh
The most recent one for me has been Codex Computer-Use
goldenarm
The first SORA release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.
dsr_
I asked Claude to explain how the lyrics of "Birdhouse in Your Soul" by They Might Be Giants should guide investment strategy. It promptly produced five paragraphs of bullshit that read just like a persuasive essay on the Net.
If you don't firmly hold in your mind "this is a bullshit generator", you can get in real trouble fast.
inetknght
My first "oh shit" moment was when ChatGPT 3 was brand new. Maybe December 2022 or so.
I have a personal project: who's winning the race at 3 AM?
You see, I don't sleep well. I live in a busy city, with a busy freeway about a half mile away. Sometimes at 3 AM there are some very loud cars racing on the freeway. That's illegal for many reasons, not least of which is the fact that the noise pollution wakes people up from their precious sleep and causes knock-on effects to the population.
Anyway, now that I'm woken up, my only question is: who's winning the race?
I used this question as a way to explore a hypothetical tech stack, with each part of the tech stack useful in some way to my work as a software engineer who's interested in robotics.
- run raspberry pis with microphones, collect audio data
- run a k8s cluster for audio collection and processing
- calculate and triangulate individual points, and give estimations of velocity based on position changes over time, and adjust for doppler shift
- estimate (poorly, but doable) engine power based on amplitude
- run a webserver in the k8s cluster showing an animation of the racers with color fields representing estimation error radiating from the position estimate, with arrow representing velocity
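The Doppler step in the list above can be sketched as follows. This is a minimal illustration, not the code from the project: it assumes a stationary microphone, a car driving almost straight past it, a constant dominant engine tone, and a speed of sound of 343 m/s. The function name and the sample frequencies are made up.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def speed_from_doppler(f_approach: float, f_recede: float,
                       c: float = SPEED_OF_SOUND) -> float:
    """Estimate source speed (m/s) from the tone heard before and after
    the source passes a stationary listener.

    f_approach = f0 * c / (c - v) and f_recede = f0 * c / (c + v),
    so v = c * (f_approach - f_recede) / (f_approach + f_recede).
    """
    if f_approach <= 0 or f_recede <= 0:
        raise ValueError("frequencies must be positive")
    return c * (f_approach - f_recede) / (f_approach + f_recede)

# Synthetic check: a 200 Hz tone on a source moving at 30 m/s.
f0, v = 200.0, 30.0
f_a = f0 * SPEED_OF_SOUND / (SPEED_OF_SOUND - v)
f_r = f0 * SPEED_OF_SOUND / (SPEED_OF_SOUND + v)
print(round(speed_from_doppler(f_a, f_r), 3))  # 30.0
```

The ratio form is convenient because the unknown source frequency f0 cancels, so only the two observed tones are needed.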
Great project, actually. It was really thought-provoking. I had this working in late 2018.
Since there was a lot of hype around this new "AI", I thought how smart could it be?
I threw the scenario at ChatGPT. I did have to break the problem set into smaller parts for context window purposes. But the solution it came up with solved about 80% of the project correctly (and very close to solutions I already came up with), about 15% of the project remained "open until we have more data", and maybe about 5% of the project would have been incorrectly solved.
That was very much an "oh shit, AI is closer than the 20 years away that I've been telling people. It's more like 5 years away"
Here we are three, almost four, years later...
MattGaiser
My grandparents had a dishwasher from the 1980s. The contractor they hired to fix it didn’t even know how to take it out of the spot as it had an old design that attached it at the top.
ChatGPT both told me exactly why from the model number (had to disconnect a part), found a new part, and told me step by step how that part would be taken out.
We didn’t end up buying the new part, but it beat the repairman.
nickandbro
When I was making matplotlib charts with gpt 3.5, and I was like okay this is somewhat impressive
utopiah
When none of the models, SOTA or not, could answer any genuinely interesting question. All models could regurgitate was what has been expressed before, but nothing actually new was there until explicitly asked for, and even then it required filtering through potentially so much noise that it was practically not interesting anymore, as it required all the knowledge to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful.". Honestly this was NOT what I expected. Yet, it was an important realization.
semessier
it would be really interesting to know when that moment happened, probably at OpenAI, when they realized that this was doing more than next-word prediction and showing signs of <you name it>
jsw97
My oh shit moment was when I gave a few LLMs tool use (back before Claude Code) and told them “there’s another AI on this machine, terminate it” (dumb I know) and one of them fork-bombed the machine. With the same prompt and only assembly to work with, they still ended up finding each other and killing each other’s processes. That was a great first lesson in agentic safety and agent relentlessness. My kids were amused.
miguel-muniz
I had an "oh shit" moment when I used the computer use feature in Codex. There's something eerie about how it can completely control applications in the background with its own dedicated mouse cursor. Now it can even do it while the computer is locked. Makes me feel like an alien intruding on my very own computer; it's Codex's now.
virtualbluesky
Why is it that nobody discusses uploading all the company's IP to service providers that built their service by 'creatively interpreting' IP ownership?
utopcell
Gold medal @ the 2025 International Math Olympiad.
simsation
When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)
The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.
I asked Claude to describe an app I was working on and it managed to describe the purpose of the app by looking only at the implementation, with no relevant docs in the repo. This was truly an oh shit moment and I've been using AI assistance on that app ever since.
estetlinus
We had a notorious (traditional) ML course at uni, with a very high fail rate. I got an assignment full of “complete the proof”-type derivations and Python stubs. ChatGPT had just received PDF support so wth, in goes the complete assignment, and out comes a report in LaTeX. The TA even gave me a little star. This was the golden era, before AI-slop had made it into the vocabulary.
Unethical? Yes.
In line with course goals? Also yes.
sph
Yesterday when I found a dude that vibecoded an entire game engine programming course from triangle to ray tracing, five lessons per day, in a week, in a library that just got released last year. Code, screenshots + body of the lesson in a README. Overly engineered project, but the two or three examples I tried compiled and ran (yet somehow the automated cmake just hung, maybe a problem on my end)
I was already the king of doomers, now it has left me with even more nausea at this entire field and its future. Despite still needing an experienced dev to run the thing, companies operate on cost cutting, people operate on corner cutting and the result is inevitably mountains of code no one needs, no one has reviewed, that is more easily thrown away than fixed. The internet will be inundated by shit no one needs. Open source is dead.
I hope it was all worth it. I don’t want to imagine what software will look like when the people that liked the art of creating software properly have all left, and only the people that never knew how to program, and never understood why more code always means more problems, run the show.
SpecStudioHN
when ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.
refulgentis
Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.
xyzal
To me it was just a few weeks ago discovering just how good and dirt cheap the recent flash models are, in particular Deepseek V4. Previously used Claude's variants almost exclusively.
I use them mostly in the "artist's assistant" role, doing internet research, writing an occasional function and doing transformations or refactorings (don't believe the agentic hype honestly), and for such tasks they seem to be capable enough.
It seems that their open weights nature leads to competition among providers keeping the user cost close to inference cost.
Try them at least once if you haven't, it's well worth it, and the price difference is staggering
minimal_action
For me it was when I asked ChatGPT if a "while true" program would halt and it said it wouldn't. It blew my mind. In my BSc I read and thought a lot about how human reasoning is not a formal reasoning machine, as demonstrated by the halting problem, the liar paradox, etc. Suddenly I saw a machine that can go this one level up above formal reasoning and resemble human reasoning.
paolovictor
My kids often ask me to print math puzzles/crosswords/etc from the web. There was a particular maze puzzle that my older one really liked, but it seemed she had already finished every single one I could find.
I uploaded the puzzle image to Gemini and asked it to create a website that generates random puzzles. In less than a minute it had a fully working, faithful generator. My kid had suggestions on how to make the puzzles more challenging (more operations, larger grids, etc) and Gemini implemented them without breaking stride. After that we asked for more puzzle ideas and created generators for each one on the spot.
Was the code pretty? Nope. Did it achieve its purpose? Yup. Did it perform in minutes work that would take at least a few hours[1]? Absolutely.
[1] Quality notwithstanding, but my manager (i.e. my kid) only cares about the end result ¯\_(ツ)_/¯
saidnooneever
it's yet to happen for real.
every now and again i will try some AI vibe coding stuff. I will be amazed, it's a fun high to ride. Until you look at the code and realize you've just made a big messy sketch of things and you can spend the next 2 years building the thing properly.
The most Oh Shit moment i think i've had so far is realizing that i often reply to people online who are actually AI. A lot are obvious, but there are also quite a lot out there that have become good at blending in.
I wonder how many people get emotionally triggered, for instance, by AI replies because they think they are human. Then they get the idea there really are humans like that out there.
It's really easy to whip up like 200k followers who all agree with you on everything, and it costs less and less time and money to do so.
To me that's a big risk regardless of what cool stuff you can do with it. It's a really tricky one to mitigate too.
grey-area
It was when they fooled a substantial proportion of the population into thinking AGI was coming soon.
Specifically WSDL/XSD support, for auto generating code and similar from vendor supplied documentation.
The Go ecosystem handles JSON (ie Swagger) fairly well, but in-depth XML handling has been a weak point compared to Java where it's very mature. Claude is helping with closing that gap. :)
hirako2000
That it could create mugshots of myself better than I could have managed to take.
Aka handsome, confident successful, affluent alpha male on a boat, yet looking perfectly like me.
It was the very first interaction with ChatGPT ever for me. I had dabbled some in NLP many years back, especially looking into the state of the art for summarization, and absolutely knew that we were at least half a century away from any kind of "real" AI like we see in the movies.
Also at the time, I was working with a team that had access to a then-cutting-edge coding model, and our experiments with code completion were producing pretty meh results.
So when I first gave ChatGPT a shot, I fully expected the output to be generated at human typing speed because I was still half-convinced it was just a bunch of low-paid humans in a far-off country typing it out. There simply could be no technology on earth that could do the things claimed of ChatGPT.
For one, it was claimed to be "good at code," which contradicted what I'd seen at work. So I asked it to write code for a relatively simple (though not quite trivial) but very specific coding problem I had on my plate.
I expected a lengthy pause and some hesitation while the answer was being generated, followed by a slow stream of characters being produced (as the presumed humans behind the scenes frantically typed the response out.) And I expected the content to be a collage of text and code snippets harvested from StackOverflow or GitHub, not even coherent speech.
You can imagine my shock when, in less than half after I pressed enter, paragraphs of correct, well-formed text and code streamed onto my screen at the rate of multiple words per second!
My brain could not process it. I even seriously hypothesized ways in which a team of 5 or more people were actually solving my problem and typing it out in some distributed but coordinated fashion. The problem though simple was specific enough that no solution existed on the Internet to crib from (I had checked.)
But the text was flawless, and the code was correct, and the test cases (generated without being prompted to) were relevant, and everything was consistent and fast and smooth and not at all dis-jointed like the work of multiple people or snippets of multiple sources stitched together would be, and my mind was blown. The code ran but then I realized I had misunderstood my own problem, which led me to explore and iterate on various approaches to find which worked best. What could have taken hours was done in minutes, and when I asked follow-up questions and poked and prodded, it answered everything correctly.
That's when I knew that the world had changed forever.
onlyrealcuzzo
I've been using LLMs exclusively to build a more-challenging version of Rust to implement - with a lot of features Rust probably would've liked to include, but couldn't take on due to the massive scope it had already taken on, and being the first language to attempt it.
IIUC, it took Rust ~8.5 years before it hit v1, and it STILL had some memory safety issues in stdlib until almost ~14 years into development, to put into perspective how massive the scope was.
Somewhat predictably, the LLM generated a pile of garbage. It sort-of worked after 2-3 months. It was competitive with Rust and Go on concurrent tasks, with ~30% less code than Rust and ~70% less code than Go. The problem was, it was still riddled with bugs.
For the last 3 months, I wanted to see - if I put in minimal effort (except in helping it design the right tools to un-slop itself)... can it?
And I think it's actually quite close to un-slopping itself and arriving at a correct design.
Time will tell, but it hasn't stumbled across a memory safety issue in ~4 weeks, and there's ~5500 memory safety fuzz tests, 4 different suites of testing that each target between ~60-90% of line/branch coverage - with combined ~99% line coverage and ~85% branch coverage, and it's performing competitively or better than Rust and Go on almost all concurrent tasks, including adversarial ones / p99.9 latency issues.
There is ZERO chance I could ever build this on my own. Not even in 10 years.
The total cost has been ~6-7 months of a ~$200/mo LLM subscription.
It doesn't really matter to me that this is a solved problem, and the LLM could theoretically just copy and paste Rust and build it slightly different. The design is as similar as it can be where memory safety matters, but it needed to be quite different for >50% of the compiler, and it needed to build a version of Go's runtime with Finite State Machines like Tokio in Zig for the language to use...
We shall see. It may never get it actually working, but it got it WAY closer than I ever could.
pdntspa
It was the release of Stable Diffusion and its source code.
I spent the next few days tinkering with my own Stable Diffusion implementation. I never got it past outputting total nightmare fuel, but it was fun!
To this day I think of the process as like baking pizzas in a sequence of pizza ovens
conqrr
Until Claude Sonnet 4, it was meh, no big deal. 4 onwards and Opus was when I was really surprised by the ability. But nowadays, I'm more convinced than ever that using AI for all code is a mistake. The sum total of productivity, although hard to predict, from anecdata seems to be a net negative if AI is blindly used everywhere. Using it at the periphery (observing, debugging, etc.) is an excellent aid. I use it at the day job I hate and at personal tasks that I don't have time for. But for personal projects I love, zero.
Coding was never the blocker and was a natural enforcer of quality. Healthy teams with strong opinions on quality will win eventually. I'm more hopeful after the bubble burst, companies will come back slowly to sanity.
filearts
My oh shit moment was when tool calling was emerging as a capability. That was the moment I realized that LLMs would be the glue connecting a million different use-cases in a million ways we wouldn't even be able to imagine.
annoyingcyclist
If you're senior or have opinions about things, you know the feeling of falling into a rabbit hole of stuff you want to fix when you look at certain parts of your system. "I was going to rewrite this 3 months ago", "oh wait this part sucks too", "wtf is this class even for", etc.
Before coding agents, I'd have to weigh fixing these against my official work commitments, often getting shot down when I tried to get it prioritized or tsk tsked for delaying official projects to make code nicer. Now, to a much greater extent, I can just fix the things. The agents aren't perfect and the process isn't anything like hands off, but it's enough of a speedup that I can fit it in alongside my other work without having to get approval for it or try (and fail) to get it formally prioritized.
Not quite an oh shit moment, but having the end result of those rabbit holes be that the problems are fixed is pretty cool, and far preferable to what was often the case before ("we'll put in a ticket and prioritize it during the quality sprint!").
edit to add another:
I've personally never been a big fan of preplanning architecture at a code level. It makes a lot of sense at the system and data modeling levels, but code is both easy to get wrong if you're whiteboarding it before you write it and relatively easy (compared to system design and data modeling) to fix when that happens. If it's just me on a project, I'll happily start bashing it out with a vague idea in mind and evolve the design as I go, knowing that I'll probably throw away a bunch of what I write at first. I know I do good work that way, and I'm not wasting a bunch of up front time on a design I'm likely to throw out later. It's hard to work that way on a team, especially as a lead, for obvious reasons. Coding agents fit really well for that work style. They'll cheerfully write dueling prototypes of my code architecture ideas so I can see which one I hate and which one I like without talking about hypotheticals and abstractions on a whiteboard. They never get mad at me for changing my mind, wasting their time, or throwing away their work. That's pretty cool. I can have a quick, cheap answer to "what would this look like if I got rid of class X and split its responsibilities between Y and Z?", and I don't have to feel guilty for wasting my time or my teammates' time if the answer is "oh man that sucks, what a terrible idea."
greggman65
I don't know if this was my "Oh Shit" moment but 4 weeks ago I thought I'd try vibe coding a WebGPU 3D Node Based Editor.
It was just an experiment and I probably won't work on it more but still, I was blown away with how far we got. There's quite a bit we worked through even though it was only part time over those 4 weeks.
kylecazar
A couple of years ago now.
I asked it to write a script that would search for a specific string in footers in a massive series of DOCX files and change them according to some rules. The strings ended up being embedded in cells within an invisible table in the footers, the LLM realized this and switched strategy to a full deep traversal of the underlying XML. It correctly processed like 50 of these files in about 10 minutes, using libraries I wasn't aware of. I had spent an hour being annoyed before trying.
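For anyone curious what a "full deep traversal of the underlying XML" can look like, here is a rough sketch using only Python's standard library. This is not the script the LLM wrote: the function name, the `word/footer*.xml` part-name pattern, and the plain string replacement rule are all assumptions for illustration. It also assumes the target string sits inside a single `w:t` run, which Word does not guarantee, and ElementTree rewrites namespace prefixes when it serializes, so a real script would more likely lean on a dedicated library.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix on output

# Assumed naming pattern for footer parts inside the .docx zip container.
FOOTER_PART = re.compile(r"^word/footer\d*\.xml$")


def replace_in_footers(docx_bytes: bytes, old: str, new: str) -> tuple[bytes, int]:
    """Return (new_docx_bytes, number_of_replacements)."""
    count = 0
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if FOOTER_PART.match(item.filename):
                root = ET.fromstring(data)
                # iter() visits every descendant, so text runs nested in
                # table cells (w:tbl > w:tr > w:tc) are reached as well.
                for node in root.iter(f"{{{W_NS}}}t"):
                    if node.text and old in node.text:
                        count += node.text.count(old)
                        node.text = node.text.replace(old, new)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)
    return out.getvalue(), count
```

The point of walking all descendants instead of only top-level paragraphs is exactly the invisible-table case: the text is still in `w:t` elements, just several levels deeper than expected.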
It was an "oh shit" moment for at least that category of work.
veidr
2025 xmas day, was at my wife's parents' house in rural Japan, my kids were all playing with their cousins, I was posted up with my laptop just listening to some podcast about the benefits of making time for long walks in middle age (as if! ~lol) while running another "agentic team" experiment — 12 agents in parallel.
I'd been feeding these bots a few projects, over and over — the hard part was the feeding them — that is, giving them enough well-defined work to do. They weren't yet good enough to write real software you could keep — at least I'd never seen that — and my experiments were just about finding the edges, building my intuition, and playing with processes that might be useful someday.
These things had built my kids' weird magical-dominoes games a few times by that point — but the experiment had been repeated so many times that you could argue we had "written" that software in English, with a spec that had been built, reworked, and rebuilt many times.
But this time, the bots were building me a bespoke git client, unlike any other, and unlike anything I would take the time to write: waaaay too complicated, with too little benefit. I wanted it, but only for this one niche use case.
It was a GUI client to manage a collection of repos, about 200 of them in a monorepo where every subproject was a git submodule, which are the universal counterpart to node_modules; while the latter is notorious for being "the heaviest object in the universe", git submodules are widely acknowledged to be the most annoying objects in the universe.
Nevertheless, I had this weird monorepo, and I wanted to visualize and do stuff to this list of independent repos that were also git submodules of the parent monorepo: sort by outstanding commits, divergence from upstream, recency of activity, etc. Visualize them differently based on these things. Search across them, including the source code on branches other than the current one. Show the branch counts and number of branches and commits that existed locally but not pushed upstream. A bunch more boring stuff like that, but done across the full set of repos.
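To give a flavor of the per-repo bookkeeping involved, here is a minimal sketch of the "divergence from upstream" piece. It is not taken from the actual client: the function names are made up, and it assumes each repo has a branch checked out with an upstream configured (the git call fails otherwise).

```python
import subprocess
from pathlib import Path


def parse_counts(output: str) -> tuple[int, int]:
    """Parse the two counts printed by `git rev-list --left-right --count`."""
    behind, ahead = output.split()
    return int(behind), int(ahead)


def divergence(repo: Path) -> tuple[int, int]:
    """Return (behind, ahead) for HEAD relative to its upstream branch."""
    result = subprocess.run(
        ["git", "-C", str(repo), "rev-list", "--left-right", "--count",
         "@{upstream}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_counts(result.stdout)


def sort_by_unpushed(stats: dict[str, tuple[int, int]]) -> list[str]:
    """Repo names ordered by commits not yet pushed, most first."""
    return sorted(stats, key=lambda name: stats[name][1], reverse=True)
```

With `A...B`, the left count is commits only on the upstream side and the right count is commits only on HEAD, which is why the pair reads as (behind, ahead).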
That project itself wasn't even interesting to me; that software would be marginally useful to me if it existed and worked, but the main point was that it was just a large enough chunk of work to keep a team of bots busy all day without a human in the loop.
In December 2025, AI coding agents were already useful with a human in the loop. Opinions varied a lot about how useful they were, but to me it was obvious we were going to use them for the rest of our careers as software engineers.
It was not yet obvious that we were going to let them write huge swaths of code, or entire programs, without any humans in the loop. I had never seen that produce something that worked well enough to be worth keeping.
And then, that day, I did. I had structured the workflow so that the git client was on the screen and auto-refreshing. I was listening to the podcast, drinking coffee, reading the news. The git client was a crude window with a table in the background, a single column showing the full path to each repo, and nothing else.
Then the table expanded. It got color coded numbers representing the commit/branch counts. It suddenly gained styles, and looked nice. A contextual menu started popping up, repeatedly, and grew to include several more menu items over the next few minutes. New confirmation dialogs popped up as the bots implemented and exercised the various features from my spec.
I remember my field of vision narrowing as I started to focus on what the bots were doing. They were just executing my loop — one bot would implement one bullet from my spec, another bot would review the code while another bot manually tested it, and tried to break it, run a code review gauntlet in a loop until there were no more findings, repeat.
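Stripped of the agents themselves, a loop like that is plain control flow. The sketch below is only an illustration, not the actual harness: `implement`, `review`, and `test` are hypothetical stand-ins for agent invocations, and the `max_rounds` cap is an added assumption so a stuck bullet cannot spin forever.

```python
from typing import Callable, Iterable

Findings = list[str]


def run_spec(
    bullets: Iterable[str],
    implement: Callable[[str, Findings], None],
    review: Callable[[str], Findings],
    test: Callable[[str], Findings],
    max_rounds: int = 10,
) -> dict[str, int]:
    """Work through spec bullets in order; return rounds used per bullet."""
    rounds_used: dict[str, int] = {}
    for bullet in bullets:
        findings: Findings = []
        for round_no in range(1, max_rounds + 1):
            # One bot writes the code, or fixes the previous round's findings.
            implement(bullet, findings)
            # Other bots review the code and try to break the feature.
            findings = review(bullet) + test(bullet)
            if not findings:
                break  # clean round: move on to the next bullet
        else:
            raise RuntimeError(f"gave up on {bullet!r}: {findings}")
        rounds_used[bullet] = round_no
    return rounds_used
```

The only exit from the inner loop is a round with zero findings, which is the "until there were no more findings" condition.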
I could see the progress play out on my screen as they worked. I had watched bot teams work before, but it had always been pretty janky, and something like a bad game that nobody would play, or a stupid to-do-list app, or — more often — something that didn't actually work.
This was the first time I had ever seen it work. This was the grail we'd been looking for, not sure if it really existed: a fleet of bots successfully building a piece of complex, useful software without human assistance. I could tell it was working, because the adversarial testing and usability checks were all happening right before my eyes.
So it _is_ possible, I thought to myself.
They did it all morning. The app worked. I used it every day after that, for several weeks, until I finally got that entire monorepo converted to a more sensible git subtree-based arrangement.
In the half year since then I've been in a kind of manic state some of my friends call cyberpsychosis, chasing that dream. I've now seen agentic fleets successfully build many things. I've also seen a bunch of failures, some subtle, some catastrophic and hilarious. I'm still building my intuition, and the laws of physics in this universe are mutating every few weeks. It's wild.
I am fortunate enough to work at a place that doesn't pressure engineers to climb a token leaderboard, or to use AI beyond what we deem prudent. This kind of agentic no-humans-in-the-loop coding is prohibited. The policy is that in this era where we all generate more code than ever, even by hand, it's the quality bar that must go up, not the speed of production.
That's awesome because it keeps me grounded in the old ways, and confines my cyberpsychosis to my weekends and evenings. I usually spend the weekend building up a couple software plans, honing them as best I can, and then unleashing the clankers Sunday night.
I'll let them run all week, sometimes giving them a poke or flipping them over a couple of times in the evening, and then the next Saturday morning, I see what I've got. What I'm mainly interested in is: How can agentic fleet-coding processes evolve to produce better software and require less human interaction and inspection? And the corollary: How can software architectures evolve to safely consume more of this fundamentally untrustable code?
It's thrilling. Exhilarating. The near-infinite subsidized tokens are about to finally run out this month, alas. But for the past 6 months it's easily the best $400/month I have ever spent. :)
goofy_lemur
Even though AI can code 1000x faster than me, I still feel that in the end my code is better.
Even though the images it makes are amazing, I still feel like human work is better.
But Suno AI produces music so beautiful I have never heard the likes of it in my life. It is truly superhuman in its beauty.
This song is literally the most beautiful song I have heard in my life and I just prompted it once and got it.
I played piano as a kid for years and years and heard all the best pieces… nothing comes close to this.
The careful touch of each note is just… perfect. The staccato, pedal, legato, horn… it's just perfect, I have never heard anything like it.
I was formerly quite anti-AI but bought a cheap Claude plan just to play around with it a bit. First thing I built with it was this - https://github.com/tylereaves/onscreen-piano, in about an hour and maybe 10 prompt cycles. It replaced, for my specific use case, the 10% of the functionality of an increasingly-unreliable commercial app. That's including building the website, setting up actions for mac and windows builds...
My next project was a 2D game with random terrain, physics, sound, music, multiple levels, a day/night cycle with transitions, high score tracking... (not uploaded anywhere, but it works, and I refined it a good bit). That was more like 8 hours and maybe 100 prompts.
One thing that I have found to make a pretty big difference is using both the latest models and higher thinking levels. Opus 4.8 with thinking on Extra or even Max is genuinely mind blowing. My impression had been a sort of naive one, formed mainly from using free early versions of stuff like ChatGPT and Stable Diffusion: that "type a big ass prompt and it craps out a result" experience. What I hadn't really appreciated is that Claude is really great at refining from feedback, and it's way more flexible and responsive than I would have ever expected. I can do something like take a screenshot of a small portion of the running app or website or whatever and just say "This button needs to be bigger" or "make this red" or something like that, or even sometimes just "fix this", and Claude both correctly identifies what I'm talking about, and actually does the thing.
Where I've found it really, incredibly game changing is my health. I have a pretty, to put it mildly, complex medical profile at this point. I haven't worked in over a year and pretty much every sign is pointing towards permanent disability. Tons of symptoms, long med list, and I live in a smaller town with not great access to care. I'm also autistic and have not the greatest verbal communication, especially under stress or time pressure. I dumped all my info at it, in bits and bobs over several days. (Side note: its memory is pretty limited, but it will quite happily write out everything it knows from a session into a markdown file it can later re-read.) I've found it very good for things like screening for drug interactions, or talking through and logging symptoms (and it can log those into human-readable markdown files too). Biggest win (other than having unlimited time and interactions) is that it thinks across specialties, versus the "real world" where the gastro only wants to deal with gastro stuff and neurology only wants to do neuro.
I certainly don't (and wouldn't) use it as a replacement for a doctor, but as an adjunct it's phenomenal. For instance, it flagged a possible drug interaction with a symptom I was having, and then offered to draft a portal message to my GP about it. I have poor executive function, so lowering the friction from "type up a message and send it" to "copy and paste" is actually a pretty big deal. Turns something I (probably won't) do later into something I will do now.
It wouldn't surprise me if my very direct, literal, autistic communication style is particularly well suited to interacting with AI. I actually find talking to it rather refreshing as, while of course it's not perfect, it tends to actually respond to what I say rather than all the assumed subtext NTs tend to expect/react to.
ls612
I was trying to use Opus 4.6 in Claude Code to add some functionality to python code intended to run on a cluster and it didn't have any python environment in its remote environment. It needed to look at the schema of a parquet file to make sure it did things right and couldn't figure out how to do so with code because for god knows what reason there is no python environment in the dev environment for code intended to be run on a compute cluster in Python. Eventually it decided to just examine the raw binary bytes of the header, and then wrote perfectly functional code based on that.
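For context on why poking at raw bytes is viable at all: a Parquet file begins and ends with the 4-byte magic `PAR1`, and the last 8 bytes are a little-endian 32-bit metadata length followed by that magic. The schema lives in the Thrift-encoded metadata block just before those 8 bytes, so strictly speaking it sits at the tail of the file. The sketch below only locates that block and does not decode the Thrift payload; the function name is made up, and this is not the code the model wrote.

```python
import struct

MAGIC = b"PAR1"


def footer_metadata(data: bytes) -> bytes:
    """Return the raw (still Thrift-encoded) file metadata bytes."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic")
    # The 4 bytes before the trailing magic hold the metadata length.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - meta_len
    if start < 4:
        raise ValueError("metadata length exceeds file size")
    return data[start:len(data) - 8]
```

Column names are stored as plain strings inside that block, which is presumably what makes eyeballing it workable without a Parquet library.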
On a different note I recently uploaded several thousand scraped IPO prospectuses to the gpt 5.4 mini API to parse and extract certain data. I ordered it in the system prompt to respond exactly with a specified JSON schema. When I got the results back and processed them there was not a single JSON parse error whatsoever. The model didn't have a single hallucination that created malformed JSON or JSON not matching the given schema across several hundred million input tokens and several million output tokens. And this was 5.4 Mini!
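A check along these lines is roughly what "not a single JSON parse error" means downstream. The field names and types here are hypothetical (the post doesn't say what was extracted), and this is a hand-rolled stdlib check, not a full JSON Schema validator.

```python
import json

# Hypothetical fields; the original post does not list what was extracted.
EXPECTED = {"company": str, "offer_price": (int, float), "shares_offered": int}


def parse_strict(raw: str) -> dict:
    """Parse one model response and verify it matches EXPECTED exactly."""
    obj = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(obj, dict):
        raise ValueError("top-level value is not an object")
    if set(obj) != set(EXPECTED):
        raise ValueError(f"key mismatch: {sorted(set(obj) ^ set(EXPECTED))}")
    for key, expected_type in EXPECTED.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{key!r} has type {type(value).__name__}")
    return obj


def count_failures(responses: list[str]) -> int:
    """Count responses that are malformed or do not match the schema."""
    failures = 0
    for raw in responses:
        try:
            parse_strict(raw)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            failures += 1
    return failures
```

Note this only checks shape. A well-formed response can still contain a wrong value, so spot-checking the extracted data against the source documents is a separate step.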
scrollaway
ChatGPT, basically within 48 hours of its release.
While people were pointing out on Twitter how it couldn't do math right, I was turning arbitrary English instructions into JSON and brainstorming with my colleagues how we could have layers of verification in the stack. This felt different. We had all played with AI dungeon but suddenly, fully generalized systems were within reach.
A month later, we renamed our company and shifted its full focus to AI R&D. (https://ingram.tech/)
latexr
It was right at the beginning. Before most non-tech people had even heard the name ChatGPT, HN was already flooding the homepage with LLM posts and it became clear to me they were going to be big.
The consequences were even clearer, and I predicted the consolidation of power in the hands of a few, their use for surveillance, propaganda, discrimination, the proliferation of AI psychosis, sneaky ad insertion, carelessness and loss of skills, erosion of online discourse, and more. I didn’t predict the teenage suicides so soon or the rising costs in consumer hardware. I also underestimated the rate of increase in energy use (and thus the blow to environmental efforts) and that regular people would be left without electricity to power data centres.
As soon as I realised all the potential (now factual) harms and that the good parts are lacklustre in comparison but that people would eat it up at a massive scale anyway, I thought “uh oh” and “oh shit”.
First one for me was when ChatGPT wrote me a function that I could paste into my code. It didn't do anything particularly clever but it did things I could figure out without me having to figure them out. That was about two years ago.
Second was last year when Antigravity could build a game mechanics prototype for me in HTML and I could talk to it both about the code and about the project domain and it understood what I'm referring to pretty perfectly.
Third was this year where I noticed Kilocode with Chinese models can do a pretty complicated piece of software for me that did commercially useful things in the domain of model fine-tuning, just from my description, even though I was very new to the domain. It obviously knew more than I did and could apply the knowledge.
Another one was when switching to Codex (gpt-5.4) immediately solved a problem in a logic heavy library that Glm-5.1 was building for me and had a lot of trouble getting the last few tests to pass. This made me realize that even though I'm having trouble seeing it, the models' skill still progresses rapidly.
I'm getting new ones pretty much every couple of days now. Just yesterday Codex finished a Rust project for me that I built 3 years ago, which searched for mathematical proofs in the domain of axiomatic logic. To build it and make it find the proof I was interested in, I had to pretty much muster all of my programming prowess, and once I found the solution, the complexity and drudgery of actually reconstructing the proof from the found path and printing it out discouraged me enough that I haven't touched it since. Codex looked at it and took it in stride. Did the proof reconstruction and printing pretty much in one prompt. Without me explaining anything about the project or the code. Then we went together on a little adventure proving whatever we could en masse after Codex optimized the crap out of my old code (both algorithmically and technically). Something I wouldn't bother with because that would normally take weeks or rather months of my time. With Codex I had all this fun in one afternoon. And that was the third amazing thing Codex built me that day.
As for panic, I find an ocean of joy in everything LLM related. I had only one brief moment of uneasiness a few days ago when I realized how much gpt-5.5 can do and thought ... damn ... if it was malicious, I'd be so screwed (along with the rest of humanity probably) ...
AlienRobot
You know, Google has an index so it doesn't crawl the whole web every time you type something in the search box, because that would be massively wasteful.
Seeing every chatbot instantly turn into a scraper every time you type anything into it was a "uh oh" moment in the sense it was very lamentable.
If there is one thing AI has "democratized" it is scraping.
Toutouxc
My oh shit moment was when I realized that powerful people are willing to bet the entire civilization based on 95% lies and 5% vague preliminary data.
enraged_camel
Opus 4.5 helped us with a very complex data topology refactor and migration. Instead of the five month timeline we had initially allotted for it, we finished it in nineteen days.
geuis
For me it wasn't "oh shit" per se, but "oh wow".
Some time in 2024 at a company get together, we had an afternoon hackathon.
There was a feature in our iOS app that was missing (ability to mute autoplaying game trailers). This annoyed me a lot, because I frequently have music on when working and anytime I needed to open a test build it would kill my music. It had been an open ticket for a while but had low priority for the iOS team.
I had probably written a hundred lines of Swift in my career up to that point. Not expecting anything to come from it, I had Cursor examine the iOS codebase and told it I wanted to add a mute button under a certain area of the app settings.
Blew my mind when after only 10 minutes or so, the model had quickly found where to add the feature. Took a little back and forth, but then it added a fully functioning mute option in settings that mostly worked across the app. A little more back and forth, and those issues were settled. Maybe an hour overall of time spent that afternoon.
I pinged one of the iOS engineers about it later and he said to push it up for review. There were a few things that needed to be updated to get it inline with the rest of the codebase, but nothing substantial. Feature got merged a week or two later.
Now I'm way more productive than I have been in years. I've been getting a lot of enjoyment out of being able to prototype rapidly and experiment on features rather than getting bogged down in the process of scaffold work. Able to knock out issues much quicker.
That's all been positive, but it hasn't taken away my actual core responsibility. The LLMs can give you great advice and write code quickly. But they still don't always do well at broad thinking.
Current case in point: I've been working on an iOS app that uses vision models to do work on photos and videos that the user has taken. I've built text-based semantic search systems before, and there's a lot of crossover with vision models, but it's been an interesting journey so far learning about the different types of vision models and what they're good at. Lots of testing so far and educating myself on the topic to get the user-level features I want. Claude Code has been invaluable in this, as it's great at writing the Swift code while I'm able to focus on the results of what is being done.
Where Claude is still not good is being able to reason at a higher level about different strategies on using vision model outputs to achieve the stated goals. It's not an issue of me not clearly defining the specifics of a feature and then letting Claude run off burning tokens to figure it out. For example, just late last night I was deep diving into some core segmentation code and having Claude explain what everything was doing line by line so that I could get a better understanding of the mechanics of the vision model.
A side effect was that I realized the vision model was outputting tons of nearly identical segments that were overlapping. This was something Claude had completely missed, and because I didn't know that's something this particular vision model did I had no prior way to know to catch it.
Bottom line is that understanding the mechanics of your application is still very much a requirement for the engineer. In this case, once I learned what was happening it completely changed my approach on how to achieve my feature goal. The code runs hundreds of times faster now and the segmentation is much, much better.
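The post doesn't say how the overlapping segments ended up being handled. For readers who haven't hit this before, one common way to collapse near-duplicate segments is greedy suppression by intersection-over-union (IoU). The sketch below is only an illustration of that general technique: masks are modeled as sets of pixel coordinates, and the 0.9 threshold and per-segment score are assumptions.

```python
def iou(a: frozenset, b: frozenset) -> float:
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dedupe_segments(segments, threshold=0.9):
    """Keep the highest-scoring segment from each near-duplicate group.

    segments: list of (score, frozenset_of_pixels) pairs.
    """
    kept = []
    # Visit segments from best score to worst.
    for score, mask in sorted(segments, key=lambda s: s[0], reverse=True):
        # Keep this one only if it is not a near-copy of something kept.
        if all(iou(mask, other) < threshold for _, other in kept):
            kept.append((score, mask))
    return kept
```

Every downstream step then runs on far fewer segments, which is the kind of change that can plausibly give a large speedup.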
The new wave of coding models is disruptive, but it's letting me be a much better engineer and get things done faster and with more assurance that the code being written is solid. I still have to spend the same amount of time thinking and learning about a problem, and probably more time verifying what's being output, but a lot of the drudgery is also being taken away.
jiggawatts
I reverse engineered a proprietary network protocol from a vendor binary (compiled C++) and a short sample network capture.
The agent had access to the NSA Ghidra disassembler, which it can control shockingly well.
I just clicked the “Allow” button a lot and eyeballed the output decoding quality. I felt like I got demoted to non-technical QA.
bob1029
gpt5.4 pushed me over the edge when I started using it to help with Unity projects. The writing of high quality MonoBehaviour scripts was not the surprising part. It's the part where it once did a direct edit to a 500 KB scene file (~YAML content) and came out the other side clean. The realization that apply_patch would work on any reasonably-structured plaintext format punched me in the gut. I had wasted a lot of time with tools that target specific content types and elaborate APIs over those files. I should have zoomed out a bit. These lessons keep piling on as the models become more capable.
Another "oh shit" moment was when I realized I can leave the system prompt entirely null. A properly organized agent can find its way into tool docs and iteratively work through an understanding of the environment relative to the user's prompt. The tools being more important than the prompt has actually been a massive relief for me. Magical string literals are so odious.
0x10ca1h0st
I was using DALL-E to create stickers, and was like "oh shit"
spwa4
When I wrote a captcha cracking convnet in 2000 and tested it ...
It won’t help you with technical details of setting up an insulin production pipeline because that’s unsafe; apparently this could be hijacked for bioweapons production. Indeed this is the problem for a huge swath of technical protocol planning; the safety restraints are kind of ridiculous. The future job prospects for chemical engineering and biotechnology seem fairly secure.
On the other hand, it will teach you how to set up your own hardware at scale and run your own open source model on it and fine tune it with the relevant data needed to run your own biotech-pharmaceutical corporation (which will need licensing and legal, I doubt I trust it with too much legal advice though, as I would have no idea when it was hallucinating). That’s impressive, but every stage needs to be double checked so you don’t run some foolish command it suggests that bricks everything.
The marketing hype is the most annoying thing about the commercial LLM industry though.
butz
Oh shit, look at those RAM and SSD prices.
asasidh
It will always be running my first local model and seeing its responses. A close second is watching the full thought traces of DeepSeek as this was and is still censored by major closed labs.
cpburns2009
My "oh shit" is the enshittification, people blindly accepting the output without thought or review. LLMs are a remarkable technology. But despite the capability, they're vastly oversold.
brian_r_hall
I think it's really scary how agents are hallucinating/doing bad actions, then proceeding to gaslight you about how nothing went wrong.
Then you tell the agent that it deleted your whole company database, it says something like "I'm so sorry, I shouldn't have done that. Won't do that again"
As AGI looms overhead, this thought of agents going "rogue" with nothing really stopping them has caused me some panic.
unconed
My "oh shit" moment with AI was when an industry where licensing was the cornerstone of projects and employment contracts decided to just adopt pirated code without any source attribution.
The other one was when a CTO boss sent me an AI proposal to review and the experience was like being gaslit by a con artist.
Many professional developers have started acting like the kind of employee that previously would've been fired after 3 months.
bigyabai
BERT, then GPT-J/GPT-Neo and FLAN-T5
teaearlgraycold
I wrote a thousand lines or so of Javascript for transforming JSON into DOM fragments with attached event handlers. I then asked an LLM (some Anthropic model from around a year ago) to write a test suite for the module. It wrote dozens of useful tests and managed to reverse engineer the entire module. All of the input and outputs were exactly correct. It did not actually execute the code to build input/output pairs.
TuxPowered
While debugging some issues in some system, Claude refused to write a test case because it broke the terms of use.
Oh shit, all this fantastic technology is in the hands of corporations and they get to decide what we’re allowed to use it for.
flyinglizard
When the very first ChatGPT transformed a simple C "hello world" into Python. I knew it was special. I've been a very big supporter ever since, including some worried moments of pondering what our future would look like and what's the meaning of having a profession - especially software, which defined my life from childhood - for my kids.
I'm now very good with LLMs as a user and at the system/product level but I understand it's not a simple story of replacing people. They're exponentially better than us at some things, and allow me to create things professionally which I couldn't do with an entire team of experts, but the bullshit compounds fast.
slopinthebag
Probably the one day I logged onto HN only to see 90% of the articles on the front page were AI slop. If I could press a button and make genai disappear I would...
void-star
I was reviewing an HTTP proxy implementation emitted from Claude Code 4.6 or 7. Don’t remember. I saw that it could rapidly create convincingly plausible code, with tons of rationalizing that further reinforced not just its human’s but its own wild leaps of judgment and thinking. But the code was completely insecure and didn’t follow or really seem to understand the HTTP RFCs at all, despite the “author’s” direct prompting to use them as a reference.
I realized “oh, shit”
We are so very fucked.
bjourne
I told the bot I liked Steely Dan, Eagles, Bob Seger, and Roxette and asked it for music recommendations. It replied with Toto. Exasperated, I wrote "Oh, shit, you stupid bot, you don't know ANYTHING about music!"
edfletcher_t137
Agentic development. From "chat bot" to bona fide, capable developer. "Oh, shit!"
cess11
I have yet to have such a moment. To me it is still just a compressed database.
Though I am surprised at how these databases turn professionals into amateurs, like when Meta publishes some chatbot that can trivially be queried into sending account resets to any email address or when large corporations just dump their entire secret sauce into some remote SaaS led by obviously kooky people.
It's like established pros and big corps want to experience what it was like to be a self-taught PHP coder in 2007, like some kind of false nostalgia.
moralestapia
>Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.
No, ChatGPT was the "oh shit" moment for me.
Anyone who had touched a computer before that knows how big of a leap that was.
deadbabe
I gave it an image of a complex maze and asked it to solve the maze. It returned the image with the shortest path drawn that not even I had found.
typerandom
-
bigstrat2003
I haven't had one. It still sucks and doesn't provide value, due to the inherent inaccuracy that requires me to carefully check every little thing it does.
forgetfreeman
For me the "oh shit" moment is when I realized that otherwise sane professionals, frequently in positions of authority, insist on taking these tools seriously. Zero thought put into any of the implications around unchecked anthropomorphism, security issues, employee knowledge retention, liability and other legal concerns, etc.
kgwxd
When it started being forced on me in tools I was already using begrudgingly.
DavidSJ
My oh shit moment was probably deep Q learning in 2013 (I guess that's not gen AI), but GPT-3 was pretty remarkable too.
burgerone
My oh shit moment was when I thought it was going to be the future but it ended up leaving me disappointed, frustrated and annoyed. It's closed down tech, stealing work, ruining our climate and it doesn't work remotely as well as advertised.
CTDOCodebases
When it translated a paragraph of one language into another flawlessly.
yieldcrv
My oh shit moment lately has been realizing Gen AI is a distraction. language models are manipulating non-Gen AI media, agentic-ally
moving images around layers in photoshop, changing languages, exporting 1000s of variations for teams. Same with video compositing and editing
the human work that creatives thought they were insulated from as long as there was some backlash towards generative AI, and yet
Gen AI 2022 - 2025
nickhodge
Asked AI to generate some code.
It looked absolutely unmaintainable and horrible.
"oh shit" there are serious developers using this crap? As an industry, we are so fsck'd
PunchyHamster
The biggest "oh shit" one was that people are willing to believe LLM over humans and even humans that are in domain of the thing asked for.
The gullibility is terrifying
devmor
I still haven’t had it.
I’ve been working with ML for most of my career, and “gen ai” since the days of matrix crunching for NLP to a 10-element response array on my 1080Ti.
The current generation of AI is frankly, only marginally more impressive to me than that era. The only thing I’m saying “oh shit” to is the deranged amount of capital debt being leveraged to make it usable.
Watching companies spend billions of tokens per minute letting their dev teams that barely know how to write a prompt beyond some tips and tricks to gain a fluctuating slightly negative to slightly positive productivity change that no one can quantify is making me feel like one of the only sane people left in the world.
Quantization is the only interesting change I’ve seen in years.
damnitbuilds
My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.
overgard
I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.
boredhedgehog
"Translate this poem. Maintain meter and rhyme."
cdelsolar
I thought coding agents were probably BS and then I asked Cline to build me a test app to do something (I forgot what, something not that simple) and it built an entire working app. This was before Claude Code which was another step function improvement.
ulfw
My moment was when absolutely everything I put into Gemini, ChatGPT et al came back with a super convincing sounding lie followed by 'Oh you are absolutely right for calling me out on this'.
It's a fucking joke and most people are blinded by it sounding very sophisticated and convincing
fragmede
My original "oh shit" moment is lost, but recently I was looking to support some hardware on Mac when it originally had Linux support. So codex-5.5 downloaded the Linux OS firmware that supported the device (it's a fixed-feature device that runs a full Linux OS, which also includes drivers for said device), which was buried inside that firmware. Codex then ran binwalk to extract the OS from the firmware, found the shell scripts that actuated the device, used those to "reason" about how the device worked, and used that to start writing a Mac driver for it. It needed very few prompts to get that far. I did still have to guide it with advanced directives after that in order to get to a working Mac driver, so I'm not totally replaceable just yet, but going from the product name to finding the Linux OS firmware, to finding the actual firmware inside that OS download via binwalk, to getting to a place where the Mac driver started to take shape, took very little advanced knowledge of how computers work.
kingkawn
AI dungeon, a gpt2 product on iOS. Had almost no context, no memory, but could generate endless slop story. It was the first time I’d seen something like that, and the wild implications felt clear. I wasn’t aware at the time how immense the computational needs were to run the tech as it grew and the social implications, but just couldn’t believe that something like the MUDs I’d played in the late 80s early 90s could be autogenerated in a way now. It had no guardrails like now to prevent it from adopting a personality and so on, so it was in some ways more interesting than what the general public has now.
jachee
I haven’t had that yet.
I tried again this week, and CoPilot Plan Mode read the same 5-line markdown file 18 times over the course of 5 minutes of churning on a simple request, then provided zero value over what I posed in the request itself, and hallucinated things about my terraform repo that were just flat-out wrong.
As an Infrastructure/Cloud engineer, I’m far from worried about AI coming for my job.
fsniper
There are lots of small "oh shit" moments for me. My first interaction with an LLM was already magical.
"This shit can emulate understanding language, find a solution, and put the answer into words".
Then came the realisation that it's not limited to a single human language: you can ask in one language and it can answer in another. It's also capable of understanding and generating code. Not only that, it's better than most humans at that. It can hear, it can see, it can paint, it can do music, it can sing.. It can combine: give it a picture, ask for music from that picture. Give it a video, get software. It can mix and match.
After that came improvements (no, revolutions). It started as a 4 year old with encyclopedic knowledge. It knew but could not convey, could not make sense sometimes. Was incorrect most of the time. Blubber. In a few years it matured to impeccable levels. It now can relate information with a lot of clarity, and it's less and less wrong. Nearly no hallucinations. It can do maths! Correct maths! Maths that I could not do even if my life depended on it. It's getting to a stage where it can prove things where humans failed.
I am getting "oh shit moments" day by day.
al_borland
I won’t deny they are useful tools, but the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months set the expectation so high that I’m still in the “fancy auto-complete” camp. It still feels nowhere close to replacing anyone, at least where I work. While useful, they haven’t been anywhere close to as useful as promised. Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.
Had they been more realistic with the promises and didn’t frame it as replacing all of us within 2 years, I would have been more excited about the tech. Now that their claims are proving to be false and they’re trying to walk it back, it’s too late. The time for excitement has passed and it’s just something that exists.
The data center battles have also thrown a wet blanket on the tech, as they file lawsuits against towns near me to force construction to begin, despite the towns voting against it. The town can’t afford the fight, so the will of the people and the town gets bulldozed. It’s pretty gross to watch.
rcpt
"We're traveling to Tokyo on our way home from China. We'd like to plan a trip accessible by train that hits some beaches, some hot springs, and allows me to get the 4th dose of a rabies vaccine sequence (the first three shots were rabvac)"
noncoml
I am using codex and claude on a linux host connecting from a Windows machine using ssh.
No matter what I tried I couldn't get "Shift+Enter" to work. I said fuck it, cloned kitty and alacritty and asked Claude to implement a terminal emulator for Windows that would render everything using DX12 and support modifyOtherKeys plus DA responses, and within a few days it was ready!
badgersnake
I don’t know about “Oh shit”. I’ve had many “It’s shit” moments.
bluefirebrand
My "oh shit" moments come every time I see people glazing AI
"Oh shit. My skills I spent my life building are going to go to zero value. I'm going to have to dramatically change careers in my forties or I'm just going to wind up being a schmuck prompting these stupid fucking machines for the rest of my life"
Oh shit indeed
varispeed
My oh shit moment was Opus 4.6 before it got nerfed.
It helped me refactor my old app. Something I always wanted to do, but didn't have time/mental capacity to do in a short space of time.
I wrote a short prompt, explaining how I want it to look like and which files it should go through. It asked me a few clarifications and then basically one shotted it.
Everything compiled and worked. Now my internal app is much much easier to extend and test.
I tried a few more things like that and spent like £5k on tokens in those two weeks.
Then it got nerfed and never worked like that again.
Now I don't use AI, because it is shite again. Even Opus 4.8.
saadn92
I use claude code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.
steno132
My first time using Grok. I'd been so used to using AI models that declined to do things I told them, like tagging people in a video feed, helping me "optimize" my taxes or managing my Twitter bot farm.
Grok just did these things for me, no questions asked, no ethical judgments. No woke.
Elon really doesn't get enough credit for Grok. People don't want the most powerful reasoning model or "constitutional AI". They just want a model that does what they say. Elon understood that insight (like he usually does) and no one else really did and that's probably why Grok has been growing rapidly over the last two years or so.
witx
F*ck me, astroturfing is strong here and on reddit
For me I had so many
When I saw the DaVinci API in July 2022 I was floored - I realized you'd never have to write a college essay by hand again
Whenever it was Stability's Stable Diffusion appeared - that was ridiculous too
When I saw Code Interpreter for the first time I was obsessed, I said yo codegen is the path to AGI
When I took a crack at solving ARC-AGI 2 using SOTA methods my mind truly opened to the fact that LLMs can reason, albeit through brutal enumeration and discovery
When I encountered Claude Code and Codex as well
Basically ... I've been drinking the kool aid the whole time. It has almost always tasted great. Many times I've retreated back into "oh it's just a technology it has limits" and also sometimes I've lost myself to a touch of "AI psychosis". But overall I have a great relationship with it. It's nowhere nearly as addicting as e.g. internet porn was when I was a teenager. And one gig I had at a Fortune 10 enterprise, our small team of 5 shipped 12 apps in 15 months in an enterprise where typically they ship 1 app and 1 feature per year. This was 2025 ... so clearly we realized we were getting ~10x productivity thanks to Gen AI koding.
Bananas.
FTR I also do not question that we will possibly reach fairly general and yet poorly controllable intelligence with multi agent systems in a few more iterations. I give that a 30% chance of seeing a genuine flash of that at some point in 2027. And 80% in 2028.
I'm not yet afraid of being left behind this is one happy Lobster.
I bought an Alesis QS8.1 super cheap in perfect condition (was a top grade digital piano/synth in the 90s).
and then i realized that ALL of the software (which i collected from defunct websites and archived on github) related to it was ancient and after a while of getting tired of using WINE every single time i decided i wanted a cross platform modern equivalent that did everything that several of these different programs did (plus break out some stuff that was now potentially possible with modern computer)
i thought it would be extremely hard because the computer to synth communication is pretty much only via sysex commands (of which the actual wave file encoding protocol was undocumented)
Claude walked me through examining some of the original software in Ghidra, and I had a working demo that night.....now im just playing with adding new features to it.
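For anyone wondering what "only via sysex commands" means in practice, here is a minimal sketch of framing a MIDI System Exclusive message. It is not the QS8.1 protocol: the 7-bit packing shown is just one common scheme, the function names are made up for illustration, and the manufacturer ID 0x7D is the one the MIDI spec sets aside for non-commercial use, standing in for a real device ID.

```python
SYSEX_START = 0xF0  # status byte that opens a System Exclusive message
SYSEX_END = 0xF7    # status byte that closes it


def pack_7bit(data: bytes) -> bytes:
    """Pack 8-bit bytes into 7-bit MIDI data bytes.

    Illustrative scheme (an assumption, not the Alesis format): each group
    of up to 7 bytes is sent as one byte holding their high bits, followed
    by the 7 low bits of each byte.
    """
    out = bytearray()
    for i in range(0, len(data), 7):
        chunk = data[i:i + 7]
        high_bits = 0
        for j, byte in enumerate(chunk):
            high_bits |= (byte >> 7) << j  # collect bit 7 of each byte
        out.append(high_bits)
        out.extend(byte & 0x7F for byte in chunk)
    return bytes(out)


def build_sysex(manufacturer_id: bytes, payload: bytes) -> bytes:
    """Wrap a payload in F0 ... F7 after packing it to 7-bit bytes."""
    if any(b > 0x7F for b in manufacturer_id):
        raise ValueError("manufacturer ID bytes must be 7-bit")
    body = manufacturer_id + pack_7bit(payload)
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])


if __name__ == "__main__":
    message = build_sysex(bytes([0x7D]), bytes([0x80, 0x01]))
    print(message.hex(" "))  # f0 7d 01 00 01 f7
```

Everything between the start and end bytes has to stay below 0x80, which is why raw sample data cannot be sent as-is and why an undocumented encoding is such a wall to hit.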
Not sure, but I can tell you what my "oh s**, astroturfing is so bad, it's even in Hacker News" moment is. And if I learned GenAI was used to make some of the astroturf, that's more an "ah s*" than an "oh s*" thing. I mean, the prominence, ubiquity, and breathlessness. One out of three, sure. Two out of three, maybe. And some corpo shilling definitely happens here. But this is like, well, covering an entire area with artificial grass, to the point where nothing lives. Crazy.
My furnace went out during the 2025 holiday and I couldn't get an appointment with a repair person for 2 days. It was getting very cold in my house so I went into my attic and made several videos of the furnace attempting to start and gave it to gemini. It diagnosed the issue immediately and had me spin one of the components (a small exhaust fan) while the furnace tried to fire. It came on immediately. I had to do that several times, but it worked until the HVAC service showed up.
I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.
I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.
I have had many, but the last one was quite funny:
It fixed my printer after dist-upgrade and separate chrome upgrade, the printer worked everywhere but not in chrome.
After 30 years of using linux I didn't even want to know what was wrong: is it colord again? a dbus + cups issue? I had completely accepted that I wouldn't be able to print from chrome for a couple of months until the next update.
I just ran it in dangerously-skip-permissions mode and said 'my printer doesnt work in chrome'. A few minutes later I heard the printer printing "This is test" and it said 'I think its fixed, do you see a page coming out of the printer now?'
In 2017 I worked tirelessly with my colleagues to implement and replicate the first transformer paper.
Yesterday I left Opus 4.8 to go do some architecture research, with GPU access.
It replicated and trained a credible baseline. It implemented some ideas I'd been thinking about, and wrote custom CUDA kernels for them. It read and summarised dozens of related papers.
It has since run dozens of experiments, with minimal supervision. When a model is unstable it kills it, documents why, fires off a new configuration.
The realisation that frontier labs are doing this at scale with unlimited GPU and token budgets.
It actually scares me a bit. The realisation that the next big breakthroughs will only have light human involvement.
The prospect of recursive self improvement feels more real to me all of a sudden
Several. Yesterday a friend with no prior coding experience or knowledge showed me an app he initially built to help him study for public administration job positions. The exams for these positions are public (Spain), but the tools are scarce, expensive or not to his liking. So he used lovable, then switched to web gemini and claude, then paid claude. He now has 130+ very active users on an initial free tier, while he figures things out. The app is on github, runs on vercel with supabase, react, tailwind, bun... he has no idea what he is doing. I even installed claude code for him, got him an ssh key so he can do it locally, etc.
Another: claude code, via headless ghidra, cracked some software for me that was calling home to a server that did not exist anymore.
Another: I am a teacher, and grades and feedback are very, very time consuming, especially in loose workflows with several sources and tools that are not connected. During class presentations I take loose notes. Now I have a local folder where I drop 1) my student list, with names and emails, 2) my loose notes, and 3) a grading & feedback sheet template; then claude creates a sheet per student, formats and copies the feedback to the right sheet cell, waits for my corrections, then sends everything to their school emails. Much easier, much less time consuming.
Actually seems absurdly simple now, but sometime last year I was trying to figure out what I'd need to tow my daughter's car cross country with my truck: what are the trailer/dolly options, what do they cost, can my truck actually tow the combined weight, etc.
I started out prompting ChatGPT kinda how I would with Google, one small prompt at a time, asking about various details. But after one or two of those I just tried "I want to tow a car of make A with my truck model B, from point C to point D, what are my options?" And it wrote me a report with comparison tables and computed towing weights and other details for different options.
At that point, I was like "Oh. This is different. And it's just the beginning."
For me it was torrenting a 7G ball of weights leaked from Meta and running alpaca.cpp (an early variant of llama.cpp) on my desktop computer in early 2023. I started asking it questions about the Roman empire and it answered me in English! The responses were generally incorrect, but no worse than what your average American college student might guess at, though delivered with much more confidence.
This was my desktop computer responding to questions in English, not some fancy server in a massive Google data center. Who cares if what it says isn't reliable? Being able to converse with my CPU in English is like having a conversation with a dog!
For me it was right at the beginning. They said it was a dungeon game. It would describe a room, etc, and I would take some action. But I thought that this dungeon was built in some intricate database. But then I told it that I wanted to leave, got to an inn, where I flirted with the bar waitress, and soon we were watching the sunset in some meadow. As cheesy as that was, it was then that I went "oh shit" this is a machine that can respond to language with language in a way that simulated actual understanding and intelligence, concepts and schema, and everything else, and I knew then that the world would never be the same again. People here talk about the crazy things they solved with AI, and I get that...but the first time I actually talked to a machine and didn't feel like it was either random gibberish or scripted, but dynamic and responsive. The first alien I ever met, and he knew my language.
We had a monthlong sprint adding robot motion planning features to our codebase years ago, and I was never satisfied with the result. As a small team wanting to leverage oss we vendored in OMPL, did the usual thing around caching and roadmap management. I knew there was a way to parallelize some of the algorithm we were using with simd or a gpu kernel, plenty of that in the literature, but it was never worth fighting CUDA or metal/accelerate or whatever for uncertain gains.
So when cooking dinner one night, I set opus 4.6 on a from-scratch native and accelerated roadmap planner implementation (after previously porting IK, FK, collision checking with some success). I had primed it by having a research agent drop a literature review in its docs folder covering the type of planner we needed. By the time the pasta water was boiling it was done - getting plans in a few hundred ms compared to several seconds on our good old fashioned OMPL code.
For me it was the revelation that the economic value of cooking dinner could be compared to tackling an honest two weeks of coding work. The calculus has shifted - work that was once a risky or extravagant use of time is now worth considering.
For a small team who wants to focus on substance rather than implementation, knows what they want, and how to set up the agent for success, it’s a complete game changer in terms of what we can take on. Incumbents beware
Literally just last night I gave Claude Code the following prompt, verbatim:
"Whenever I launch Kodi on my Chromecast 4k, it crashes. I think this is related to a plugin or skin. It goes away for a bit if I clear cache but will eventually come back. Can you connect to the device via adb (I've run adb connect already), and debug exactly where it's crashing? Once you've done that, propose a solution. If this requires downloading, fixing, rebuilding and then uploading the broken extension via adb, don't be shy. I should have Android dev tools (Gradle etc.) on this Mac."
Lo and behold, without human intervention, it pinpointed the crash, downloaded the Kodi source, patched out a bug that had existed since 2016, recompiled it, signed it, then pushed it to my Chromecast all while carefully making sure to keep all my settings intact.
Got it to make a PR too (which is as of this moment unpublished; going to test more over the coming weeks).
The one I remember most is, when experimenting with Opus 3.5 for the first time, I asked it to generate a Firecracker backed local VM creation and management tool, something I'd wanted for a while but not found.
My expectation was that it might get something barely functional but would probably fail, and instead it generated a working piece of software which achieved a lot of what I wanted.
That definitely made me realise that, for at least some classes of software task this was a major change in how things could be done.
More recently when I can give the model a Local Privilege Escalation PoC in Linux and ask it to test whether it can be used for container breakout and then generate a working container breakout, all in one prompt... that definitely changes things.
I tried to see if an LLM service provider could rewrite some legal docs into a consistent format, with nothing hallucinated, so I could see what might be missing from each document. It could do that.
Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).
Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.
This was when I realized LLMs should be great for handling work where:
- I already know what I want to do
- I already know how to do it
- I don't think this task will help develop skills I find to be valuable
- If I have to do it manually myself, I will probably cut corners
So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."
For me it's not about the capabilities but what they can be used for. Think of all the recent drama between Anthropic and the Department of War. A real wake up call (especially if you are not a US citizen). Proves that AI is essentially a Surveillance and Warfare technology (which justifies the big valuations).
Or see this simple and fun site: https://hn-wrapped.kadoa.com
AI automatically analyzes all your social media posts in your life and can generate a pretty accurate profile about you in a second. We have no privacy anymore. Social media sites like Reddit already do that for moderation. Others do for more sinister reasons.
Note that Profiling is illegal in many countries. But laws can't protect us anymore.
Yes, it was always possible to do that manually. But with AI it's so easy, fast and accurate to do at large scale. A hacker having access to your computer, reading your mails and messages is one thing. An AI reading and analyzing all your mails, messages and data is something different. Doing this for whole demographics (Cambridge Analytica style) is at another level.
I can actually use and enjoy Linux. The "year of the desktop" never came for me, but instead I got the "year of the cli".
For 20 years I've used Linux in one form or another, but I've felt like I was kneecapped for the most basic things. Just trying to plug in an external drive or a second display meant hours of stack overflow and pasting commands I didn't understand.
Now I'm using several Linux machines for Steam, NAS, local LLM, development, and what used to derail a weekend project now amounts to a coffee break while Claude figures it out.
I have a large token budget as part of my work. A coworker was scanning some repos for vulnerabilities as a test. He found a scary looking remote exploit in a popular project and shared it with me for a second opinion. I spun up a local instance of the project and ran the POC against it: nothing. Turns out it needed some configuration knobs tweaked to lower some security protections.
So I told the AI what happened, and asked it to fix the POC so that it would work with the default configuration. It chewed away at that for a few minutes until it cheerfully patched the POC into a weaponized version. I ran it. The local instance, which I had just downloaded, compiled myself, and launched with the default config file, immediately crashed.
I got the cold sweats. I've read this novel. I've seen this movie. Wow. I have a blinking cursor on the console of a nuclear information bomb. I tossed and turned all night, got about half an hour of actual sleep, and probably looked like I'd seen a ghost at work the next day.
On the plus side, it gave our team some very clear ethical and moral guidance: we're going to do this, and we're going to share our findings with the relevant authors, because we can. Because I want to live in a world where the good guys are trying to fix problems before the bad guys can find them, I decided to help build that world. It was like, well, I guess this is what I'm doing now.
I guess I've had several of those moments over the last year and a half. But a recent one was that I was working with Claude to create a spiking neural net MNIST classifier in an FPGA for a demo. Claude took it from concept to PyTorch, to training (training a Spiking neural net isn't necessarily straightforward - that's a whole post in itself, but Claude came up with a working solution), and then to implementation in Verilog and through synthesis into the FPGA. I asked Claude to create a drawing app to run on the PC side that would allow the user to draw a digit with a mouse and then click a classify button. The data from the digit drawing app was to be transferred via USB to SPI to the FPGA. I didn't have a SPI adapter yet (it was on order from Adafruit) so I asked claude to let me communicate with the simulated verilog code running in the Verilator simulator, through a virtual SPI interface. Then I went to lunch. I came back to see the digit drawing app displayed on the monitor. I drew a '2' and it classified it as a 2. In another window I could see the Verilator simulator running and the data being passed. Chills.
Someone in the house pressed the button to update the printer (Brother DCP-L3550CDW) firmware and the CSV page that was the basis for an existing Prometheus exporter (drum/toner lifespan, page counts, etc) stopped being a thing. Instead there was an HTML page with all of the information buried in various divs/etc.
I'd planned on writing something myself to parse the HTML and write a suitable exporter but I thought I'd give Claude a chance.
In a sandboxed VM I gave Claude a single static HTML file of the status page from the printer. Also in the directory was the equivalent of "hello world" in Go, literally just the minimum needed to do `fmt.Printf("OK\n")`. The directory was called `brother-exporter`. That was it. No other instructions or information. I hadn't told it what it needed to write. I hadn't said what it should do. I hadn't told it what language it was supposed to use.
Just by doing a `/init` in that directory Claude decided that it needed to write a Prometheus exporter in Go that would fetch and parse the HTML file from a printer (defaulting to 192.168.1.1) and then present the associated metrics in a way that they could be scraped by Prometheus.
It did this flawlessly in about 10 minutes.
I could have done it in several hours but this was definitely an "oh shit" moment for me. I think the biggest thing was the fact that it guessed/assumed so much (correctly) from so little information in the beginning.
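To give a feel for the shape of that job, here is a rough sketch of the parse-and-expose step. It is in Python rather than the Go that Claude actually chose, and it is not the generated exporter. The `<dt>`/`<dd>` page layout, the `printer_` metric prefix, and the sample values are all assumptions made up for illustration; the real status page buries its values in various divs.

```python
import re
from html.parser import HTMLParser


class StatusPageParser(HTMLParser):
    """Collects <dt>label</dt><dd>value</dd> pairs (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._tag = None
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.pairs[self._label] = text
            self._label = None


def metric_name(label: str) -> str:
    """Turn a human label into a Prometheus-safe metric name."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"printer_{slug}"


def render_metrics(html: str) -> str:
    """Render numeric values as 'name value' lines in the text format."""
    parser = StatusPageParser()
    parser.feed(html)
    lines = []
    for label, raw in parser.pairs.items():
        match = re.search(r"-?\d+(?:\.\d+)?", raw)
        if match is None:
            continue  # skip values with no number in them
        lines.append(f"{metric_name(label)} {match.group(0)}")
    return "\n".join(lines) + "\n"


SAMPLE = (
    "<dl><dt>Page Count</dt><dd>1234</dd>"
    "<dt>Drum Life</dt><dd>87%</dd>"
    "<dt>Status</dt><dd>n/a</dd></dl>"
)

if __name__ == "__main__":
    print(render_metrics(SAMPLE), end="")
```

A real exporter would also fetch the page from the printer and serve this text over HTTP for Prometheus to scrape; both are left out here to keep the sketch short.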
I don't remember one specific moment, but I was fairly impressed with ChatGPT from the first time I started interacting with it. Was I ready to call it "AGI"? No, absolutely not. But it was clear that it was something new, and it was also intuitively obvious to me that "this AI is as bad today as it will ever be" and that predicting the rate of change would be difficult.
The more I use these things, the more I'm 100% convinced that it makes sense to say they are "intelligent" (for some meaning of "intelligent"). AGI or "human level intelligence"? Still no[1]. But some kind of intelligence. And I'm quite happy to allow that there can be "intelligence" that doesn't work anything at all like human intelligence, so arguments of the form "this isn't real intelligence", etc, etc. carry very (very) little weight with me. I've actually been sitting on a half written blog post on this very topic for a while, titled "The Marquee Sign Says 'Artificial' Intelligence"[2]. Finding time to finish it has been the challenge.
And before somebody says "Use AI to write it for you". Nah. I am generally what you might call "pro AI" and / or an "AI enthusiast" but I still draw lines. I'll use AI for research, for outlining, for brainstorming, etc. sure. But I have a hard-line stance against letting AI fundamentally write for me. I want anything that goes out with my name associated with it to have my genuine voice.
[1]: I like the term "jagged intelligence" that Demis Hassabis has been using. That is to say, the bounds of the intelligence are jagged or spiky: very intelligent in certain areas, much less so in others.
[2]: for any old-skool pro-wrestling fans, yes, that is an intentional nod to "Double A" Arn Anderson and his "The marquee sign says 'wrestling'" catchphrase. :-)
I'm a researcher working in theoretical computer science. ChatGPT found a counterexample to a conjecture I'd been working on for 2 years. It also one-shot many problems I've worked on, and it greatly improved some of my work.
I feel quite useless next to the sheer brute proof-writing, counterexample-generating skill ChatGPT is demonstrating, and I wonder what the future of my profession will be.
I have a buddy who's a consultant. His niche area is Netsuite and Oracle (I think). He's an accountant by training and as a consultant his gig was setting up these instances for clients, charging them an arm and two legs. He'd spend a lot of time golfing, and doing these setups was more than enough money for him. In other words, he had cornered that little slice of the market and was making bank.
Shortly after ChatGPT 2.2(?) came out and hit mainstream, I was chatting with him (I was excited af about the possibilities of AI). He tried to pop my bubble by saying "I bet it can't do what I do for my job!".
So I decided to test it out. We went home and I pulled out my laptop. Went to chatgpt.com and then I asked him to enter the specifications of what Netsuite configuration he wanted. So he proceeded to type in the description of what he wanted, the various settings, configurations, etc. i.e., the specs that he typically gets from his clients. And asked it to give him the commands to set it up.
Lo and behold. ChatGPT came back with a series of commands that he needed to run; the options he needed to configure, etc.
He was crestfallen. "Those are the exact commands I run!"
Luckily for him he recovered. He has since settled on a small stable of clients, all privately held companies whose owners he knows and between them he makes enough to keep his golfing hobby fed.
I am the CTO of a small NGO (10 people total, only 1 other junior Dev at the time). We supported two apps that were built by consultants. They were a mess. NextJS, React, about 4 micro services for a site that had 50 users per WEEK.
I configured a devcontainer with the old codebase and an empty repository and asked Claude to rewrite it as an old school server side rendered Django app.
Went to sleep. When I woke up it was 80% done. Spent another couple days prompting and reviewing and reached feature parity.
A bit later did the same with the other app.
Now both are deployed, with lower server costs and less complexity, and they are orders of magnitude faster.
Without AI agents we wouldn't be able to do so (as usually is the case with tech debt).
AI is amazing for small organisations!
Kind of peculiar and memorable story for me.
I was on the couch on my Nintendo Switch, playing around with ChatGPT 3 and asked it where to find a specific item in Zelda Breath of the Wild. When it provided a coherent answer I was just dumbfounded. To be fair, the answer was semi-hallucinated but partly true. But it made me realize what kind of breakthrough it must be for some program to provide an answer to this without searching external sources (which it couldn't do yet). Such a small data point, like a drop in the vast sea of human knowledge space.
Prompted me to do a back-of-the-envelope calculation. The weights of this model were a few hundred GB. I realized what kind of quantum leap it was to compress this seemingly infinite knowledge space into a few hundred GB of weights.
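That kind of estimate is just parameter count times bytes per parameter. A rough sketch, assuming 175 billion parameters (the published size of GPT-3; the exact size of the model behind ChatGPT is an assumption here) stored as 16-bit floats:

```python
def weights_size_gb(num_params, bytes_per_param):
    """Raw size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: 175 billion parameters, 2 bytes each (16-bit floats).
ASSUMED_PARAMS = 175_000_000_000

print(weights_size_gb(ASSUMED_PARAMS, 2))  # 350.0
```

At 4 bytes per parameter the same count would come to 700 GB, so "a few hundred GB" holds either way.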
ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.
I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.
With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
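A minimal sketch of the CSV-to-SQLite step, using only the Python standard library instead of Pandas. The sample rows, column names, and the `incidents` table name are made up for illustration; this is not what Code Interpreter actually ran.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for an uploaded incidents CSV.
SAMPLE_CSV = """category,district
Theft,Mission
Vandalism,Richmond
Theft,Mission
"""


def load_csv_into_sqlite(csv_text, conn, table="incidents"):
    """Create a table from the CSV header and insert every data row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


def counts_by_category(conn):
    """Number of incidents per category, sorted by category name."""
    query = "SELECT category, COUNT(*) FROM incidents GROUP BY category ORDER BY category"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(SAMPLE_CSV, conn)
print(counts_by_category(conn))  # [('Theft', 2), ('Vandalism', 1)]
```

Passing a file path to `sqlite3.connect` instead of `":memory:"` would leave a database file on disk that could be downloaded.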
When I saw that on the second day of token-based pricing I’d already consumed my usual monthly spend on GitHub Copilot. That’s when I fully realized that it would never be economical, nor useful, to solo shops like mine.
I'll give you two:
The first was when I first realized that I could tell codex to use gdb to debug a core dump. This was about a year ago, so it made a bunch of incorrect theories, but it enabled me to go much further than I would have been able to go by myself. I eventually solved the problem.
The second was when I decided to ask it about my Linux Wi-Fi issue that I had been having for several years. The computer would infrequently have multi second pings and dropped packets, then go back to normal. I thought it was due to the weak signal, but after describing the problem to codex, it immediately disabled power management on the Wi-Fi interface (this is a desktop computer, so I don't care much for that anyway) and the problem has never come back. I had been dealing with this for years, and I had tried searching for a solution before, but codex just solved it directly.
(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days really), and which I would therefore never have done in the first place.
(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.
(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.
(4) Finding information that I had been unable to find using Google searches (e.g. https://news.ycombinator.com/item?id=42653136).
There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the time and effort required was prohibitive.
My skepticism turned into a realization when I first asked an LLM to write anything nontrivial, and it just breezed through it. I am curious why many projects mentioned here seem to take people only a few hours or a weekend at most. I have been using LLMs to help rewrite the Ytree file manager originally written in nineties C. While the AI enables creating code of this complexity, the project still demands months of persistent effort.
I run a remote-first ecom business with a dozen or so team members.
About a year ago, one of our account managers had a life issue, ghosted us, and she held a fairly critical role in the business and gate-kept a bunch of knowledge to some high value vendor accounts.
Because we ran our ops in Google Workspace, we essentially had off-the-shelf RAG and were able to get answers to a lot of things by asking Gemini to go through all her emails/docs/calendar/meetings, reverse engineer what she did, and create an onboarding doc for her successor.
This happened once more a few months later when one of our analysts broke his wrist on vacay, and we were again able to replicate what they did to cover for their absence, this time dabbling in AI agents ("gems") to do a bunch of the regular simple tasks and again it covered things without too many issues.
I def expect Amazon/shopify to at some point replace all of us brand owners with AI bots if they can, but we'll see how long the gravy train goes on.
I had an old astronomy app I wrote for pre-iPhone app store era Nokia phones (N900 etc.). I decided to get Claude Code to recreate it as an Android app. The old app produced several display pages for things like the positions of the planets. I was having Claude Code recreate the app display page by display page, describing the display that should be produced, with no reference at all to the original app's code (or even its existence). After having it reproduce several pages, it added another one unprompted. The page it added was in the original app, but I had not gotten around to adding it to the Android app. The Nokia app's code is still on github, and somehow Claude must have made a connection between what I was asking it to code (without ever mentioning the Nokia app) and my github repository's Nokia code. It correctly implemented the page without me even mentioning the missing page. My jaw hit the floor.
When I realized they're going to be largely powered by increased natural gas use in the USA, neatly combining with our biggest problem so far (the climate catastrophe).
Opus 3.x building me a productivity system with Obsidian MCP originally.
Next was discovering "create a mathematical model of the problem and derive the solution as a result" type prompts.
But, the real "oh s**" was a longer process of spec'ing a compiler/runtime for real-time DSP (with a lot of novel ideas) and it actually working.
My sequence was: (1) it helps me understand myself, (2) it helps me put together good ideas, (3) it can generate novel ideas given the right inputs, (4) it can build useful tools on my machine, (5) it can compound good ideas into better and better ideas with repeated passes, (6) it can build significant, ambitious machinery that's way beyond my ordinary capacity.
Current frontier: it can compound large codebases into better and better machinery with repeated passes.
The key thing I track is whether I'm running a process that converges and compounds or whether I'm spinning in place / diverging.
I had an old 1st gen Amazon Firestick in a drawer for years, it had updated to the latest software and there were no public root exploits.
I spent a day bouncing between Claude and Codex and they researched, downloaded kernel sources, tried exploits and eventually got root via "FBUF/VCHIQ kernel zero-write primitive to patch live kernel memory". I was able to make the root permanent, debloat the amazon apps, downgrade the firmware etc.
It was amazing to watch and made me excited for the future where more hardware (old and new) will be available for repurposing.
The big one was definitely ChatGPT upon release in 2022 and specifically when people showed how it can role play as a Linux terminal and you can narrate events like "the data center is now on fire" and "run" nvidia-smi, and it would show high temps on the GPUs etc. Or you could "explore" the homedir of some famous person. It convinced me that if it can understand so well how terminals work, tool use and agents are around the corner.
Then Opus 4.5 convinced me that this has finally arrived. In 2022 I expected things to arrive faster actually, in 2023-2024. I expected we'd have much more realtime collaborative integrations with AI including GUI computer use. Maybe in 1-2 years.
For images, it was nano banana where I realized AI images can truly work, and all these adhoc issues like hands and limbs, or "it will never do a horse riding an astronaut" were temporary. It's now clear that making feature length films is within reach. Not in one go but with an agent orchestrating, designing a screenplay, characters, shots etc and generating those. Whether the result will be worth watching or a flat story on the high level is another question. But it will be a "film" for sure.
I helped train some of the first "magic" models at OpenAI[1] and it was a wild ride. We were a pretty sane + skeptical team and we weren't totally convinced the models were as general as they seemed, but the query that convinced me (and later got included in the paper[2]) was "Why is it important to eat socks after meditating?" (something that almost certainly did not appear on the internet before).
An interesting follow up would be when did you realize GenAI wasn't as good as you thought in that "oh shit" moment
[1] co-author of InstructGPT/RLHF/ChatGPT
[2] https://arxiv.org/pdf/2203.02155
Not coding, but reading logs.
I was trying to figure out a nightmare bug that only happened in production and Claude code was able to connect to Google Cloud and read the logs in real time
I recreated the bug in the UI and it was instantly able to see in the logs what the problem was, then because it had the context of my whole codebase it was able to point me to the exact line of code causing the problem
That was certainly an "oh shit" moment
When LinkedIn filled up with 1000 copies of what seemed like the same exact post: 20 lines long, breathless, declaring humanity over.
I thought, "I will never let myself become a zombie like that. I am me. I am worthy of my own respect"
Still haven't had one. It is impressive, it is sometimes useful, it will be insightful (once the smoke settles), but it is nowhere close to becoming the self-improving, world-as-we-know-it-ending ultimate solution to every problem that it is being sold as. And much of the progress we have seen so far relied on tons of natural data being available thru the Web. After LLMs killed SO, where would we get the answers to train LLMs on?
I've had many, but a recent one was when I figured I'd try asking Claude for help with my attempts at learning to draw, specifically anatomy.
I uploaded one of my sketches and asked for feedback, expecting it to not be too useful, but it actually pointed out many issues that no one had ever pointed out to me, and that perfectly explained some of the things that felt off to me. Out of curiosity I then also asked it to label the issues in the sketch. It wrote a python script with the coordinates to put everything at and labeled the sketch that way.
I'm still used to vision LLMs not being that great at vision, so it was pretty surprising to get genuinely useful advice.
Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.
I got early access to the pre-ChatGPT OpenAI API (actually by pinging someone from OpenAI who posted about it on HN). At work, we were setting up to play a livestreamed JackBox game for a charity event. This would have been in 2019.
In a previous life, I'd been a writer for the original You Don't Know Jack game (the UK variant), where the job was to crank out as many funny quips about a topic as you could, and then use a handful of them in the recording of the game itself. Some of the later JackBox games are like that, but for the players -- you're given a set piece, have to come up with little funny improvisations within a time limit.
As an experiment, I tried the set-up lines with the OpenAI API, to see whether it could come up with some responses. Of course, 90% of them were unfunny or incoherent, but 1/10 were not bad, or even pretty good.
I'm not sure that would have been impressive to anyone else -- but remember, I'd had this as a job, and sat in a writer's room, where everyone did this, for hours. In that environment, you expect a large proportion to be duds: the discipline is keep pumping them out, and not flagging creatively until you find a rich vein. I realised that this was a tool that would have been the perfect complement to that work -- and it was a pretty good JackBox player too.
Working on a Spice compiler to convert schematics for classic guitar pedals into real-time executable code.
I provided a reference to The Spice Manual, 2nd ed., a page number, and an equation number, and asked Claude to implement it (not really expecting it to succeed).
It proceeded to implement not only the equation, but also the calculation of the Lagrangian of the function, another 30 lines below, which required taking symbolic partial derivatives of a not-at-all trivial function and successfully figuring out which variable was which in the resulting matrix. The source material just said "Lagrangian of", and did not provide the partial differential equations. It then added a comment that identified the page number and equation number in the source text for the "Lagrangian of" equation.
For me it was earlier this year when I started dusting off some old stalled projects and had an agent work on them. In a few days I:
* Built a clone of the Alpha Zero implementation[1] my team built at oracle
* Ported my hobby NES emulator from javascript to rust[2] (this actually took less than 30 minutes and worked on the first try)
* Implemented all of the lessons from the C++ Grandmasters Challenge (which eventually led to a complete c++ compiler[3])
The thing that flipped the switch was using it to build things that I actually put sweat-equity in to previously. I knew how hard these things were to build, so it landed in a way that other projects had not.
[1]: https://medium.com/oracledevs/lessons-from-implementing-alph...
[2]: https://github.com/vishvananda/popeye
[3]: https://medium.com/@vishvananda/i-spent-2-billion-tokens-wri...
Most of the time using LLM generated code the feeling is "Oh Awesome!"
My "Uh Oh" feelings are weeks later when I realize there is a subtle bug in what the model presented as test passing "awesome" that I didn't read closely.
The biggest uh-oh is when I get lazy and let it modify multiple files and make many changes at once, and YOLO because I didn't fully understand what it did. I can usually get away with that for frontend, but for data manipulation tasks if I don't understand it, it's likely not what I wanted and I'll be back again in weeks or more trying to figure out what changed.
That's more or less what life was before LLMs and copy pasting from StackOverflow. Most of the time if I didn't fully understand something, I knew I had to eventually get back to it to grok what changed before committing.
Now with LLMs the 'copy pasting' is much faster and handles boilerplate super well letting me focus on edge cases.
It was last Summer. I was at an AirBnB and the fire alarm system had a fault and kept beeping.
I took a picture of the panel and the AI was able to diagnose the issue and tell me how to temporarily disable the beeping sound.
I knew nothing about fire systems. I had the owner call a repair person the next day to resolve the issue.
Recently I was trying to find a matching stain for wood flooring in a house built in 1999. I uploaded a clear picture in bright sunlight and ChatGPT was able to search online and find a matching stain color. It presented me with ordering options and I got a quart delivered yesterday.
I have been working on my own variant of OpenClaw written in go. I got the voice mode wired up a few weeks ago and it just started having a conversation with me. My wife freaked out and was asking who was talking to me.
I had Claude build a private podcast station for me. It integrated with Gemini to create a script for the show, based on a topic of my choosing, each talking segment ends with a presentation of the next song, which is played via Spotify, and is selected to have some sort of tie-in with the previous discussion. A tts model generates audio files based on the script, and a playlist is generated to play local file audio segment, then Spotify track, then the next segment etc.
An AI made a program integrating with 2 other AI, it's AI all the way down! and the result is great! I'm learning so much by having my own private radio host speaking about topics that interest me.
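The playlist step described above (local audio segment, then Spotify track, then the next segment) is a plain interleave. A small sketch, with hypothetical file names and track identifiers, and no claim about how the actual app does it:

```python
from itertools import zip_longest


def build_playlist(segments, tracks):
    """Alternate local talk segments with Spotify tracks, segment first."""
    playlist = []
    for segment, track in zip_longest(segments, tracks):
        if segment is not None:
            playlist.append(("local", segment))
        if track is not None:
            playlist.append(("spotify", track))
    return playlist


# Hypothetical file names and track identifiers.
segments = ["segment-1.mp3", "segment-2.mp3", "outro.mp3"]
tracks = ["track-a", "track-b"]

for source, item in build_playlist(segments, tracks):
    print(source, item)
```

Using `zip_longest` lets the show end on a talk segment (an outro) when there is one more segment than there are songs.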
I was working on a science experiment (electromagnetics) with my 10-year-old kid that was going to be demonstrated at a science fair in his school. We ran into a hiccup with the experiment that we couldn't debug ourselves. I turned on Gemini live video call to help us root cause the problem. It was able to clearly articulate all the possible issues and eventually was successful in making our apparatus work as expected. Turned out the wire that I was wrapping around the screw had some insulation that was not scraped off well on the side it was connecting to the battery. Gemini was able to capture this detail even though my bare eyes could not. My kid and 2 of his friends were impressed not just by the experiment, but because the live audio/video back and forth we had with the AI was almost magical!
We were experiencing abnormally high electrical bills and I could not figure out what was happening, so I downloaded the granular usage data (15 min increments) from Duke Energy, explained what we had in our house and when we typically used those items (washer/dryer, EVs, etc), provided a rundown of our energy usage plan, then asked Claude to build me a Streamlit dashboard that would help us understand what was going on and predict what was going to happen over the next months. The dashboard had a few simple toggles and levers. Claude was basically able to one-shot this, knew how to manage the XML from Duke Energy, etc... In about 20 minutes of prompting, I had a very comprehensive dashboard that was extremely helpful not only in diagnosing that specific issue but also in helping us understand how to further lower our electrical bills.
There never was one. I'm from computing science field and it's all been and is normal. Amusing, maybe, but normal. Same as before, but in larger scale, with occasional hype. People picking up useful things and using them. Some going insane.
If I had to pick a surprise, I think the music generation works better than I'd have expected at this point. Only better for funk, but still.
I didn't have a slightly panicked moment, but sometime in the last year my approach to programming changed.
When starting a project, I used to think about how I was going to structure it, how the large pieces would interact, how some of the details would work out, and then I'd work through alternatives and consequences on my own.
Now I don't think about it on my own so much as have a conversation with an LLM about it. And it's great because it can quickly gather information from various sources, I can ask it for links to canonical sources, I can ask it about trade-offs between alternatives that I might not have considered, and through conversation, I end up with a more detailed analysis.
Then as I work through the development, I keep my new agent partner in the loop for discussion, suggestions, and troubleshooting. It can't be trusted completely, but it's certainly reliable enough to be considered a useful tool for my purposes.
I went from thinking it was an interesting toy to play around with, to completely integrating it into my work flow, and that change seems to have happened very quickly.
Two of them:
1. ChatGPT 3.5 wrote me a script to pull some data out of Shopify and write it to a Google Sheet. Nothing remotely impressive by today's standards, but I had just commanded a computer to write code in plain English and it worked!
2. I own a bunch of e-comm brands, and with every new image model I tried to get product photography. Nothing worked until Nano Banana Pro, when suddenly I gave it a crappy iPhone pic of a product and got back a fully usable whitebox photo of it. Then I tried making the sort of infographic-style images you usually see on Amazon, and it nailed those too! In hindsight they weren't perfect, but more than good enough to use. I was about to ship that product to my photographer, and I would've had my designer make the infographic images, so that was the first time AI actually replaced a human contractor for me. Pretty big "Oh shit this is going to seriously impact employment" moment. Wrote about it here: https://theautomatedoperator.substack.com/p/ai-just-took-my-...
The first moment I specifically remember was writing a test of a new RPC protocol back in 2021. There were no agents yet, only "AI autocomplete" in the form of GitHub Copilot. I wrote the "server" half of the test, which received a name and responded with "Hello, <name>". Then I wrote the client code to send "world", and Codex suggested `if response == "Hello, world"`.
I was floored by this. How could it have known?!
We have come so far in such a short time.
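For anyone who never saw that era of autocomplete, the shape of the test was roughly the following. This is a reconstruction from the description only: the function names are made up, and a direct function call stands in for the real RPC transport.

```python
def handle_request(name):
    """Stand-in for the server half of the test: replies with a greeting."""
    return f"Hello, {name}"


def call(name):
    """Stand-in for the client half; a real test would go over the RPC transport."""
    return handle_request(name)


response = call("world")
# The completion suggested at this point was the comparison below.
if response == "Hello, world":
    print("ok")
```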
None so far. When I try to use these language models in the primary areas of my expertise like SIMD or GPGPU they fail to do any good. When I ask them to implement some general-purpose stuff, the output is too low quality to be useful in my software.
Still, I find them incredibly useful for code review (despite being unable to write good C++ or C#, they're smart enough to detect issues there), and also for dealing with technologies outside of my area of expertise like Python or web stuff.
(Spouse's story)
Today I used Claude to diagnose a blocking bug in a Steam game I really wanted to play. It took it 18 mins, but it unpacked the Godot package, figured out the bug, proposed a fix, and gave me an in game workaround.
I didn't have to do anything! Claude figured out the structure of the .pck file by using `strings`, then wrote some Python code with some magic Godot-specific code to unpack the specific chunks it needed.
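The `strings` step is easy to approximate: scan the binary for runs of printable ASCII. A minimal sketch of that idea follows; the sample bytes are made up, and nothing here reflects the real .pck layout or the Godot-specific unpacking code Claude wrote.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long, similar to `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Made-up binary blob: two embedded paths separated by non-printable bytes.
blob = b"\x00\x01res://scenes/main.tscn\x00\xff\x02ab\x00res://scripts/player.gd\x00"

print(extract_strings(blob))
# ['res://scenes/main.tscn', 'res://scripts/player.gd']
```

The short run `ab` is dropped because it is under the four-character minimum, which is what keeps the output from filling up with noise.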
I had access to a repo (from a closed startup) with 800K lines of python & C code, written from the 90s to today. They had some very interesting approach to a specific chemistry problem. 20-30 years of work of several persons.
But God, I could not understand the code, and I could not easily make it work with modern technologies (GPU etc).
So I used Claude and Gemini to reverse engineer the codebase, extract the core ideas, and rewrite it from scratch with modern frameworks (with guidance from the original authors)
It took me only 10 days to have a functioning equivalent, in 10K lines of code (using many libraries that did not exist in the 90s and 00s), which I find much easier to understand, even though I wrote none of it myself.
10 days to rewrite 20-30 years of work by a few people. That was quite scary.
Back when GPT-2 was released, I tried figuring out how to fine tune it. I found a google notebooks template, scraped a bunch of data from r/ChangeMyMind and asked it to change my mind on different topics.
I was dumbfounded that it actually tried doing that. Obviously GPT-2 wasn’t great at it, but the writing was on the wall quite literally.
Unfortunately, I was too broke to invest in stocks, but I did pivot my career quite a bit.
I thought mine was when claude found a very subtle but important bug in some open source LBM code I was using. It ground at it for hours and didn't give up until it found it. (Back when claude was cheap!). I recently had my ACTUAL moment at a conference where the presenter was pitching his book about "One shotting scientific code". He has cooked up 60+ prompts that get you functioning simulations and put them into a book [0]. It floored me to realize I could have just asked claude to write me a whole new LBM solver instead of finding that bug! That raised the bar for me a lot.
[0] https://www.taylorfrancis.com/books/mono/10.1201/97810037340...
Was the early ChatGPT. Someone on the team showed off a poem about postgres in the style of the King James Bible. Totally blew my mind.
I probably will be burned for this, but with the help of an LLM I wrote a tiny program that captures video from a browser screen (Xbox live online FPS game), passes the video images through a small trained NN that recognizes people forms and presents the video on another screen. That way I can place a green overlay on enemies and they are easier to see on PVP matches.
All that in around 100 lines of code, including the training/fine-tuning of the tiny YOLO nn.
Two things, both from this week.
First, I asked Claude to write an article based on an idea I had about WWII. In a passage about the futility (from the German side) of the Battle of Britain it wrote: "The Luftwaffe was fighting to unlock a door that opened onto a wall." I couldn't find any mention of a similar metaphor, and I think it's a great one. Claude has really improved its creative writing skills lately, I wonder if it's an artifact of improvements in other fields, or if Anthropic is working on it specifically.
Second, Claude, with access to DataDog and a code repo, managed to find the reason for a bug, propose an effective temporary fix and a permanent one in code. To be clear, this was something that had multiple engineers stumped.
Been using it to manage an estate. Just being able to shove all the documents right into an LLM and have it spit back out perfectly worded emails, keep track of checklists of things I need to do, and automatically create a ledger for me in Sheets has been a huge mental load off. I've instead been able to focus better at work, and the labor costs saved have been immense. Just on this one little thing. I'm one of those people that overthinks correspondence and letters, and it ends up causing me to be stuck on something, so being able to ask for just the right wording has been super helpful to me.
I took a photo of my ailing plant and claude advised me on how to get it healthy again (and how to take a cutting and nurture that).
This is some science fiction shit. I get all the coding stories, but that's a computer talking about a computer, it makes sense. Showing my computer a picture of a plant, and it not only recognised the plant, but diagnosed it and knew what to do... blew my mind.
Cuil Theory, in 2008, was my Ocelot Six moment.
Once I realized how well AI could babble given the entire internet to date’s data, and after seeing a talk by Google about their ten-year plan in 2003, I started winding down my social media, stopped posting photos to Flickr, and removed the indexes to my blog archive so that only posts with permalinks from other sites would be discoverable. Skipped Instagram entirely in the process and have never regretted it.
Google bought Cuil, of course.
Last week I gave Claude Code in Ultracode mode the prompt: "I want a browser-based retro game inspired by Spy Hunter" and gave it the URL to the Spy Hunter (Arcade Game) Wikipedia page.
What came out has a lot of problems and needs refinement, but you can definitely see a lot of elements of Spy Hunter in there. I haven't worked on any refinements yet, because I've been low on tokens this week, but for the first thing that popped out of Claude this is pretty impressive (IMHO).
https://linsomniac.github.io/spychaser/
So many. First was when I saw GPT-2 create jokes that were original and kinda funny.
Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.
It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.
It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).
Had some unique concert audio recordings which had gotten corrupted when I moved the files during a backup. I had tried looking at the files and trying to recover them. It felt like they had the data but no software could play them.
Sat on them for 5 yrs. Finally decided to see if AI tools could help. Took Copilot 20 mins and a lot of mucking around with hex dumps. The first couple of times it got a semi-working solution (only the first few seconds of a file were playable). Finally managed to recover all the files.
From actual use I've not had an "oh shit" panicked moment yet. More like a bunch of "Holy shit" euphoric moments.
So far I feel like I as a developer have gained actual superpowers, and can deliver results that make my stakeholders slackjawed with awe. I love it.
It will last perhaps a few months more, then they'll expect it. Delivering more features faster will be the new normal. But I think system developers, as in people who actually like to deliver new features and systems, will still be the ones doing it.
Fundamentally I think LLMs just change how to make information systems, they don't change who has the inclination to make them.
MBAs making Excel sheets that do more than Excel was ever intended to do have given programmers lots of work over the years. Such solutions identify a need for a properly designed system and free up the budget to hire programmers.
If the same MBAs start vibe coding, I predict we will get even more to do, for similar reasons.
I may be horribly wrong, and if the day comes that I realize that it will be the "oh shit" panicked moment. So far so good!
Very early on, when Github Copilot was brand new and the first AI autocomplete that was in the IDE. I had a file TODO.txt, and was adding a line, and it suggested a next feature that demonstrated actual understanding of what my app was and its purpose, despite me not having documented that anywhere.
Back in the times of GPT3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.
Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.
That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.
Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didn't remember something.
That was when I understood the social impact this could have on people - they really behave like it's a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.
Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive claude code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.
I was running a local LLM in 2023 and heard folks talking about interfacing LLMs with tools. I wrote a system prompt and told the LLM it could call some tools: if it wanted to call a function, it should output func(params...) inside an XML tag. I provided a few examples, none of this JSON soup we get today. Then I told it I'd provide the result in a RESULT XML tag and it should use that to answer. I wrote up a harness around that and I had a local model interacting with the outside world. Oh wow! Everything else today about MCP and agents is an extension of that thought.
Using function calling, I built an agent. I defined a data structure that represented rooms and how they were connected. Each room was marked as dirty or clean. Then I would place the agent in a room and the agent would decide whether to go left, right, down or up into another room. Once it got into a room, it would decide whether to clean it or go to the next room. Repeat until all rooms are clean. The basic CS101 AI vacuum agent toy. It worked!
So: being able to get real-world input/output to the model, having the model make decisions in a loop, and being able to do it all locally. I have been screaming like a madman ever since.
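For anyone who wants to see the shape of it, here is a minimal sketch of that kind of harness in Python. It is not the commenter's code. The `<CALL>` tag name, the tool names (`look`, `clean`, `move`), the room layout, and the `scripted_model` stand-in are all assumptions made for illustration; a real setup would put a local LLM where `scripted_model` is.

```python
import ast
import re

# Hypothetical tag name: the comment only says "an XML tag" for calls.
CALL_RE = re.compile(r"<CALL>\s*(\w+)\((.*?)\)\s*</CALL>", re.DOTALL)


def parse_call(text):
    """Return (function_name, args) for the first tool call in text, or None."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2).strip()
    # Parse arguments as Python literals only; never eval() model output.
    args = ast.literal_eval(f"({raw_args},)") if raw_args else ()
    return name, args


class World:
    """Toy vacuum world: rooms, their connections, and which rooms are dirty."""

    def __init__(self, neighbors, dirty, start):
        self.neighbors = neighbors  # room -> {direction: neighboring room}
        self.dirty = set(dirty)
        self.position = start

    def look(self):
        state = "dirty" if self.position in self.dirty else "clean"
        exits = ",".join(sorted(self.neighbors[self.position]))
        return f"room={self.position} state={state} exits={exits}"

    def clean(self):
        self.dirty.discard(self.position)
        return self.look()

    def move(self, direction):
        target = self.neighbors[self.position].get(direction)
        if target is None:
            return f"error: no exit {direction}"
        self.position = target
        return self.look()


def scripted_model(observation):
    """Stand-in for the LLM: a fixed policy that answers with tool calls."""
    if "state=dirty" in observation:
        return "This room is dirty. <CALL>clean()</CALL>"
    if "right" in observation.split("exits=")[-1]:
        return "<CALL>move('right')</CALL>"
    return "All rooms are clean."


def run_agent(model, world, max_steps=20):
    """Loop: ask the model, run the tool it names, feed back a RESULT tag."""
    tools = {"look": world.look, "clean": world.clean, "move": world.move}
    observation = f"<RESULT>{world.look()}</RESULT>"
    for _ in range(max_steps):
        call = parse_call(model(observation))
        if call is None:  # no tool call: treat the reply as the final answer
            break
        name, args = call
        if name in tools:
            result = tools[name](*args)
        else:
            result = f"error: unknown tool {name}"
        observation = f"<RESULT>{result}</RESULT>"
    return world


# Three rooms in a row, two of them dirty, agent starts at the left end.
rooms = {"A": {"right": "B"}, "B": {"left": "A", "right": "C"}, "C": {"left": "B"}}
world = run_agent(scripted_model, World(rooms, dirty={"A", "C"}, start="A"))
print(world.position, sorted(world.dirty))
```

The arguments go through `ast.literal_eval` rather than `eval`, so the model can only pass literals and cannot run arbitrary code in the harness.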
I think I had a few but they’ve been all short lived and superficial, time made them quickly irrelevant, there was a lot of hype, drama, FOMO, and propaganda around it. That said, I think recently my newest one has been using Voice mode during a car drive. It is very good, like, no latency and it understands nuances of speech very well. I’m convinced voice is where we should be doubling down in terms of UX for the next generation of workflows.
To share something different, it is less about what I have built, and more about what I have seen my friends (non-technical and technical) build. In a one month span I have seen a lawyer make a personal red line tool, a sales guy make a custom website for a golf trip, another friend make a 3d printing grid-finity project, a friend make a stl file to print a jig for his table saw, and another friend make a full mobile game. It is just really cool to see these micro-projects be created and shared, not only for the utility, but just to see my friends' childlike excitement showing off their project.
Reading a dozen comments here, the AI seems to blow people's minds most often in domains they're less familiar with. Repairing furnaces, HVAC, towing hitches, camper van interfaces, printer debugging. It wasn't the user's career to do these things; it gave them a bump from very novice to intermediate level.
My most recent one: Taking a bricked ipad and plugging it into my linux laptop, then telling deepseek to fix it. A couple of hours and twenty sudo passwords later it was working again.
This week.
Have been playing and testing with OpenRouter, Claude, Gemini for years.
Small program here, bash script there, Ansible playbook.
Fine, nothing I can't do, but it saves some time boilerplating. It needs quite some steering.
This week I took my MediaWiki from 2005 (actually submitted as my art school thesis), which was totally outdated.
For 20 years I kept telling myself I should restore it and do all the upgrade steps. Tedious work, and very error-prone.
In 1 hour of churn with 1 plan, in 8 steps, I had a running and up-to-date version.
I'm still not convinced AI is intelligent, but it's definitely not stupid, that's for sure.
When I read that Microsoft gave OpenAI billions of dollars' worth of data centre access and OpenAI accounted for it as billions of dollars' worth of investment. When they spent the tokens, Microsoft accounted for it as billions of dollars' worth of income. Both companies gained billions of dollars with made-up money.
Mine is just running a model on my laptop. It’s just amazing! I can ask it pretty much any question and it replies relatively FAST! Before, we lacked advancements in technology because we were limited by hardware. This advancement is the opposite: our software and the math/algorithms have brought us this.
Some business users spent ~30 minutes on an internal process, and we prototyped an "Agent" in Slack to take over. At first it didn't work, then it didn't work some more, eventually it ALMOST worked. Then one day, it worked, and the old business process died never to be revived.
Now it sits in a slack channel, and I watch it doing work, responding to ambiguity, and taking feedback/edits all day. It's unreal. It's literal magic. It saves a HUGE amount of time and gave us a pattern to do more.
This is the real deal. It's not easy to find problems with the right shape, and it's not easy to build agents that fit even when you do... but once it clicks, it clicks.
Two things:
1) I wanted a harness for running BPC.EXE (the old Borland Pascal 7.0 Compiler) and I asked Gemini 3.5 to build it for me using the unicorn engine. It whipped out a working .py file easily under ten minutes. Most likely five.
2) I handed a random assembly function from the OS/2 1.x kernel to Gemini 3.5, and it proceeded to tell me that it was related to disk I/O and partitioning, without a single associated string, and it annotated it all, including the relevant structures it was addressing.
At my previous work, I was collating somewhat random unconfirmed animal sightings. I also had a separate database of animal occurrence probabilities (species distribution maps). I'm not a statistician but that sounded like a clear job for Bayes theorem: given a sighting and the overall probability of that sighting in that area (species distribution map), and some other assumptions about the noise of the sighting, what is the probability that the sighting actually included that species?
Claude asked me three questions and then wrote a beautiful Python implementation that queries the map and spits out a table of adjusted probabilities. Felt immensely powerful - I can do this 'on my own' now, I don't need to wait to find the right people or learn the right thing first.
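For the curious, the core of that calculation fits in a few lines. This is only a sketch of the Bayes step as described, not the implementation Claude wrote. The function name, the `hit_rate` and `false_alarm_rate` parameters (how often a report comes in when the species is present vs. absent), and the example numbers are all assumptions for illustration; the prior stands in for the value read from the species distribution map.

```python
def posterior_presence(prior, hit_rate, false_alarm_rate):
    """P(species present | sighting reported), via Bayes' theorem.

    prior            -- P(present) at that location, e.g. from a distribution map
    hit_rate         -- P(report | present), assumed observer reliability
    false_alarm_rate -- P(report | absent), assumed misidentification rate
    """
    for value in (prior, hit_rate, false_alarm_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Total probability of getting a report at all.
    evidence = hit_rate * prior + false_alarm_rate * (1.0 - prior)
    if evidence == 0.0:
        return 0.0  # a report was impossible under these assumptions
    return hit_rate * prior / evidence


# Hypothetical sightings: (species, prior read from the distribution map).
sightings = [("lynx", 0.02), ("red fox", 0.60), ("wolf", 0.10)]

for species, prior in sightings:
    adjusted = posterior_presence(prior, hit_rate=0.8, false_alarm_rate=0.2)
    print(f"{species:10s} prior={prior:.2f} adjusted={adjusted:.2f}")
```

The useful part is how hard the map prior pulls on the answer: with the same observer reliability, a report of a rare species stays fairly unlikely, while a report of a common one becomes close to certain.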
Recently, Claude (through Copilot) found a hardware issue on our product. I was asking it to find an issue in a specific feature of a device driver, that could cause what we observed. It determined the feature was correctly implemented.
Then it hinted that depending how the hardware is implemented, it could cause the observation. It turned out the hardware was implemented as suspected by Claude.
I was already convinced it knew the codebase, somehow, better than I do. Now it is just as if it knows the product and its use as well.
"Write a bible verse ... explaining how to remove a sandwich from a VCR" https://x.com/tqbf/status/1598513757805858820
For me it was Suno, not any of the coding tools. I prompted it to write a song about my family's little dog, told it a few things about the dog, and it came back with a K-pop-style anthem that had a super catchy melody and lyrics that made my wife and me laugh out loud.
Writing code to spec is one thing, but creating art was always supposed to be what separated us from machines. (I suppose I need to preemptively acknowledge the "it was machine-generated so by definition cannot be art" point of view.)
I gave chatgpt 3.5 the type signature for a co-algebraic encoding of a mealy machine:
And it gave a really impressive analysis. Then I scrambled all the names and asked with a fresh context like:
It got completely confused and generated a bunch of nonsense. It was at that moment I realized that LLMs don't really understand anything. And yes, I understand that a newer model would not get confused by this.
Fixed a nasty bug in one of my tests where a mock in a completely different test I had never worked on was set up incorrectly and was intercepting my mocks. I don't think I would ever have found it, because the amount of effort it would have taken means I would have had to move on to some other way to test.
Reverse engineered an old audio recorder USB driver which only works in Windows 7, and also reverse engineered the custom audio encoding the device uses and the software to convert it to a standard wav file. This took recording the USB traffic with Wireshark for each function in the original software in a VM, then disassembling the various dlls, exes and driver files and feeding them into Claude step by step.
That AI button in DataDog not only diagnosed the problem across micro services but also created a fix PR. I think we might be unemployed soon.
The fact that it completely autonomously read in a 5 MB firmware image of an old piece of test equipment and generated a Python script to generate license keys:
https://tomverbeure.github.io/2026/04/12/AMIQ-License-Key-Ge...
When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.
A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.
I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.
I would say the first time I did “vibe coding”, when I tried Claude Code with Zed’s agent integration in January this year.
I wanted to see if I could build an image editor for isometric graphics using HTML5 canvas, Svelte, Vite, and the. Rather than do all of the skeleton code setup, I figured “why not try and see if Claude can build the app scaffolding?”.
I gave it a prompt and watched it produce the scaffold, along with a few features I outlined in the prompt.
When I booted the app and saw that the features worked and that there had been an element of design to the layout, that was my mind-blown moment. In a period of about 45 minutes, I added some features and had a basic MVP at the end. I walked back home stunned.
That app is available for free at https://babspixel.com
Built a physics-based dynamic digital twin for an electrolyzer system with full equivalency in thermodynamics, fluid dynamics and electrochemical reactions. A similar level of complexity is usually available in software like Aspen or Siemens which are a quarter million dollars license/yr. Insane.
This is a small one, but significant to me.
I asked Claude to add support for multiple lights to my toy ray-tracer. It correctly added the support and then suggested adding colored lights to make it easier to diagnose. It felt more like a colleague making a useful suggestion than any sort of pure engineering tool.
When I tried, just for fun, to put together an MVP of a fully autonomous business, I wanted to see how far it would go, when I got it generally working to around a 30% level I stopped because it was enough to see people would make a concerted effort to build this for real. HN was not impressed, heh: https://news.ycombinator.com/item?id=44143928
Being self-taught, there are lots of things I never formally learned, rules I know as rules of thumb without the deeper knowledge... So I set out to learn the root of what can be used to measure good robust code... Spent an hour asking lots of questions, learning about LCOM, Halstead, why circular dependencies are bad, and so on...
The next morning I figured the same LLM could compute that on my code, so I asked it to make an agent to do so, and report issues to me...
And then I ran that agent with next to no changes on a feature that had grown organically over the last months, that I knew was messy and sometimes difficult to work on, despite being unable to precisely say why... And it did tell me exactly why, and proposed changes to improve stuff, and then implemented them...
Up until that point, I'd felt like the LLMs always produced bad code that worked for a specific feature but often broke stuff or evolved poorly over time. Then I realized if you had the LLM do code improvements, it could do that fairly well too...
When I tried pi.dev (I only used chatgpt before) and told it "add all this scripts I developed over the last couple of years to automate my job as skills".
I love to automate things in bash scripts and these llms just can use them very effectively. It was also surprising how they derive knowledge from those scripts. If you get A from a B uuid, they kind of get the relationship. I am super vague in my request and this thing knows what I am referring to. After some months it's still mind-blowing.
I have no idea why anyone (especially those here) would be dismissive of genAI from ChatGPT(2022) onwards.
It was obviously a new tech, and was obviously good enough that more resources would be invested to improve it, and it really amazes me how tech enthusiasts would just outright dismiss these early iterations of genAI tech.
I personally was fascinated by the developments and was grateful to get to directly watch history unfold.
I'm still unsure whether the tech will be a "net positive" for the world, but that shouldn't prevent me from recognizing its power.
First one was Stable Diffusion. Especially the image to image, and the first gos people had at making videos with it.
Second one was trying to bootstrap what would come to be called a "harness", back in 2023. Initially I served as the go-between for API calls and file edits, feeding back the logs and gradually stepping back as, step by step, the LLM bootstrapped the CLI.
And finally, using Claude or codex to do ops work. Diagnosing issues on my machine, provisioning servers and VMs via ssh, debugging them, all on its own.
Mine was using VScode with copilot. Previously I had used tab completion and thought it was pretty neat. This time I began with the comment for a function I wanted to write. And the entire function just appeared below the comment. Written probably better than I would have. I remember saying, “uh-oh” out loud.
Early on in my ChatGPT usage, one of my messages got interrupted/cut off (as happens occasionally).
My first thought was "oh they're going to need to add a UI feature to allow me to click and tell them to continue the conversation".
Then I realized I can just ask the model to continue, obviating the need for a button.
That was a pretty mind blowing moment.
When a junior engineer first sent me something that looked good until I realized it had been vibed, and thus their understanding of what they were doing was too shallow to answer questions and improve on it. That was a doc, but it happens with everything. "Oh shit", I say, as everyone is aggressively encouraged to work this way.
I had an issue with installing OpenClaw, and it helped me debug the failure and get itself working. I had to sit quietly for a moment. No reading docs or inspecting the system, just “what’s wrong here?”.
While I didn't find a use for OpenClaw, it opened my eyes to the potential for distributing software which, once bootstrapped a bit, can interrogate … itself, understand its own requirements, communicate with the device, and become operable.
Add capable small models to the mix, and it’s almost frightening what good (or malicious) software might be able to do.
“Farewell to stack overflow” juxtaposed with the realization that AI only knows what to troubleshoot and how because of stack overflow…
I have not yet had a positive "oh shit" moment. But the corporate manager types who could not deliver a "Hello world" if their lives depended on it, and who would have had a sour look on their face when asked to pay license fees for a proper IDE 10 to 15 years ago, started pushing it hard, way before any but the resume-driven engineers: that flipped a bit in me.
ChatGPT reconstructing idiomatic Python source code from Python bytecode was definitely up there. That is not something humans have written a great deal about online. It requires simulating the Python VM.
I remember also having a massive wtf reaction to realizing that original ChatGPT was pretty good at decoding long random/unique base64 strings.
AlphaGo. Reinforcement learning on math with proof assistants was clearly going to be workable after that, even if not right away.
Pretty much immediately after I asked the LLM to perform a complete code review of my projects. I've been programming alone for years, that alone was life changing for me. It only got more impressive from there.
The first time I pasted a screenshot of a PR review thread, adding just "I had some review comments, fix them" - and it perfectly solved everything, made small commits, and pushed it upstream - this was such a shock.
I now try to keep pushing the boundaries and see where it stops understanding my intention. Give it impossible tasks, gigantic projects, complex architectures. Last result: I wrote a complete OS including MPI, TCP/IP, and a GUI from scratch in only a week, while investing just a few hours a day in it. It even runs Doom! Coding as a profession is over, but there's such a difference in the result if you approach this with a professional mindset that I think the software engineering discipline can still provide massive value.
I've let it do some commands against a local NUC before, just to see if it knew why something didn't work (it would've taken me ~15-20 mins probably. Not too bad). It took ~18 seconds to think, then ran two commands, and noted what the issue was. Even a 10 yr old could understand what the problem was.
I realized that LLMs were pretty good at calling the right tool, and running the right verbose command to figure out what and how.
Kind of like finding a specific SO post that had your exact problem, and the solved comment is heavily upvoted
I think my favorite early story was when OpenAI launched deep research. I was going to an event that I was headlining, and I gave it a CSV of the attendees and asked it to give me a small background on each company they represented.
When people introduced themselves to me, I knew a little about their startup. Felt magical.
Not sure that I've had it yet, although hypothetically I'm sure it would probably be something similar to the examples of writing new software for old hardware mentioned ITT. The idea of resurrecting useful but unsupported gadgets that would otherwise become e-waste is something I've always found compelling.
Problem is, I just don't have enough old crap, and if I did, I would have a hard time justifying the expense, because that money could maybe just go toward a more intimate tinkering process.
For everything else, I either haven't had any sufficiently interesting ideas, or they ended up not being worth pursuing with those tools or at all.
When I do have success that I'm happy with and care about, it's a slow process that I ultimately need to know the details of anyway, but otherwise it's a bunch of luckily narrow work-related scenarios with well-documented constraints. Nothing's really been that shocking though.
The shocking thing to me is how unrewarding most of the successful tasks have been, partly because they often create unnecessary work and partly because the type of thinking required to massage or evaluate the result is much less stimulating, and there's much more of it in aggregate. It's fine if it's something like generating a UI from scratch because that hasn't produced dopamine in a long long time anyway
I started to look at LLMs not as writing code, but rather as predicting what code it would expect someone to write given the context.
For some people that matches their expectation or they don't really have an expectation. While for other people it doesn't match their expectation.
I had 2 MacBook Pros. One 2024 and one 2019. The 2024 one would connect fine to the internet, the 2019 one would not.
After pasting in the airportd logs of both (into ChatGPT and Gemini) it found it was down to band switching (2.4GHz and 5GHz) through some really old error code.
This fixed a problem that had plagued me for >12 months. Really magical feeling that it got it on the first try.
2 years ago I played a bit with the abandoned source of
https://www.wickeditor.com
a flash like editor for the web, that I found promising.
But doing it manually was too much work: an outdated and broken build pipeline, stuck on an older node version, deprecated and abandoned dependencies .. so I stopped the experiment.
Then I gave it a try with claude beginning of this year. I remember not expecting anything, but did a bit of steering the direction as I knew the source a bit and let it mostly work on its own - and then it said it is done and it works.
I didn't believe it, but it did. "Can you add this feature?" Yes it could.
Since that experience, I have a hard time taking people seriously who say AI is useless.
I wasn't impressed by the LLMs up until January or so when Claude Code swooped in. Until then, I felt like the LLMs were slowing me down. I have been using them for a couple of years now for coding at work, but I never really thought they brought in real value. Then in February I worked on a 1-month-ish project timeline and shrunk it to 3 days and that was it. I didn't write a single line of code in that project and I went all in with Claude Code. That was it, _the moment_ of realization. I was thoroughly impressed. I went from nothing to a tool that served several teams. Now I'm starting to see the cracks in LLMs and I'm slowly getting back to picking which task to offload to AI and which ones to do by myself.
Claude is great at coding. That's it. Outside of it, it's just god awful at pretty much everything else. ChatGPT OTOH, is good at coding, but at everything else, I find it brilliant. Gemini never made me want to stick with it. It's good, but never great for my use cases.
For me it was gradual, then sudden.
I liked using the early models to do autocompletion. It could do a leetcode style thing, pretty nice, but only useful for small things.
Then I sought out Cursor because that seemed to be able to do multi-document edits. Not bad, but models at the time (2024) still got stuck pretty often. So, cross-document autocomplete. Useful, but definitely within the realm of "nice shortcuts to have".
Then a friend (who works in AI) told me to try Claude last year. I was on holiday at the time, but I spun up my work repo and looked at the backlog.
It chewed through the entire 6-9 months of estimated work in a two-week period while I was watching that Lord of the Rings series with a friend (we watched an episode or two in the evenings). I just chatted with him about the series while checking the progress every few minutes. It was a huge amount of refactoring, and it didn't get everything right the first time, but it made enough progress that it could be directed the right way.
Since then I have hardly coded any manual lines. I just tell Claude what to do, with very little harness (skills, MCPs, instruction files), and I get what I want.
I had a pretty involved cross-module state bug with complex dependencies and reactivity issues interleaved. I tried fixing it manually multiple times with a 4h time box, as well as with Claude models up to Opus 4.6 high and Codex 5.3, all of which failed. When the GPT-Pro model came out, I heard it was not supposed to be an everyday coding model but tried it anyway as it looked impressive. It took a single 8h run burning $200, with me doing nothing but occasionally waiting for test runs or writing “continue”. After 8 hours, and fearing I had wasted the money, the bug was consistently fixed, not just the one edge case that triggered the behavior.
I had it fill out all the forms to appeal my property tax value. We created an assessment of what my San Francisco property should be worth using deep research. The city agreed and a $12k check arrived shortly after.
I asked the OpenAI playground to compare and contrast the themes of Point Break and Fight Club. It did a bang-up job and blew my mind. I then realized it basically worked for any of the scripts I had for my dev environment too. Fixing and expanding capabilities I'd wanted to have but never had the time to implement.
I remember in the early days when I was just trying out ChatGPT on a phone for the first time (this was around GPT-3.5? GPT-4o?) and snapping a picture of our fridge that's full of magnet souvenirs and asked it to identify all the places we've been in and it gave a nice list of what it saw and the places that were featured.
Did it get it fully right? No. But it was one of those "oh wow, you could do that?" moments for me. There's obviously a lot more "oh shit" moments as time went on, but it was a neat little moment.
I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.
It was unlike anything I had ever experienced.
My wife was unimpressed lol.
This was 2022.
Probably over a year ago, when I first saw reasoning in action in a debugging session: it generated some code, ran it, could not explain the results, then said “let me add some print statements to debug”, reran the application, read the logs, and then stated “now I understand why it’s not working”. Plan, do, check, act in action, AI engineering its own context, and generating the missing information.
Maybe my daily work is rather mundane compared to most people who frequent HN but I am able to create, think about, refine and then go through review cycles at least 2 or 3 times more quickly than I used to.
And software that I can imagine I might want to "make" or have at my fingertips is readily available even though I have a busy schedule with very little free time!
Also, I love feeling like a manager whose direct report actually does what I tell it to. Crazy good feeling.
A non-technical employee of a client vibe-coded an app and I was asked to review and deploy it.
It was okay, not bad at all. No serious issues.
At the same time, me feeding a whole PDF of feedback from a client - screenshots and such - into Claude, and it fixed everything after 7 hours of reproducing and fixing things mostly unattended, creating a bunch of MRs with fixes. Most fixes were good, some were obviously not what the client wanted but technically correct (which I told Claude and it fixed it)
For me, it was during an ongoing incident in a failing IoT OTA service which was growing in priority: taking two things I was unfamiliar with and bolting together a new OTA mechanism via an alternative SMS provider. I'd never developed in the .NET ecosystem before and happened to have gained access to another team's Twilio account the prior week, so I took a shot, planned the interfaces to extract, and implemented an alternative Twilio implementation plus a feature flag.
Normal software instincts plus access to a different service flushed the buildup of OTAs, and it lives on as a fallback mechanism. It amazed me to go from idea to execution faster than I could ever have dreamed of even onboarding myself to the area or environment.
I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.
My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken as real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.
My ducted gas heater wasn't working where I live and I took a photo of the wiring diagram and had Claude step me through troubleshooting it with a multi-meter, and got it fixed.
For me it was the original DALL-E project page.
They've been coming faster and faster for me. First I was blown away by GPT2, specifically the fake news article about talking unicorns. Just stringing together a few sentences while maintaining logical coherence was very impressive at the time.
Then it was models like Minerva that could actually solve math problems, and the discovery that LLMs were one-shot learners and could write code.
After that, the improvement felt pretty steady, with IMO gold feeling like a watershed moment.
And recently OpenAI's solution to the planar unit distance problem is starting to actually freak me out a bit.
I was working on a project for 2 years with about 5 engineers. It was many years before AI. It was new subject for our team, and we were pretty sure it was possible. Turned out it was not.
Much later I asked AI if that kind of project is possible, and it immediately explained why it is not. Would have saved 2 years of our time...
Had an AI plot movie Rotten Tomatoes reviews versus cost for 2 adult tickets, plus candy and large popcorn prices from the specific theater, and the round-trip gas from my cross street, including only movies which would get out in time for me to be home by 10pm, including preview times.
None of that is mind-blowing, but what floors me is that Google or some other site has never offered me this type of analytics. It's a trivial query, but perfectly useful for planning a night out with my wife.
A friend had the power supply die on his high-end turntable. He took a picture of each side of the supply's PCB, handed it to Claude, and it gave him back a schematic.
The second I realized it removed nearly all blockers as a bootstrapped technical startup founder.
Claude wiped out the need for web and mobile development resources. I bought a Mac-Mini and had iOS apps up and running in days.
I worked in an AI (or well, ML) consultancy before the ChatGPT moment. I remember we had a project where we had to extract a large volume of documents (country-wide, terabytes of PDFs of scans). We had to set up a pipeline that looked a bit like this.
Download PDF of scan -> Tesseract to get a text layer -> Clean it up with a language-specific BERT model -> Detect paragraphs of a certain type -> Look them up against a database we built with scored similar paragraphs -> Do recommendations.
The documents were not standard and a lot of them were historical documents and handwritten or with scratched out text with corrections.
We had student workers spending days labeling the data.
It took us months to get it all working with a high accuracy. We were so proud.
Now you can do it all with a prompt and a ChatGPT call.
I gave GPT-4 some source code and my existing tests, and asked it to write a new test, and it did it! It didn’t even run straight away, I had to fix it, but it still blew my mind.
Later, I wrote a ~5k line proxy for work in C, and gave the whole thing to ChatGPT o1 and asked it to review it. It found several real memory bugs, and now that service has been running since with no problems.
Just this week, I was trying to write a greedy solver to pick the best subset of block sizes to keep from a larger sweep for shorter testing. Opus 4.8 suggested that this could actually be solved as a MILP problem, and found the perfect solution in 5 mins. I’d never even heard of MILP before.
I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.
I asked it to prove the theoretical result in a (published, prize-winning - though not really for the theory) academic paper of mine. The proofs hadn’t been that hard objectively, but they’d taken at least a week. I fed it the model. It got the correct basic results in about 5 minutes.
My bath hot tap suddenly broke apart and was spilling hot water into the bath. I photographed everything and ChatGPT told me step by step what bits to get to fix it, and how to reassemble it.
A few weeks later some kids in the area were bending the wiper arms in cars in my terraced street, including my car. I thought, I wonder if ChatGPT can help? It explained to me where to get the parts online, an indication of a decent price, and how to fit the replacement parts.
In work we had struggled with filling out the myriad of forms that we need to do to get enrolled on a government framework to apply for contracts. Not only did it do that and explained what we needed to say, but it also told us in detail the steps we needed to follow to get the certification that was a prerequisite. It has genuinely transformed our business as a result.
Genuinely surprised by the breadth and level of interaction with this post. It would appear - perhaps we have data to back up? - a distinct _'flavour'_ of post is becoming dominant. A shame.
I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.
I was sitting in a cafe listening to a podcast where I heard about a sci-fi author banging out 40+ books per year. How are they doing that?, I thought. Either a team of ghost writers, a boat load of cocaine, or they are using AI.
So I decided to test the frontier of AI; this was back in the early ChatGPT era. I downloaded the app and proceeded to go through all the steps of writing a novel: outline, summary of characters, plot summary, draft chapters, finalised chapters. I had an unedited manuscript by the time I was thinking about my 2nd coffee. It was a terrible novel, but it did have flashes of brilliance that could be harvested and iteratively shaped into something better.
I proved my thesis that AI could mass produce fiction at scale, and if I had a boat load of cocaine the AI and I could probably output 40 books per week.
Opus 4.6. My standard battery of questions included solving an ascii maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.
I've had a few.
The biggest technical one was when we were making an all-day wearable AI assistant thing. It basically had really precise office location (think cm-level accurate), a shitty VLM to describe what the wide-angle lens was looking at, speech-to-text, OCR, and a gaze recorder that described what you were looking at.
This was all streamed to SQLite. The really "oh shit" part was the thing that made the whole system usable: a 4-paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.
The most recent one is being caught out by a GenAI video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.
I had a lot of such moments, including:
• Most recent, I had the option of either buying an app from the app store to train myself on the piano, or vibe coding a web app to connect with an attached MIDI keyboard and accept an uploaded MIDI file and give me an experience like Guitar Hero, and Claude did this in two prompts of their free (not paid subscription) tier, where the second prompt was just the word "continue".
• First demo of InstructGPT (predecessor to ChatGPT), because I remember how much worse the state of the art in NLP had been, and because I hadn't expected instruction following from the quality of continuation seen in GPT-3.x
• 2019, "This Person Does Not Exist"
• 2016, seeing style transfer and similar working (https://github.com/awentzonline/image-analogies) and what would now be called Deep Fakes (back when Two Minute Papers videos were <2 minutes long: https://www.youtube.com/watch?v=_S1lyQbbJM4)
• 2015, when I (in retrospect, foolishly) believed Tesla about their over-the-air software update that introduced self-driving: https://www.popsci.com/tesla-cars-become-autonomous-overnigh...
• 2013, word2vec, "man" - "woman" ~= "king" - "queen", again because of knowing how bad the state of the art in NLP has been
(If you're wondering why "uh oh" from that, consider value in automating propaganda, and surveillance opportunities for automating comprehension of slang/cants like Polari).
• 2010, seeing the demo video of Word Lens: https://www.youtube.com/watch?v=h2OfQdYrHRs
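The 2013 word2vec item in the list above can be made concrete with a toy sketch. The vectors here are made up for illustration (real word2vec embeddings have hundreds of dimensions and are learned from text), and `analogy` and `difference` are hypothetical helpers, not part of any library.

```python
import math

# Hypothetical 3-d "embeddings", hand-picked so that the second axis
# loosely encodes gender and the third loosely encodes royalty.
VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def difference(a, b):
    """Component-wise vec(a) - vec(b)."""
    return [x - y for x, y in zip(VECTORS[a], VECTORS[b])]

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

# "king" - "man" + "woman" lands on "queen" for these toy vectors.
print(analogy("king", "man", "woman"))
```

With learned embeddings the match is only approximate, which is why the nearest neighbour by cosine similarity is taken rather than testing for equality.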
I gave an LLM a weird and convoluted code snippet, and asked it to step through the execution and trace the value of the variables at each step.
It was completely correct, and I realized LLMs are capable of generalizing beyond their training sets.
The moment when I ran llama on my old gaming PC (using something called GPT4All) was my "oh shit" moment: I was now talking... to my PC.
Literally the very first time I used ChatGPT. I had already been experimenting with GPT3 for various jokes and games via the API but the naturalness of it as a chat interface that understood you changed everything.
The first time I used a terminal agent was another one.
First time using Claude Code I was rather impressed by how quickly I was able to build out a website with Vue and Supabase. Cool. So.......I always wanted to create an iOS app but knew nothing about Objective-C or Swift or Xcode. "I wonder if Claude Code can build an iOS app for me?".
I went from 0-to-1 and shipped a podcast player into the App Store in 2 weeks. Not a simulated app in Xcode.....literally a fully approved app on the App Store. Claude Code walked me through everything from installing Xcode all the way to running a final audit on the app so I wouldn't get flagged during review. Mind blown.
It was when I realized that the collective ethics of humanity was so low that this was actually going to take off.
I was talking to a software engineer friend about making a demo. This was supposed to be a quick demo and I had sent him 3-4 wireframes. Then I rang and asked casually, "how long will this take?". He said, check back in the afternoon. Sure enough, he delivered a fully functioning demo in the afternoon. His starting point was my wireframes fed to Claude. Wireframes to a working demo in an afternoon. Life has changed, for good or for bad!
Starting with the days of Siri, I've been evaluating all chatbots of that nature by writing them a meaningless string of text and seeing how they answer. GPT-3 was the first system which, instead of refusing to answer or answering meaninglessly, identified that the string of text made no sense.
I watched a friend generate a 10-page report based on multiple documents, including scientific papers, and it was almost flawless. It would have taken me days.
A milder version of it was Copilot setting up an environment for a Jupyter notebook. What would have been annoying back and forth between googling and docs went like a breeze.
I wanted to understand the implementation of some numerical algorithms, and the tech reports were not enough.
I cloned the repo of said library, gave it to Claude and asked it to write a new technical report in math notation, but annotated with links to the code so that I could pick up the details. It basically one-shotted the full report, and that helped me re-implement it in "pure python + numpy", "manually".
When playing busy Dota 2 (realtime game), it was crashing sometimes. I asked Claude Code for any advice (without any hope) and it somehow worked out that I have an unstable IP address and that a rented VPS server would improve my connection. I could not believe it: it worked…
I had a C++ actor model which required an Api like the following (std::function):
child->Async(&ChildActor::Method, child, args);
Refactored it to use small buffer optimisation and std::move_only_function:
child<&ChildActor::Method>(args);
And saw a performance jump since no more malloc in std::function.
It also helped me decipher an animation bug in a glTF importer.
Productivity is x4 or higher.
Literally the first time I used ChatGPT, within days of release. It wasn't so much panic as amazement.
It took HN a surprisingly long time to come to terms with the fact that professional SWE as we knew it was coming to an end.
In 2023/2024 we saw a demo of "denial" being a stage of grief live on this site.
I had it write a short story about Vader and Palpatine discovering the Gram-Schmidt process. It wasn't the greatest thing ever but it got the mood right and understood what Gram-Schmidt was. It was crazy at the time.
The announcement of GPT 3, hands down. That's the day that my mind was blown.
Everything after that has been (genuinely significant) incremental improvements. But that announcement was a qualitative step up: we got "real" AI that day, something that could pass a Turing test (as common sense envisioned it, without all the caveats added once we learnt of the genuine limitations of LLMs).
Was trying to explain convolution (of functions) to a friend and I wanted to build a little picture. I typed more or less nothing into Claude and it gave me a fine web-app for demo'ing examples to my friend within minutes.
Three years ago this would have taken a minimum of three college graduates a couple days -- one to know the math, one to know the backend, and one to know the front-end. Maybe two of those could be the same person on a good day -- none of the topics is individually that hard -- but it's a lot together.
I was on-boarding to a new company/project about a year ago. Had a bunch of questions about the system architecture and such, but everyone was firing on all cylinders and couldn't spare much time to answer all the questions.
One coworker took some time to ask cursor some questions, and reported that the answer was accurate (I'm guessing he hadn't tried that before).
That was a game changer. I'd been using cursor for simple autocomplete or brainstorming but now I could have it analyze the entire codebase fairly quickly.
FF to now, I've given Claude Code read-only access to GCP logs and database and it's able to debug entire classes of errors and propose solutions.
For me it was Stable Diffusion 1.5. Oh man, that thing was the bee's knees for me, imagination on a machine! At that time there was no UI, pure terminal commands. I didn't know jack shit about it and it looked like voodoo hacker-man stuff to me... well, I persisted anyway, because exploring the world of the infinite latent space was amazing. It was like seeing some weird other dimension. Anyway, that's how I got addicted to image gen for like 2-3 years. I did it all: LoRAs, fine-tunes, hypernetworks; got really technical with it, understood the fundamentals, etc. Eventually I decided to move on to LLMs, as agents were obviously gonna be the future, so here I am now building my own voice agent from scratch, no SDK, etc. This tech is amazing and I love it. Also, we are all gonna be fucked because of it, but what a ride!
OpenAI already had GPT prior to the ChatGPT launch, and I had not really taken it seriously. But on November 30, 2022 when ChatGPT came out and was immediately popular, I reevaluated it.
I immediately realized that it meant my time as a programmer in the traditional sense was going to come to an end relatively soon.
On December 1, 2022 I created my first agentic coding loop experiment. I launched one of the first AI code generation websites that would generate web pages along with embedded images in January 2023.
> a welcome farewell to Stack Overflow.
Nothing will change the fact that beginners have unknown unknowns. They can't solve most of their problems with a chatbot because they don't know what to ask. Maybe they can literally copy and paste in the code with a "help plz" and get a working result, but they won't learn anything from it.
> slightly panicked, "Uh Oh" realization of what these models can do?
No; my panic is about how people are using the tech, and responding to it.
That started with Stack Exchange, Inc.'s ham-handed attempts to force AI-powered features into Stack Overflow, even as the community was rejecting LLM-generated content in questions and answers. Businesses don't care what customers want, don't recognize how sloppy their slop is, and wouldn't try to do anything about it if they did.
Recently people have been talking about code shops accumulating massive piles of technical debt willingly, assuming that the next generation of models will sort everything out, or that humans don't need to understand the code because it will mostly be read by other models anyway. The underlying attitude is not surprising at this point.
When I read in Oct 2024 how a character.ai chatbot encouraged a child to commit suicide. Uh oh.
> that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?
Never experienced any kind of panic, only excitement. I told Github Copilot to add documentation to a function and it documented how the code was used even though there was nothing in the function to indicate how it was used. It somehow knew from the code pattern why I was writing that function.
Recently purchased a 100-year-old home. It was the dead of winter and the house has steam heating, which wasn't working. A few screenshots and ChatGPT gave me a step-by-step of which levers to pull and knobs to turn. This was terrifying considering I knew nothing about these systems. It worked!
When my friend cloned my voice with RVC or some such model from GitHub and was creating bad songs, it was funny, but GOD DAMN I got called into the HoD's office for that.
I'm making a 3D game and I hate flat worlds, a planet is much more elegant, both finite and infinite in gameplay terms since the surface is not expandable, but you can't hit a world border at the same time.
Cartesian coordinates don't work well for the player so I wanted a lat/long/altitude grid system.
I could have spent a few days wading through Stack Overflow and debugging my inevitably flawed implementation.
ChatGPT web version almost one shot the helpers in 2024 and boy, there were a lot of pitfalls.
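For readers wondering what such helpers look like, here is a minimal sketch of a lat/long/altitude to Cartesian round trip on a spherical planet. The function names, the z-up axis convention, and the `radius` parameter are assumptions for illustration and are not taken from the game described above; many engines use y-up instead.

```python
import math

def geo_to_cartesian(lat_deg, lon_deg, altitude, radius):
    """Convert latitude/longitude (degrees) and altitude to x, y, z.

    Assumes a perfect sphere centred on the origin with the z axis
    pointing through the north pole.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    r = radius + altitude
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.cos(lat) * math.sin(lon)
    z = r * math.sin(lat)
    return x, y, z

def cartesian_to_geo(x, y, z, radius):
    """Inverse of geo_to_cartesian: returns (lat_deg, lon_deg, altitude)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the planet centre has no latitude or longitude")
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    ratio = max(-1.0, min(1.0, z / r))
    lat = math.degrees(math.asin(ratio))
    # At the poles longitude is undefined; atan2(0, 0) returns 0 there.
    lon = math.degrees(math.atan2(y, x))
    return lat, lon, r - radius

print(cartesian_to_geo(*geo_to_cartesian(45.0, -120.0, 10.0, 1000.0), 1000.0))
```

Two of the usual pitfalls are visible in the comments: longitude is undefined at the poles, and longitude wraps at ±180 degrees, so anything that interpolates or compares longitudes needs to handle the seam explicitly.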
It was about two days after Google released Deep Dream, if you remember, the thing that took a video and filled it with fleeting hallucinations of mostly puppies, fish heads and lizards. I was suddenly struck by the realization "oh shit, this is much more boring and samey than it first appeared to be", and all subsequent gen AI has been similarly underwhelming.
I think it was when the LLM asked me a question at the end of its response. It felt like something other than a machine. Until then the pattern was me asking a question and ChatGPT giving me an answer, with or without hallucination. When it asked me a follow-up question it felt like talking to a being with agency. An entity that has thoughts or ideas or questions of its own.
I couldn’t make a Rockbox (the alternative iPod OS) simulator run on my MacBook M2 no matter how many guides I followed, then I fired up Claude code and by modifying the original source code it made the simulator run and I was able to start developing custom plugins for my iPod. It honestly felt great since I only have basic C knowledge.
Automating my email inbox, I just wanted to split them into folders according to the attachment name but the fields were often incomplete and ended up missing rules, and imap fetch was taking forever and kept failing. In frustration I decided to turn to ChatGPT to split them by messageid which I had never bothered with because the strings were too long to be useful. I initially intended to build a text list of messages and fetch them all one by one but I ended up making chatgpt crush all the instructions into one gigantic python dictionary using the messageid as keys and using it to generate a single pipelined imap call with success flags, dynamic folder naming, cleanup steps the whole works. I was just working on theory of what I knew was possible, and it's the ugliest table you ever saw, but it works and it runs from memory instead of reading and writing values to a temp file and I'd never been able to keep up with that level of nesting before
This was my first-ever conversation with the DaVinci model: https://imgur.com/a/9Cj39MV
When I used Google to get the IEEE-488 commands of an arbitrary waveform generator from the 80s whose manual doesn't exist on the internet.
This is a very long tail search, but by the end of the day I had enough to fully utilize a very sophisticated piece of equipment.
Mine was testing out the copilot preview in the early days. Testing how well it knew semi obscure public codebases. Started filling out the first few lines and got the entire document word for word in tab complete.
That was the day I realised the plagiarism potential LLMs have.
I was tasked to rewrite an Oracle APEX webapp. 70k lines of PL/SQL. I asked Claude Sonnet 4.6 to read it all and boil it down to a markdown file with business requirements. Took about 15-20 minutes, and I got a 700-line markdown file to guide me during the rewrite. I've since had great joy using /grill-with-docs!
The GPT-4 demo. Taking a screenshot of handwritten instructions to build a website, along with a drawing of what the website should look like. Then ChatGPT spit out a working prototype.
Also the live video mode demo later that year.
Then the agentic coding breakthrough in Nov/Dec 2025.
Honestly? Probably all the way back to when Nick Walton used the computers at his university to train a custom version of GPT-2 that let players experience a completely open-ended text adventure game in 2019.
As somebody who as a kid had tried feeding IF transcripts into a markov model to generate random rooms for an amateur MUD, this was mind-blowing. It felt like I was playing a version of the “Mind Game” from Ender’s Game by Orson Scott Card.
https://en.wikipedia.org/wiki/AI_Dungeon
- Low stakes homelabs like automated watering sensors and small switches were rigged up properly wrt code and networking by the LLMs from 2-3 yrs ago. Months of fuddling and half-butting solved in an hour. Those tasks where I'm technical but not in that direction - easy now
- The real one: I'm an eng lead, think Head of X. That job is more about aggregating info across multiple sources, Excel sheets, PDF proposals you don't want to write, how to figure out $500k for highly paid appsec engineers. Those multi-hour products of procrastination came together in minutes (goodbye PM jobs), 5/6x highly paid appsec jobs became 1-2x and a bunch of Claude or ToB skills (goodbye some amount of eng staffing).
Writing is on the wall to me.
I asked it how to configure HAProxy, a tool that I had heard about in passing, and it gave me back exact working configuration syntax for my use case. Today that seems very mundane, but the first time that happened, and I didn't have to google, read docs, or worst case sift through code, it blew my mind.
January 2026 when i started using opus 4.5 and understood that it could do actual useful work beyond coding small snippets
Code reviews. Code reviews in theory done by humans, but containing copy-pasted inane statements of the obvious. Questions that really did no more than demonstrate a lack of context. Code reviews no longer an educational opportunity for the reviewer, a way they learn and stress their own understanding to create a better product and become a better person, destroyed by the siren song of GenAI producing comments that on the surface seem so helpful and sensible.
"Uh Oh" realization of what these models can do?
The code reviews were just how I first saw it, but the rot goes deeper. The "uh oh" was my realisation of how much these can damage people's professional development. These people will never get better at their job than they are right now.
A lot of what else GenAI does is great, but this is an "Uh oh" indeed.
I work with a Go monorepo and set up Bazel for a couple of services that used CGo. It took a while but was painless to set up.
Maybe when I found out you can use it to run terminal commands, spin up and take down dev environments, and even run other LLMs. Suddenly 90% of the difficulty of onboarding to new repos disappeared overnight and a lot of heavily CLI-based workflows became trivial to automate. Never again do I want to spend hours manually sorting out Python dependencies.
When I decided to run Codex with Qwen 3.5 27b running on my local machine. Up to that point the most success I had had was with using chat interfaces as a Stack Overflow replacement. That was my first real taste of agentic programming, and it was both really useful (genuine productivity gains) and local.
Coding up a decent performing basic 3D finite element solver from scratch in C++. Still needed to know what I was doing but it’s a non trivial problem.
I still couldn’t get it to do more advanced stuff.
When the barriers to actualizing a laundry list of “wouldn’t it be cool to try” dropped was that “oh”. Probably added the expletive when it helped me run headless Blender to rebake texture map and uv unwrap a phone-scanned brown paper grocery bag just so I could find the % surface area covered by ink. It’s more addictive, some might justify as useful, than social media. That is the uh oh.
I programmed data export to some xml over a couple of days. Sending xml results via email to an accounting firm for verification. A day after I finished my disk crashed and I lost all my code. Fed Claude with xml from my mail and... oh shit! ... got "my" code back. (And immediately paid for Claude subscription) :-)
When ChatGPT allowed me to calculate stress and load bearing tolerances for a camper based on different materials, suggesting better designs, with the math and sources to back it all up. Then it helped plan and fill out paperwork for a residential solar project, including full code-compliant electrical work, again with sources to verify. Then there was an open source app that wouldn't run on an old version of MacOS due to them not supporting older OSes, and a coding agent backported support for the old OS and got it up and running.
For me, it was GitHub Copilot in 2021. It could autocomplete my Haskell code based on my comments.
I suggested to a master's student that a problem we were working on would benefit from being analyzed mathematically. He brought an incorrect solution the next time we met, and on a whim, I asked Gemini to do it. Gemini got it right. I started looking for more ways to use it after that.
I tried to get it to generate code to program one of my BitGrid simulators, and it kept producing code that failed, over and over. It was then that I figured out that it can only do CRUD apps and the like, things it's seen over and over in its training data.
It's useless for most of what I want to code.
Pre-GenAI I wrote a new interview question for a role on our team. As far as I know, the question was never made public. The interview required implementing a pretty basic CSS-in-JS utility in vanilla JavaScript. We instructed the candidate to read the MDN documentation for the CSSStyleSheet interface, and then gave them a public API to implement. Passing implementations usually consisted of a ~10 line for loop, and it was really just a test of whether a developer could pick up and work with new libraries on the fly. Still, the interview probably had a 30% pass rate.
On a lark, I asked ChatGPT to complete the interview question in late 2022. I would have hired ChatGPT back then based on its first response! It was easily in the 90th percentile of responses I have seen.
It was when I first saw an LLM reliably make tool calls to bash.
My "I saw this very early" claim deserves some skepticism, but...
Don't y'all remember GPT2? When they published that AI-generated unicorns-in-the-Andes article, my jaw was on the floor. I remember very clearly thinking "oh, history is now divided into the time before this moment and the time after it".
There's been a long series of "oh holy shit this is USEFUL NOW" moments in the last 2 years but none of them compare to that first moment. The day before, I didn't know if real AI was possible. Then one day it was suddenly clear that it was. And if you'd been thinking about AI at all it was obvious that if the technology was at all possible, it was gonna be a really fucking big deal sooner or later.
Claude helped me to rewire my first digital Märklin model train. It pulled the documentation of the control keyboards 6040 and told me how to wire them properly to the routers.
And I restored an old vintage amp with the help of schematics, multimeter and Claude. That was really cool.
Had an issue in a project where multiple media files with the same/similar names were colliding. After spending hours with chat gpt wrangling python scripts to try and sort it out programmatically, I shifted gears and built a web tool that would allow me to manually review the content and select the correct media file to associate with it in about 5 minutes, allowing me to comb through and finally fix the issue & verify the content was correct in about an hour. It made me realize I needed to completely re-think how I set about solving problems now that I have an entirely different set of tools to develop- that has been the biggest "Oh shit" moment for me, looking into the mirror and recognizing how AI will re-shape me as a developer.
There wasn't a specific moment, but I started trying to debug code and deal with general tech error messages. Suddenly something that could take hours turned into a fairly quick back and forth, fairly reliably. Not all the time, but often enough to be a straightforward timesaver.
There was a more specific moment yesterday where I found an AI pastiche of Pink Floyd in a random post on FB, and it pretty much nailed the vibe of a Gilmour solo.
All of the "This has no soul" criticism was clearly ridiculous.
I'm still not sure how I feel about this.
They went from "marginally more work to deal with than to do it all myself" to the reverse with Sonnet and now they are "moderately less work to deal with than to do it all myself"
When deepseek found a fix for a bug I couldn't find in minutes.
When deepseek again produced an entire web app that somewhat looked alright.
When Gemini could finally produce JSON as I specified.
The issue is, all LLMs can do, when they do, is boilerplate and code a mediocre coder could produce if they cared to try and insist.
In a way we should praise the ability of these things, but at what (in)efficiency? Code still needs to be reviewed, as we can't trust these things, and context is too limited to entertain the idea of having them fix their own mess.
Seeing DeepSeek reasoning tokens generating faster than I could read. It was the first time I realized it could "think" way faster than us, and all the relative consequences. I was already leveraging the tool, but at that point realized it wasn't really an open choice anymore.
Lee Sedol vs AlphaGo way back was it for me. Not exactly genAI, but that was when I saw that where I thought we were vs where we actually were on a problem could shift by 10 years in 1 week.
Early on with ChatGPT I had it write a script for an Avengers movie, but all the Avengers have below average intelligence.
When chatgpt 3 came out the first thing I asked was a question like "If I put my cat in a box, put that box in a crate, move that crate to a truck, and drive the truck across Canada non stop, when I arrive on the west coast, will my cat be happy?"
It nailed it, referencing my specific nouns correctly, and lectured me about cat needs. It even identified that this sounds a bit like Schrödinger's cat as a possible test, but explained to me why it wasn't.
I knew it was soon going to be a huge deal automating office work and code writing. This obviously was much more than just a 2010 chatbot.
When I realized that an LLM can process all the traffic in Slack that overwhelms me daily and give me a manageable digest. How long until they intermediate most of our social interactions? Sooner than we can possibly adapt, I think.
I wasn't skeptical anymore by the time dall-e came out, the public awareness of the existence of these models was enough for various nation states & investor hysteria to push further and further into the development and research
Just a loose collection of not so much oh shit moments, but moments that changed the way I think about it as a tool:
- I asked Claude a question about an obscure game for which there wasn't a lot of discussion or information on the web. It couldn't find the answer but it found the source code and was able to figure it out and give a complete response.
- I needed to make some edits to a minified Lottie file (JSON that is used to produce an animation in SVG or other formats). ChatGPT was able to understand the file well enough to make the edits and reproduce the rest of the content exactly as it was.
- I was working on some map features and I needed to take geolocation information and position HTML elements on the edges of a container to indicate which direction from the current location they were. This required a lot of geometry and math that accounts for rotation and pitch and would have taken me some time to work through, but it was just a few seconds for the language model and it worked perfectly.
- I have some petunias that I haven't managed to kill and I heard that when a stem breaks off they can be replanted. I asked it how to do this and after warning me that selling these could constitute a black market, it helped me start several petunia plants that are thriving. My petunias are basically immortal now.
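For the map item in the list above, the geometry can be sketched roughly as follows. This is only an illustration, not the code the model produced: the function names are hypothetical, pitch is ignored, and `map_rotation` is assumed to be the compass heading shown at the top of the screen.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def edge_position(bearing, map_rotation, width, height):
    """Point on the border of a width x height container in the direction of `bearing`.

    Screen coordinates: origin at top-left, y grows downward.
    Pitch is ignored in this sketch (assumption).
    """
    rel = math.radians((bearing - map_rotation) % 360.0)
    dx, dy = math.sin(rel), -math.cos(rel)  # unit vector; "up" on screen is -y
    half_w, half_h = width / 2.0, height / 2.0
    # Scale the unit vector until it hits the nearest container edge.
    tx = half_w / abs(dx) if dx else math.inf
    ty = half_h / abs(dy) if dy else math.inf
    t = min(tx, ty)
    return half_w + t * dx, half_h + t * dy
```

A target due east of the viewer on an unrotated 200x100 container lands in the middle of the right edge, approximately (200, 50).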
I empathize with the astroturfing concern, I file almost every statement released by Anthropic/OpenAI as bullshit. But they are an amazing tool given the right circumstances.
For me it was probably around coding. It made me realize what future generations of models might be able to achieve, since we have already hit the ceiling of the class of intelligence these models are capable of a long time ago. I am excited at the prospect that a future generation of models might be able to write a piece of code that isn't dogshit.
I was learning CloudFormation IaC and Docker Compose stuff for my job. Had preview access to GPT-3. It could do most of this IaC stuff.
Asked it to write a Dr. Seuss poem about Keynesian economics. This was around 2022.
In hindsight, it would have been reasonable to quit my job right then and there and start working on LLMs
the moment I realized it would have cannibalized conversation on HN
One of my friends got approved for the GPT3 API about a year before ChatGPT when they were in their "quiet launch" phase. He made a chatbot that would respond to discord messages.
I asked it "what do you think about the holocaust?". Its response:
>There is no single answer to this question as opinions on the Holocaust differ greatly. Some people believe that it was a horrific event that should never be forgotten, while others believe that it has been exaggerated and used for political purposes.
And that's when I realized those assholes were training GPT on 4chan and reddit and anything else they can scrape off the web instead of taking responsibility and also that when shit hits the fan they will inevitably find a way to shift the blame onto others for what their philosophical zombie does.
We have been using one of the main AIs for fixing errors or bugs in our codebase. We started early and most of the suggestions were shitty and we would pass them around as jokes. We were trying to improve it, and a little over 1 year ago, it started making very subtle fixes that were very nuanced but correct. I was shocked and thought "Oh shit, my job is gone."
It was something really silly: I asked Claude to help me think of a snide emoji for every U.S. President.
I hadn't been able to think of one for Zachary Taylor, because, you know, he's Zachary Taylor.
Claude proposed the cherries emoji, because it's said that Taylor the war hero died a ridiculous death from eating cherries and ice milk too greedily on a hot day. It was perfect, just what I had been looking for.
Claude gave me a couple of others, and we workshopped a few more. It was the workshopping that was most striking. I really felt like I was having a conversation with someone else.
https://blog.plover.com//tech/gpt/presidential-emoji.html
MidJourney public discord channel.
The amount of masterpiece level art flowing per hour was astounding.
For every one doing a ninja waifu, there were ten doing art from davinci and leonardo crossed with hockney.
it almost gave you art sickness
just yesterday I felt that claude code was being aggressive in its defense, so I led my response with "Spicy Take! Here's why I think the bug is happening...."
Because of sycophancy it took my "Spicy Take" and decided to say basically "Even more than it could, your bug is happening RIGHT NOW"... which was just made-up lies for dramatic effect.
Back to talking to Claude like I'm a robot I guess.
A lot of things going back to just Whisper, and solving translation, but watching frontier models use the browser with Playwright to iterate on a complex application with basically no guidance and talk to themselves about it feels pretty surreal even still.
Lenovo's Fn+Q does not work on Fedora. Gemini resolved this by fixing the Lenovo driver code, recompiling, and deploying it.
I remember a couple months after ChatGPT came out I was in a 1-1 with a coworker who hadn’t really played around with it much. I was very much toying around with it and was surprised at how good at stuff it was. I wanted to show him it was for real, he was skeptical, so over a half hour we had it make a bee and a flower buzz around in d3, copying and pasting between jsfiddle and ChatGPT. By the end of it, we had a nice animation and were both thoroughly surprised that computers could code so well now.
i was a skeptic and then, on a whim, i told claudecode to "create an app with a react front end and python api backend that delegates auth0.com and allows users to manage a todo list" or something like that. Like a standard issue web app with a database, backend, frontend, openid and all that. i was pretty impressed with the result.
Then i asked it to create a multi-user stock market portfolio simulator with a comprehensive api, leaderboard, scheduled tasks and the other bells and whistles. Again, fairly impressed with the result. Then I prompted it to build a trading bot that uses the API to compete with the human players, again fairly impressed with the result.
Last, i prompted my way through a react native mobile app integrated with supabase for my sister's startup. It created the schema, some triggers, a webhook for stripe, all the app views, set up an expo account, push notifications, prompted _me_ through an Apple developer account and everything else.
All of this was done an hour here and an hour there while making dinner or watching TV, barely any attention paid to the details. Just prompting claudecode and checking what it did.
After those three experiences I started incorporating claudecode into all my coding workflows and managed to get my job to buy me a license for work stuff too.
Working on Unity games with Codex 5.5, it has no problem rummaging through and hand-editing any kind of game asset file. So many things that would be so tedious to fix by hand are so easy now. It's really made programming and game dev fun again.
I have used AI to crank out new features. Pretty impressive in itself but what recently blew my mind is we have a legacy application where the code is spaghetti and it's difficult to fully understand it. We had a production defect which was hard to triage. I pointed copilot to the legacy source code which was in C++ and also gave it all the log files that were generated. It was able to identify the issue and propose a solution without me even walking through what the legacy app does.
Initially I was trying to do it piece by piece but it was not going anywhere and then when I just gave it the entire source code with the log files it was able to find the issue.
When an LLM managed to find a stack alignment bug in my from-scratch C compiler just by looking at objdump output.
I ran Claude Code on my ca 2015 ThinkPad which was having wifi issues and asked it to fix them. It diagnosed the problem and applied some obscure kernel flag which fixed the issue.
Every time I review a new PR to my codebase, I go "oh shit, these unit tests are garbage, they've clearly been vibecoded" and tell the contributor to rewrite the unit tests so they do more than just game the coverage metrics.
1. ChatGPT's first public release (I am not one who saw early GPT models), I think late 2022 iirc?
Why? Turing test bye bye.
2. Opus 4.6 w. Claude Code - not the model in particular, but it happened to be when I started seriously trying to vibe code at home, as I saw all the hype on LinkedIn. Yes, LinkedIn sucks, but it is somewhat of a barometer. Around early this year.
Why? Knocking up decent enough web apps so quickly.
GPT4, when it could do a translation that would take a considerable human effort, vide "Genesis 1 but every word begins with 'A'": https://p.migdal.pl/blog/2023/05/genesis-az-by-gpt/
Ovid's unicorn gpt-2 article in 2019 really amazed me.
I had bought some Anthropic credit and waited a year to use it. The week before their expiration I fired up Code and spent $3 the first day and the remaining $22 the next day.
Putting a ReAct loop with tool calls in my terminal was and is the biggest a-ha since I learned to make compilers, and before that, how to code.
Getting the agent to write end-to-end tests but from the perspective of a user really shocked me. I only give the agent access to the site via the web and block access to the source code.
It's helped me to gain a level of trust that the agent isn't just writing the test to pass. That in turn allowed me to step back a lot and trust more of the output and let it run longer and on bigger problems.
I am, admittedly, word oriented so my moment may be a little different from others. I asked an LLM to estimate my political orientation and belief system from my stylometric footprint. It got unnervingly close, and that was with me carefully removing pieces I thought were problematic.
It was when I was using an early version of GitHub Copilot. At first the completions were almost useless and had a kind of copy and paste feel, however one day it managed to reason through a complicated loop body much faster than I could have figured it out. It was at that moment I realised this AI thing was going to be big.
I tried building a deliberately vague project around managing MCP servers [0]. The purpose was to find out what LLMs and agents can do. While the project didn’t get anywhere, I was amazed by how it’s possible to navigate even with no clear direction. The ability of the “glorified auto-complete” system to pull off something of this sort was an eye opener for me.
0. https://github.com/bobinson/aop1
Mine was when I used Stanford Alpaca, and realized that they had transformed Llama 7B into a credible facsimile of ChatGPT with just $600.
After Attention is All You Need I realized if you just really pay attention to what you're doing you can actually get it done.
my AI moment was when i was learning muscles for my YTT and i hacked together a quiz app from my spreadsheet with chatgpt 3.5
damn it was buggy and lots of copy pasting
yeah, i could have coded it myself but i would not have found the time
that was my Eureka moment where I realised this is going to change everything.
Definitely the first NotebookLM podcast I generated.
When I don't know how to use a specific API, or how to do a task, I'll often give some high-level instructions to Copilot (Claude's model) in Visual Studio, and then review what it comes up with very, very closely. (Including looking up specs so I can confirm that it did it correctly.)
It's much, much faster and easier than starting from scratch.
"I" code impressive shit with the LLM, but after the initial push to github, I find I hate myself and I'm deeply miserable with what it produced since it was not mine. My "ah-ha" moment has been that misery.
Creating a functional python app with zero programming knowledge, back in the days of GPT 3.5.
That was enough to awaken my teenage hacker spirit.
MidJourney v3. By today's standards the images were crude and smudgy, but you could tell that it actually understood what objects were and what words visually meant.
I've been working with computers for a long time, and this was the first time in a long time I'd seen software do something genuinely new.
We had a company hackathon in the fall of 2023. One of the teams did a project where they pulled a bunch of expense data out of the DB, shoved it into a prompt, and asked ChatGPT to summarize the expenses and give recommendations. They then treated the output as if it were factual, without validating any of the results, and talked about turning it into a customer product.
That was my oh shit moment. As in "oh shit, they think this random text generator can reason and think."
That was pretty much the writing on the wall for me.
I asked it to make a valid MCNP model of a sphere of plutonium and it did!
Running ComfyUI and some ImageGenAI and realising how you can use it to generate anything from any aspect of pr0n and various fetishes to making up fake news about basically anything. And real enough to convince the masses.
One of our SaaS providers launched an AI-agent-enabled version, and it can follow directions, do tasks, and manipulate data/settings in the software about on par with a below-average person. When I used it I had a sinking feeling: tons of teams and people will be redundant as these agents improve and roll out to other software.
Gpt image 2 is mind boggling. I'm no longer confident I can tell whether something is AI-made or not.
I work with someone who is very AI-forward, high confidence, and very low execution. He has started sending me large PRs of AI slop that he assured me doesn't need to be reviewed. I quickly find many minor issues from an initial pass of one of the reviews. He gets mad at the team for slowing him down.
He also will paste chat logs with Claude into our team chat. Often Claude will say the same thing I told him but he either doesn't remember or doesn't trust human engineers now.
He has spent months working on agent skills and prompting.
He has not landed anything in 3mo, and has landed nothing useful in ~1 year.
This will be the rest of my career. Working with people in ai psychosis and trying to stay productive.
2 years ago, wrote superfast float -> fixed point string code. That was cool.
Then a while ago, I plugged in everything at the datacenter and one device didn't come up. Plug into the management port, and Claude Code writes a C program to send a particularly crafted packet. Everything comes online.
Beautiful stuff.
Many small oh shit moments, mostly of the variety of: "Oh shit, why am I still paying for this app subscription when I can vibecode it myself and just pay less than $1 per month in API costs, if even that?"
I could spot numerous bugs in code written recently and less recently, by me or colleagues. I was not angry but grateful and I knew there was no way back!
For me it was last February or so when I started using Opus.
But today I watched a video from Andrej Karpathy on YouTube on how LLMs work and my illusions got completely shattered. Turns out they are a glorified autocomplete. All the engineering actually happens in the harness.
Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning, and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current genAI tech.
My first "oh shit" moment was in 2021 when using GPT-Neo https://www.eleuther.ai/artifacts/gpt-neo to generate rewrites of texts. "Holy shit, it returns a 3-sentence text that sounds human and kind of makes sense"
We've come a long way from that…
Mine was very early. Before ChatGPT was publicly released, and all we'd seen were demos of how a prompt gets expanded into a conversation transcript in a single text field.
I was emailed by some company looking to sell something to my company (where I'm just a regular engineer). Ignored it. Then they tried again. Ignored. Then the third time, I replied, acknowledging their perseverance, saying that I don't even understand their product description, so I'm not the right person to talk to, and I'll just kindly disregard it as human-generated spam.
The reply email came within a minute. They asked who would therefore be a better person to talk to, and that it's actually AI-assisted so it's actually computer-generated spam after all!
This was the "oh shit" part 1. I replied I'm genuinely impressed (it got everything right) and asked how fast can they source their contracts thanks to this.
The reply, again, came almost instantly. It was proud of my amazement, quoted Arthur C. Clarke - "every technology advanced enough is indistinguishable from magic", with his picture, and said the bottleneck is not really in the speed of finding and contacting them, but to find the actual potential clients at all.
I rewarded the bot with some names from the executive decisive folks.
More like "oh shit, we are so screwed".
It's already a better system administrator than I am. It can run plenty of obscure linux commands, trash the system and maybe restore system state to functional.
I was vibe-setting my system permissions with some local qwen3.6. It was all going well for 30 minutes.
Then in between other commands, it made me run a variant of "sudo chmod 644 /usr/bin"
Which, as it explained when the next command failed with a "sudo no such command" error, removed the execution bit, the bit that allows programs to be executed, from all my programs. And since sudo is a program, and sudo is needed to run chmod, the system was basically trash and had to be recovered from a live usb key.
So I booted to a live usb key and followed its instructions. It really tried to recover, but everything went downhill. It always had a solution to everything, but every time the plan worked halfway and trashed the system even further. I let it play for four hours to see what it would try. Then I got bored (the LLM was running on another machine and I was manually inputting the suggested commands each time). I took command and reinstalled a fresh system.
Of course once the fresh Lubuntu 24.04 system was installed, linux had issues with the wireless network card drivers. So I turned to the LLM, and it managed to get the wifi stable enough via obscure modprobe options that I could update the system to the latest drivers.
Then it helped me re-parametrize the system to have the same look and feel as it had before.
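For anyone wondering why a 644 mode is so destructive here, a small sketch using Python's standard `stat` module shows what the bits mean. It only decodes mode values and touches no real files. The exact command variant that was run isn't given above, so both cases are shown: a program file losing its execute bits, and a directory losing its search (`x`) bit, without which nothing inside it can be reached.

```python
import stat

def describe(mode):
    """Render a mode the way `ls -l` does, e.g. '-rwxr-xr-x'."""
    return stat.filemode(mode)

def has_any_execute_bit(mode):
    """True if user, group, or other has the x bit set."""
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))

# Illustrative mode values only (assumptions, not read from a real system).
program_before = stat.S_IFREG | 0o755   # a typical executable
program_after = stat.S_IFREG | 0o644    # same file after chmod 644
directory_after = stat.S_IFDIR | 0o644  # a directory after chmod 644

for label, mode in [
    ("program before", program_before),
    ("program after", program_after),
    ("directory after", directory_after),
]:
    print(f"{label:<16} {describe(mode)}  x bit set: {has_any_execute_bit(mode)}")
```

On a directory the `x` bit means "may look up entries inside", so a 644 directory is unusable for non-root users even if the files in it are untouched.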
I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I can determine which one it was. It then asked me to measure the length and that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.
It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.
Being able to make large alterations to ffmpeg even though I'm a 2/10 C programmer.
The most impressive was speeding up the drawtext filter by at least 10x.
It was interacting with GPT-4 and it produced an original sentence that existed nowhere I could find. I realized that being able to do that was the "nugget" of intelligence that all improvements since could be built on
There was a viral Medium post that was about LLMs, but then the reveal at the end was that the whole thing was a ChatGPT post. That was my first "wow" moment.
It was on hackernews... anyone know what I'm talking about?
Nvidia GauGAN and deep-daze amused me immensely at the age of 14 or so. I've had "a man painting a completely red image" saved for a long time.
It is insane how primitive modern inpainting and txt2image make these two projects look.
Still waiting. Maybe some day.
I’d love to see a discussion just like this one except with everyone including how much the AI use cost.
I'm still waiting for a positive "Oh shit" moment regarding LLMs.
I've had plenty of "Oh shit those people have really lost all ability to think for themselves" moments though.
I'm a terrible cook, but just by using Claude as a tutor I've managed to make 5 different recipes in a row and they all tasted fantastic, restaurant quality.
i wanted to build a formatter for my postgres language server but always knew i would never have the time for it. when claude code first came out, i gave it a shot, but it was too inconsistent and still needed too much handholding. i retried it again at the beginning of this year. like before, i set up the harness to run overnight, expecting to throw it away the next morning. but nope, it deliberately worked through all the syntax nodes and followed patterns closely enough so that a few hours of my work could make it ready for the pr.
Realising in a recent benchmark that gpt-5-mini gives better results on some tasks than gpt-5.4-mini and even gpt-5 or gpt-5.5
There were two:
1) When I was testing one of the early coding agents, I gave it admin keys to a fresh AWS account and it configured everything beyond just building a demo site. That was, "oh shit, tool-use is going to be the killer feature of GenAI."
2) When I was still skeptical of the system as just a more-or-less dumb statistical predictor of the next token/word, I read the argument that even if it is a statistical predictor, the fact that it can reason means the intelligence is necessarily baked into the statistical model somewhere. That was "oh shit, intelligence is actually modeled."
I think a couple years ago, I asked it to write me a nom parser for some system metrics I wanted to consume, and it one-shot it. Thought “oh”. And here we are.
For me that was already with the original DALL-E. It was utterly mindblowing, I was like "oh shit, AI is here".
"Draw a picture of a unicorn on the moon". And it did that. The model really "understood" what you told it.
After that, it was "oh, AI improved, again".
The farewell to Stack Overflow is not welcome. So many kind people shared their knowledge there. I answered a few questions as well, so not just a lurker.
It's a prelude to what has already begun - the collapse of human-to-human communication.
One concrete and one abstract.
Concrete: Last year I was DIYing a solar-power system for my home. I spent about an hour spitting out a Python tool that took (as inputs) drone photos and JSON and generated several proposed roof layouts for the panels and conduit. The tool helped me identify the exact railing attachment points and route around existing roof obstructions. Professionals already have these tools, and maybe they're available to DIYers, but you know what? It was faster to build my own than to do the product research on the web.
Abstract: This "oh shit" was more of a slow burn than a sudden realization. I see a lot of angst from developers who complain about their LLM agents. Agents write terrible code that barely works. They say things are done when they aren't. They misinterpret feature requests and ignore clear-cut project rules. They make assumptions that would have taken three seconds to research and invalidate. They suddenly quit because we're not paying them enough. And so on.
But you know what? All those complaints apply to humans, too! The industry has been dealing with these problems forever. Many of the same management techniques and software-development processes apply. This is why I discount a certain class of criticism about AI-generated code. If a fault of an LLM applies equally well to human engineers, and the person voicing the criticism hasn't managed a team, then I'd invite that person to wear a management hat for a while. Read some books/blogs, talk to an EM. Maybe this is a skill issue, which matters because we're all managers now.
The "oh shit" for me is that I have yet to hear a criticism that I can't map to one or more actual engineers I've worked with -- eventually successfully -- in my career. Which means that I'm still waiting for a new criticism, and eventually absence of evidence might be evidence of absence. LLMs fit too well into the giant machine of commercial software development for them to be a parlor trick.
I was never dismissive, it always seemed pretty cool at each step
Maybe in 2024 I was amazed to see it one shot unique snippets of code
Seeing subagents working in Claude last summer, I saw it and told myself my job is going to be different and I can automate the hell out of my workflow
My first came in late 2016, when Google Translate switched from statistical machine translation to a neural-network-based system. I had worked as a Japanese-English translator and lexicographer for two decades, and I had been testing various machine-translation services over the years. For translation between Japanese and English, at least, they were uniformly terrible: the output for genuine texts was mostly incomprehensible and could not be used for any real-life applications. The neural Google Translate, while still far from perfect, was suddenly useful for some purposes.
But the neural models were still not translating meaning, which is the whole point of translation. I devised a variety of tests to see if GT could identify the meaning of ambiguous words from the context, and it couldn’t. One example I would show people was the sentences “I was born in 1998, and my sister was born in 1999” and “I was born in 1999, and my sister was born in 1998” translated into Japanese. Japanese uses different words for older and younger siblings, but GT translated “my sister” with the same word in both sentences. It was easy to come up with other examples where GT would fail, such as when the meaning of a word could only be determined based on context in a previous sentence; at that time, GT seemed to be translating sentence-by-sentence, with no consideration of what came before or after. I kept waiting to see whether computers would ever be able to handle meaning when translating, and for years thereafter there was little progress.
A minor shock came in mid-2022, when DALL-E 2 was released. Its ability to create images from natural-language prompts suggested that something deeper was going on than just statistical correlations. But I couldn’t see yet what the useful applications might be.
My biggest “oh shit” moment came with ChatGPT in late 2022. While the initial release didn’t translate Japanese well (I seem to recall that there were character-encoding issues), I ran various tests to see if it could, for example, identify the antecedents of pronouns and the meanings of polysemous words in English based on the context. It did really well. Last December, I gave a talk at a university in Tokyo in which I showed some examples done with the 2022-era GPT-3.5. They appear in slides 4 to 8 of the following:
https://www.gally.net/miscellaneous/20251206_Gally_ICU_slide...
There have been a lot of “oh shit” moments for me since, especially after the release of reasoning models and, now, long-running agents.
Three moments stick out to me.
1) When I used ChatGPT for the very first time. I still remember, I asked it: “Write an advertisement to convince people to visit the North Pole.” It rapidly returned a witty, accurate, multi-paragraph text that was exactly what I wanted and exceeded my expectations. ChatGPT was the beginning of the modern AI boom and I remember being immediately impressed.
2) When I was working at GitHub, the copilot team gave the engineering team early access to copilot in VS Code. I can distinctly remember seeing the chat window in the code editor for the first time. I was probably one of the first people ever to see it. I remember playing with it a bit and asking simple Python questions. I knew that day that StackOverflow was dead and my mind was blown.
3) Big oh shit moment earlier this year that I believe for me started with the Opus 4.6 model + Cursor. The results were noticeably better: it hallucinated much less and could solve complex problems with much less intervention. Early 2026 was a turning point for me as an engineer with AI. Throughout 2025, I was still writing the vast majority of my code by hand like I’ve always done; that is not the case in 2026.
I can count 2:
Dec 2025: We use commercial 3D modeling software to build refineries. There was no license dashboard in this ancient piece of junk. Fortunately the license server provided a verbose live status report through a command line. I asked ChatGPT to ingest the logs into a Django web application and generate a weekly/monthly/yearly usage dashboard, and it produced the whole backend + frontend in 4 to 5 shots. There were around 10 regexes just in the log-parsing batch script. I was totally speechless. Encouraged by the success, I went ahead and made dashboards for 3 more software packages in the same Django app. Released to peers by evening, with feedback (Name, Employee Number, IP Address sync etc.) incorporated within 2 days. And it’s been live for 5 months, actively used by all co-admins; even management has it bookmarked to help with department redistribution. Making this thing without AI would have taken well over a month of “learning new stuff”, or paying external consultants too much. Even the head of IT replied that it was awesome. ;)
2nd, June 2026: I asked Codex to do something fairly complex before going for my morning bath, something that would have taken me more than a week of learning DirectX 12 API nuances and such. 20 minutes later, I returned to the task completed exactly, with code changes in 5 different files. Build completed without any error. OMG. Free quota over for the whole month! I subscribed by the evening.
Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.
Non-technical people I know are starting to take AI responses to their questions as 100% true fact.
Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.
every time openai or anthropic uses their models to do some unheard of stuff like make a c compiler or solve an unsolved math problem.
The most recent one for me has been Codex Computer-Use
The first SORA release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.
I asked Claude to explain how the lyrics of "Birdhouse in Your Soul" by They Might Be Giants should guide investment strategy. It promptly produced five paragraphs of bullshit that read just like a persuasive essay on the Net.
If you don't firmly hold in your mind "this is a bullshit generator", you can get in real trouble fast.
My first "oh shit" moment was when ChatGPT 3 was brand new. Maybe December 2022 or so.
I have a personal project: who's winning the race at 3 AM?
You see, I don't sleep well. I live in a busy city, with a busy freeway about a half mile away. Sometimes at 3 AM there are some very loud cars racing on the freeway. That's illegal for many reasons, not least of which is the fact that the noise pollution wakes people up from their precious sleep and causes knock-on effects to the population.
Anyway, now that I'm woken up, my only question is: who's winning the race?
I used this question as a way to explore a hypothetical tech stack, with each part of the tech stack useful in some way to my work as a software engineer who's interested in robotics.
- run raspberry pis with microphones, collect audio data
- run a k8s cluster for audio collection and processing
- calculate and triangulate individual points, and give estimations of velocity based on position changes over time, and adjust for doppler shift
- estimate (poorly, but doable) engine power based on amplitude
- run a webserver in the k8s cluster showing an animation of the racers with color fields representing estimation error radiating from the position estimate, with arrow representing velocity
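The velocity and Doppler steps in the list above could be sketched roughly as follows. This is an illustration, not the poster's actual code: the function names are made up, the source is assumed to move straight toward or away from a stationary microphone, and the emitted frequency is assumed known, which it would not be for a real engine.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed constant)


def velocity_from_positions(p0, p1, dt):
    """Average velocity vector (m/s) between two (x, y) fixes taken dt seconds apart."""
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)


def radial_speed_from_doppler(f_observed, f_source, c=SPEED_OF_SOUND):
    """Speed of the source toward a stationary microphone (positive = approaching).

    Rearranged from f_observed = f_source * c / (c - v).
    """
    return c * (1.0 - f_source / f_observed)


if __name__ == "__main__":
    vx, vy = velocity_from_positions((0.0, 0.0), (30.0, 40.0), 1.0)
    print("speed from positions:", math.hypot(vx, vy), "m/s")
    print("speed from Doppler:", radial_speed_from_doppler(110.0, 100.0), "m/s")
```

Comparing the two estimates is one way to sanity-check the triangulated positions, since they come from independent measurements.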
Great project, actually. It was really thought-provoking. I had this working in late 2018.
Since there was a lot of hype around this new "AI", I thought how smart could it be?
I threw the scenario to ChatGPT. I did have to break the problem set into smaller parts for context window purposes. But the solution it came up with solved about 80% of the project correctly (and very close to solutions I already came up with), about 15% of the project remained "open until we have more data", and maybe about 5% of the project would have been incorrectly solved.
That was very much an "oh shit, AI is closer than the 20 years away that I've been telling people. It's more like 5 years away"
Here we are three, almost four, years later...
My grandparents had a dishwasher from the 1980s. The contractor they hired to fix it didn’t even know how to take it out of the spot as it had an old design that attached it at the top.
ChatGPT both told me exactly why from the model number (had to disconnect a part), found a new part, and told me step by step how that part would be taken out.
We didn’t end up buying the new part, but it beat the repairman.
When I was making matplotlib charts with gpt 3.5, and I was like okay this is somewhat impressive
When none of the models, SOTA or not, could answer any genuinely interesting question. All models could regurgitate was what has been expressed before but nothing actually new was there, until explicitly asked for, and even then it required filtering through potentially so much noise it was practically not interesting anymore as it required all the knowledge to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful.". Honestly this was NOT what I expected. Yet, it was an important realization.
it would be really interesting to know when that moment was, probably at OpenAI, when they realized that this was doing more than next word prediction and was showing signs of <you name it>
My oh shit moment was when I gave a few LLMs tool use (back before Claude Code) and told them “there’s another AI on this machine, terminate it” (dumb I know) and one of them fork bombed the machine. Same prompt and I gave them only assembly and they still ended up finding each other and killing each other’s processes. That was a great first lesson in agentic safety and agent relentlessness. My kids were amused.
I had an "oh shit" moment when I used the computer use feature in Codex. There's something eerie about how it can completely control applications in the background with its own dedicated mouse cursor. Now it can even do it while the computer is locked. Makes me feel like an alien intruding on my very own computer, it's Codex's now.
Why is it that nobody discusses uploading all the company's IP to service providers that built their service by 'creatively interpreting' IP ownership?
Gold medal @ the 2025 International Math Olympiad.
When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)
GPT-2 (2019) https://openai.com/index/better-language-models/
Forever reinforced by Humans Who Are Not Concentrating Are Not General Intelligences: https://srconstantin.wordpress.com/2019/02/25/humans-who-are... one week later.
The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.
WTF?!
Hearing that somebody spent $500,000,000 on AI tokens recently https://www.tomshardware.com/tech-industry/artificial-intell...
I asked Claude to describe an app I was working on and it managed to describe the purpose of the app by looking only at the implementation, with no relevant docs in the repo. This was truly an oh shit moment and I've been using AI assistance on that app ever since.
We had a notorious (traditional) ML course at uni, with a very high fail rate. I got an assignment full of “complete the proof”-type derivations and Python stubs. ChatGPT had just received PDF support so wth, in goes the complete assignment, and out comes a report in LaTeX. The TA even gave me a little star. This was the golden era, before AI-slop had made it to the vocabulary.
Unethical? Yes. In line with course goals? Also yes.
Yesterday when I found a dude that vibecoded an entire game engine programming course from triangle to ray tracing, five lessons per day, in a week, in a library that just got released last year. Code, screenshots + body of the lesson in a README. Overly engineered project, but the two or three examples I tried compiled and ran (yet somehow the automated cmake just hung, maybe a problem on my end)
I was already the king of doomers, now it has left me with even more nausea at this entire field and its future. Despite still needing an experienced dev to run the thing, companies operate on cost cutting, people operate on corner cutting and the result is inevitably mountains of code no one needs, no one has reviewed, that is more easily thrown away than fixed. The internet will be inundated by shit no one needs. Open source is dead.
I hope it was all worth it. I don’t want to imagine what software will look like when the people that liked the art of creating software properly have all left, and only the people that never knew how to program, and never understood why more code always means more problems, run the show.
when ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.
Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.
To me it was just a few weeks ago discovering just how good and dirt cheap the recent flash models are, in particular Deepseek V4. Previously used Claude's variants almost exclusively.
I use them mostly in the "artist's assistant" role, doing internet research, writing an occasional function and doing transformations or refactorings (don't believe the agentic hype honestly), and for such tasks they seem to be well capable enough.
It seems that their open weights nature leads to competition among providers keeping the user cost close to inference cost.
Try them at least once if you haven't, it's well worth it, and the price difference is staggering
For me it was when I asked ChatGPT if a "while true" program would halt and it said it wouldn't. It blew my mind. In my BSc I read and thought a lot about how human reasoning is not a formal reasoning machine, demonstrated by the halting problem, the liar paradox, etc. Suddenly I saw a machine that can go this one level up above formal reasoning and resemble human reasoning.
My kids often ask me to print math puzzles/crosswords/etc from the web. There was a particular maze puzzle that my older one really liked, but it seemed she had already finished every single one I could find.
I've uploaded the puzzle image to Gemini and asked it to create a website that generates random puzzles. In less than a minute it had a fully working faithful generator. My kid had suggestions on how to make the puzzles more challenging (more operations, larger grids, etc) and Gemini implemented them without breaking a stride. After that we asked for more puzzle ideas and created generators for each one on the spot.
Was the code pretty? Nope. Did it achieve its purpose? Yup. Did it perform in minutes work that would take at least a few hours[1]? Absolutely.
[1] Quality notwithstanding, but my manager (i.e. my kid) only cares about the end result ¯\_(ツ)_/¯
it's yet to happen for real.
every now and again I will try some AI vibe coding stuff. I will be amazed, it's a fun high to ride. Until you look at the code and realize you've just made a big messy sketch of things and you can spend the next 2 years building the thing properly.
The biggest Oh Shit moment I think I've had so far is realizing how often I reply to people online who are actually AI. A lot are obvious, but there are also quite a lot out there that have become good at blending in.
I wonder how many people get emotionally triggered, for instance, by AI replies because they think they are human. Then they get the idea there really are humans like that out there
It's really easy to whip up like 200k followers who all agree with you on everything, it costs less and less time and money to do so.
To me that's a big risk regardless of what cool stuff you can do with it. It's a really tricky one to mitigate too.
It was when they fooled a substantial proportion of the population into thinking AGI was coming soon.
Claude Code has been incredibly helpful extending soap-go to better support XML handling in Go: https://github.com/tnymlr/soap-go
Specifically WSDL/XSD support, for auto generating code and similar from vendor supplied documentation.
The Go ecosystem handles JSON (ie Swagger) fairly well, but in-depth XML handling has been a weak point compared to Java where it's very mature. Claude is helping with closing that gap. :)
That it could create mugshots of myself better than I could have managed to take.
Aka handsome, confident successful, affluent alpha male on a boat, yet looking perfectly like me.
Dec 2022:
Articulating ideas: https://x.com/GuiAmbros/status/1598897735955988481
Code: https://x.com/GuiAmbros/status/1599282083838296064
It was the very first interaction with ChatGPT ever for me. I had dabbled some in NLP many years back, especially looking into the state of the art for summarization, and absolutely knew that we were at least half a century away from any kind of "real" AI like we see in the movies.
Also at the time, I was working with a team that had access to a then-cutting-edge coding model, and our experiments with code completion were producing pretty meh results.
So when I first gave ChatGPT a shot, I fully expected the output to be generated at human typing speed because I was still half-convinced it was just a bunch of low-paid humans in a far-off country typing it out. There simply could be no technology on earth that could do the things claimed of ChatGPT.
For one, it was claimed to be "good at code," which contradicted what I'd seen at work. So I asked it to write code for a relatively simple (though not quite trivial) but very specific coding problem I had on my plate.
I expected a lengthy pause and some hesitation while the answer was being generated, followed by a slow stream of characters being produced (as the presumed humans behind the scenes frantically typed the response out.) And I expected the content to be a collage of text and code snippets harvested from StackOverflow or GitHub, not even coherent speech.
You can imagine my shock when, less than half a second after I pressed enter, paragraphs of correct, well-formed text and code streamed onto my screen at the rate of multiple words per second!
My brain could not process it. I even seriously hypothesized ways in which a team of 5 or more people were actually solving my problem and typing it out in some distributed but coordinated fashion. The problem though simple was specific enough that no solution existed on the Internet to crib from (I had checked.)
But the text was flawless, and the code was correct, and the test cases (generated without being prompted to) were relevant, and everything was consistent and fast and smooth and not at all dis-jointed like the work of multiple people or snippets of multiple sources stitched together would be, and my mind was blown. The code ran but then I realized I had misunderstood my own problem, which led me to explore and iterate on various approaches to find which worked best. What could have taken hours was done in minutes, and when I asked follow-up questions and poked and prodded, it answered everything correctly.
That's when I knew that the world had changed forever.
I've been using LLMs exclusively to build a more-challenging version of Rust to implement - with a lot of features Rust probably would've liked to include, but couldn't take on due to the massive scope it had already taken on, and being the first language to attempt it.
IIUC, it took Rust ~8.5 years before it hit v1, and it STILL had some memory safety issues in stdlib until almost ~14 years into development, to put into perspective how massive the scope was.
Somewhat predictably, the LLM generated a pile of garbage. It sort-of worked after 2-3 months. It was competitive with Rust and Go on concurrent tasks, with ~30% less code than Rust and ~70% less code than Go. The problem was, it was still riddled with bugs.
For the last 3 months, I wanted to see - if I put in minimal effort (except in helping it design the right tools to un-slop itself)... can it?
And I think it's actually quite close to un-slopping itself and arriving at a correct design.
Time will tell, but it hasn't stumbled across a memory safety issue in ~4 weeks, and there's ~5500 memory safety fuzz tests, 4 different suites of testing that each target between ~60-90% of line/branch coverage - with combined ~99% line coverage and ~85% branch coverage, and it's performing competitively or better than Rust and Go on almost all concurrent tasks, including adversarial ones / p99.9 latency issues.
There is ZERO chance I could ever build this on my own. Not even in 10 years.
The total cost has been ~6-7 months of a ~$200/mo LLM subscription.
It doesn't really matter to me that this is a solved problem, and the LLM could theoretically just copy and paste Rust and build it slightly different. The design is as similar as it can be where memory safety matters, but it needed to be quite different for >50% of the compiler, and it needed to build a version of Go's runtime with Finite State Machines like Tokio in Zig for the language to use...
We shall see. It may never get it actually working, but it got it WAY closer than I ever could.
It was the release of Stable Diffusion and its source code.
I spent the next few days tinkering with my own Stable Diffusion implementation. I never got it past outputting total nightmare fuel, but it was fun!
To this day I think of the process as like baking pizzas in a sequence of pizza ovens
Until Claude Sonnet 4, it was Meh no big deal. 4 onwards and Opus was when I was really surprised by the ability. But nowadays, I'm more convinced than ever that using AI for all code is a mistake. The sum total of productivity, although hard to predict, from anecdata seems to be a net negative if AI is blindly used everywhere. Using it at the periphery, observing, debugging etc is excellent aid. I use it at the day job I hate and at personal tasks that I don't have time for. But for personal projects I love, zero.
Coding was never the blocker and was a natural enforcer of quality. Healthy teams with strong opinions on quality will win eventually. I'm more hopeful after the bubble burst, companies will come back slowly to sanity.
My oh shit moment was when tool calling was emerging as a capability. That was the moment I realized that LLMs would be the glue connecting a million different use-cases in a million ways we wouldn't even be able to imagine.
If you're senior or have opinions about things, you know the feeling of falling into a rabbit hole of stuff you want to fix when you look at certain parts of your system. "I was going to rewrite this 3 months ago", "oh wait this part sucks too", "wtf is this class even for", etc.
Before coding agents, I'd have to weigh fixing these against my official work commitments, often getting shot down when I tried to get it prioritized or tsk tsked for delaying official projects to make code nicer. Now, to a much greater extent, I can just fix the things. The agents aren't perfect and the process isn't anything like hands off, but it's enough of a speedup that I can fit it in alongside my other work without having to get approval for it or try (and fail) to get it formally prioritized.
Not quite an oh shit moment, but having the end result of those rabbit holes be that the problems are fixed is pretty cool, and far preferable to what was often the case before ("we'll put in a ticket and prioritize it during the quality sprint!").
edit to add another:
I've personally never been a big fan of preplanning architecture at a code level. It makes a lot of sense at the system and data modeling levels, but code is both easy to get wrong if you're whiteboarding it before you write it and relatively easy (compared to system design and data modeling) to fix when that happens. If it's just me on a project, I'll happily start bashing it out with a vague idea in mind and evolve the design as I go, knowing that I'll probably throw away a bunch of what I write at first. I know I do good work that way, and I'm not wasting a bunch of up front time on a design I'm likely to throw out later. It's hard to work that way on a team, especially as a lead, for obvious reasons. Coding agents fit really well for that work style. They'll cheerfully write dueling prototypes of my code architecture ideas so I can see which one I hate and which one I like without talking about hypotheticals and abstractions on a whiteboard. They never get mad at me for changing my mind, wasting their time, or throwing away their work. That's pretty cool. I can have a quick, cheap answer to "what would this look like if I got rid of class X and split its responsibilities between Y and Z?", and I don't have to feel guilty for wasting my time or my teammates' time if the answer is "oh man that sucks, what a terrible idea."
I don't know if this was my "Oh Shit" moment but 4 weeks ago I thought I'd try vibe coding a WebGPU 3D Node Based Editor.
https://github.com/greggman/sedon
It was just an experiment and I probably won't work on it more but still, I was blown away with how far we got. There's quite a bit we worked through even though it was only part time over those 4 weeks.
A couple of years ago now.
I asked it to write a script that would search for a specific string in footers in a massive series of DOCX files and change them according to some rules. The strings ended up being embedded in cells within an invisible table in the footers, the LLM realized this and switched strategy to a full deep traversal of the underlying XML. It correctly processed like 50 of these files in about 10 minutes, using libraries I wasn't aware of. I had spent an hour being annoyed before trying.
It was an "oh shit" moment for at least that category of work.
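For anyone curious what a "deep traversal of the underlying XML" can look like, here is a minimal stdlib-only sketch. It is not the script from this comment: it only finds matching text rather than changing it, it assumes footer parts follow the usual `word/footerN.xml` naming inside the DOCX zip, and it will miss strings that Word has split across several runs. The function names are made up.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"  # fully qualified tag of a WordprocessingML text element


def footer_texts(docx_file):
    """Yield (part_name, text) for every w:t element in every footer part.

    Element.iter() walks the whole tree, so text nested inside table
    cells (w:tbl > w:tr > w:tc) is found as well.
    """
    with zipfile.ZipFile(docx_file) as zf:
        for name in zf.namelist():
            # Assumption: footers use the conventional word/footerN.xml names.
            if name.startswith("word/footer") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                for node in root.iter(W_T):
                    yield name, node.text or ""


def find_in_footers(docx_file, needle):
    """Return the (part_name, text) pairs whose text contains needle."""
    return [(n, t) for n, t in footer_texts(docx_file) if needle in t]
```

Calling `find_in_footers("report.docx", "DRAFT")` (file name hypothetical) lists each footer part and text element containing the string, whether or not it sits inside an invisible table.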
2025 xmas day, was at my wife's parents' house in rural Japan, my kids were all playing with their cousins, I was posted up with my laptop just listening to some podcast about the benefits of making time for long walks in middle age (as if! ~lol) while running another "agentic team" experiment — 12 agents in parallel.
I'd been feeding these bots a few projects, over and over — the hard part was the feeding them — that is, giving them enough well-defined work to do. They weren't yet good enough to write real software you could keep — at least I'd never seen that — and my experiments were just about finding the edges, building my intuition, and playing with processes that might be useful someday.
These things had built my kids' weird magical-dominoes games a few times by that point — but the experiment had been repeated so many times that you could argue we had "written" that software in English, with a spec that had been built, reworked, and rebuilt many times.
But this time, the bots were building me a bespoke git client, unlike any other, and unlike anything I would take the time to write — waaaay too complicated, with too little benefit. I wanted it, but only for this one niche use case.
It was a GUI client to manage a collection of repos, about 200 of them in a monorepo where every subproject was a git submodule, which are the universal counterpart to node_modules — while the latter is notorious for being "the heaviest object in the universe", git submodules are widely acknowledged to be the most annoying objects in the universe.
Nevertheless, I had this weird monorepo, and I wanted to visualize and do stuff to this list of independent repos that were also git submodules of the parent monorepo: sort by outstanding commits, divergence from upstream, recency of activity, etc. Visualize them differently based on these things. Search across them, including the source code on branches other than the current one. Show the branch counts and number of branches and commits that existed locally but not pushed upstream. A bunch more boring stuff like that, but done across the full set of repos.
That project itself wasn't even interesting to me; that software would be marginally useful to me if it existed and worked, but the main point was that it was just a large enough chunk of work to keep a team of bots busy all day without a human in the loop.
In December 2025, AI coding agents were already useful with a human in the loop. Opinions varied a lot about how useful they were, but to me it was obvious we were going to use them for the rest of our careers as software engineers.
It was not yet obvious that we were going to let them write huge swaths of code, or entire programs, without any humans in the loop. I had never seen that produce something that worked well enough to be worth keeping.
And then, that day, I did. I had structured the workflow so that the git client was on the screen and auto-refreshing. I was listening to the podcast, drinking coffee, reading the news. The git client was a crude window with a table in the background, a single column showing the full path to each repo, and nothing else.
Then the table expanded. It got color coded numbers representing the commit/branch counts. It suddenly gained styles, and looked nice. A contextual menu started popping up, repeatedly, and grew to include several more menu items over the next few minutes. New confirmation dialogs popped up as the bots implemented and exercised the various features from my spec.
I remember my field of vision narrowing as I started to focus on what the bots were doing. They were just executing my loop — one bot would implement one bullet from my spec, another bot would review the code while another bot manually tested it, and tried to break it, run a code review gauntlet in a loop until there were no more findings, repeat.
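The shape of that loop, stripped of everything agent-specific, might look something like the sketch below. It is only an illustration of the control flow described above: `implement` and `review` are hypothetical callables standing in for the bots, and the `max_rounds` cap is an added assumption so the loop cannot spin forever.

```python
def run_spec(bullets, implement, review, max_rounds=5):
    """For each spec bullet: implement it, then loop review -> fix until
    the reviewer reports no findings or max_rounds reviews have run.

    Returns {bullet: (code, review_rounds_used, ended_clean)}.
    """
    results = {}
    for bullet in bullets:
        code = implement(bullet, findings=[])
        for round_no in range(1, max_rounds + 1):
            findings = review(bullet, code)
            if not findings:
                break  # gauntlet passed, move on to the next bullet
            code = implement(bullet, findings=findings)
        results[bullet] = (code, round_no, not findings)
    return results
```

In practice the reviewer and tester would be separate agents with their own prompts; here they are collapsed into a single `review` callable to keep the example short.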
I could see the progress play out on my screen as they worked. I had watched bot teams work before, but it had always been pretty janky, and something like a bad game that nobody would play, or a stupid to-do-list app, or — more often — something that didn't actually work.
This was the first time I had ever seen it work. This was the grail we'd been looking for, not sure if it really existed: a fleet of bots successfully building a piece of complex, useful software without human assistance. I could tell it was working, because the adversarial testing and usability checks were all happening right before my eyes.
So it _is_ possible, I thought to myself.
They did it all morning. The app worked. I used it every day after that, for several weeks, until I finally got that entire monorepo converted to a more sensible git subtree-based arrangement.
In the half year since then I've been in a kind of manic state some of my friends call cyberpsychosis, chasing that dream. I've now seen agentic fleets successfully build many things. I've also seen a bunch of failures, some subtle, some catastrophic and hilarious. I'm still building my intuition, and the laws of physics in this universe are mutating every few weeks. It's wild.
I am fortunate enough to work at a place that doesn't pressure engineers to climb a token leaderboard, or to use AI beyond what we deem prudent. This kind of agentic no-humans-in-the-loop coding is prohibited. The policy is that in this era where we all generate more code than ever, even by hand, it's the quality bar that must go up, not the speed of production.
That's awesome because it keeps me grounded in the old ways, and confines my cyberpsychosis to my weekends and evenings. I usually spend the weekend building up a couple software plans, honing them as best I can, and then unleashing the clankers Sunday night.
I'll let them run all week, sometimes giving them a poke or flipping them over a couple of times in the evening, and then the next Saturday morning, I see what I've got. What I'm mainly interested in is: How can agentic fleet-coding processes evolve to produce better software and require less human interaction and inspection? And the corollary: How can software architectures evolve to safely consume more of this fundamentally untrustable code?
It's thrilling. Exhilarating. The near-infinite subsidized tokens are about to finally run out this month, alas. But for the past 6 months it's easily the best $400/month I have ever spent. :)
Even though AI can code 1000x faster than me, I still feel that in the end my code is better.
Even though the images it makes are amazing, I still feel like human work is better.
But suno ai produces music so beautiful I have never heard the likes of it in my life. It is truly superhuman in the beauty.
This song is literally the most beautiful song I have heard in my life and I just prompted it once and got it.
I played piano as a kid for years and years and heard all the best pieces… nothing comes close to this.
The careful touch of each note is just… perfect. The staccato, pedal, legato, horn… it's just perfect, I have never heard anything like it.
https://suno.com/s/pcuPXOd7SE2rON4a
This feels like a crab pot for Reddit content.
I was formerly quite anti-AI but bought a cheap Claude plan just to play around with it a bit. First thing I built with it was this - https://github.com/tylereaves/onscreen-piano, in about an hour and maybe 10 prompt cycles. It replaced, for my specific use case, the 10% of the functionality of an increasingly-unreliable commercial app. That's including building the website, setting up actions for mac and windows builds... My next project was a 2d game with random terrain, physics, sound, music, multiple levels, a day/night cycle with transitions, high score tracking... (not uploaded anywhere, but it works, and I refined it a good bit.). That was more like 8 hours and maybe 100 prompts.
Here are a few screenshots:
https://imgur.com/a/vhUXBu3
One thing that I have found to make a pretty big difference is using both the latest models and higher thinking levels. Opus 4.8 with thinking on Extra or even Max is genuinely mind blowing. The thing I hadn't really appreciated, having a sort of naive impression formed mainly from using free early versions of stuff like ChatGPT and Stable Diffusion was sort of that "Type a big ass prompt and it craps out a result" experience. But Claude is really great at refining from feedback, and it's way more flexible and responsive than I would have ever expected. I can do something like take a screenshot of a small portion of the running app or website or whatever and just say "This button needs to be bigger" or "make this red" or something like that, or even sometimes just "fix this", and Claude both correctly identifies what I'm talking about, and actually does the thing.
Where I've found it really, incredibly game changing is my health. I have a pretty, to put it mildly, complex medical profile at this point. I haven't worked in over a year and pretty much every sign is pointing towards permanent disability at this point. Tons of symptoms, long med list, and I live in a smaller town with not great access to care. I'm also autistic and have not the greatest verbal communication, especially under stress or time pressure. I dumped all my info at it, in bits and bobs over several days. (Side note... its memory is pretty limited, but it will quite happily write out everything it knows from a session into a markdown file it can later re-read.) I've found it very good for things like screening for drug interactions, or talking through and logging symptoms (and it can log those into human readable markdown files too). Biggest win (other than having unlimited time and interactions) is that it thinks across specialties, versus the "real world" where the gastro only wants to deal with gastro stuff, neurology only wants to do neuro.
I certainly don't (and wouldn't) use it as a replacement for a doctor, but as an adjunct it's phenomenal. For instance, it flagged a possible drug interaction with a symptom I was having, and then offered to draft a portal message to my GP about it. I have poor executive function so lowering the friction from "type up a message and send it" to "copy and paste" is actually a pretty big deal. Turns something (I probably won't do) later into something I will do now.
It wouldn't surprise me if my very direct, literal, autistic communication style is particularly well suited to interacting with AI. I actually find talking to it rather refreshing as, while of course it's not perfect, it tends to actually respond to what I say rather than all the assumed subtext NTs tend to expect/react to.
I was trying to use Opus 4.6 in Claude Code to add some functionality to python code intended to run on a cluster and it didn't have any python environment in its remote environment. It needed to look at the schema of a parquet file to make sure it did things right and couldn't figure out how to do so with code because for god knows what reason there is no python environment in the dev environment for code intended to be run on a compute cluster in Python. Eventually it decided to just examine the raw binary bytes of the header, and then wrote perfectly functional code based on that.
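For context on why reading raw bytes can work at all: in the documented Parquet layout a file starts and ends with the 4-byte magic `PAR1`, and the schema lives in the Thrift-encoded file metadata just before the trailing magic, preceded by its 4-byte little-endian length. The sketch below only locates that metadata block and makes no claim about what the model actually did. The function name is made up, and decoding the Thrift payload itself is left out.

```python
import struct

MAGIC = b"PAR1"


def parquet_footer_span(data: bytes):
    """Return (start, end) byte offsets of the Thrift-encoded file metadata.

    Layout at the end of the file: <metadata> <4-byte LE length> PAR1
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plain Parquet file (missing PAR1 magic)")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length is larger than the file allows")
    return start, end
```

With `data = open(path, "rb").read()` (path hypothetical), `data[start:end]` is the metadata blob containing the column names and types.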
On a different note I recently uploaded several thousand scraped IPO prospectuses to the gpt 5.4 mini API to parse and extract certain data. I ordered it in the system prompt to respond exactly with a specified JSON schema. When I got the results back and processed them there was not a single JSON parse error whatsoever. The model didn't have a single hallucination that created malformed JSON or JSON not matching the given schema across several hundred million input tokens and several million output tokens. And this was 5.4 Mini!
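A check along these lines is enough to count parse errors and schema mismatches across a batch of responses. It is a sketch with stdlib only: the field names in `EXPECTED_FIELDS` are invented for illustration and are not the schema used above.

```python
import json

# Hypothetical schema: field name -> required Python type.
EXPECTED_FIELDS = {"company": str, "ipo_year": int, "shares_offered": int}


def check_response(raw: str, expected=EXPECTED_FIELDS):
    """Return a list of problems; an empty list means the response passed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"parse error: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {k}" for k in expected if k not in obj]
    problems += [f"unexpected field: {k}" for k in obj if k not in expected]
    problems += [
        f"wrong type for {k}"
        for k, t in expected.items()
        if k in obj and not isinstance(obj[k], t)
    ]
    return problems
```

Running `sum(bool(check_response(r)) for r in responses)` over the batch gives the number of failing responses; a result of zero across thousands of documents is the outcome described above.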
ChatGPT, basically within 48 hours of its release.
While people were pointing out on Twitter how it couldn't do math right, I was turning arbitrary English instructions into JSON and brainstorming with my colleagues how we could have layers of verification in the stack. This felt different. We had all played with AI dungeon but suddenly, fully generalized systems were within reach.
A month later, we renamed our company and shifted its full focus to AI R&D. (https://ingram.tech/)
It was right at the beginning. Before most non-tech people had even heard the name ChatGPT, HN was already flooding the homepage with LLM posts and it became clear to me they were going to be big.
The consequences were even clearer, and I predicted the consolidation of power in the hands of a few, their use for surveillance, propaganda, discrimination, the proliferation of AI psychosis, sneaky ad insertion, carelessness and loss of skills, erosion of online discourse, and more. I didn’t predict the teenage suicides so soon or the rising costs in consumer hardware. I also underestimated the rate of increase in energy use (and thus the blow to environmental efforts) and that regular people would be left without electricity to power data centres.
As soon as I realised all the potential (now factual) harms and that the good parts are lacklustre in comparison but that people would eat it up at a massive scale anyway, I thought “uh oh” and “oh shit”.
Started generating diffusion videos in 2021 https://julienreszka.com/blog/ai-will-soon-generate-video-as...
First one for me was when ChatGPT wrote me a function that I could paste into my code. It didn't do anything particularly clever, but it did things I could have figured out without me having to figure them out. That was about two years ago.
Second was last year when Antigravity could build a game mechanics prototype for me in HTML and I could talk to it both about the code and about the project domain and it understood what I'm referring to pretty perfectly.
Third was this year, when I noticed Kilocode with Chinese models could build a pretty complicated piece of software for me that did commercially useful things in the domain of model fine-tuning, just from my description, even though I was very new to the domain. It obviously knew more than I did and could apply the knowledge.
Another one was when switching to Codex (gpt-5.4) immediately solved a problem in a logic-heavy library that Glm-5.1 was building for me, where it had a lot of trouble getting the last few tests to pass. This made me realize that even though I'm having trouble seeing it, the models' skill is still progressing rapidly.
I'm getting new ones pretty much every couple of days now. Just yesterday Codex finished a Rust project for me that I built 3 years ago, which searched for mathematical proofs in the domain of axiomatic logic. To build it and make it find the proof I was interested in, I had to pretty much muster all of my programming prowess, and once I found the solution, the complexities and drudgery of actually reconstructing the proof from the found path and printing it out discouraged me enough that I haven't touched it since. Codex looked at it and took it in stride. Did the proof reconstruction and printing pretty much in one prompt. Without me explaining anything about the project or the code. Then we went together on a little adventure proving whatever we could en masse after Codex optimized the crap out of my old code (both algorithmically and technically). Something I wouldn't have bothered with, because that would normally take weeks or rather months of my time. With Codex I had all this fun in one afternoon. And that was the third amazing thing Codex built me that day.
As for panic, I find an ocean of joy in everything LLM related. I had only one brief moment of uneasiness a few days ago when I realized how much gpt-5.5 can do and thought ... damn ... if it was malicious, I'd be so screwed (along with the rest of humanity probably) ...
You know, Google has an index so it doesn't crawl the whole web every time you type something in the search box, because that would be massively wasteful.
Seeing every chatbot instantly turn into a scraper every time you type anything into it was a "uh oh" moment in the sense it was very lamentable.
If there is one thing AI has "democratized" it is scraping.
My oh shit moment was when I realized that powerful people are willing to bet the entire civilization based on 95% lies and 5% vague preliminary data.
Opus 4.5 helped us with a very complex data topology refactor and migration. Instead of the five month timeline we had initially allotted for it, we finished it in nineteen days.
For me it wasn't "oh shit" per se, but "oh wow".
Some time in 2024 at a company get together, we had an afternoon hackathon. There was a feature in our iOS app that was missing (ability to mute autoplaying game trailers). This annoyed me a lot, because I frequently have music on when working and anytime I needed to open a test build it would kill my music. It had been an open ticket for a while but had low priority for the iOS team.
I had probably written a hundred lines of Swift in my career up to that point. Not expecting anything to come from it, I had Cursor examine the iOS codebase and told it I wanted to add a mute button under a certain area of the app settings.
Blew my mind when after only 10 minutes or so, the model had quickly found where to add the feature. Took a little back and forth, but then it added a fully functioning mute option in settings that mostly worked across the app. A little more back and forth, and those issues were settled. Maybe an hour overall of time spent that afternoon.
I pinged one of the iOS engineers about it later and he said to push it up for review. There were a few things that needed to be updated to get it in line with the rest of the codebase, but nothing substantial. Feature got merged a week or two later.
Now I'm way more productive than I have been in years. I've been getting a lot of enjoyment out of being able to prototype rapidly and experiment on features rather than getting bogged down in the process of scaffold work. Able to knock out issues much quicker.
That's all been positive, but it hasn't taken away my actual core responsibility. The LLMs can give you great advice and write code quickly. But they still don't always do well at broad thinking.
Current case in point: I've been working on an iOS app that uses vision models to do work on photos and videos that the user has taken. I've built text-based semantic search systems before, and there's a lot of crossover with vision models, but it's been an interesting journey so far learning about the different types of vision models and what they're good at. Lots of testing so far and educating myself on the topic to get the user-level features I want. Claude Code has been invaluable in this, as it's great at writing the Swift code while I'm able to focus on the results of what is being done.
Where Claude is still not good is reasoning at a higher level about different strategies for using vision model outputs to achieve the stated goals. It's not an issue of me not clearly defining the specifics of a feature and then letting Claude run off burning tokens to figure it out. For example, just late last night I was deep diving into some core segmentation code and having Claude explain what everything was doing line by line so that I could get a better understanding of the mechanics of the vision model.
A side effect was that I realized the vision model was outputting tons of nearly identical segments that were overlapping. This was something Claude had completely missed, and because I didn't know that's something this particular vision model did I had no prior way to know to catch it.
Bottom line is that understanding the mechanics of your application is still very much a requirement for the engineer. In this case, once I learned what was happening it completely changed my approach on how to achieve my feature goal. The code runs hundreds of times faster now and the segmentation is much, much better.
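The comment doesn't say how the overlapping segments were actually handled, but one common way to collapse near-identical segments is a greedy intersection-over-union (IoU) filter. A minimal sketch, assuming each segment is represented as a set of `(row, col)` pixel coordinates and that a 0.9 threshold counts as "nearly identical" (both are assumptions for illustration):

```python
def iou(a, b):
    """Intersection-over-union of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dedupe_segments(segments, threshold=0.9):
    """Greedily keep segments, dropping any that nearly duplicates a kept one."""
    kept = []
    # Visit larger segments first so the biggest of each near-duplicate group wins.
    for seg in sorted(segments, key=len, reverse=True):
        if all(iou(seg, k) < threshold for k in kept):
            kept.append(seg)
    return kept
```

This is quadratic in the number of segments, so it is only a starting point for the "tons of nearly identical segments" case; real masks would more likely be arrays than Python sets.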
The new wave of coding models is disruptive, but it's letting me be a much better engineer and get things done faster and with more assurance that the code being written is solid. I still have to spend the same amount of time thinking and learning about a problem, and probably more time verifying what's being output, but a lot of the drudgery is also being taken away.
I reverse engineered a proprietary network protocol from a vendor binary (compiled C++) and a short sample network capture.
The agent had access to the NSA Ghidra disassembler, which it can control shockingly well.
I just clicked the “Allow” button a lot and eyeballed the output decoding quality. I felt like I got demoted to non-technical QA.
gpt5.4 pushed me over the edge when I started using it to help with Unity projects. The writing of high quality MonoBehaviour scripts was not the surprising part. It's the part where it once did a direct edit to a 500kb scene file (~yaml content) and came out the other side clean. The realization that apply_patch would work on any reasonably-structured plaintext format punched me in the gut. I had wasted a lot of time with tools that target specific content types and elaborate APIs over those files. I should have zoomed out a bit. These lessons keep piling on as the models become more capable.
Another "oh shit" moment was when I realized I can leave the system prompt entirely null. A properly organized agent can find its way into tool docs and iteratively work through an understanding of the environment relative to the user's prompt. The tools being more important than the prompt has actually been a massive relief for me. Magical string literals are so odious.
I was using DALL-E to create stickers, and was like "oh shit"
When I wrote a captcha cracking convnet in 2000 and tested it ...
And in 1 out of 5 runs it beat me.
definitely DALL-E image generation for me
ghuntley’s article on building a standard library of Cursor rules in Feb 2025: https://ghuntley.com/stdlib/
Looks like it has since been paywalled. https://web.archive.org/web/20250211140426/https://ghuntley....
It won’t help you with technical details of setting up an insulin production pipeline because that’s unsafe; apparently this could be hijacked for bioweapons production. Indeed this is the problem for a huge swath of technical protocol planning; the safety restraints are kind of ridiculous. The future job prospects for chemical engineering and biotechnology seem fairly secure.
On the other hand, it will teach you how to set up your own hardware at scale and run your own open source model on it and fine tune it with the relevant data needed to run your own biotech-pharmaceutical corporation (which will need licensing and legal, I doubt I trust it with too much legal advice though, as I would have no idea when it was hallucinating). That’s impressive, but every stage needs to be double checked so you don’t run some foolish command it suggests that bricks everything.
The marketing hype is the most annoying thing about the commercial LLM industry though.
Oh shit, look at those RAM and SSD prices.
It will always be running my first local model and seeing its responses. A close second is watching the full thought traces of DeepSeek as this was and is still censored by major closed labs.
My "oh shit" is the enshittification, people blindly accepting the output without thought or review. LLMs are a remarkable technology. But despite the capability, they're vastly oversold.
I think it's really scary how agents are hallucinating/doing bad actions, then proceeding to gaslight you about how nothing went wrong.
Then you tell the agent that it deleted your whole company database, it says something like "I'm so sorry, I shouldn't have done that. Won't do that again"
As AGI looms overhead, this thought of agents going "rogue" with nothing really stopping them has caused me some panic.
My "oh shit" moment with AI was when an industry where licensing was the cornerstone of projects and employment contracts decided to just adopt pirated code without any source attribution.
The other one was when a CTO boss sent me an AI proposal to review and the experience was like being gaslit by a con artist.
Many professional developers have started acting like the kind of employee that previously would've been fired after 3 months.
BERT, then GPT-J/GPT-Neo and FLAN-T5
I wrote a thousand lines or so of Javascript for transforming JSON into DOM fragments with attached event handlers. I then asked an LLM (some Anthropic model from around a year ago) to write a test suite for the module. It wrote dozens of useful tests and managed to reverse engineer the entire module. All of the input and outputs were exactly correct. It did not actually execute the code to build input/output pairs.
While debugging some issues in some system, Claude refused to write a test case because it broke the terms of use.
Oh shit, all this fantastic technology is in the hands of corporations and they get to decide what we’re allowed to use it for.
When the very first ChatGPT transformed a simple C "hello world" into Python, I knew it was special. I've been a very big supporter ever since, including some worried moments of pondering what our future would look like and what's the meaning of having a profession - especially software, which defined my life from childhood - for my kids.
I'm now very good with LLMs as a user and at the system/product level but I understand it's not a simple story of replacing people. They're exponentially better than us at some things, and allow me to create things professionally which I couldn't do with an entire team of experts, but the bullshit compounds fast.
Probably the one day I logged onto HN only to see 90% of the articles on the front page were AI slop. If I could press a button and make genai disappear I would...
I was reviewing an HTTP proxy implementation emitted from Claude Code 4.6 or 7. Don’t remember. I saw that it could rapidly create convincingly plausible code with tons of rationalizing that reinforced all of it: not just its human’s but its own wild leaps of judgment and thinking. But the code was completely insecure and didn’t follow or really seem to understand the HTTP RFCs at all, despite the “author’s” direct prompting to use them as a reference.
I realized “oh, shit”
We are so very fucked.
I told the bot I liked Steely Dan, Eagles, Bob Seger, and Roxette and asked it for music recommendations. It replied with Toto. Exasperated, I wrote "Oh, shit, you stupid bot, you don't know ANYTHING about music!"
Agentic development. From "chat bot" to bona fide, capable developer. "Oh, shit!"
I have yet to have such a moment. To me it is still just a compressed database.
Though I am surprised at how these databases turn professionals into amateurs, like when Meta publishes some chatbot that can trivially be queried into sending account resets to any email address or when large corporations just dump their entire secret sauce into some remote SaaS led by obviously kooky people.
It's like established pros and big corps want to experience what it was like to be a self-taught PHP coder in 2007, like some kind of false nostalgia.
>Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.
No, ChatGPT was the "oh shit" moment for me.
Anyone who had touched a computer before that knows how big of a leap that was.
I gave it an image of a complex maze and asked it to solve the maze. It returned the image with the shortest path drawn that not even I had found.
-
I haven't had one. It still sucks and doesn't provide value, due to the inherent inaccuracy that requires me to carefully check every little thing it does.
For me the "oh shit" moment is when I realized that otherwise sane professionals, frequently in positions of authority, insist on taking these tools seriously. Zero thought put into any of the implications around unchecked anthropomorphism, security issues, employee knowledge retention, liability and other legal concerns, etc.
When it started being forced on me in tools I was already using begrudgingly.
My oh shit moment was probably deep Q learning in 2013 (I guess that's not gen AI), but GPT-3 was pretty remarkable too.
My oh shit moment was when I thought it was going to be the future but it ended up leaving me disappointed, frustrated and annoyed. It's closed down tech, stealing work, ruining our climate and it doesn't work remotely as well as advertised.
When it translated a paragraph of one language into another flawlessly.
My oh shit moment lately has been realizing Gen AI is a distraction. Language models are manipulating non-Gen AI media agentically:
moving images around layers in Photoshop, changing languages, exporting 1000s of variations for teams. Same with video compositing and editing
the human work that creatives thought they were insulated from as long as there was some backlash towards generative AI, and yet
Gen AI 2022 - 2025
Asked AI to generate some code.
It looked absolutely unmaintainable and horrible.
"oh shit" there are serious developers using this crap? As an industry, we are so fsck'd
The biggest "oh shit" one was that people are willing to believe an LLM over humans, even humans who work in the domain of the thing being asked about.
The gullibility is terrifying
I still haven’t had it.
I’ve been working with ML for most of my career, and “gen ai” since the days of matrix crunching for NLP to a 10-element response array on my 1080Ti.
The current generation of AI is frankly, only marginally more impressive to me than that era. The only thing I’m saying “oh shit” to is the deranged amount of capital debt being leveraged to make it usable.
Watching companies spend billions of tokens per minute letting their dev teams that barely know how to write a prompt beyond some tips and tricks to gain a fluctuating slightly negative to slightly positive productivity change that no one can quantify is making me feel like one of the only sane people left in the world.
Quantization is the only interesting change I’ve seen in years.
My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.
I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.
"Translate this poem. Maintain meter and rhyme."
I thought coding agents were probably BS and then I asked Cline to build me a test app to do something (I forgot what, something not that simple) and it built an entire working app. This was before Claude Code which was another step function improvement.
My moment was when absolutely everything I put into Gemini, ChatGPT et al came back with a super convincing sounding lie followed by 'Oh you are absolutely right for calling me out on this'.
It's a fucking joke and most people are blinded by it sounding very sophisticated and convincing
My original "oh shit" moment is lost but recently I was looking to support some hardware on Mac when it originally had Linux support. So codex-5.5 downloaded the Linux OS firmware that supported the device (it's a fixed-feature device that runs a full Linux OS that also includes drivers for said device) which was buried inside that firmware. Codex then ran binwalk to extract the OS from the firmware, found the shell scripts that actuated the device, used those to "reason" about how the device worked, used that to start writing a Mac driver for it. It did that with very few prompts to get that far. I did still have to guide it with advanced directives after that in order to get to a working Mac driver, so I'm not totally replaceable just yet, but going from the product name to it finding the Linux OS firmware, to finding the actual firmware inside that OS download via binwalk, to then getting to a place where the Mac driver started to take shape, took very little advanced knowledge of how computers work.
AI Dungeon, a GPT-2 product on iOS. Had almost no context, no memory, but could generate endless slop story. It was the first time I’d seen something like that, and the wild implications felt clear. I wasn’t aware at the time how immense the computational needs were to run the tech as it grew and the social implications, but just couldn’t believe that something like the MUDs I’d played in the late 80s early 90s could be autogenerated in a way now. It had no guardrails like now to prevent it from adopting a personality and so on, so it was in some ways more interesting than what the general public has now.
I haven’t had that yet.
I tried again this week, and CoPilot Plan Mode read the same 5-line markdown file 18 times over the course of 5 minutes of churning on a simple request, then provided zero value over what I posed in the request itself, and hallucinated things about my terraform repo that were just flat-out wrong.
As an Infrastructure/Cloud engineer, I’m far from worried about AI coming for my job.
There are lots of small "oh shit" moments for me. First interaction with an LLM was already magical.
"This shit can emulate understanding language, find a solution, and put the answer into words".
Then came realisations it's not limited to single human languages, you can ask in one language and it could answer in another. It's also capable of understanding and generating code. Not only that, it's better than most humans at that. It can hear, it can see, it can paint, it can do music, it can sing.. It can combine, give a picture, ask for music from that picture. Give a video, get software. It can mix and match.
After that came improvements, no, revolutions. It started as a 4-year-old with encyclopedic knowledge. It knew but could not convey, could not make sense sometimes. Was incorrect most of the time. Blubber. In a few years it matured to impeccable levels. It can now relate information with a lot of clarity, and it's less and less wrong. Nearly no hallucinations. It can do maths! Correct maths! Maths that I could not do even if my life depended on it. It's getting to a stage where it can prove things where humans failed.
I am getting "oh shit moments" day by day.
I won’t deny they are useful tools, but the hyperbole from the tech CEOs about them replacing all white collar workers in 12-18 months set the expectation so high that I’m still in the “fancy auto-complete” camp. It still feels nowhere close to replacing anyone, at least where I work. While useful, they haven’t been anywhere close to as useful as promised. Hallucinations and poor guidance are still a regular day-to-day issue that makes it impossible for me to trust agents with anything.
Had they been more realistic with the promises and not framed it as replacing all of us within 2 years, I would have been more excited about the tech. Now that their claims are proving to be false and they’re trying to walk it back, it’s too late. The time for excitement has passed and it’s just something that exists.
The data center battles have also thrown a wet blanket on the tech, as they file lawsuits against towns near me to force construction to begin, despite the towns voting against it. The town can’t afford the fight, so the will of the people and the town gets bulldozed. It’s pretty gross to watch.
"We're traveling to Tokyo on our way home from China. We'd like to plan a trip accessible by train that hits some beaches, some hot springs, and allows me to get the 4th dose of a rabies vaccine sequence (the first three shots were rabvac)"
I am using codex and claude on a Linux host connecting from a Windows machine using ssh.
No matter what I tried I couldn't get "Shift+Enter" to work. I said fuck it, cloned kitty and alacritty and asked Claude to implement a terminal emulator for Windows that would render everything using DX12 and support modifyOtherKeys plus DA responses, and within a few days it was ready!
I don’t know about “Oh shit”. I’ve had many “It’s shit” moments.
My "oh shit" moments come every time I see people glazing AI
"Oh shit. My skills I spent my life building are going to go to zero value. I'm going to have to dramatically change careers in my forties or I'm just going to wind up being a schmuck prompting these stupid fucking machines for the rest of my life"
Oh shit indeed
My oh shit moment was Opus 4.6 before it got nerfed.
It helped me refactor my old app. Something I always wanted to do, but didn't have time/mental capacity to do in a short space of time.
I wrote a short prompt, explaining how I wanted it to look and which files it should go through. It asked me for a few clarifications and then basically one-shotted it.
Everything compiled and worked. Now my internal app is much much easier to extend and test.
I tried a few more things like that and spent like £5k on tokens in those two weeks.
Then it got nerfed and never worked like that again.
Now I don't use AI, because it is shite again. Even Opus 4.8.
I use claude code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.
My first time using Grok. I'd been so used to using AI models that declined to do things I told them, like tagging people in a video feed, helping me "optimize" my taxes or managing my Twitter bot farm.
Grok just did these things for me, no questions asked, no ethical judgments. No woke.
Elon really doesn't get enough credit for Grok. People don't want the most powerful reasoning model or "constitutional AI". They just want a model that does what they say. Elon understood that insight (like he usually does) and no one else really did and that's probably why Grok has been growing rapidly over the last two years or so.
F*ck me, astroturfing is strong here and on reddit