Codex-maxxing

skiing_crawling

Is this LLM psychosis? So much tending and conversing with the matmuls, but what was the outcome? Are people who get this into it more successful somehow? It reminds me of people who take drugs and get "revelations" but then are not particularly overrepresented among successful people, for all of their deep insights.

lionkor

I hope I never have to work with people like this. Actual nightmare fuel to live your entire life through LLMs. "I trained Claude to love my wife so I can focus on prompting" vibes.

4k0hz

The author of this post works at OpenAI on the Codex team.

bilekas

> Thariq has a very good post about preferring HTML over Markdown as an output format. I think that instinct is right

I bet you do: working at OpenAI, you get paid for more token use.

grebc

All the AI stuff lately is just like the r/unixporn subreddit, but posted to places where people don’t care about it.

isodev

Anything with “maxxing” in the name is most likely not good for you

manuisin

Codex lags when chats become too long. It barely takes a day before loading certain chats freezes the UI and causes all sorts of issues.

bilekas

> Last week I tried to migrate the Python Rich library into Rust. Because the original project already had a large unit test suite, I could set a goal like: migrate Rich into Rust, but it must pass all the unit tests from the original library.

At what point do we stop calling this development? It's nothing even close to the process of development or engineering. "I tried to migrate X". No you didn't, you tried to ask an LLM and hoped for the best.

I mean, honestly, at what point would you bother? There's no learning happening, there's no creativity happening, just talking to a literal text generator to request your refund while you go for a shower. A novelty, maybe even convenient, but absolutely not development.

esperent

> Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention...

> When I come back to Slack, replies are often already sitting in drafts. I still decide what gets sent, but the expensive part of gathering context is done.

This just feels so dystopian to me. I hope that I never work with you or someone else doing this.

I personally do use LLMs for work messaging, but I'm extremely careful to state it clearly, like "here's a draft for that quotation request that Claude wrote:" or something like that. I would never present that as my own words.

parf02

Most people I know underutilize voice mode. Such a game changer for making brain dumps the LLM can just gobble up

mohsen1

In tsz (https://tsz.dev) I am Codex-Maxxing with this:

Give each Codex session an AgentName and ask it to mark its PRs, issues, and comments with that name. Have one or two "managers" that manage PRs and overall project direction. I write the project directions and create long-lasting issues. Each Codex session has an almost unachievable `/goal`, but it is asked to work toward that goal by landing changes in `main` via PRs.

I have been running about 14 Codex sessions on 4 machines for about two weeks now, since OpenAI 10x'ed my 20x account, and I simply cannot run out of tokens fast enough.

Side note: I have multiple Claude accounts too, but the new Claude Code `/goal` command is seriously broken. It takes long pauses between iterations and sometimes stops prematurely.

nubg

> When I come back to Slack, replies are often already sitting in drafts.

He must be a pleasure to work with

syl5x

inb4 "I got prompt injected and they stole my stuff". Now real talk, there are some viable uses of Codex here, but nothing novel; it's the same old "MEMORY, VAULT, BG TASKS" that everyone is doing.

And about voice mode: I thought it was a good idea, but I seriously don't know how you guys use it. My thoughts whenever I use voice are "aaaaaaaaahhhhhh, uhmmm", and then I cancel it so that I can type and organize my thoughts. I don't really think those "brain dumps" are useful when you are thinking out loud, like "We should really do X, oh wait, but actually Y is in the way and we have to take Z into consideration, but wait, Y was actually done" and so on. When it turns out that your assumptions are wrong, it becomes a mess. I am in favor of having the LLM work with facts and always verify them. To me this post is basically selling the Codex app and that's it, nothing new inside.

ivanbelenky

Something is happening with `codex`. At tamarillo.ai we did a [little experiment](https://research.tamarillo.ai/coding-harness-inspection/) with 400K repos that have AI harnesses configured, and we observed some very interesting behavior:

- growing fast as fuck

- overrepresentation on starred repos (even though stars mean less these days, it is definitely something to look at)

- overrepresentation in `rust`

- in terms of aliveness, codex is first

mwilcox

Slop

m3ch4m4n

lol why is this on the front page of Hacker News?

armada1122

The diff-as-review point is the one I keep coming back to.

The cost of memory-as-files isn't writing them. It's that the agent will cheerfully claim it updated something and not actually do it, or write a one-line stub that satisfies the spec but loses the original signal. Without a verification layer, the vault accumulates plausible-looking entries that quietly drift from reality.

What ended up working for me was treating the agent's self-reported summary as a wish, not a fact. A separate process diffs the actual file system against the claimed changes and flags mismatches.
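A minimal sketch of that kind of check, in Python. This is not the actual setup described above: the names `snapshot`, `actual_changes`, and `audit` are hypothetical, and it assumes the agent's summary can be reduced to a list of relative file paths it claims to have touched. It hashes every file before and after a run, then compares what really changed against what was claimed:

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def actual_changes(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }


def audit(
    before: dict[str, str], after: dict[str, str], claimed: list[str]
) -> dict[str, set[str]]:
    """Compare the agent's claimed paths (an assumed input format) with reality."""
    changed = actual_changes(before, after)
    claimed_set = set(claimed)
    return {
        # The agent said it touched these, but the bytes on disk are identical.
        "claimed_but_unchanged": claimed_set - changed,
        # These changed on disk, but the agent never mentioned them.
        "changed_but_unclaimed": changed - claimed_set,
    }
```

Take one `snapshot` before the agent runs and one after, then pass both to `audit` along with the claimed paths. Anything under `claimed_but_unchanged` is a wish that never reached disk. Note that hashing only answers "did this file change at all"; a one-line stub that overwrites a richer note still counts as a change and would need a separate check.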

After a few cycles, the agent gets calibrated and stops claiming things that don't survive a file check. That has the side benefit of making the diff review itself much higher signal: most of what shows up is real.

The split I'd make early is per-agent instructions vs. cross-thread shared notes.

They sound like the same artifact, but “what this agent should always do” and “what sibling work just learned” age very differently. Mixing them means the wisdom gets stale together.