Codex for almost everything

cjbarber

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company: they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, people's agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting them at the top of search results, taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand its own new features well enough to invoke them (without quite a bit of steering).

4. I cannot get it to show me the in app browser

5. Generating image mockups of websites and then building them is nice

daviding

There seems to be a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non-developers, I'm just not sure 'code' is the right term to use.

jampekka

Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

woeirua

Just reading the comments here it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already do all of this. Codex isn't pioneering these features, it's mostly just catching up.

s1mon

I've been using the Codex app for a while (a few months) for a few types of coding projects, and then slowly using it for random organizational/productivity things with local folders on my Mac. Most of that has been successful and very satisfying, however...

Codex is still far from ready for regular people. Simply moving a folder that Codex has been working on confuses the hell out of it. I can't figure out how to fix "Current working directory missing. This chat's working directory no longer exists". I've tried asking it to fix the problem and it tries lots of terminal commands and screws around with SQLite. Something this brittle is not for non-developers.

ymolodtsov

Tried it out. It's a far more reasonable UI than Claude Desktop at this moment. Anthropic has to catch up and finally properly merge the three tabs they have.

The killer feature of any of these assistants, if you're a manager, is asking to review your email, Slack, Notion, etc several times a day to highlight the items where you need to engage right away. Of course, if your company allows the connectors to do so.

Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.

I'd still say that Claude models are superior and just offer good opinionated defaults.

incognito124

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

mrtksn

Codex is my favorite UX for anything as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However lately the limits seem to have got extremely tight, I keep spending out the daily limits way too quickly. The weekly limits are also often spent out early so I switch to Claude or Gemini or something.

plastic041

Prompt in the second video: "Reduce the font and tagline length"

Now we are using LLM just to adjust font size?

Also third video: "Generate an image for the hero section..."

I can't understand why OpenAI (or Google, or whatever AI company) thinks it's okay to put an AI-generated image in a product description. It's literally fake.

thomas34298

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

overgard

Maybe I lack imagination, but I just can't figure out what I'd use this for. I'm finding AI helpful in writing code (especially verbose Unreal Engine C++ code) as a companion to my designs, but, I really don't want it using my computer. I dunno, I guess the other use case would be summarizing slack or discord but otherwise this seems to me like a solution in search of a problem.

andai

Confusingly, Codex, their agentic programming thing, and Codex, their GUI which only works on Mac and Windows, have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

uberduper

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

darepublic

> Our mission is to ensure that AGI benefits all of humanity.

In order to do this we will eat everyone's lunch.

gchamonlive

Started using https://github.com/can1357/oh-my-pi this week and it makes every other TUI coding assistant look like a toy project. It has a nice UI, yes, but the workflows it comes up with are incredible. They need to do a major overhaul in customisability for codex to come close to it.

frde_me

I enabled the computer use plugin yesterday. Today I asked it to summarize a slack thread, along with a spreadsheet without thinking about it

I was expecting it to use MCPs I have for them, but they happened to not be authenticated for some reason

I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at slack and then navigating on chrome to the sheet to get the data it needs

Like on one hand it's really cool that it just "did the thing" but I was also freaked out during the experience

haritha-j

Interesting that it's restricted to macOS. I know programmers almost exclusively use macOS, but regular folk primarily use Windows for work. I might be a bit biased as an engineer, but even outside of my circle, I mostly see Windows being used. If they're serious about extending from coders to non-technical business users, I would imagine they need to support Windows.

aliasxneo

Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming it's the garbage renderer.

ElijahLynn

Maybe they could use Codex to build a Linux app...

hk1337

I’ve done a lot with Claude and OpenAI both, A LOT, but I’m still a little wary at letting it have too much access so I haven’t tried this feature in either of them.

LukaD

More like codex for nothing. I canceled my $20 plan and won't let myself be bullied into buying more expensive plans to have the same limits I used to have a week ago on the $20 plan. I would not be surprised if this is illegal where I live.

Oarch

"You've hit the message limit, upgrade to Plus for more".

Ok. I upgrade.

"You've hit the message limit, upgrade to Plus for more".

Hmm. They've charged me. There's no meaningful support. I just got scammed, didn't I...

swiftcoder

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

ookblah

pretty much you have to build for humans as the "source" of truth and then have a robust agentic surface if you want to survive as a company. after using linear (for ex.) u can really see how it all fits together: i can be in cli, co-workers in slack, cowork, whatever, and update tasks from anywhere. i refuse to use shit where i have to context switch by going into an app now. posthog is another good example of where it's going. the dirty detail now is that you HAVE to have the actual app so you can still manually look at data and do operations.

lucrbvi

Is there anyone else who feels that LLMs are wrong for computer use? It's like robotics: I find LLMs alone are really slow for this task.

epitrochoid413

Let's see how OpenAI holds up. They'll prolly enshittify or dumb down their models like Anthropic to finally turn their massive loss streak into a profit.

moomin

Wait, did they just send out a press release boasting that they’re bundling Jesse Vincent’s Superpowers?!

kelsey98765431

if it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

Xenoamorphous

Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).

I wonder if there’s something off the shelf that does this?

OsrsNeedsf2P

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

agentifysh

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

techteach00

I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

bughunter3000

First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

solarkraft

Which Codex is this? The open source one that can be built upon or the proprietary desktop app? It looks like the latter.

vinhnx

A simple mental model for Claude's new adaptive thinking is that it is the recommended way to use extended thinking. Adaptive Thinking (wraps Extended Thinking). It applies to Opus 4.7, 4.6, and Sonnet 4.6 and is the default mode on Claude Mythos Preview.

enraged_camel

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

fg137

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is a thing that a software engineer does on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

dhruv3006

I love computer use man

bobkb

Using Claude and Codex side by side now. Would love to just use one eventually.

tommy_axle

OpenClaw acquisition at work.

maybeahacker

I don't think this one did it. time for the real release

sidgtm

They felt the pressure of posting something after Claude 4.7

sharts

Can we get up from our desk and leave our codex session (or claude for that matter) and then continue using it with our iphone while having lunch or commuting on a train?

Without 3rd party tools/plugins.

hyperionultra

Tool for everything does nothing really good.

solenoid0937

Codex is HN's darling now because Anthropic lowered rate limits for individuals due to compute constraints. OAI has so few enterprise users they can afford to subsidize compute for this group a lot more than Anthropic.

Eventually once they have more users they'll do the same thing as Anthropic, of course.

It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

throw_m239339

All of you are ironically completely oblivious to the fact that you're training your own replacement by using these tools, you're even paying for it. Eventually, the companies you work for will just "hire" Anthropic or OpenAI agents in your place and you'll be out of job, no matter your seniority. Mark my words.

tvmalsv

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

hmokiguess

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

jauntywundrkind

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

graphememes

cursor has been doing this for months, welcome to 3 months ago

xpe

Please don't forget that OpenAI's leadership has shown the world what it is really made of.

CrzyLngPwd

"Our mission is to ensure that AGI benefits all of humanity. "

They have AGI now?

shevy-java

> Codex can now operate your computer alongside you

I am getting some strange vibes here ... is AI actually also spying on these developers?

armcat

Is it OpenAI Cowork?

SilverBirch

Just commenting here to impact the controversy score.

tty456

I'm sure it's been said before, but more and more our development work is encroaching on personal compute space. Even for personal projects. A reminder to me to air gap those two spaces with separate hardware [:cringe:]

saltyoldman

Claude had this, the "app" both of them have (not the terminal stuff) are mirroring each other's features.

eduction

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

huqedato

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

thm

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

ex-aws-dude

Can't help but think the surface area for security issues is becoming massive with these tools

TheServitor

Mac only? Meh.

rommelsLegacy

I am quite worried that people are continuing to use OpenAI's offerings just because they work. Everyone here seems to gloss over the fact that this is a project funded by Peter Thiel. Thousands of morality posts, complaints about ICE, Trump, etc., and yet you all choose to use a tool created and funded by the same person enabling this dictatorial machine.

I am speechless every time I see posts like this and the comments following. Vote with your behavior, stop supporting and enabling the Peter Thiel universe. Just a few weeks ago we had an op-ed about OpenAI and Sam. Look into yourselves and really reflect on whom you are enabling by continuing to contribute to their baseline.

VadimPR

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

messh

SSH to devboxes is the exact use case for services like https://shellbox.dev: create a box using ssh... and ssh into it. Now web, no subs. Codex can create its own boxes via ssh

croemer

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

Glemllksdf

Man this progress is fast.

It's clear that it will go in this type of direction, but Anthropic announced managed agents just a week ago, and this, again with all the built-in connections and tools, will help so many non-computer people do a lot more, faster and better.

I'm waiting for the open source ai ecosystem to catch up :/

postalcoder

I wish Codex App was open source. I like it, but there are always a bunch of little paper cuts that, if you were using codex cli, you could have easily diagnosed and filed an issue for. Now, the issues in the codex repo are slowly becoming claude codish – ie a drawer for people's feelings with nothing concrete to point to.

lionkor

The first example is tic tac toe. Why would anyone bother? None of those easy things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, being creative to that degree. They want to hit buttons and see the computer do things and get a dopamine rush.
