Easily the most interesting part of this announcement is buried in the second to last paragraph:
"We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed. Access will initially be limited to select customers as we expand capacity."
750 tokens/s on a frontier model is going to be extremely interesting. I doubt this new version is anything but a version bump in terms of capabilities but if we can start getting these answers back faster, they end up being more useful.
Just off the top of my head, I can think of the tedious task of finding certain functionality within a codebase. I usually can't beat an AI agent harness at this task today. If the AI model is 3x faster, I have even less of a chance.
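To put rough numbers on the speed argument, here is a minimal sketch. The 750 tokens/s figure is the "up to" number from the announcement; the baseline of 250 tokens/s is only what the "3x faster" guess implies, and the 6,000-token answer size is a made-up workload, so treat all of it as illustrative.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent streaming the output, ignoring network and time-to-first-token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


FAST_TPS = 750.0             # "up to" figure quoted in the announcement
BASELINE_TPS = FAST_TPS / 3  # assumption: what the "3x faster" guess implies
OUTPUT_TOKENS = 6_000        # hypothetical size of one code-search answer

baseline = generation_seconds(OUTPUT_TOKENS, BASELINE_TPS)
fast = generation_seconds(OUTPUT_TOKENS, FAST_TPS)
print(f"baseline: {baseline:.1f}s, fast: {fast:.1f}s, saved: {baseline - fast:.1f}s")
```

This only counts decode time; tool calls, queueing, and time-to-first-token are left out, so the real-world gap could be smaller.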
HyperL0gi
Here is a trend I'm noticing:
- GPT-5 mini costs $0.25/$2 and will be discontinued in December.
- GPT-5.4 mini costs $0.75/$4.5 and is supposed to be the replacement.
- GPT-5.4 nano costs $0.2/$1.25 and, while it ranks better in benchmarks than GPT-5 mini, it's not even close when you test it in real scenarios.
So you're left being forced to go to GPT 5.4 mini if you use 5 mini today.
The same thing is happening here, as their “Luna” model will cost $1/$6.
Can't we just stay with the models we actually want? I don't need GPT 5.4 mini. GPT-5 does the job.
Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.
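A quick way to check how big the forced jump is: the sketch below uses only the per-1M-token prices quoted in this comment. The dictionary keys are informal labels rather than official API model IDs, and the 10,000-in / 1,000-out request shape is a hypothetical workload.

```python
# (input, output) USD per 1M tokens, as quoted in the comment above.
# Keys are informal labels, not official API model IDs.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
    "gpt-5.6-luna": (1.00, 6.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_multiplier(old: str, new: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive `new` is than `old` for the same request."""
    return request_cost(new, input_tokens, output_tokens) / request_cost(
        old, input_tokens, output_tokens
    )


# Hypothetical request shape: 10,000 input tokens, 1,000 output tokens.
for candidate in ("gpt-5.4-nano", "gpt-5.4-mini", "gpt-5.6-luna"):
    ratio = cost_multiplier("gpt-5-mini", candidate, 10_000, 1_000)
    print(f"{candidate}: {ratio:.2f}x the cost of gpt-5-mini")
```

The multiplier depends on the input/output mix, so it is worth rerunning with your own token counts before drawing conclusions.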
macrolime
GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.
I think GPT writes code the best. How well will it write in version 5.6? It gives me chills.
Recently, I went head-to-head with GPT on nearly 2,000 lines of code, and GPT's solution was superior and faster. I even referenced multiple codebases on GitHub while trying, but they were incomparable to GPT.
So using GPT brings both fear and excitement.
The fear comes from realizing that this level of code is now the average for most people. The excitement comes from knowing that I can now study and learn at this level too.
I'm really looking forward to seeing how much more advanced the code will be with the upgrade to 5.6.
jumploops
If you used GPT-5.5 over the last 24 hours or so, you may have already had access to 5.6.
I've been running some tests on a harness we're building, and suddenly saw a jump of a few points yesterday. I reran the benchmark and saw an ~88% score on Terminal Bench 2.1 from GPT-5.5 on vanilla Codex.
The biggest indicator, beyond the score, was that 3 tests which frequently hit "safety" blockers with 5.5 started succeeding last night without warning.
mohsen1
> Additionally, we’re introducing a new `ultra` mode that goes beyond the capabilities of a single agent by leveraging subagents to accelerate complex work.
I'm curious how this works. Do the subagents also get to use the same tools? Will the client be flooded with tool calls? Why extra pricing for a new "model" when the same thing can happen in the client with more controls?
And if it's an army of subagents, why do they compare it to Fable and Mythos? Those models with a similar harness would probably bench better, I'm guessing.
itomato
What does the relationship between frontier and flagship capability look like when mapped to actual adoption and user habits?
This is like advertising the latest achievements during Space Race, when Johnny just wants a Space Helmet and “friendly futuristic AI robot helping humanity, glowing blue eyes, white glossy body, holographic interface, floating transparent screens, digital particles, neural network background, cinematic lighting, volumetric god rays, ultra detailed, hyper realistic, 8K, masterpiece, award-winning, octane render, Unreal Engine 5, ray tracing, sharp focus, dramatic composition, vibrant blue and purple color palette, futuristic technology, innovation, hope, smiling business professionals, depth of field”
ComputerGuru
“ Terra has competitive performance to GPT‑5.5 [while being 2x cheaper]…”
To me that means “it’s an inferior product but marketing dictates we try and hide that.”
And “our most robust safety stack to date. We strengthened protections for higher-risk activity, sensitive cyber requests, and repeated misuse, and spent multiple weeks finding weaknesses, pressure-testing our system, and hardening it against real-world attacks” is of zero value to me at best, and most likely to my detriment (increasing refusals or nerfing utility). Why do providers keep leading with that? Are there customers (besides support ChatGPT chatbot users, maybe??) that ask for this?
sim04ful
"We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed. Access will initially be limited to select customers as we expand capacity."
This seems like it would be the largest and first closed-source model Cerebras has offered to date
anentropic
Previewing <minor version bump>: a next-generation model
scrlk
> Sol, Terra and Luna
So the next naming scheme might be FTX, Madoff and Enron? :^)
supermdguy
> We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed.
This is really exciting. I work on voice AI, and we're still using 4.1/4.1 mini since none of the frontier models come close on latency. I'm excited to be able to have more interactive experiences, I think it'll unlock new ways of working with these models.
seaal
Did GPT-5.6 Sol Ultra decide the terrible colors for the benchmark graphs?
bluepeter
I feel a bit like a Soviet hearing about Levi’s or the latest Springsteen release. C'mon!
Agent Arena (Dynamic ranking of models on how well they orchestrate tools for real-world agentic tasks, based on signals like tool reliability, task completion, and steerability.)
Top 10, Highest rank to lowest
Claude Fable 5 (High), Claude Opus 4.8 (Thinking), GPT 5.5 (xHigh), Claude Opus 4.7 (Thinking), GPT 5.5 (High), Claude Opus 4.7, Claude Opus 4.6, GPT 5.5, GPT 5.4 (High), GLM 5.2 (Max)
Text Arena
View overall rankings across various AI models in text-to-text tasks across math, coding, creative writing, and other open-ended domains.
The choice of the name Sol is interesting for those Raised By Wolves fans out there… “Praise Sol!”
rappatic
Seems like OpenAI has succumbed to the urge to give their models catchy names like Anthropic does
mekpro
We need more coding benchmark score.
Not sure that winning terminalbench 2.1 alone is a clear win over Fable/Mythos yet.
arend321
For me this is the trigger to start integrating deepseek as a fallback.
ant-kinesthetic
How much dynamic routing do we think is being done here, especially in light of the cheaper options costing 2x less than 5.5? I think learned routing is interesting because it could be the case that it only works as a way to get token and cost efficiency for in-distribution tasks (like these benchmarks), yet in real-world scenarios it could trend towards the same cost as Sol.
corygarms
I'll buy that it's next generation if the svg bicycle pelican is carrying a baby
Topfi
Is this a new pre training run independent of 5.5s or post trained on it with Cerebras support and a rebrand of Pro mode at more usable speeds as Sol? The latter seems more likely to me, especially as 5.5 scales very well across its modes so separate branding could make sense, but I don’t see any clear information either way.
dmzxnico
I saw they are placing this model above Mythos and Fable. It will be interesting to see how well it compares.
I'd really like to see other companies like Chinese ones compete at this level.
Pricing on GPT 5.5 is already super high and having more competition can only help :)
caine22
Insane if it actually beats Mythos, though I know we only had a sneak peek of it in Fable. Nevertheless, W
vatsachak
All of these LLMs are getting better at being an LLM
But GPT-5.5 is as useful as an LLM can be; it has solved lemmas I've thought about for a year, it can implement typed STLCs in Rust when I give it a formal grammar, it can help me analyze Postgres planner dumps.
It's great at tasks that have short solutions but
- they cannot learn based on a project
- their long term planning capabilities are worse than worms
- they are unconfident in decision making
- their internal representations are disgusting compared to JEPA
- they don't have any "system clock" like humans and computers do
- LLM architecture is not modular like computer architecture or human brain architecture
There are so many issues with LLMs. I wish companies would start working on the next generation of architectures before the bubble pops
chopete3
>> We are taking this short-term step because we believe it is the strongest path...
>>During this preview, we will continue testing and coordinating closely with partners as we work toward broader availability.
Instead of generating negative publicity, can't they just wait for the preview period to be over?
Why does OpenAI announce it when they know others can't access it? Curious question: what do they gain from this?
abixb
I like the fact that OpenAI went with a three-part celestial naming convention to one-up Anthropic's literary naming convention. Maybe we'll get Stellar and Galactic someday.
loufe
"Next generation model"
If it was the next generation, why isn't it a major version change..?
NetOpWibby
How are they able to compare with Fable when Fable was only available for three days?
maxiniol
Wondering about Google's multi-token prediction: why isn't this being implemented in every new major model?
Is the 750 tokens/s achieved using this technique?
Cryptosale75
Why is 'Cybersecurity' always the frontier push? Literally no one, except Altman talks of AGI anymore.
Are we starting to see the 'we just realized that 100,000,000 GPU's later, 2+2 isn't the magic number, no matter how many times we calculate it' hit home?
danielabinav160
Benchmarks are nice but what's the latency at scale? That's what actually matters for production.
trkaky
shouldn't I get access to 5.6 on a 200$ account automatically as promised?
andai
Hijacking popular thread to ask: What are the usage limits now for Codex and Claude?
A while back I gave the same task to both, and Codex used 20x less of my 5-hour limit (both on the $20/month plan).
(This annoyed me since I tend to prefer Claude, but the limits at the time made it unusable for anything serious.)
However, since that time, both providers have massively reduced usage allowances (and at least one of them has gotten sued for it, lol).
I'm not currently subscribed to either but I'm weighing my options. With GPT being slightly better than Opus, and having had way higher limits at the time, I'm leaning in the direction of an OpenAI sub. But I'm wondering if the current state matches my memory from 2-3 months ago. (Since both companies appear to be cost-cutting hard!)
Prefer responses from people who use both, but anecdotes welcome :)
Thanks!
bijowo1676
Waiting for @simonw to report on this, before I read and try it
mccoyb
When will GPT-5.6 Protomolecule drop? Me and the boys on Eros can't wait to get our hands on it!
Sathwickp
sol = mythos
terra = opus
luna = sonnet/haiku
basically
sim04ful
Sol and 5.5 pro are in parity at $5 input / $30 output. What I'm inferring from this is that:
- model weight size didn't change, and this is mostly a result of better model architecture and scaled up RL
- better hardware utilization and they're making better margins OR
- worse hardware utilization and they're okay with digging into their margins.
leumon
> We plan to make them more broadly available to people using ChatGPT, Codex, and the API soon.
I hope this means then fable will also get released again.
brown_munda
It is just sad that we are geographically gating the models now. This could lead to more inequality in Software Engineering over time.
jimmydoe
Is there a list of Gov-approved companies?
If this is the new norm, we as workers should all start looking for jobs in those companies.
m3h
If the GPT-5.6 preview is not available outside US government approved "trusted partners", I don't see how the General Availability release can be trusted later.
Who knows what they will fix, block or change in the model between the preview and GA time. Open models can't arrive soon enough.
addozhang
For a large model based on statistical probability, at such a fast speed, if it executes n rounds 99.9% of the time, how much would the accuracy drop?
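One way to make this question concrete is to assume each round succeeds independently with probability 99.9%, which is a strong simplification since real agent steps are correlated and often recoverable. Under that assumption the chance that all n rounds succeed is 0.999^n; the function names below are just for illustration.

```python
import math


def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def steps_until_below(per_step: float, target: float) -> int:
    """Smallest number of steps at which chain success drops below `target`."""
    return math.ceil(math.log(target) / math.log(per_step))


PER_STEP = 0.999  # the 99.9% figure from the comment above

for n in (10, 100, 1_000):
    print(f"{n:>5} rounds: {chain_success(PER_STEP, n):.1%} chance of zero errors")

print("rounds until a coin flip:", steps_until_below(PER_STEP, 0.5))
```

Speed itself does not change the per-round error rate; it just lets you burn through more rounds per minute, so the compounding shows up sooner in wall-clock time.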
low_tech_punk
all the emphasis on cyber security. feels like a reaction to anthropic, not a real next generation.
monster_truck
If this thing is supposed to be so good, why does all of their software still work the way it does? Take a stroll through the most recent several pages of GitHub issues on Codex, there are some fucking embarrassing bugs in there.
zftnb666
GPT-5.6 Sol. 5.7 Luna. 5.8 Mars. Meanwhile my code still runs on GPT-3.5 and nobody noticed.
isomorphic_duck
If Claude Mythos and Fable 5 are the same underlying models just with different safeguards, I fail to see how TerminalBench has them at different scores.
swe_dima
Pleasantly surprised that it costs the same as GPT 5.5,
thank god for the competition.
smeeth
The sooner the USG figures out a standard process for approving releases the better. There are many differing opinions on how much to regulate AI, but I think we can all agree ad-hoc policy sucks.
asmnzxklopqw
Terra and Luna? Last time I had heard that, it didn’t end quite well
binarymax
If it’s a “next generation model” then why isnt it GPT-6 and not just a minor version bump over 5.5?
chapz
If I, as a consumer, can't access it, it might as well be just a marketing hoax. I will believe it when I am able to use it. IDK why companies publish blog posts about stuff that will come out in months...
5555watch
Will it also have hardcoded self-lobotomy if asked about cutting edge ML or LLM solutions? (Looking at Fable here)
nsingh2
I'm really getting sick of reading about safeguards and what I'm not allowed to do on every model release.
nopakos
People were mocking the EU for regulations and now this is happening in the US.
I know that Europe is behind in AI but still...
zkmon
It appears that between GLM-5.2 and GPT-5.6, Anthropic is feeling the heat, at least on the bang-for-the-buck heuristic?
ahmedehab_01
I hope Sol doesn't get blocked like what happened with Fable.
dainiusse
I looked at the charts and it is clear that 88% from OpenAI is more than 88% from Anthropic.
bobkb
Will it be accessible to anyone ?
duggan
> As part of our ongoing engagement with the U.S. government, we previewed our plans and the models’ capabilities ahead of today’s launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly.
The clowns in the US administration can barely remain coherent from one sentence to the next.
Having them be the gatekeepers of technological progress in 2026 is fucking lame.
ddwrll
What happened to the nano/mini/standard/pro naming scheme, which worked perfectly fine and is intuitive to understand? Why does OpenAI insist on having the most inconsistent and confusing model and product names possible?
I'm looking at you Codex.
OsrsNeedsf2P
Like Mythos before it, I'm simply not excited about a model I can't use
mikkelam
Would love to see benchmarks on cognition's FrontierCode
osti
Sol? Looks like OpenAI is jealous of Anthropic's good model naming ability and wants to emulate it.
GodelNumbering
I do not like the fact that this forces people to remember one more hierarchy of "Sol vs Terra vs Luna". OpenAI was supposed to simplify their naming since at least 2025.
josefrichter
Sol, Terra, Luna – crypto disaster vibes
taosu_la
so where is gemini ? are u alive?
solfox
Love the name!
arendtio
I didn't know that I was color blind, but thanks to those charts, I think I need to see a doctor...
I mean, you can read them even without the colors, but who on earth thought that those are a good set of colors? Oh, I forgot it was probably someone on 'Sol'.
micimize
Haven't we established defensive and offensive security usage are intractably entangled? I.e. "patch all [security] bugs, make no mistakes" gives one a list of potential exploits to hand off to less capable models.
Doesn't that undermine all good-faith discourse on cybersecurity safeguards, controlled usage etc? Or is that overstating the case (I'm not a security researcher myself so kinda parroting).
ddp26
I'm going to pre-register my prediction that GPT-5.6 Sol is significantly behind Claude Fable 5, as evaluated by general consensus once time has passed for people to get familiar with both.
CurbStomper
Boring and Gay.
hereme888
Seems like OpenAI's strategy to release models after Anthropic has been paying off.
Is it just me, or does it seem like Anthropic has been more of a pioneer the past few years, and OpenAI tries to copy features they like?
moomin
The language used in this press release is borderline hilarious. It’s simultaneously trying to tell you how great it is while also telling you it’s not THAT great. Nothing to worry about, move along.
casey2
Sol, Terra, Luna? They are trolling (ragebaiting) with their naming now
phplovesong
Is there any model that rivals Opus or Fable? I would like to try something else, as Anthropic is pretty suss.
kissgyorgy
> we expect substantial benefit for legitimate defensive work, while meaningfully constraining prohibited offensive use.
That's literally impossible. Writing an exploit against a known vulnerability needs the exact same knowledge as defending against the exploit of the same vulnerability.
Also, making the model better at code just makes it better at writing offensive code.
simianwords
No comments on the cerebras version that might finally enable intelligent voice mode instead of being stuck with 4o-mini class
rvz
Other than the worst naming I have ever seen (Sol / Terra / Luna), the pricing is still expensive:
> GPT‑5.6 is priced per 1M tokens across three model sizes:
> Sol is $5 input / $30 output;
> Terra is $2.50 input / $15 output
> Luna is $1 input / $6 output.
The OpenAI casino has never been more ready to take your money on gambling even more tokens.
Anyone know the latest around Fable being re-released after gov smackdown?
h4x0rr
FUCK the US government. That's it, I am rooting for China now
delduca
Let us protect the world from a big slop
simianwords
Thoughts
1. Naming convention is copied from Anthropic and honestly is more catchy than a number (amongst normal people)
2. How in the world did Anthropic have to do all the theatrics about Mythos just to have OpenAI release an equivalent or stronger model a month later without any drama???
3. Cheaper models just don’t fit any use case imo and OpenAI knows it so they keep increasing the floor - I’m still convinced task per capability is reduced with each release
4. How in the world would open source models keep up with the multi layer security? Either this security is all theater or we will finally see a ceiling in open source models because by definition they can’t have those protections
5. Cybersecurity things are boring to me because it’s all zero sum cat and mouse games
submeta
Are GPT 5.5 and Opus 4.8 the last models we're going to be allowed to use in Europe? Is there going to be a cut, and we're only going to be allowed to use less capable models outside of the US?
I mean, if they deem Fable 5 too powerful to share with the rest of the world, what's left for us?
throwitaway222
Sun Earth Moon
meetpateltech
Another model family, another naming scheme to get used to.
Sol Ultra ≈ Pro
Sol ≈ Standard
Terra ≈ Mini
Luna ≈ Nano
BoorishBears
> For GPT‑5.6 and later models, cache writes are billed at 1.25x the model’s uncached input rate, while cache reads continue to receive the 90% cached-input discount.
Not them joining Anthropic with this bullshit. *
Caching infrastructure is already a leaky abstraction over a feature that is not as reliable or debuggable to the end user as it should be, charging for the 'privilege' of interacting with it is really annoying.
(* for reference on 'this bullshit': ChatGPT previously didn't require anything special for a basic level of caching. Unless you wanted extended cache times, it'd just "do the right thing" and try to use nodes that had your prefix already cached in memory)
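For a sense of when the write surcharge actually hurts, here is a rough break-even sketch using only the multipliers from the quoted passage (1.25x for cache writes, a 90% discount on reads) and the $5 per 1M input price quoted for Sol elsewhere in this thread. It assumes one cache write followed by hits on every later request with no expiry, and it counts only the shared prefix; the 20,000-token prefix is a hypothetical size.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # from the quoted pricing note
CACHE_READ_MULTIPLIER = 0.10   # "90% cached-input discount"


def prefix_cost_uncached(requests: int, prefix_tokens: int, input_price_per_m: float) -> float:
    """Cost of resending the same prefix on every request at the full input rate."""
    return requests * prefix_tokens * input_price_per_m / 1_000_000


def prefix_cost_cached(requests: int, prefix_tokens: int, input_price_per_m: float) -> float:
    """One cache write, then cache reads. Assumes every later request hits the cache."""
    if requests <= 0:
        return 0.0
    base = prefix_tokens * input_price_per_m / 1_000_000
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1))


SOL_INPUT_PRICE = 5.00   # USD per 1M input tokens, as quoted in this thread
PREFIX_TOKENS = 20_000   # hypothetical shared system prompt plus context

for n in (1, 2, 10):
    cached = prefix_cost_cached(n, PREFIX_TOKENS, SOL_INPUT_PRICE)
    uncached = prefix_cost_uncached(n, PREFIX_TOKENS, SOL_INPUT_PRICE)
    print(f"{n:>2} requests: cached ${cached:.3f} vs uncached ${uncached:.3f}")
```

Under these assumptions the surcharge only costs you on prefixes that are written and never read again; a single cache hit already pays for the write. Cache misses from expiry or routing would shift that, which is the reliability complaint above.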
mrcwinn
AI is just autocomplete. -> AI must be regulated. -> We want AI.
urig
It's only next generation? Anthropic has frontier models! lol
nubg
A question I always have is, how do the AI labs safeguard against leaks of their models? Training a cutting-edge model basically costs a minimum of hundreds of millions of dollars. And it's all contained within a file. Okay, that file might be 500GB large, but it's still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks, have lots of people with access to it to debug it, run inference etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldn't that bankrupt Anthropic?
ALittleLight
I hate not being able to use the latest models. There needs to be a much faster resolution to whatever is happening with the federal government.
da_grift_shift
Flagged activity can also trigger account-level review across relevant conversations and risk signals, consistent with our terms and policies around content retention and review. Looking beyond a single conversation helps our systems distinguish persistent malicious behavior from legitimate dual-use security work, where similar technical concepts may appear in very different contexts.
Fascinating!
Every conversation you have with these "more capable" models will be monitored and joined up and then your entire account might one day be tagged as Distiller or Cyber Threat Actor or whatnot. When combined with identity verification (which isn't discussed in this press release), expect people to be falsely flagged and banned from ever using OpenAI models again.
Wish I could find the thread from last week where discussions of exactly this kind of thing were dismissed as daft and outlandish.
oofbey
Another year, and OpenAI comes up with yet another naming scheme for their models. First it was integers (GPT2, GPT3). Then they added friendly names (remember Ada, Babbage, Curie, Davinci?), but decided against it. Instead we got dot integers (GPT3.5), then letter-number modifiers (o1), plus word modifiers like o1-pro, o3-mini, or -mini-high, or codex, codex-max, Pro, etc.
Now they've got friendly cosmic names. And this time they want us to believe that this time they're gonna stick to a naming convention? I'll believe it when they do 3 releases in a row without inventing a new naming scheme.
masonwan
Guess it's just another price bump hidden behind output token speed.
wonkyfruit
TLDR - It's not quite Mythos but it uses about 5 times less tokens, and those tokens are also cheaper?
they're trying to be anthropic with these model names
ericyd
whoa, a new model that surpasses benchmarks of other models? wild.
johnnyApplePRNG
Doesn't it strike anyone as strange that SOL, TERRA, and LUNA are all quasi-scam crypto tickers?
throwitaway222
Time to create more LLM based startups.
* House design plans from prompts
* Government surveillance of public communication
* Extracting world/spatial concepts from language models (do we really need a world/spatial models now?)
* Driverless City planning startups
* Election vote rigging/harvesting startups
* Video game NPC backstory startups (all NPCs in GTA 6 go to work, go home, shower, go to sleep now?)
Keep moving don't doom.
renoir
GPT 5.5 in Codex is so much worse than Opus, and sometimes worse than Sonnet. I don't think 5.6 Sol will be anywhere near Fable, let alone Mythos. Probably slightly better than Opus. Maybe not even.
JohnRoseDev
I can’t help but think that these benchmarks are completely fake. Sam even posted a benchmark on X a couple days ago of how the ‘complete version’ of 5.5 cyber was already ahead of Mythos apparently. This just feels like absolutely fake nonsense. The impact of Mythos on the industry was clear and in front of everyone’s eyes. The amount of vulnerabilities Mozilla fixed. The vulnerabilities and exploits Anthropic showcased in that blog post about the chrome sandbox escape etc.
And now we’re supposed to believe this 5.5 cyber is already ahead of Mythos, ok. And yeah, gpt 5.6 is even further ahead, alright.
All: for comments on the policy side please go to this related thread:
U.S. government will decide who gets to use GPT-5.6 - https://news.ycombinator.com/item?id=48690101
GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.
https://metr.org/blog/2026-06-26-gpt-5-6-sol/
If it's a new generation why isn't it GPT-6?
Some interesting stats here about the current landscape https://arena.ai/leaderboard/agent
Agent Arena (Dynamic ranking of models on how well they orchestrate tools for real-world agentic tasks, based on signals like tool reliability, task completion, and steerability.)
Top 10, Highest rank to lowest
Claude Fable 5 (High), Claude Opus 4.8 (Thinking), GPT 5.5 (xHigh), Claude Opus 4.7 (Thinking), GPT 5.5 (High), Claude Opus 4.7, Claude Opus 4.6, GPT 5.5, GPT 5.4 (High), GLM 5.2 (Max)
Text Arena View overall rankings across various AI models in text-to-text tasks across math, coding, creative writing, and other open-ended domains.
Top 10, Highest rank to lowest
claude-fable-5, claude-opus-4-6-thinking, claude-opus-4-7-thinking, claude-opus-4-6, claude-opus-4-7, muse-spark, gemini-3.1-pro-preview, gemini-3-pro, claude-opus-4-8-thinking, gpt-5.5-high
The choice of the name Sol is interesting for those Raised By Wolves fans out there… “Praise Sol!”
Seems like OpenAI has succumbed to the urge to give their models catchy names like Anthropic does
We need more coding benchmark scores. I'm not sure that winning TerminalBench 2.1 alone is a clear win over Fable/Mythos yet.
For me this is the trigger to start integrating deepseek as a fallback.
How much dynamic routing do we think is being done here, especially in light of the cheaper options being 2x cheaper than 5.5? I think learned routing is interesting because it could be the case that it only delivers token and cost efficiency on in-distribution tasks (like these benchmarks), while in real-world scenarios the cost could trend towards that of Sol.
I'll buy that it's next generation if the SVG bicycle pelican is carrying a baby
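To make the routing worry concrete, here is a minimal sketch of a blended price. It assumes a purely hypothetical setup in which a cheaper tier hands some fraction of its output tokens to Sol; nothing in the announcement says routing works this way. The per-1M output prices are the ones quoted elsewhere in this thread ($30 for Sol, $15 for Terra), and `blended_output_price` is a made-up helper name.

```python
# Hypothetical model: a fraction of output tokens is served by Sol,
# the rest by Terra. Prices are USD per 1M output tokens as quoted
# in the thread; the routing behaviour itself is an assumption.
SOL_OUTPUT_PRICE = 30.0
TERRA_OUTPUT_PRICE = 15.0


def blended_output_price(fraction_routed_to_sol: float) -> float:
    """Effective price per 1M output tokens for a given routing fraction."""
    if not 0.0 <= fraction_routed_to_sol <= 1.0:
        raise ValueError("fraction_routed_to_sol must be between 0 and 1")
    return (
        fraction_routed_to_sol * SOL_OUTPUT_PRICE
        + (1.0 - fraction_routed_to_sol) * TERRA_OUTPUT_PRICE
    )


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 1.0):
        print(f"{fraction:.0%} routed to Sol -> ${blended_output_price(fraction):.2f} per 1M output tokens")
```

Under this toy model the saving shrinks linearly as more traffic needs the big model, which is the "trends towards the Sol cost" scenario for out-of-distribution work.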
Is this a new pre training run independent of 5.5s or post trained on it with Cerebras support and a rebrand of Pro mode at more usable speeds as Sol? The latter seems more likely to me, especially as 5.5 scales very well across its modes so separate branding could make sense, but I don’t see any clear information either way.
I saw they are placing this model above Mythos and Fable. It will be interesting to see how well it compares.
I'd really like to see other companies like Chinese ones compete at this level.
Pricing on GPT 5.5 is already super high and having more competition can only help :)
Insane if it actually beats Mythos, though I know we only had a sneak peek of it in Fable. Nevertheless, W
All of these LLMs are getting better at being an LLM
But GPT-5.5 is as useful as an LLM can be; it has solved lemmas I've thought about for a year, it can implement typed STLCs in Rust when I give it a formal grammar, it can help me analyze Postgres planner dumps.
LLMs are great at tasks that have short solutions, but
- they cannot learn based on a project
- their long term planning capabilities are worse than worms
- they are unconfident in decision making
- their internal representations are disgusting compared to JEPA
- they don't have any "system clock" like humans and computers do
- LLM architecture is not modular like computer architecture or human brain architecture
There are so many issues with LLMs. I wish companies would start working on the next generation of architectures before the bubble pops
>> We are taking this short-term step because we believe it is the strongest path...
>>During this preview, we will continue testing and coordinating closely with partners as we work toward broader availability.
Instead of generating negative publicity, can't they just wait for the preview period to be over?
Why does OpenAI announce this when they know others can't access it? Curious question: what do they gain from this?
I like the fact that OpenAI went with a three-part celestial naming convention to one-up Anthropic's literary naming convention. Maybe we'll get Stellar and Galactic someday.
"Next generation model"
If it was the next generation, why isn't it a major version change..?
How are they able to compare with Fable when Fable was only available for three days?
Wondering about Google's multi-token prediction: why isn't this being implemented in every new major model? Is the 750 tokens/s achieved using this technique?
Why is 'Cybersecurity' always the frontier push? Literally no one except Altman talks of AGI anymore.
Are we starting to see the 'we just realized that 100,000,000 GPUs later, 2+2 isn't the magic number, no matter how many times we calculate it' hit home?
Benchmarks are nice but what's the latency at scale? That's what actually matters for production.
Shouldn't I get access to 5.6 on a $200 account automatically, as promised?
Hijacking popular thread to ask: What are the usage limits now for Codex and Claude?
A while back I gave the same task to both, and Codex used 20x less of my 5-hour limit (both on the $20/month plan).
(This annoyed me since I tend to prefer Claude, but the limits at the time made it unusable for anything serious.)
However, since that time, both providers have massively reduced usage allowances (and at least one of them has gotten sued for it, lol).
I'm not currently subscribed to either but I'm weighing my options. With GPT being slightly better than Opus, and having had way higher limits in the past, I'm leaning in the direction of an OpenAI sub. But I'm wondering if the current state matches my memory from 2-3 months ago. (Since both companies appear to be cost-cutting hard!)
Prefer responses from people who use both, but anecdotes welcome :)
Thanks!
Waiting for @simonw to report on this, before I read and try it
When will GPT-5.6 Protomolecule drop? Me and the boys on Eros can't wait to get our hands on it!
sol = mythos, terra = opus, luna = sonnet/haiku
basically
Sol and 5.5 pro are at parity at $5 input / $30 output. What I'm inferring from this is that:
- model weight size didn't change, and this is mostly a result of better model architecture and scaled-up RL
- better hardware utilization, and they're making better margins, OR
- worse hardware utilization, and they're okay with digging into their margins.
> We plan to make them more broadly available to people using ChatGPT, Codex, and the API soon.
I hope this means that Fable will also get released again.
It is just sad that we are geographically gating the models now. This could lead to more inequality in Software Engineering over time.
Is there a list of Gov-approved companies?
If this is the new norm, we as workers should all start looking for jobs in those companies.
If the GPT-5.6 preview is not available outside US-government-approved "trusted partners", I don't see how the general availability release can be trusted later.
Who knows what they will fix, block or change in the model between the preview and GA time. Open models can't arrive soon enough.
For a large model based on statistical probability, running at such a fast speed: if it executes n rounds and each round is right 99.9% of the time, how much would the overall accuracy drop?
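One way to put numbers on this, as a rough sketch only: assume every round succeeds independently with the same probability, so the chance that all n rounds are right is p^n. Real agent loops can retry and self-correct, so this is a pessimistic toy model rather than a measurement, and `chain_success` is a name made up for this example.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` rounds succeed.

    Assumes each round is independent and equally likely to succeed,
    which is a simplification of how real agent loops behave.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} rounds at 99.9% each -> {chain_success(0.999, n):.3f}")
```

Under that assumption, 99.9% per round works out to roughly 0.99 after 10 rounds, 0.90 after 100, and 0.37 after 1000. Speed alone doesn't change the per-round figure; it only lets you burn through rounds faster.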
all the emphasis on cyber security. feels like a reaction to anthropic, not a real next generation.
If this thing is supposed to be so good, why does all of their software still work the way it does? Take a stroll through the most recent several pages of GitHub issues on Codex, there are some fucking embarrassing bugs in there.
GPT-5.6 Sol. 5.7 Luna. 5.8 Mars. Meanwhile my code still runs on GPT-3.5 and nobody noticed.
If Claude Mythos and Fable 5 are the same underlying models just with different safeguards, I fail to see how TerminalBench has them at different scores.
Pleasantly surprised that it costs the same as GPT 5.5, thank god for the competition.
The sooner the USG figures out a standard process for approving releases the better. There are many differing opinions on how much to regulate AI, but I think we can all agree ad-hoc policy sucks.
Terra and Luna? Last time I heard those names, it didn’t end well
If it’s a “next generation model” then why isn't it GPT-6 rather than just a minor version bump over 5.5?
If I, as a consumer, can't access it, it might as well be just a marketing hoax. I will believe it when I am able to use it. IDK why companies publish blog posts about stuff that will come out in months...
Will it also have hardcoded self-lobotomy if asked about cutting edge ML or LLM solutions? (Looking at Fable here)
I'm really getting sick of reading about safeguards and what I'm not allowed to do on every model release.
People were mocking the EU for regulations and now this is happening in the US. I know that Europe is behind in AI but still...
It appears that between GLM-5.2 and GPT-5.6, Anthropic is feeling the heat, at least on the bang-for-the-buck heuristic?
I hope Sol doesn't get blocked like what happened with Fable.
I looked at the charts and it is clear that 88% from OpenAI is more than 88% from Anthropic.
Will it be accessible to anyone?
> As part of our ongoing engagement with the U.S. government, we previewed our plans and the models’ capabilities ahead of today’s launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly.
The clowns in the US administration can barely remain coherent from one sentence to the next.
Having them be the gatekeepers of technological progress in 2026 is fucking lame.
What happened to the nano/mini/standard/pro naming scheme, which worked perfectly fine and is intuitive to understand? Why does OpenAI insist on having the most inconsistent and confusing model and product names possible?
I'm looking at you Codex.
Like Mythos before it, I'm simply not excited about a model I can't use
Would love to see benchmarks on cognition's FrontierCode
Sol? Looks like OpenAI is jealous of Anthropic's good model naming ability and wants to emulate it.
I do not like the fact that this forces people to remember one more hierarchy of "Sol vs Terra vs Luna". OpenAI was supposed to simplify their naming since at least 2025.
Sol, Terra, Luna – crypto disaster vibes
So where is Gemini? Are you alive?
Love the name!
I didn't know that I was color blind, but thanks to those charts, I think I need to see a doctor...
I mean, you can read them even without the colors, but who on earth thought that those are a good set of colors? Oh, I forgot it was probably someone on 'Sol'.
Haven't we established defensive and offensive security usage are intractably entangled? I.e. "patch all [security] bugs, make no mistakes" gives one a list of potential exploits to hand off to less capable models.
Doesn't that undermine all good-faith discourse on cybersecurity safeguards, controlled usage etc? Or is that overstating the case (I'm not a security researcher myself so kinda parroting).
I'm going to pre-register my prediction that GPT-5.6 Sol is significantly behind Claude Fable 5, as evaluated by general consensus once time has passed for people to get familiar with both.
Boring and Gay.
Seems like OpenAI's strategy to release models after Anthropic has been paying off.
Is it just me, or does it seem like Anthropic has been more of a pioneer the past few years, and OpenAI tries to copy features they like?
The language used in this press release is borderline hilarious. It’s simultaneously trying to tell you how great it is while also telling you it’s not THAT great. Nothing to worry about, move along.
Sol, Terra, Luna? They are trolling (ragebaiting) with their naming now
Is there any model that rivals Opus or Fable? I would like to try something else, as Anthropic is pretty suss.
Also, making the model better at code just makes it better at writing offensive code.
No comments on the Cerebras version that might finally enable an intelligent voice mode instead of being stuck with 4o-mini class models?
Other than the worst naming I have ever seen (Sol / Terra / Luna), the pricing is still expensive:
> GPT‑5.6 is priced per 1M tokens across three model sizes:
> Sol is $5 input / $30 output;
> Terra is $2.50 input / $15 output
> Luna is $1 input / $6 output.
The OpenAI casino has never been more ready to take your money on gambling even more tokens.
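For anyone who wants to sanity-check what those list prices mean per request, here is a small sketch. The per-1M-token prices are copied from the quote above; the `request_cost` helper and the token counts in the example are made up for illustration, and the calculation ignores caching and any other discounts.

```python
# USD per 1M tokens as (input, output), copied from the quoted price list.
PRICES_PER_MILLION = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD of one request, ignoring caching and discounts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: 200k tokens of context in, 20k tokens out.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

Each tier is priced at half (Terra) or a fifth (Luna) of Sol on both input and output, so the ratios between tiers stay the same whatever the input/output mix is.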
Pre-official discussions:
https://news.ycombinator.com/item?id=48678789
https://news.ycombinator.com/item?id=48683021
Not really news until it's widely available.
Anyone know the latest around Fable being re-released after gov smackdown?
FUCK the US government. That's it, I am rooting for China now
Let us protect the world from a big slop
Thoughts
1. Naming convention is copied from Anthropic and honestly is more catchy than a number (amongst normal people)
2. How in the world did Anthropic have to do all the theatrics about Mythos just to have OpenAI release an equivalent or stronger model a month later without any drama???
3. Cheaper models just don’t fit any use case imo and OpenAI knows it so they keep increasing the floor - I’m still convinced task per capability is reduced with each release
4. How in the world would open source models keep up with the multi layer security? Either this security is all theater or we will finally see a ceiling in open source models because by definition they can’t have those protections
5. Cybersecurity things are boring to me because it’s all zero sum cat and mouse games
Are GPT 5.5 and Opus 4.8 the last models we're going to be allowed to use in Europe? Is there going to be a cut, and will we only be allowed to use less capable models outside of the US?
I mean, if they deem Fable 5 too powerful to share with the rest of the world, what's left for us?
Sun Earth Moon
Another model family, another naming scheme to get used to.
Sol Ultra ≈ Pro
Sol ≈ Standard
Terra ≈ Mini
Luna ≈ Nano
> For GPT‑5.6 and later models, cache writes are billed at 1.25x the model’s uncached input rate, while cache reads continue to receive the 90% cached-input discount.
Not them joining Anthropic with this bullshit. *
Caching infrastructure is already a leaky abstraction over a feature that is not as reliable or debuggable to the end user as it should be, charging for the 'privilege' of interacting with it is really annoying.
(* for reference on 'this bullshit': ChatGPT previously didn't require anything special for a basic level of caching. Unless you wanted extended cache times, it'd just "do the right thing" and try to use nodes that had your prefix already cached in memory)
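To see what the new rule costs in practice, here is a rough sketch using only the two multipliers from the quote: cache writes at 1.25x the uncached input rate and cache reads at a 90% discount (0.10x). It assumes the simplest possible pattern, one write followed by cache hits on every later request with no expiry in between, which real traffic won't always match. The function names and the 100k-token prefix are invented for the example.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # from the quoted pricing note
CACHE_READ_MULTIPLIER = 0.10   # "90% cached-input discount"


def prefix_cost_uncached(prefix_tokens: int, price_per_million: float, requests: int) -> float:
    """Cost of resending the same prefix at the full input rate every time."""
    return requests * prefix_tokens * price_per_million / 1_000_000


def prefix_cost_cached(prefix_tokens: int, price_per_million: float, requests: int) -> float:
    """Cost with one cache write, then cache reads on every later request.

    Assumes the cache never expires between requests (a simplification).
    """
    if requests <= 0:
        return 0.0
    base = prefix_tokens * price_per_million / 1_000_000
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (requests - 1))


if __name__ == "__main__":
    # Hypothetical 100k-token prefix at Sol's $5 per 1M input tokens.
    for n in (1, 2, 10):
        uncached = prefix_cost_uncached(100_000, 5.0, n)
        cached = prefix_cost_cached(100_000, 5.0, n)
        print(f"{n:>2} requests: uncached ${uncached:.3f}, cached ${cached:.3f}")
```

Under these assumptions a prefix that is written and never read again costs 25% more than before, while a single cache hit already puts you ahead of the uncached price. The surcharge mostly hurts when the cache misses or expires more often than you expect, which is exactly the part that is hard to debug.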
AI is just autocomplete. -> AI must be regulated. -> We want AI.
It's only next generation? Anthropic has frontier models! lol
A question I always have is, how do the AI labs safeguard against leaks of their models? Training a cutting-edge model basically costs a minimum of hundreds of millions of dollars. And it's all contained within a file. Okay, that file might be 500GB large, but it's still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks, have lots of people with access to it to debug it, run inference etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldn't that bankrupt Anthropic?
I hate not being able to use the latest models. There needs to be a much faster resolution to whatever is happening with the federal government.
Every conversation you have with these "more capable" models will be monitored and joined up and then your entire account might one day be tagged as Distiller or Cyber Threat Actor or whatnot. When combined with identity verification (which isn't discussed in this press release), expect people to be falsely flagged and banned from ever using OpenAI models again.
Wish I could find the thread from last week where discussions of exactly this kind of thing were dismissed as daft and outlandish.
Another year, and OpenAI comes up with yet another naming scheme for their models. First it was integers (GPT2, GPT3). Then they added friendly names (remember Ada, Babbage, Curie, Davinci?), but decided against it. Instead we got dot integers (GPT3.5), then letter-number modifiers (o1), plus word modifiers like o1-pro, o3-mini, or -mini-high, or codex, codex-max, Pro, etc.
Now they've got friendly cosmic names. And they want us to believe that this time they're gonna stick to a naming convention? I'll believe it when they do 3 releases in a row without inventing a new naming scheme.
Guess it's just another price bump hidden behind output token speed.
TLDR - It's not quite Mythos but it uses about 5 times fewer tokens, and those tokens are also cheaper?
https://pbs.twimg.com/media/HLwuJLvbwAAOfQZ?format=jpg&name=...
Could not care less.
they're trying to be anthropic with these model names
whoa, a new model that surpasses benchmarks of other models? wild.
Doesn't it strike anyone as strange that SOL, TERRA, and LUNA are all quasi-scam crypto tickers?
Time to create more LLM based startups.
Keep moving, don't doom.
GPT 5.5 in Codex is so much worse than Opus, and sometimes worse than Sonnet. I don't think 5.6 Sol will be anywhere near Fable, let alone Mythos. Probably slightly better than Opus. Maybe not even.
I can’t help but think that these benchmarks are completely fake. Sam even posted a benchmark on X a couple days ago of how the ‘complete version’ of 5.5 cyber was already ahead of Mythos apparently. This just feels like absolutely fake nonsense. The impact of Mythos on the industry was clear and in front of everyone’s eyes. The number of vulnerabilities Mozilla fixed. The vulnerabilities and exploits Anthropic showcased in that blog post about the Chrome sandbox escape etc. And now we’re supposed to believe this 5.5 cyber is already ahead of Mythos, ok. And yeah, GPT 5.6 is even further ahead, alright.