Missing "OpenAI sidesteps" from the beginning of the article title
nguyentran03
the hardware diversification story here is more interesting than the speed numbers. OpenAI going from a planned $100B Nvidia deal to "actually we're unsatisfied with your inference speed" within a few months is a pretty dramatic shift. AMD deal, Amazon cloud deal, custom TSMC chip, and now Cerebras. that's not hedging, that's a full migration strategy.
1,000 tok/s sounds impressive but Cerebras has already done 3,000 tok/s on smaller models. so either Codex-Spark is significantly larger/heavier than gpt-oss-120B, or there's overhead from whatever coding-specific architecture they're using. the article doesn't say which.
the part I wish they'd covered: does speed actually help code quality, or just help you generate wrong code faster? with coding agents the bottleneck isn't usually token generation, it's the model getting stuck in loops or making bad architectural decisions. faster inference just means you hit those walls sooner.
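A rough way to put numbers on the "bottleneck isn't token generation" point: if only generation gets faster, the per-step speedup is capped by whatever else the agent spends time on (Amdahl's law). The sketch below is illustrative only. The 2,000-token step, the 10 s of tool/test time, and the 100 tok/s baseline are hypothetical figures, not numbers from the article; only 1,000 and 3,000 tok/s come from the comment above.

```python
def agent_step_seconds(tokens: int, tokens_per_second: float, other_seconds: float) -> float:
    """Wall-clock time for one agent step: token generation plus everything
    else (tool calls, test runs, waiting on the user). Inputs are hypothetical."""
    return tokens / tokens_per_second + other_seconds


def speedup(tokens: int, old_tps: float, new_tps: float, other_seconds: float) -> float:
    """Amdahl-style ratio of old to new step time when only generation speeds up."""
    old = agent_step_seconds(tokens, old_tps, other_seconds)
    new = agent_step_seconds(tokens, new_tps, other_seconds)
    return old / new


if __name__ == "__main__":
    # Hypothetical agent step: 2,000 generated tokens plus 10 s of non-generation work.
    tokens, other = 2_000, 10.0
    for old_tps, new_tps in [(100, 1_000), (1_000, 3_000)]:
        print(f"{old_tps} -> {new_tps} tok/s: {speedup(tokens, old_tps, new_tps, other):.2f}x")
```

With these made-up inputs, a 10x jump in tok/s gives a 2.5x faster step, and going from 1,000 to 3,000 tok/s buys only about 12%, because the fixed 10 s dominates. Set `other` to 0 and the speedup matches the raw tok/s ratio.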
Havoc
> On Thursday, OpenAI released its first production AI model to run on non-Nvidia hardware,
They used AMD GPUs before: MI300X via Azure, a year-plus ago.
Ever since the recent revelation that Ars has used AI-hallucinated quotes in their articles, I have to wonder whether any of these quotes are AI-hallucinated, or if the piece itself is majority or minority AI generated.
If so, I have to ask: If you aren’t willing to take the time to write your own work, why should I take the time to read your work?
I didn’t have to worry about this even a week ago.
reliabilityguy
I have a question for those who closely follow Cerebras: do they have a future beyond being an inference platform based on (unusual) in-house silicon?
ElijahLynn
Title is currently: "OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips"
RobotToaster
One thing I don't get about Cerebras: they say it's wafer-scale, but the chips they show are square. I thought wafers were circular?
AndrewKemendo
Mark my words:
The era of “personal computing” is over.
Large-scale capital is not gonna make any more investments into microelectronics going forward.
Capital is incentivized to build large data centers and very high-speed private Internet (not public Internet; private Internet like Starlink).
So, the same way the 1970s were the mainframe era of server-side computing, which turned into server-side rendering, which then turned into client-side rendering, culminating in the era of the private computer in your home and finally in your pocket,
we’re going back to server-side model communication, and that’s going to become effectively the gateway to all other information, which will be increasingly compartmentalized into remote data centers with high-speed access.
Previous discussion on 5.3 Codex-Spark (sharing, as the article doesn’t add tremendous value to it): https://news.ycombinator.com/item?id=46992553